Posted to dev@vxquery.apache.org by Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk> on 2016/05/07 18:22:01 UTC

Re: JSONiq data model

Hi,

I am attempting to create a doc on the JSONiq data model for objects[1] (it
might be full of errors because I am doing the calculations manually).

This is what I have come up with for the data model for objects:

The first byte would have the value tag, followed by the id (4 bytes) of
the object. Then 4 bytes to represent the size of the object. Then another
four bytes to represent the number of key-value pairs. The next few bytes
represent the offsets of the keys which follow (each offset is represented
by 4 bytes). Ids would be assigned to the keys. The next few bytes would be
a list of the key ids, sorted by the alphabetical order of the keys. The
following bytes would represent the keys in the object. Each key is a
StringPointable followed by the id of the key. Each object would have a
sequence pointable: the following bytes would be the number of items (items
are the values for keys) in the sequence. The next bytes would be the offset
of each item in the sequence. The last bytes would be the values for each
key, each followed by the respective id of the key.
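
As a rough illustration of the layout described above, here is a minimal
Python sketch that writes the fields in that order. It is only a sketch under
assumptions: the tag value, the 2-byte length prefix standing in for a
StringPointable, big-endian integers, string-only values, and the names
OBJECT_TAG, encode_string and encode_object are all hypothetical and not
taken from VXQuery.

```python
import struct

OBJECT_TAG = 0x01  # hypothetical tag value, not taken from VXQuery


def encode_string(s):
    """Length-prefixed UTF-8; a stand-in for a StringPointable (2-byte length assumed)."""
    data = s.encode("utf-8")
    return struct.pack(">H", len(data)) + data


def offsets(blobs):
    """Start offset of each blob, relative to the first blob."""
    out, pos = [], 0
    for blob in blobs:
        out.append(pos)
        pos += len(blob)
    return out


def encode_object(object_id, pairs):
    """Serialize (key, string value) pairs in the field order described above."""
    keys = [k for k, _ in pairs]
    ids = list(range(len(keys)))                     # key id = position in the object
    sorted_ids = sorted(ids, key=lambda i: keys[i])  # ids in alphabetical key order

    key_blobs = [encode_string(k) + struct.pack(">i", i) for i, k in zip(ids, keys)]
    val_blobs = [encode_string(v) + struct.pack(">i", i) for i, (_, v) in zip(ids, pairs)]

    n = len(pairs)
    body = struct.pack(">i", n)                                         # number of pairs
    body += b"".join(struct.pack(">i", o) for o in offsets(key_blobs))  # key offsets
    body += b"".join(struct.pack(">i", i) for i in sorted_ids)          # sorted key ids
    body += b"".join(key_blobs)                                         # keys + ids
    body += struct.pack(">i", n)                                        # item count
    body += b"".join(struct.pack(">i", o) for o in offsets(val_blobs))  # item offsets
    body += b"".join(val_blobs)                                         # values + ids

    # Header: tag (1 byte), object id (4 bytes), total size in bytes (4 bytes).
    header = struct.pack(">Bii", OBJECT_TAG, object_id, 1 + 4 + 4 + len(body))
    return header + body
```

Calling encode_object(7, [("b", "x"), ("a", "y")]) gives a byte string whose
size field equals its own length and whose sorted id list puts the id of "a"
before the id of "b".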

Hope it makes sense.

My problem is,

I have not provided for the white spaces in the object. What can I use to
represent the white spaces? I cannot use a text node because object is not
a node.

[1]
https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0

Thank you.

Yours sincerely,
Riyafa

On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org> wrote:

> We have two students working with us this summer through GSoC to complete
> the JSONiq specification for arrays and objects. I think the first step is to
> define the data model used by JSONiq. The data model should be written up in
> our wiki [1] before coding starts this summer. The wiki will allow the
> community to discuss the JSON data model implementation in VXQuery.
>
> I updated the JSONiq wiki to help get the documentation started. Please
> fill in the JSON data model based on the examples seen on our website
> (links on the wiki page).
>
> Post here if you have any questions.
>
> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>

-- 
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: riyafa.12@cse.mrt.ac.lk
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>

Re: JSONiq data model

Posted by Till Westmann <ti...@apache.org>.

On 8 May 2016, at 9:59, Preston Carman wrote:

> On Sat, May 7, 2016 at 11:22 AM, Riyafa Abdul Hameed
> <ri...@cse.mrt.ac.lk> wrote:
>> Hi,
>>
>> I am attempting to create a doc on the JSONiq data model for objects[1] (it
>> might be full of errors because I am doing the calculations manually).
>>
>> This is what I have come up with for the data model for objects:
>>
>> The first byte would have the value tag, followed by the id (4 bytes) of
>> the object. Then 4 bytes to represent the size of the object. Then another
>> four bytes to represent the number of key-value pairs. The next few bytes
>> represent the offsets of the keys which follow (each offset is represented
>> by 4 bytes). Ids would be assigned to the keys. The next few bytes would be
>> a list of the key ids, sorted by the alphabetical order of the keys. The
>> following bytes would represent the keys in the object. Each key is a
>> StringPointable followed by the id of the key. Each object would have a
>> sequence pointable: the following bytes would be the number of items (items
>> are the values for keys) in the sequence. The next bytes would be the offset
>> of each item in the sequence. The last bytes would be the values for each
>> key, each followed by the respective id of the key.
>>
>> Hope it makes sense.
>>
>> My problem is,
>>
>> I have not provided for the white spaces in the object. What can I use to
>> represent the white spaces? I cannot use a text node because object is not
>> a node.
>>
>
> The XML data model defines how to utilize white space. I don't believe
> JSON has the same idea. I think we can print the JSON object in a
> standard fashion and do not need to track white space in the object
> definition.

I agree. The only information available in the data model as specified [1]
is the keys and the values, so whitespace is not a problem. Taking one step
back, one of the differences between XML and JSON is that
- XML is used to represent human-readable documents (often after
  transformation), so line breaks, indentation, and other whitespace are
  important, while
- JSON is a notation for objects in a programming language.
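
To make the point about whitespace concrete, here is a small sketch using
Python's standard json module as a stand-in (it is not VXQuery code, and the
sample document is made up). Two texts that differ only in whitespace between
tokens parse to the same keys and values, and printing the object back out
picks one standard form regardless of the input layout.

```python
import json

compact = '{"name":"VXQuery","tags":["xml","json"]}'
spaced = '''
{
    "name" : "VXQuery",
    "tags" : [ "xml", "json" ]
}
'''

# Both texts parse to the same keys and values; the whitespace leaves no trace.
same = json.loads(compact) == json.loads(spaced)

# Serialization picks one standard form, independent of the input layout.
canonical = json.dumps(json.loads(spaced), separators=(",", ":"))
```

Here same is True and canonical is identical to compact, so nothing in the
object representation has to remember the original spacing.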

Cheers,
Till

[1] 
http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880

>>
>> [1]
>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>

Re: JSONiq data model

Posted by Preston Carman <pr...@apache.org>.

On Sat, May 7, 2016 at 11:22 AM, Riyafa Abdul Hameed
<ri...@cse.mrt.ac.lk> wrote:
> Hi,
>
> I am attempting to create a doc on the JSONiq data model for objects[1] (It
> might be full of errors because I am doing the calculations manually).
>
> This is what I have come up on the data model for objects:
>
> The first byte would have the value tag, followed by the id (4 bytes) of
> the object. Then 4 bytes to represent the size of the object. Then another
> four bytes to represent the number of key-value pairs. Next few bytes
> represent the offsets of keys which follow (each offset is represented by 4
> bytes). Ids would be assigned to the keys. Next few bytes would be a sorted
> list of ids for keys in alphabetical order. The following bytes would
> represent the keys in the object.Each key is a StringPointable followed by
> the id of the key. Each object would have a sequence pointable: the
> following bytes would be the number of Items (items are the values for
> keys) in the sequence. The next bytes would be the offset of each item in
> the sequence. The last bytes would be the values for each key followed by
> the respective id of the key.
>
> Hope it makes sense.
>
> My problem is,
>
> I have not provided for the white spaces in the object. What can I use to
> represent the white spaces? I cannot use a text node because object is not
> a node.
>

The XML data model defines how to utilize white space. I don't believe
JSON has the same idea. I think we can print the JSON object in a
standard fashion and do not need to track white space in the object
definition.

>
> [1]
> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>

Re: JSONiq data model

Posted by Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>.

Hi,

+1 for the binary search--I have been thinking of that as well. I shall
update the wiki based on these.
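
For readers following along, a binary search over the keys could look
roughly like the Python sketch below. The details of the proposal are not
visible in this part of the thread, so this is an assumption: keys stay in
their original order, and a separate sorted index maps each sorted key back
to its position. The names build_index and lookup are hypothetical.

```python
from bisect import bisect_left


def build_index(pairs):
    """Return (sorted_keys, positions): the keys in sorted order and, for each
    sorted key, its index in the original list of pairs."""
    order = sorted(range(len(pairs)), key=lambda i: pairs[i][0])
    return [pairs[i][0] for i in order], order


def lookup(pairs, sorted_keys, positions, key):
    """Binary search over the sorted keys; returns the value, or None if absent."""
    i = bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return pairs[positions[i]][1]
    return None
```

The lookup takes a logarithmic number of string comparisons in the number of
keys, at the cost of storing the extra sorted index next to the object.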

Thank you,
Riyafa

On 10 May 2016 at 10:49, Till Westmann <ti...@apache.org> wrote:

> :)
>
>
> On 9 May 2016, at 21:55, Michael Carey wrote:
>
>> +1 to first get things working (and presumably modularizing the code so
>> that other/later streamlined implementations can be plugged in instead).
>>
>>
>> On 5/9/16 6:24 AM, Till Westmann wrote:
>>
>>> All of this looks pretty good!
>>>
>>> Wrt. the question of the dictionary for the fields, I think that we should
>>> consider the 2 ways that we can access an object:
>>> 1. Either we get all keys (jdm:keys) or
>>> 2. we get a value for a key (jdm:value).
>>>
>>> To get all the keys efficiently and to be able to skip huge nested values,
>>> a simple approach could be to store a dictionary of the keys (in their
>>> original order) with pointers (offsets) to the values. That way we could
>>> get the keys quickly by scanning the dictionary and each value by scanning
>>> the dictionary + 1 hop to find the value. This certainly has the problem
>>> that the access is linear in the number of the keys. But it is reasonably
>>> simple and it would allow us to get a correct + testable implementation
>>> relatively soon and to have a baseline for a more optimized representation.
>>>
>>> Thoughts?
>>>
>>> Cheers,
>>> Till
>>>
>>> [1]
>>> http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880
>>>
>>> On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:
>>>
>>>> Hi Preston,
>>>>
>>>> I have edited the wiki[1] and the doc[2] based on the comments. Thank you
>>>> for the suggestions provided. I have removed the part that assigns an id to
>>>> the keys and instead suggested that the keys be stored in the order they
>>>> appear in the json object. I am not sure I understand the concept of
>>>> hashcode--how to generate the hashcodes used for easy lookup?
>>>>
>>>>
>>>> [1]https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>> [2]
>>>>
>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>
>>>> Thank you again.
>>>>
>>>> Yours sincerely,
>>>> Riyafa
>>>>
>>>> On 9 May 2016 at 01:23, christina pavlopoulou <cp...@ucr.edu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I updated the wiki page according to Preston's comments along with the
>>>>> json array example in [1].
>>>>>
>>>>> [1]
>>>>>
>>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>>>
>>>>> Thank you,
>>>>> Christina
>>>>>
>>>>> On 5/8/2016 9:43 AM, Preston Carman wrote:
>>>>>
>>>>>> Nice job guys. I can see you are picking up how to create a data
>>>>>> model. I have limited my comments to the wiki [1] for now. At a high
>>>>>> level, I was impressed with your detail and thoughtful layouts. It
>>>>>> reminds me of the age-old trade-off: speed vs space. At this time,
>>>>>> let's err on the side of saving space. The data model should be as
>>>>>> compact as possible.
>>>>>>
>>>>>> I also found the AsterixDB serialization [2] we can use as a
>>>>>> reference. Even though the AsterixDB data model includes object
>>>>>> length, I would leave that out since none of the XQuery data models
>>>>>> include this property.
>>>>>>
>>>>>> Riyafa, take a look at the method AsterixDB uses for quick lookups (a
>>>>>> hash value for the name). Consider the pros and cons between your
>>>>>> method and AsterixDB's method: a list of hash values for the names
>>>>>> and a sorted list of names.
>>>>>>
>>>>>> Also, take a look at my wiki comments. It's a great start!
>>>>>>
>>>>>> Mahalo,
>>>>>> Preston
>>>>>>
>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>> [2]
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>>>>>
>>>>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <cpavl001@ucr.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I also designed an example for the json array [1] given the
>>>>>>> description I wrote in the wiki page.
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>>
>>>>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Christina
>>>>>>>
>>>>>>>



Re: JSONiq data model

Posted by Till Westmann <ti...@apache.org>.

:)

On 9 May 2016, at 21:55, Michael Carey wrote:

> +1 to first get things working (and presumably modularizing the code so
> that other/later streamlined implementations can be plugged in instead).

Re: JSONiq data model

Posted by Michael Carey <mj...@ics.uci.edu>.

+1 to first get things working (and presumably modularizing the code so 
that other/later streamlined implementations can be plugged in instead).
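
For illustration, the baseline Till describes in the quoted message (a
dictionary of keys in their original order, each with an offset to its
value) could be sketched in Python roughly as follows. This is a sketch under
assumptions, not the VXQuery implementation: values are limited to strings,
and the class name LinearDictionaryObject and its methods keys and value are
hypothetical, loosely modelled on jdm:keys and jdm:value.

```python
class LinearDictionaryObject:
    """Baseline: keys in original order, each paired with an offset into a value buffer."""

    def __init__(self, pairs):
        self._dictionary = []      # list of (key, offset, length) entries
        self._values = bytearray()  # all value bytes, back to back
        for key, value in pairs:
            data = value.encode("utf-8")
            self._dictionary.append((key, len(self._values), len(data)))
            self._values += data

    def keys(self):
        # Scans only the dictionary; the value bytes are never touched.
        return [key for key, _, _ in self._dictionary]

    def value(self, key):
        # Linear scan of the dictionary, then one hop to the value bytes.
        for k, offset, length in self._dictionary:
            if k == key:
                return bytes(self._values[offset:offset + length]).decode("utf-8")
        return None
```

Any later, more streamlined representation that offers the same two methods
could be plugged in behind the same callers, which is the kind of
modularization suggested above.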


On 5/9/16 6:24 AM, Till Westmann wrote:
> All of this looks pretty good!
>
> Wrt. the question of the dictionary for the fields, I think that we should
> consider the 2 ways that we can access an object:
> 1. Either we get all keys (jdm:keys) or
> 2. we get a value for a key (jdm:value).
>
> To get all the keys efficiently and to be able to skip huge nested values,
> a simple approach could be to store a dictionary of the keys (in their
> original order) with pointers (offsets) to the values. That way we could
> get the keys quickly by scanning the dictionary and each value by scanning
> the dictionary + 1 hop to find the value. This certainly has the problem
> that the access is linear in the number of the keys. But it is reasonably
> simple and it would allow us to get a correct + testable implementation
> relatively soon and to have a baseline for a more optimized representation.
>
> Thoughts?
>
> Cheers,
> Till
>
> [1] 
> http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880

Re: JSONiq data model

Posted by Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>.

Hi,

After going through alternative data models for representing objects,
including more optimized methods of lookup, it has been decided to go with the
most basic model, which is Option 1 as suggested by Preston and recorded in
the wiki[1]. The reasoning is that once things work using the simple
method, further optimization can be carried out.

[1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq

Thank you.
Riyafa

On 10 May 2016 at 10:59, Michael Carey <mj...@ics.uci.edu> wrote:

> Sounds like a great plan!
>
>
>
> On 5/9/16 10:18 PM, Till Westmann wrote:
>
>>
>>
>> On 9 May 2016, at 12:02, Preston Carman wrote:
>>
>>> I think we have three options: optimize for space, keys (jdm:keys) or
>>> field lookup (jdm:value). The optimization for keys and field lookup
>>> could be done independently. Let's consider the option currently in the
>>> wiki as option 1 (space). Don't remove this option from the wiki so we
>>> have a reference. The new options for keys and field lookup can be
>>> added as option 2 and 3.
>>>
>>> Option 1 (space): A tightly compact format that is optimized to save
>>> space.
>>> Option 2 (keys): A data model optimized for accessing a list of keys.
>>> Option 3 (lookup): A data model optimized for accessing a field in the
>>> object.
>>>
>>> For option 2 (keys):
>>> Consider the return value for jdm:keys: jdm:keys($o as object()) as
>>> xs:string*
>>> I am not sure I fully understand what xs:string* represents. Is this a
>>> sequence of string as in XQuery or an array in JSONiq or some other
>>> structure. The most optimal way to return the keys would be to store
>>> them in the same way they should be returned. This way you can do a
>>> simple copy to produce the result without processing the result. In
>>> this case, storing them as a sequence (or array) of string values
>>> might be the best option. The values would then need to be a separate
>>> sequence (or array) of typed values in the object data model. Pro:
>>> easy keys function. Con: added a list of offsets for the keys.
>>>
>>
>> xs:string* is indeed a sequence of strings
>>
>> For option 3 (lookup):
>>> This option is independent of option 2. As Till suggested we can
>>> implement this at a later date. We would need a method to improve the
>>> lookup of a field. Option 1 and 2 requires a sequential search of the
>>> keys and a string comparison at each field. The AsterixDB record data
>>> model is a little more complex than I first thought. Take a look a
>>> their record implementation: writing the record [1] (line 205 to 245
>>> are interesting) and field look up [2] (line 277 to 344) . We only
>>> need to consider the open part of the record. (The closed part can be
>>> ignored.)
>>>
>>
>> I had another idea for the implementation of the dictionary. We could
>> store the keys in sorted order - while we store the values in the original
>> order. If each key is then followed by the offset to the value, we would
>> get
>> a) a log n access for a value (as the keys are sorted and we can do binary
>>    search) and
>> b) the keys in their original order, if we sort them by the offsets.
>> Assuming that the value() access is quite a bit more common than the
>> keys() access this could be a reasonable trade-off.
>>
>> Comments?
>>>
>>
>> Sounds good to list the options on the Wiki page.
>>
>> Also, what is the actual result of jdm:keys?
>>>
>>
>> A sequence of strings.
>>
>> What is the requirement for the initial implementation?
>>>
>>
>> It should be correct and tested.
>>
>> My 2c,
>> Till
>>
>> [1]
>>> https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/builders/RecordBuilder.java
>>> [2]
>>> https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/serde/ARecordSerializerDeserializer.java
>>>
>>> On Mon, May 9, 2016 at 8:35 AM, Riyafa Abdul Hameed
>>> <ri...@cse.mrt.ac.lk> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there any documentation I could go through to understand the
>>>> AsterixDB
>>>> Hash code implementation on the open fields? I am not sure I understand
>>>> enough from the AsterixDB serialization [1] to define the data model for
>>>> objects using it.
>>>>
>>>> Sorry about any confusion.
>>>>
>>>> [1]
>>>>
>>>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>>>
>>>> Thank you.
>>>> Riyafa
>>>>
>>>> On 9 May 2016 at 20:16, Michael J. Carey <mj...@ics.uci.edu> wrote:
>>>>
>>>> I think Preston's suggestion of looking at the AsterixDB implementation
>>>>> of
>>>>> its binary data model is a good one, as it shares the efficient field
>>>>> access by name requirements and several VXQuery folks are experts in
>>>>> its
>>>>> details as well.  I believe it uses a sorted list instead of a hash
>>>>> table
>>>>> internally, perhaps - slightly simpler for updates perhaps.
>>>>> On May 9, 2016 7:35 AM, "Riyafa Abdul Hameed" <riyafa.12@cse.mrt.ac.lk
>>>>> >
>>>>> wrote:
>>>>>
>>>>> Hi again,
>>>>>
>>>>> I have been thinking of Till's suggestion of using a dictionary, and I
>>>>> think it would be a better alternative because then we wouldn't have to
>>>>> process the valuetag of the value of a particular key before moving to
>>>>> the
>>>>> next key. Hence it would be easy to implement jdm:keys method. Any
>>>>> suggestions? Shall I updated the wiki and the doc based on this.
>>>>>
>>>>> Thank you.
>>>>> Riyafa
>>>>>
>>>>> On 9 May 2016 at 19:21, Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>
>>>>> wrote:
>>>>>
>>>>> Hi Till,
>>>>>>
>>>>>> Currently I have suggested storing each key followed by the value.
>>>>>> This
>>>>>> uses less space and is quite similar to storing the offset of the
>>>>>> values
>>>>>> and the access is also linear to the number of keys.
>>>>>>
>>>>>> Thanks.
>>>>>> Riyafa
>>>>>>
>>>>>> On 9 May 2016 at 18:54, Till Westmann <ti...@apache.org> wrote:
>>>>>>
>>>>>> All of this looks pretty good!
>>>>>>>
>>>>>>> Wrt. the question of the dictionary for the fields, I think that we
>>>>>>>
>>>>>> should
>>>>>
>>>>>> consider the 2 ways that we can access an object:
>>>>>>> 1. Either we get all keys (jdm:keys) or
>>>>>>> 2. we get a value for a key (jdm:value).
>>>>>>>
>>>>>>> To get all the keys efficiently and to be able to skip huge nested
>>>>>>>
>>>>>> values
>>>>>
>>>>>> a
>>>>>>> simple approach could be store a dictionary of the keys (in their
>>>>>>>
>>>>>> original
>>>>>
>>>>>> order) with pointers (offsets) to the values. That way we could get
>>>>>>> the
>>>>>>> keys
>>>>>>> quickly by scanning the dictionary and each value by scanning the
>>>>>>> dictionary
>>>>>>> + 1 hop to find the value. This certainly has the problem, that the
>>>>>>>
>>>>>> access
>>>>>
>>>>>> is linear in the number of the keys. But it is reasonably simple and
>>>>>>> it
>>>>>>> would allow us to get a correct + testable implementation relatively
>>>>>>>
>>>>>> soon
>>>>>
>>>>>> and to have a baseline for a more optimized representation.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>>
>>>>>
>>>>> http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880
>>>>>
>>>>>>
>>>>>>> On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:
>>>>>>>
>>>>>>> Hi Preston,
>>>>>>>
>>>>>>>>
>>>>>>>> I have edited the wiki[1] and the doc[2] based on the comments.
>>>>>>>> Thank
>>>>>>>>
>>>>>>> you
>>>>>
>>>>>> for the suggestions provided. I have removed the part that assigns an
>>>>>>>>
>>>>>>> id
>>>>>
>>>>>> to
>>>>>>>> the keys and instead suggested that the keys be stored in the order
>>>>>>>>
>>>>>>> they
>>>>>
>>>>>> appear in the json object. I am not sure I understand the concept of
>>>>>>>> hashcode--how to generate the hashcodes used for easy lookup?
>>>>>>>>
>>>>>>>>
>>>>>>>> [1]https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>> [2]
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>>
>>>>>>
>>>>>>>> Thank you again.
>>>>>>>>
>>>>>>>> Yours sincerely,
>>>>>>>> Riyafa
>>>>>>>>
>>>>>>>> On 9 May 2016 at 01:23, christina pavlopoulou <cp...@ucr.edu>
>>>>>>>>
>>>>>>> wrote:
>>>>>
>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I updated the wiki page according to Preston's comments along with
>>>>>>>>> the
>>>>>>>>> json array example in [1].
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>>>
>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Christina
>>>>>>>>>
>>>>>>>>> On 5/8/2016 9:43 AM, Preston Carman wrote:
>>>>>>>>>
>>>>>>>>> Nice job guys. I can see you are picking up how to create a data
>>>>>>>>>
>>>>>>>>>> model. I have limited my comments to the wiki [1] for now. At a
>>>>>>>>>> high
>>>>>>>>>> level, I was impressed with your detail and thoughtful layouts. It
>>>>>>>>>> reminds me of the age old trade off: speed vs space. At this time,
>>>>>>>>>> lets error on saving space. The data model should the as compact
>>>>>>>>>> as
>>>>>>>>>> possible.
>>>>>>>>>>
>>>>>>>>>> I also found the AsterixDB serialization [2] we can use as a
>>>>>>>>>> reference. Even though the AsterixDB data model includes object
>>>>>>>>>> length, I would leave that out since all the XQuery data models do
>>>>>>>>>>
>>>>>>>>> not
>>>>>
>>>>>> include this property.
>>>>>>>>>>
>>>>>>>>>> Riyafa, take a look at the method AsterixDB uses for quick look
>>>>>>>>>> ups
>>>>>>>>>>
>>>>>>>>> (a
>>>>>
>>>>>> hash value for the name). Consider the pros and cons between your
>>>>>>>>>> method and AsterixDB's method: a list hash value for name and a
>>>>>>>>>>
>>>>>>>>> sorted
>>>>>
>>>>>> list of names.
>>>>>>>>>>
>>>>>>>>>> Also, take a look at my wiki comments. Its a great start!
>>>>>>>>>>
>>>>>>>>>> Mahalo,
>>>>>>>>>> Preston
>>>>>>>>>>
>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>>>> [2]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>>>>
>>>>>>
>>>>>>>>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <
>>>>>>>>>> cpavl001@ucr.edu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I, also, designed an example for the json array [1] given the
>>>>>>>>>>> description I
>>>>>>>>>>> wrote in the wiki page.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>
>>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>>>
>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Christina
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I am attempting to create a doc on the JSONiq data model for
>>>>>>>>>>>> objects[1]
>>>>>>>>>>>> (It
>>>>>>>>>>>> might be full of errors because I am doing the calculations
>>>>>>>>>>>> manually).
>>>>>>>>>>>>
>>>>>>>>>>>> This is what I have come up on the data model for objects:
>>>>>>>>>>>>
>>>>>>>>>>>> The first byte would have the value tag, followed by the id (4
>>>>>>>>>>>> bytes) of
>>>>>>>>>>>> the object. Then 4 bytes to represent the size of the object.
>>>>>>>>>>>> Then
>>>>>>>>>>>> another
>>>>>>>>>>>> four bytes to represent the number of key-value pairs. Next few
>>>>>>>>>>>>
>>>>>>>>>>> bytes
>>>>>
>>>>>> represent the offsets of keys which follow (each offset is
>>>>>>>>>>>> represented
>>>>>>>>>>>> by
>>>>>>>>>>>> 4
>>>>>>>>>>>> bytes). Ids would be assigned to the keys. Next few bytes would
>>>>>>>>>>>> be
>>>>>>>>>>>>
>>>>>>>>>>> a
>>>>>
>>>>>> sorted
>>>>>>>>>>>> list of ids for keys in alphabetical order. The following bytes
>>>>>>>>>>>>
>>>>>>>>>>> would
>>>>>
>>>>>> represent the keys in the object.Each key is a StringPointable
>>>>>>>>>>>> followed
>>>>>>>>>>>> by
>>>>>>>>>>>> the id of the key. Each object would have a sequence pointable:
>>>>>>>>>>>> the
>>>>>>>>>>>> following bytes would be the number of Items (items are the
>>>>>>>>>>>> values
>>>>>>>>>>>> for
>>>>>>>>>>>> keys) in the sequence. The next bytes would be the offset of
>>>>>>>>>>>> each
>>>>>>>>>>>> item
>>>>>>>>>>>> in
>>>>>>>>>>>> the sequence. The last bytes would be the values for each key
>>>>>>>>>>>> followed
>>>>>>>>>>>> by
>>>>>>>>>>>> the respective id of the key.
>>>>>>>>>>>>
>>>>>>>>>>>> Hope it makes sense.
>>>>>>>>>>>>
>>>>>>>>>>>> My problem is,
>>>>>>>>>>>>
>>>>>>>>>>>> I have not provided for the white spaces in the object. What
>>>>>>>>>>>> can I
>>>>>>>>>>>> use
>>>>>>>>>>>> to
>>>>>>>>>>>> represent the white spaces? I cannot use a text node because
>>>>>>>>>>>> object
>>>>>>>>>>>> is
>>>>>>>>>>>> not
>>>>>>>>>>>> a node.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>
>>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>>
>>>>>>
>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>
>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>> Riyafa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> We have two students working with us this summer through GSOC to
>>>>>>>>>>>>
>>>>>>>>>>>> complete
>>>>>>>>>>>>> JSONiq specification for arrays and objects. I think the first
>>>>>>>>>>>>>
>>>>>>>>>>>> step
>>>>>
>>>>>> is
>>>>>>>>>>>>> to
>>>>>>>>>>>>> define the data model used by JSONiq. The definition should be
>>>>>>>>>>>>> defined
>>>>>>>>>>>>> in
>>>>>>>>>>>>> our wiki [1] before coding starts this summer. The wiki will
>>>>>>>>>>>>> allow
>>>>>>>>>>>>> the
>>>>>>>>>>>>> community to discuss the JSON data model implementation in
>>>>>>>>>>>>>
>>>>>>>>>>>> VXQuery.
>>>>>
>>>>>>
>>>>>>>>>>>>> I updated the JSONiq wiki to help get the documentation
>>>>>>>>>>>>> started.
>>>>>>>>>>>>> Please
>>>>>>>>>>>>> fill in the JSON data model based on the examples seen on our
>>>>>>>>>>>>> website
>>>>>>>>>>>>> (links on the wiki page).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Post here if you have any questions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Riyafa Abdul Hameed
>>>>>>>> Undergraduate, University of Moratuwa
>>>>>>>>
>>>>>>>> Email: riyafa.12@cse.mrt.ac.lk
>>>>>>>> Website: https://riyafa.wordpress.com/ <
>>>>>>>> http://riyafa.wordpress.com/>
>>>>>>>> <http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa>
>>>>>>>> <http://twitter.com/Riyafa1>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Riyafa Abdul Hameed
>>>>>> Undergraduate, University of Moratuwa
>>>>>>
>>>>>> Email: riyafa.12@cse.mrt.ac.lk
>>>>>> Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
>>>>>> <http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa>
>>>>>> <http://twitter.com/Riyafa1>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Riyafa Abdul Hameed
>>>>> Undergraduate, University of Moratuwa
>>>>>
>>>>> Email: riyafa.12@cse.mrt.ac.lk
>>>>> Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
>>>>> <http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa>
>>>>> <http://twitter.com/Riyafa1>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Riyafa Abdul Hameed
>>>> Undergraduate, University of Moratuwa
>>>>
>>>> Email: riyafa.12@cse.mrt.ac.lk
>>>> Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
>>>> <http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa>
>>>> <http://twitter.com/Riyafa1>
>>>>
>>>
>


-- 
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: riyafa.12@cse.mrt.ac.lk
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>

Re: JSONiq data model

Posted by Michael Carey <mj...@ics.uci.edu>.

Sounds like a great plan!


On 5/9/16 10:18 PM, Till Westmann wrote:
>
>
> On 9 May 2016, at 12:02, Preston Carman wrote:
>
>> I think we have three options: optimize for space, keys (jdm:keys) or
>> field lookup (jdm:value). The optimization for keys and field lookup
>> could be done independently. Let's consider the option currently in the
>> wiki as option 1 (space). Don't remove this option from the wiki so we
>> have a reference. The new options for keys and field lookup can be
>> added as option 2 and 3.
>>
>> Option 1 (space): A tightly compact format that is optimized to save 
>> space.
>> Option 2 (keys): A data model optimized for accessing a list of keys.
>> Option 3 (lookup): A data model optimized for accessing a field in 
>> the object.
>>
>> For option 2 (keys):
>> Consider the return value for jdm:keys: jdm:keys($o as object()) as 
>> xs:string*
>> I am not sure I fully understand what xs:string* represents. Is this a
>> sequence of strings as in XQuery, an array in JSONiq, or some other
>> structure? The most optimal way to return the keys would be to store
>> them in the same way they should be returned. This way you can do a
>> simple copy to produce the result without processing the result. In
>> this case, storing them as a sequence (or array) of string values
>> might be the best option. The values would then need to be a separate
>> sequence (or array) of typed values in the object data model. Pro:
>> easy keys function. Con: added a list of offsets for the keys.
>
> xs:string* is indeed a sequence of strings
>
>> For option 3 (lookup):
>> This option is independent of option 2. As Till suggested we can
>> implement this at a later date. We would need a method to improve the
>> lookup of a field. Options 1 and 2 require a sequential search of the
>> keys and a string comparison at each field. The AsterixDB record data
>> model is a little more complex than I first thought. Take a look at
>> their record implementation: writing the record [1] (lines 205 to 245
>> are interesting) and field lookup [2] (lines 277 to 344). We only
>> need to consider the open part of the record. (The closed part can be
>> ignored.)
>
> I had another idea for the implementation of the dictionary. We could
> store the keys in sorted order while we store the values in their
> original order. If each key is then followed by the offset of its value,
> we would get
> a) O(log n) access to a value (as the keys are sorted and we can do a
>    binary search) and
> b) the keys in their original order, if we sort them by their offsets.
> Assuming that value() access is quite a bit more common than keys()
> access, this could be a reasonable trade-off.
>
>> Comments?
>
> Sounds good to list the options on the Wiki page.
>
>> Also, what is the actual result of jdm:keys?
>
> A sequence of strings.
>
>> What is the requirement for the initial implementation?
>
> It should be correct and tested.
>
> My 2c,
> Till
>
>> [1] 
>> https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/builders/RecordBuilder.java
>> [2] 
>> https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/serde/ARecordSerializerDeserializer.java
>>
>> On Mon, May 9, 2016 at 8:35 AM, Riyafa Abdul Hameed
>> <ri...@cse.mrt.ac.lk> wrote:
>>> Hi,
>>>
>>> Is there any documentation I could go through to understand the 
>>> AsterixDB
>>> Hash code implementation on the open fields? I am not sure I understand
>>> enough from the AsterixDB serialization [1] to define the data model 
>>> for
>>> objects using it.
>>>
>>> Sorry about any confusion.
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference 
>>>
>>>
>>> Thank you.
>>> Riyafa
>>>
>>> On 9 May 2016 at 20:16, Michael J. Carey <mj...@ics.uci.edu> wrote:
>>>
>>>> I think Preston's suggestion of looking at the AsterixDB 
>>>> implementation of
>>>> its binary data model is a good one, as it shares the efficient field
>>>> access by name requirements and several VXQuery folks are experts 
>>>> in its
>>>> details as well.  I believe it uses a sorted list instead of a hash 
>>>> table
>>>> internally, perhaps - slightly simpler for updates perhaps.
>>>> On May 9, 2016 7:35 AM, "Riyafa Abdul Hameed" 
>>>> <ri...@cse.mrt.ac.lk>
>>>> wrote:
>>>>
>>>> Hi again,
>>>>
>>>> I have been thinking of Till's suggestion of using a dictionary, and I
>>>> think it would be a better alternative because then we wouldn't 
>>>> have to
>>>> process the valuetag of the value of a particular key before moving 
>>>> to the
>>>> next key. Hence it would be easy to implement jdm:keys method. Any
>>>> suggestions? Shall I update the wiki and the doc based on this?
>>>>
>>>> Thank you.
>>>> Riyafa
>>>>
>>>> On 9 May 2016 at 19:21, Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>
>>>> wrote:
>>>>
>>>>> Hi Till,
>>>>>
>>>>> Currently I have suggested storing each key followed by the value. 
>>>>> This
>>>>> uses less space and is quite similar to storing the offset of the 
>>>>> values
>>>>> and the access is also linear to the number of keys.
>>>>>
>>>>> Thanks.
>>>>> Riyafa
>>>>>
>>>>> On 9 May 2016 at 18:54, Till Westmann <ti...@apache.org> wrote:
>>>>>
>>>>>> All of this looks pretty good!
>>>>>>
>>>>>> Wrt. the question of the dictionary for the fields, I think that we
>>>>>> should consider the 2 ways that we can access an object:
>>>>>> 1. Either we get all keys (jdm:keys) or
>>>>>> 2. we get a value for a key (jdm:value).
>>>>>>
>>>>>> To get all the keys efficiently and to be able to skip huge nested
>>>>>> values, a simple approach could be to store a dictionary of the keys
>>>>>> (in their original order) with pointers (offsets) to the values. That
>>>>>> way we could get the keys quickly by scanning the dictionary and each
>>>>>> value by scanning the dictionary + 1 hop to find the value. This
>>>>>> certainly has the problem that the access is linear in the number of
>>>>>> keys. But it is reasonably simple and it would allow us to get a
>>>>>> correct + testable implementation relatively soon and to have a
>>>>>> baseline for a more optimized representation.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> [1]
>>>>>>
>>>>
>>>> http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880 
>>>>
>>>>>>
>>>>>> On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:
>>>>>>
>>>>>> Hi Preston,
>>>>>>>
>>>>>>> I have edited the wiki[1] and the doc[2] based on the comments. 
>>>>>>> Thank
>>>> you
>>>>>>> for the suggestions provided. I have removed the part that 
>>>>>>> assigns an
>>>> id
>>>>>>> to
>>>>>>> the keys and instead suggested that the keys be stored in the order
>>>> they
>>>>>>> appear in the json object. I am not sure I understand the 
>>>>>>> concept of
>>>>>>> hashcode--how to generate the hashcodes used for easy lookup?
>>>>>>>
>>>>>>>
>>>>>>> [1]https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>> [2]
>>>>>>>
>>>>>>>
>>>>
>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0 
>>>>
>>>>>>>
>>>>>>> Thank you again.
>>>>>>>
>>>>>>> Yours sincerely,
>>>>>>> Riyafa
>>>>>>>
>>>>>>> On 9 May 2016 at 01:23, christina pavlopoulou <cp...@ucr.edu>
>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I updated the wiki page according to Preston's comments along 
>>>>>>>> with the
>>>>>>>> json array example in [1].
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>>>>>>
>>>>
>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit 
>>>>
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Christina
>>>>>>>>
>>>>>>>> On 5/8/2016 9:43 AM, Preston Carman wrote:
>>>>>>>>
>>>>>>>> Nice job guys. I can see you are picking up how to create a data
>>>>>>>>> model. I have limited my comments to the wiki [1] for now. At a
>>>>>>>>> high level, I was impressed with your detail and thoughtful
>>>>>>>>> layouts. It reminds me of the age-old trade-off: speed vs. space.
>>>>>>>>> At this time, let's err on the side of saving space. The data
>>>>>>>>> model should be as compact as possible.
>>>>>>>>>
>>>>>>>>> I also found the AsterixDB serialization [2] we can use as a
>>>>>>>>> reference. Even though the AsterixDB data model includes object
>>>>>>>>> length, I would leave that out since all the XQuery data 
>>>>>>>>> models do
>>>> not
>>>>>>>>> include this property.
>>>>>>>>>
>>>>>>>>> Riyafa, take a look at the method AsterixDB uses for quick 
>>>>>>>>> look ups
>>>> (a
>>>>>>>>> hash value for the name). Consider the pros and cons between your
>>>>>>>>> method and AsterixDB's method: a list hash value for name and a
>>>> sorted
>>>>>>>>> list of names.
>>>>>>>>>
>>>>>>>>> Also, take a look at my wiki comments. It's a great start!
>>>>>>>>>
>>>>>>>>> Mahalo,
>>>>>>>>> Preston
>>>>>>>>>
>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>>> [2]
>>>>>>>>>
>>>>>>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference 
>>>>
>>>>>>>>>
>>>>>>>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <
>>>>>>>>> cpavl001@ucr.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I, also, designed an example for the json array [1] given the
>>>>>>>>>> description I
>>>>>>>>>> wrote in the wiki page.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit 
>>>>
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Christina
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am attempting to create a doc on the JSONiq data model for
>>>>>>>>>>> objects[1]
>>>>>>>>>>> (It
>>>>>>>>>>> might be full of errors because I am doing the calculations
>>>>>>>>>>> manually).
>>>>>>>>>>>
>>>>>>>>>>> This is what I have come up with for the data model for objects:
>>>>>>>>>>>
>>>>>>>>>>> The first byte would have the value tag, followed by the id (4
>>>>>>>>>>> bytes) of
>>>>>>>>>>> the object. Then 4 bytes to represent the size of the 
>>>>>>>>>>> object. Then
>>>>>>>>>>> another
>>>>>>>>>>> four bytes to represent the number of key-value pairs. Next few
>>>> bytes
>>>>>>>>>>> represent the offsets of keys which follow (each offset is
>>>>>>>>>>> represented
>>>>>>>>>>> by
>>>>>>>>>>> 4
>>>>>>>>>>> bytes). Ids would be assigned to the keys. Next few bytes 
>>>>>>>>>>> would be
>>>> a
>>>>>>>>>>> sorted
>>>>>>>>>>> list of ids for keys in alphabetical order. The following bytes
>>>> would
>>>>>>>>>>> represent the keys in the object. Each key is a StringPointable
>>>>>>>>>>> followed
>>>>>>>>>>> by
>>>>>>>>>>> the id of the key. Each object would have a sequence 
>>>>>>>>>>> pointable: the
>>>>>>>>>>> following bytes would be the number of Items (items are the 
>>>>>>>>>>> values
>>>>>>>>>>> for
>>>>>>>>>>> keys) in the sequence. The next bytes would be the offset of 
>>>>>>>>>>> each
>>>>>>>>>>> item
>>>>>>>>>>> in
>>>>>>>>>>> the sequence. The last bytes would be the values for each key
>>>>>>>>>>> followed
>>>>>>>>>>> by
>>>>>>>>>>> the respective id of the key.
>>>>>>>>>>>
>>>>>>>>>>> Hope it makes sense.
>>>>>>>>>>>
>>>>>>>>>>> My problem is,
>>>>>>>>>>>
>>>>>>>>>>> I have not provided for the white spaces in the object. What 
>>>>>>>>>>> can I
>>>>>>>>>>> use
>>>>>>>>>>> to
>>>>>>>>>>> represent the white spaces? I cannot use a text node because 
>>>>>>>>>>> object
>>>>>>>>>>> is
>>>>>>>>>>> not
>>>>>>>>>>> a node.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>
>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0 
>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thank you.
>>>>>>>>>>>
>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>> Riyafa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> We have two students working with us this summer through 
>>>>>>>>>>> GSOC to
>>>>>>>>>>>
>>>>>>>>>>>> complete
>>>>>>>>>>>> JSONiq specification for arrays and objects. I think the first
>>>> step
>>>>>>>>>>>> is
>>>>>>>>>>>> to
>>>>>>>>>>>> define the data model used by JSONiq. The definition should be
>>>>>>>>>>>> defined
>>>>>>>>>>>> in
>>>>>>>>>>>> our wiki [1] before coding starts this summer. The wiki 
>>>>>>>>>>>> will allow
>>>>>>>>>>>> the
>>>>>>>>>>>> community to discuss the JSON data model implementation in
>>>> VXQuery.
>>>>>>>>>>>>
>>>>>>>>>>>> I updated the JSONiq wiki to help get the documentation 
>>>>>>>>>>>> started.
>>>>>>>>>>>> Please
>>>>>>>>>>>> fill in the JSON data model based on the examples seen on our
>>>>>>>>>>>> website
>>>>>>>>>>>> (links on the wiki page).
>>>>>>>>>>>>
>>>>>>>>>>>> Post here if you have any questions.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>> Riyafa Abdul Hameed
>>>>>>> Undergraduate, University of Moratuwa
>>>>>>>
>>>>>>> Email: riyafa.12@cse.mrt.ac.lk
>>>>>>> Website: https://riyafa.wordpress.com/ 
>>>>>>> <http://riyafa.wordpress.com/>
>>>>>>> <http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa>
>>>>>>> <http://twitter.com/Riyafa1>
>>>>>>>
>>>>>>

Re: JSONiq data model

Posted by Till Westmann <ti...@apache.org>.


On 9 May 2016, at 12:02, Preston Carman wrote:

> I think we have three options: optimize for space, keys (jdm:keys) or
> field lookup (jdm:value). The optimization for keys and field lookup
> could be done independently. Let's consider the option currently in the
> wiki as option 1 (space). Don't remove this option from the wiki so we
> have a reference. The new options for keys and field lookup can be
> added as option 2 and 3.
>
> Option 1 (space): A tightly compact format that is optimized to save 
> space.
> Option 2 (keys): A data model optimized for accessing a list of keys.
> Option 3 (lookup): A data model optimized for accessing a field in the 
> object.
>
> For option 2 (keys):
> Consider the return value for jdm:keys: jdm:keys($o as object()) as 
> xs:string*
> I am not sure I fully understand what xs:string* represents. Is this a
> sequence of strings as in XQuery, an array in JSONiq, or some other
> structure? The most optimal way to return the keys would be to store
> them in the same way they should be returned. This way you can do a
> simple copy to produce the result without processing the result. In
> this case, storing them as a sequence (or array) of string values
> might be the best option. The values would then need to be a separate
> sequence (or array) of typed values in the object data model. Pro:
> easy keys function. Con: added a list of offsets for the keys.

xs:string* is indeed a sequence of strings

> For option 3 (lookup):
> This option is independent of option 2. As Till suggested we can
> implement this at a later date. We would need a method to improve the
> lookup of a field. Options 1 and 2 require a sequential search of the
> keys and a string comparison at each field. The AsterixDB record data
> model is a little more complex than I first thought. Take a look at
> their record implementation: writing the record [1] (lines 205 to 245
> are interesting) and field lookup [2] (lines 277 to 344). We only
> need to consider the open part of the record. (The closed part can be
> ignored.)

I had another idea for the implementation of the dictionary. We could
store the keys in sorted order, while we store the values in the original
order. If each key is then followed by the offset to the value, we would
get
a) O(log n) access for a value (as the keys are sorted and we can do a
   binary search) and
b) the keys in their original order, if we sort them by the offsets.
Assuming that the value() access is quite a bit more common than the
keys() access, this could be a reasonable trade-off.
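
To make the sorted-dictionary idea concrete, here is a minimal in-memory
sketch in Python. It is not VXQuery code: the names build_object, value and
keys are hypothetical, and list positions stand in for byte offsets. It only
illustrates that sorted keys give a binary search for value() while the
stored offsets still recover the original key order for keys().

```python
def build_object(pairs):
    """Hypothetical layout: values stay in original order, keys are sorted.

    Returns (dictionary, values). Each dictionary entry is (key, offset),
    where offset is the position of the value in the values list.
    """
    values = [value for _, value in pairs]
    dictionary = sorted((key, offset) for offset, (key, _) in enumerate(pairs))
    return dictionary, values


def value(dictionary, values, key):
    """Binary search over the sorted keys: O(log n) key comparisons."""
    lo, hi = 0, len(dictionary)
    while lo < hi:
        mid = (lo + hi) // 2
        if dictionary[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(dictionary) and dictionary[lo][0] == key:
        return values[dictionary[lo][1]]  # one hop from entry to value
    return None


def keys(dictionary):
    """Recover the original key order by sorting entries on their offsets."""
    return [key for key, _ in sorted(dictionary, key=lambda entry: entry[1])]
```

With the pairs ("name", "vx"), ("id", 7), ("tags", ["a"]) the dictionary is
ordered id, name, tags, while keys() still returns name, id, tags. The cost
of b) is the extra sort on every keys() call, which is the trade-off
mentioned above.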

> Comments?

Sounds good to list the options on the Wiki page.

> Also, what is the actual result of jdm:keys?

A sequence of strings.

> What is the requirement for the initial implementation?

It should be correct and tested.

My 2c,
Till

> [1] 
> https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/builders/RecordBuilder.java
> [2] 
> https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/serde/ARecordSerializerDeserializer.java
>
> On Mon, May 9, 2016 at 8:35 AM, Riyafa Abdul Hameed
> <ri...@cse.mrt.ac.lk> wrote:
>> Hi,
>>
>> Is there any documentation I could go through to understand the 
>> AsterixDB
>> Hash code implementation on the open fields? I am not sure I 
>> understand
>> enough from the AsterixDB serialization [1] to define the data model 
>> for
>> objects using it.
>>
>> Sorry about any confusion.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>
>> Thank you.
>> Riyafa
>>
>> On 9 May 2016 at 20:16, Michael J. Carey <mj...@ics.uci.edu> wrote:
>>
>>> I think Preston's suggestion of looking at the AsterixDB 
>>> implementation of
>>> its binary data model is a good one, as it shares the efficient 
>>> field
>>> access by name requirements and several VXQuery folks are experts in 
>>> its
>>> details as well.  I believe it uses a sorted list instead of a hash 
>>> table
>>> internally, perhaps - slightly simpler for updates perhaps.
>>> On May 9, 2016 7:35 AM, "Riyafa Abdul Hameed" 
>>> <ri...@cse.mrt.ac.lk>
>>> wrote:
>>>
>>> Hi again,
>>>
>>> I have been thinking of Till's suggestion of using a dictionary, and 
>>> I
>>> think it would be a better alternative because then we wouldn't have 
>>> to
>>> process the valuetag of the value of a particular key before moving 
>>> to the
>>> next key. Hence it would be easy to implement jdm:keys method. Any
>>> suggestions? Shall I updated the wiki and the doc based on this.
>>>
>>> Thank you.
>>> Riyafa
>>>
>>> On 9 May 2016 at 19:21, Riyafa Abdul Hameed 
>>> <ri...@cse.mrt.ac.lk>
>>> wrote:
>>>
>>>> Hi Till,
>>>>
>>>> Currently I have suggested storing each key followed by the value. 
>>>> This
>>>> uses less space and is quite similar to storing the offset of the 
>>>> values
>>>> and the access is also linear to the number of keys.
>>>>
>>>> Thanks.
>>>> Riyafa
>>>>
>>>> On 9 May 2016 at 18:54, Till Westmann <ti...@apache.org> wrote:
>>>>
>>>>> All of this looks pretty good!
>>>>>
>>>>> Wrt. the question of the dictionary for the fields, I think that 
>>>>> we
>>> should
>>>>> consider the 2 ways that we can access an object:
>>>>> 1. Either we get all keys (jdm:keys) or
>>>>> 2. we get a value for a key (jdm:value).
>>>>>
>>>>> To get all the keys efficiently and to be able to skip huge nested
>>> values
>>>>> a
>>>>> simple approach could be store a dictionary of the keys (in their
>>> original
>>>>> order) with pointers (offsets) to the values. That way we could 
>>>>> get the
>>>>> keys
>>>>> quickly by scanning the dictionary and each value by scanning the
>>>>> dictionary
>>>>> + 1 hop to find the value. This certainly has the problem, that 
>>>>> the
>>> access
>>>>> is linear in the number of the keys. But it is reasonably simple 
>>>>> and it
>>>>> would allow us to get a correct + testable implementation 
>>>>> relatively
>>> soon
>>>>> and to have a baseline for a more optimized representation.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> [1]
>>>>>
>>>
>>> http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880
>>>>>
>>>>> On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:
>>>>>
>>>>> Hi Preston,
>>>>>>
>>>>>> I have edited the wiki[1] and the doc[2] based on the comments. 
>>>>>> Thank
>>> you
>>>>>> for the suggestions provided. I have removed the part that 
>>>>>> assigns an
>>> id
>>>>>> to
>>>>>> the keys and instead suggested that the keys be stored in the 
>>>>>> order
>>> they
>>>>>> appear in the json object. I am not sure I understand the concept 
>>>>>> of
>>>>>> hashcode--how to generate the hashcodes used for easy lookup?
>>>>>>
>>>>>>
>>>>>> [1]https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>> [2]
>>>>>>
>>>>>>
>>>
>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>>>
>>>>>> Thank you again.
>>>>>>
>>>>>> Yours sincerely,
>>>>>> Riyafa
>>>>>>
>>>>>> On 9 May 2016 at 01:23, christina pavlopoulou <cp...@ucr.edu>
>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>>
>>>>>>> I updated the wiki page according to Preston's comments along 
>>>>>>> with the
>>>>>>> json array example in [1].
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>>
>>>
>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Christina
>>>>>>>
>>>>>>> On 5/8/2016 9:43 AM, Preston Carman wrote:
>>>>>>>
>>>>>>> Nice job guys. I can see you are picking up how to create a data
>>>>>>>> model. I have limited my comments to the wiki [1] for now. At a 
>>>>>>>> high
>>>>>>>> level, I was impressed with your detail and thoughtful layouts. 
>>>>>>>> It
>>>>>>>> reminds me of the age old trade off: speed vs space. At this 
>>>>>>>> time,
>>>>>>>> lets error on saving space. The data model should the as 
>>>>>>>> compact as
>>>>>>>> possible.
>>>>>>>>
>>>>>>>> I also found the AsterixDB serialization [2] we can use as a
>>>>>>>> reference. Even though the AsterixDB data model includes object
>>>>>>>> length, I would leave that out since all the XQuery data models 
>>>>>>>> do
>>> not
>>>>>>>> include this property.
>>>>>>>>
>>>>>>>> Riyafa, take a look at the method AsterixDB uses for quick look 
>>>>>>>> ups
>>> (a
>>>>>>>> hash value for the name). Consider the pros and cons between 
>>>>>>>> your
>>>>>>>> method and AsterixDB's method: a list hash value for name and a
>>> sorted
>>>>>>>> list of names.
>>>>>>>>
>>>>>>>> Also, take a look at my wiki comments. Its a great start!
>>>>>>>>
>>>>>>>> Mahalo,
>>>>>>>> Preston
>>>>>>>>
>>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>> [2]
>>>>>>>>
>>>>>>>>
>>>
>>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>>>>>>>
>>>>>>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <
>>>>>>>> cpavl001@ucr.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I, also, designed an example for the json array [1] given the
>>>>>>>>> description I
>>>>>>>>> wrote in the wiki page.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>
>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Christina
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am attempting to create a doc on the JSONiq data model for
>>>>>>>>>> objects[1]
>>>>>>>>>> (It
>>>>>>>>>> might be full of errors because I am doing the calculations
>>>>>>>>>> manually).
>>>>>>>>>>
>>>>>>>>>> This is what I have come up on the data model for objects:
>>>>>>>>>>
>>>>>>>>>> The first byte would have the value tag, followed by the id 
>>>>>>>>>> (4
>>>>>>>>>> bytes) of
>>>>>>>>>> the object. Then 4 bytes to represent the size of the object. 
>>>>>>>>>> Then
>>>>>>>>>> another
>>>>>>>>>> four bytes to represent the number of key-value pairs. Next 
>>>>>>>>>> few
>>> bytes
>>>>>>>>>> represent the offsets of keys which follow (each offset is
>>>>>>>>>> represented
>>>>>>>>>> by
>>>>>>>>>> 4
>>>>>>>>>> bytes). Ids would be assigned to the keys. Next few bytes 
>>>>>>>>>> would be
>>> a
>>>>>>>>>> sorted
>>>>>>>>>> list of ids for keys in alphabetical order. The following 
>>>>>>>>>> bytes
>>> would
>>>>>>>>>> represent the keys in the object.Each key is a 
>>>>>>>>>> StringPointable
>>>>>>>>>> followed
>>>>>>>>>> by
>>>>>>>>>> the id of the key. Each object would have a sequence 
>>>>>>>>>> pointable: the
>>>>>>>>>> following bytes would be the number of Items (items are the 
>>>>>>>>>> values
>>>>>>>>>> for
>>>>>>>>>> keys) in the sequence. The next bytes would be the offset of 
>>>>>>>>>> each
>>>>>>>>>> item
>>>>>>>>>> in
>>>>>>>>>> the sequence. The last bytes would be the values for each key
>>>>>>>>>> followed
>>>>>>>>>> by
>>>>>>>>>> the respective id of the key.
>>>>>>>>>>
>>>>>>>>>> Hope it makes sense.
>>>>>>>>>>
>>>>>>>>>> My problem is,
>>>>>>>>>>
>>>>>>>>>> I have not provided for the white spaces in the object. What 
>>>>>>>>>> can I
>>>>>>>>>> use
>>>>>>>>>> to
>>>>>>>>>> represent the white spaces? I cannot use a text node because 
>>>>>>>>>> object
>>>>>>>>>> is
>>>>>>>>>> not
>>>>>>>>>> a node.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>
>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>>>>>>>
>>>>>>>>>> Thank you.
>>>>>>>>>>
>>>>>>>>>> Yours sincerely,
>>>>>>>>>> Riyafa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 26 April 2016 at 10:29, Preston Carman 
>>>>>>>>>> <pr...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> We have two students working with us this summer through GSOC 
>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>>>> complete
>>>>>>>>>>> JSONiq specification for arrays and objects. I think the 
>>>>>>>>>>> first
>>> step
>>>>>>>>>>> is
>>>>>>>>>>> to
>>>>>>>>>>> define the data model used by JSONiq. The definition should 
>>>>>>>>>>> be
>>>>>>>>>>> defined
>>>>>>>>>>> in
>>>>>>>>>>> our wiki [1] before coding starts this summer. The wiki will 
>>>>>>>>>>> allow
>>>>>>>>>>> the
>>>>>>>>>>> community to discuss the JSON data model implementation in
>>> VXQuery.
>>>>>>>>>>>
>>>>>>>>>>> I updated the JSONiq wiki to help get the documentation 
>>>>>>>>>>> started.
>>>>>>>>>>> Please
>>>>>>>>>>> fill in the JSON data model based on the examples seen on 
>>>>>>>>>>> our
>>>>>>>>>>> website
>>>>>>>>>>> (links on the wiki page).
>>>>>>>>>>>
>>>>>>>>>>> Post here if you have any questions.
>>>>>>>>>>>
>>>>>>>>>>> [1] 
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Riyafa Abdul Hameed
>>>>>> Undergraduate, University of Moratuwa
>>>>>>
>>>>>> Email: riyafa.12@cse.mrt.ac.lk
>>>>>> Website: https://riyafa.wordpress.com/ 
>>>>>> <http://riyafa.wordpress.com/>
>>>>>> <http://facebook.com/riyafa.ahf>  
>>>>>> <http://lk.linkedin.com/in/riyafa>
>>>>>> <http://twitter.com/Riyafa1>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>

Re: JSONiq data model

Posted by Preston Carman <pr...@apache.org>.

I think we have three options: optimize for space, keys (jdm:keys) or
field lookup (jdm:value). The optimizations for keys and field lookup
could be done independently. Let's consider the option currently in the
wiki as option 1 (space). Don't remove this option from the wiki so we
have a reference. The new options for keys and field lookup can be
added as options 2 and 3.

Option 1 (space): A tightly compact format that is optimized to save space.
Option 2 (keys): A data model optimized for accessing a list of keys.
Option 3 (lookup): A data model optimized for accessing a field in the object.

For option 2 (keys):
Consider the return value for jdm:keys: jdm:keys($o as object()) as xs:string*
I am not sure I fully understand what xs:string* represents. Is this a
sequence of strings as in XQuery, an array in JSONiq, or some other
structure? The most efficient way to return the keys would be to store
them in the same way they should be returned. This way you can do a
simple copy to produce the result without processing it. In
this case, storing them as a sequence (or array) of string values
might be the best option. The values would then need to be a separate
sequence (or array) of typed values in the object data model. Pro:
easy keys function. Con: an added list of offsets for the keys.
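
As a rough illustration of option 2, the Python sketch below stores the keys
as one contiguous sequence, separate from the values, so that a keys call is
a plain slice copy. The byte layout and the names encode_object,
encode_sequence, keys_bytes and decode_sequence are assumptions made for this
example only; they are not the VXQuery sequence format, and values are kept
as strings for brevity.

```python
import struct


def encode_sequence(items):
    """Hypothetical sequence: one 4-byte end offset per item, then the bytes."""
    offsets, end = [], 0
    for item in items:
        end += len(item)
        offsets.append(end)
    return struct.pack(">%di" % len(items), *offsets) + b"".join(items)


def encode_object(pairs):
    """Assumed layout: [count][key sequence length][key sequence][value sequence]."""
    key_seq = encode_sequence([k.encode("utf-8") for k, _ in pairs])
    val_seq = encode_sequence([v.encode("utf-8") for _, v in pairs])
    return struct.pack(">ii", len(pairs), len(key_seq)) + key_seq + val_seq


def keys_bytes(obj):
    """jdm:keys as a simple copy: slice out the stored key sequence."""
    _, key_len = struct.unpack_from(">ii", obj, 0)
    return obj[8:8 + key_len]


def decode_sequence(seq, count):
    """Turn a stored sequence back into strings (only for checking the sketch)."""
    offsets = struct.unpack_from(">%di" % count, seq, 0)
    data = seq[4 * count:]
    start, out = 0, []
    for end in offsets:
        out.append(data[start:end].decode("utf-8"))
        start = end
    return out
```

The "con" shows up directly: the key sequence carries its own offset list,
and the value sequence needs a second one, where an interleaved key/value
layout would need neither.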

For option 3 (lookup):
This option is independent of option 2. As Till suggested, we can
implement this at a later date. We would need a method to improve the
lookup of a field. Options 1 and 2 require a sequential search of the
keys and a string comparison at each field. The AsterixDB record data
model is a little more complex than I first thought. Take a look at
their record implementation: writing the record [1] (lines 205 to 245
are interesting) and field lookup [2] (lines 277 to 344). We only
need to consider the open part of the record. (The closed part can be
ignored.)
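
One possible shape for such a lookup, sketched in Python under stated
assumptions: keep a list of (hash, position) entries sorted by the hash of
the field name, binary search on the hash, and compare strings only for
entries whose hash matches. This is an illustration of the general
technique, not a description of AsterixDB's format (the classes linked in
[1] and [2] are the reference for that). The names field_hash, build_index
and lookup are hypothetical, and CRC32 is just a stand-in hash.

```python
import zlib


def field_hash(name):
    """Stand-in hash; any deterministic hash of the field name would do."""
    return zlib.crc32(name.encode("utf-8"))


def build_index(pairs):
    """Return (index, fields); index holds (hash, position) sorted by hash."""
    fields = list(pairs)  # fields stay in their original order
    index = sorted((field_hash(name), pos) for pos, (name, _) in enumerate(fields))
    return index, fields


def lookup(index, fields, name):
    """Binary search on the hash, then string-compare only matching entries."""
    target = field_hash(name)
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if index[mid][0] < target:
            lo = mid + 1
        else:
            hi = mid
    # Several names may share one hash, so walk every entry with this hash.
    while lo < len(index) and index[lo][0] == target:
        candidate, value = fields[index[lo][1]]
        if candidate == name:
            return value
        lo += 1
    return None
```

Compared with options 1 and 2, the search compares fixed-size integers and
falls back to a string comparison only on a hash match, at the price of
storing one extra entry per field.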

Comments? Also, what is the actual result of jdm:keys? What is the
requirement for the initial implementation?

[1] https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/builders/RecordBuilder.java
[2] https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/serde/ARecordSerializerDeserializer.java

On Mon, May 9, 2016 at 8:35 AM, Riyafa Abdul Hameed
<ri...@cse.mrt.ac.lk> wrote:
> Hi,
>
> Is there any documentation I could go through to understand the AsterixDB
> Hash code implementation on the open fields? I am not sure I understand
> enough from the AsterixDB serialization [1] to define the data model for
> objects using it.
>
> Sorry about any confusion.
>
> [1]
> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>
> Thank you.
> Riyafa
>
> On 9 May 2016 at 20:16, Michael J. Carey <mj...@ics.uci.edu> wrote:
>
>> I think Preston's suggestion of looking at the AsterixDB implementation of
>> its binary data model is a good one, as it shares the efficient field
>> access by name requirements and several VXQuery folks are experts in its
>> details as well.  I believe it uses a sorted list instead of a hash table
>> internally, perhaps - slightly simpler for updates perhaps.
>> On May 9, 2016 7:35 AM, "Riyafa Abdul Hameed" <ri...@cse.mrt.ac.lk>
>> wrote:
>>
>> Hi again,
>>
>> I have been thinking of Till's suggestion of using a dictionary, and I
>> think it would be a better alternative because then we wouldn't have to
>> process the valuetag of the value of a particular key before moving to the
>> next key. Hence it would be easy to implement jdm:keys method. Any
>> suggestions? Shall I updated the wiki and the doc based on this.
>>
>> Thank you.
>> Riyafa
>>
>> On 9 May 2016 at 19:21, Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>
>> wrote:
>>
>> > Hi Till,
>> >
>> > Currently I have suggested storing each key followed by the value. This
>> > uses less space and is quite similar to storing the offset of the values
>> > and the access is also linear to the number of keys.
>> >
>> > Thanks.
>> > Riyafa
>> >
>> > On 9 May 2016 at 18:54, Till Westmann <ti...@apache.org> wrote:
>> >
>> >> All of this looks pretty good!
>> >>
>> >> Wrt. the question of the dictionary for the fields, I think that we
>> should
>> >> consider the 2 ways that we can access an object:
>> >> 1. Either we get all keys (jdm:keys) or
>> >> 2. we get a value for a key (jdm:value).
>> >>
>> >> To get all the keys efficiently and to be able to skip huge nested
>> values
>> >> a
>> >> simple approach could be store a dictionary of the keys (in their
>> original
>> >> order) with pointers (offsets) to the values. That way we could get the
>> >> keys
>> >> quickly by scanning the dictionary and each value by scanning the
>> >> dictionary
>> >> + 1 hop to find the value. This certainly has the problem, that the
>> access
>> >> is linear in the number of the keys. But it is reasonably simple and it
>> >> would allow us to get a correct + testable implementation relatively
>> soon
>> >> and to have a baseline for a more optimized representation.
>> >>
>> >> Thoughts?
>> >>
>> >> Cheers,
>> >> Till
>> >>
>> >> [1]
>> >>
>>
>> http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880
>> >>
>> >> On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:
>> >>
>> >> Hi Preston,
>> >>>
>> >>> I have edited the wiki[1] and the doc[2] based on the comments. Thank
>> you
>> >>> for the suggestions provided. I have removed the part that assigns an
>> id
>> >>> to
>> >>> the keys and instead suggested that the keys be stored in the order
>> they
>> >>> appear in the json object. I am not sure I understand the concept of
>> >>> hashcode--how to generate the hashcodes used for easy lookup?
>> >>>
>> >>>
>> >>> [1]https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>> >>> [2]
>> >>>
>> >>>
>>
>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>> >>>
>> >>> Thank you again.
>> >>>
>> >>> Yours sincerely,
>> >>> Riyafa
>> >>>
>> >>> On 9 May 2016 at 01:23, christina pavlopoulou <cp...@ucr.edu>
>> wrote:
>> >>>
>> >>> Hi,
>> >>>>
>> >>>> I updated the wiki page according to Preston's comments along with the
>> >>>> json array example in [1].
>> >>>>
>> >>>> [1]
>> >>>>
>> >>>>
>>
>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>> >>>>
>> >>>> Thank you,
>> >>>> Christina
>> >>>>
>> >>>> On 5/8/2016 9:43 AM, Preston Carman wrote:
>> >>>>
>> >>>> Nice job guys. I can see you are picking up how to create a data
>> >>>>> model. I have limited my comments to the wiki [1] for now. At a high
>> >>>>> level, I was impressed with your detail and thoughtful layouts. It
>> >>>>> reminds me of the age old trade off: speed vs space. At this time,
>> >>>>> lets error on saving space. The data model should the as compact as
>> >>>>> possible.
>> >>>>>
>> >>>>> I also found the AsterixDB serialization [2] we can use as a
>> >>>>> reference. Even though the AsterixDB data model includes object
>> >>>>> length, I would leave that out since all the XQuery data models do
>> not
>> >>>>> include this property.
>> >>>>>
>> >>>>> Riyafa, take a look at the method AsterixDB uses for quick look ups
>> (a
>> >>>>> hash value for the name). Consider the pros and cons between your
>> >>>>> method and AsterixDB's method: a list hash value for name and a
>> sorted
>> >>>>> list of names.
>> >>>>>
>> >>>>> Also, take a look at my wiki comments. Its a great start!
>> >>>>>
>> >>>>> Mahalo,
>> >>>>> Preston
>> >>>>>
>> >>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>> >>>>> [2]
>> >>>>>
>> >>>>>
>>
>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>> >>>>>
>> >>>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <
>> >>>>> cpavl001@ucr.edu>
>> >>>>> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>>
>> >>>>>> I, also, designed an example for the json array [1] given the
>> >>>>>> description I
>> >>>>>> wrote in the wiki page.
>> >>>>>>
>> >>>>>> [1]
>> >>>>>>
>> >>>>>>
>> >>>>>>
>>
>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>> >>>>>>
>> >>>>>> Thank you,
>> >>>>>> Christina
>> >>>>>>
>> >>>>>>
>> >>>>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I am attempting to create a doc on the JSONiq data model for
>> >>>>>>> objects[1]
>> >>>>>>> (It
>> >>>>>>> might be full of errors because I am doing the calculations
>> >>>>>>> manually).
>> >>>>>>>
>> >>>>>>> This is what I have come up on the data model for objects:
>> >>>>>>>
>> >>>>>>> The first byte would have the value tag, followed by the id (4
>> >>>>>>> bytes) of
>> >>>>>>> the object. Then 4 bytes to represent the size of the object. Then
>> >>>>>>> another
>> >>>>>>> four bytes to represent the number of key-value pairs. Next few
>> bytes
>> >>>>>>> represent the offsets of keys which follow (each offset is
>> >>>>>>> represented
>> >>>>>>> by
>> >>>>>>> 4
>> >>>>>>> bytes). Ids would be assigned to the keys. Next few bytes would be
>> a
>> >>>>>>> sorted
>> >>>>>>> list of ids for keys in alphabetical order. The following bytes
>> would
>> >>>>>>> represent the keys in the object.Each key is a StringPointable
>> >>>>>>> followed
>> >>>>>>> by
>> >>>>>>> the id of the key. Each object would have a sequence pointable: the
>> >>>>>>> following bytes would be the number of Items (items are the values
>> >>>>>>> for
>> >>>>>>> keys) in the sequence. The next bytes would be the offset of each
>> >>>>>>> item
>> >>>>>>> in
>> >>>>>>> the sequence. The last bytes would be the values for each key
>> >>>>>>> followed
>> >>>>>>> by
>> >>>>>>> the respective id of the key.
>> >>>>>>>
>> >>>>>>> Hope it makes sense.
>> >>>>>>>
>> >>>>>>> My problem is,
>> >>>>>>>
>> >>>>>>> I have not provided for the white spaces in the object. What can I
>> >>>>>>> use
>> >>>>>>> to
>> >>>>>>> represent the white spaces? I cannot use a text node because object
>> >>>>>>> is
>> >>>>>>> not
>> >>>>>>> a node.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> [1]
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>>
>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>> >>>>>>>
>> >>>>>>> Thank you.
>> >>>>>>>
>> >>>>>>> Yours sincerely,
>> >>>>>>> Riyafa
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>> We have two students working with us this summer through GSOC to
>> >>>>>>>
>> >>>>>>>> complete
>> >>>>>>>> JSONiq specification for arrays and objects. I think the first
>> step
>> >>>>>>>> is
>> >>>>>>>> to
>> >>>>>>>> define the data model used by JSONiq. The definition should be
>> >>>>>>>> defined
>> >>>>>>>> in
>> >>>>>>>> our wiki [1] before coding starts this summer. The wiki will allow
>> >>>>>>>> the
>> >>>>>>>> community to discuss the JSON data model implementation in
>> VXQuery.
>> >>>>>>>>
>> >>>>>>>> I updated the JSONiq wiki to help get the documentation started.
>> >>>>>>>> Please
>> >>>>>>>> fill in the JSON data model based on the examples seen on our
>> >>>>>>>> website
>> >>>>>>>> (links on the wiki page).
>> >>>>>>>>
>> >>>>>>>> Post here if you have any questions.
>> >>>>>>>>
>> >>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>> >
>> >
>>
>>
>>
>>
>
>
>

Re: JSONiq data model

Posted by Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>.

Hi,

Is there any documentation I could go through to understand the AsterixDB
hash code implementation for the open fields? I am not sure I understand
enough from the AsterixDB serialization reference [1] to define the data
model for objects using it.

Sorry about any confusion.

[1]
https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference

Thank you.
Riyafa

On 9 May 2016 at 20:16, Michael J. Carey <mj...@ics.uci.edu> wrote:

> I think Preston's suggestion of looking at the AsterixDB implementation of
> its binary data model is a good one, as it shares the efficient field
> access by name requirements and several VXQuery folks are experts in its
> details as well.  I believe it uses a sorted list instead of a hash table
> internally, perhaps - slightly simpler for updates perhaps.
> On May 9, 2016 7:35 AM, "Riyafa Abdul Hameed" <ri...@cse.mrt.ac.lk>
> wrote:
>
> Hi again,
>
> I have been thinking of Till's suggestion of using a dictionary, and I
> think it would be a better alternative because then we wouldn't have to
> process the valuetag of the value of a particular key before moving to the
> next key. Hence it would be easy to implement jdm:keys method. Any
> suggestions? Shall I updated the wiki and the doc based on this.
>
> Thank you.
> Riyafa
>
> On 9 May 2016 at 19:21, Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>
> wrote:
>
> > Hi Till,
> >
> > Currently I have suggested storing each key followed by the value. This
> > uses less space and is quite similar to storing the offset of the values
> > and the access is also linear to the number of keys.
> >
> > Thanks.
> > Riyafa
> >
> > On 9 May 2016 at 18:54, Till Westmann <ti...@apache.org> wrote:
> >
> >> All of this looks pretty good!
> >>
> >> Wrt. the question of the dictionary for the fields, I think that we
> should
> >> consider the 2 ways that we can access an object:
> >> 1. Either we get all keys (jdm:keys) or
> >> 2. we get a value for a key (jdm:value).
> >>
> >> To get all the keys efficiently and to be able to skip huge nested
> values
> >> a
> >> simple approach could be store a dictionary of the keys (in their
> original
> >> order) with pointers (offsets) to the values. That way we could get the
> >> keys
> >> quickly by scanning the dictionary and each value by scanning the
> >> dictionary
> >> + 1 hop to find the value. This certainly has the problem, that the
> access
> >> is linear in the number of the keys. But it is reasonably simple and it
> >> would allow us to get a correct + testable implementation relatively
> soon
> >> and to have a baseline for a more optimized representation.
> >>
> >> Thoughts?
> >>
> >> Cheers,
> >> Till
> >>
> >> [1]
> >>
>
> http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880
> >>
> >> On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:
> >>
> >> Hi Preston,
> >>>
> >>> I have edited the wiki[1] and the doc[2] based on the comments. Thank
> you
> >>> for the suggestions provided. I have removed the part that assigns an
> id
> >>> to
> >>> the keys and instead suggested that the keys be stored in the order
> they
> >>> appear in the json object. I am not sure I understand the concept of
> >>> hashcode--how to generate the hashcodes used for easy lookup?
> >>>
> >>>
> >>> [1]https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
> >>> [2]
> >>>
> >>>
>
> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
> >>>
> >>> Thank you again.
> >>>
> >>> Yours sincerely,
> >>> Riyafa
> >>>
> >>> On 9 May 2016 at 01:23, christina pavlopoulou <cp...@ucr.edu>
> wrote:
> >>>
> >>> Hi,
> >>>>
> >>>> I updated the wiki page according to Preston's comments along with the
> >>>> json array example in [1].
> >>>>
> >>>> [1]
> >>>>
> >>>>
>
> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
> >>>>
> >>>> Thank you,
> >>>> Christina
> >>>>
> >>>> On 5/8/2016 9:43 AM, Preston Carman wrote:
> >>>>
> >>>> Nice job guys. I can see you are picking up how to create a data
> >>>>> model. I have limited my comments to the wiki [1] for now. At a high
> >>>>> level, I was impressed with your detail and thoughtful layouts. It
> >>>>> reminds me of the age old trade off: speed vs space. At this time,
> >>>>> lets error on saving space. The data model should the as compact as
> >>>>> possible.
> >>>>>
> >>>>> I also found the AsterixDB serialization [2] we can use as a
> >>>>> reference. Even though the AsterixDB data model includes object
> >>>>> length, I would leave that out since all the XQuery data models do
> not
> >>>>> include this property.
> >>>>>
> >>>>> Riyafa, take a look at the method AsterixDB uses for quick look ups
> (a
> >>>>> hash value for the name). Consider the pros and cons between your
> >>>>> method and AsterixDB's method: a list hash value for name and a
> sorted
> >>>>> list of names.
> >>>>>
> >>>>> Also, take a look at my wiki comments. Its a great start!
> >>>>>
> >>>>> Mahalo,
> >>>>> Preston
> >>>>>
> >>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
> >>>>> [2]
> >>>>>
> >>>>>
>
> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
> >>>>>
> >>>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <
> >>>>> cpavl001@ucr.edu>
> >>>>> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>>
> >>>>>> I, also, designed an example for the json array [1] given the
> >>>>>> description I
> >>>>>> wrote in the wiki page.
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>>>>
> >>>>>>
>
> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
> >>>>>>
> >>>>>> Thank you,
> >>>>>> Christina
> >>>>>>



-- 
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: riyafa.12@cse.mrt.ac.lk
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>

Re: JSONiq data model

Posted by "Michael J. Carey" <mj...@ics.uci.edu>.

I think Preston's suggestion of looking at the AsterixDB implementation of
its binary data model is a good one, as it shares the requirement for
efficient field access by name, and several VXQuery folks are experts in
its details as well. I believe it uses a sorted list instead of a hash
table internally, which is perhaps slightly simpler for updates.
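
As a rough illustration of the sorted-list idea, here is a minimal Python
sketch. It is not AsterixDB or VXQuery code; the names
build_sorted_directory and lookup are made up for this example, and the
in-memory lists stand in for whatever the binary layout would hold. Keeping
the field names sorted lets a lookup by name use binary search instead of
a hash table.

```python
from bisect import bisect_left


def build_sorted_directory(obj):
    """Return (names, values) with the field names in sorted order."""
    names = sorted(obj)
    return names, [obj[name] for name in names]


def lookup(directory, name):
    """Binary-search the sorted names; return the value or None."""
    names, values = directory
    i = bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return values[i]
    return None


# Hypothetical example object.
directory = build_sorted_directory({"title": "VXQuery", "id": 7, "tags": ["json"]})
```

An update only has to insert the new name at its sorted position, which is
one way to read the "slightly simpler for updates" remark. The cost is that
the original key order is lost unless it is stored separately.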

Re: JSONiq data model

Posted by Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>.

Hi again,

I have been thinking about Till's suggestion of using a dictionary, and I
think it would be a better alternative, because then we wouldn't have to
process the value tag of the value of a particular key before moving to the
next key. Hence it would be easy to implement the jdm:keys method. Any
suggestions? Shall I update the wiki and the doc based on this?

Thank you.
Riyafa


-- 
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: riyafa.12@cse.mrt.ac.lk
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>

Re: JSONiq data model

Posted by Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>.

Hi Till,

Currently I have suggested storing each key followed by its value. This
uses less space, is otherwise quite similar to storing offsets to the
values, and the access is also linear in the number of keys.
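
To make the "key followed by value" layout concrete, here is a minimal
Python sketch. Everything in it is an assumption for illustration: the tag
values TAG_INT and TAG_STR, the 2-byte key length, and the function names
are hypothetical and are not taken from the wiki or from VXQuery.

```python
import struct

TAG_INT, TAG_STR = 1, 2  # hypothetical value tags


def encode_value(value):
    """Encode an int as tag + 8 bytes, a string as tag + 2-byte length + UTF-8."""
    if isinstance(value, int):
        return struct.pack(">Bq", TAG_INT, value)
    data = value.encode("utf-8")
    return struct.pack(">BH", TAG_STR, len(data)) + data


def encode_inline(pairs):
    """4-byte pair count, then each key immediately followed by its value."""
    out = struct.pack(">I", len(pairs))
    for key, value in pairs:
        k = key.encode("utf-8")
        out += struct.pack(">H", len(k)) + k + encode_value(value)
    return out


def value_size(buf, pos):
    """Size of the value at pos; the tag must be inspected to know it."""
    if buf[pos] == TAG_INT:
        return 1 + 8
    (n,) = struct.unpack_from(">H", buf, pos + 1)
    return 1 + 2 + n


def keys_inline(buf):
    """Collect all keys by walking every key-value pair in order."""
    (count,) = struct.unpack_from(">I", buf, 0)
    pos, keys = 4, []
    for _ in range(count):
        (n,) = struct.unpack_from(">H", buf, pos)
        keys.append(buf[pos + 2:pos + 2 + n].decode("utf-8"))
        pos += 2 + n
        pos += value_size(buf, pos)  # skip the value to reach the next key
    return keys


buf = encode_inline([("name", "vx"), ("year", 2016)])
```

No offsets are stored, so the encoding is compact, but listing the keys
still has to look at the tag of every value in order to step over it.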

Thanks.
Riyafa


-- 
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: riyafa.12@cse.mrt.ac.lk
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>

Re: JSONiq data model

Posted by Till Westmann <ti...@apache.org>.

All of this looks pretty good!

With regard to the question of the dictionary for the fields, I think that
we should consider the two ways that we can access an object:
1. Either we get all keys (jdm:keys) or
2. we get a value for a key (jdm:value).

To get all the keys efficiently and to be able to skip huge nested values,
a simple approach could be to store a dictionary of the keys (in their
original order) with pointers (offsets) to the values. That way we could
get the keys quickly by scanning the dictionary, and each value by scanning
the dictionary plus 1 hop to find the value. This certainly has the problem
that the access is linear in the number of keys. But it is reasonably
simple, and it would allow us to get a correct and testable implementation
relatively soon and to have a baseline for a more optimized representation.
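
The dictionary approach described above could be sketched roughly as
follows in Python. This is only an illustration under assumed details: the
8-byte header (pair count plus dictionary size), the 2-byte key length, the
4-byte offsets, and all function names are hypothetical and not part of any
agreed layout. Values are treated as opaque byte strings.

```python
import struct


def encode_with_dictionary(pairs):
    """pairs: list of (key, value_bytes) in their original order."""
    dictionary, values = b"", b""
    for key, value in pairs:
        k = key.encode("utf-8")
        # Each entry: key length, key bytes, offset of the value in the value area.
        dictionary += struct.pack(">H", len(k)) + k + struct.pack(">I", len(values))
        values += value
    header = struct.pack(">II", len(pairs), len(dictionary))
    return header + dictionary + values


def scan_dictionary(buf):
    """Yield (key, absolute value position) without touching any value bytes."""
    count, dict_size = struct.unpack_from(">II", buf, 0)
    pos = 8
    for _ in range(count):
        (n,) = struct.unpack_from(">H", buf, pos)
        key = buf[pos + 2:pos + 2 + n].decode("utf-8")
        (offset,) = struct.unpack_from(">I", buf, pos + 2 + n)
        yield key, 8 + dict_size + offset
        pos += 2 + n + 4


def keys(buf):
    """jdm:keys-style access: scan the dictionary only."""
    return [key for key, _ in scan_dictionary(buf)]


def value_start(buf, name):
    """jdm:value-style access: linear scan of the dictionary, then one hop."""
    for key, start in scan_dictionary(buf):
        if key == name:
            return start
    return None


buf = encode_with_dictionary([("big", b"x" * 1000), ("small", b"ok")])
```

Here keys(buf) never reads the 1000-byte value, and value_start(buf,
"small") jumps straight past it, at the cost of 4 extra bytes per key.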

Thoughts?

Cheers,
Till

[1] 
http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880

On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:

> Hi Preston,
>
> I have edited the wiki[1] and the doc[2] based on the comments. Thank 
> you
> for the suggestions provided. I have removed the part that assigns an 
> id to
> the keys and instead suggested that the keys be stored in the order 
> they
> appear in the json object. I am not sure I understand the concept of
> hashcode--how to generate the hashcodes used for easy lookup?
>
>
> [1]https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
> [2]
> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>
> Thank you again.
>
> Yours sincerely,
> Riyafa
>
> On 9 May 2016 at 01:23, christina pavlopoulou <cp...@ucr.edu> 
> wrote:
>
>> Hi,
>>
>> I updated the wiki page according to Preston's comments along with 
>> the
>> json array example in [1].
>>
>> [1]
>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>
>> Thank you,
>> Christina
>>
>> On 5/8/2016 9:43 AM, Preston Carman wrote:
>>
>>> Nice job guys. I can see you are picking up how to create a data
>>> model. I have limited my comments to the wiki [1] for now. At a high
>>> level, I was impressed with your detail and thoughtful layouts. It
>>> reminds me of the age old trade off: speed vs space. At this time,
>>> lets error on saving space. The data model should the as compact as
>>> possible.
>>>
>>> I also found the AsterixDB serialization [2] we can use as a
>>> reference. Even though the AsterixDB data model includes object
>>> length, I would leave that out since all the XQuery data models do 
>>> not
>>> include this property.
>>>
>>> Riyafa, take a look at the method AsterixDB uses for quick look ups 
>>> (a
>>> hash value for the name). Consider the pros and cons between your
>>> method and AsterixDB's method: a list hash value for name and a 
>>> sorted
>>> list of names.
>>>
>>> Also, take a look at my wiki comments. Its a great start!
>>>
>>> Mahalo,
>>> Preston
>>>
>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>> [2]
>>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>>
>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou 
>>> <cp...@ucr.edu>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I, also, designed an example for the json array [1] given the
>>>> description I
>>>> wrote in the wiki page.
>>>>
>>>> [1]
>>>>
>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>>
>>>> Thank you,
>>>> Christina
>>>>
>>>>
>>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am attempting to create a doc on the JSONiq data model for 
>>>>> objects[1]
>>>>> (It
>>>>> might be full of errors because I am doing the calculations 
>>>>> manually).
>>>>>
>>>>> This is what I have come up on the data model for objects:
>>>>>
>>>>> The first byte would have the value tag, followed by the id (4 
>>>>> bytes) of
>>>>> the object. Then 4 bytes to represent the size of the object. Then
>>>>> another
>>>>> four bytes to represent the number of key-value pairs. Next few 
>>>>> bytes
>>>>> represent the offsets of keys which follow (each offset is 
>>>>> represented
>>>>> by
>>>>> 4
>>>>> bytes). Ids would be assigned to the keys. Next few bytes would be 
>>>>> a
>>>>> sorted
>>>>> list of ids for keys in alphabetical order. The following bytes 
>>>>> would
>>>>> represent the keys in the object.Each key is a StringPointable 
>>>>> followed
>>>>> by
>>>>> the id of the key. Each object would have a sequence pointable: 
>>>>> the
>>>>> following bytes would be the number of Items (items are the values 
>>>>> for
>>>>> keys) in the sequence. The next bytes would be the offset of each 
>>>>> item
>>>>> in
>>>>> the sequence. The last bytes would be the values for each key 
>>>>> followed
>>>>> by
>>>>> the respective id of the key.
>>>>>
>>>>> Hope it makes sense.
>>>>>
>>>>> My problem is,
>>>>>
>>>>> I have not provided for the white spaces in the object. What can I 
>>>>> use
>>>>> to
>>>>> represent the white spaces? I cannot use a text node because 
>>>>> object is
>>>>> not
>>>>> a node.
>>>>>
>>>>>
>>>>> [1]
>>>>>
>>>>>
>>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Yours sincerely,
>>>>> Riyafa
>>>>>
>>>>>
>>>>> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org> wrote:
>>>>>
>>>>>> We have two students working with us this summer through GSOC to complete
>>>>>> JSONiq specification for arrays and objects. I think the first step is to
>>>>>> define the data model used by JSONiq. The definition should be defined in
>>>>>> our wiki [1] before coding starts this summer. The wiki will allow the
>>>>>> community to discuss the JSON data model implementation in VXQuery.
>>>>>>
>>>>>> I updated the JSONiq wiki to help get the documentation started. Please
>>>>>> fill in the JSON data model based on the examples seen on our website
>>>>>> (links on the wiki page).
>>>>>>
>>>>>> Post here if you have any questions.
>>>>>>
>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>
>>>>>>
>>>>>
>>
>
>
> -- 
> Riyafa Abdul Hameed
> Undergraduate, University of Moratuwa
>
> Email: riyafa.12@cse.mrt.ac.lk
> Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
> <http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
> <http://twitter.com/Riyafa1>

Re: JSONiq data model

Posted by Riyafa Abdul Hameed <ri...@cse.mrt.ac.lk>.

Hi Preston,

I have edited the wiki [1] and the doc [2] based on the comments. Thank you
for the suggestions. I have removed the part that assigns an id to the keys
and instead suggested that the keys be stored in the order they appear in
the JSON object. I am not sure I understand the concept of the hash code:
how are the hash codes used for easy lookup generated?
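
For illustration only, one common way such hash codes are generated is a
polynomial hash over the characters of the key name. The sketch below is in
Python (not the project's Java) and is an assumption, not what AsterixDB or
VXQuery actually does: the function names are hypothetical, and the
multiplier 31 with 32-bit wrap-around is borrowed from Java's
String.hashCode (the two agree for text in the Basic Multilingual Plane).

```python
def name_hash(name: str) -> int:
    """32-bit polynomial hash with multiplier 31 (hypothetical helper)."""
    h = 0
    for ch in name:
        # Keep only the low 32 bits, mimicking Java int overflow.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed integer.
    return h - 0x100000000 if h >= 0x80000000 else h


def build_hash_index(keys):
    """One (hash, position) pair per key, in the order the keys appear."""
    return [(name_hash(k), i) for i, k in enumerate(keys)]


def find_key(keys, index, wanted):
    """Return the position of `wanted` in `keys`, or -1 if it is absent."""
    target = name_hash(wanted)
    for h, pos in index:
        # Cheap integer comparison first; confirm with the full name,
        # because two different names can share a hash value.
        if h == target and keys[pos] == wanted:
            return pos
    return -1


keys = ["Aa", "BB", "hello"]
index = build_hash_index(keys)
print(name_hash("hello"))            # 99162322
print(find_key(keys, index, "BB"))   # 1, although "Aa" has the same hash
```

The point of the hash list is that most candidates are rejected by comparing
fixed-size integers; the key string itself is only read when the hashes match.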


[1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
[2] https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0

Thank you again.

Yours sincerely,
Riyafa

On 9 May 2016 at 01:23, christina pavlopoulou <cp...@ucr.edu> wrote:

> Hi,
>
> I updated the wiki page according to Preston's comments along with the
> json array example in [1].
>
> [1]
> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>
> Thank you,
> Christina
>
> On 5/8/2016 9:43 AM, Preston Carman wrote:
>
>> Nice job guys. I can see you are picking up how to create a data
>> model. I have limited my comments to the wiki [1] for now. At a high
>> level, I was impressed with your detail and thoughtful layouts. It
>> reminds me of the age-old trade-off: speed vs. space. At this time,
>> let's err on the side of saving space. The data model should be as
>> compact as possible.
>>
>> I also found the AsterixDB serialization [2] we can use as a
>> reference. Even though the AsterixDB data model includes object
>> length, I would leave that out since none of the XQuery data models
>> include this property.
>>
>> Riyafa, take a look at the method AsterixDB uses for quick lookups (a
>> hash value for the name). Consider the pros and cons of your method
>> and AsterixDB's method: a list of hash values for the names versus a
>> sorted list of names.
>>
>> Also, take a look at my wiki comments. It's a great start!
>>
>> Mahalo,
>> Preston
>>
>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>> [2]
>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>
>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <cp...@ucr.edu>
>> wrote:
>>
>>> Hi,
>>>
>>> I, also, designed an example for the json array [1] given the
>>> description I wrote in the wiki page.
>>>
>>> [1]
>>>
>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>
>>> Thank you,
>>> Christina
>>>
>>>
>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>>
>>>> Hi,
>>>>
>>>> I am attempting to create a doc on the JSONiq data model for objects [1]
>>>> (it might be full of errors because I am doing the calculations manually).
>>>>
>>>> This is what I have come up with for the data model for objects:
>>>>
>>>> The first byte would have the value tag, followed by the id (4 bytes) of
>>>> the object. Then 4 bytes to represent the size of the object. Then another
>>>> four bytes to represent the number of key-value pairs. The next few bytes
>>>> represent the offsets of the keys which follow (each offset is represented
>>>> by 4 bytes). Ids would be assigned to the keys. The next few bytes would be
>>>> a sorted list of ids for keys in alphabetical order. The following bytes
>>>> would represent the keys in the object. Each key is a StringPointable
>>>> followed by the id of the key. Each object would have a sequence pointable:
>>>> the following bytes would be the number of items (items are the values for
>>>> keys) in the sequence. The next bytes would be the offset of each item in
>>>> the sequence. The last bytes would be the values for each key followed by
>>>> the respective id of the key.
>>>>
>>>> Hope it makes sense.
>>>>
>>>> My problem is,
>>>>
>>>> I have not provided for the white spaces in the object. What can I use to
>>>> represent the white spaces? I cannot use a text node because an object is
>>>> not a node.
>>>>
>>>>
>>>> [1]
>>>>
>>>>
>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>
>>>> Thank you.
>>>>
>>>> Yours sincerely,
>>>> Riyafa
>>>>
>>>>
>>>> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org> wrote:
>>>>
>>>>> We have two students working with us this summer through GSOC to complete
>>>>> JSONiq specification for arrays and objects. I think the first step is to
>>>>> define the data model used by JSONiq. The definition should be defined in
>>>>> our wiki [1] before coding starts this summer. The wiki will allow the
>>>>> community to discuss the JSON data model implementation in VXQuery.
>>>>>
>>>>> I updated the JSONiq wiki to help get the documentation started. Please
>>>>> fill in the JSON data model based on the examples seen on our website
>>>>> (links on the wiki page).
>>>>>
>>>>> Post here if you have any questions.
>>>>>
>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>
>>>>>
>>>>
>


-- 
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: riyafa.12@cse.mrt.ac.lk
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>

Re: JSONiq data model

Posted by Preston Carman <pr...@gmail.com>.

I second option b. 

> On May 9, 2016, at 5:13 AM, Till Westmann <ti...@apache.org> wrote:
> 
> I really like that we have these examples! However, it would be nice to make
> them discoverable by other community members without going through the
> mailing list. So we could either
> a) put them on the Wiki or
> b) decide to put them on the website (along with the XDM examples) when we
>   are done.
> I'd prefer b), and if we agree on that, I think it'd be good if Riyafa
> and Christina filed issues for themselves to add the content of the Google
> doc to the website (so we won't forget).
> 
> Thoughts?
> 
> Cheers,
> Till
> 
>> On 8 May 2016, at 12:53, christina pavlopoulou wrote:
>> 
>> Hi,
>> 
>> I updated the wiki page according to Preston's comments along with the json array example in [1].
>> 
>> [1] https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>> 
>> Thank you,
>> Christina
>> 
>>> On 5/8/2016 9:43 AM, Preston Carman wrote:
>>> Nice job guys. I can see you are picking up how to create a data
>>> model. I have limited my comments to the wiki [1] for now. At a high
>>> level, I was impressed with your detail and thoughtful layouts. It
>>> reminds me of the age-old trade-off: speed vs. space. At this time,
>>> let's err on the side of saving space. The data model should be as
>>> compact as possible.
>>>
>>> I also found the AsterixDB serialization [2] we can use as a
>>> reference. Even though the AsterixDB data model includes object
>>> length, I would leave that out since none of the XQuery data models
>>> include this property.
>>>
>>> Riyafa, take a look at the method AsterixDB uses for quick lookups (a
>>> hash value for the name). Consider the pros and cons of your method
>>> and AsterixDB's method: a list of hash values for the names versus a
>>> sorted list of names.
>>>
>>> Also, take a look at my wiki comments. It's a great start!
>>> 
>>> Mahalo,
>>> Preston
>>> 
>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>> [2] https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>> 
>>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <cp...@ucr.edu> wrote:
>>>> Hi,
>>>> 
>>>> I, also, designed an example for the json array [1] given the description I
>>>> wrote in the wiki page.
>>>> 
>>>> [1]
>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>> 
>>>> Thank you,
>>>> Christina
>>>> 
>>>> 
>>>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>>>> Hi,
>>>>> 
>>>>> I am attempting to create a doc on the JSONiq data model for objects [1]
>>>>> (it might be full of errors because I am doing the calculations manually).
>>>>>
>>>>> This is what I have come up with for the data model for objects:
>>>>>
>>>>> The first byte would have the value tag, followed by the id (4 bytes) of
>>>>> the object. Then 4 bytes to represent the size of the object. Then another
>>>>> four bytes to represent the number of key-value pairs. The next few bytes
>>>>> represent the offsets of the keys which follow (each offset is represented
>>>>> by 4 bytes). Ids would be assigned to the keys. The next few bytes would be
>>>>> a sorted list of ids for keys in alphabetical order. The following bytes
>>>>> would represent the keys in the object. Each key is a StringPointable
>>>>> followed by the id of the key. Each object would have a sequence pointable:
>>>>> the following bytes would be the number of items (items are the values for
>>>>> keys) in the sequence. The next bytes would be the offset of each item in
>>>>> the sequence. The last bytes would be the values for each key followed by
>>>>> the respective id of the key.
>>>>> 
>>>>> Hope it makes sense.
>>>>> 
>>>>> My problem is,
>>>>> 
>>>>> I have not provided for the white spaces in the object. What can I use to
>>>>> represent the white spaces? I cannot use a text node because object is not
>>>>> a node.
>>>>> 
>>>>> 
>>>>> [1]
>>>>> 
>>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>> 
>>>>> Thank you.
>>>>> 
>>>>> Yours sincerely,
>>>>> Riyafa
>>>>> 
>>>>> 
>>>>>> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org> wrote:
>>>>>> 
>>>>>> We have two students working with us this summer through GSOC to complete
>>>>>> JSONiq specification for arrays and objects. I think the first step is to
>>>>>> define the data model used by JSONiq. The definition should be defined in
>>>>>> our wiki [1] before coding starts this summer. The wiki will allow the
>>>>>> community to discuss the JSON data model implementation in VXQuery.
>>>>>> 
>>>>>> I updated the JSONiq wiki to help get the documentation started. Please
>>>>>> fill in the JSON data model based on the examples seen on our website
>>>>>> (links on the wiki page).
>>>>>> 
>>>>>> Post here if you have any questions.
>>>>>> 
>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>

Re: JSONiq data model

Posted by Till Westmann <ti...@apache.org>.

I really like that we have these examples! However, it would be nice to make
them discoverable by other community members without going through the
mailing list. So we could either
a) put them on the Wiki or
b) decide to put them on the website (along with the XDM examples) when we
   are done.
I'd prefer b), and if we agree on that, I think it'd be good if Riyafa
and Christina filed issues for themselves to add the content of the Google
doc to the website (so we won't forget).

Thoughts?

Cheers,
Till

On 8 May 2016, at 12:53, christina pavlopoulou wrote:

> Hi,
>
> I updated the wiki page according to Preston's comments along with the 
> json array example in [1].
>
> [1] 
> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>
> Thank you,
> Christina
>
> On 5/8/2016 9:43 AM, Preston Carman wrote:
>> Nice job guys. I can see you are picking up how to create a data
>> model. I have limited my comments to the wiki [1] for now. At a high
>> level, I was impressed with your detail and thoughtful layouts. It
>> reminds me of the age-old trade-off: speed vs. space. At this time,
>> let's err on the side of saving space. The data model should be as
>> compact as possible.
>>
>> I also found the AsterixDB serialization [2] we can use as a
>> reference. Even though the AsterixDB data model includes object
>> length, I would leave that out since none of the XQuery data models
>> include this property.
>>
>> Riyafa, take a look at the method AsterixDB uses for quick lookups (a
>> hash value for the name). Consider the pros and cons of your method
>> and AsterixDB's method: a list of hash values for the names versus a
>> sorted list of names.
>>
>> Also, take a look at my wiki comments. It's a great start!
>>
>> Mahalo,
>> Preston
>>
>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>> [2] 
>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>
>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <cp...@ucr.edu> wrote:
>>> Hi,
>>>
>>> I, also, designed an example for the json array [1] given the
>>> description I wrote in the wiki page.
>>>
>>> [1]
>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>
>>> Thank you,
>>> Christina
>>>
>>>
>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>>> Hi,
>>>>
>>>> I am attempting to create a doc on the JSONiq data model for objects [1]
>>>> (it might be full of errors because I am doing the calculations manually).
>>>>
>>>> This is what I have come up with for the data model for objects:
>>>>
>>>> The first byte would have the value tag, followed by the id (4 bytes) of
>>>> the object. Then 4 bytes to represent the size of the object. Then another
>>>> four bytes to represent the number of key-value pairs. The next few bytes
>>>> represent the offsets of the keys which follow (each offset is represented
>>>> by 4 bytes). Ids would be assigned to the keys. The next few bytes would be
>>>> a sorted list of ids for keys in alphabetical order. The following bytes
>>>> would represent the keys in the object. Each key is a StringPointable
>>>> followed by the id of the key. Each object would have a sequence pointable:
>>>> the following bytes would be the number of items (items are the values for
>>>> keys) in the sequence. The next bytes would be the offset of each item in
>>>> the sequence. The last bytes would be the values for each key followed by
>>>> the respective id of the key.
>>>>
>>>> Hope it makes sense.
>>>>
>>>> My problem is,
>>>>
>>>> I have not provided for the white spaces in the object. What can I use to
>>>> represent the white spaces? I cannot use a text node because an object is
>>>> not a node.
>>>>
>>>>
>>>> [1]
>>>>
>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>
>>>> Thank you.
>>>>
>>>> Yours sincerely,
>>>> Riyafa
>>>>
>>>>
>>>> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org> wrote:
>>>>
>>>>> We have two students working with us this summer through GSOC to complete
>>>>> JSONiq specification for arrays and objects. I think the first step is to
>>>>> define the data model used by JSONiq. The definition should be defined in
>>>>> our wiki [1] before coding starts this summer. The wiki will allow the
>>>>> community to discuss the JSON data model implementation in VXQuery.
>>>>>
>>>>> I updated the JSONiq wiki to help get the documentation started. Please
>>>>> fill in the JSON data model based on the examples seen on our website
>>>>> (links on the wiki page).
>>>>>
>>>>> Post here if you have any questions.
>>>>>
>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>
>>>>

Re: JSONiq data model

Posted by christina pavlopoulou <cp...@ucr.edu>.

Hi,

I updated the wiki page according to Preston's comments, along with the
JSON array example in [1].

[1] 
https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit

Thank you,
Christina

On 5/8/2016 9:43 AM, Preston Carman wrote:
> Nice job guys. I can see you are picking up how to create a data
> model. I have limited my comments to the wiki [1] for now. At a high
> level, I was impressed with your detail and thoughtful layouts. It
> reminds me of the age-old trade-off: speed vs. space. At this time,
> let's err on the side of saving space. The data model should be as
> compact as possible.
>
> I also found the AsterixDB serialization [2] we can use as a
> reference. Even though the AsterixDB data model includes object
> length, I would leave that out since none of the XQuery data models
> include this property.
>
> Riyafa, take a look at the method AsterixDB uses for quick lookups (a
> hash value for the name). Consider the pros and cons of your method
> and AsterixDB's method: a list of hash values for the names versus a
> sorted list of names.
>
> Also, take a look at my wiki comments. It's a great start!
>
> Mahalo,
> Preston
>
> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
> [2] https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>
> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <cp...@ucr.edu> wrote:
>> Hi,
>>
>> I, also, designed an example for the json array [1] given the description I
>> wrote in the wiki page.
>>
>> [1]
>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>
>> Thank you,
>> Christina
>>
>>
>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>> Hi,
>>>
>>> I am attempting to create a doc on the JSONiq data model for objects [1]
>>> (it might be full of errors because I am doing the calculations manually).
>>>
>>> This is what I have come up with for the data model for objects:
>>>
>>> The first byte would have the value tag, followed by the id (4 bytes) of
>>> the object. Then 4 bytes to represent the size of the object. Then another
>>> four bytes to represent the number of key-value pairs. The next few bytes
>>> represent the offsets of the keys which follow (each offset is represented
>>> by 4 bytes). Ids would be assigned to the keys. The next few bytes would be
>>> a sorted list of ids for keys in alphabetical order. The following bytes
>>> would represent the keys in the object. Each key is a StringPointable
>>> followed by the id of the key. Each object would have a sequence pointable:
>>> the following bytes would be the number of items (items are the values for
>>> keys) in the sequence. The next bytes would be the offset of each item in
>>> the sequence. The last bytes would be the values for each key followed by
>>> the respective id of the key.
>>>
>>> Hope it makes sense.
>>>
>>> My problem is,
>>>
>>> I have not provided for the white spaces in the object. What can I use to
>>> represent the white spaces? I cannot use a text node because object is not
>>> a node.
>>>
>>>
>>> [1]
>>>
>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>
>>> Thank you.
>>>
>>> Yours sincerely,
>>> Riyafa
>>>
>>>
>>> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org> wrote:
>>>
>>>> We have two students working with us this summer through GSOC to complete
>>>> JSONiq specification for arrays and objects. I think the first step is to
>>>> define the data model used by JSONiq. The definition should be defined in
>>>> our wiki [1] before coding starts this summer. The wiki will allow the
>>>> community to discuss the JSON data model implementation in VXQuery.
>>>>
>>>> I updated the JSONiq wiki to help get the documentation started. Please
>>>> fill in the JSON data model based on the examples seen on our website
>>>> (links on the wiki page).
>>>>
>>>> Post here if you have any questions.
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>
>>>

Re: JSONiq data model

Posted by Preston Carman <pr...@gmail.com>.

The comments have been removed. The new/current format reflects my comments. 

> On May 9, 2016, at 5:06 AM, Till Westmann <ti...@apache.org> wrote:
> 
> Hi Preston,
> 
> I don’t seem to be able to see your comments on the Wiki page [1].
> Where do I need to look?
> 
> Cheers,
> Till
> 
>> On 8 May 2016, at 9:43, Preston Carman wrote:
>> 
>> Nice job guys. I can see you are picking up how to create a data
>> model. I have limited my comments to the wiki [1] for now. At a high
>> level, I was impressed with your detail and thoughtful layouts. It
>> reminds me of the age-old trade-off: speed vs. space. At this time,
>> let's err on the side of saving space. The data model should be as
>> compact as possible.
>>
>> I also found the AsterixDB serialization [2] we can use as a
>> reference. Even though the AsterixDB data model includes object
>> length, I would leave that out since none of the XQuery data models
>> include this property.
>>
>> Riyafa, take a look at the method AsterixDB uses for quick lookups (a
>> hash value for the name). Consider the pros and cons of your method
>> and AsterixDB's method: a list of hash values for the names versus a
>> sorted list of names.
>>
>> Also, take a look at my wiki comments. It's a great start!
>> 
>> Mahalo,
>> Preston
>> 
>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>> [2] https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>> 
>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <cp...@ucr.edu> wrote:
>>> Hi,
>>> 
>>> I, also, designed an example for the json array [1] given the description I
>>> wrote in the wiki page.
>>> 
>>> [1]
>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>> 
>>> Thank you,
>>> Christina
>>> 
>>> 
>>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I am attempting to create a doc on the JSONiq data model for objects [1]
>>>> (it might be full of errors because I am doing the calculations manually).
>>>>
>>>> This is what I have come up with for the data model for objects:
>>>>
>>>> The first byte would have the value tag, followed by the id (4 bytes) of
>>>> the object. Then 4 bytes to represent the size of the object. Then another
>>>> four bytes to represent the number of key-value pairs. The next few bytes
>>>> represent the offsets of the keys which follow (each offset is represented
>>>> by 4 bytes). Ids would be assigned to the keys. The next few bytes would be
>>>> a sorted list of ids for keys in alphabetical order. The following bytes
>>>> would represent the keys in the object. Each key is a StringPointable
>>>> followed by the id of the key. Each object would have a sequence pointable:
>>>> the following bytes would be the number of items (items are the values for
>>>> keys) in the sequence. The next bytes would be the offset of each item in
>>>> the sequence. The last bytes would be the values for each key followed by
>>>> the respective id of the key.
>>>> 
>>>> Hope it makes sense.
>>>> 
>>>> My problem is,
>>>> 
>>>> I have not provided for the white spaces in the object. What can I use to
>>>> represent the white spaces? I cannot use a text node because object is not
>>>> a node.
>>>> 
>>>> 
>>>> [1]
>>>> 
>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>> 
>>>> Thank you.
>>>> 
>>>> Yours sincerely,
>>>> Riyafa
>>>> 
>>>> 
>>>>> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org> wrote:
>>>>> 
>>>>> We have two students working with us this summer through GSOC to complete
>>>>> JSONiq specification for arrays and objects. I think the first step is to
>>>>> define the data model used by JSONiq. The definition should be defined in
>>>>> our wiki [1] before coding starts this summer. The wiki will allow the
>>>>> community to discuss the JSON data model implementation in VXQuery.
>>>>> 
>>>>> I updated the JSONiq wiki to help get the documentation started. Please
>>>>> fill in the JSON data model based on the examples seen on our website
>>>>> (links on the wiki page).
>>>>> 
>>>>> Post here if you have any questions.
>>>>> 
>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>

Re: JSONiq data model

Posted by Till Westmann <ti...@apache.org>.

Hi Preston,

I don’t seem to be able to see your comments on the Wiki page [1].
Where do I need to look?

Cheers,
Till

On 8 May 2016, at 9:43, Preston Carman wrote:

> Nice job guys. I can see you are picking up how to create a data
> model. I have limited my comments to the wiki [1] for now. At a high
> level, I was impressed with your detail and thoughtful layouts. It
> reminds me of the age-old trade-off: speed vs. space. At this time,
> let's err on the side of saving space. The data model should be as
> compact as possible.
>
> I also found the AsterixDB serialization [2] we can use as a
> reference. Even though the AsterixDB data model includes object
> length, I would leave that out since none of the XQuery data models
> include this property.
>
> Riyafa, take a look at the method AsterixDB uses for quick lookups (a
> hash value for the name). Consider the pros and cons of your method
> and AsterixDB's method: a list of hash values for the names versus a
> sorted list of names.
>
> Also, take a look at my wiki comments. It's a great start!
>
> Mahalo,
> Preston
>
> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
> [2] 
> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>
> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <cp...@ucr.edu> wrote:
>> Hi,
>>
>> I, also, designed an example for the json array [1] given the
>> description I wrote in the wiki page.
>>
>> [1]
>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>
>> Thank you,
>> Christina
>>
>>
>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>>
>>> Hi,
>>>
>>> I am attempting to create a doc on the JSONiq data model for objects [1]
>>> (it might be full of errors because I am doing the calculations manually).
>>>
>>> This is what I have come up with for the data model for objects:
>>>
>>> The first byte would have the value tag, followed by the id (4 bytes) of
>>> the object. Then 4 bytes to represent the size of the object. Then another
>>> four bytes to represent the number of key-value pairs. The next few bytes
>>> represent the offsets of the keys which follow (each offset is represented
>>> by 4 bytes). Ids would be assigned to the keys. The next few bytes would be
>>> a sorted list of ids for keys in alphabetical order. The following bytes
>>> would represent the keys in the object. Each key is a StringPointable
>>> followed by the id of the key. Each object would have a sequence pointable:
>>> the following bytes would be the number of items (items are the values for
>>> keys) in the sequence. The next bytes would be the offset of each item in
>>> the sequence. The last bytes would be the values for each key followed by
>>> the respective id of the key.
>>>
>>> Hope it makes sense.
>>>
>>> My problem is,
>>>
>>> I have not provided for the white spaces in the object. What can I use to
>>> represent the white spaces? I cannot use a text node because an object is
>>> not a node.
>>>
>>>
>>> [1]
>>>
>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>
>>> Thank you.
>>>
>>> Yours sincerely,
>>> Riyafa
>>>
>>>
>>> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org> wrote:
>>>
>>>> We have two students working with us this summer through GSOC to complete
>>>> JSONiq specification for arrays and objects. I think the first step is to
>>>> define the data model used by JSONiq. The definition should be defined in
>>>> our wiki [1] before coding starts this summer. The wiki will allow the
>>>> community to discuss the JSON data model implementation in VXQuery.
>>>>
>>>> I updated the JSONiq wiki to help get the documentation started. Please
>>>> fill in the JSON data model based on the examples seen on our website
>>>> (links on the wiki page).
>>>>
>>>> Post here if you have any questions.
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>
>>>
>>>
>>

Re: JSONiq data model

Posted by Preston Carman <pr...@apache.org>.

Nice job guys. I can see you are picking up how to create a data
model. I have limited my comments to the wiki [1] for now. At a high
level, I was impressed with your detail and thoughtful layouts. It
reminds me of the age-old trade-off: speed vs. space. At this time,
let's err on the side of saving space. The data model should be as
compact as possible.

I also found the AsterixDB serialization [2] we can use as a
reference. Even though the AsterixDB data model includes object
length, I would leave that out since none of the XQuery data models
include this property.

Riyafa, take a look at the method AsterixDB uses for quick lookups (a
hash value for the name). Consider the pros and cons of your method
and AsterixDB's method: a list of hash values for the names versus a
sorted list of names.
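
To make the sorted-list side of that comparison concrete, here is a minimal
sketch in Python (the project itself is Java). It is an illustration under
assumptions, not the wiki design: the function names are hypothetical, and
the "sorted list" is modelled as a list of key positions ordered by key name.

```python
def build_sorted_index(keys):
    """Positions of the keys, ordered alphabetically by key name."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def find_sorted(keys, sorted_index, wanted):
    """Binary search over the sorted positions; -1 if the key is absent."""
    lo, hi = 0, len(sorted_index)
    while lo < hi:
        mid = (lo + hi) // 2
        if keys[sorted_index[mid]] < wanted:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sorted_index) and keys[sorted_index[lo]] == wanted:
        return sorted_index[lo]
    return -1


keys = ["name", "age", "city"]          # stored in document order
index = build_sorted_index(keys)        # [1, 2, 0]
print(find_sorted(keys, index, "city"))  # 2
print(find_sorted(keys, index, "zip"))   # -1
```

The sorted list needs about log2(n) string comparisons per lookup. A hash
list compares fixed-size integers instead, but it still has to confirm the
name on a hash match, and it only gets below a linear scan if the hashes are
themselves kept sorted.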

Also, take a look at my wiki comments. It's a great start!

Mahalo,
Preston

[1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
[2] https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference

On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <cp...@ucr.edu> wrote:
> Hi,
>
> I, also, designed an example for the json array [1] given the description I
> wrote in the wiki page.
>
> [1]
> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>
> Thank you,
> Christina
>
>
> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>
>> Hi,
>>
>> I am attempting to create a doc on the JSONiq data model for objects [1]
>> (it might be full of errors because I am doing the calculations manually).
>>
>> This is what I have come up with for the data model for objects:
>>
>> The first byte would have the value tag, followed by the id (4 bytes) of
>> the object. Then 4 bytes to represent the size of the object. Then another
>> four bytes to represent the number of key-value pairs. The next few bytes
>> represent the offsets of the keys which follow (each offset is represented
>> by 4 bytes). Ids would be assigned to the keys. The next few bytes would be
>> a sorted list of ids for keys in alphabetical order. The following bytes
>> would represent the keys in the object. Each key is a StringPointable
>> followed by the id of the key. Each object would have a sequence pointable:
>> the following bytes would be the number of items (items are the values for
>> keys) in the sequence. The next bytes would be the offset of each item in
>> the sequence. The last bytes would be the values for each key followed by
>> the respective id of the key.
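
As an aside, the header and key section of the layout quoted above can be
sketched in a few lines of Python (the project itself is Java). Everything
the thread leaves open is an assumption here: the tag value is made up, all
integers are big-endian, the size field counts the whole encoded object, key
offsets are relative to the start of the key section, and a key is written as
a 2-byte length plus UTF-8 bytes as a stand-in for StringPointable. The
trailing sequence of values is left out.

```python
import struct

OBJECT_TAG = 0x6F  # hypothetical value tag; the thread does not fix one


def encode_key(name: str, key_id: int) -> bytes:
    """Length-prefixed UTF-8 key followed by its 4-byte id (assumed format)."""
    data = name.encode("utf-8")
    return struct.pack(">H", len(data)) + data + struct.pack(">i", key_id)


def encode_object_keys(object_id: int, names: list) -> bytes:
    """Tag, object id, size, pair count, key offsets, sorted ids, keys."""
    entries = [encode_key(n, i) for i, n in enumerate(names)]

    # Offset of each key entry, relative to the start of the key section.
    offsets, pos = [], 0
    for entry in entries:
        offsets.append(pos)
        pos += len(entry)

    # Key ids listed in alphabetical order of the key names.
    sorted_ids = sorted(range(len(names)), key=lambda i: names[i])

    n = len(names)
    body = (struct.pack(">%di" % n, *offsets)
            + struct.pack(">%di" % n, *sorted_ids)
            + b"".join(entries))
    size = 1 + 4 + 4 + 4 + len(body)  # tag + id + size + count + body
    return struct.pack(">biii", OBJECT_TAG, object_id, size, n) + body


buf = encode_object_keys(7, ["b", "a"])
print(len(buf))                            # 43
print(struct.unpack_from(">2i", buf, 21))  # (1, 0): "a" sorts before "b"
```

Even for two one-character keys the fixed header, offsets, and sorted ids
take 29 of the 43 bytes, which is the kind of overhead the space-versus-speed
discussion is about.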
>>
>> Hope it makes sense.
>>
>> My problem is,
>>
>> I have not provided for the white spaces in the object. What can I use to
>> represent the white spaces? I cannot use a text node because object is not
>> a node.
>>
>>
>> [1]
>>
>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>
>> Thank you.
>>
>> Yours sincerely,
>> Riyafa
>>
>>
>> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org> wrote:
>>
>>> We have two students working with us this summer through GSoC to complete
>>> the JSONiq specification for arrays and objects. I think the first step is
>>> to define the data model used by JSONiq. The definition should be
>>> documented in our wiki [1] before coding starts this summer. The wiki will
>>> allow the community to discuss the JSON data model implementation in
>>> VXQuery.
>>>
>>> I updated the JSONiq wiki to help get the documentation started. Please
>>> fill in the JSON data model based on the examples seen on our website
>>> (links on the wiki page).
>>>
>>> Post here if you have any questions.
>>>
>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>
>>
>>
>

Re: JSONiq data model

Posted by christina pavlopoulou <cp...@ucr.edu>.

Hi,

I have also designed an example for the JSON array [1], based on the
description I wrote on the wiki page.

[1] 
https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit

Thank you,
Christina

On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
> Hi,
>
> I am attempting to create a doc on the JSONiq data model for objects [1]
> (it might be full of errors because I am doing the calculations manually).
>
> This is what I have come up with for the data model for objects:
>
> The first byte would hold the value tag, followed by the id (4 bytes) of
> the object. Then 4 bytes represent the size of the object, and another
> 4 bytes represent the number of key-value pairs. The next few bytes
> represent the offsets of the keys that follow (each offset is 4 bytes).
> Ids would be assigned to the keys. The next few bytes would be a list of
> key ids, sorted by the alphabetical order of the keys. The following bytes
> would represent the keys in the object. Each key is a StringPointable
> followed by the id of the key. Each object would have a sequence
> pointable: the following bytes would be the number of items (items are
> the values for keys) in the sequence. The next bytes would be the offset
> of each item in the sequence. The last bytes would be the values for each
> key, each followed by the respective id of the key.
>
> Hope it makes sense.
>
> My problem is,
>
> I have not provided for the whitespace in the object. What can I use to
> represent the whitespace? I cannot use a text node because an object is
> not a node.
>
>
> [1]
> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>
> Thank you.
>
> Yours sincerely,
> Riyafa
>
>
> On 26 April 2016 at 10:29, Preston Carman <pr...@apache.org> wrote:
>
>> We have two students working with us this summer through GSoC to complete
>> the JSONiq specification for arrays and objects. I think the first step is
>> to define the data model used by JSONiq. The definition should be
>> documented in our wiki [1] before coding starts this summer. The wiki will
>> allow the community to discuss the JSON data model implementation in
>> VXQuery.
>>
>> I updated the JSONiq wiki to help get the documentation started. Please
>> fill in the JSON data model based on the examples seen on our website
>> (links on the wiki page).
>>
>> Post here if you have any questions.
>>
>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>
>
>
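
For readers following the thread, the object layout proposed in the quoted
message can be sketched roughly as below. This is only an illustration in
Python, not the VXQuery implementation, and it fills several gaps with
assumptions that the thread does not state: the tag value (OBJECT_TAG), the
big-endian 4-byte integers, the 2-byte length prefix standing in for a
StringPointable, key ids assigned in document order starting at 0, offsets
measured from the start of the keys region and of the values region, and the
size field covering the whole object. The function names are hypothetical,
and values are passed in as already-encoded bytes.

```python
import struct

OBJECT_TAG = 0x6F  # hypothetical tag value; the thread does not give one


def encode_string(s):
    """2-byte length + UTF-8 bytes (assumed stand-in for a StringPointable)."""
    data = s.encode("utf-8")
    return struct.pack(">H", len(data)) + data


def relative_offsets(blobs):
    """Offset of each blob from the start of its region."""
    offsets, pos = [], 0
    for blob in blobs:
        offsets.append(pos)
        pos += len(blob)
    return offsets


def encode_object(object_id, pairs):
    """pairs: list of (key, value_bytes) in document order."""
    n = len(pairs)
    key_ids = list(range(n))  # assumption: ids follow document order
    # Key ids listed in the alphabetical order of their keys.
    sorted_ids = sorted(key_ids, key=lambda i: pairs[i][0])

    # Keys region: each key string followed by its 4-byte id.
    key_blobs = [encode_string(pairs[i][0]) + struct.pack(">i", i)
                 for i in key_ids]
    # Values region: each value followed by the id of its key.
    item_blobs = [pairs[i][1] + struct.pack(">i", i) for i in key_ids]

    body = (
        struct.pack(">i", n)                                # number of pairs
        + struct.pack(f">{n}i", *relative_offsets(key_blobs))
        + struct.pack(f">{n}i", *sorted_ids)
        + b"".join(key_blobs)
        + struct.pack(">i", n)                              # items in sequence
        + struct.pack(f">{n}i", *relative_offsets(item_blobs))
        + b"".join(item_blobs)
    )
    size = 1 + 4 + 4 + len(body)  # assumption: size counts the whole object
    return struct.pack(">Bii", OBJECT_TAG, object_id, size) + body
```

For example, encode_object(7, [("b", b"\x01"), ("a", b"\x02\x03")]) places the
key "b" (id 0) before "a" (id 1) in the keys region, while the sorted id list
reads 1, 0 because "a" sorts before "b".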