You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@asterixdb.apache.org by Mike Carey <dt...@gmail.com> on 2015/09/12 00:32:29 UTC

Re: ADM round-trip no longer possible?

I think we'd normally want the receiving client in the ADM case to be 
looping on the input stream doing "read next ADM instance" until the 
stream is exhausted - i.e., the norm should be that the result contains 
a sequence of such things for ADM (as opposed to a single document for 
JSON - I think we'd want both options in JSON perhaps - Till has an 
informed opinion on this too, I think).

On 8/20/15 12:59 AM, Chris Hillery wrote:
> The list wrapper was primarily introduced to make the JSON output more
> sensible. I think it's important in the JSON case, because otherwise a user
> would need to do some pre-parsing of the output to handle query results
> with more than one instance. If that's not an issue for ADM parsing, then I
> believe we could omit it when outputting ADM.
>
> Ceej
> aka Chris Hillery
>
> On Thu, Aug 20, 2015 at 12:51 AM, Mike Carey <dt...@gmail.com> wrote:
>
>> I was going to also say that other thing about the "i64" case.  :-)
>>
>> W.r.t. the list wrapper, I prefer (a) because I don't think we should have
>> the notion that the query output is a single data model instance - I kinda
>> think we should have a model where the inputs and outputs can br multiple
>> instances.  Otherwise we'll create a world that's potentially very
>> unfriendly to streaming, etc.  If someone wants to deal with single lists,
>> they can wrap the result into one.  (But they shouldn't do that if they
>> want to create a load-friendly result.)  Note that this is all consistent
>> with the persistence model we have, too - datasets are not really in the
>> data model of instances - they are containers for (often large) collections
>> of data model instances.
>>
>> Cheers,
>> Mike
>>
>>
>> On 8/19/15 6:47 PM, Till Westmann wrote:
>>
>>> I agree, that we don't need to print the 'd', but we probably also should
>>> accept (and ignore) it when parsing.
>>>
>>> Wrt the list wrapper, I think that we've added that to the HTTP API to
>>> ensure that we get a valid instance of the data model. So this would be an
>>> artifact of the HTTP interface and not part of the result. And I think that
>>> we shouldn't simply ignore it on bulkload, as somebody might actually want
>>> to load lists (even though I don't know what to load them into right now
>>> ...).
>>> I think that the 2 options on this are
>>> a) add an option to the HTTP interface to not wrap the output sequence in
>>> a list and
>>> b) add an option to the bulkload to ignore an enclosing list.
>>> Right now I prefer b).
>>>
>>> Thoughts?
>>>
>>> Cheers,
>>> Till
>>>
>>>
>>>
>>> On Aug 19, 2015, at 18:14, Ian Maxon <im...@uci.edu> wrote:
>>>> Interesting. I'm not sure I see the value in the suffix, as the default
>>>> is
>>>> double.
>>>>
>>>> While we are at it in fixing this, we should remove the outer list
>>>> wrapper
>>>> (or fix the ADM bulkload to accept that as input). Otherwise again, one
>>>> can't roundtrip.
>>>>
>>>> - Ian
>>>>
>>>> On Wed, Aug 19, 2015 at 5:53 PM, Taewoo Kim <wa...@gmail.com> wrote:
>>>>> In adm.grammar file, DOUBLE_LITERAL doesn't have information about "d"
>>>>> suffix unlike the other numeric cases. This means that, for the real
>>>>> DOUBLE
>>>>> values, the output of ADM shouldn't have "d" in it? The ADoublePrinter
>>>>> class is printing "d" as suffix when it generates a double output: ps
>>>>> .print(ADoubleSerializerDeserializer.getDouble(b, s + 1) + "d"); Which
>>>>> way
>>>>> is the correct way? We need to remove "d" suffix? The adm.grammar file
>>>>> assumes so.
>>>>>
>>>>>
>>>>> INT_LITERAL          = signOrNothing(), digitSequence()
>>>>>
>>>>> INT8_LITERAL         = token(INT_LITERAL), string(i8)
>>>>>
>>>>> INT16_LITERAL        = token(INT_LITERAL), string(i16)
>>>>>
>>>>> INT32_LITERAL        = token(INT_LITERAL), string(i32)
>>>>>
>>>>> INT64_LITERAL        = token(INT_LITERAL), string(i64)
>>>>>
>>>>>
>>>>> @EXPONENT            = caseInsensitiveChar(e), signOrNothing(),
>>>>> digitSequence()
>>>>>
>>>>>
>>>>> DOUBLE_LITERAL = signOrNothing(), char(.), digitSequence()
>>>>>
>>>>> DOUBLE_LITERAL = signOrNothing(), digitSequence(), char(.),
>>>>> digitSequence()
>>>>>
>>>>> DOUBLE_LITERAL = signOrNothing(), digitSequence(), char(.),
>>>>> digitSequence(), token(@EXPONENT)
>>>>>
>>>>> DOUBLE_LITERAL = signOrNothing(), digitSequence(), token(@EXPONENT)
>>>>>
>>>>>
>>>>> FLOAT_LITERAL = token(DOUBLE_LITERAL), caseInsensitiveChar(f)
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>> Taewoo
>>>>>
>>>>> On Wed, Aug 19, 2015 at 3:15 PM, Mike Carey <dt...@gmail.com> wrote:
>>>>>> We definitely need this to work.  😃
>>>>>>
>>>>>>> On Aug 19, 2015 9:36 AM, "Ian Maxon" <im...@uci.edu> wrote:
>>>>>>>
>>>>>>> Yes I am just trying to bulk load from an ADM file that's the result
>>>>>>> of
>>>>>>> querying. It's annoying to have to mangle it with sed or similar, as
>>>>>>>
>>>>>> its
>>>>>> about 4GB.
>>>>>>> On Aug 19, 2015 2:41 AM, "Chris Hillery" <ch...@hillery.land>
>>>>>>>
>>>>>> wrote:
>>>>>> I noticed that the output wasn't re-parseable as input last week when
>>>>>>>> coming up with the proposed JSON serialization. In particular, the
>>>>>>>>
>>>>>>> numeric
>>>>>>>
>>>>>>>> suffixes like 0.0d, 15i8, 333333i32, and so on didn't parse as AQL
>>>>>>>> (although interestingly 32.5f does). I wasn't aware that it used to
>>>>>>>>
>>>>>>> work,
>>>>>>> though. It's an odd disconnect between ADM and AQL, at the least. It
>>>>>>> sounds
>>>>>>>
>>>>>>>> like you're seeing the same issue when parsing as actual ADM?
>>>>>>>>
>>>>>>>> Ceej
>>>>>>>> aka Chris Hillery
>>>>>>>>
>>>>>>>> On Wed, Aug 19, 2015 at 1:43 AM, Ian Maxon <im...@uci.edu> wrote:
>>>>>>>>> Hi,
>>>>>>>>> Has ADM output from Asterix stopped being round-trippable? I'm
>>>>>>>>>
>>>>>>>> trying
>>>>>> to
>>>>>>>> load a record I got from dumping from another instance, via the
>>>>>>>> REST
>>>>>> API,
>>>>>>>> requesting 'application-x-adm' in the accept header. First I had to
>>>>>>>> remove
>>>>>>>>
>>>>>>>>> the outer wrapper list that was added a while back, which I suppose
>>>>>>>>>
>>>>>>>> isn't
>>>>>>>> awful, but now it seems like I'm getting more subtle errors. For
>>>>>>>> example,
>>>>>>>> trying to load a record, and the parse fails once it sees a field
>>>>>>>> like
>>>>>>> this
>>>>>>>>> in a record :
>>>>>>>>>
>>>>>>>>> { ... ,  "Rank": 0.0d, .... }
>>>>>>>>>
>>>>>>>>> With:
>>>>>>>>>
>>>>>>>>> Parse error at (1, 4421) expecting: <DOUBLE_CONS> <DATE_CONS>
>>>>>>>>> <DATETIME_CONS> <DURATION_CONS> <DAY_TIME_DURATION_CONS>
>>>>>>>>> [AdmLexerException]
>>>>>>>>>
>>>>>>>>> Any ideas?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> - Ian
>>>>>>>>>