Posted to dev@lucene.apache.org by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2018/04/06 10:27:00 UTC

[jira] [Commented] (SOLR-12094) JsonRecordReader ignores root record fields after the split point

    [ https://issues.apache.org/jira/browse/SOLR-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428179#comment-16428179 ] 

Dawid Weiss commented on SOLR-12094:
------------------------------------

I understand the concept of "streaming" imports, but the current behavior still seems wrong to me. A useful analogy is XSLT and similar technologies, where the implementation can run in an efficient "streaming" mode in certain cases, unless the input makes that impossible.

I perceive a similar situation here: the parser should handle the input efficiently when possible, but it should also be able to process any type of input, including input that cannot be processed without keeping track of some history. Sure, an abuse case of millions of split nodes waiting for a single attribute is possible, but even then it would be simpler to just say "yeah, buffer up until you can emit the output" than to modify the structure of such a JSON document (that is, to write a converter so that the nested nodes are always placed at the end of the parent).

[~awislowski] do you think you'd be able to modify the patch so that it accepts an argument and switches between a 'strict streaming' mode and a 'relaxed' mode? In 'strict streaming' mode there should be no buffering, and the parser should throw an exception if it encounters extra nodes after the split. In 'relaxed' mode the parser should buffer the information until the record is complete and can be emitted.
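
To make the two modes concrete, here is a rough sketch of the behavior I have in mind. It is not based on the actual JsonRecordReader code or on the attached patch: the class SplitModeSketch, the Mode enum, the Event type and the read method are all hypothetical names, and the JSON input is reduced to a flat list of "root field" and "child record" events.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Solr's JsonRecordReader.
class SplitModeSketch {

    enum Mode { STRICT_STREAMING, RELAXED }

    /** Simplified parse event: a root-level field, or one child record at the split point. */
    static final class Event {
        final String name;                // set for root-level fields only
        final Object value;               // set for root-level fields only
        final Map<String, Object> child;  // set for split records only

        private Event(String name, Object value, Map<String, Object> child) {
            this.name = name;
            this.value = value;
            this.child = child;
        }

        static Event field(String name, Object value) { return new Event(name, value, null); }
        static Event child(Map<String, Object> record) { return new Event(null, null, record); }
    }

    /** Returns one document per child record, each merged with the root-level fields. */
    static List<Map<String, Object>> read(List<Event> events, Mode mode) {
        Map<String, Object> rootFields = new LinkedHashMap<>();
        List<Map<String, Object>> pending = new ArrayList<>();  // used in RELAXED mode only
        List<Map<String, Object>> emitted = new ArrayList<>();
        boolean splitSeen = false;

        for (Event e : events) {
            if (e.child != null) {
                splitSeen = true;
                if (mode == Mode.RELAXED) {
                    pending.add(e.child);                      // hold until the root is complete
                } else {
                    emitted.add(merge(rootFields, e.child));   // emit at once, no buffering
                }
            } else if (splitSeen && mode == Mode.STRICT_STREAMING) {
                // Strict mode refuses input it cannot stream instead of silently dropping it.
                throw new IllegalStateException(
                    "Root field '" + e.name + "' found after the split point");
            } else {
                rootFields.put(e.name, e.value);
            }
        }
        // Relaxed mode: the root record is complete, so the buffered children can be emitted.
        for (Map<String, Object> child : pending) {
            emitted.add(merge(rootFields, child));
        }
        return emitted;
    }

    private static Map<String, Object> merge(Map<String, Object> root, Map<String, Object> child) {
        Map<String, Object> doc = new LinkedHashMap<>(root);
        doc.putAll(child);
        return doc;
    }

    public static void main(String[] args) {
        // Mirrors the example from the issue description, with fewer fields.
        List<Event> events = List.of(
            Event.field("first", "John"),
            Event.child(Map.of("subject", "Maths", "marks", 90)),
            Event.child(Map.of("subject", "Biology", "marks", 86)),
            Event.field("after", "456"));

        System.out.println(read(events, Mode.RELAXED));
        try {
            read(events, Mode.STRICT_STREAMING);
        } catch (IllegalStateException ex) {
            System.out.println("strict mode rejected the input: " + ex.getMessage());
        }
    }
}
```

With input shaped like the example in the issue, the relaxed call should yield two documents that both carry "after", while the strict call fails on the trailing field. A real implementation would presumably hand documents to a handler callback rather than return a list, so in strict mode the records emitted before the exception would already have been delivered.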

> JsonRecordReader ignores root record fields after the split point
> -----------------------------------------------------------------
>
>                 Key: SOLR-12094
>                 URL: https://issues.apache.org/jira/browse/SOLR-12094
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: master (8.0)
>            Reporter: Przemysław Szeremiota
>            Priority: Major
>         Attachments: SOLR-12094.patch, SOLR-12094.patch, json-record-reader-bug.patch
>
>
> JsonRecordReader, when configured with other than top-level split, ignores all top-level JSON nodes after the split ends, for example:
> {code}
> {
>   "first": "John",
>   "last": "Doe",
>   "grade": 8,
>   "exams": [
>     {
>         "subject": "Maths",
>         "test": "term1",
>         "marks": 90
>     },
>     {
>         "subject": "Biology",
>         "test": "term1",
>         "marks": 86
>     }
>   ],
>   "after": "456"
> }
> {code}
> Node "after" won't be visible in SolrInputDocument constructed from /update/json/docs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
