Posted to solr-user@lucene.apache.org by Christopher Schultz <ch...@christopherschultz.net> on 2020/09/17 15:48:28 UTC

Help using Noggit for streaming JSON data

All,

Is this an appropriate forum for asking questions about how to use
Noggit? The GitHub repo doesn't have Discussions enabled, and filing an
"issue" just to ask a question is kinda silly. I'm happy to be redirected
to the right place if this isn't appropriate.

I've been able to figure out most things in Noggit by reading the code,
but I have a new use-case where I expect that I'll have very large
values (base64-encoded binary) and I'd like to stream those rather than
calling parser.getString() and getting a potentially huge string coming
back. I'm streaming into a database so I never need the whole string in
one place at one time.

I was thinking something like this:

JSONParser p = ...;

int evt = p.nextEvent();
if(JSONParser.STRING == evt) {
  // Start streaming
  boolean eos = false;
  while(!eos) {
    char c = p.getChar();
    if(c == '"') {
      eos = true;
    } else {
      // append c to the output stream
    }
  }
}

But getChar() is not public. The only "documentation" I've really been
able to find for Noggit is this post from Yonik back in 2014:

http://yonik.com/noggit-json-parser/

It mostly says "Noggit is great!" and specifically mentions huge, long
strings but does not actually show any Java code to consume the JSON
data in any kind of streaming way.

The ObjectBuilder class is a great user of JSONParser, but it just
builds standard objects and would consume tons of memory in my case.

I know for sure that Solr consumes huge JSON documents and I'm assuming
that Noggit is being used in that situation, though I have not looked at
the code used to do that.

Any suggestions?

-chris

Re: Help using Noggit for streaming JSON data

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Yonik,

Thanks for the reply, and apologies for the long delay in this reply. Also apologies for top-posting, I’m writing from my phone. :(

Oh, of course... simply subclass the CharArr.

In my case, I should be able to immediately base64-decode the value (saves 1/4 in-memory representation) and, if I do everything correctly, may be able to stream directly to my database.

With a *very* complicated CharArr implementation of course :)
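The decoding part may not even need to be that complicated: Base64 decodes cleanly on 4-character boundaries, so each quantum can be decoded as soon as it arrives. A minimal sketch of that idea, where the ByteArrayOutputStream stands in for whatever stream feeds the database:

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;

// Incrementally decodes a Base64 value one character at a time, flushing
// each complete 4-char quantum as decoded bytes. The whole encoded string
// never needs to be in memory at once.
class ChunkedBase64Decoder {
    private final char[] quantum = new char[4];
    private int filled = 0;
    private final ByteArrayOutputStream out;  // stand-in for the DB stream

    ChunkedBase64Decoder(ByteArrayOutputStream out) {
        this.out = out;
    }

    void accept(char c) {
        quantum[filled++] = c;
        if (filled == 4) {
            // A full quantum (including any trailing '=' padding at the
            // end of the value) decodes independently of what follows.
            byte[] decoded = Base64.getDecoder().decode(new String(quantum));
            out.write(decoded, 0, decoded.length);
            filled = 0;
        }
    }
}
```

Feeding the characters of an encoded value through accept() one at a time reproduces the original bytes on the output stream.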

Thanks,
-chris

> On Sep 17, 2020, at 12:22, Yonik Seeley <ys...@gmail.com> wrote:
> 
> See this method:
> 
>  /** Reads a JSON string into the output, decoding any escaped characters. */
>  public void getString(CharArr output) throws IOException
> 
> And then the idea is to create a subclass of CharArr to incrementally
> handle the string that is written to it.
> You could overload write methods, or perhaps reserve() to flush/handle the
> buffer when it reaches a certain size.
> 
> -Yonik
> 
> 
>> On Thu, Sep 17, 2020 at 11:48 AM Christopher Schultz <
>> chris@christopherschultz.net> wrote:
>> 
>> All,
>> 
>> Is this an appropriate forum for asking questions about how to use
>> Noggit? The Github doesn't have any discussions available and filing an
>> "issue" to ask a question is kinda silly. I'm happy to be redirected to
>> the right place if this isn't appropriate.
>> 
>> I've been able to figure out most things in Noggit by reading the code,
>> but I have a new use-case where I expect that I'll have very large
>> values (base64-encoded binary) and I'd like to stream those rather than
>> calling parser.getString() and getting a potentially huge string coming
>> back. I'm streaming into a database so I never need the whole string in
>> one place at one time.
>> 
>> I was thinking something like this:
>> 
>> JSONParser p = ...;
>> 
>> int evt = p.nextEvent();
>> if(JSONParser.STRING == evt) {
>>  // Start streaming
>>  boolean eos = false;
>>  while(!eos) {
>>    char c = p.getChar();
>>    if(c == '"') {
>>      eos = true;
>>    } else {
>>      append to stream
>>    }
>>  }
>> }
>> 
>> But getChar() is not public. The only "documentation" I've really been
>> able to find for Noggit is this post from Yonik back in 2014:
>> 
>> http://yonik.com/noggit-json-parser/
>> 
>> It mostly says "Noggit is great!" and specifically mentions huge, long
>> strings but does not actually show any Java code to consume the JSON
>> data in any kind of streaming way.
>> 
>> The ObjectBuilder class is a great user of JSONParser, but it just
>> builds standard objects and would consume tons of memory in my case.
>> 
>> I know for sure that Solr consumes huge JSON documents and I'm assuming
>> that Noggit is being used in that situation, though I have not looked at
>> the code used to do that.
>> 
>> Any suggestions?
>> 
>> -chris
>> 

Re: Help using Noggit for streaming JSON data

Posted by Yonik Seeley <ys...@gmail.com>.
See this method:

  /** Reads a JSON string into the output, decoding any escaped characters. */
  public void getString(CharArr output) throws IOException

And then the idea is to create a subclass of CharArr to incrementally
handle the string as it is written to it.
You could override the write methods, or perhaps reserve(), to flush/handle
the buffer when it reaches a certain size.
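In sketch form, that could look like the following. A minimal stand-in for CharArr is included so the example is self-contained; it only mirrors the write/reserve shape of the real org.noggit.CharArr, which should be substituted when noggit is on the classpath. The List sink stands in for a database write:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in mirroring the assumed shape of org.noggit.CharArr:
// an append-only char[] buffer with write() and reserve().
class CharArr {
    protected char[] buf;
    protected int start;
    protected int end;

    CharArr(int size) { buf = new char[size]; }

    public void write(char c) { reserve(1); buf[end++] = c; }

    public void reserve(int more) {
        if (end + more > buf.length) {
            char[] bigger = new char[Math.max(buf.length * 2, end + more)];
            System.arraycopy(buf, 0, bigger, 0, end);
            buf = bigger;
        }
    }

    public int size() { return end - start; }
}

// Subclass that drains its buffer to a sink whenever reserve() sees the
// pending content pass a threshold, so only a bounded window of the
// (possibly huge) string is ever held in memory.
class StreamingCharArr extends CharArr {
    private final int flushAt;
    private final List<String> sink;  // stand-in for a database write

    StreamingCharArr(int flushAt, List<String> sink) {
        super(flushAt * 2);
        this.flushAt = flushAt;
        this.sink = sink;
    }

    @Override
    public void reserve(int more) {
        if (size() >= flushAt) drain();
        super.reserve(more);
    }

    // Push whatever has accumulated to the sink and reset the buffer.
    public void drain() {
        if (size() > 0) {
            sink.add(new String(buf, start, end - start));
            end = start;
        }
    }
}
```

Passing such an instance to getString(CharArr) would then stream the decoded value out in bounded chunks; the caller just has to drain() once more after the parser returns.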

-Yonik


On Thu, Sep 17, 2020 at 11:48 AM Christopher Schultz <
chris@christopherschultz.net> wrote:

> All,
>
> Is this an appropriate forum for asking questions about how to use
> Noggit? The Github doesn't have any discussions available and filing an
> "issue" to ask a question is kinda silly. I'm happy to be redirected to
> the right place if this isn't appropriate.
>
> I've been able to figure out most things in Noggit by reading the code,
> but I have a new use-case where I expect that I'll have very large
> values (base64-encoded binary) and I'd like to stream those rather than
> calling parser.getString() and getting a potentially huge string coming
> back. I'm streaming into a database so I never need the whole string in
> one place at one time.
>
> I was thinking something like this:
>
> JSONParser p = ...;
>
> int evt = p.nextEvent();
> if(JSONParser.STRING == evt) {
>   // Start streaming
>   boolean eos = false;
>   while(!eos) {
>     char c = p.getChar();
>     if(c == '"') {
>       eos = true;
>     } else {
>       append to stream
>     }
>   }
> }
>
> But getChar() is not public. The only "documentation" I've really been
> able to find for Noggit is this post from Yonik back in 2014:
>
> http://yonik.com/noggit-json-parser/
>
> It mostly says "Noggit is great!" and specifically mentions huge, long
> strings but does not actually show any Java code to consume the JSON
> data in any kind of streaming way.
>
> The ObjectBuilder class is a great user of JSONParser, but it just
> builds standard objects and would consume tons of memory in my case.
>
> I know for sure that Solr consumes huge JSON documents and I'm assuming
> that Noggit is being used in that situation, though I have not looked at
> the code used to do that.
>
> Any suggestions?
>
> -chris
>