Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2021/09/13 14:19:30 UTC
[GitHub] [couchdb-nano] glynnbird commented on issue #268: How to isolate/extract documents when using viewAsStream ?

glynnbird commented on issue #268:
URL: https://github.com/apache/couchdb-nano/issues/268#issuecomment-918242042


   This isn't really a Nano question; it's more a question of how to deal with Node.js streams. Here's some code that processes a stream, which you can modify to taste:
   
   ```js
   const Nano = require('nano')
   const nano = Nano(process.env.COUCH_URL)
   const db = nano.db.use('cities')
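   const stream = require('stream')
   const { StringDecoder } = require('string_decoder')
   
   // liner: turns a stream of bytes into a stream of lines (one string per line)
   const liner = function () {
     const liner = new stream.Transform({ readableObjectMode: true })
     // a StringDecoder keeps multi-byte UTF-8 characters intact across chunk boundaries
     const decoder = new StringDecoder('utf8')
     liner._transform = function (chunk, encoding, done) {
       let data = decoder.write(chunk)
       if (this._lastLineData) {
         data = this._lastLineData + data
       }
       const lines = data.split('\n')
       // the last element may be an incomplete line, so keep it for the next chunk
       this._lastLineData = lines.pop()
       for (const line of lines) {
         this.push(line)
       }
       done()
     }
     liner._flush = function (done) {
       const rest = (this._lastLineData || '') + decoder.end()
       if (rest) {
         this.push(rest)
       }
       this._lastLineData = null
       done()
     }
     return liner
   }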
   
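   // objectifier: turns each line into a row object, if the line is JSON
   const objectifier = function () {
     const objectifier = new stream.Transform({ objectMode: true })
     objectifier._transform = function (line, encoding, done) {
       // every row line except the last ends with a comma, so strip it first
       const candidate = line.replace(/,$/, '')
       try {
         this.push(JSON.parse(candidate))
       } catch (e) {
         // do nothing: not a JSON object (e.g. the '{"total_rows":...,"rows":[' header or ']}' footer)
       }
       done()
     }
     return objectifier
   }
   
   db.listAsStream({ include_docs: true, limit: 10 })
     .pipe(liner())
     .pipe(objectifier())
     // consuming 'data' keeps the stream flowing; each row arrives here as an object
     .on('data', (obj) => { console.log('object', obj) })
     // 'end' fires once the last row has been read
     .on('end', () => { console.log('finished') })
     // .pipe() does not forward errors, so this only catches errors from objectifier
     .on('error', console.error)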
   
   ```
   
   The streamed output of Nano (in this case an `_all_docs` call) is sent through two stream Transformers in turn:
   
   - the first (`liner`) breaks the chunks into lines of output
   - the second (`objectifier`) takes each of these lines, strips any trailing comma and checks whether the result parses as JSON; if it does, we're good and the parsed row is passed on
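   
   To watch the two Transformers work without a CouchDB server, here is a self-contained sketch that feeds them a hand-written imitation of an `_all_docs` response, cut into 7-character chunks so that lines are split across chunk boundaries. The sample rows and the `collectRows` helper are made up for illustration, and the transforms are written in the constructor-options style of `stream.Transform` rather than by assigning `_transform`; the behaviour is intended to match the code above.
   
   ```javascript
   const { Readable, Transform } = require('stream')
   const { StringDecoder } = require('string_decoder')
   
   // Splits incoming bytes into lines, carrying any partial line over to the next chunk.
   function liner () {
     const decoder = new StringDecoder('utf8')
     let leftover = ''
     return new Transform({
       readableObjectMode: true, // bytes in, one string per line out
       transform (chunk, encoding, done) {
         const lines = (leftover + decoder.write(chunk)).split('\n')
         leftover = lines.pop() // possibly incomplete, so hold it back
         for (const line of lines) this.push(line)
         done()
       },
       flush (done) {
         const rest = leftover + decoder.end()
         if (rest) this.push(rest)
         done()
       }
     })
   }
   
   // Strips a trailing comma and passes on only the lines that parse as JSON.
   function objectifier () {
     return new Transform({
       objectMode: true,
       transform (line, encoding, done) {
         try {
           this.push(JSON.parse(line.replace(/,$/, '')))
         } catch (e) {
           // header/footer lines such as '{"total_rows":2,"offset":0,"rows":[' are skipped
         }
         done()
       }
     })
   }
   
   // Hypothetical sample in CouchDB's one-row-per-line _all_docs layout.
   const response =
     '{"total_rows":2,"offset":0,"rows":[\n' +
     '{"id":"a","key":"a","value":{"rev":"1-x"}},\n' +
     '{"id":"b","key":"b","value":{"rev":"1-y"}}\n' +
     ']}\n'
   
   // Hypothetical helper: pipes the chunks through both transforms and resolves with the rows.
   function collectRows (chunks) {
     return new Promise((resolve, reject) => {
       const rows = []
       Readable.from(chunks)
         .pipe(liner())
         .pipe(objectifier())
         .on('data', (row) => rows.push(row))
         .on('end', () => resolve(rows))
         .on('error', reject)
     })
   }
   
   // 7-character chunks guarantee that rows are split across chunk boundaries.
   collectRows(response.match(/[\s\S]{1,7}/g)).then((rows) => {
     console.log(rows.map((row) => row.id)) // [ 'a', 'b' ]
   })
   ```
   
   Because `liner` only ever emits complete lines, the chunk size makes no difference to the rows that come out the other end.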
   
   It takes advantage of the fact that CouchDB always formats its JSON output with each row on its own line. If CouchDB ever changes that, this technique will break.
   
   A better way, which I only remembered after coming up with the above, is to use a streaming JSON parser such as https://www.npmjs.com/package/JSONStream
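   
   JSONStream itself isn't shown here. As a stdlib-only illustration of why a streaming parser is more robust, the sketch below is a deliberately simplified stand-in (the `rowExtractor` and `extractRows` names are hypothetical): it tracks brace/bracket nesting and string state character by character and emits each object found directly inside an array of the top-level object, which for an `_all_docs` response means each element of `rows`. Because it never looks at line breaks, it copes with a pretty-printed response whose rows span several lines. It assumes well-formed input and is no substitute for a full parser such as JSONStream.
   
   ```javascript
   const { Readable, Transform } = require('stream')
   const { StringDecoder } = require('string_decoder')
   
   // Hypothetical, simplified stand-in for a streaming JSON parser: emits every object
   // that sits directly inside an array-valued field of the top-level object
   // (for an _all_docs response, each element of "rows"). It tracks nesting and
   // string state character by character, so line breaks make no difference.
   function rowExtractor () {
     const decoder = new StringDecoder('utf8')
     const stack = []      // currently open containers, e.g. ['{', '[', '{']
     let inString = false  // inside a JSON string literal?
     let escaped = false   // previous character was a backslash inside a string?
     let capturing = false // inside a row we want to emit?
     let buf = ''          // text of the row being captured
     return new Transform({
       readableObjectMode: true, // bytes in, one parsed row object out
       transform (chunk, encoding, done) {
         for (const ch of decoder.write(chunk)) {
           // a '{' opened directly inside the top-level object's array starts a row
           if (!inString && !capturing && ch === '{' &&
               stack.length === 2 && stack[0] === '{' && stack[1] === '[') {
             capturing = true
             buf = ''
           }
           if (capturing) buf += ch
           if (inString) {
             if (escaped) escaped = false
             else if (ch === '\\') escaped = true
             else if (ch === '"') inString = false
           } else if (ch === '"') {
             inString = true
           } else if (ch === '{' || ch === '[') {
             stack.push(ch)
           } else if (ch === '}' || ch === ']') {
             stack.pop()
             // back at array depth: the row's closing brace has just been read
             if (capturing && stack.length === 2) {
               capturing = false
               try {
                 this.push(JSON.parse(buf))
               } catch (err) {
                 return done(err)
               }
             }
           }
         }
         done()
       },
       flush (done) {
         // anything still open means the response was cut short
         done(stack.length === 0 ? null : new Error('truncated JSON input'))
       }
     })
   }
   
   // Hypothetical response, pretty-printed so that each row spans several lines.
   const response = JSON.stringify({
     total_rows: 2,
     offset: 0,
     rows: [
       { id: 'a', key: 'a', value: { rev: '1-x' } },
       { id: 'b', key: 'has } and "quotes"', value: { rev: '1-y' } }
     ]
   }, null, 2)
   
   // Hypothetical helper: streams the chunks through rowExtractor and resolves with the rows.
   function extractRows (chunks) {
     return new Promise((resolve, reject) => {
       const rows = []
       Readable.from(chunks)
         .pipe(rowExtractor())
         .on('data', (row) => rows.push(row))
         .on('end', () => resolve(rows))
         .on('error', reject)
     })
   }
   
   extractRows(response.match(/[\s\S]{1,5}/g)).then((rows) => {
     console.log(rows.map((row) => row.id)) // [ 'a', 'b' ]
   })
   ```
   
   The second sample row deliberately puts a `}` and escaped quotes inside a string value; tracking string state is what stops those characters from being mistaken for structure.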

