Posted to solr-user@lucene.apache.org by Eric Pugh <ep...@opensourceconnections.com> on 2020/07/24 16:59:00 UTC

Loading JSON docs into Solr with Streaming Expressions?

Hey all, I wanted to load some JSON docs into Solr and, as I load them, do some manipulations to the documents as they go in. I looked at https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html, however I also wanted to see if Streaming Expressions would help.
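For reference, the custom JSON approach in that guide boils down to POSTing the raw JSON to the collection's /update/json/docs handler. A minimal client-side sketch of building that request (the collection name, host, and sample docs here are made up for illustration):

```python
import json
import urllib.request

def build_json_docs_update(collection, docs, commit=True):
    """Build a request against Solr's /update/json/docs handler.

    The endpoint comes from the 'Transforming and Indexing Custom JSON'
    guide; the localhost host/port and collection name are assumptions.
    """
    url = (f"http://localhost:8983/solr/{collection}/update/json/docs"
           f"?commit={'true' if commit else 'false'}")
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-type": "application/json"})

# Example: two hypothetical docs destined for an 'icecat' collection.
req = build_json_docs_update("icecat", [{"id": "1", "name_s": "phone"},
                                        {"id": "2", "name_s": "tablet"}])
```

Sending it is then just `urllib.request.urlopen(req)` against a running Solr; the sketch stops short of that so it stands on its own.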

I’ve used the combination of cat and parseCSV streaming functions successfully to load data into Solr, so I looked a bit at what we could do with JSON source format.
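For comparison, the cat/parseCSV combination emits one tuple per CSV row, keyed by the header fields. Conceptually it is equivalent to something like this (a client-side sketch of the behavior, not how Solr implements it):

```python
import csv
import io

def parse_csv_stream(text):
    """Mimic cat('file.csv') piped into parseCSV(): treat the first
    row as field names and yield one tuple (dict) per data row."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        yield dict(row)

# Two data rows become two tuples, keyed by the header fields.
tuples = list(parse_csv_stream("id,name\n1,phone\n2,tablet\n"))
```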

I didn’t see an obvious path for taking a .json file and loading it, so I played around and made this streaming expression for JSON Lines (JSONL) formatted files: https://github.com/epugh/playing-with-solr-streaming-expressions/pull/3

The expression looks like:
commit(icecat,
  update(icecat,
    parseJSONL(
      cat('two_docs.jsonl')
    )
  )
)
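The parseJSONL step in that expression treats each line of the file as one JSON document. The mechanics of the format are roughly this (a sketch of JSONL parsing, not of Solr's implementation; the sample docs are invented):

```python
import json

def parse_jsonl(text):
    """One JSON object per line -> one document per line, mirroring
    what parseJSONL does with the raw lines cat() emits."""
    docs = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines rather than failing on them
            docs.append(json.loads(line))
    return docs

# A file like 'two_docs.jsonl' might contain:
sample = '{"id": "1", "name_s": "phone"}\n{"id": "2", "name_s": "tablet"}\n'
docs = parse_jsonl(sample)
```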
I was curious what other folks have done. I saw that there is a JSONTupleStream, but it didn’t quite seem to fit the need.

Eric

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy: http://tinyurl.com/eric-cal
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed: https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw
This e-mail and all contents, including attachments, are considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.


Re: Loading JSON docs into Solr with Streaming Expressions?

Posted by Joel Bernstein <jo...@gmail.com>.
It's probably time to add JSON loading support to streaming expressions, but nothing exists yet. This ticket is almost done and paves the way for a suite of parseXYZ functions:

https://issues.apache.org/jira/browse/SOLR-14673



Joel Bernstein
http://joelsolr.blogspot.com/

