Posted to solr-user@lucene.apache.org by Eric Pugh <ep...@opensourceconnections.com> on 2020/07/24 16:59:00 UTC
Loading JSON docs into Solr with Streaming Expressions?
Hey all, I wanted to load some JSON docs into Solr and manipulate the documents as they go in. I looked at https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html, but I also wanted to see whether Streaming Expressions would help.
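For comparison, the Reference Guide page linked above does the transformation server-side. The /update/json/docs handler takes a `split` path that says where the input breaks into separate documents, repeated `f=field:path` mappings, and `echo=true` to return the transformed documents instead of indexing them. The minimal Python sketch below only builds such a request URL. The base URL, the use of the icecat collection from this thread, and the field mappings are illustrative assumptions.

```python
from urllib.parse import urlencode


def json_docs_url(base_url, collection, split, field_mappings, echo=False):
    """Build a request URL for Solr's /update/json/docs handler.

    split: JSON path at which the input is split into separate Solr documents.
    field_mappings: (solr_field, json_path) pairs, each sent as a repeated f= parameter.
    echo: if True, ask Solr to return the transformed docs instead of indexing them.
    """
    params = [("split", split)]
    params += [("f", f"{field}:{path}") for field, path in field_mappings]
    if echo:
        params.append(("echo", "true"))
    # urlencode percent-encodes "/" and ":" inside the parameter values.
    return f"{base_url}/{collection}/update/json/docs?{urlencode(params)}"
```

The raw JSON is then POSTed to that URL. Starting with `echo=True` lets you preview the field mapping before anything is indexed.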
I’ve successfully used the cat and parseCSV streaming functions together to load data into Solr, so I looked at what we could do with JSON as the source format.
I didn’t see an obvious path for taking a .json file and loading it, so I played around and wrote a streaming expression for files in JSON Lines format: https://github.com/epugh/playing-with-solr-streaming-expressions/pull/3
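JSON Lines puts one complete JSON object on each line, so a stream can parse and emit one document per line without reading the whole file first. A regular .json file usually holds a single top-level array, which has to be parsed as a whole. One workaround is to convert such a file to JSON Lines before streaming it. The Python sketch below assumes the array fits in memory and contains only objects. The function name is hypothetical.

```python
import json


def json_array_to_jsonl(text):
    """Convert a JSON array of objects into JSON Lines text: one compact object per line."""
    docs = json.loads(text)
    if not isinstance(docs, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps escapes newlines inside strings as \n, so each document
    # always stays on a single physical line.
    return "".join(json.dumps(doc, separators=(",", ":")) + "\n" for doc in docs)
```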
The expression looks like:

    commit(icecat,
      update(icecat,
        parseJSONL(
          cat('two_docs.jsonl')
        )
      )
    )
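In that expression each function wraps the stream inside it. Documents flow lazily from cat through parseJSONL into update, which sends them to the collection in batches, and commit runs once the stream is exhausted. The Python generators below are only a rough analogy of that composition, not Solr's implementation. The function names mirror the streaming functions, but the batch size, the `{"indexed": n}` summary shape, and the `send_batch`/`on_commit` callbacks, which stand in for calls to the icecat collection, are all hypothetical.

```python
import json


def cat(lines):
    """Analogy for cat(): yield raw lines (works with an open file or any iterable of lines)."""
    for line in lines:
        yield line


def parse_jsonl(lines):
    """Analogy for parseJSONL(): each non-blank line is one JSON object, emitted as one tuple (dict)."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def update(send_batch, tuples, batch_size):
    """Analogy for update(): buffer tuples, send them in batches, yield a summary per batch.

    The {"indexed": n} summary is a simplified, hypothetical shape, not Solr's tuple format.
    """
    batch = []
    for t in tuples:
        batch.append(t)
        if len(batch) == batch_size:
            send_batch(batch)
            yield {"indexed": len(batch)}
            batch = []
    if batch:  # flush the final partial batch
        send_batch(batch)
        yield {"indexed": len(batch)}


def commit(on_commit, summaries):
    """Analogy for commit(): drain the wrapped stream, then commit once at the end."""
    results = list(summaries)
    on_commit()
    return results
```

Because every stage is a generator, nothing is read until commit drains the stream, so only one batch is held in memory at a time.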
I'm curious what other folks have done. I saw that there is a JSONTupleStream, but it didn’t quite seem to fit this need.
Eric
_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
Re: Loading JSON docs into Solr with Streaming Expressions?
Posted by Joel Bernstein <jo...@gmail.com>.
It's probably time to add JSON loading support to Streaming Expressions, but there's nothing yet. This ticket is almost done and paves the way for a suite of parseXYZ functions:
https://issues.apache.org/jira/browse/SOLR-14673
Joel Bernstein
http://joelsolr.blogspot.com/