Posted to users@solr.apache.org by marc nicole <mk...@gmail.com> on 2023/01/29 19:26:42 UTC

How to index a csv dataset into Solr using SolrJ

Hi guys,

I can't find a reference on how to index a dataset.csv file into Solr using
SolrJ.
https://solr.apache.org/guide/6_6/using-solrj.html

Thanks.

Re: How to index a csv dataset into Solr using SolrJ

Posted by Vincenzo D'Amore <v....@gmail.com>.
If you need to upload CSV files directly into Solr (and they have a
reasonable number of rows, i.e. not so many that they cause an OOM in Solr),
I usually load them directly with curl from a bash script.
It's something like this:

curl "http://solr.server:8983/solr/collection/update?commit=true" \
  --data-binary @file.csv -H 'Content-type:application/csv'

The first row of the CSV file must contain the names of the fields in your
Solr collection, something like this:

"id","code","description","field1","field2","field3"
1,"code1","description 1","xxxx","yyy","zzz"
2,"code2","description 2","20","129","M"
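
For anyone who wants to send the same request from Java without SolrJ, here is a minimal sketch using only the JDK 11+ java.net.http client. It mirrors the curl command above: it POSTs the file body to /update?commit=true with Content-Type application/csv. The class name CsvPost, the helper updateUri, the base URL and the collection name are illustrative assumptions, not part of Solr.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical example class: a JDK-only equivalent of the curl command above.
class CsvPost {

    // Builds <solrBaseUrl>/<collection>/update?commit=true
    static URI updateUri(String solrBaseUrl, String collection) {
        return URI.create(solrBaseUrl + "/" + collection + "/update?commit=true");
    }

    // POSTs the CSV file as the raw request body and returns the HTTP status code.
    static int postCsv(String solrBaseUrl, String collection, Path csvFile)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(updateUri(solrBaseUrl, collection))
                .header("Content-Type", "application/csv")
                // Sends the file contents as the body, like curl's --data-binary @file.csv
                .POST(HttpRequest.BodyPublishers.ofFile(csvFile))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder host, collection and file name, taken from the curl example.
        int status = postCsv("http://solr.server:8983/solr", "collection", Path.of("file.csv"));
        System.out.println("HTTP status: " + status);
    }
}
```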



In reply to Chris Hostetter <ho...@fucit.org>, Fri, Feb 10, 2023 at 9:28 PM.

-- 
Vincenzo D'Amore

Re: How to index a csv dataset into Solr using SolrJ

Posted by Chris Hostetter <ho...@fucit.org>.
: @Chris, can you provide sample Java code using the ContentStreamUpdateRequest
: class?

I mean ... it's a SolrRequest like any other...

1) Create an instance.

2) Add the File you want to add (or pass in some other ContentStream --
maybe StringStream if your CSV is already in memory).

3) process() it using your SolrClient.


As with most classes in SolrJ, looking at the test cases is probably
the best way to see "sample" code (although some of them are explicitly
convoluted to test edge cases in the underlying implementation).


This is probably the simplest one...

hossman@slate:~/lucene/solr [j11] [branch_9_1] $ grep -A5 'new ContentStreamUpdateRequest' solr/solrj/src/test/org/apache/solr/client/solrj/request/json/JsonQueryRequestIntegrationTest.java
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update");
    up.setParam("collection", COLLECTION_NAME);
    up.addFile(getFile("solrj/books.csv"), "application/csv");
    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    UpdateResponse updateResponse = up.process(cluster.getSolrClient());
    assertEquals(0, updateResponse.getStatus());
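
Putting those three steps together outside a test, a self-contained sketch might look like this. It assumes SolrJ 9.x on the classpath and a Solr node at http://localhost:8983/solr; the class name CsvUpload, the file dataset.csv and the collection mycollection are hypothetical. Unlike the test, it passes the collection name to process() instead of setting a "collection" param.

```java
import java.io.File;
import java.io.IOException;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.util.ContentStreamBase;

// Hypothetical example class; assumes SolrJ 9.x on the classpath.
class CsvUpload {

    // Step 1: create an instance targeting the /update handler and ask
    // Solr to commit after processing (same flags as the test above).
    static ContentStreamUpdateRequest newCsvRequest() {
        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update");
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        return up;
    }

    // Step 2a: add a CSV file from disk.
    static ContentStreamUpdateRequest fromFile(File csv) throws IOException {
        ContentStreamUpdateRequest up = newCsvRequest();
        up.addFile(csv, "application/csv");
        return up;
    }

    // Step 2b: add CSV that is already in memory via a StringStream.
    static ContentStreamUpdateRequest fromString(String csv) {
        ContentStreamBase.StringStream stream = new ContentStreamBase.StringStream(csv);
        stream.setContentType("application/csv");
        ContentStreamUpdateRequest up = newCsvRequest();
        up.addContentStream(stream);
        return up;
    }

    public static void main(String[] args) throws IOException, SolrServerException {
        // Step 3: process() it using a SolrClient (URL and names are placeholders).
        try (SolrClient client = new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
            UpdateResponse rsp = fromFile(new File("dataset.csv")).process(client, "mycollection");
            System.out.println("status=" + rsp.getStatus());
        }
    }
}
```

For CSV that is already held in a String, fromString(...) can be used in place of fromFile(...), following the StringStream hint in step 2.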






-Hoss
http://www.lucidworks.com/

Re: How to index a csv dataset into Solr using SolrJ

Posted by marc nicole <mk...@gmail.com>.
@Chris, can you provide sample Java code using the ContentStreamUpdateRequest
class?

In reply to Chris Hostetter <ho...@fucit.org>, Fri, Feb 10, 2023 at 19:22.

Re: How to index a csv dataset into Solr using SolrJ

Posted by Chris Hostetter <ho...@fucit.org>.
: What is a common use case, then, if it is not CSV?
: How do you index large amounts of data into Solr using SolrJ?
: You can't just read each dataset you want to index line by line.

There are lots of use cases for using SolrJ that involve programmatically
generating the SolrInputDocuments you want to index in Solr -- frequently
after reading from some normalized/authoritative data store.

If you already have data "on disk" in a format that Solr can parse (CSV,
Solr's XML, a PDF file you want Solr's extraction module to parse, etc.),
then that's what the ContentStreamUpdateRequest is for...

https://solr.apache.org/docs/9_1_0/solrj/org/apache/solr/client/solrj/request/ContentStreamUpdateRequest.html


-Hoss
http://www.lucidworks.com/

Re: How to index a csv dataset into Solr using SolrJ

Posted by sambasivarao giddaluri <sa...@gmail.com>.
As part of a migration, we converted the CSV data into multiple JSON files,
each containing around 100 MB of data, and then wrote a small shell script to
post these files through the Solr API in a loop.

Just be aware that if you have multiple nodes, it might take some time for
replication to complete.
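
A related way to keep each request small is to split the input into several uploads. The sketch below does this for CSV rather than converting to JSON as described above: it reads the header row, then emits chunks of at most maxRows data rows, each prefixed with the header, so every chunk could be posted on its own (for example with curl or ContentStreamUpdateRequest). It is plain JDK Java; the class CsvChunker and the method chunk are hypothetical, and it assumes one record per physical line, so quoted fields that contain newlines would be split incorrectly.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: splits CSV lines into header-prefixed chunks.
class CsvChunker {

    // Consumes the first line as the header, then passes chunks of at most
    // maxRows data rows (each starting with the header) to sink.
    // Returns the number of chunks emitted.
    static int chunk(Iterator<String> lines, int maxRows, Consumer<String> sink) {
        if (maxRows < 1) throw new IllegalArgumentException("maxRows must be >= 1");
        if (!lines.hasNext()) return 0;
        String header = lines.next();
        StringBuilder current = new StringBuilder();
        int rows = 0;
        int emitted = 0;
        while (lines.hasNext()) {
            if (rows == 0) current.append(header).append('\n');
            current.append(lines.next()).append('\n');
            if (++rows == maxRows) {
                sink.accept(current.toString());
                current.setLength(0);
                rows = 0;
                emitted++;
            }
        }
        if (rows > 0) { // flush the last, partially filled chunk
            sink.accept(current.toString());
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> csv = List.of("id,code", "1,code1", "2,code2", "3,code3");
        List<String> out = new ArrayList<>();
        int n = chunk(csv.iterator(), 2, out::add);
        System.out.println(n + " chunks: " + out);
    }
}
```

With a real file, the iterator of a java.nio.file.Files.lines(...) stream (opened in try-with-resources) can be passed in so the file is read lazily; the chunk size is a tuning choice, not a Solr requirement.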

In reply to marc nicole <mk...@gmail.com>, Fri, Feb 10, 2023 at 12:57 PM.

Re: How to index a csv dataset into Solr using SolrJ

Posted by marc nicole <mk...@gmail.com>.
What is a common use case, then, if it is not CSV?
How do you index large amounts of data into Solr using SolrJ?
You can't just read each dataset you want to index line by line.

In reply to Jan Høydahl <ja...@cominvent.com>, Mon, Jan 30, 2023 at 14:11.

Re: How to index a csv dataset into Solr using SolrJ

Posted by Jan Høydahl <ja...@cominvent.com>.
It's not a common use case for SolrJ to post plain CSV content to Solr. SolrJ is used to push SolrInputDocument objects. Maybe there's a way to do it by using some generic request type and overriding the content type. Can you explain more about what your app will do, where that CSV file comes from in the first place, and why you'd want to use SolrJ to move it to Solr rather than curl or some other HTTP client library?

Jan

In reply to marc nicole <mk...@gmail.com>, Jan 29, 2023 at 20:44.


Re: How to index a csv dataset into Solr using SolrJ

Posted by marc nicole <mk...@gmail.com>.
The Java code should perform the post. Could you share a piece of code that
better explains this?

thanks

In reply to Jan Høydahl <ja...@cominvent.com>, Sun, Jan 29, 2023 at 20:29.

Re: How to index a csv dataset into Solr using SolrJ

Posted by Jan Høydahl <ja...@cominvent.com>.
Read the CSV in your app, create a Solr document from each line, and ingest into Solr in suitably sized batches. You can use a CSV library, or just parse each line yourself if the format is fixed.
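
A minimal sketch of that approach with SolrJ, assuming SolrJ 9.x, a Solr node at http://localhost:8983/solr, and a fixed-format file like the one Vincenzo shows in this thread (a header row of field names, and no commas or newlines inside values). The parser is deliberately naive; for real-world CSV a dedicated CSV library is safer. The class name CsvToDocs, the batch size of 1000, the file dataset.csv and the collection mycollection are hypothetical choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.common.SolrInputDocument;

// Hypothetical example class; assumes SolrJ 9.x on the classpath.
class CsvToDocs {

    static final int BATCH_SIZE = 1000; // hypothetical value; tune for your documents

    // Naive fixed-format parsing: split on commas, strip surrounding quotes.
    static SolrInputDocument toDoc(String[] header, String line) {
        String[] values = line.split(",", -1);
        SolrInputDocument doc = new SolrInputDocument();
        for (int i = 0; i < header.length && i < values.length; i++) {
            doc.addField(unquote(header[i]), unquote(values[i]));
        }
        return doc;
    }

    static String unquote(String s) {
        String t = s.trim();
        boolean quoted = t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"");
        return quoted ? t.substring(1, t.length() - 1) : t;
    }

    // Reads the file line by line and sends documents in batches, then commits once.
    static void index(SolrClient client, String collection, Path csv)
            throws IOException, SolrServerException {
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String headerLine = reader.readLine();
            if (headerLine == null) return; // empty file: nothing to index
            String[] header = headerLine.split(",", -1);
            List<SolrInputDocument> batch = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) continue;
                batch.add(toDoc(header, line));
                if (batch.size() >= BATCH_SIZE) {
                    client.add(collection, batch);
                    batch = new ArrayList<>();
                }
            }
            if (!batch.isEmpty()) client.add(collection, batch);
            client.commit(collection);
        }
    }

    public static void main(String[] args) throws Exception {
        try (SolrClient client = new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
            index(client, "mycollection", Path.of("dataset.csv"));
        }
    }
}
```

Committing once at the end, rather than per batch, keeps commit overhead low; whether that suits a given index depends on how soon the new documents need to become searchable.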

If you need to post CSV directly to Solr, you’d use a plain HTTP POST with a CSV content type, but in most cases your app would do that.

Jan Høydahl

In reply to marc nicole <mk...@gmail.com>, Jan 29, 2023 at 20:21.