Posted to solr-user@lucene.apache.org by rhys J <rh...@gmail.com> on 2019/10/21 17:04:24 UTC

Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

I am trying to import a csv file to my solr core.

It looks like this:

"user_id","name","email","client","classification","default_client","disabled","dm_password","manager"
"A2M","Art Morse","amorse@morsemoving.com","Morse
Moving","Morse","","X","blue0show",""
"ABW","Amy Wiedner","amy.wiedner@pyramid-logistics.com","Pyramid","","","
","shawn",""
"J2P","Joan Padal","joanp@bergerallied.com","Berger","","","
","skew3cues",""
"ALB","Anna Bachman","annab@bergerallied.com","Berger","","","
","wary#scan",""
"B1B","Bridget Baker","bbaker@reliablevan.com","Reliable","","","
","laps,hear",""
"B1K","Bev Klein"," ","Nor-Cal","",""," ","pipe3hour",""
"B1L","Beverly Leonard","bleonard@reliablevan.com","Reliable","","","
","gail6copy",""
"CMD","Christal Davis","christaldavis@smmoving.com","SMMoving","","","
","risk-pair",""
"BEB","Bob Barnum","bobb@bergerts.com","Berger","",""," ","mets=pol",""

I have set up the schema via the API, and have all the fields that are
listed on the top line of the csv file.

When I finish the import, it returns no errors. But when I go to look at
the schema, it has created two fields in the managed-schema file:

<field
name="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_"
type="text_general"/>

and

 <copyField
source="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager_"
dest="_user_id___name___email___client___classification___default_client___disabled___dm_password___manager__str"
maxChars="256"/>

Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

Posted by rhys J <rh...@gmail.com>.
Thank you, that worked perfectly. I can't believe I didn't notice the
separator was a tab.
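
For the archive: the tab separator explains the mangled schema entry. With
separator=%09 the import splits rows on tabs, and since the comma-separated
header line contains none, the entire line becomes a single field name. A
quick sketch with awk (using the header line from the original message) shows
the effect:

```shell
# The comma-separated header line from the file above:
header='"user_id","name","email","client","classification","default_client","disabled","dm_password","manager"'

# Split on a comma, as the CSV handler does by default: nine field names.
printf '%s\n' "$header" | awk -F',' '{print NF}'

# Split on a tab (separator=%09): the whole line counts as one "field",
# which is exactly the giant underscore-joined field name that showed up
# in managed-schema.
printf '%s\n' "$header" | awk -F'\t' '{print NF}'
```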

Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/21/2019 11:24 AM, rhys J wrote:
> I am using this command:
> 
> curl 'http://localhost:8983/solr/users/update/csv?commit=true&separator=%09&encapsulator=%20&escape=\&stream.file=/tmp/users.csv'

The sequence %20 is a URL encoding of a space. If you intend the 
encapsulator character to be a double quote, you should be using %22 
instead.

The sequence %09 is a tab character, sometimes known as Ctrl-I.  Your 
CSV looks like it's using a comma, which is %2C instead.

The defaults for the CSV import should be a double quote for 
encapsulation and a comma for a separator, with \ as the escape 
character ... so perhaps you should just leave those parameters off of 
the URL.
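
A sketch of the corrected command, based on the defaults described above
(same core name and file path as in the original message; it needs a running
Solr instance, so it is shown as a comment), plus a quick local check of the
percent-encodings involved:

```shell
# The fix, assuming the file really is comma-separated and double-quote
# encapsulated: drop the overriding parameters and let the CSV handler's
# defaults apply.
#
#   curl 'http://localhost:8983/solr/users/update/csv?commit=true&stream.file=/tmp/users.csv'
#
# Sanity check on the encodings: %2C and %22 (octal 054 and 042) decode to
# the comma and double quote the handler expects, while the %09 and %20 in
# the original command decode to a tab and a space.
printf 'comma=%b quote=%b\n' '\0054' '\0042'
```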

Thanks,
Shawn

Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

Posted by rhys J <rh...@gmail.com>.
I am using this command:

curl 'http://localhost:8983/solr/users/update/csv?commit=true&separator=%09&encapsulator=%20&escape=\&stream.file=/tmp/users.csv'

On Mon, Oct 21, 2019 at 1:22 PM Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> What command do you use to get the file into Solr? My guess that you
> are somehow not hitting the correct handler. Perhaps you are sending
> it to extract handler (designed for PDF, MSWord, etc) rather than the
> correct CSV handler.
>
> Solr comes with examples of how to index CSV files.
> See for example:
>
> https://github.com/apache/lucene-solr/blob/master/solr/example/films/README.txt#L39
> Also reference documentation:
>
> https://lucene.apache.org/solr/guide/8_1/uploading-data-with-index-handlers.html
>
> Regards,
>    Alex.
>

Re: Importing a csv file encapsulated by " creates a large copyField field of all fields combined.

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
What command do you use to get the file into Solr? My guess that you
are somehow not hitting the correct handler. Perhaps you are sending
it to extract handler (designed for PDF, MSWord, etc) rather than the
correct CSV handler.

Solr comes with examples of how to index CSV files.
See for example:
https://github.com/apache/lucene-solr/blob/master/solr/example/films/README.txt#L39
Also reference documentation:
https://lucene.apache.org/solr/guide/8_1/uploading-data-with-index-handlers.html
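
For illustration, a sketch of what the two handlers look like on the wire,
assuming a core named `users`, the file path from the original message, and a
local Solr on the default port (both commands need a running instance):

```shell
# CSV handler: parses the rows into individual fields.
curl 'http://localhost:8983/solr/users/update/csv?commit=true' \
     --data-binary @/tmp/users.csv -H 'Content-type: application/csv'

# Extract handler (Tika): treats the upload as a rich document such as
# PDF or Word; sending a CSV here is one way to end up with mangled fields.
curl 'http://localhost:8983/solr/users/update/extract?commit=true' \
     -F 'myfile=@/tmp/users.csv'
```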

Regards,
   Alex.
