Posted to solr-user@lucene.apache.org by "Huynh, Chi-Hao" <hu...@initions.com> on 2014/01/06 17:06:29 UTC

Index for csv-file created successfully, but no data is shown

Dear solr users,

I would appreciate it if someone could help me out here. My goal is to index a CSV file.

First of all, I am using the CDH 5 beta distribution of Hadoop, which includes Solr 4.4.0, on a single node. I am following the Hue tutorial to index and search the data from the Yelp dataset challenge: http://gethue.tumblr.com/post/65969470780/hadoop-tutorials-season-ii-7-how-to-index-and-search

Following the tutorial, I have uploaded the config files, including the prepared schema.xml, to ZooKeeper via the solrctl command:
>solrctl instancedir --create reviews [path to conf]

After this, I have created the collection via:
>solrctl collection --create reviews -s 1

This works fine, as I can see the collection created in the Solr Admin Web UI and the instancedir in the zookeeper shell.

Then, using the MapReduceIndexerTool and the provided morphline file, the index is created and uploaded to Solr. According to the command output, the index was created successfully:

1481 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Indexing 1 files using 1 real mappers into 1 reducers
52716 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Done. Indexing 1 files using 1 real mappers into 1 reducers took 51.233 secs
52774 [main] INFO org.apache.solr.hadoop.GoLive - Live merging of output shards into Solr cluster...
52829 [pool-4-thread-1] INFO org.apache.solr.hadoop.GoLive - Live merge hdfs://svr-hdp01:8020/tmp/load/results/part-00000 into http://SVR-HDP01:8983/solr
53017 [pool-4-thread-1] INFO org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
53495 [main] INFO org.apache.solr.hadoop.GoLive - Committing live merge...
53496 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http client, config:
53512 [main] INFO org.apache.solr.common.cloud.ConnectionManager - Waiting for client to connect to ZooKeeper
53513 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@19014023 name:ZooKeeperConnection Watcher:SVR-HDP01:2181/solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
53513 [main] INFO org.apache.solr.common.cloud.ConnectionManager - Client is connected to ZooKeeper
53514 [main] INFO org.apache.solr.common.cloud.ZkStateReader - Updating cluster state from ZooKeeper...
53652 [main] INFO org.apache.solr.hadoop.GoLive - Done committing live merge
53652 [main] INFO org.apache.solr.hadoop.GoLive - Live merging of index shards into Solr cluster took 0.878 secs
53652 [main] INFO org.apache.solr.hadoop.GoLive - Live merging completed successfully
53652 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Succeeded with job: jobName: org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, jobId: job_1388405934175_0013
53653 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Success. Done. Program took 53.719 secs. Goodbye.
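
For readers who want to pull the relevant facts out of a log like the one above, here is a minimal sketch in Python. It assumes the line layout seen in this output (elapsed milliseconds, thread in brackets, level, logger, then " - " and the message); the names parse_line and merge_targets are hypothetical helpers, not part of Solr or the MapReduceIndexerTool. It only reports which shard directory GoLive merged into which Solr URL; it does not diagnose why a core stays empty.

```python
import re

# Assumed layout: "<millis> [<thread>] <LEVEL> <logger> - <message>"
LOG_LINE = re.compile(
    r"^(?P<millis>\d+) \[(?P<thread>[^\]]+)\] (?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+)\s+- (?P<message>.*)$"
)

# Message emitted by GoLive for each shard it merges.
MERGE = re.compile(r"^Live merge (?P<source>\S+) into (?P<target>\S+)$")


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["millis"] = int(record["millis"])
    return record


def merge_targets(lines):
    """Return (source_shard, target_url) pairs from GoLive 'Live merge' lines."""
    pairs = []
    for line in lines:
        record = parse_line(line)
        if record is None or not record["logger"].endswith(".GoLive"):
            continue
        match = MERGE.match(record["message"])
        if match:
            pairs.append((match.group("source"), match.group("target")))
    return pairs
```

Applied to the output above, merge_targets would return a single pair: the HDFS directory hdfs://svr-hdp01:8020/tmp/load/results/part-00000 and the target http://SVR-HDP01:8983/solr.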

Now, when I go to the web UI and select the created core, I find the core to be empty: the number of docs is 0 and querying it returns no results. My question is whether I have to upload the CSV file manually to somewhere on the Solr server. It seems that the CSV file was parsed and indexed successfully, but the indexed data is missing.
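
The document count can also be checked outside the web UI with a match-all query that returns no rows. The following is a minimal sketch, assuming that the collection is reachable under the name "reviews" at the base URL shown in the log (http://SVR-HDP01:8983/solr) and that the select handler returns JSON with wt=json. The function names are hypothetical, and the actual core name on a given installation may differ.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_query_url(base_url, collection):
    """Build a select URL that matches all documents but returns no rows."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


def num_found(response_text):
    """Extract the total hit count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def fetch_num_found(base_url, collection):
    """Query Solr over HTTP and return the number of matching documents."""
    with urlopen(count_query_url(base_url, collection)) as response:
        return num_found(response.read().decode("utf-8"))
```

Calling fetch_num_found("http://SVR-HDP01:8983/solr", "reviews") against the running server would return the same count that the admin UI reports; a result of 0 after a successful run confirms that the merged index is not visible to that core.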

I hope the description of the problem was clear enough. Thanks a lot!
Kind regards

__________________
initions AG
Chi-Hao Huynh
Weidestraße 120a
D-22081 Hamburg

t: +49 (0) 40 / 41 49 60-62
f: +49 (0) 40 / 41 49 60-11
e: huynh@initios.com
w: www.initions.com
Full company name: initions innovative IT solutions AG
Registered office: Hamburg
Commercial register: Hamburg B 83929
Chairman of the supervisory board: Dr. Michael Leue
Executive board: Dr. Stefan Anschütz, André Paul Henkel, Dr. Helge Plehn

Re: Index for csv-file created successfully, but no data is shown

Posted by "Huynh, Chi-Hao" <hu...@initions.com>.

Thanks Otis,

I didn't know there was a Cloudera Search mailing list. I will try it out.

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodnetic@gmail.com]
Sent: Tuesday, January 7, 2014 03:53
To: solr-user@lucene.apache.org
Subject: Re: Index for csv-file created successfully, but no data is shown

Hi,

This may be a better question for the Cloudera Search mailing list.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/



Re: Index for csv-file created successfully, but no data is shown

Posted by Otis Gospodnetic <ot...@gmail.com>.

Hi,

This may be a better question for the Cloudera Search mailing list.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

