Posted to solr-user@lucene.apache.org by cqlangyi <cq...@163.com> on 2014/03/07 09:48:01 UTC

How to count the total number of words across all documents in a Solr index?

Hi there,

I have the following question and would very much appreciate any help.

Say I have a field configured with the "text_general" type, and I have indexed three pieces of content as documents:
1. "today is a good day"
2. "call your family every day"
3. "come with me"


How can I count the total number of words (even roughly) across these three documents? For the example above,
the result should be at most 13, or slightly less if stopwords are enabled.

Thanks a lot.


Cq
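
To make the expected numbers concrete, here is a minimal sketch that counts the words outside of Solr. It splits on whitespace only, which is an assumption and not the actual "text_general" analysis chain, and the stopword set is hypothetical, chosen just for this example.

```python
# Illustrative only: plain whitespace tokenization, not Solr's analyzer.
docs = [
    "today is a good day",
    "call your family every day",
    "come with me",
]

# Hypothetical stopword list for this example; a real Solr setup reads
# its stopwords from the file configured on the field type.
STOPWORDS = {"a", "is", "with"}


def count_words(texts, stopwords=frozenset()):
    """Count lowercased, whitespace-separated tokens that are not stopwords."""
    return sum(
        1
        for text in texts
        for token in text.lower().split()
        if token not in stopwords
    )


total = count_words(docs)                          # every token counted
total_without_stopwords = count_words(docs, STOPWORDS)
print(total, total_without_stopwords)
```

With no stopwords the count is 13 (5 + 5 + 3); with the hypothetical list above it drops to 10.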







Re: How to count the total number of words across all documents in a Solr index?

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi,

Is this what you are looking for?
http://localhost:8983/solr/#/collection1/schema-browser?field=text_general

Thanks;
Furkan KAMACI
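
Besides the schema browser, one option that may fit is Solr's sumtotaltermfreq function query (alias sttf), which returns the number of indexed tokens for a field across the whole index. The sketch below only builds the request URL. The host and core name are taken from the link above; the field name "content" and the "total_tokens" alias are assumptions for illustration.

```python
from urllib.parse import urlencode

# Host and core come from the schema-browser link in this thread.
SOLR_BASE = "http://localhost:8983/solr/collection1"
# Hypothetical field name; replace with the real text_general field.
FIELD = "content"


def token_count_url(base, field):
    """Build a select URL that returns sttf(field) as a pseudo-field."""
    params = {
        "q": "*:*",                             # match any document
        "rows": 1,                              # one document is enough
        "fl": f"total_tokens:sttf({field})",    # alias:function pseudo-field
        "wt": "json",
    }
    return f"{base}/select?{urlencode(params)}"


url = token_count_url(SOLR_BASE, FIELD)
print(url)
```

The value is the same for every returned document, so a single row suffices. It counts tokens after analysis, so removed stopwords are not included, and it is only approximate: tokens from deleted documents can still be counted until their segments are merged.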


2014-03-07 10:48 GMT+02:00 cqlangyi <cq...@163.com>:

> hi there,
>
>
> i have following questions, please help me out, very appreciate.
>
> say i have a field configured as "text_general" type, and indexed 3 pieces
> content as documents.
> 1. "today is a good day"
> 2. "call your family every day"
> 3. "come with me"
>
>
> how could i count the total (even roughly) word amount in these 3
> documents, with the above the
> result should be "13" at max or something a little less if the stopwords
> enabled.
>
>
> thanks a lot.
>
>
> Cq