Posted to solr-user@lucene.apache.org by cqlangyi <cq...@163.com> on 2014/03/07 09:47:44 UTC

Howto: count the total number of words in all documents in a Solr index

Hi there,


I have the following question. Please help me out; it would be much appreciated.

Say I have a field configured with the "text_general" type, and I have indexed three pieces of content as documents:
1. "today is a good day"
2. "call your family every day"
3. "come with me"
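For reference, a rough client-side count of the three documents above (5 + 5 + 3 = 13 words) can be sketched in Python. This is a simplification, not what Solr does: it assumes a plain lowercase whitespace split, whereas the "text_general" field type applies its own tokenizer and filters, and the stopword set below is hypothetical, chosen only to show how stopword removal lowers the total.

```python
# Hypothetical stopword set for illustration only; Solr's real list comes
# from the stopwords file configured for the field type.
EXAMPLE_STOPWORDS = {"a", "is", "with", "your"}

DOCS = [
    "today is a good day",
    "call your family every day",
    "come with me",
]


def count_words(docs, stopwords=frozenset()):
    """Count lowercased, whitespace-separated tokens, skipping stopwords."""
    total = 0
    for doc in docs:
        total += sum(1 for tok in doc.lower().split() if tok not in stopwords)
    return total


print(count_words(DOCS))                     # 13
print(count_words(DOCS, EXAMPLE_STOPWORDS))  # 9
```

With the hypothetical stopwords removed, the total drops from 13 to 9 (3 + 4 + 2).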


How could I count the total (even approximate) number of words in these 3 documents? For the example above, the result should be 13 at most, or a little less if stopwords are enabled.
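One approach that may answer this, not mentioned in the original post, is Solr's `sumtotaltermfreq` function (alias `sttf`, available since Solr 4.0), which returns the sum of term frequencies for all terms in a field across the whole index. The minimal Python sketch below assumes a hypothetical core URL `http://localhost:8983/solr/collection1` and a hypothetical field named `content`, and requests the value as a pseudo-field in `fl`. Caveats: it counts indexed tokens after analysis (so stopwords removed by the field type are not counted), it needs term frequencies to be indexed for the field, and the statistics may still include deleted documents until segments are merged.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical values: adjust to your own Solr host, core and field name.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"
FIELD = "content"


def build_sttf_url(core_url, field):
    """Build a /select URL that returns sttf(field) as a pseudo-field."""
    params = {
        "q": "*:*",
        "rows": 1,                       # the value is index-wide, so one doc suffices
        "fl": "total:sttf(%s)" % field,  # alias the function result as "total"
        "wt": "json",
    }
    return core_url + "/select?" + urlencode(params)


def extract_total(response):
    """Read the pseudo-field from a parsed JSON response (None if no docs)."""
    docs = response["response"]["docs"]
    return docs[0]["total"] if docs else None


def fetch_total(core_url=SOLR_CORE_URL, field=FIELD):
    """Query a running Solr instance and return the summed term frequency."""
    with urlopen(build_sttf_url(core_url, field)) as resp:
        return extract_total(json.load(resp))
```

Calling `fetch_total()` against a live core would return a single number for the whole index; an empty index returns no documents, hence the `None` case.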


Thanks a lot.


Cq





