Posted to solr-user@lucene.apache.org by Carl Roberts <ca...@gmail.com> on 2015/01/24 21:04:05 UTC

How do you parse the data in a field that is returned from a query?

Hi,

How can I parse the data in a field that is returned from a query?

Basically,

I have a multi-valued field that contains values such as these that are 
returned from a query:

           "cpe:/o:freebsd:freebsd:1.1.5.1",
           "cpe:/o:freebsd:freebsd:2.2.3",
           "cpe:/o:freebsd:freebsd:2.2.2",
           "cpe:/o:freebsd:freebsd:2.2.5",
           "cpe:/o:freebsd:freebsd:2.2.4",
           "cpe:/o:freebsd:freebsd:2.0.5",
           "cpe:/o:freebsd:freebsd:2.2.6",
           "cpe:/o:freebsd:freebsd:2.1.6.1",
           "cpe:/o:freebsd:freebsd:2.0.1",
           "cpe:/o:freebsd:freebsd:2.2",
           "cpe:/o:freebsd:freebsd:2.0",
           "cpe:/o:openbsd:openbsd:2.3",
           "cpe:/o:freebsd:freebsd:3.0",
           "cpe:/o:freebsd:freebsd:1.1",
           "cpe:/o:freebsd:freebsd:2.1.6",
           "cpe:/o:openbsd:openbsd:2.4",
           "cpe:/o:bsdi:bsd_os:3.1",
           "cpe:/o:freebsd:freebsd:1.0",
           "cpe:/o:freebsd:freebsd:2.1.7",
           "cpe:/o:freebsd:freebsd:1.2",
           "cpe:/o:freebsd:freebsd:2.1.5",
           "cpe:/o:freebsd:freebsd:2.1.7.1"],

And my problem is that I need to strip the cpe:/o part and I also need 
to tokenize words using the (:) as a separator so that I can then search 
for "freebsd 1.1" or "openbsd 2.4" or just "freebsd".

Thanks in advance.

Joe

Re: How do you parse the data in a field that is returned from a query?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

The main question then is whether the full
"cpe:/o:freebsd:freebsd:2.2.5" string needs to be stored in Solr.

If the goal is to strip that prefix altogether and never see it in the
Solr document, then Jack's suggestion is spot on. If the goal is to store
the value as is but index it with custom tokenization rules, then that
needs to happen after DIH, in the field's analyzer chain.

Good news, either way should be doable.
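
For the second option, a minimal sketch of such an analyzer chain in
schema.xml might look like the following. The field type name text_cpe is
made up for illustration, and the patterns assume every value starts with a
"cpe:/<letter>:" prefix as in the sample data; this is untested against the
full feed.

```xml
<!-- Hypothetical field type; the name "text_cpe" is an assumption. -->
<fieldType name="text_cpe" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Drop the leading "cpe:/o:" (or cpe:/a:, cpe:/h:) before tokenizing. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^cpe:/[a-z]:" replacement=""/>
    <!-- Split what is left on colons and whitespace: freebsd, freebsd, 2.2.5 -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[:\s]+"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The stored value stays "cpe:/o:freebsd:freebsd:2.2.5"; only the index sees the tokens. -->
<field name="vulnerable-software" type="text_cpe"
       indexed="true" stored="true" multiValued="true"/>
```

With a chain like this, a search for freebsd 1.1 or just freebsd against
the field should match on the individual tokens, while the value returned
in results keeps the full cpe string. Because the same analyzer runs at
query time, splitting on whitespace as well as colons keeps a quoted phrase
such as "freebsd 1.1" from being treated as a single token.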

Regards,
   Alex.

----
Sign up for my Solr resources newsletter at http://www.solr-start.com/


Re: How do you parse the data in a field that is returned from a query?

Posted by Carl Roberts <ca...@gmail.com>.

Thanks Jack.


Re: How do you parse the data in a field that is returned from a query?

Posted by Jack Krupansky <ja...@gmail.com>.

Take a look at the RegexTransformer. Or, in some cases, you may need to use
the raw ScriptTransformer.

See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler
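
As a rough sketch of the RegexTransformer route, the cve-2002 entity from
the posted config could be extended along these lines. The regex,
replaceWith, and sourceColName attributes are the transformer's documented
ones; the column name vulnerable-software-terms and the pattern itself are
assumptions for illustration, and a matching multiValued text field would
have to exist in schema.xml. The transformer is expected to apply the
pattern to each value of a multi-valued column, but that is worth checking
on a small import first.

```xml
<entity name="cve-2002"
        pk="id"
        url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
        processor="XPathEntityProcessor"
        forEach="/nvd/entry"
        transformer="RegexTransformer">
    <field column="id" xpath="/nvd/entry/@id" commonField="false" />
    <!-- Original values, e.g. cpe:/o:freebsd:freebsd:2.2.5 -->
    <field column="vulnerable-software"
           xpath="/nvd/entry/vulnerable-software-list/product" commonField="false" />
    <!-- Hypothetical extra column: keeps product and version, e.g. "freebsd 2.2.5".
         Values that do not match the pattern should pass through unchanged. -->
    <field column="vulnerable-software-terms"
           sourceColName="vulnerable-software"
           regex="^cpe:/[a-z]:[^:]*:([^:]*):(.*)$"
           replaceWith="$1 $2" />
</entity>
```

Writing to a separate column leaves the original cpe values available for
display; pointing the rewrite at the vulnerable-software column itself
would instead drop the prefix from the document entirely.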

-- Jack Krupansky


Re: How do you parse the data in a field that is returned from a query?

Posted by Carl Roberts <ca...@gmail.com>.

The unzipped XML that I am reading looks like this:



<nvd xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.1" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:patch="http://scap.nist.gov/schema/patch/0.1" 
xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4" 
xmlns:cvss="http://scap.nist.gov/schema/cvss-v2/0.2" 
xmlns:cpe-lang="http://cpe.mitre.org/language/2.0" 
xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0" 
pub_date="2015-01-10T05:37:05" 
xsi:schemaLocation="http://scap.nist.gov/schema/patch/0.1 
http://nvd.nist.gov/schema/patch_0.1.xsd 
http://scap.nist.gov/schema/scap-core/0.1 
http://nvd.nist.gov/schema/scap-core_0.1.xsd 
http://scap.nist.gov/schema/feed/vulnerability/2.0 
http://nvd.nist.gov/schema/nvd-cve-feed_2.0.xsd" nvd_xml_version="2.0">
   <entry id="CVE-1999-0001">
     <vuln:vulnerable-configuration id="http://nvd.nist.gov/">
       <cpe-lang:logical-test operator="OR" negate="false">
         <cpe-lang:fact-ref name="cpe:/o:bsdi:bsd_os:3.1"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.0"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.1"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.1.5.1"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:1.2"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.0"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.0.5"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.5"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.6"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.6.1"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.7"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.1.7.1"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.3"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.4"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.5"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.6"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.8"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:3.0"/>
         <cpe-lang:fact-ref name="cpe:/o:openbsd:openbsd:2.3"/>
         <cpe-lang:fact-ref name="cpe:/o:openbsd:openbsd:2.4"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.2.2"/>
         <cpe-lang:fact-ref name="cpe:/o:freebsd:freebsd:2.0.1"/>
       </cpe-lang:logical-test>
     </vuln:vulnerable-configuration>
     <vuln:vulnerable-software-list>
       <vuln:product>cpe:/o:freebsd:freebsd:2.2.8</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:1.1.5.1</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.2.3</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.2.2</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.2.5</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.2.4</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.0.5</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.2.6</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.1.6.1</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.0.1</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.2</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.0</vuln:product>
       <vuln:product>cpe:/o:openbsd:openbsd:2.3</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:3.0</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:1.1</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.1.6</vuln:product>
       <vuln:product>cpe:/o:openbsd:openbsd:2.4</vuln:product>
       <vuln:product>cpe:/o:bsdi:bsd_os:3.1</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:1.0</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.1.7</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:1.2</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.1.5</vuln:product>
       <vuln:product>cpe:/o:freebsd:freebsd:2.1.7.1</vuln:product>
     </vuln:vulnerable-software-list>
     <vuln:cve-id>CVE-1999-0001</vuln:cve-id>
     <vuln:published-datetime>1999-12-30T00:00:00.000-05:00</vuln:published-datetime>
     <vuln:last-modified-datetime>2010-12-16T00:00:00.000-05:00</vuln:last-modified-datetime>
     <vuln:cvss>
       <cvss:base_metrics>
         <cvss:score>5.0</cvss:score>
         <cvss:access-vector>NETWORK</cvss:access-vector>
         <cvss:access-complexity>LOW</cvss:access-complexity>
         <cvss:authentication>NONE</cvss:authentication>
         <cvss:confidentiality-impact>NONE</cvss:confidentiality-impact>
         <cvss:integrity-impact>NONE</cvss:integrity-impact>
         <cvss:availability-impact>PARTIAL</cvss:availability-impact>
         <cvss:source>http://nvd.nist.gov</cvss:source>
         <cvss:generated-on-datetime>2004-01-01T00:00:00.000-05:00</cvss:generated-on-datetime>
       </cvss:base_metrics>
     </vuln:cvss>
     <vuln:cwe id="CWE-20"/>
     <vuln:references reference_type="UNKNOWN" xml:lang="en">
       <vuln:source>OSVDB</vuln:source>
       <vuln:reference href="http://www.osvdb.org/5707" xml:lang="en">5707</vuln:reference>
     </vuln:references>
     <vuln:references reference_type="UNKNOWN" xml:lang="en">
       <vuln:source>CONFIRM</vuln:source>
       <vuln:reference href="http://www.openbsd.org/errata23.html#tcpfix" xml:lang="en">http://www.openbsd.org/errata23.html#tcpfix</vuln:reference>
     </vuln:references>
     <vuln:summary>ip_input.c in BSD-derived TCP/IP implementations allows remote attackers to cause a denial of service (crash or hang) via crafted packets.</vuln:summary>
   </entry>


Re: How do you parse the data in a field that is returned from a query?

Posted by Carl Roberts <ca...@gmail.com>.

Via this rss-data-config.xml file and a class that I wrote (attached) to
download an XML file from a ZIP URL:

<dataConfig>
    <dataSource type="ZIPURLDataSource" connectionTimeout="15000" readTimeout="30000"/>
    <document>
        <entity name="cve-2002"
                pk="id"
                url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
                processor="XPathEntityProcessor"
                forEach="/nvd/entry">
            <field column="id" xpath="/nvd/entry/@id" commonField="false" />
            <field column="cve" xpath="/nvd/entry/cve-id" commonField="false" />
            <field column="cwe" xpath="/nvd/entry/cwe/@id" commonField="false" />
            <field column="vulnerable-configuration" xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name" commonField="false" />
            <field column="vulnerable-software" xpath="/nvd/entry/vulnerable-software-list/product" commonField="false" />
            <field column="published" xpath="/nvd/entry/published-datetime" commonField="false" />
            <field column="modified" xpath="/nvd/entry/last-modified-datetime" commonField="false" />
            <field column="summary" xpath="/nvd/entry/summary" commonField="false" />
        </entity>
        <entity name="cve-2003"
                pk="id"
                url="http://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2003.xml.zip"
                processor="XPathEntityProcessor"
                forEach="/nvd/entry">
            <field column="id" xpath="/nvd/entry/@id" commonField="false" />
            <field column="cve" xpath="/nvd/entry/cve-id" commonField="false" />
            <field column="cwe" xpath="/nvd/entry/cwe/@id" commonField="false" />
            <field column="vulnerable-configuration" xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name" commonField="false" />
            <field column="vulnerable-software" xpath="/nvd/entry/vulnerable-software-list/product" commonField="false" />
            <field column="published" xpath="/nvd/entry/published-datetime" commonField="false" />
            <field column="modified" xpath="/nvd/entry/last-modified-datetime" commonField="false" />
            <field column="summary" xpath="/nvd/entry/summary" commonField="false" />
        </entity>
        <!--
        <entity name="nvd-rss-update"
                pk="link"
                url="https://nvd.nist.gov/download/nvd-rss.xml"
                processor="XPathEntityProcessor"
                forEach="/RDF/item"
                transformer="DateFormatTransformer"
                preImportDeleteQuery="">
            <field column="id" xpath="/RDF/item/title" commonField="true" />
            <field column="link" xpath="/RDF/item/link" commonField="true" />
            <field column="summary" xpath="/RDF/item/description" commonField="true" />
            <field column="date" xpath="/RDF/item/date" commonField="true" />
        </entity>
        -->
    </document>
</dataConfig>

On 1/24/15, 3:45 PM, Jack Krupansky wrote:
> How are you currently importing data?
>
> -- Jack Krupansky
>
> On Sat, Jan 24, 2015 at 3:42 PM, Carl Roberts <carl.roberts.zapata@gmail.com> wrote:
>> Sorry if I was not clear.  What I am asking is this:
>>
>> How can I parse the data during import to tokenize it by (:) and strip the
>> cpe:/o?
>>
>>
>>
>> On 1/24/15, 3:28 PM, Alexandre Rafalovitch wrote:
>>
>>> You are using keywords here that seem to contradict each other.
>>> Or your use case is not clear.
>>>
>>> Specifically, you are saying you are getting stuff from a (Solr?)
>>> query. So, the results are now outside of Solr. Then you are asking
>>> for help to strip stuff off it. Well, it's outside of Solr, do
>>> whatever you want with it!
>>>
>>> But then at the end, you say you want to search for whatever you
>>> stripped off. So, that should be back in Solr again?
>>>
>>> Or are you asking something along these lines:
>>> 1. I have a multiValued field with the following sample content... (it
>>> does not matter to Solr where it comes from)
>>> 2. I wanted it returned as is, but I want to be able to find documents
>>> when somebody searches for X, Y, or Z
>>> 3. What would be the best analyzer chain to be able to do so?
>>>
>>> Regards,
>>>      Alex.
>>> ----
>>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>>
>>>
>>> On 24 January 2015 at 15:04, Carl Roberts <ca...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> How can I parse the data in a field that is returned from a query?
>>>>
>>>> Basically,
>>>>
>>>> I have a multi-valued field that contains values such as these that are
>>>> returned from a query:
>>>>
>>>>             "cpe:/o:freebsd:freebsd:1.1.5.1",
>>>>             "cpe:/o:freebsd:freebsd:2.2.3",
>>>>             "cpe:/o:freebsd:freebsd:2.2.2",
>>>>             "cpe:/o:freebsd:freebsd:2.2.5",
>>>>             "cpe:/o:freebsd:freebsd:2.2.4",
>>>>             "cpe:/o:freebsd:freebsd:2.0.5",
>>>>             "cpe:/o:freebsd:freebsd:2.2.6",
>>>>             "cpe:/o:freebsd:freebsd:2.1.6.1",
>>>>             "cpe:/o:freebsd:freebsd:2.0.1",
>>>>             "cpe:/o:freebsd:freebsd:2.2",
>>>>             "cpe:/o:freebsd:freebsd:2.0",
>>>>             "cpe:/o:openbsd:openbsd:2.3",
>>>>             "cpe:/o:freebsd:freebsd:3.0",
>>>>             "cpe:/o:freebsd:freebsd:1.1",
>>>>             "cpe:/o:freebsd:freebsd:2.1.6",
>>>>             "cpe:/o:openbsd:openbsd:2.4",
>>>>             "cpe:/o:bsdi:bsd_os:3.1",
>>>>             "cpe:/o:freebsd:freebsd:1.0",
>>>>             "cpe:/o:freebsd:freebsd:2.1.7",
>>>>             "cpe:/o:freebsd:freebsd:1.2",
>>>>             "cpe:/o:freebsd:freebsd:2.1.5",
>>>>             "cpe:/o:freebsd:freebsd:2.1.7.1"],
>>>>
>>>> And my problem is that I need to strip the cpe:/o part and I also need to
>>>> tokenize words using the (:) as a separator so that I can then search for
>>>> "freebsd 1.1" or "openbsd 2.4" or just "freebsd".
>>>>
>>>> Thanks in advance.
>>>>
>>>> Joe
>>>>

Re: How do you parse the data in a field that is returned from a query?

Posted by Jack Krupansky <ja...@gmail.com>.

How are you currently importing data?

-- Jack Krupansky

On Sat, Jan 24, 2015 at 3:42 PM, Carl Roberts <carl.roberts.zapata@gmail.com> wrote:

> Sorry if I was not clear.  What I am asking is this:
>
> How can I parse the data during import to tokenize it by (:) and strip the
> cpe:/o?

Re: How do you parse the data in a field that is returned from a query?

Posted by Carl Roberts <ca...@gmail.com>.

Sorry if I was not clear.  What I am asking is this:

How can I parse the data during import to tokenize it by (:) and strip 
the cpe:/o?


On 1/24/15, 3:28 PM, Alexandre Rafalovitch wrote:
> You are using keywords here that seem to contradict with each other.
> Or your use case is not clear.
>
> Specifically, you are saying you are getting stuff from a (Solr?)
> query. So, the results are now outside of Solr. Then you are asking
> for help to strip stuff off it. Well, it's outside of Solr, do
> whatever you want with it!
>
> But then at the end, you say you want to search for whatever you
> stripped off. So, that should be back in Solr again?
>
> Or are you asking something along these lines:
> 1. I have a multiValued field with the following sample content... (it
> does not matter to Solr where it comes from)
> 2. I wanted it returned as is, but I want to be able to find documents
> when somebody searches for X, Y, or Z
> 3. What would be the best analyzer chain to be able to do so?
>
> Regards,
>     Alex.
> ----
> Sign up for my Solr resources newsletter at http://www.solr-start.com/

Re: How do you parse the data in a field that is returned from a query?

Posted by Carl Roberts <ca...@gmail.com>.

Yes - I am using DIH and I am reading the info from an XML file using 
the URL datasource, and I want to strip the cpe:/o and tokenize the data 
by (:) during import so I can then search it as I've described. So, my 
question is this:

Is there any built-in logic, such as a transformer class, that could do this?
If not, how would you recommend I do this?

Regards,

Joe

On 1/24/15, 3:38 PM, Jack Krupansky wrote:
> Or, maybe... he's using DIH and getting these values from an RDBMS database
> query and now wants to index them in Solr. Who knows!
>
> It might be simplest to transform the colons to spaces and use a normal
> text field. Although you could use a custom text field type that used a
> regex tokenizer which treated the colons as token separators.
>
>
> -- Jack Krupansky

Re: How do you parse the data in a field that is returned from a query?

Posted by Jack Krupansky <ja...@gmail.com>.

Or, maybe... he's using DIH and getting these values from an RDBMS database
query and now wants to index them in Solr. Who knows!

It might be simplest to transform the colons to spaces and use a normal
text field. Alternatively, you could use a custom text field type with a
regex tokenizer that treats the colons as token separators.
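
The second option could look roughly like the schema.xml fragment below. It is a minimal sketch, not a tested configuration: the type name text_cpe is hypothetical, and the field is assumed to be named after the DIH column vulnerable-software. A char filter removes the leading cpe:/o: prefix before tokenization, and a pattern tokenizer then splits what is left on colons.

```xml
<!-- schema.xml sketch; text_cpe is a hypothetical type name -->
<fieldType name="text_cpe" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Drop the leading "cpe:/o:" (or another single-letter part code) -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^cpe:/[a-z]:" replacement=""/>
    <!-- Split the remainder on colons: freebsd, freebsd, 2.2.5 -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern=":"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumes the Solr field carries the same name as the DIH column -->
<field name="vulnerable-software" type="text_cpe"
       indexed="true" stored="true" multiValued="true"/>
```

Because the stored value is kept as it was submitted and only the indexed tokens are affected by analysis, the full cpe:/o:freebsd:freebsd:2.2.5 string would still be returned in results while searches run against freebsd and 2.2.5. Note that on a multiValued field a query for freebsd and 1.1 may match a document where the two terms come from different values.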


-- Jack Krupansky

On Sat, Jan 24, 2015 at 3:28 PM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> You are using keywords here that seem to contradict with each other.
> Or your use case is not clear.
>
> Specifically, you are saying you are getting stuff from a (Solr?)
> query. So, the results are now outside of Solr. Then you are asking
> for help to strip stuff off it. Well, it's outside of Solr, do
> whatever you want with it!
>
> But then at the end, you say you want to search for whatever you
> stripped off. So, that should be back in Solr again?
>
> Or are you asking something along these lines:
> 1. I have a multiValued field with the following sample content... (it
> does not matter to Solr where it comes from)
> 2. I wanted it returned as is, but I want to be able to find documents
> when somebody searches for X, Y, or Z
> 3. What would be the best analyzer chain to be able to do so?
>
> Regards,
>    Alex.
> ----
> Sign up for my Solr resources newsletter at http://www.solr-start.com/

Re: How do you parse the data in a field that is returned from a query?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

You are using keywords here that seem to contradict each other,
or your use case is not clear.

Specifically, you are saying you are getting stuff from a (Solr?)
query. So, the results are now outside of Solr. Then you are asking
for help to strip stuff off it. Well, it's outside of Solr, do
whatever you want with it!

But then at the end, you say you want to search for whatever you
stripped off. So, that should be back in Solr again?

Or are you asking something along these lines:
1. I have a multiValued field with the following sample content... (it
does not matter to Solr where it comes from)
2. I want it returned as is, but I want to be able to find documents
when somebody searches for X, Y, or Z
3. What would be the best analyzer chain to be able to do so?

Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 24 January 2015 at 15:04, Carl Roberts <ca...@gmail.com> wrote:
> Hi,
>
> How can I parse the data in a field that is returned from a query?
>
> Basically,
>
> I have a multi-valued field that contains values such as these that are
> returned from a query:
>
>           "cpe:/o:freebsd:freebsd:1.1.5.1",
>           "cpe:/o:freebsd:freebsd:2.2.3",
>           "cpe:/o:freebsd:freebsd:2.2.2",
>           "cpe:/o:freebsd:freebsd:2.2.5",
>           "cpe:/o:freebsd:freebsd:2.2.4",
>           "cpe:/o:freebsd:freebsd:2.0.5",
>           "cpe:/o:freebsd:freebsd:2.2.6",
>           "cpe:/o:freebsd:freebsd:2.1.6.1",
>           "cpe:/o:freebsd:freebsd:2.0.1",
>           "cpe:/o:freebsd:freebsd:2.2",
>           "cpe:/o:freebsd:freebsd:2.0",
>           "cpe:/o:openbsd:openbsd:2.3",
>           "cpe:/o:freebsd:freebsd:3.0",
>           "cpe:/o:freebsd:freebsd:1.1",
>           "cpe:/o:freebsd:freebsd:2.1.6",
>           "cpe:/o:openbsd:openbsd:2.4",
>           "cpe:/o:bsdi:bsd_os:3.1",
>           "cpe:/o:freebsd:freebsd:1.0",
>           "cpe:/o:freebsd:freebsd:2.1.7",
>           "cpe:/o:freebsd:freebsd:1.2",
>           "cpe:/o:freebsd:freebsd:2.1.5",
>           "cpe:/o:freebsd:freebsd:2.1.7.1"],
>
> And my problem is that I need to strip the cpe:/o part and I also need to
> tokenize words using the (:) as a separator so that I can then search for
> "freebsd 1.1" or "openbsd 2.4" or just "freebsd".
>
> Thanks in advance.
>
> Joe