Posted to solr-user@lucene.apache.org by Raymond Xie <xi...@gmail.com> on 2018/04/01 16:12:20 UTC

Re: How do I create a schema file for FIX data in Solr

Thanks to all.

FIX is a standard format for financial data. A message consists of many
numeric tags, each with a value, like 8=asdf, where 8 is the tag and asdf
is the tag's value. Each tag has its own definition.

The sample message in FIX format was in the original question.

All I need to know is how to parse the message and get every tag's value.

So far I have found that a parser is what I need to start with. But I am
more concerned about how to create an index in Solr on the extracted tag
values. That is the first step; the next would be to customize the
dashboard so that users can search for a value, find out which message
contains that value in which tag, and see the whole message as proof.

~~~sent from my cell phone, sorry if there is any typo
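
As a rough illustration of the tag=value layout described above, here is a minimal Python sketch that splits a FIX message into (tag, value) pairs. On the wire, FIX separates fields with the SOH character (0x01); the sample in this thread appears to be space-separated, so the delimiter is a parameter. The function name parse_fix is made up for this example, and repeating groups, checksum validation, and values that contain the delimiter are not handled.

```python
SOH = "\x01"  # standard FIX field delimiter on the wire


def parse_fix(message, delimiter=SOH):
    """Split a FIX message into a list of (tag, value) string pairs.

    A list is used rather than a dict because a tag can appear more
    than once in one message (repeating groups).
    """
    pairs = []
    for field in message.split(delimiter):
        if not field:
            continue  # skip empty pieces, e.g. after a trailing delimiter
        tag, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"not a tag=value field: {field!r}")
        pairs.append((tag, value))
    return pairs


if __name__ == "__main__":
    # The first three fields of the sample discussed in this thread.
    print(parse_fix("8=FIX.4.4 9=653 35=RIO", delimiter=" "))
```

Splitting on the first "=" only means a value that itself contains "=" is kept intact.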

Rick Leir <rl...@leirtech.com> wrote on Sat, Mar 31, 2018 at 6:00 PM:

> Raymond
> Will you be streaming the FIX data, perhaps with aggregation? Just a
> thought, I have no experience with FIX. Streaming opens up lots of
> questions.
> Cheers -- Rick
>
> On March 31, 2018 2:33:25 PM EDT, Walter Underwood <wu...@wunderwood.org>
> wrote:
> >Looks like Financial Information Exchange data, but, as Shawn says, the
> >real problem is what you want to do with it.
> >
> >* What fields will be searched? Those are indexed.
> >* What fields will be returned in the result? Those are stored.
> >* What is the data type for each field?
> >
> >I often store the data for most of the fields because it makes
> >debugging search problems so much easier.
> >
> >wunder
> >Walter Underwood
> >wunder@wunderwood.org
> >http://observer.wunderwood.org/  (my blog)
> >
> >> On Mar 31, 2018, at 11:29 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> >>
> >> On 3/31/2018 12:21 PM, Raymond Xie wrote:
> >>> I just started using Solr to create a Searching function on our
> >>> existing data.
> >>>
> >>> The existing data is in FIX format sample as below:
> >> <snip>
> >>> all the red tags (I didn't mark all of them) are fields with
> >>> definition from FIX standard, I need to create index on all the
> >>> tags, how do I start?
> >>
> >> I do not know what FIX means, and there are no colors in your email.
> >>
> >> Can you elaborate?
> >>
> >> Fine-tuning the schema can be one of the most time-consuming parts of
> >> setting up a Solr installation, and there are usually no easy quick
> >> answers.  Exactly what to do will depend not only on the data that
> >> you're indexing, but also what you want to do with it.
> >>
> >> Thanks,
> >> Shawn
> >>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: How do I create a schema file for FIX data in Solr

Posted by Raymond Xie <xi...@gmail.com>.

I don't know why the mailing list stripped the highlight color from the
tags. Anyway, I have explained the data structure, so hopefully you get the
idea.

Thanks.

~~~sent from my cell phone, sorry if there is any typo

Raymond Xie <xi...@gmail.com> wrote on Sun, Apr 1, 2018 at 12:24 PM:

> At the moment I have no plans to stream the data.
>
> Note that the raw data is saved on a Linux host. I need to index that raw
> data and provide search capabilities on it.
>
> The data is in FIX format. I believe I need to parse the data and create
> an index on the parsed data. I have never worked with FIX data or Solr, so
> any ideas are greatly appreciated. Thanks a lot in advance.
>
> Again, if you want to see the data, a sample is in the original question.
>
> ~~~sent from my cell phone, sorry if there is any typo

Re: How do I create a schema file for FIX data in Solr

Posted by Raymond Xie <xi...@gmail.com>.

At the moment I have no plans to stream the data.

Note that the raw data is saved on a Linux host. I need to index that raw
data and provide search capabilities on it.

The data is in FIX format. I believe I need to parse the data and create
an index on the parsed data. I have never worked with FIX data or Solr, so
any ideas are greatly appreciated. Thanks a lot in advance.

Again, if you want to see the data, a sample is in the original question.

~~~sent from my cell phone, sorry if there is any typo

Re: How do I create a schema file for FIX data in Solr

Posted by Rick Leir <rl...@leirtech.com>.

Ray
Have you looked around for an existing FIX to Solr conduit? If FIX is a common standard then I would expect that someone has done some work on this and github'd it.

Even just FIX to JSON.
Cheers -- Rick
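
For readers without a ready-made converter, the "FIX to JSON" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name fix_to_doc, the tag_<number> field naming, and the raw_message field are all invented here, and a repeated tag keeps only its last value. The raw message is carried along so that the whole message can be shown to users as proof.

```python
import json


def fix_to_doc(raw_message, delimiter="\x01"):
    """Turn one FIX message into a flat dict that can be serialized as JSON."""
    doc = {"raw_message": raw_message}  # keep the whole message for display
    for field in raw_message.split(delimiter):
        tag, sep, value = field.partition("=")
        if not sep:
            continue  # ignore pieces that are not tag=value
        # Hypothetical naming scheme: FIX tag 37 becomes field "tag_37".
        # A tag that repeats overwrites its earlier value in this sketch.
        doc[f"tag_{tag}"] = value
    return doc


if __name__ == "__main__":
    doc = fix_to_doc("8=FIX.4.4 9=653 35=RIO", delimiter=" ")
    print(json.dumps(doc, indent=2))
```

A flat dict like this maps directly onto a Solr document, one key per field.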

On April 2, 2018 12:34:44 AM EDT, Raymond Xie <xi...@gmail.com> wrote:
>Thank you, Shawn, Rick and other readers,
>
>To Shawn:
>
>Take *8=FIX.4.4 9=653 35=RIO* as an example. In the FIX standard, 8
>means BeginString, and in this example its value is FIX.4.4; 9 means
>BodyLength, which is 653 for this message; 35 is the message type,
>which is RIO here; and 122 stands for OrigSendingTime, which has the
>format UTCTimestamp.
>
>You can refer to this page for details:
>https://www.onixs.biz/fix-dictionary/4.2/fields_by_tag.html
>
>All the values are described as string type.
>
>All the tag numbers come from the FIX standard, so they don't change (in
>my case).
>
>I expect a Python program might be needed to parse the message and
>extract each tag's value. The index is to be built on those extracted
>values along with their field (tag) names.
>
>With the index in place, ideally users would search for any keyword.
>In this case, however, most queries would be based on tag 37 (Order ID)
>and tag 75 (Trade Date). There is also a custom tag (not in the
>standard), Order Version, to be queried on.
>
>I understand that creating the parser would be a manual process. As long
>as I know how, or have a small sample program, I will do it myself and
>adjust it as needed.
>
>To Rick:
>
>You mentioned creating a JSON document. My understanding is that a parser
>would be needed to generate that JSON document. Do you have any existing
>example code?
>
>Thank you guys very much.
>
>*------------------------------------------------*
>*Sincerely yours,*
>
>*Raymond*
>
>On Sun, Apr 1, 2018 at 2:16 PM, Shawn Heisey <ap...@elyograg.org> wrote:
>
>> On 4/1/2018 10:12 AM, Raymond Xie wrote:
>>
>>> FIX is a standard format for financial data. A message consists of
>>> many numeric tags, each with a value, like 8=asdf, where 8 is the tag
>>> and asdf is the tag's value. Each tag has its own definition.
>>>
>>> The sample message in FIX format was in the original question.
>>>
>>> All I need to know is how to parse the message and get every tag's
>>> value.
>>>
>>> So far I have found that a parser is what I need to start with. But I
>>> am more concerned about how to create an index in Solr on the
>>> extracted tag values. That is the first step; the next would be to
>>> customize the dashboard so that users can search for a value, find
>>> out which message contains that value in which tag, and see the whole
>>> message as proof.
>>>
>>
>> Most of Solr's functionality is provided by Lucene.  Lucene is a Java
>> API that implements search functionality.  Solr bolts on some
>> functionality on top of Lucene, but doesn't really do anything to
>> fundamentally change the fact that you're dealing with a Lucene index.
>> So I'm going to mostly talk about Lucene below.
>>
>> Lucene organizes data in a unit that we call a "document." An easy
>> analogy for this is that it is a lot like a row in a single database
>> table.  It has fields, each field has a type. Unless custom software is
>> used, there is really no support for data other than basic primitive
>> types -- numbers and strings.  The only complex type that I can think
>> of that Solr supports out of the box is geospatial coordinates, and it
>> might even support multi-dimensional coordinates, but I'm not sure.
>> It's not all that complex -- the field just stores and manipulates
>> multiple numbers instead of one. The Lucene API does support a FEW
>> things that Solr doesn't implement.  I don't think those are applicable
>> to what you're trying to do.
>>
>> Let's look at the first part of the data that you included in the
>> first message:
>>
>> 8=FIX.4.4 9=653 35=RIO
>>
>> Is "8" always a mixture of letters and numbers and periods? Is "9"
>> always a number, and is it always a WHOLE number?  Is "35" always
>> letters? Looking deeper to data that I didn't quote ... is "122" always
>> a date/time value?  Are the tag numbers always picked from a
>> well-defined set, or do they change?
>>
>> Assuming that the answers in the previous paragraph are found and a
>> configuration is created to deal with all of it ... how are you
>> planning to search it?  What kind of queries would you expect somebody
>> to make?  That's going to have a huge influence on how you configure
>> things.
>>
>> Writing the schema is usually where people spend the most time when
>> they're setting up Solr.
>>
>> Thanks,
>> Shawn
>>
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com
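
Putting the questions and answers quoted above together (which fields are searched, which are returned, and what type each has), a first set of field definitions might look like the sketch below. It only builds the JSON body for an "add-field" command of Solr's Schema API, which would be POSTed to a URL of the form http://localhost:8983/solr/<collection>/schema. The field names (tag_37, tag_75, tag_122, raw_message) and the choice of which tags to define are assumptions made for illustration. "string" and "pdate" are field types in Solr's default configset, so check the configset actually in use; a pdate field also expects ISO 8601 values, so a FIX UTCTimestamp would have to be converted before indexing.

```python
import json

# Hypothetical field names (tag_<number>) and a hypothetical selection of tags.
FIELDS = [
    # Tag 37, OrderID: searched and returned.
    {"name": "tag_37", "type": "string", "indexed": True, "stored": True},
    # Tag 75, TradeDate: searched and returned.
    {"name": "tag_75", "type": "string", "indexed": True, "stored": True},
    # Tag 122, OrigSendingTime: a date type, values must be ISO 8601.
    {"name": "tag_122", "type": "pdate", "indexed": True, "stored": True},
    # The whole message: returned for display but not searched.
    {"name": "raw_message", "type": "string", "indexed": False, "stored": True},
]


def schema_commands(fields):
    """Build the JSON body for a Schema API request that adds the given fields."""
    return json.dumps({"add-field": fields}, indent=2)


if __name__ == "__main__":
    print(schema_commands(FIELDS))
```

Storing most fields, as Walter suggests earlier in the thread, costs some disk space but makes it much easier to see what actually went into the index.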

Re: How do I create a schema file for FIX data in Solr

Posted by Adhyan Arizki <a....@gmail.com>.

Raymond,

May I suggest you take a look at the examples included in the Solr package?
Essentially, you need to decide which fields should be searchable by the
application and which should not. The FIX data can be represented in JSON
or XML.

To parse and upload the data to Solr, you can use any of several client
libraries. Personally, I have used SolrJ and RSolr, and they are
essentially the same.
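
For comparison with the client libraries mentioned above, the upload step can also be sketched with nothing but the Python standard library. This is not SolrJ or RSolr; it simply builds an HTTP POST of a JSON array of documents to Solr's /update handler. The base URL, the core name "fix", the function name build_update_request, and the sample field names are assumptions for illustration. The request is built but not sent, so the sketch runs without a Solr server.

```python
import json
import urllib.request


def build_update_request(docs, solr_url="http://localhost:8983/solr/fix"):
    """Build (but do not send) a POST request that adds docs to a Solr core."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url=f"{solr_url}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    docs = [{"id": "1", "tag_37": "ORD-1", "tag_75": "20180402"}]
    req = build_update_request(docs)
    print(req.get_method(), req.full_url)
    # To send it to a running Solr instance:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

Committing on every request is convenient for experiments but slow for bulk loads, where batching many documents per request is the usual approach.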


Re: How do I create a schema file for FIX data in Solr

Posted by Raymond Xie <xi...@gmail.com>.

I'm talking to the author to find out, thanks.

~~~sent from my cell phone, sorry if there is any typo

Adhyan Arizki <a....@gmail.com> wrote on Tue, Apr 3, 2018 at 1:38 PM:

> Raymond,
>
> It seems you are having an issue with the Node environment. Judging from
> the error message, the path likely isn't registered correctly. Note,
> though, that this is no longer a Solr issue.
>
> On Tue, 3 Apr 2018, 23:00 Raymond Xie, <xi...@gmail.com> wrote:
>
> > Hi Rick,
> >
> > Following your suggestion, I found
> > https://github.com/SunGard-Labs/fix2json
> > which seems to be a fit.
> >
> > I followed the installation instructions and successfully installed
> > fix2json on my Ubuntu host:
> >
> > sudo npm install -g fix2json
> >
> > I ran the same command as shown in the Git repository:
> >
> > fix2json -p dict/FIX50SP2.CME.xml XCME_MD_GE_FUT_20160315.gz
> >
> > and I received this error:
> >
> > /usr/bin/env: ‘node’: No such file or directory
> >
> > I would appreciate it if you could point out what is missing here.
> >
> > Thank you again for your kind help.
> >
> > *------------------------------------------------*
> > *Sincerely yours,*
> >
> > *Raymond*
> >
> > On Mon, Apr 2, 2018 at 9:30 AM, Raymond Xie <xi...@gmail.com> wrote:
> >
> > > Thank you, Rick, for enlightening me.
> > >
> > > I will get the FIX message parsed first and come back here later.
> > >
> > > *------------------------------------------------*
> > > *Sincerely yours,*
> > >
> > > *Raymond*
> > >
> > > On Mon, Apr 2, 2018 at 9:15 AM, Rick Leir <rl...@leirtech.com> wrote:
> > >
> > >> Google
> > >>    fix to json,
> > >> there are a few interesting leads.

Re: How do I create a schema file for FIX data in Solr

Posted by Adhyan Arizki <a....@gmail.com>.

Raymond,

It seems you are having an issue with the Node environment. Judging from
the error message, the path likely isn't registered correctly. Note,
though, that this is no longer a Solr issue.
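
The error "/usr/bin/env: 'node': No such file or directory" means that env could not find an executable named node on the PATH. One possible cause, stated here as an assumption rather than a diagnosis, is that some Debian and Ubuntu packages have installed the Node.js binary under the name nodejs instead of node. A quick way to check which name, if either, is present is sketched below; the function name find_node is invented for this example.

```python
import shutil


def find_node(candidates=("node", "nodejs")):
    """Return (name, path) for the first Node.js executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return name, path
    return None


if __name__ == "__main__":
    found = find_node()
    if found is None:
        print("No Node.js executable found on PATH")
    else:
        print(f"Found {found[0]} at {found[1]}")
```

If only nodejs turns up, scripts whose first line asks env for node will fail in exactly this way until a node executable is made available on the PATH.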

Re: How do I create a schema file for FIX data in Solr

Posted by Raymond Xie <xi...@gmail.com>.

Hi Rick,

Following your suggestion, I found https://github.com/SunGard-Labs/fix2json,
which seems to be a good fit.

I followed the installation instructions and successfully installed
fix2json on my Ubuntu host:

sudo npm install -g fix2json

I ran the same command shown in the GitHub repository:

fix2json -p dict/FIX50SP2.CME.xml XCME_MD_GE_FUT_20160315.gz


and received this error:

/usr/bin/env: ‘node’: No such file or directory

I would appreciate it if you could point out what is missing here.

Thank you again for your kind help.



*------------------------------------------------*
*Sincerely yours,*


*Raymond*


Re: How do I create a schema file for FIX data in Solr

Posted by Raymond Xie <xi...@gmail.com>.

Thank you, Rick, for the enlightening pointer.

I will get the FIX message parsed first and come back here later.


*------------------------------------------------*
*Sincerely yours,*


*Raymond*


Re: How do I create a schema file for FIX data in Solr

Posted by Rick Leir <rl...@leirtech.com>.

Google "fix to json"; there are a few interesting leads.


-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: How do I create a schema file for FIX data in Solr

Posted by Raymond Xie <xi...@gmail.com>.

Thank you, Shawn, Rick and other readers,

To Shawn:

Taking *8=FIX.4.4 9=653 35=RIO* as an example: in the FIX standard, tag 8
is BeginString, and its value here is FIX.4.4. Tag 9 is BodyLength, which is
653 for this message. Tag 35 is MsgType, so the message type is RIO. Tag 122
is OrigSendingTime, which has the UTCTimestamp format.

You can refer to this page for details:
https://www.onixs.biz/fix-dictionary/4.2/fields_by_tag.html

All the values are described there as string type.

All the tag numbers come from the FIX standard, so they do not change (in my
case).

I expect a Python program will be needed to parse the message and extract
each tag's value. The index would then be built on those extracted values
along with their field (tag) names.
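
A minimal sketch of the parsing step described above. It assumes the message
is one string of tag=value pairs separated either by the SOH character
(\x01), as on the wire, or by spaces, as in the sample pasted in this thread.
The names parse_fix, name_fields and TAG_NAMES are made up for illustration,
and the tag table is deliberately partial; a complete one would come from a
FIX dictionary. Splitting on spaces will break any value that itself
contains a space.

```python
# Hypothetical, partial tag-name table; a full one would come from a FIX dictionary.
TAG_NAMES = {
    "8": "BeginString",
    "9": "BodyLength",
    "35": "MsgType",
    "37": "OrderID",
    "75": "TradeDate",
    "122": "OrigSendingTime",
}


def parse_fix(message, delimiter="\x01"):
    """Split a FIX message into a list of (tag, value) pairs.

    A list is used instead of a dict because the same tag can appear
    more than once inside repeating groups.
    """
    pairs = []
    for field in message.split(delimiter):
        if not field:
            continue  # skip the empty piece left by a trailing delimiter
        tag, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"field without '=': {field!r}")
        pairs.append((tag, value))
    return pairs


def name_fields(pairs):
    """Replace tag numbers with names, using 'tag_<n>' for unknown tags."""
    return [(TAG_NAMES.get(tag, f"tag_{tag}"), value) for tag, value in pairs]


sample = "8=FIX.4.4 9=653 35=RIO"
fields = name_fields(parse_fix(sample, delimiter=" "))
print(fields)
```

Keeping the raw message next to the extracted fields would make it possible
to show users the whole message as proof later on.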

With the index in place, users could ideally search for any keyword.
In this case, however, most queries would be based on tag 37 (OrderID) and
tag 75 (TradeDate). There is also a custom tag (not in the standard), Order
Version, that needs to be queryable.
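
As a rough illustration of the queries described above, the sketch below
builds a Solr /select URL that filters on order ID and trade date. The field
names OrderID, TradeDate and raw_message, the core name "fix", and the helper
names are all assumptions for the example; they depend entirely on how the
schema ends up being defined.

```python
from urllib.parse import urlencode


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(order_id=None, trade_date=None):
    """Build a query string from whichever criteria were supplied."""
    clauses = []
    if order_id is not None:
        clauses.append(f"OrderID:{quote_term(order_id)}")
    if trade_date is not None:
        clauses.append(f"TradeDate:{quote_term(trade_date)}")
    # With no criteria, fall back to the match-all query.
    return " AND ".join(clauses) if clauses else "*:*"


def build_select_url(core_url, query):
    """Return a /select URL asking for the id and the stored raw message."""
    params = {"q": query, "fl": "id,raw_message"}
    return f"{core_url}/select?{urlencode(params)}"


query = build_query(order_id="ABC123", trade_date="20180401")
url = build_select_url("http://localhost:8983/solr/fix", query)
print(query)
print(url)
```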

I understand that creating the parser would be a manual process. As long as
I have a small sample program to start from, I will write it myself and
adjust it as needed.

To Rick:

You mentioned creating a JSON document. My understanding is that a parser
would be needed to generate that JSON document. Do you have any existing
example code?




Thank you guys very much.

*------------------------------------------------*
*Sincerely yours,*


*Raymond*


Re: How do I create a schema file for FIX data in Solr

Posted by Shawn Heisey <ap...@elyograg.org>.

On 4/1/2018 10:12 AM, Raymond Xie wrote:
> FIX is a format standard of financial data. It contains lots of tags in
> number with value for the tag, like 8=asdf, where 8 is the tag and asdf is
> the tag's value. Each tag has its definition.
>
> The sample msg in FIX format was in the original question.
>
> All I need to do is to know how to paste the msg and get all tag's value.
>
> I found so far a parser is what I need to start with., But I am more
> concerning about how to create index in Solr on the extracted tag's value,
> that is the first step, the next would be to customize the dashboard for
> users to search with a value to find out which msg contains that value in
> which tag and present users the whole msg as proof.

Most of Solr's functionality is provided by Lucene.  Lucene is a Java
API that implements search functionality.  Solr bolts on some
functionality on top of Lucene, but doesn't really do anything to
fundamentally change the fact that you're dealing with a Lucene index.
So I'm going to mostly talk about Lucene below.

Lucene organizes data in a unit that we call a "document." An easy 
analogy for this is that it is a lot like a row in a single database 
table.  It has fields, each field has a type. Unless custom software is 
used, there is really no support for data other than basic primitive 
types -- numbers and strings.  The only complex type that I can think of 
that Solr supports out of the box is geospatial coordinates, and it 
might even support multi-dimensional coordinates, but I'm not sure.  
It's not all that complex -- the field just stores and manipulates 
multiple numbers instead of one.  The Lucene API does support a FEW 
things that Solr doesn't implement.  I don't think those are applicable 
to what you're trying to do.
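
To make the "document with typed fields" idea concrete, here is a small
sketch that turns already-parsed FIX fields into a flat JSON document of the
kind Solr can index. Everything specific in it is an assumption for
illustration: the function name to_solr_doc, the field names, the id value,
and the choice to treat BodyLength as an integer and OrigSendingTime as a
date. It relies on the FIX UTCTimestamp layout YYYYMMDD-HH:MM:SS (without
milliseconds) and writes the date in the ISO 8601 form that Solr date fields
expect.

```python
import json
from datetime import datetime


def to_solr_doc(doc_id, raw_message, fields):
    """Build a flat document: one key per field, primitive values only."""
    doc = {"id": doc_id, "raw_message": raw_message}
    for name, value in fields.items():
        if name == "BodyLength":
            doc[name] = int(value)  # whole number
        elif name == "OrigSendingTime":
            # FIX UTCTimestamp (no milliseconds) -> ISO 8601 in UTC
            parsed = datetime.strptime(value, "%Y%m%d-%H:%M:%S")
            doc[name] = parsed.strftime("%Y-%m-%dT%H:%M:%SZ")
        else:
            doc[name] = value  # everything else stays a plain string
    return doc


doc = to_solr_doc(
    "msg-1",
    "8=FIX.4.4 9=653 35=RIO",
    {
        "BeginString": "FIX.4.4",
        "BodyLength": "653",
        "MsgType": "RIO",
        "OrigSendingTime": "20180401-16:12:20",
    },
)
print(json.dumps(doc, indent=2))
```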

Let's look at the first part of the data that you included in the first 
message:

8=FIX.4.4 9=653 35=RIO

Is "8" always a mixture of letters and numbers and periods? Is "9" 
always a number, and is it always a WHOLE number?  Is "35" always 
letters?  Looking deeper into data that I didn't quote ... is "122" always
a date/time value?  Are the tag numbers always picked from a 
well-defined set, or do they change?

Assuming that the answers in the previous paragraph are found and a 
configuration is created to deal with all of it ... how are you planning 
to search it?  What kind of queries would you expect somebody to make?  
That's going to have a huge influence on how you configure things.

Writing the schema is usually where people spend the most time when 
they're setting up Solr.
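
As one possible starting point, the sketch below builds a payload of
"add-field" commands of the kind the Solr Schema API accepts. The field
names, the choice of types, and the set of searchable fields are assumptions
for illustration, and the type names string, pint and pdate are assumed to
exist in the configset in use. Only searched fields are marked indexed;
everything is stored so the whole message can be returned.

```python
import json

# Hypothetical mapping from field name to Solr field type.
FIELD_TYPES = {
    "BeginString": "string",
    "BodyLength": "pint",
    "MsgType": "string",
    "OrderID": "string",
    "TradeDate": "string",
    "OrigSendingTime": "pdate",
    "raw_message": "string",
}


def add_field_commands(field_types, searchable):
    """Return a payload with one field definition per entry."""
    return {
        "add-field": [
            {
                "name": name,
                "type": field_type,
                "indexed": name in searchable,  # searched fields are indexed
                "stored": True,                 # returned fields are stored
            }
            for name, field_type in field_types.items()
        ]
    }


payload = add_field_commands(FIELD_TYPES, searchable={"OrderID", "TradeDate"})
print(json.dumps(payload, indent=2))
```

The printed JSON would be POSTed to the collection's schema endpoint; the
exact URL depends on the installation.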

Thanks,
Shawn