Posted to users@cocoon.apache.org by Henning von Bargen <H....@Triestram-Partner.de> on 2000/07/11 09:56:24 UTC

Off-Topic: Is there any standard compressed ASCII format for valid XML documents?

Hi folks,
I know this is off-topic, but there are so many XML experts here, so I hope
I'll get an answer.

Is there any "standard" format for compressing XML files?
I know that I could use GNU zip or so, but what I need is a simple (to
compress and to uncompress) compression that produces ASCII format.

If there is no standard, I'd propose a very simple solution.

Obviously, a valid XML document could be compressed in a simple way by
- replacing end tags with </>
- maybe replacing long tag names with a relative reference to other tags,
  e.g. ":" = reuse the same tag name as the preceding sibling
       ":1" = use the tag name of the first child of the preceding sibling
       ":2" = use the tag name of the second child of the preceding sibling

Then 

<PAGE> 
 <ROWSET skipRows="0" maxRows="3"> 
  <ROW ID="1"> 
   <KE_DATENBESTAND>B</KE_DATENBESTAND> 
   <AUFT_ID>99-5459</AUFT_ID> 
   <STATUS>AKTIV</STATUS> 
   <DAT_EINGANG>10.07.2000</DAT_EINGANG> 
   <DAT_FREIGABE /> 
   <BEZUG /> 
   <BETREFF /> 
   <BNNAME>MESSING</BNNAME> 
  </ROW> 
  <ROW ID="2"> 
   <KE_DATENBESTAND>B</KE_DATENBESTAND> 
   <AUFT_ID>99-5460</AUFT_ID> 
   <STATUS>AKTIV</STATUS> 
   <DAT_EINGANG>10.07.2000</DAT_EINGANG> 
   <DAT_FREIGABE /> 
   <BEZUG /> 
   <BETREFF /> 
   <BNNAME>MESSING</BNNAME> 
  </ROW> 
  <ROW ID="3"> 
   <KE_DATENBESTAND>B</KE_DATENBESTAND> 
   <AUFT_ID>99-5462</AUFT_ID> 
   <STATUS>AKTIV</STATUS> 
   <DAT_EINGANG>10.07.2000</DAT_EINGANG> 
   <DAT_FREIGABE /> 
   <BEZUG /> 
   <BETREFF /> 
   <BNNAME>MESSING</BNNAME> 
  </ROW> 
 </ROWSET> 
</PAGE>

771 characters

becomes

<PAGE> 
 <ROWSET skipRows="0" maxRows="3"> 
  <ROW ID="1"> 
   <KE_DATENBESTAND>B</> 
   <AUFT_ID>99-5459</> 
   <STATUS>AKTIV</> 
   <DAT_EINGANG>10.07.2000</> 
   <DAT_FREIGABE /> 
   <BEZUG /> 
   <BETREFF /> 
   <BNNAME>MASTER</> 
  </> 
  <: ID="2"> 
   <:1>B</> 
   <:2>99-5460</> 
   <:3>AKTIV</> 
   <:4>10.07.2000</> 
   <:5 /> 
   <:6 /> 
   <:7 /> 
   <:8>JONES</> 
  </> 
  <: ID="3"> 
   <:1>B</> 
   <:2>99-5462</> 
   <:3>AKTIV</> 
   <:4>10.07.2000</> 
   <:5 /> 
   <:6 /> 
   <:7 /> 
   <:8>SMITH</> 
  </> 
 </> 
</>

502 characters, about 65% of the original size (502/771). The compression
obviously improves as more similar records are transmitted.
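
The first step of the proposal (end tags only) can be sketched in a few lines of Python. This is only a rough illustration, not a finished tool: the function name abbreviate_end_tags is made up for this example, and it assumes the document contains no comments or CDATA sections in which text that looks like an end tag could appear. The ":" tag name references are left out.

```python
import re

# Matches a named end tag such as </ROW> or </ROW >.
# Self-closing tags like <BEZUG /> contain no "</" and are left alone.
END_TAG = re.compile(r"</[^<>\s/]+\s*>")


def abbreviate_end_tags(xml_text: str) -> str:
    """Replace every named end tag with the short form </>.

    Assumption: no comments or CDATA sections that contain
    end-tag-like text, since a plain regex cannot tell them apart.
    """
    return END_TAG.sub("</>", xml_text)


sample = '<ROW ID="1"><STATUS>AKTIV</STATUS><BEZUG /></ROW>'
short = abbreviate_end_tags(sample)
print(short)
print(len(sample), "->", len(short))
```

The saving per element is the length of the tag name, so long names such as KE_DATENBESTAND benefit most.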

One drawback of this simple solution is that it works well only with
table-like structured data.

Background:
A web server and an RDBMS communicate over a 64k link that is shared
with other applications, so there is very little bandwidth.
The RDBMS produces data in XML; the web server (i.e. Cocoon) uses style
sheets to pretty-format everything.
The RDBMS could produce ASCII-compressed XML in order to use less
bandwidth; a modified DOM or SAX parser that consumes compressed XML should
be easy to implement.
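
On the consuming side, one possible approach (a sketch, not a modified parser) is to restore the end tags with a small stack-based pre-processing step and then hand the result to an unmodified parser. The function name restore_end_tags is hypothetical, Python's standard xml.dom.minidom stands in for whatever parser the web server uses, and the input is assumed to have no XML declaration, comments or CDATA sections. The ":" references are not handled.

```python
import re
import xml.dom.minidom

# One match per tag: group 1 is "/" for an end tag, group 2 is the tag
# name (empty for the short form </>), group 3 is "/" for a self-closing tag.
TAG = re.compile(r"<(/?)([^<>\s/]*)[^<>]*?(/?)>")


def restore_end_tags(short_xml: str) -> str:
    """Expand every end tag (including </>) to the name of the open element.

    Assumption: the input has no XML declaration, comments or CDATA.
    """
    open_names = []  # stack of currently open element names

    def fix(match):
        closing, name, self_closing = match.groups()
        if closing:
            # End tag: the innermost open element is the one being closed.
            return "</" + open_names.pop() + ">"
        if not self_closing:
            open_names.append(name)
        return match.group(0)

    return TAG.sub(fix, short_xml)


short = '<ROW ID="1"><STATUS>AKTIV</><BEZUG /></>'
full = restore_end_tags(short)
print(full)

# The restored text is ordinary XML again, so a standard parser accepts it.
doc = xml.dom.minidom.parseString(full)
print(doc.documentElement.tagName)
```

Keeping the expansion separate from the parser means the style sheets and the rest of the pipeline see ordinary XML and need no changes.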

Henning

Re: Off-Topic: Is there any standard compressed ASCII format for valid XML documents?

Posted by Stefano Mazzocchi <st...@apache.org>.
Henning von Bargen wrote:
> 
> Hi folks,
> I know this is off-topic, but there are so many XML experts here, so I hope
> I'll get an answer.
> 
> Is there any "standard" format for compressing XML files?

XMill

> I know that I could use GNU zip or so, but what I need is a simple (to
> compress and to uncompress) compression that produces ASCII format.

???
 
> If there is no standard, I'd propose a very simple solution.
> 
> Obviously, a valid XML document could be compressed in a simple way by
> - replacing end tags with </>
> - maybe replacing long tag names with a relative reference to other tags,
>   e.g. ":" = reuse the same tag name as the preceding sibling
>        ":1" = use the tag name of the first child of the preceding sibling
>        ":2" = use the tag name of the second child of the preceding sibling
> 
> Then
> 
> <PAGE>
>  <ROWSET skipRows="0" maxRows="3">
>   <ROW ID="1">
>    <KE_DATENBESTAND>B</KE_DATENBESTAND>
>    <AUFT_ID>99-5459</AUFT_ID>
>    <STATUS>AKTIV</STATUS>
>    <DAT_EINGANG>10.07.2000</DAT_EINGANG>
>    <DAT_FREIGABE />
>    <BEZUG />
>    <BETREFF />
>    <BNNAME>MESSING</BNNAME>
>   </ROW>
>   <ROW ID="2">
>    <KE_DATENBESTAND>B</KE_DATENBESTAND>
>    <AUFT_ID>99-5460</AUFT_ID>
>    <STATUS>AKTIV</STATUS>
>    <DAT_EINGANG>10.07.2000</DAT_EINGANG>
>    <DAT_FREIGABE />
>    <BEZUG />
>    <BETREFF />
>    <BNNAME>MESSING</BNNAME>
>   </ROW>
>   <ROW ID="3">
>    <KE_DATENBESTAND>B</KE_DATENBESTAND>
>    <AUFT_ID>99-5462</AUFT_ID>
>    <STATUS>AKTIV</STATUS>
>    <DAT_EINGANG>10.07.2000</DAT_EINGANG>
>    <DAT_FREIGABE />
>    <BEZUG />
>    <BETREFF />
>    <BNNAME>MESSING</BNNAME>
>   </ROW>
>  </ROWSET>
> </PAGE>
> 
> 771 characters
> 
> becomes
> 
> <PAGE>
>  <ROWSET skipRows="0" maxRows="3">
>   <ROW ID="1">
>    <KE_DATENBESTAND>B</>
>    <AUFT_ID>99-5459</>
>    <STATUS>AKTIV</>
>    <DAT_EINGANG>10.07.2000</>
>    <DAT_FREIGABE />
>    <BEZUG />
>    <BETREFF />
>    <BNNAME>MASTER</>
>   </>
>   <: ID="2">
>    <:1>B</>
>    <:2>99-5460</>
>    <:3>AKTIV</>
>    <:4>10.07.2000</>
>    <:5 />
>    <:6 />
>    <:7 />
>    <:8>JONES</>
>   </>
>   <: ID="3">
>    <:1>B</>
>    <:2>99-5462</>
>    <:3>AKTIV</>
>    <:4>10.07.2000</>
>    <:5 />
>    <:6 />
>    <:7 />
>    <:8>SMITH</>
>   </>
>  </>
> </>
> 
> 502 characters, about 65% of the original size (502/771). The compression
> obviously improves as more similar records are transmitted.

Sorry, XMill is much smarter than this :

> One drawback of this simple solution is that it works well only with
> table-like structured data.
> 
> Background:
> A web server and an RDBMS communicate over a 64k link that is shared
> with other applications, so there is very little bandwidth.
> The RDBMS produces data in XML; the web server (i.e. Cocoon) uses style
> sheets to pretty-format everything.
> The RDBMS could produce ASCII-compressed XML in order to use less
> bandwidth; a modified DOM or SAX parser that consumes compressed XML should
> be easy to implement.

Sorry, but I don't remember the URL... try searching for it...

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------



Re: Off-Topic: Is there any standard compressed ASCII format for valid XML documents?

Posted by Jonathan Stimmel <jo...@stimmel.net>.
On Tue, Jul 11, 2000 at 09:56:24AM +0200, Henning von Bargen wrote:

> Is there any "standard" format for compressing XML files?
> I know that I could use GNU zip or so, but what I need is a simple (to
> compress and to uncompress) compression that produces ASCII format.
> 
> If there is no standard, I'd propose a very simple solution.

> The RDBMS could produce data in ASCII-compressed XML in order to use less
> bandwidth, a modified DOM or SAX parser that consumes compressed XML should
> be easy to implement.

If you're really concerned with bandwidth, use a "real" compression
algorithm like gzip; text usually compresses very well. Or better yet,
apply a compression algorithm on the network connection itself, so that
all your apps can benefit from it (why bother developing a separate
compression algorithm for each app, when you can have them all benefit
from a one-time setup? =)
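
If the output really has to stay ASCII, one way to combine both requirements is to gzip the XML and then base64-encode the result. The sketch below uses only Python's standard gzip and base64 modules. The helper names pack and unpack and the generated sample rows are made up for this illustration. Base64 adds roughly one third on top of the compressed size, so this only pays off when the data is repetitive enough.

```python
import base64
import gzip


def pack(xml_text: str) -> str:
    """gzip the text, then base64-encode it so the result is pure ASCII."""
    raw = gzip.compress(xml_text.encode("utf-8"))
    return base64.b64encode(raw).decode("ascii")


def unpack(packed: str) -> str:
    """Reverse of pack(): base64-decode, then gunzip."""
    raw = base64.b64decode(packed)
    return gzip.decompress(raw).decode("utf-8")


# Hypothetical sample data: 100 similar rows, loosely modelled on the
# ROWSET example from the original post.
rows = "".join(
    f'<ROW ID="{i}"><AUFT_ID>99-{5458 + i}</AUFT_ID><STATUS>AKTIV</STATUS></ROW>'
    for i in range(1, 101)
)
document = f"<PAGE><ROWSET>{rows}</ROWSET></PAGE>"

packed = pack(document)
assert unpack(packed) == document  # lossless round trip
print(len(document), "->", len(packed))
```

Compressing at the connection level, as suggested above, avoids the base64 overhead entirely, because the link itself can carry binary data.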