XML, SOAP and Binary Data
February 26, 2003
Editor's note: XML.com is happy to publish this white paper from Don Box and his colleagues, which addresses a long-term issue in XML: the coordinated transport of opaque binary data in conjunction with an XML document. Please use the forum facility at the end of this article to leave your comments and questions. -- ED.
Version 1.0
February 24, 2003
© 2003 BEA Systems, Microsoft Corporation. All rights reserved.
Abstract
This white paper discusses the architectural issues encountered when using opaque non-XML data in XML applications, including (but not limited to) Web services and SOAP.
Status
This white paper is provided as-is and for review and evaluation only. Microsoft and BEA hope to solicit your contributions and suggestions in the near future. BEA and Microsoft make no warranties or representations regarding this document in any manner whatsoever.
1. Introduction
The desire to integrate XML with pre-existing data formats has been a long-standing and persistent issue for the XML community. Users often want to leverage the structured, extensible markup conventions of XML without abandoning existing data formats that do not readily adhere to XML 1.0 syntax. Often, users want to leave their existing non-XML formats as is, to be treated as opaque sequences of octets by XML tools and infrastructure. Such an approach allows widely used formats such as JPEG and WAV to peacefully coexist with XML.
As XML is increasingly used as a message format (e.g., SOAP), the interest in integrating opaque data with XML has increased to the point where there are at least two competing proposals for doing so (SOAP With Attachments (SwA) and WS-Attachments). Because SwA was the first widely-publicized mechanism for dealing with binary data, it has had a large influence on how the community views the issues surrounding this topic.
Unfortunately, SwA (as well as WS-Attachments) conflates several orthogonal issues. Specifically, both SwA and WS-Attachments assume that a URI-based referencing mechanism by itself is sufficient for supporting opaque binary values in messages. Moreover, at least one of the proposals (SwA) attempts to solve a problem that is in no way limited to SOAP, namely how URIs that appear as XML element or attribute content are to be resolved in the presence of multipart MIME.
As field experience with both SwA and WS-Attachments has shown, the lack of an XML-focused approach to opaque data has led to solutions that are unnecessarily complex for developers and software components. This white paper attempts to present the various issues raised by dealing with opaque data in XML, without nominating a particular solution.
2. Current Approaches to Opaque Data in XML
2.1 Embedding
Traditionally, two techniques have been used for dealing with opaque data in XML: "by value" and "by reference." The former is achieved by embedding opaque data as element or attribute content. XML supports opaque data as content through the use of either base64 or hexadecimal text encoding. This approach is codified by XML Schema's two binary data types, xs:base64Binary and xs:hexBinary. The lexical representation of xs:hexBinary is a simple hexadecimal character sequence; the lexical representation of xs:base64Binary uses the base64 algorithm as defined by RFC 2045 [rfc2045]. The underlying value space of both types is identical: an ordered sequence of octets.
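The shared value space can be illustrated with a minimal sketch. It assumes Python's standard library, which the paper itself does not use, and the variable names and sample octets are chosen purely for illustration. The same octet sequence is written in both lexical forms, and each form decodes back to identical bytes.

```python
import base64
import binascii

# Arbitrary sample octets, chosen only for illustration.
octets = bytes([0xFD, 0xA5, 0x8A, 0x29, 0xAA, 0x46, 0x1B, 0x24])

# Hexadecimal lexical form (two characters per octet), as used by xs:hexBinary.
hex_lexical = binascii.hexlify(octets).decode("ascii").upper()

# Base64 lexical form (four characters per three octets), as used by xs:base64Binary.
b64_lexical = base64.b64encode(octets).decode("ascii")

# Both lexical forms map back to the same ordered sequence of octets.
from_hex = bytes.fromhex(hex_lexical)
from_b64 = base64.b64decode(b64_lexical)
assert from_hex == from_b64 == octets

print(hex_lexical)
print(b64_lexical)
```

Only the character-level representation differs between the two types; an application that works with the decoded octets should not need to care which one was used on the wire.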
The following XML instance demonstrates the use of base64 in a simple XML document.
<m:data xmlns:m='http://example.org/people' >
  <photo>/aWKKapGGyQ=</photo>
  <sound>sdcfo2JTiXE=</sound>
  <hash>Faa7vROi2VQ=</hash>
</m:data>

In this example, the photo, sound, and hash elements each contain a base64 string (i.e., a sequence of characters) that represents the following octet sequences:
fd a5 8a 29 aa 46 1b 24 (photo)
b1 d7 1f a3 62 53 89 71 (sound)
15 a6 bb bd 13 a2 d9 54 (hash)

The fact that the children of the photo, sound, and hash elements are encoded as base64 is implicit (although discoverable through an XML Schema or RELAX NG schema), but can be made explicit using xsi:type or an application-specific annotation.
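The mapping from the base64 strings to the octet sequences listed above can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree and base64 modules. The helper name decode_children is hypothetical, and the sketch simply treats every child element as base64, which is exactly the kind of implicit knowledge the paragraph above describes.

```python
import base64
import xml.etree.ElementTree as ET

# The XML instance from the example above.
DOC = """<m:data xmlns:m='http://example.org/people'>
  <photo>/aWKKapGGyQ=</photo>
  <sound>sdcfo2JTiXE=</sound>
  <hash>Faa7vROi2VQ=</hash>
</m:data>"""


def decode_children(xml_text):
    """Return {element name: decoded octets} for each child of the root.

    Assumes (hypothetically) that every child holds base64 text; a real
    application would learn this from a schema or an xsi:type annotation.
    """
    root = ET.fromstring(xml_text)
    return {child.tag: base64.b64decode(child.text.strip()) for child in root}


decoded = decode_children(DOC)
for name, octets in decoded.items():
    # Print each value as space-separated hexadecimal octets.
    print(name, " ".join(f"{b:02x}" for b in octets))
```

Each printed line should match the corresponding octet sequence given in the text.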
It is well known that base64-encoded data expands to roughly 1.33 times its original size, and that hexadecimal-encoded data expands to 2 times its original size (assuming an underlying UTF-8 text encoding in both cases; if the underlying text encoding is UTF-16, these numbers double). Also of concern is the overhead in processing costs (both real and perceived) for these formats, especially when decoding back into raw binary. When comparing base64 decoding to a straight-through copy of opaque data, the throughput of at least one popular programming system decreased by a factor of 3 or more.
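The size figures follow from the encodings themselves: base64 emits 4 characters per 3 octets and hexadecimal emits 2 characters per octet. The sketch below works through that arithmetic under assumptions of our own, namely Python's standard library, a hypothetical helper named expansion, and a 3000-octet random payload (a multiple of 3, so no base64 padding skews the ratio). It measures size only and does not attempt to reproduce the throughput observation.

```python
import base64
import binascii
import os


def expansion(raw, text_encoding="utf-8"):
    """Return (base64 factor, hex factor): encoded byte size / raw byte size."""
    b64_chars = base64.b64encode(raw).decode("ascii")
    hex_chars = binascii.hexlify(raw).decode("ascii")
    # Serialize the characters in the chosen text encoding and count bytes.
    b64_size = len(b64_chars.encode(text_encoding))
    hex_size = len(hex_chars.encode(text_encoding))
    return b64_size / len(raw), hex_size / len(raw)


raw = os.urandom(3000)  # hypothetical payload; length is a multiple of 3

utf8_factors = expansion(raw, "utf-8")
# "utf-16-le" is used so that no byte-order mark is counted in the size.
utf16_factors = expansion(raw, "utf-16-le")

print("UTF-8  (base64, hex):", utf8_factors)
print("UTF-16 (base64, hex):", utf16_factors)
```

Under these assumptions the UTF-8 factors come out at about 1.33 and exactly 2, and the UTF-16 factors are double those, in line with the figures quoted above.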