An Intro to External Data Representation and Marshalling

At runtime, a program usually holds its data in data structures. For example, a library management system might represent a book as a “Book” object, which in turn may contain primitive data items such as “title” and “ISBN” and more complex objects such as “author”.

However, the system cannot send this “Book” data structure as-is to another library management system. An in-memory structure contains things like pointers (memory addresses) that are meaningless on another machine, and the two machines may even lay out primitive values differently (for example, in a different byte order). So the “Book” data structure must be flattened (converted to a sequence of bytes) before transmission and rebuilt at the destination. The same applies to almost any data structure that has to travel over a medium such as a network.

This is where concepts like marshalling and external data representation come into play.

External Data Representation (XDR)

XDR is a standard data serialization format, originally developed by Sun Microsystems and specified in RFC 4506, for transmitting data between computers with different architectures. Converting data from a local representation to XDR is called encoding, and converting it from XDR back to a local representation is called decoding. XDR is portable across operating systems and independent of the transport layer.

XDR uses a basic block size of four bytes, and multi-byte values are serialized in big-endian order. Data types smaller than four bytes still occupy a full four bytes after encoding. Variable-length strings and opaque data are prefixed with a four-byte length and padded with zero bytes so that their total size is a multiple of four. Floating-point numbers are represented in the IEEE 754 format.
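To make these rules concrete, here is a minimal Java sketch of a few XDR encodings. It is not a complete XDR implementation: the class and method names (XdrSketch, encodeInt, encodeString and so on) are hypothetical, and strings are assumed to be plain ASCII.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// A minimal XDR-style encoder/decoder sketch, not a complete XDR library.
final class XdrSketch {

    // Number of zero bytes needed to round n up to a multiple of four.
    static int padding(int n) {
        return (4 - (n % 4)) % 4;
    }

    // int: one four-byte unit. ByteBuffer defaults to big-endian, as XDR requires.
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // bool: there is no one-byte form; it is encoded as a full four-byte int (0 or 1).
    static byte[] encodeBool(boolean value) {
        return encodeInt(value ? 1 : 0);
    }

    // float: IEEE 754 single precision, four bytes, big-endian.
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    // string: four-byte length, the bytes themselves, then zero padding.
    static byte[] encodeString(String s) {
        byte[] data = s.getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.allocate(4 + data.length + padding(data.length));
        buf.putInt(data.length);
        buf.put(data); // the remaining bytes are already zero and act as padding
        return buf.array();
    }

    // Reads back a string produced by encodeString.
    static String decodeString(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        byte[] data = new byte[buf.getInt()];
        buf.get(data);
        return new String(data, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeBool(true)));   // [0, 0, 0, 1]
        System.out.println(encodeString("Smith").length);        // 12 (4 + 5 + 3 padding)
        System.out.println(decodeString(encodeString("Smith"))); // Smith
    }
}
```

With this sketch, the five-character string "Smith" becomes 12 bytes on the wire: a four-byte length, the five characters, and three zero bytes of padding.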

Marshalling

Marshalling is the process of transforming a collection of data items into a form that is suitable for transmission in a message. At the destination, the message is unmarshalled to produce the original data items. In other words, marshalling converts structured data items and primitive values into an external data representation, and unmarshalling recovers the primitive values from that external data representation and rebuilds the data structures.
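As a rough sketch of marshalling and unmarshalling, the Java code below flattens the “Book” example from the introduction into bytes and rebuilds it. The Book and Author records and the field order are assumptions made for illustration, and DataOutputStream.writeUTF (a length-prefixed, modified UTF-8 string encoding) stands in for a real external data representation; the only requirement is that both ends agree on the layout. Records need Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class BookMarshalling {

    // Hypothetical types mirroring the article's "Book" example.
    record Author(String name) {}
    record Book(String title, String isbn, Author author) {}

    // Marshalling: flatten a Book into a sequence of bytes.
    static byte[] marshal(Book book) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(book.title());
            out.writeUTF(book.isbn());
            out.writeUTF(book.author().name()); // the nested object is flattened in place
        }
        return bytes.toByteArray();
    }

    // Unmarshalling: read the fields back in the same order and rebuild the objects.
    static Book unmarshal(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String title = in.readUTF();
            String isbn = in.readUTF();
            Author author = new Author(in.readUTF());
            return new Book(title, isbn, author);
        }
    }

    public static void main(String[] args) throws IOException {
        Book original = new Book("Example Title", "0000000000", new Author("Jane Doe"));
        Book copy = unmarshal(marshal(original));
        System.out.println(copy.equals(original)); // true
    }
}
```

Note that the byte stream holds only values, not field names or types; if the sender and receiver disagreed on the field order, the receiver would silently rebuild the wrong object.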

Now that we know what an external data representation is and what marshalling does, let’s look at three different approaches to them.

CORBA’s Common Data Representation (CDR)

The Common Object Request Broker Architecture, or CORBA for short, is a standard defined by the Object Management Group (OMG) to facilitate communication between systems on diverse platforms. CORBA’s Common Data Representation (CDR) is used during remote invocations between CORBA distributed objects to represent the structured and primitive data types that are passed as arguments or results.

CDR can represent 15 primitive data types, such as short, long, float, double and char. Each argument or result in a remote invocation is represented by a sequence of bytes in the invocation or result message.

(Figure: CORBA Common Data Representation)
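The sketch below imitates how CDR might lay out a value of a hypothetical IDL type `struct Person { string name; string place; unsigned long year; }`. It is deliberately simplified: it covers only strings and unsigned longs, and it measures alignment from the start of its own buffer, whereas a real ORB computes alignment relative to the message or encapsulation it is writing into. It illustrates three CDR traits: values are aligned on their natural boundaries, a string carries a length that counts its terminating NUL, and the sender writes in its own byte order. The bytes also carry no type information, so the receiver relies on the shared IDL definition to interpret them. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified CDR-style writer for one struct; not a real ORB's marshaller.
final class CdrSketch {

    private final ByteBuffer buf;

    CdrSketch(ByteOrder senderOrder) {
        // CDR lets the sender use its own byte order; the receiver is told
        // which order was used and swaps bytes only if its own order differs.
        this.buf = ByteBuffer.allocate(256).order(senderOrder);
    }

    // Skip ahead to the next multiple of `boundary` (measured from the start of
    // this buffer, a simplifying assumption); the skipped bytes stay zero.
    private void align(int boundary) {
        int remainder = buf.position() % boundary;
        if (remainder != 0) {
            buf.position(buf.position() + boundary - remainder);
        }
    }

    // unsigned long: four bytes on a four-byte boundary.
    CdrSketch writeULong(long value) {
        align(4);
        buf.putInt((int) value); // the low 32 bits carry the unsigned value
        return this;
    }

    // string: unsigned long length (counting the terminating NUL), chars, NUL.
    CdrSketch writeString(String s) {
        byte[] chars = s.getBytes(StandardCharsets.ISO_8859_1);
        writeULong(chars.length + 1);
        buf.put(chars).put((byte) 0);
        return this;
    }

    byte[] toBytes() {
        return Arrays.copyOf(buf.array(), buf.position());
    }

    public static void main(String[] args) {
        // Hypothetical struct Person { string name; string place; unsigned long year; }
        byte[] person = new CdrSketch(ByteOrder.BIG_ENDIAN)
                .writeString("Smith")
                .writeString("London")
                .writeULong(1984)
                .toBytes();
        System.out.println(person.length); // 28
    }
}
```

For the sample values "Smith", "London" and 1984 this produces 28 bytes, including the zero bytes inserted so that each unsigned long starts on a four-byte boundary.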

Java Object Serialization

Java provides built-in serialization for primitive data types and for objects whose classes implement either the java.io.Serializable or the java.io.Externalizable interface. Serializable classes are serialized automatically, while Externalizable classes control their own format through the writeExternal and readExternal methods. These serializable objects can then be used as arguments and results in Java Remote Method Invocation (RMI).

In essence, Java object serialization flattens an object, or a set of connected objects, into a stream of bytes that can be transmitted over a network or stored on disk for later use. Deserialization reconstructs the object or set of objects from that serialized stream. When an object is serialized, all of the objects reachable from it through its fields are serialized as well, except those held in fields declared transient.
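A minimal round trip with the standard ObjectOutputStream and ObjectInputStream classes illustrates both points: the reachable Author object travels with the Book, while the transient field does not. The Book and Author classes, their fields, and the helper method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationDemo {

    static class Author implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Author(String name) { this.name = name; }
    }

    static class Book implements Serializable {
        private static final long serialVersionUID = 1L;
        final String title;
        final Author author;          // reachable object: serialized along with the Book
        transient String cachedLabel; // transient: skipped during serialization

        Book(String title, Author author) {
            this.title = title;
            this.author = author;
            this.cachedLabel = title + " by " + author.name;
        }
    }

    // Flatten an object graph into bytes.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object graph from bytes.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Book book = new Book("Example Title", new Author("Jane Doe"));
        Book copy = (Book) deserialize(serialize(book));
        System.out.println(copy.author.name); // Jane Doe
        System.out.println(copy.cachedLabel); // null
    }
}
```

Because deserialization does not run Book’s own constructor, cachedLabel comes back as null rather than being recomputed.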

In addition to the state of the objects, the serialized form includes some information about their classes. This information helps the recipient load the appropriate class when deserializing the object. The class information contains a version number for the class, its serialVersionUID. A class can declare this number explicitly; if it does not, Java computes it from the class’s structure, so it typically changes whenever the class is modified. The recipient can therefore use it to verify whether it has a compatible version of the class.
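A small sketch with hypothetical Book and Magazine classes shows how to inspect this number through ObjectStreamClass.lookup. Book declares serialVersionUID explicitly, which is a common way to keep the version stable across harmless edits; Magazine relies on the value Java computes. If the number recorded in a stream does not match the recipient’s local class, ObjectInputStream rejects the stream with an InvalidClassException.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class VersionDemo {

    // Declared explicitly: stays 1L until a developer changes it.
    static class Book implements Serializable {
        private static final long serialVersionUID = 1L;
        String title;
    }

    // Not declared: Java computes a hash from the class's structure,
    // so adding or removing members will usually change it.
    static class Magazine implements Serializable {
        String title;
    }

    public static void main(String[] args) {
        System.out.println(ObjectStreamClass.lookup(Book.class).getSerialVersionUID()); // 1
        System.out.println(ObjectStreamClass.lookup(Magazine.class).getSerialVersionUID());
    }
}
```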

Extensible Markup Language (XML)

XML is a markup language that defines a set of rules for encoding data in a format that is both human-readable and machine-readable. It was defined by the World Wide Web Consortium (W3C) for general use on the web. XML uses a textual encoding to represent both its data and its structure.

XML documents follow a hierarchical structure, which makes them intuitive for humans to read and easy for machines to parse into tree data structures. XML data items are tagged with ‘markup’ strings. The tags describe the logical structure of the data and associate attribute-value pairs with parts of that structure.

However, unlike markup languages such as HTML, XML is not limited to a fixed set of tags. It is extensible in the sense that users can define and use their own tags for data representation. But if an XML document is intended to be used by more than one application, those applications must agree on the tag names. This is done using XML schemas, which define the permitted elements and their structure, and namespaces, which qualify tag names so that tags from different vocabularies do not clash.
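As a sketch of XML used as an external data representation, the Java code below marshals the “Book” example into XML with the JDK’s built-in DOM and Transformer APIs. The tag names (book, title, author), the isbn attribute and the BookXml class are hypothetical, application-defined choices; that freedom is exactly what XML’s extensibility provides, and any application reading the document would need to agree on them.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: marshal a book's fields into XML with application-defined tags.
final class BookXml {

    static String toXml(String title, String isbn, String authorName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element book = doc.createElement("book"); // custom tag chosen by the application
        book.setAttribute("isbn", isbn);          // attribute-value pair on the element
        doc.appendChild(book);

        Element titleEl = doc.createElement("title");
        titleEl.setTextContent(title);
        book.appendChild(titleEl);

        Element author = doc.createElement("author");
        author.setTextContent(authorName);
        book.appendChild(author);

        // Serialize the tree to text; special characters such as & are escaped.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml("Example Title", "0000000000", "Jane Doe"));
    }
}
```

Building the document through DOM rather than by concatenating strings means characters like & and < in the data are escaped automatically.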

The following snippet is an example SOAP message represented in XML. (source: https://www.w3schools.com/xml/xml_soap.asp)

```xml
<?xml version="1.0"?>

<soap:Envelope
  xmlns:soap="http://www.w3.org/2003/05/soap-envelope/"
  soap:encodingStyle="http://www.w3.org/2003/05/soap-encoding">

  <soap:Body>
    <m:GetPrice xmlns:m="https://www.w3schools.com/prices">
      <m:Item>Apples</m:Item>
    </m:GetPrice>
  </soap:Body>

</soap:Envelope>
```
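On the receiving side, the prefixes in a message like this are only shorthand: what identifies a tag is its namespace URI together with its local name. A minimal sketch of reading the Item value with the JDK’s namespace-aware DOM parser follows; the SoapItemReader class and readItem method are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class SoapItemReader {

    // Namespace URI bound to the "m" prefix in the example message.
    static final String PRICES_NS = "https://www.w3schools.com/prices";

    static String readItem(String soapXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default; required for the *NS lookup below
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(soapXml.getBytes(StandardCharsets.UTF_8)));
        // Match on namespace URI + local name, so the prefix ("m") does not matter.
        NodeList items = doc.getElementsByTagNameNS(PRICES_NS, "Item");
        return items.getLength() == 0 ? null : items.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String soap = "<?xml version=\"1.0\"?>"
                + "<soap:Envelope xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope/\""
                + " soap:encodingStyle=\"http://www.w3.org/2003/05/soap-encoding\">"
                + "<soap:Body>"
                + "<m:GetPrice xmlns:m=\"https://www.w3schools.com/prices\">"
                + "<m:Item>Apples</m:Item>"
                + "</m:GetPrice>"
                + "</soap:Body>"
                + "</soap:Envelope>";
        System.out.println(readItem(soap)); // Apples
    }
}
```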

Conclusion

  • Although programs represent data as data structures at runtime, those structures must be converted into sequences of bytes before being transmitted to another system.
  • External Data Representation is a standard data serialization format that can be used to transmit data among different computer architectures.
  • Marshalling is the process of transforming a collection of data items into a form that is suitable for transmission in a message. Unmarshalling is the reverse process.
  • CORBA’s CDR, Java object serialization and XML are three different approaches to external data representation and marshalling.

That’s it for this article. See you in the next one! 😊