An Intro to External Data Representation and Marshalling

At runtime, a program usually holds the data it works with in data structures. For example, a library management system might represent a book as a “Book” object, which in turn may consist of primitive data items such as the title and ISBN, and more complex objects such as the author.

However, this library management system cannot simply send its in-memory “Book” data structure to another library management system. In memory, the object refers to other objects (such as the author) through references that are only meaningful inside the running program, and its layout depends on the language, the compiler and the machine architecture (for example, its byte order). So, before transmission, the “Book” data structure must be flattened into a sequence of bytes, and the receiving system must rebuild an equivalent structure from those bytes. The same applies to almost any data structure that has to travel over a medium such as a network.

This is where concepts like marshalling and external data representation come into play.

External Data Representation (XDR)

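External Data Representation (XDR) is a standard serialization format, originally developed by Sun Microsystems and now specified in RFC 4506, for exchanging data between machines with different architectures. XDR uses a basic unit of four bytes and encodes values in big-endian byte order. Values that would need fewer than four bytes, such as booleans, still occupy a full four bytes after encoding. Variable-length data such as strings and variable-length opaque data are encoded as a four-byte length followed by the data itself, zero-padded so that the total is a multiple of four bytes. Floating-point numbers are represented using the IEEE 754 format.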
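The rules above can be sketched as a tiny encoder. This is a minimal, hypothetical `XdrEncoder` class that covers only four XDR types (int, bool, double and string) and is not a complete or official XDR library; it relies on the fact that Java's `ByteBuffer` is big-endian by default and uses IEEE 754 for `putDouble`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Minimal, hypothetical XDR encoder covering only int, bool, double and string.
public class XdrEncoder {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    private void put(byte[] bytes) {
        out.write(bytes, 0, bytes.length);
    }

    // XDR int: 32-bit two's complement, big-endian (ByteBuffer's default order).
    public XdrEncoder writeInt(int value) {
        put(ByteBuffer.allocate(4).putInt(value).array());
        return this;
    }

    // XDR bool: encoded as a full 4-byte int, 1 for true and 0 for false.
    public XdrEncoder writeBool(boolean value) {
        return writeInt(value ? 1 : 0);
    }

    // XDR double: 8-byte IEEE 754 double precision, big-endian.
    public XdrEncoder writeDouble(double value) {
        put(ByteBuffer.allocate(8).putDouble(value).array());
        return this;
    }

    // XDR string: 4-byte length, the ASCII bytes, then zero padding to a multiple of 4.
    public XdrEncoder writeString(String s) {
        byte[] data = s.getBytes(StandardCharsets.US_ASCII);
        writeInt(data.length);
        put(data);
        int padding = (4 - data.length % 4) % 4;
        for (int i = 0; i < padding; i++) {
            out.write(0);
        }
        return this;
    }

    public byte[] toByteArray() {
        return out.toByteArray();
    }

    // Renders bytes as lowercase hex for inspection.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] encoded = new XdrEncoder()
                .writeInt(42)
                .writeBool(true)
                .writeString("abc")
                .toByteArray();
        // prints 0000002a000000010000000361626300
        System.out.println(hex(encoded));
    }
}
```

The boolean takes four bytes even though it carries a single bit of information, and the three-character string takes eight bytes: four for the length, three for the characters and one byte of padding.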

Marshalling

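Marshalling is the process of transforming a collection of data items into a form that is suitable for transmission in a message, and unmarshalling is the reverse process of rebuilding those data items from the message at the destination. An external data representation defines the agreed format that the marshalled data takes while in transit. With that in mind, let’s look at three different approaches to external data representation and marshalling.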
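To make marshalling concrete, here is a minimal sketch that flattens the hypothetical Book from the introduction into bytes and rebuilds it. The `Book` and `Author` classes and the field order are assumptions for illustration; both sides simply have to agree on that order. It uses `DataOutputStream.writeUTF`, which writes a 2-byte length followed by modified UTF-8, so this is an ad hoc format rather than XDR or any of the standard formats discussed below.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BookMarshalling {
    // Hypothetical types mirroring the article's "Book" example.
    public static final class Author {
        public final String name;
        public Author(String name) { this.name = name; }
    }

    public static final class Book {
        public final String title;
        public final String isbn;
        public final Author author;
        public Book(String title, String isbn, Author author) {
            this.title = title;
            this.isbn = isbn;
            this.author = author;
        }
    }

    // Marshal: flatten the object graph into bytes in an agreed order.
    public static byte[] marshal(Book book) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(book.title);       // 2-byte length + modified UTF-8
            out.writeUTF(book.isbn);
            out.writeUTF(book.author.name); // nested object flattened inline
        }
        return bytes.toByteArray();
    }

    // Unmarshal: read the fields back in the same order and rebuild the objects.
    public static Book unmarshal(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String title = in.readUTF();
            String isbn = in.readUTF();
            Author author = new Author(in.readUTF());
            return new Book(title, isbn, author);
        }
    }

    public static void main(String[] args) throws IOException {
        Book original = new Book("Example Title", "000-0-00-000000-0", new Author("Jane Doe"));
        byte[] data = marshal(original);
        Book copy = unmarshal(data);
        // prints 44 bytes: Example Title, 000-0-00-000000-0, Jane Doe
        System.out.println(data.length + " bytes: " + copy.title + ", " + copy.isbn + ", " + copy.author.name);
    }
}
```

If the receiver read the fields in a different order, it would silently produce a wrong Book, which is exactly why standard external data representations exist.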

CORBA’s Common Data Representation (CDR)

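CDR can represent 15 primitive data types, including short (16-bit), long (32-bit), unsigned short, unsigned long, float (32-bit), double (64-bit), char, boolean, octet (8-bit) and any, along with constructed types such as sequences, strings, arrays, structs, enumerations and unions. Each argument or result in a remote invocation is represented by a sequence of bytes in the invocation or result message. The type of each data item is not included in the message; the sender and recipient are assumed to share knowledge of the order and types of the data items.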

[Figure: CORBA Common Data Representation]
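The following is a simplified sketch of CDR-style marshalling for a hypothetical IDL type `struct Person { string name; string place; unsigned long year; }`. It assumes three CDR rules: each primitive value is aligned on a boundary equal to its size, a string is an unsigned long length (counting a terminating NUL) followed by the characters and that NUL, and the sender may use its own byte order. In real GIOP messages the byte order is signalled by a flag in the message header and alignment is measured from the start of the message; here the byte order is just a parameter and alignment is measured from the start of this buffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Simplified, hypothetical CDR-style writer (only unsigned long and string).
public class CdrSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final ByteOrder order;

    public CdrSketch(ByteOrder order) {
        this.order = order;
    }

    // Pad with zero bytes until the position is a multiple of 'boundary'.
    private void align(int boundary) {
        while (out.size() % boundary != 0) {
            out.write(0);
        }
    }

    // unsigned long: 4 bytes, aligned on a 4-byte boundary, in the sender's byte order.
    public CdrSketch writeUnsignedLong(long value) {
        align(4);
        byte[] b = ByteBuffer.allocate(4).order(order).putInt((int) value).array();
        out.write(b, 0, b.length);
        return this;
    }

    // string: length including the terminating NUL, then the characters, then NUL.
    public CdrSketch writeString(String s) {
        byte[] chars = s.getBytes(StandardCharsets.ISO_8859_1);
        writeUnsignedLong(chars.length + 1);
        out.write(chars, 0, chars.length);
        out.write(0);
        return this;
    }

    public byte[] toByteArray() {
        return out.toByteArray();
    }

    // Marshals the hypothetical Person struct field by field, in IDL order.
    public static byte[] marshalPerson(String name, String place, long year, ByteOrder order) {
        return new CdrSketch(order)
                .writeString(name)
                .writeString(place)
                .writeUnsignedLong(year)
                .toByteArray();
    }

    public static void main(String[] args) {
        byte[] big = marshalPerson("Smith", "London", 1984, ByteOrder.BIG_ENDIAN);
        byte[] little = marshalPerson("Smith", "London", 1984, ByteOrder.LITTLE_ENDIAN);
        // prints 28 28
        System.out.println(big.length + " " + little.length);
    }
}
```

Both encodings are 28 bytes: the name's length occupies bytes 0 to 3, "Smith" plus NUL bytes 4 to 9, two padding bytes bring the place's length to offset 12, "London" plus NUL ends at byte 22, and one padding byte aligns the year at offset 24. Only the order of bytes within each number differs between the two, and because no type tags are written, the receiver must know the struct layout in advance.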

Java Object Serialization

Java object serialization flattens an object, or a set of connected objects, into a stream of bytes that can be transmitted over a network or stored on disk for later use. Deserialization reconstructs the object, or the set of objects, from that serialized stream. For an object to be serialized, its class must implement the java.io.Serializable interface. When an object is serialized, all of the objects reachable from it through its fields are serialized as well, except those referenced by fields declared transient.
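Here is a minimal sketch of that behaviour using the standard `ObjectOutputStream` and `ObjectInputStream` classes. The `Book` and `Author` classes and the `cachedLabel` field are hypothetical; the point is that the reachable `Author` travels with the `Book`, while the transient field does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo {
    // Hypothetical classes; both must implement Serializable to be written.
    public static class Author implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String name;
        public Author(String name) { this.name = name; }
    }

    public static class Book implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String title;
        public final String isbn;
        public final Author author;          // reachable object: serialized too
        public transient String cachedLabel; // transient: skipped, null after deserialization

        public Book(String title, String isbn, Author author) {
            this.title = title;
            this.isbn = isbn;
            this.author = author;
        }
    }

    // Flattens the object graph rooted at obj into a byte array.
    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a new, equivalent object graph from the bytes.
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Book book = new Book("Example Title", "000-0-00-000000-0", new Author("Jane Doe"));
        book.cachedLabel = "Example Title by Jane Doe";
        Book copy = (Book) deserialize(serialize(book));
        System.out.println(copy.author.name); // prints Jane Doe
        System.out.println(copy.cachedLabel); // prints null
    }
}
```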

In addition to the state of the objects, the serialized form includes information about their classes, such as the class name and a version number. The recipient uses this information to load the appropriate class when deserializing the object, and the version number lets it check whether it has a compatible version of that class. In Java, this version number is the class’s serialVersionUID. If the class does not declare one, it is computed from the class’s structure, so many changes to the class, such as adding a field, change it; an explicitly declared value only changes when the developer updates it. If the sender’s and recipient’s version numbers do not match, deserialization fails with an InvalidClassException.
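The class information is visible in the serialized bytes themselves. This sketch uses the standard `ObjectStreamClass.lookup(...).getSerialVersionUID()` to read the version number the runtime will write, and checks that the class name appears in the stream. The `Book` class and the value `2L` are hypothetical; the stream is also expected to start with the serialization magic number `0xACED`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class VersionDemo {
    public static class Book implements Serializable {
        // Explicit version number; if omitted, Java derives one from the class structure.
        private static final long serialVersionUID = 2L;
        public String title = "Example Title";
    }

    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Version number written into (and checked against) the stream.
        long uid = ObjectStreamClass.lookup(Book.class).getSerialVersionUID();
        byte[] data = serialize(new Book());
        // ISO-8859-1 maps each byte to one char, so ASCII class names can be searched for.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println("serialVersionUID = " + uid);        // prints serialVersionUID = 2
        System.out.println(raw.contains(Book.class.getName())); // prints true
        System.out.printf("magic = %02x%02x%n", data[0] & 0xff, data[1] & 0xff); // prints magic = aced
    }
}
```

A recipient whose copy of `Book` declared a different serialVersionUID would reject this stream instead of guessing how to map the fields.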

Extensible Markup Language (XML)

XML has a hierarchical structure, which makes documents intuitive for humans to read and easy for machines to parse into data structures such as trees. XML data items are tagged with ‘markup’ strings. The tags describe the logical structure of the data and associate attribute-value pairs with parts of that structure.
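As a small illustration of the tree view, this sketch parses a hypothetical XML rendering of the Book example with the JDK's DOM parser (`javax.xml.parsers.DocumentBuilderFactory`) and walks down from the root element. The element and attribute names are assumptions chosen for the example, not a standard vocabulary.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlTreeDemo {
    // Hypothetical markup for the Book example: elements nest, attributes annotate.
    public static final String BOOK_XML =
            "<book isbn=\"000-0-00-000000-0\">"
          + "<title>Example Title</title>"
          + "<author><name>Jane Doe</name></author>"
          + "</book>";

    // Parses the document and returns the root of the resulting tree.
    public static Element parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element book = parse(BOOK_XML);
        String isbn = book.getAttribute("isbn"); // attribute-value pair
        String title = book.getElementsByTagName("title").item(0).getTextContent();
        // Walk down the tree: book -> author -> name
        Element author = (Element) book.getElementsByTagName("author").item(0);
        String name = author.getElementsByTagName("name").item(0).getTextContent();
        // prints Example Title / 000-0-00-000000-0 / Jane Doe
        System.out.println(title + " / " + isbn + " / " + name);
    }
}
```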

However, unlike markup languages such as HTML, XML is not limited to a fixed set of tags. It is extensible in the sense that users can define and use their own tags for data representation. But if an XML document is intended to be used by more than one application, those applications must agree on the tag names. This is done using XML schemas, which define the permitted elements and how they are structured, and XML namespaces, which qualify tag names so that vocabularies from different sources do not clash.

The following snippet is an example SOAP message represented in XML. (source: https://www.w3schools.com/xml/xml_soap.asp)

<?xml version="1.0"?>

<soap:Envelope
  xmlns:soap="http://www.w3.org/2003/05/soap-envelope/"
  soap:encodingStyle="http://www.w3.org/2003/05/soap-encoding">

  <soap:Body>
    <m:GetPrice xmlns:m="https://www.w3schools.com/prices">
      <m:Item>Apples</m:Item>
    </m:GetPrice>
  </soap:Body>

</soap:Envelope>
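To show what namespaces buy the receiving application, this sketch parses the SOAP message above with a namespace-aware DOM parser and looks up `Item` by its namespace URI rather than by its `m` prefix. The class and method names are hypothetical; `setNamespaceAware` and `getElementsByTagNameNS` are standard JDK DOM APIs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class SoapItemReader {
    // The namespace URI bound to the "m" prefix in the SOAP snippet.
    public static final String PRICES_NS = "https://www.w3schools.com/prices";

    public static final String SOAP_MESSAGE =
            "<?xml version=\"1.0\"?>"
          + "<soap:Envelope xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope/\""
          + " soap:encodingStyle=\"http://www.w3.org/2003/05/soap-encoding\">"
          + "<soap:Body>"
          + "<m:GetPrice xmlns:m=\"https://www.w3schools.com/prices\">"
          + "<m:Item>Apples</m:Item>"
          + "</m:GetPrice>"
          + "</soap:Body>"
          + "</soap:Envelope>";

    // Returns the text of the first Item in PRICES_NS, or null if there is none.
    public static String readItem(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default; needed to match on namespace URIs
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Match by (namespace URI, local name); the prefix is only a local alias.
        NodeList items = doc.getElementsByTagNameNS(PRICES_NS, "Item");
        return items.getLength() == 0 ? null : items.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readItem(SOAP_MESSAGE)); // prints Apples
    }
}
```

Because the lookup uses the namespace URI, a sender could use any prefix in place of `m` and the reader would still find the item, while an `Item` element from a different vocabulary would be ignored even if it used the same prefix.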

Conclusion

  • External Data Representation is a standard data serialization format that can be used to transmit data among different computer architectures.
  • Marshalling is the process of transforming a collection of data items into a form that is suitable for transmission in a message. Unmarshalling is the reverse process.
  • CORBA’s CDR, Java object serialization and XML are the three different approaches to external data representation and marshalling.

That’s it for this article. See you in the next one! 😊
