The concept of a remote procedure call (RPC) represents a major intellectual breakthrough in distributed computing. In RPC, procedures in processes on remote computers can be called as if they are procedures in the local address space. The underlying RPC system then hides important aspects of distribution, including the encoding and decoding of parameters and results, the passing of messages and the preserving of the required semantics for the procedure call.

This approach directly and elegantly supports client-server computing with servers offering a set of operations through a service interface and clients calling these operations directly as if they were available locally. RPC systems therefore offer (at a minimum) access and location transparency.

Sounds real, how does message passing fit into this? In a certain sense, sending a message to a remote host and getting a reply back is a lot like making a function call in a programming language, in both cases, you start with one or more parameters and you get back a result.

Like what? Imagine a procedure named get_IP_address(host_name) that works by sending a UDP packet to a DNS server and waiting for the reply, timing out and trying again if one is not forthcoming quickly enough, all the details of networking can be hidden from the programmer.

What’s so special about it though, extra layer only? Idea behind RPC is to make a remote procedure call look as much as possible like a local one, in the simplest form, to call a remote procedure, the client program must be bound with a small library procedure, called the client stub, that represents the server procedure in the client’s address space, similarly, the server is bound with a procedure called the server stub.

Give me the steps of how this works?

  1. Client makes a local procedure call to the client stub, which has the same name as the server procedure, with the parameters pushed onto the stack in the normal way.
  2. Client stub packing the parameters into a message and making system call to send the message, called marshalling.
  3. Operating system sending the message from the client machine to the server machine.
  4. OS in server passing the incoming packet to the server stub.
  5. Server stub calling the server procedure with the unmarshalled parameters.

An RPC aims at hiding most of the intricacies of message passing, and is ideal for client-server applications, however, realizing RPCs in a transparent manner is easier said than done, in many cases, where communication does not follow rather strict pattern of client-server pattern, turns out thinkiing in term of messages is more appropriate.

External data representation and marshalling:

Information stored in running programs is represented as data structures - by sets of interconnected objects - whereas the information in messages consists of sequence of bytes, the data structures must be flattened before transmission and rebuilt on arrival. The individual primitive data items transmitted in messages can be data values of many different types, and not all computers store primitives values such as integers in the same order.

Float point numbers, big endian vs little-endian etc, ASCII? etc.

One of the following methods can be used to enable any two computers to exchange binary data values:

  1. The values are converted to an agreed external format before transmission and converted to the local form on receipt.
  2. The values are transmitted in sender’s format, together with an indication of the format used, and the receipt converts the values if necessary.

Marshalling is the process of taking a collection of data items and assembling them into a form suitable for transmission in a message.

  1. CORBA’s common data representation, which is concerned with an external representation for the structured and primitive types that can be passed as the arguments and results of remote method invocations in CORBA.

  2. Java’s object serialization, which is concerned with the flattening and external data representation of any single object or tree of objects that may need to be transmitted in a message or stored on a disk.

  3. XML, which defines a textual format for representing structure data, orginially intended for documents containing textual self-describing structure data - for example, documents accessible on the Web - but now also used to represent data sent in messages exchanged by clients and servers in web services.

In the first two cases, the marshaling and unmarshalling are inteded to be carried out by a middleware layer without any involvement on the part of the application programmer. Even in the case of XML, which is textual and therefore more accessible to hand-encoding, software for marshaling and unmarshalling is available for all commonly used platforms and programming environments.

Two other techniques for external data representation are worthy of mention. Google uses an approach called protocol buffers to capture representation of both stored and transmitted data, Also considerable interest in JSON as an approach, both represent a step towards more lightweight approaches to data representation when compared to XML.

Issues important in understanding implementation of RPC:

  1. Programming with interfaces: Modules are implemented so as to hide all the information about them except that which is available through its interfaces. In a distributed program, the modules can run in separate processes, so the definition of service interfaces is influenced by the distributed nature of the underlying architecture. For example: not possible for a client module running in one process to accesss the variables in a module in another process, but accessed by means of some getter and setter procedures added automatically to the interface. An IDL provides a notation for defining interfaces in which each of the parameters of an operation may be described as for input or output in addition to have its type specified. IDLs are designed to allow procedures implemented in different languages to invoke one another, in contrast, can also be embed remote procedure calling into a language itself, main benefit is that application development often becomes much simpler.

  2. Call semantics: A higher-level concern is that the operation may not be idempotent i.e. safe to repeat, simplest case is idempotent DNS requests and replies, does not matter whether the server never received he request, or it was the reply that was lost, the answer, when it finally arrives will be same, however, not all operations are idempotent, for example, they have side-effects such as incrementing a counter, in this case, may be necessary to set up a TCP connection and send the request over it rather than using UDP. For local procedure calls, the semantics are exactly once, meaning that every procedure is executed exactly once, while choices for RPC invocation semantics.

    • Maybe semantics
    • At-most-once semantics
    • At-least-once semantics
  3. Pointers and reference difficulties:

    • When one of the remote procedure’s parameters is a pointer to an object, the client must transmit a copy of that object and its pointer to the server. If the remote procedure modifies the object through its pointer, the server returns the pointer and its modified copy. Or some other tricks.
    • Calling and called procedure can communicate through global variables, in addition to communicating via parameters, but if the called procedure is moved to a remote machine, the code will fail because the global variables are no longer shared.
    • In weakly typed languages, like C, perfectly legal to write a procedure that computes inner product without specifying how larger either one is, each could be terminated by a special value known only to the calling and called procedures, under which, essentially impossible for the client stub to marshal the parameters.
    • Not always possible to deduce the types of the parameters, such as printf which may have any number of parameters as mixture of integers, shorts, longs etc.

Implementation

The client that accesses a service includes one stub procedure for each procedure in the service interface. The stub procedure behaves like a local procedure to the client, but instead of executing the call, it marshals the procedure identifier and the arguments into a request message, which it sends via its communication module to the server.

When the reply message arrives, it unmarshals the results. The server process contains a dispatcher together with one server stub procedure and one service procedure for each procedure in the service interface.

The dispatcher selects one of the server stub procedures according to the procedure identifier in the request message then unmarshals the arguments in the request message, calls the corresponding service procedure and marshals the return values for the reply message.

The service procedures implement the procedures in the service interface. The client and server stub procedures and the dispatcher can be generated automatically by an interface compiler from the interface definition of the service.

Remote Method Invocation

Remote method invocation (RMI) strongly resembles remote procedure calls but in a world of distributed objects. With this approach, a calling object can invoke a method in a remote object. As with RPC, the underlying details are generally hidden from the user. RMI implementations may, though, go further by supporting object identity and the associated ability to pass object identifiers as parameters in remote calls.

The following differences lead to added expressiveness when it comes to the programming of complex distriuted application and services

  • The programmer is able to use the full expressive power of object-oriented programming in the development of distributed systems software, including the use of objects, classes and inheritance, and can also employ related object-oriented design methodologies and associated tools.

  • Building on the concept of object identity in object-oriented systems, all objects in an RMI-based system have unique object references (whether they are local or remote), such object references can also be passed as parameters, thus offering significantly richer parameter-passing semantics than in RPC.

RMI allows programmer to pass parameters not only by value, as input or output parameters, but only by object reference. Passing references is particularly attractive if the underlying parameter is large or complex.

The key added design issues relates to the object model and, in particular, achieving objects, each of which consists of a set of data and a set of methods.

  • The object model: An object-oriented program, for example in Java or C++, consists of a collection of interacting objects, each of which consists of a set of data and a set of objects. An object communicates with other objects by invoking their methods, generally passing arguments and receiving results. If a language supports garbage collection, then any associated RMI system should allow garbage collection of remote objects. Distributed garbage collection is generally achieved by cooperation between the existing local garbage collection, usually based on reference counting.

The role of a proxy is to make remote method invocation transparent to clients by behaving like a local object to the invoker; but instead of executing an invocation, it forwards it in a message to a remote object.

Java RMI

Java RMI extends the Java object model to provide support for distributed objects in the Java language. In particular, it allows objects to invoke methods on remote objects using the same syntax as for local invocations.

The embedding of RMI in a language reaches a high degree of access transparency is often simpler as many issues related to parameter passing can be circumvented altogether.

In essence, a client being executed by its own Java virtual machine can invoke a method of an object managed by another virtual machine.

An object making a remote invocation is aware that its target is remote because it must handle RemoteExceptions; and the implementer of a remote object is aware that it is remote because it must implement the Remote interface.

Remote interfaces are defined by extending an interface called Remote provided in the java.rmi package. The methods must throw RemoteException, but application-specific exceptions may also be thrown.

Example of Shape interface:

import java.rmi.*;
 
public interface HelloInterface extends Remote
{
	String sayHello() throws RemoteException;
}
 
public class HelloClass extends UnicastRemoteObject implements HelloInterface
{
	public HelloClass() throws RemoteException {
	//Constructor
	}
	
	@Override
	public String sayHello() throws RemoteException {
		return "Hello, World!";
	}
}
 

Java’s object serialization is used for marshalling arguements and results in Java RMI. When the type of parameter or result value is passed as a remote interface, the corresponding argument or result is always passed as a remote object reference, while all serializable non-remote objects are copied and passed by value.

RMIregistry: The RMIregistry is the binder for Java RMI, an instance of which normally run on every server computer that hosts remote objects. It maintains a table mapping textual, URL-style names to references to remote objects hosted on that computer. It is accessed by methods of the Naming class, whose methods take as an argument an URL-formatted strings.

JINI

Jini, also called Apache River, is a network architecture for the construction of distributed systems in the form of modular co-operating services. JavaSpaces is a part of the Jini.

Locating services is done through a lookup service. Services try to contact a lookup service (LUS), either by unicast interaction, when it knows the actual location of the lookup service, or by dynamic multicast discovery. The lookup service returns an object called the service registrar that can be used by services to register themselves so they can be found by clients.

Clients can use the lookup service to retrieve a proxy object to the service; calls to the proxy translate the call to a service request, performs this request on the service, and returns the result to the client. This strategy is more convenient than Java remote method invocation, which requires the client to know the location fo the remote service in advance.

Jini uses a lookup service to broker communication between the client and service. This appears to be a centralized model (though the communication between client and service can be seen as decentralized) that does not scale well to very large systems.