The DataSpace Framework

Juan J. Collas
Moreira Consulting, Inc.


10-Oct-97

Introduction

The DataSpace is a generalization and simplification of the DataMaster concept introduced by Dave Neumann in the impressive Monitor suite for WebObjects. DataSpace is a server process that maintains state for client processes. It takes ideas from JavaSoft's JavaSpace, and is designed to be a general, distributed, persistent store for Foundation objects. Due to its design, it is also transactional in nature, and can be used as a shared store for applications that need to exchange information.

Credits

Much of the credit for this work goes to Dave Neumann, for providing an excellent testbed for these ideas.

Goals

The following are goals for the DataSpace architecture:

  • Simple. There should be only one additional process to maintain state for any set of clients.
  • Robust. The DataSpace server should provide redundancy through the simple process of starting additional server processes.
  • Replicable. Multiple DataSpaces should be able to share client information through simple replication mechanisms.
  • Fast. Setting and retrieving data from a DataSpace should be as quick as accessing the data from a local disk store.
  • Correct. Data should not be corrupted when written to or read from a DataSpace.
  • Flexible. Clients can create new shared spaces in the DataSpace server for private communication.
  • Lightweight. Minimization of DO connections should be part of the core architecture.

Non-Goals

  • Buzzword compliance. The DataSpace is an Apple-specific implementation. It is not presently written in Java. It uses DO, not IIOP.
  • Directory Server. The DataSpace is not a directory server. It is designed to store persistent data in a lightly structured manner. It can probably scale to hundreds of thousands of elements, and can manage larger amounts of data by using multiple DataSpace servers.
  • Hierarchical. The DataSpace is not hierarchical. Data is stored in domains which do not participate in a lookup hierarchy.

Design

Clients reach the DataSpace server through a new class called NSDataSpace. An instance of this class acts as a proxy to a DataSpace server. The proxy is obtained with the following method:

id dataSpace = [NSDataSpace defaultSpace];

Once instantiated, the space provides access to domains, which behave like dictionaries. For example, to write into a domain called "Monitor", you can use the following method:

[dataSpace setObject:theChildren forKey:@"Children" domain:@"Monitor"];

To access data in a domain, you can use:

[dataSpace objectForKey:@"Children" domain:@"Monitor"];

To remove a key from a domain, use:

[dataSpace removeObjectForKey:@"Children" domain:@"Monitor"];

Connection Management

The DataSpace client class maintains a single DO connection to the dataspace server. The server is deliberately single-threaded, for performance and transactional integrity reasons. All requests for data from a domain are sent to the server. If the server dies, the client class will invalidate its caches and attempt to connect to a new dataspace server. If no other server can be reached, it might make sense for the dataspace client to become a server itself.

This implies that the set of dataspace servers should somehow do their best to replicate information between themselves.
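
The failover behaviour described above can be sketched with a small in-memory model. This is only an illustration in present-day Objective-C (ARC, Foundation): SketchServer and SketchClient are hypothetical names, the `alive` flag stands in for a DO connection dying, and the order in which candidate servers are tried is an assumption.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical stand-in for a dataspace server reached over DO.
@interface SketchServer : NSObject
@property (nonatomic) BOOL alive;
@property (nonatomic, strong) NSMutableDictionary *store;
@end

@implementation SketchServer
- (instancetype)init {
    if ((self = [super init])) {
        _alive = YES;
        _store = [NSMutableDictionary dictionary];
    }
    return self;
}
@end

// Hypothetical client: one connection at a time, cache dropped on failover.
@interface SketchClient : NSObject
@property (nonatomic, strong) NSArray *candidates;    // servers to try, in order
@property (nonatomic, strong) SketchServer *current;  // the single live connection
@property (nonatomic, strong) NSMutableDictionary *cache;
- (instancetype)initWithCandidates:(NSArray *)candidates;
- (id)objectForKey:(NSString *)key;
@end

@implementation SketchClient
- (instancetype)initWithCandidates:(NSArray *)candidates {
    if ((self = [super init])) {
        _candidates = [candidates copy];
        _cache = [NSMutableDictionary dictionary];
    }
    return self;
}

// Returns YES if a live server is connected, reconnecting if needed.
- (BOOL)ensureConnection {
    if (self.current.alive) return YES;
    [self.cache removeAllObjects];   // cached values may be stale after a failover
    self.current = nil;
    for (SketchServer *server in self.candidates) {
        if (server.alive) { self.current = server; return YES; }
    }
    return NO;                       // the caller may decide to become a server
}

- (id)objectForKey:(NSString *)key {
    if (![self ensureConnection]) return nil;
    id value = self.cache[key];
    if (value == nil) {
        value = self.current.store[key];
        if (value != nil) self.cache[key] = value;
    }
    return value;
}
@end

int main(void) {
    @autoreleasepool {
        SketchServer *a = [[SketchServer alloc] init];
        SketchServer *b = [[SketchServer alloc] init];
        a.store[@"Children"] = @"from-a";
        b.store[@"Children"] = @"from-b";
        SketchClient *client = [[SketchClient alloc] initWithCandidates:@[a, b]];
        assert([[client objectForKey:@"Children"] isEqual:@"from-a"]);
        a.alive = NO;   // simulate the first server dying
        assert([[client objectForKey:@"Children"] isEqual:@"from-b"]);
        b.alive = NO;   // no servers left
        assert([client objectForKey:@"Children"] == nil);
    }
    return 0;
}
```

The two servers deliberately hold different values here, which shows why some replication between servers is needed: without it, a client sees different data after a failover.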

Name Management

The DataSpace is an excellent way to manage the state of clients. If a client dies, the dataspace is aware of it, and can notify interested clients. If a client attaches to the dataspace, this information can also be made available to interested parties.

There is a domain called NameServer which allows clients to register themselves with the dataspace, and for other clients to quickly locate an instance of a registered name.

For example, here's a client registering itself with the dataspace.

[dataSpace setObject:self forKey:@"0-DLNMonitor" domain:@"NameServer"];

Any clients that want access to this instance can simply do:

remoteMonitor = [dataSpace objectForKey:@"0-DLNMonitor" domain:@"NameServer"];

Of course, if the server for this proxy dies, the dataspace will pass along the connection death notification that it receives.

Since there might be multiple servers registered for a given name, the API for setting and getting values might also include some array-specific methods to reduce communication time:

[dataSpace addObject:anObj forKey:aKey domain:aDomain];
[dataSpace removeObject:anObj forKey:aKey domain:aDomain];
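
As a rough illustration of what these array-specific methods might do, the following sketch keeps an array of values per key. SketchSpace and the `objectsForKey:domain:` accessor are hypothetical, and dropping a key once its array is empty is an assumption rather than something the design specifies.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical in-memory model: each domain maps a key to an array of values.
@interface SketchSpace : NSObject
- (void)addObject:(id)obj forKey:(NSString *)key domain:(NSString *)domain;
- (void)removeObject:(id)obj forKey:(NSString *)key domain:(NSString *)domain;
- (NSArray *)objectsForKey:(NSString *)key domain:(NSString *)domain;
@end

@implementation SketchSpace {
    NSMutableDictionary *_domains;
}

- (instancetype)init {
    if ((self = [super init])) _domains = [NSMutableDictionary dictionary];
    return self;
}

- (void)addObject:(id)obj forKey:(NSString *)key domain:(NSString *)domain {
    NSMutableDictionary *d = _domains[domain];
    if (d == nil) { d = [NSMutableDictionary dictionary]; _domains[domain] = d; }
    NSMutableArray *values = d[key];
    if (values == nil) { values = [NSMutableArray array]; d[key] = values; }
    [values addObject:obj];      // one message instead of get, modify, set
}

- (void)removeObject:(id)obj forKey:(NSString *)key domain:(NSString *)domain {
    NSMutableDictionary *d = _domains[domain];
    NSMutableArray *values = d[key];
    [values removeObject:obj];
    // Assumption: a key with no remaining values is dropped.
    if (values != nil && values.count == 0) [d removeObjectForKey:key];
}

- (NSArray *)objectsForKey:(NSString *)key domain:(NSString *)domain {
    NSDictionary *d = _domains[domain];
    NSArray *values = d[key];
    return values ? [values copy] : @[];
}
@end

int main(void) {
    @autoreleasepool {
        SketchSpace *space = [[SketchSpace alloc] init];
        [space addObject:@"monitor-1" forKey:@"0-DLNMonitor" domain:@"NameServer"];
        [space addObject:@"monitor-2" forKey:@"0-DLNMonitor" domain:@"NameServer"];
        assert([space objectsForKey:@"0-DLNMonitor" domain:@"NameServer"].count == 2);
        [space removeObject:@"monitor-1" forKey:@"0-DLNMonitor" domain:@"NameServer"];
        NSArray *left = [space objectsForKey:@"0-DLNMonitor" domain:@"NameServer"];
        assert([left isEqual:@[@"monitor-2"]]);
    }
    return 0;
}
```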

Referents

A referent is a proxy that knows how to recreate itself if its connection dies. This implies that the referent contains its host and rootName. The referent acts like a normal proxy, forwarding messages across the network. If its connection dies, it can automatically establish a new proxy from the host and rootName. A referent can be described as a tuple of host, rootName, and proxy.
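
One possible shape for a referent is an NSProxy subclass that holds the (host, rootName, proxy) tuple and looks the proxy up again on demand. In the sketch below, SketchReferent, the resolver block and `connectionDidDie` are hypothetical. The resolver stands in for a real DO lookup, and in a real client the death of the connection would be reported by DO rather than by a direct call.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical lookup of a fresh target for (host, rootName); a stand-in for DO.
typedef id (^SketchResolver)(NSString *host, NSString *rootName);

@interface SketchReferent : NSProxy
- (instancetype)initWithHost:(NSString *)host
                    rootName:(NSString *)rootName
                    resolver:(SketchResolver)resolver;
- (void)connectionDidDie;   // would be triggered by a connection-death notification
@end

@implementation SketchReferent {
    NSString *_host;
    NSString *_rootName;
    SketchResolver _resolver;
    id _proxy;               // third element of the (host, rootName, proxy) tuple
}

- (instancetype)initWithHost:(NSString *)host
                    rootName:(NSString *)rootName
                    resolver:(SketchResolver)resolver {
    // NSProxy declares no -init, so there is no [super init] to call.
    _host = [host copy];
    _rootName = [rootName copy];
    _resolver = [resolver copy];
    return self;
}

- (void)connectionDidDie { _proxy = nil; }

// Re-establishes the proxy lazily from host and rootName.
- (id)target {
    if (_proxy == nil) _proxy = _resolver(_host, _rootName);
    return _proxy;
}

// Note: if the resolver returns nil, forwarding raises an exception.
- (NSMethodSignature *)methodSignatureForSelector:(SEL)sel {
    return [[self target] methodSignatureForSelector:sel];
}

- (void)forwardInvocation:(NSInvocation *)invocation {
    [invocation invokeWithTarget:[self target]];
}
@end

int main(void) {
    @autoreleasepool {
        __block int lookups = 0;
        SketchReferent *referent =
            [[SketchReferent alloc] initWithHost:@"localhost"
                                        rootName:@"0-DLNMonitor"
                                        resolver:^id(NSString *host, NSString *rootName) {
                lookups++;   // each lookup yields a distinguishable "remote" object
                return [NSString stringWithFormat:@"%@@%@#%d", rootName, host, lookups];
            }];
        id monitor = referent;                  // used like any other proxy
        assert([monitor hasSuffix:@"#1"]);      // forwarded to the first target
        [referent connectionDidDie];            // simulate the connection dying
        assert([monitor hasSuffix:@"#2"]);      // transparently re-resolved
        assert(lookups == 2);
    }
    return 0;
}
```

The caller keeps using the same `monitor` reference across the simulated failure, which is the property the referent is meant to provide.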

Domains

Domains partition the dataspace into named areas. A domain is created the first time it is referenced, and is removed once it no longer holds any data. If a domain is not specified, the domain is set to the name of the application (whether an AppKit or WebObjects application).
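
The domain lifecycle can be modelled in a few lines. The sketch below is an in-memory approximation, not the server implementation: SketchDomains and `domainNames` are hypothetical, domains are created on first write (an assumption, since an empty domain would be removed again anyway), and the default domain is taken from NSProcessInfo's process name.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical model of domain creation, removal and the default domain.
@interface SketchDomains : NSObject
- (void)setObject:(id)obj forKey:(NSString *)key domain:(NSString *)domain;
- (id)objectForKey:(NSString *)key domain:(NSString *)domain;
- (void)removeObjectForKey:(NSString *)key domain:(NSString *)domain;
- (void)setObject:(id)obj forKey:(NSString *)key;   // uses the default domain
- (id)objectForKey:(NSString *)key;                 // uses the default domain
- (NSArray *)domainNames;
@end

@implementation SketchDomains {
    NSMutableDictionary *_domains;
}

- (instancetype)init {
    if ((self = [super init])) _domains = [NSMutableDictionary dictionary];
    return self;
}

// Assumption: the application name is the process name.
- (NSString *)defaultDomain {
    return [[NSProcessInfo processInfo] processName];
}

- (void)setObject:(id)obj forKey:(NSString *)key domain:(NSString *)domain {
    NSMutableDictionary *d = _domains[domain];
    if (d == nil) {                      // created when first referenced
        d = [NSMutableDictionary dictionary];
        _domains[domain] = d;
    }
    d[key] = obj;
}

- (id)objectForKey:(NSString *)key domain:(NSString *)domain {
    NSDictionary *d = _domains[domain];
    return d[key];
}

- (void)removeObjectForKey:(NSString *)key domain:(NSString *)domain {
    NSMutableDictionary *d = _domains[domain];
    [d removeObjectForKey:key];
    if (d != nil && d.count == 0) {      // an empty domain is removed
        [_domains removeObjectForKey:domain];
    }
}

- (void)setObject:(id)obj forKey:(NSString *)key {
    [self setObject:obj forKey:key domain:[self defaultDomain]];
}

- (id)objectForKey:(NSString *)key {
    return [self objectForKey:key domain:[self defaultDomain]];
}

- (NSArray *)domainNames { return [_domains allKeys]; }
@end

int main(void) {
    @autoreleasepool {
        SketchDomains *space = [[SketchDomains alloc] init];
        assert([space domainNames].count == 0);
        [space setObject:@"3" forKey:@"Children" domain:@"Monitor"];
        assert([[space domainNames] containsObject:@"Monitor"]);
        [space removeObjectForKey:@"Children" domain:@"Monitor"];
        assert(![[space domainNames] containsObject:@"Monitor"]);

        [space setObject:@"state" forKey:@"session-1"];   // no domain given
        NSString *appName = [[NSProcessInfo processInfo] processName];
        assert([[space objectForKey:@"session-1" domain:appName] isEqual:@"state"]);
    }
    return 0;
}
```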

Domains may be smart, which means there are classes tied to a domain name that have additional capabilities, such as cleaning up old data, or communicating with an LDAP store to persist information.

Here are some domains:

  • NameServer. The keys are the names of instances, the values are arrays containing the instance proxies.
  • WOSessionStore. The keys are session ids, the values are NSData objects representing the stored data.
  • ThreadStore. The keys are session ids, the values are the current state of the thread (needs definition).
  • Monitor. Contains configuration information for instances. Instances can get this information without communicating directly with the monitor.
  • MonitorPreferences. The set of preferences that the Monitor application uses.

Backing Stores

We may define a bundle-based architecture to add additional storage types. These may include:

  • PPL (Persistent Property Lists)
  • LDAP
  • File system access
  • Volatile (for name servers, since we shouldn't persist this information). This might also be a dictionary property list, read-only.
  • Task start/stop
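
A pluggable storage layer of this kind could be expressed as a protocol that each backing store adopts. The sketch below is one guess at such an interface: SketchBackingStore, SketchVolatileStore and SketchPlistFileStore are hypothetical names, bundle loading is left out, and the file-backed store uses an ordinary property-list file rather than a PPL.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical interface that every storage type would implement.
@protocol SketchBackingStore <NSObject>
- (id)objectForKey:(NSString *)key;
- (void)setObject:(id)obj forKey:(NSString *)key;
- (void)removeObjectForKey:(NSString *)key;
@end

// Volatile store: nothing survives the process (suitable for name servers).
@interface SketchVolatileStore : NSObject <SketchBackingStore>
@end

@implementation SketchVolatileStore {
    NSMutableDictionary *_values;
}
- (instancetype)init {
    if ((self = [super init])) _values = [NSMutableDictionary dictionary];
    return self;
}
- (id)objectForKey:(NSString *)key { return _values[key]; }
- (void)setObject:(id)obj forKey:(NSString *)key { _values[key] = obj; }
- (void)removeObjectForKey:(NSString *)key { [_values removeObjectForKey:key]; }
@end

// File-backed store: values must be property-list types (strings, data, ...).
@interface SketchPlistFileStore : NSObject <SketchBackingStore>
- (instancetype)initWithPath:(NSString *)path;
@end

@implementation SketchPlistFileStore {
    NSString *_path;
    NSMutableDictionary *_values;
}
- (instancetype)initWithPath:(NSString *)path {
    if ((self = [super init])) {
        _path = [path copy];
        NSDictionary *saved = [NSDictionary dictionaryWithContentsOfFile:_path];
        _values = saved ? [saved mutableCopy] : [NSMutableDictionary dictionary];
    }
    return self;
}
- (id)objectForKey:(NSString *)key { return _values[key]; }
- (void)setObject:(id)obj forKey:(NSString *)key {
    _values[key] = obj;
    [_values writeToFile:_path atomically:YES];   // rewrite the whole file on change
}
- (void)removeObjectForKey:(NSString *)key {
    [_values removeObjectForKey:key];
    [_values writeToFile:_path atomically:YES];
}
@end

int main(void) {
    @autoreleasepool {
        NSString *path = [NSTemporaryDirectory()
            stringByAppendingPathComponent:@"dataspace-sketch.plist"];
        [[NSFileManager defaultManager] removeItemAtPath:path error:NULL];

        id<SketchBackingStore> names = [[SketchVolatileStore alloc] init];
        [names setObject:@"proxy" forKey:@"0-DLNMonitor"];
        assert([[names objectForKey:@"0-DLNMonitor"] isEqual:@"proxy"]);

        id<SketchBackingStore> prefs = [[SketchPlistFileStore alloc] initWithPath:path];
        [prefs setObject:@"3" forKey:@"Children"];

        // A second store on the same path sees the persisted value.
        id<SketchBackingStore> reopened = [[SketchPlistFileStore alloc] initWithPath:path];
        assert([[reopened objectForKey:@"Children"] isEqual:@"3"]);
        [[NSFileManager defaultManager] removeItemAtPath:path error:NULL];
    }
    return 0;
}
```

Rewriting the whole file on every change is the simplest thing that works for a sketch, and it is the kind of cost an incremental store avoids.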

Managing the Persistent Store

The default implementation of the persistent store would be built on an NSPPL. These are apparently designed to provide a simple, Foundation-based backing store capable of incremental addition of data, handling tens of megabytes. NSPPLs already have machinery to manage updates between the RAM cache and the disk.

Caching

There needs to be some way to avoid re-reading data from the dataspace server that hasn't changed since the last time the client accessed it. The NSDataSpace class should use a notification scheme so the server can tell the client when data has changed. Until such a notification arrives, the client can serve reads from a local cache of data fetched from the server.

When data is accessed, the client will cache the data by the (domain, key) tuple and register for a notification on data modification. When that tuple changes value, a notification is sent to the client.
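
A minimal sketch of this caching scheme follows. It is a simplification under stated assumptions: SketchServer, SketchCachingClient and the notification name are hypothetical, the local NSNotificationCenter stands in for whatever the server would use to notify clients over DO, and the client observes all changes from its server instead of registering per tuple.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical notification name; userInfo carries the changed (domain, key).
static NSString * const SketchDataChanged = @"SketchDataChangedNotification";

@interface SketchServer : NSObject
@property (nonatomic) NSUInteger reads;   // counts round trips, for the demo
- (void)setObject:(id)obj forKey:(NSString *)key domain:(NSString *)domain;
- (id)objectForKey:(NSString *)key domain:(NSString *)domain;
@end

@implementation SketchServer {
    NSMutableDictionary *_store;          // keyed by the (domain, key) tuple
}
- (instancetype)init {
    if ((self = [super init])) _store = [NSMutableDictionary dictionary];
    return self;
}
- (void)setObject:(id)obj forKey:(NSString *)key domain:(NSString *)domain {
    _store[@[domain, key]] = obj;
    [[NSNotificationCenter defaultCenter]
        postNotificationName:SketchDataChanged
                      object:self
                    userInfo:@{@"domain": domain, @"key": key}];
}
- (id)objectForKey:(NSString *)key domain:(NSString *)domain {
    self.reads++;
    return _store[@[domain, key]];
}
@end

@interface SketchCachingClient : NSObject
- (instancetype)initWithServer:(SketchServer *)server;
- (id)objectForKey:(NSString *)key domain:(NSString *)domain;
@end

@implementation SketchCachingClient {
    SketchServer *_server;
    NSMutableDictionary *_cache;          // keyed by the (domain, key) tuple
}
- (instancetype)initWithServer:(SketchServer *)server {
    if ((self = [super init])) {
        _server = server;
        _cache = [NSMutableDictionary dictionary];
        [[NSNotificationCenter defaultCenter] addObserver:self
                                                 selector:@selector(dataChanged:)
                                                     name:SketchDataChanged
                                                   object:server];
    }
    return self;
}
- (void)dealloc {
    [[NSNotificationCenter defaultCenter] removeObserver:self];
}
// Drop the cached value for the tuple that changed on the server.
- (void)dataChanged:(NSNotification *)note {
    NSDictionary *info = note.userInfo;
    [_cache removeObjectForKey:@[info[@"domain"], info[@"key"]]];
}
- (id)objectForKey:(NSString *)key domain:(NSString *)domain {
    NSArray *tuple = @[domain, key];
    id value = _cache[tuple];
    if (value == nil) {
        value = [_server objectForKey:key domain:domain];
        if (value != nil) _cache[tuple] = value;
    }
    return value;
}
@end

int main(void) {
    @autoreleasepool {
        SketchServer *server = [[SketchServer alloc] init];
        SketchCachingClient *client = [[SketchCachingClient alloc] initWithServer:server];
        [server setObject:@"3" forKey:@"Children" domain:@"Monitor"];
        assert([[client objectForKey:@"Children" domain:@"Monitor"] isEqual:@"3"]);
        assert([[client objectForKey:@"Children" domain:@"Monitor"] isEqual:@"3"]);
        assert(server.reads == 1);        // second read came from the cache
        [server setObject:@"4" forKey:@"Children" domain:@"Monitor"];
        assert([[client objectForKey:@"Children" domain:@"Monitor"] isEqual:@"4"]);
        assert(server.reads == 2);        // change notification forced a re-read
    }
    return 0;
}
```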

Replication

Changes to a DataSpace should be propagated to all the other dataspaces. This might be either a push or a pull mechanism. Synchronizing them could be interesting.

The first dataspace will be the master, and additional dataspace servers will be clones, with changes to the master pushed to the clones. When the master dies, the next in line will become the writeable store.
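
The master and clone arrangement can be illustrated with an in-memory model. SketchReplica and SketchCluster are hypothetical names, the `alive` flag stands in for server death, and two details are assumptions not stated in the design: a newly added clone starts from a copy of the master's data, and "next in line" means the oldest surviving server.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical dataspace server taking part in replication.
@interface SketchReplica : NSObject
@property (nonatomic) BOOL alive;
@property (nonatomic, strong) NSMutableDictionary *store;
@end

@implementation SketchReplica
- (instancetype)init {
    if ((self = [super init])) {
        _alive = YES;
        _store = [NSMutableDictionary dictionary];
    }
    return self;
}
@end

// Hypothetical coordinator: the oldest live replica is the master.
@interface SketchCluster : NSObject
- (void)addReplica:(SketchReplica *)replica;
- (SketchReplica *)master;
- (BOOL)setObject:(id)obj forKey:(NSString *)key;
@end

@implementation SketchCluster {
    NSMutableArray *_replicas;            // in start order
}
- (instancetype)init {
    if ((self = [super init])) _replicas = [NSMutableArray array];
    return self;
}

// The first live replica in start order is the writeable store.
- (SketchReplica *)master {
    for (SketchReplica *replica in _replicas) {
        if (replica.alive) return replica;
    }
    return nil;
}

- (void)addReplica:(SketchReplica *)replica {
    SketchReplica *master = [self master];
    // Assumption: a new clone begins with a copy of the master's data.
    if (master != nil) [replica.store setDictionary:master.store];
    [_replicas addObject:replica];
}

// Writes go to the master and are pushed to every live clone.
- (BOOL)setObject:(id)obj forKey:(NSString *)key {
    SketchReplica *master = [self master];
    if (master == nil) return NO;
    master.store[key] = obj;
    for (SketchReplica *replica in _replicas) {
        if (replica != master && replica.alive) replica.store[key] = obj;
    }
    return YES;
}
@end

int main(void) {
    @autoreleasepool {
        SketchReplica *first = [[SketchReplica alloc] init];
        SketchReplica *second = [[SketchReplica alloc] init];
        SketchCluster *cluster = [[SketchCluster alloc] init];

        [cluster addReplica:first];
        [cluster setObject:@"3" forKey:@"Children"];
        [cluster addReplica:second];                       // clone catches up
        assert([second.store[@"Children"] isEqual:@"3"]);

        [cluster setObject:@"4" forKey:@"Children"];       // pushed to the clone
        assert([second.store[@"Children"] isEqual:@"4"]);

        first.alive = NO;                                  // master dies
        assert([cluster master] == second);                // next in line takes over
        [cluster setObject:@"5" forKey:@"Children"];
        assert([second.store[@"Children"] isEqual:@"5"]);
        assert([first.store[@"Children"] isEqual:@"4"]);   // dead master is not updated
    }
    return 0;
}
```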

Migrating Monitor V4 to DataSpace

The existing Monitor suite uses a set of servers to provide persistence management. It is expected that those servers would become domains in the DataSpace server. For example, to use the nameserver, the client would now use:

[dataSpace objectForKey:@"Hosts" domain:@"NameServer"];

For the ESS, data would be written to the domain 'WOSessionStore'. For the SuperStateServer, the domain would be the name of the application.

DataMaster

The DataSpace is the replacement for the DataMaster. However, the DataSpace server can manage multiple domains, and can thus take over the functionality of the ExternalStateServer, the NameServer, the SuperStateServer and the ThreadStorage server.

A DataSpace server is created by one of two means:

  1. Running the Unix application DataSpace, which creates a DataSpace server.
  2. Creating an instance of NSDataSpaceManager.

DMSessionStore

The DMSessionStore class currently deals with the process of connecting to a DataMaster and creating thread records. The entire interface for storing and retrieving data from the DataSpace can be reduced to the following two calls:

[[NSDataSpace defaultSpace] setObject:aSessionData forKey:aSessionKey domain:@"WOSessionStore"];
aSessionData = [[NSDataSpace defaultSpace] objectForKey:aSessionKey domain:@"WOSessionStore"];

DMSuperStateStore

Super state storage is the same as DMSessionStore, except the domain is tied to the application name. You can use the simple form of the DataSpace accessors:

[[NSDataSpace defaultSpace] setObject:aSessionData forKey:aSessionKey];
aSessionData = [[NSDataSpace defaultSpace] objectForKey:aSessionKey];

The domain is derived from the process name or the WebObjects name for the application.

DMThreadStore

The thread store is trickier. The key is still a sessionKey, but the value is an NSDictionary containing the status (BUSY, DONE) and the operationData, or message string. The domain is 'ThreadStore', and the client should register for a notification to receive updates to the value.

Monitor

Monitor may change to become a client of the DataSpace, setting and retrieving values such as start and stop times.

WOMonitorableApplication

Monitorable applications no longer connect to the Monitor. Instead, they write their availability into a 'Monitorable' domain, in which the Monitor registers interest. Once they make themselves available, the Monitor can connect to them to query statistics and similar information. The monitorable applications should also register for information about themselves in the 'Declared' domain, so they can pick up configuration changes made by the Monitor.

MonitorProxy

The MonitorProxy may be subsumed as a specialized domain in the DataSpace. If we can define a new domain called 'Tasks' that actually starts a process when a value is set for a key and kills it when the key is removed from the domain, this functionality will simply consist of setting a key in a DataSpace on a given host.

References