EQUIP refactoring notes

Chris Greenhalgh, 2004-12-20, updated 2005-01

Introduction

The design goals for EQUIP have shifted over time. In its first version it was something of an over-arching and all-inclusive framework, with IDL, code loading, etc. Over time the emphasis has shifted to ease of use for both programmers and users. For some forms of use this is supported in ECT through its hosting of standard components and provision of GUI tools. However, this still leaves ECT in a framework/hosting role. In addition, we wish to consider easier use of EQUIP/ECT from non-framework applications across a range of languages and platforms, e.g. C/C++ applications (such as Chromium and/or OpenGL applications), C# applications, and applications on less capable platforms (PDAs, phones).

Goals

Easy to learn for programmers

small, self-contained APIs

Easy to integrate for programmers

easy data-type mapping/specification
API/library available on a range of languages and platforms including C#/Windows, unmanaged C/Windows, Java
wire-protocol standardised and relatively simple to implement against for a new platform
flexible threading options (including very simple/no internal threads)
flexible communication options (including very simple, e.g. over HTTP for phones)
works with firewalls, NAT, etc.

Easy to build/extend
Easy to add value

flow control/management
logging, record & reuse
state management
consistency management
persistence
tool support
configuration/deployment support
helps to make reusable libraries/modules/services
existing set of reusable services/modules/libraries

Backward compatibility?!

Facets

One way of looking at the essence of EQUIP is as a transparently distributable Model in the sense of the Model-View-Controller pattern.

EQUIP combines a number of functions/roles that could/should be more clearly separated (for extension, management, etc.), including:

a place to put objects
persistence of those objects
a query mechanism
communication
replication
broadcast/distribution
caching
event distribution
shared blackboard
a common description of state manipulation (events)

Although EQUIP allowed arbitrary extensions to objects (methods, etc.) they are first and foremost Data Objects.

A dataspace is a (logical??) bag for putting objects in.

As in JMS and Hibernate, access to objects in a dataspace should probably be managed via single-threaded and generally short-lived sessions (or similar). This provides a clean threading/coordination model and can support transactions.

It feels like it could be important to have multiple simultaneous views on the 'same' dataspace, e.g. how another process is seeing this, how I am seeing it now, how I was seeing it then... It should be possible to compare and diff these as well.

What can we say about the Data Objects that EQUIP2 might 'manage'?

we would like people to be able to use their own classes wherever possible
these should include POJOs
it may not be possible to use unannotated objects fully (cf. Hibernate) (separate metadata possible).
for some uses the methods equals and hashCode must be implemented (cf. Hibernate)
are they mutable??
can they be detached from sessions??
can sessions be concurrent??
are they locked? how, when, why??
how are mutations/modifications detected?

==?
equals??
some metadata-specified 'key'?
code modification - dirty bits?
cloning and field comparison?

how are they externalised - for communication, for storage

java Serializable?
field introspection?
custom interface? (see data type/model?!)
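As an illustration of the 'cloning and field comparison' option for detecting modifications, here is a minimal sketch. It assumes Java SE reflection (not available on J2ME CLDC) and application-defined classes; FieldSnapshot and Position are hypothetical names invented for this example, not EQUIP classes. Rather than cloning the whole object it keeps a copy of the field values, and the comparison is shallow: a referenced object that is mutated in place is not noticed.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not an EQUIP class): detects modification by comparing
// a stored copy of an object's field values with its current field values.
class FieldSnapshot {

    // Records the current value of every non-static field, keyed by field name.
    // If a superclass declares a field with the same name, the subclass field wins.
    static Map<String, Object> capture(Object target) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Class<?> c = target.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue;
                }
                if (values.containsKey(f.getName())) {
                    continue;
                }
                f.setAccessible(true);
                try {
                    values.put(f.getName(), f.get(target));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return values;
    }

    // Returns the names of fields whose value now differs (by equals) from the snapshot.
    static Set<String> changedFields(Map<String, Object> before, Object target) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Object> e : capture(target).entrySet()) {
            if (!Objects.equals(before.get(e.getKey()), e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }
}

// Example POJO with no annotations or EQUIP-specific base class.
class Position {
    String id = "p1";
    double x;
    double y;
}

class FieldSnapshotDemo {
    public static void main(String[] args) {
        Position p = new Position();
        Map<String, Object> before = FieldSnapshot.capture(p); // e.g. taken when a session loads p
        p.x = 2.5;
        System.out.println(FieldSnapshot.changedFields(before, p));
    }
}
```

The same field walk could also serve the 'field introspection' option for externalisation, at the cost of the same dependence on reflection.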

What will happen to them? when? how? where do they come from? where do they go?

an application obtains a reference to a Data Space object.
it establishes a connection (or something like that) to the Data Space
in the course of some activity it opens a session with that connection
within that session it

creates some Data Objects and adds them
looks up some Data Objects by query of some sort (template match, field match, key??)
modifies some such Data Objects
deletes some such Data Objects

closes the session (or aborts it)
...
closes the connection
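The client-side sequence above (obtain a Data Space, connect, open a session, add/look up/delete, close or abort) might look roughly like this from the application's point of view. This is a minimal in-memory sketch under stated assumptions: DataSpace, SpaceConnection and SpaceSession and all their methods are hypothetical names, not an existing EQUIP API; a query is just a predicate; changes are buffered in the session, applied in one step on close and dropped on abort; modification of objects is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// All names here are hypothetical; this is not an existing EQUIP API.
class DataSpace {
    private final List<Object> objects = new ArrayList<>();

    SpaceConnection connect() { return new SpaceConnection(this); }

    synchronized List<Object> snapshot() { return new ArrayList<>(objects); }

    // Applies one session's buffered changes as a single step.
    // Deletion relies on equals(), and removes every equal object in the bag.
    synchronized void apply(List<Object> added, List<Object> deleted) {
        objects.removeAll(deleted);
        objects.addAll(added);
    }
}

class SpaceConnection {
    private final DataSpace space;
    SpaceConnection(DataSpace space) { this.space = space; }
    SpaceSession openSession() { return new SpaceSession(space); }
    void close() { /* nothing to release in this in-memory sketch */ }
}

// Intended for use by a single thread, for a short period.
class SpaceSession {
    private final DataSpace space;
    private final List<Object> added = new ArrayList<>();
    private final List<Object> deleted = new ArrayList<>();
    private boolean open = true;

    SpaceSession(DataSpace space) { this.space = space; }

    void add(Object o) { check(); added.add(o); }

    void delete(Object o) {
        check();
        // An object added earlier in this session is simply un-added.
        if (!added.remove(o)) deleted.add(o);
    }

    // Sees the shared state plus this session's own pending changes.
    List<Object> find(Predicate<Object> query) {
        check();
        List<Object> result = new ArrayList<>();
        for (Object o : space.snapshot()) {
            if (!deleted.contains(o) && query.test(o)) result.add(o);
        }
        for (Object o : added) {
            if (query.test(o)) result.add(o);
        }
        return result;
    }

    void close() { check(); space.apply(added, deleted); open = false; }

    void abort() { check(); open = false; }

    private void check() {
        if (!open) throw new IllegalStateException("session is closed");
    }
}

class SessionDemo {
    public static void main(String[] args) {
        DataSpace space = new DataSpace();
        SpaceConnection connection = space.connect();
        SpaceSession session = connection.openSession();
        session.add("greeting:hello");
        session.close();
        connection.close();
        System.out.println(space.snapshot());
    }
}
```

Buffering until close is only one possible answer to the questions above about locking and concurrency; it keeps each session single-threaded without holding a lock on the Data Space for the session's lifetime.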

On the responsive side...

...
it establishes a long-running session (session factory? monitor?) and configures it to run certain code under certain circumstances

esp. particular changes in the Data Objects in the data space
timer events?

as these occur the corresponding code is run within appropriately constructed session(s)
...
closes the session/unconfigures the monitor/whatever
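The responsive side described above could be sketched as a monitor that pairs a condition on changes with the code to run. Everything here is an assumption made for illustration: Change, Monitor and MonitoredSpace are hypothetical names, and for brevity the action runs synchronously on the thread that made the change instead of inside a separately constructed session. Timer events are not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical names; a sketch of "run this code when that change happens".
class Change {
    enum Kind { ADDED, DELETED }
    final Kind kind;
    final Object object;
    Change(Kind kind, Object object) { this.kind = kind; this.object = object; }
}

class Monitor {
    final Predicate<Change> condition; // the "certain circumstances"
    final Consumer<Change> action;     // the "certain code"
    Monitor(Predicate<Change> condition, Consumer<Change> action) {
        this.condition = condition;
        this.action = action;
    }
}

class MonitoredSpace {
    private final List<Object> objects = new ArrayList<>();
    private final List<Monitor> monitors = new ArrayList<>();

    Monitor configure(Predicate<Change> condition, Consumer<Change> action) {
        Monitor m = new Monitor(condition, action);
        monitors.add(m);
        return m;
    }

    void unconfigure(Monitor m) { monitors.remove(m); }

    void add(Object o) {
        objects.add(o);
        notifyMonitors(new Change(Change.Kind.ADDED, o));
    }

    void delete(Object o) {
        if (objects.remove(o)) {
            notifyMonitors(new Change(Change.Kind.DELETED, o));
        }
    }

    private void notifyMonitors(Change change) {
        // Iterate over a copy so that an action may unconfigure its own monitor.
        for (Monitor m : new ArrayList<>(monitors)) {
            if (m.condition.test(change)) m.action.accept(change);
        }
    }
}

class MonitorDemo {
    public static void main(String[] args) {
        MonitoredSpace space = new MonitoredSpace();
        Monitor m = space.configure(
                c -> c.kind == Change.Kind.ADDED,
                c -> System.out.println("added: " + c.object));
        space.add("sensor-reading");
        space.unconfigure(m);
    }
}
```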

Following Hibernate we might identify 'managed' Data Objects as those currently actively managed by EQUIP, i.e. within a (or more than one) session. These might be instrumented, tracked, etc. Other objects are ignored by EQUIP and receive no special treatment.

Orthogonally the process may configure/manage...

the data space

the persistence of Data Objects and changes
relationships to other data spaces
indexing

the relationship(s) between sessions

the replication or communication of Data Objects, changes, etc.

including prioritisation, reliability, discard policies, etc.

the durability, reliability, consistency etc. of changes made in sessions
the links between changes and events/triggers
fetching/caching policy in relation to sessions and also queries

Data Object structure issues... matching (search/query) and mutation both expose an implicit or explicit model of the internal structure of a Data Object, i.e. what are the terms/elements used to describe 'match', and when is an object the 'same' but 'changed', or a 'new' object? Different technologies and approaches have different internal/structural models, e.g.

relational database - table
OO - objects with fields/properties and inheritance/interfaces
tuple - untyped vector
RDF - statement soup
OWL - DL classes and individuals

Communication

It is important to:

anticipate and support intermittent connectivity

[replication?! asynchronous and/or generative communication?!]
[flexible policies/methods/responses to peer dis/connection and GC]

to allow reliable communication over unreliable channels

[application-level reliability - message identification & acking]

support multiple communication channels, with different QoS & cost (e.g. bluetooth, wired serial, USB, Ethernet, WiFi, GPRS, WAP, SMS), and to take this into account

[unified device communication management]

to have a minimal J2ME MIDP1.0 communication option

[HTTP/WAP, client-driven, only]

to separate 'business' logic from details of communication (e.g. platform variations)

[dataspace paradigm (for buffers etc.)?! reactive programming; continuations?!]

to allow introspection and remote management (where possible)

[remote access to all dataspaces via common protocol?!]

to have flexible threading options

[managed 'task' support?!]

to integrate with transaction/persistence options/facilities

[persistent/transactional communication buffers?!]

to allow application adaptation to conditions

[introspection interface for available channels (qos, cost, usage), confirmed sends]
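The 'application-level reliability - message identification & acking' item in the list above can be illustrated with a small sketch. It assumes a channel that may silently drop messages; LossyChannel, ReliableSender and DedupReceiver are hypothetical names, retransmission is triggered explicitly (a real implementation would presumably use a timer or the scheduling round), and identifiers are only unique per sender.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical unreliable channel: send() may silently lose the message.
interface LossyChannel {
    void send(long id, String payload);
}

// Sender side: every message gets an identifier and is kept until acknowledged.
class ReliableSender {
    private final LossyChannel channel;
    private final Map<Long, String> unacked = new LinkedHashMap<>();
    private long nextId = 1;

    ReliableSender(LossyChannel channel) { this.channel = channel; }

    long submit(String payload) {
        long id = nextId++;
        unacked.put(id, payload);
        channel.send(id, payload);
        return id;
    }

    // Called when the peer confirms receipt of the message with this id.
    void ack(long id) { unacked.remove(id); }

    // Re-sends everything not yet acknowledged, oldest first.
    void retransmit() {
        // Iterate over a copy because an ack may arrive while we are sending.
        for (Map.Entry<Long, String> e : new LinkedHashMap<>(unacked).entrySet()) {
            channel.send(e.getKey(), e.getValue());
        }
    }

    int pending() { return unacked.size(); }
}

// Receiver side: a retransmitted message is acknowledged again but delivered once.
class DedupReceiver {
    private final Set<Long> seen = new HashSet<>();
    final List<String> delivered = new ArrayList<>();

    // Returns the id that should be acknowledged to the sender.
    long receive(long id, String payload) {
        if (seen.add(id)) {
            delivered.add(payload);
        }
        return id;
    }
}
```

The receiver must acknowledge duplicates as well as first deliveries, since a duplicate usually means that an earlier acknowledgement was lost.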

Walk-through:

a client application wishes to perform an RPC-like operation on a particular server...
it places a request object in the client dataspace which is the outbound queue to that particular server
the communication manager determines (following poll or DS change notification) that there are outbound requests...
the communication manager performs a scheduling round against candidate messages, and may select and schedule one or more communication activities [suppose it schedules this request...]
the task manager allocates a thread to the scheduled communication activity
the communication activity attempts an HTTP post (say) to the identified server [how did it know to use an HTTP post? how did it get the server URL? how does it know whether to do this reliably?]
if a response is successfully received then...??

the communication activity removes the request object?
the communication activity creates a response object in another dataspace?

which causes a response continuation to be scheduled [which response continuation?]

which removes the response object

and removes the request object?

the communication activity modifies the request object into a completed request object, including response?

which causes a response continuation to be scheduled [which response continuation?]

which removes the response object
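To make one of the variants above concrete, the following sketch takes the option in which a successful exchange creates a response object in another dataspace and removes the request. It is a sketch under assumptions, not a decision between the variants: Transport, QueuedRequest, ReceivedResponse and CommunicationManager are hypothetical names, the two dataspaces are plain lists, the scheduling round is simple first-in-first-out with a cap on attempts, and the activity runs on the calling thread instead of a thread allocated by a task manager. Continuations are left out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical transport, e.g. an HTTP POST; throws if the exchange fails.
interface Transport {
    String post(String server, String body) throws IOException;
}

class QueuedRequest {
    final String server;
    final String body;
    QueuedRequest(String server, String body) { this.server = server; this.body = body; }
}

class ReceivedResponse {
    final QueuedRequest request;
    final String body;
    ReceivedResponse(QueuedRequest request, String body) { this.request = request; this.body = body; }
}

class CommunicationManager {
    final List<QueuedRequest> outbound = new ArrayList<>();   // stands in for the outbound dataspace
    final List<ReceivedResponse> inbound = new ArrayList<>(); // stands in for the response dataspace
    private final Transport transport;

    CommunicationManager(Transport transport) { this.transport = transport; }

    // One scheduling round: tries at most maxAttempts queued requests, oldest first.
    // Returns the number of requests completed in this round.
    int runRound(int maxAttempts) {
        int attempts = 0;
        int completed = 0;
        Iterator<QueuedRequest> it = outbound.iterator();
        while (it.hasNext() && attempts < maxAttempts) {
            QueuedRequest request = it.next();
            attempts++;
            try {
                String reply = transport.post(request.server, request.body);
                inbound.add(new ReceivedResponse(request, reply)); // create the response object
                it.remove();                                       // then remove the request
                completed++;
            } catch (IOException e) {
                // Not connected or exchange failed: leave the request queued for a later round.
            }
        }
        return completed;
    }
}
```

Adding the response before removing the request means that a failure between the two steps leaves a duplicate and not a loss, which matters once either dataspace is persistent.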

Tricky questions about relationship to persistence, transactions, failures/restarts, continuations...

if the outgoing dataspace is non-persistent then requests will be lost on restart
on restart the application must restart(!) from persistent information only (static configuration, RMS information incl. persistent dataspace(s))

this might cause an attempt to communicate with some other process for initialisation/restart purposes

if the outgoing dataspace is persistent, then requests will be retained, and they are now deemed to be part of the durable state

it would make sense (avoid redundant communication, at least) to have any successful response also persistent

certainly response must be persistent if the persistent request is modified or removed

normal continuations are not persistent (?!)

but could kind of be...

we can make the call datastructure persistent

we can't call arbitrary methods without reflection on J2ME (cf. normal web service)
but we can instantiate arbitrary classes with a no-arg constructor, cast to a common interface and call a known method of that interface, i.e. a kind of functor
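The functor idea above can be sketched as follows: the continuation is stored as a class name, instantiated through its no-arg constructor, cast to a common interface, and invoked through a known method. ResponseContinuation, LoggingContinuation and ContinuationLoader are hypothetical names. The sketch is written for Java SE and uses getDeclaredConstructor().newInstance(); on CLDC, which has no java.lang.reflect package, the corresponding call would be Class.forName(name).newInstance().

```java
import java.util.ArrayList;
import java.util.List;

// The common interface that every continuation class implements (hypothetical).
interface ResponseContinuation {
    void handle(String requestState, String response);
}

// An example continuation; it must have an accessible no-arg constructor.
class LoggingContinuation implements ResponseContinuation {
    static final List<String> LOG = new ArrayList<>();

    public LoggingContinuation() { }

    @Override
    public void handle(String requestState, String response) {
        LOG.add(requestState + " -> " + response);
    }
}

class ContinuationLoader {
    // Turns a stored class name back into something that can be called.
    static ResponseContinuation load(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(ResponseContinuation.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("not a usable continuation class: " + className, e);
        }
    }
}

class ContinuationDemo {
    public static void main(String[] args) {
        // Only the class name needs to be stored with the request.
        String stored = LoggingContinuation.class.getName();
        ResponseContinuation k = ContinuationLoader.load(stored);
        k.handle("request 17", "200 OK");
        System.out.println(LoggingContinuation.LOG);
    }
}
```

Because the instance is created fresh each time, any state the continuation needs has to travel with the request and be passed in through the known method.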

So, in the outbound dataspace we need:

the information needed to construct the actual request (minus anything which may be defined according to the dataspace itself)
any additional information needed to control/manage the request (e.g. protocol)
any additional information needed to control/manage the handling of any response (e.g. continuation class information)
any information needed by any response-handling task (e.g. continuation-specific state relating to the request)
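The four kinds of information listed above could be held in one object with one field per item, written out in a form that can be stored persistently. This is a sketch only: OutboundRequest and its field names are hypothetical, every field is reduced to a non-null string, and the server address is assumed to be defined by the dataspace itself (as the first item allows). It uses DataOutputStream and DataInputStream because these are also available on J2ME; writeUTF limits each string to 65535 encoded bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical shape of an entry in the outbound dataspace.
class OutboundRequest {
    final String payload;           // information needed to construct the actual request
    final String protocol;          // information to control/manage the request
    final String continuationClass; // information to control the handling of any response
    final String continuationState; // information needed by the response-handling task

    OutboundRequest(String payload, String protocol,
                    String continuationClass, String continuationState) {
        this.payload = payload;
        this.protocol = protocol;
        this.continuationClass = continuationClass;
        this.continuationState = continuationState;
    }

    // Externalises the request as a byte array suitable for a persistent record.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(payload);
            out.writeUTF(protocol);
            out.writeUTF(continuationClass);
            out.writeUTF(continuationState);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Rebuilds a request from bytes written by toBytes(), e.g. after a restart.
    static OutboundRequest fromBytes(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            // Java evaluates arguments left to right, matching the write order.
            return new OutboundRequest(in.readUTF(), in.readUTF(), in.readUTF(), in.readUTF());
        } catch (IOException e) {
            throw new IllegalArgumentException("not a stored request", e);
        }
    }
}

class OutboundRequestDemo {
    public static void main(String[] args) {
        OutboundRequest r = new OutboundRequest("getScore?player=7", "http-post",
                "ScoreContinuation", "screen=summary");
        OutboundRequest restored = OutboundRequest.fromBytes(r.toBytes());
        System.out.println(restored.protocol + " " + restored.continuationClass);
    }
}
```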

Should metadata be an additional orthogonal capability of the dataspace, or should a metadata-holding type be used in the first place?

for metadata-holding type use equip2.core.objects.ValueAndMetadata