EQUIP refactoring notes
Chris Greenhalgh, 2004-12-20, updated 2005-01
Introduction
The design goals for EQUIP have shifted over time. In its first version
it was something of an over-arching and all-inclusive framework, with
IDL, code loading, etc. Over time this has shifted to emphasise ease of
use for both programmers and users. For some forms of use this is
supported in ECT through its hosting of standard components and
provision of GUI tools. However this still leaves ECT in a
framework/hosting role. In addition to this we wish to consider easier
use of EQUIP/ECT from non-framework applications across a range of
languages and platforms, e.g. C/C++ applications (such as Chromium
and/or OpenGL applications), C# applications, applications on less
capable platforms (PDAs, phones).
Goals
- Easy to learn for programmers
- small, self-contained APIs
- Easy to integrate for programmers
- easy data-type mapping/specification
- API/library available on a range of languages and platforms
including C#/Windows, unmanaged C/Windows, Java
- wire-protocol standardised and relatively simple to implement
against for a new platform
- flexible threading options (including very simple/no internal
threads)
- flexible communication options (including very simple, e.g.
over HTTP for phones)
- works with firewalls, NAT, etc.
- Easy to build/extend
- Easy to add value
- flow control/management
- logging, record & reuse
- state management
- consistency management
- persistence
- tool support
- configuration/deployment support
- helps to make reusable libraries/modules/services
- existing set of reusable services/modules/libraries
- Backward compatibility?!
Facets
One way of looking at the essence of EQUIP is as a transparently
distributable Model in the sense of the Model-View-Controller pattern.
EQUIP combines a number of functions/roles that could/should be more
clearly separated (for extension, management, etc.), including:
- a place to put objects
- persistence of those objects
- a query mechanism
- communication
- replication
- broadcast/distribution
- caching
- event distribution
- shared blackboard
- a common description of state manipulation (events)
Although EQUIP allowed arbitrary extensions to objects (methods,
etc.) they are first and foremost Data Objects.
A dataspace is a (logical??) bag for putting objects in.
As in JMS and Hibernate, access to objects in a dataspace should
probably be managed via single-threaded and generally short-lived
sessions (or similar). This provides a clean threading/coordination
model and can support transactions.
It feels like it could be important to have multiple simultaneous
views on the 'same' dataspace, e.g. how another process is seeing this,
how I am seeing it now, how I was seeing it then... It should be
possible to compare and diff these as well.
What can we say about the Data Objects that EQUIP2 might 'manage'?
- we would like people to be able to use their own classes wherever
possible
- these should include POJOs
- it may not be possible to use unannotated objects fully (cf.
Hibernate) (separate metadata possible).
- for some uses the methods equals and hashCode must be implemented
(cf. Hibernate)
- are they mutable??
- can they be detached from sessions??
- can sessions be concurrent??
- are they locked? how, when, why??
- how are mutations/modifications detected?
- ==?
- equals??
- some metadata-specified 'key'?
- code modification - dirty bits?
- cloning and field comparison?
- how are they externalised - for communication, for storage
- java Serializable?
- field introspection?
- custom interface? (see data type/model?!)
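One of the detection options listed above, cloning and field comparison, can be sketched in plain Java. This is only an illustration under stated assumptions, not EQUIP2 code: `SnapshotSketch`, `SnapshotTracker` and `Position` are hypothetical names, the copy is shallow, and it relies on `java.lang.reflect` field introspection, which would not be available on J2ME.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not EQUIP2 code: detect modified fields by taking a
// shallow copy of an object's field values and comparing against it later.
final class SnapshotSketch {

    /** Example POJO (an assumption for illustration). */
    static class Position {
        String id;
        double x;
        double y;

        Position(String id, double x, double y) {
            this.id = id;
            this.x = x;
            this.y = y;
        }
    }

    static class SnapshotTracker {
        // Keyed by identity (==), so the POJO needs no equals/hashCode here.
        private final Map<Object, Map<String, Object>> snapshots = new IdentityHashMap<>();

        /** Starts managing obj by recording its current field values. */
        void manage(Object obj) {
            snapshots.put(obj, readFields(obj));
        }

        /** Returns the names of fields that differ from the recorded snapshot. */
        List<String> dirtyFields(Object obj) {
            Map<String, Object> before = snapshots.get(obj);
            if (before == null) {
                throw new IllegalArgumentException("object is not managed");
            }
            List<String> dirty = new ArrayList<>();
            for (Map.Entry<String, Object> now : readFields(obj).entrySet()) {
                if (!Objects.equals(now.getValue(), before.get(now.getKey()))) {
                    dirty.add(now.getKey());
                }
            }
            return dirty;
        }

        // Shallow read of the instance fields declared on the object's own class.
        private static Map<String, Object> readFields(Object obj) {
            Map<String, Object> values = new LinkedHashMap<>();
            for (Field f : obj.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue;
                }
                try {
                    f.setAccessible(true);
                    values.put(f.getName(), f.get(obj));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot read field " + f.getName(), e);
                }
            }
            return values;
        }
    }

    public static void main(String[] args) {
        SnapshotTracker tracker = new SnapshotTracker();
        Position p = new Position("p1", 1.0, 2.0);
        tracker.manage(p);
        p.x = 5.0;
        System.out.println(tracker.dirtyFields(p));
    }
}
```

Because values are compared with equals, a change inside a referenced mutable object would go unnoticed unless the snapshot were made deep; dirty bits set by modified code avoid that cost but require instrumenting the class.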
What will happen to them? when? how? where do they come from? where
do they go?
- an application obtains a reference to a Data Space object.
- it establishes a connection (or something like that) to the Data
Space
- in the course of some activity it opens a session with that
connection
- within that session it
  - creates some Data Objects and adds them
  - looks up some Data Objects by query of some sort (template
  match, field match, key??)
  - modifies some such Data Objects
  - deletes some such Data Objects
- closes the session (or aborts it)
- ...
- closes the connection
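The lifecycle above might look roughly like this in Java. Every name here (`SessionSketch`, `DataSpace`, `Connection`, `Session`, `find`, `commit`, `abort`) is a hypothetical stand-in chosen for the sketch, not an existing EQUIP API. It assumes an in-memory bag, changes that are buffered until commit, and a type-plus-predicate query standing in for template or field matching; tracking of modifications is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the dataspace / connection / session lifecycle.
final class SessionSketch {

    /** In-memory stand-in for a Data Space: just a bag of objects. */
    static class DataSpace {
        private final List<Object> objects = new ArrayList<>();

        Connection connect() {
            return new Connection(this);
        }

        synchronized int size() {
            return objects.size();
        }
    }

    static class Connection implements AutoCloseable {
        private final DataSpace space;
        private boolean open = true;

        Connection(DataSpace space) {
            this.space = space;
        }

        Session openSession() {
            if (!open) {
                throw new IllegalStateException("connection closed");
            }
            return new Session(space);
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** Single-threaded, short-lived unit of work; changes apply on commit. */
    static class Session {
        private final DataSpace space;
        private final List<Object> added = new ArrayList<>();
        private final List<Object> removed = new ArrayList<>();
        private boolean finished;

        Session(DataSpace space) {
            this.space = space;
        }

        void add(Object o) {
            check();
            added.add(o);
        }

        void remove(Object o) {
            check();
            removed.add(o);
        }

        /** Looks up committed objects of the given type that satisfy match. */
        <T> List<T> find(Class<T> type, Predicate<? super T> match) {
            check();
            List<T> out = new ArrayList<>();
            synchronized (space) {
                for (Object o : space.objects) {
                    if (type.isInstance(o) && match.test(type.cast(o))) {
                        out.add(type.cast(o));
                    }
                }
            }
            return out;
        }

        /** Applies buffered removals (matched with equals) and additions. */
        void commit() {
            check();
            synchronized (space) {
                space.objects.removeAll(removed);
                space.objects.addAll(added);
            }
            finished = true;
        }

        /** Discards buffered changes. */
        void abort() {
            check();
            finished = true;
        }

        private void check() {
            if (finished) {
                throw new IllegalStateException("session finished");
            }
        }
    }

    public static void main(String[] args) {
        DataSpace space = new DataSpace();
        try (Connection connection = space.connect()) {
            Session session = connection.openSession();
            session.add("hello");
            session.commit();
            System.out.println(space.size());
        }
    }
}
```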
On the responsive side...
- ...
- it establishes a long-running session (session factory? monitor?)
and configures it to run certain code under certain circumstances
- esp. particular changes in the Data Objects in the data space
- timer events?
- as these occur the corresponding code is run within appropriately
constructed session(s)
- ...
- closes the session/unconfigures the monitor/whatever
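A minimal sketch of the responsive side, assuming a monitor is simply a (condition, code) pair registered with the data space. `MonitorSketch`, `MonitoredSpace`, `Change` and `Registration` are hypothetical names. The sketch runs the code directly on the thread that made the change; constructing a session for each callback, and timer events, are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch: run registered code when matching changes occur.
final class MonitorSketch {

    enum ChangeType { ADDED, REMOVED }

    static final class Change {
        final ChangeType type;
        final Object object;

        Change(ChangeType type, Object object) {
            this.type = type;
            this.object = object;
        }
    }

    static class MonitoredSpace {
        private final List<Object> objects = new ArrayList<>();
        // Copy-on-write so a handler may cancel itself while changes are fired.
        private final List<Registration> registrations = new CopyOnWriteArrayList<>();

        /** Configures code to run whenever a change satisfies the condition. */
        Registration monitor(Predicate<Change> when, Consumer<Change> code) {
            Registration r = new Registration(this, when, code);
            registrations.add(r);
            return r;
        }

        void add(Object o) {
            objects.add(o);
            fire(new Change(ChangeType.ADDED, o));
        }

        void remove(Object o) {
            if (objects.remove(o)) {
                fire(new Change(ChangeType.REMOVED, o));
            }
        }

        private void fire(Change change) {
            for (Registration r : registrations) {
                if (r.when.test(change)) {
                    r.code.accept(change);
                }
            }
        }
    }

    static class Registration {
        private final MonitoredSpace space;
        private final Predicate<Change> when;
        private final Consumer<Change> code;

        Registration(MonitoredSpace space, Predicate<Change> when, Consumer<Change> code) {
            this.space = space;
            this.when = when;
            this.code = code;
        }

        /** Unconfigures the monitor; no further callbacks are made. */
        void cancel() {
            space.registrations.remove(this);
        }
    }

    public static void main(String[] args) {
        MonitoredSpace space = new MonitoredSpace();
        Registration r = space.monitor(
                c -> c.type == ChangeType.ADDED,
                c -> System.out.println("added: " + c.object));
        space.add("sensor-1");
        r.cancel();
        space.add("sensor-2"); // no callback after cancel
    }
}
```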
Following Hibernate we might identify 'managed' Data Objects as
those currently actively managed by EQUIP, i.e. within a (or more than
one) session. These might be instrumented, tracked, etc. Other objects
are ignored by EQUIP and receive no special treatment.
Orthogonally the process may configure/manage...
- the data space
- the persistence of Data Objects and changes
- relationships to other data spaces
- indexing
- the relationship(s) between sessions
- the replication or communication of Data Objects, changes,
etc.
- including prioritisation, reliability, discard policies, etc.
- the durability, reliability, consistency etc. of changes made
in sessions
- the links between changes and events/triggers
- fetching/caching policy in relation to sessions and also
queries
Data Object structure issues... matching (search/query) and mutation
both expose an implicit or explicit model of the internal structure of
a Data Object, i.e. what are the terms/elements used to describe
'match', and when is an object the 'same' but 'changed', or a 'new'
object? Different technologies and approaches have different
internal/structural models, e.g.
- relational database - table
- OO - objects with fields/properties and inheritance/interfaces
- tuple - untyped vector
- RDF - statement soup
- OWL - DL classes and individuals
Communication
It is important to:
- anticipate and have support for intermittent connectivity
- [replication?! asynchronous and/or generative communication?!]
- [flexible policies/methods/responses to peer dis/connection and
GC]
- to allow reliable communication over unreliable channels
- [application-level reliability - message identification &
acking]
- support multiple communication channels, with different QoS &
cost (e.g. bluetooth, wired serial, USB, Ethernet, WiFi, GPRS, WAP,
SMS), and to take this into account
- [unified device communication management]
- to have a minimal J2ME MIDP1.0 communication option
- [HTTP/WAP, client-driven, only]
- to separate 'business' logic from details of communication (e.g.
platform variations)
- [dataspace paradigm (for buffers etc.)?! reactive programming;
continuations?!]
- to allow introspection and remote management (where possible)
- [remote access to all dataspaces via common protocol?!]
- to have flexible threading options
- [managed 'task' support?!]
- to integrate with transaction/persistence options/facilities
- [persistent/transactional communication buffers?!]
- to allow application adaptation to conditions
- [introspection interface for available channels (qos, cost,
usage), confirmed sends]
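The 'message identification & acking' item above can be illustrated with a small sketch. It assumes the usual scheme: the sender numbers each message and keeps it until an acknowledgement arrives, and the receiver discards ids it has already seen. `ReliabilitySketch`, `Outbox`, `Inbox` and `Message` are hypothetical names; the channel itself is left to the caller.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of application-level reliability over a lossy channel.
final class ReliabilitySketch {

    static final class Message {
        final long id;
        final String payload;

        Message(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    /** Sender side: every message is kept until it is acknowledged. */
    static class Outbox {
        private long nextId = 1;
        private final Map<Long, Message> unacked = new LinkedHashMap<>();

        Message enqueue(String payload) {
            Message m = new Message(nextId++, payload);
            unacked.put(m.id, m);
            return m;
        }

        /** Messages to (re)send, oldest first. */
        List<Message> pending() {
            return new ArrayList<>(unacked.values());
        }

        void ack(long id) {
            unacked.remove(id);
        }
    }

    /** Receiver side: delivers each id at most once, acknowledges every time. */
    static class Inbox {
        private final Set<Long> seen = new HashSet<>();
        final List<String> delivered = new ArrayList<>();

        /** Returns the id to acknowledge, even for a duplicate. */
        long receive(Message m) {
            if (seen.add(m.id)) {
                delivered.add(m.payload);
            }
            return m.id;
        }
    }

    public static void main(String[] args) {
        Outbox outbox = new Outbox();
        Inbox inbox = new Inbox();
        outbox.enqueue("hello");
        for (Message m : outbox.pending()) {
            outbox.ack(inbox.receive(m));
        }
        System.out.println(inbox.delivered);
    }
}
```

Re-acknowledging duplicates matters: if only the acknowledgement was lost, the sender will resend, and the receiver must confirm again without delivering twice.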
Walk-through:
- a client application wishes to perform an RPC-like operation on a
particular server...
- it places a request object in the client dataspace which is the
outbound queue to that particular server
- the communication manager determines (following poll or DS change
notification) that there are outbound requests...
- the communication manager performs a scheduling round against
candidate messages, and may select and schedule one or more
communication activities [suppose it schedules this request...]
- the task manager allocates a thread to the scheduled communication
activity
- the communication activity attempts an HTTP post (say) to the
identified server [how did it know to use an HTTP post? how did it get
the server URL? how does it know whether to do this reliably?]
- if a response is successfully received then...??
- the communication activity removes the request object?
- the communication activity creates a response object in another
dataspace?
- which causes a response continuation to be scheduled [which
response continuation?]
- which removes the response object
- and removes the request object?
- the communication activity modifies the request object into a
completed request object, including response?
- which causes a response continuation to be scheduled [which
response continuation?]
- which removes the response object
- ???
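To make the walk-through concrete, here is a sketch that picks just one of the open options above: the response is created as a new object in a second dataspace, and the request is removed on success. Everything is hypothetical (`WalkthroughSketch`, `CommunicationManager`, `Transport`, `runRound`). The `Transport` interface stands in for the HTTP post, and the round runs on the caller's thread, so there is no task manager and no continuation scheduling.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of one scheduling round over an outbound queue.
final class WalkthroughSketch {

    static final class Request {
        final String body;

        Request(String body) {
            this.body = body;
        }
    }

    static final class Response {
        final Request request;
        final String body;

        Response(Request request, String body) {
            this.request = request;
            this.body = body;
        }
    }

    /** Stand-in for the HTTP post; returns null when the attempt fails. */
    interface Transport {
        String post(String body);
    }

    static class CommunicationManager {
        // Client dataspace acting as the outbound queue to one server.
        final Queue<Request> outbound = new ArrayDeque<>();
        // Separate dataspace in which response objects are created.
        final List<Response> inbound = new ArrayList<>();
        private final Transport transport;

        CommunicationManager(Transport transport) {
            this.transport = transport;
        }

        /** Tries each queued request once; returns how many completed. */
        int runRound() {
            int completed = 0;
            int candidates = outbound.size();
            for (int i = 0; i < candidates; i++) {
                Request request = outbound.poll();
                String reply = transport.post(request.body);
                if (reply == null) {
                    outbound.add(request); // keep it for a later round
                } else {
                    inbound.add(new Response(request, reply));
                    completed++;
                }
            }
            return completed;
        }
    }

    public static void main(String[] args) {
        CommunicationManager manager = new CommunicationManager(body -> "echo:" + body);
        manager.outbound.add(new Request("ping"));
        System.out.println(manager.runRound());
    }
}
```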
Tricky questions about relationship to persistence, transactions,
failures/restarts, continuations...
- if the outgoing dataspace is non-persistent then requests will be
lost on restart
- on restart the application must restart(!) from persistent
information only (static configuration, RMS information incl.
persistent dataspace(s))
- this might cause an attempt to communicate with some other process
for initialisation/restart purposes
- if the outgoing dataspace is persistent, then requests will be
retained, and they are now deemed to be part of the durable state
- it would make sense (avoid redundant communication, at least)
to have any successful response also persistent
- certainly response must be persistent if the persistent
request is modified or removed
- normal continuations are not persistent (?!)
- but could kind of be...
- we can make the call datastructure persistent
- we can't call arbitrary methods without reflection on J2ME
(cf. normal web service)
- but we can instantiate arbitrary classes with a no-arg
constructor, cast to a common interface and call a known method of that
interface, i.e. a kind of functor
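The functor idea can be sketched as follows. The persistent call record holds only a class name and some state; on resumption the class is instantiated through its no-arg constructor, cast to a common interface, and one known method is called. `FunctorSketch`, `Continuation`, `EchoContinuation`, `Call` and `resume` are hypothetical names. `Class.newInstance()` is used because it needs no `java.lang.reflect` classes; it is deprecated on Java 9 and later, where `getDeclaredConstructor().newInstance()` is the usual replacement.

```java
// Hypothetical sketch of a persistable continuation ("functor").
final class FunctorSketch {

    /** Common interface implemented by every continuation class. */
    interface Continuation {
        String handle(String state, String response);
    }

    /** Example continuation; must have an accessible no-arg constructor. */
    static class EchoContinuation implements Continuation {
        public EchoContinuation() {
        }

        @Override
        public String handle(String state, String response) {
            return state + " -> " + response;
        }
    }

    /** Call record made only of strings, so it is easy to persist. */
    static final class Call {
        final String continuationClass;
        final String state;

        Call(String continuationClass, String state) {
            this.continuationClass = continuationClass;
            this.state = state;
        }
    }

    /** Re-creates the continuation by class name and invokes it. */
    @SuppressWarnings("deprecation") // Class.newInstance() is deprecated since Java 9
    static String resume(Call call, String response) {
        try {
            Object instance = Class.forName(call.continuationClass).newInstance();
            return ((Continuation) instance).handle(call.state, response);
        } catch (ClassNotFoundException | InstantiationException | IllegalAccessException e) {
            throw new IllegalStateException("cannot resume " + call.continuationClass, e);
        }
    }

    public static void main(String[] args) {
        Call call = new Call(EchoContinuation.class.getName(), "req-1");
        System.out.println(resume(call, "OK"));
    }
}
```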
So, in the outbound dataspace we need:
- the information needed to construct the actual request (minus
anything which may be defined according to the dataspace itself)
- any additional information needed to control/manage the request
(e.g. protocol)
- any additional information needed to control/manage the handling
of any response (e.g. continuation class information)
- any information needed by any response-handling task (e.g.
continuation-specific state relating to the request)
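As a sketch, the four kinds of information listed above could sit in one flat record. The class and field names (`OutboundSketch`, `OutboundRequest`, `protocol`, `reliable`, and so on) are assumptions made for illustration. The server address is deliberately absent, on the assumption that it is defined by the outbound dataspace itself.

```java
// Hypothetical sketch of an entry in the outbound dataspace.
final class OutboundSketch {

    static final class OutboundRequest {
        // 1. Information needed to construct the actual request.
        final String body;
        // 2. Information to control/manage the request.
        final String protocol;
        final boolean reliable;
        // 3. Information to control handling of any response.
        final String continuationClass;
        // 4. State needed by the response-handling task.
        final String continuationState;

        OutboundRequest(String body, String protocol, boolean reliable,
                        String continuationClass, String continuationState) {
            this.body = body;
            this.protocol = protocol;
            this.reliable = reliable;
            this.continuationClass = continuationClass;
            this.continuationState = continuationState;
        }

        @Override
        public String toString() {
            return protocol + (reliable ? " (reliable) " : " ") + body
                    + " => " + continuationClass + "[" + continuationState + "]";
        }
    }

    public static void main(String[] args) {
        OutboundRequest request = new OutboundRequest(
                "getStatus", "http-post", true, "example.StatusContinuation", "widget-7");
        System.out.println(request);
    }
}
```

Keeping every field a string or primitive means the record can be externalised without field introspection, which matters on J2ME.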
Should metadata be an additional orthogonal capability of the
dataspace, or should a metadata-holding type be used in the first place?
- for metadata-holding type use equip2.core.objects.ValueAndMetadata