Summary
The main concern of this work is the development
and investigation of new building blocks aiming at rapid and
efficient learning. We chose the domain of continuous,
high-dimensional, non-linear mapping tasks, as they often play
an important role in sensorimotor transformations in the field
of robotics. The design of better re-usable building blocks, not
only adaptive neural network modules but also hardware and
software modules, can be regarded as a pursuit of efficient
learning in a broader sense. The construction of those
building blocks is driven by the given experimental situation.
Similar to a training exercise, the procedural knowledge of, for
example, interacting with a device is usually incorporated in a
building block, e.g. a piece of software. The criterion for
calling this activity “learning” is whether this “knowledge” can
later be used, or more precisely re-used, in the form of
“association” or “generalization” in a new, previously
unexpected application situation.
The first part of this work was directed at the
robotics infrastructure investment: the building and development
of a test and research platform around an industrial robot
manipulator Puma560 and a hydraulic multifinger hand. We were
particularly concerned about the interoperability of the complex
hardware with general-purpose Unix computers in order to gain the
flexibility needed to interface the robots to distributed
information processing architectures.
For more intelligent and task-oriented action
schemata, the availability of fast and robust sensory feedback
from the environment is a limiting factor. Yet we encountered a
significant lack of suitable, commercially available sensor
sub-systems. As a consequence, we started to enlarge the robot's
sensory equipment in the direction of force, torque, and haptic
sensing. We developed a multi-layer tactile sensor for detailed
information on the current contact state with respect to forces,
locations and dynamic events. In particular, the detection of
incipient slip and of temporal changes in contact forces is
important for stable fine control of multi-contact grasp
and release operations of the articulated robot hand.
Returning to the narrower sense of rapid learning, what is
important? To be practical, learning algorithms must provide
solutions that can compete with solutions hand-crafted by a
human who has analyzed the system. The criteria for success can
vary, but usually the costs of gathering data and of teaching
the system are a major factor on the side of the learning
system, while the effort to analyze the problem and to design an
algorithm is on the side of the hand-crafted solution.
Here we suggest the “Parameterized
Self-Organizing Map” as a versatile module for the rapid
learning of high-dimensional, non-linear, smooth relations. As
shown in a series of application examples, the PSOM learning
mechanism offers excellent generalization capabilities based on
a remarkably small number of training examples. Internally, the
PSOM builds an m-dimensional continuous mapping manifold, which
is embedded in a higher d-dimensional task space (d > m).
This manifold is supported by a set of reference vectors in
conjunction with a set of basis functions. One favorable choice
of basis functions is the class of (m-fold)
products of Lagrange approximation polynomials. Then, the (m-dimensional)
grid of reference vectors parameterizes a topologically
structured data model.
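The construction just described can be sketched in a few lines of
Python. This is a minimal illustration, not the implementation used
in this work: the names lagrange_basis, psom_eval and sample, the
3x3 node grid, and the toy surface are all assumptions chosen for
the example. It builds an m = 2 manifold in a d = 3 embedding space
as a sum of reference vectors weighted by m-fold products of
Lagrange polynomials.

```python
from itertools import product

def lagrange_basis(nodes, i, s):
    """i-th Lagrange polynomial on `nodes` at s: 1 at nodes[i], 0 at every other node."""
    value = 1.0
    for j, a in enumerate(nodes):
        if j != i:
            value *= (s - a) / (nodes[i] - a)
    return value

def psom_eval(axes, ref, s):
    """Evaluate w(s) as the sum over all grid nodes a of H(a, s) * w_a.

    axes: list of m node-coordinate lists, one per manifold axis
    ref:  dict mapping a grid index tuple to its d-dimensional reference vector
    s:    m-dimensional parameter location
    """
    d = len(next(iter(ref.values())))
    w = [0.0] * d
    for idx in product(*(range(len(ax)) for ax in axes)):
        # H(a, s): m-fold product of one-dimensional Lagrange polynomials
        h = 1.0
        for ax, i, s_k in zip(axes, idx, s):
            h *= lagrange_basis(ax, i, s_k)
        for k in range(d):
            w[k] += h * ref[idx][k]
    return w

def sample(u, v):
    """Hypothetical toy task: a smooth 2D surface embedded in 3D."""
    return [u, v, u * v + u ** 2]

# 3x3 grid of reference vectors (9 training points in total)
axes = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
ref = {(i, j): sample(axes[0][i], axes[1][j]) for i in range(3) for j in range(3)}

at_node = psom_eval(axes, ref, (0.5, 1.0))   # a node location: reference vector is reproduced
between = psom_eval(axes, ref, (0.3, 0.7))   # a location between the nodes
print(at_node, between)
```

Because each basis function is one at its own node and zero at all
others, every reference vector is reproduced exactly. The toy
surface happens to be a low-order polynomial, so three nodes per
axis also represent it between the nodes up to rounding; for a
general smooth relation the manifold is only an approximation.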
This topologically ordered model provides
curvature information, information which is not available
within other learning techniques. If this assumed model is a
good approximation, it contributes significantly to the
generalization accuracy presented. The difference in information
content, with and without such a topological order, was
emphasized in the context of the robot finger kinematics
example.
On the one hand, the PSOM is the continuous
analog of the standard discrete “Self-Organizing Map” and
inherits the well-known SOM's unsupervised learning capabilities
(Kohonen 1995). On the other hand, the PSOM offers a most rapid
form of “learning”, i.e. the immediate construction of the
desired manifold. This requires assigning the training data set
to the set of internal node locations. In other words, for this
procedure the topological order of the training data set must be
known, or must be inferred (e.g. with the SOM
scheme). The applicability is demonstrated in a number of
examples employing training data sets with the known topology of
a multi-dimensional Cartesian grid. The resulting PSOM is
immediately usable, without any need for time-consuming
adaptation sequences. This feature is extremely advantageous in
all cases where the training data can be sampled actively. For
example, in robotics, many sensorimotor transformations can be
sampled in a structured manner, without any additional cost.
Irrespective of how the data model was initially generated, the
PSOM can be fine-tuned on-line. Using the described error
minimization procedure, a PSOM can be refined even when the data
was coarsely sampled, when the original training data was
corrupted by noise, or when the underlying task is changing. This is
illustrated by the problem of adapting to sudden changes in the
robot's geometry and its corresponding kinematics.
The PSOM manifold is also called a
parameterized associative map, since it performs
auto-associative completion of partial inputs. This
facilitates multi-directional mapping, in contrast to the
uni-directional mapping of feed-forward networks. Which
components of the embedding space are selected as inputs is
determined simply by specifying the diagonal elements p_k
of the projection matrix P.
This mechanism allows the embedding space to be easily augmented
by further sub-spaces. As pointed out, the PSOM algorithm can be
implemented such that inactive components do not affect the
normal PSOM operation.
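The role of the projection matrix can be illustrated with a small
sketch. It assumes that completion minimizes the weighted squared
distance, i.e. the sum over k of p_k (x_k - w_k(s))^2, over the
manifold parameter s, and it uses a plain scan followed by a
ternary search as a simple stand-in for the minimization procedure
of this work. The names manifold and complete and the three toy
reference vectors are hypothetical.

```python
def lagrange_basis(nodes, i, s):
    """i-th Lagrange polynomial on `nodes`, evaluated at s."""
    value = 1.0
    for j, a in enumerate(nodes):
        if j != i:
            value *= (s - a) / (nodes[i] - a)
    return value

def manifold(nodes, ref, s):
    """Point w(s) on a one-dimensional manifold embedded in d dimensions."""
    d = len(ref[0])
    return [sum(lagrange_basis(nodes, i, s) * ref[i][k] for i in range(len(nodes)))
            for k in range(d)]

def complete(nodes, ref, x, p, scan=50, steps=60):
    """Complete the partial input x; p[k] > 0 marks component k as an input."""
    def cost(s):
        w = manifold(nodes, ref, s)
        return sum(p_k * (x_k - w_k) ** 2 for p_k, x_k, w_k in zip(p, x, w))

    lo, hi = nodes[0], nodes[-1]
    width = (hi - lo) / scan
    # Coarse scan over the parameter range ...
    best = min((lo + width * i for i in range(scan + 1)), key=cost)
    a, b = max(lo, best - width), min(hi, best + width)
    # ... followed by a ternary search inside the best bracket.
    for _ in range(steps):
        m1, m2 = a + (b - a) / 3, b - (b - a) / 3
        if cost(m1) < cost(m2):
            b = m2
        else:
            a = m1
    return manifold(nodes, ref, 0.5 * (a + b))

# Three reference vectors (x, y) sampled from the toy relation y = x**2 + x
nodes = [0.0, 0.5, 1.0]
ref = [[s, s * s + s] for s in nodes]

forward = complete(nodes, ref, [0.3, 0.0], p=[1, 0])    # x given, y completed
inverse = complete(nodes, ref, [0.0, 0.39], p=[0, 1])   # y given, x completed
print(forward, inverse)
```

The same three reference vectors serve both mapping directions;
only p changes. Components with p_k = 0 carry placeholder values
that never enter the cost, which mirrors the remark that inactive
components do not affect the operation.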
Several examples demonstrate how to profitably
utilize the multi-way association capabilities: e.g. feature
sets can be completed by a PSOM in such a manner that they are
invariant against certain operations (e.g. shifted/rotated
object) and provide, in the same pass, the unknown operation
parameter (e.g. translation, angles). The same mechanism offers
a very natural and flexible way of sensor data fusion.
The incremental availability of more and more results from
different sensors can be used to improve the measurement
accuracy and confidence of recognition. Furthermore, the PSOM
multi-way capability enables an effective way of inter-sensor
coordination and sensor system guidance by predictions.
Generally, in robotics the availability of
precise mappings from and to different variable spaces,
including sensor, actuator, and reference coordinate spaces,
plays a crucial role. The applicability of the PSOM is
demonstrated in the robot finger application, where it solves
the classical forward and inverse kinematics problem in
Cartesian as well as in actuator piston coordinates,
within the same PSOM. Here, a set of only 27 training data
points turns out to be sufficient to approximate the 3D inverse
kinematics relation with a mean positioning deviation of about
1% of the entire workspace range.
The ability to augment the PSOM embedding space
makes it easy to add a “virtual sensor” space to the usual
sensorimotor map. In conjunction with the ability of rapid
learning, this opens the interesting possibility of demonstrating
the desired robot task performance. After this learning by
demonstration phase, robot tasks can also be specified as
perceptual expectations in this newly learned space.
The coefficients p_k
can weight the components relative to each other, which
is useful when input components differ in confidence,
importance, or scale. This choice can be changed on
demand and can even be modulated during the iterative completion
process. Internally, the PSOM associative completion process
performs an iterative search for the best-matching
parameter location in the mapping manifold. This minimization
procedure can be viewed as a recurrent network dynamics with a
continuous attractor manifold instead of just attractor
points as in conventional recurrent associative memories. The
required iteration effort is the price for rapid learning.
Fortunately, it can be kept small by applying a suitable,
adaptive second order minimization procedure (Sect. 4.5). In
conjunction with an algorithmic formulation that remains
efficient for high-dimensional problems, the
completion procedure converges within a couple of
iterations.
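How quickly such an iteration can settle is illustrated by the
following sketch. It uses a damped Gauss-Newton step with an
adaptive damping factor (in the style of Levenberg-Marquardt) as
one plausible choice of adaptive second order method; the procedure
of Sect. 4.5 may differ in detail. The function best_match, the
finite-difference derivative, the damping schedule and the toy data
are assumptions made for this example.

```python
def lagrange_basis(nodes, i, s):
    """i-th Lagrange polynomial on `nodes`, evaluated at s."""
    value = 1.0
    for j, a in enumerate(nodes):
        if j != i:
            value *= (s - a) / (nodes[i] - a)
    return value

def manifold(nodes, ref, s):
    """Point w(s) on a one-dimensional manifold embedded in d dimensions."""
    d = len(ref[0])
    return [sum(lagrange_basis(nodes, i, s) * ref[i][k] for i in range(len(nodes)))
            for k in range(d)]

def best_match(nodes, ref, x, p, iters=10, lam=1e-3, h=1e-6):
    """Search the parameter s that minimizes sum_k p[k] * (x[k] - w_k(s))**2."""
    def cost(s):
        w = manifold(nodes, ref, s)
        return sum(pk * (xk - wk) ** 2 for pk, xk, wk in zip(p, x, w))

    s = min(nodes, key=cost)          # start at the best-matching node
    history = [cost(s)]               # cost after each accepted step
    for _ in range(iters):
        w = manifold(nodes, ref, s)
        w_plus = manifold(nodes, ref, s + h)
        w_minus = manifold(nodes, ref, s - h)
        jac = [(a - b) / (2 * h) for a, b in zip(w_plus, w_minus)]   # dw/ds
        grad = sum(pk * jk * (xk - wk) for pk, jk, xk, wk in zip(p, jac, x, w))
        curv = sum(pk * jk * jk for pk, jk in zip(p, jac))
        s_try = s + grad / (curv + lam)   # damped Gauss-Newton step
        c_try = cost(s_try)
        if c_try < history[-1]:
            s, lam = s_try, lam / 10      # success: accept and trust the model more
            history.append(c_try)
        else:
            lam *= 10                     # failure: reject and damp more strongly
    return s, history

# Toy manifold through three reference vectors of the relation y = x**2 + x
nodes = [0.0, 0.5, 1.0]
ref = [[s, s * s + s] for s in nodes]

# Inverse direction: y = 0.39 is given, the matching parameter is searched
s_star, history = best_match(nodes, ref, [0.0, 0.39], p=[0, 1])
print(s_star, history)
```

In this toy case the cost drops by several orders of magnitude
within the first three accepted steps, which is the behaviour the
paragraph above refers to. Starting from the best-matching node
keeps the search in the basin of the intended solution.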
For special purposes, the search path in this
procedure can be directed. By modulating the cost function
during the best-match iteration, the PSOM algorithm can
partly comply with an additional, second-rank goal function,
possibly contradicting the primary target function. By this
means, a mechanism is available to flexibly optimize a mix of
extra constraints on demand. For example, the six-dimensional
inverse Puma kinematics can be handled by one PSOM in the given
workspace. For under-specified positioning tasks the same PSOM
can implement several options to flexibly resolve the
redundancy problem.
Despite the fact that the PSOM builds a
global parametric model of the map, it also bears the aspect
of a local model, which maps each reference point exactly
(without any interference from other training points, due to the
orthogonal set of basis functions). The PSOM's character of
being a local learning method can be gradually enhanced by
applying the “Local-PSOMs” scheme. The L-PSOM algorithm
constructs the constant-sized PSOM on a dynamically determined
sub-grid and keeps the computational effort constant when the
number of training points increases. Our results suggest an
excellent cost–benefit relation when using more than four nodes.
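The sub-grid idea can be sketched for a one-dimensional forward
mapping. This is a strongly simplified illustration: the input
coordinate coincides with the manifold parameter, so no iterative
search is needed, and the function local_psom, the sub-grid size of
four and the sine example are assumptions made here.

```python
import math

def lagrange_interp(nodes, values, x):
    """Lagrange interpolation through the points (nodes[i], values[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        term = yi
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def local_psom(nodes, values, x, size=4):
    """Interpolate on a fixed-size sub-grid selected around the node nearest to x."""
    nearest = min(range(len(nodes)), key=lambda i: abs(nodes[i] - x))
    # Place the window around the nearest node, clamped at the grid borders
    start = min(max(nearest - size // 2, 0), len(nodes) - size)
    window = slice(start, start + size)
    return lagrange_interp(nodes[window], values[window], x)

# 13 training points of a sine wave; every query uses only 4 of them
nodes = [2 * math.pi * i / 12 for i in range(13)]
values = [math.sin(a) for a in nodes]

queries = [2 * math.pi * k / 500 for k in range(501)]
max_err = max(abs(math.sin(x) - local_psom(nodes, values, x)) for x in queries)
print(max_err)
```

Apart from the nearest-node lookup, the interpolation cost depends
only on the sub-grid size and not on the total number of training
points, so more training data refines the map without making each
query more expensive.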
A further possibility to improve the mapping
accuracy is the use of the “Chebyshev-spaced PSOM”. The
C-PSOM exploits the superior approximation capabilities of the
Chebyshev polynomials for the design of the internal basis
functions. When using four or more nodes per axis, the data
sampling and the associated node values are taken according to
the distribution of the Chebyshev polynomial's zeros. This
imposes no extra effort but offers a significant precision
advantage.
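The effect of the node placement can be checked with a small
one-dimensional experiment. The sketch compares the maximum
interpolation error for four equidistant nodes and for four nodes
at the zeros of the Chebyshev polynomial on the interval [-1, 1].
The test function (a cosine) and all function names are
assumptions chosen for illustration.

```python
import math

def lagrange_interp(nodes, values, x):
    """Lagrange interpolation through the points (nodes[i], values[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        term = yi
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def equidistant_nodes(n):
    """n equally spaced nodes on [-1, 1], including both end points."""
    return [-1 + 2 * i / (n - 1) for i in range(n)]

def chebyshev_nodes(n):
    """Zeros of the Chebyshev polynomial of degree n on [-1, 1]."""
    return sorted(math.cos((2 * j + 1) * math.pi / (2 * n)) for j in range(n))

def max_error(f, nodes, samples=2001):
    """Largest deviation between f and its interpolant on a dense grid over [-1, 1]."""
    values = [f(x) for x in nodes]
    xs = [-1 + 2 * k / (samples - 1) for k in range(samples)]
    return max(abs(f(x) - lagrange_interp(nodes, values, x)) for x in xs)

n = 4
err_equi = max_error(math.cos, equidistant_nodes(n))
err_cheb = max_error(math.cos, chebyshev_nodes(n))
print(err_equi, err_cheb)
```

Both variants use the same number of samples and the same basis
construction; only the sampling locations differ. For this test
function the Chebyshev placement gives the smaller maximum error.
Note that the Chebyshev zeros lie strictly inside the interval, so
the interval borders are reached by a slight extrapolation.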
A further main concern of this work is how to
structure learning systems such that learning can be efficient.
Here, we demonstrated a hierarchical approach for context
dependent learning. It is motivated by a decomposition of the
learning phase into two different stages: a longer, initial
“investment learning” phase “invests” effort in the
collection of expertise in prototypical context
situations. In return, in the following “one-shot
adaptation” stage the system is able to adapt extremely rapidly
to a new, changing context situation. While PSOMs are very well
suited for this approach, the underlying idea to “compile” the
effect of a longer learning phase into a one-step learning
architecture is more general and is independent of the PSOMs.
The META-BOX controls the parameterization of a set of
context-specific “skills”, which are implemented in a
parameterized box denoted T-BOX. Iterative learning of a new
context task is replaced by the dynamic re-parameterization
through the META-BOX mapping, dependent on the characterizing
observation of the context. This
emphasizes an important point for the construction of more
powerful learning systems: in addition to focusing on output
value learning, we should enlarge our view towards mappings
which produce other mappings as their result. Similarly,
this broader view has received increasing attention in
the realm of functional programming languages.
To implement this approach, we used a
hierarchical architecture of mappings, called the
“mixture-of-expertise” architecture. While in principle
various kinds of network types could be used for these mappings,
a practically feasible solution must be based on a network type
that allows the required basis mappings to be constructed from a
rather small number of training examples. In addition, since we
use interpolation in weight/parameter space, similar mappings
should give rise to similar weight sets to make interpolation of
expertise meaningful.
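The idea of a mapping that produces another mapping can be made
concrete with a toy sketch. Everything specific in it is an
assumption: the T-BOX is reduced to a line with two weights (gain
and offset), the context is characterized by a single scalar
observation, and the META-BOX interpolates the weight sets of
three prototypical contexts with Lagrange polynomials.

```python
def lagrange_basis(nodes, i, s):
    """i-th Lagrange polynomial on `nodes`, evaluated at s."""
    value = 1.0
    for j, a in enumerate(nodes):
        if j != i:
            value *= (s - a) / (nodes[i] - a)
    return value

def fit_tbox(samples):
    """Least-squares fit of y = gain * x + offset; stands in for training one T-BOX."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    gain = sxy / sxx
    return (gain, mean_y - gain * mean_x)

def make_meta_box(contexts, weight_sets):
    """Return a META-BOX: context observation -> ready-to-use T-BOX mapping."""
    def meta_box(observation):
        basis = [lagrange_basis(contexts, i, observation) for i in range(len(contexts))]
        # Interpolate in weight space between the prototype weight sets
        gain, offset = (sum(b * w[k] for b, w in zip(basis, weight_sets))
                        for k in range(2))
        return lambda x: gain * x + offset
    return meta_box

def world(c, x):
    """Hypothetical context-dependent transformation that is to be learned."""
    return (1 + 0.5 * c) * x + c ** 2

# Investment learning: one trained T-BOX per prototypical context
contexts = [0.0, 1.0, 2.0]
weight_sets = [fit_tbox([(x, world(c, x)) for x in (-1.0, 0.0, 1.0, 2.0)])
               for c in contexts]
meta_box = make_meta_box(contexts, weight_sets)

# One-shot adaptation: a single observation of an unseen context yields a new T-BOX
tbox = meta_box(1.5)
print(tbox(2.0), world(1.5, 2.0))
```

The result of meta_box is itself a function, so adapting to a new
context costs one evaluation instead of a new training run. The
sketch only works because similar contexts lead to similar weight
sets, which is the requirement stated in the paragraph above.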
We illustrated three versions of this approach
for an output mapping that was a coordinate transformation between
the reference frame of the camera and the object-centered frame.
They differed in the choice of the T-BOX used.
The results showed that on the T-BOX
level the learning PSOM network can fully compete with
the dedicated engineering solution, additionally offering
multi-way mapping capabilities. At the META-BOX
level the PSOM approach is a particularly suitable
solution because, first, it requires only a small number of
prototypical training situations, and second, the context
characterization task can profit from the sensor fusion
capabilities of the same PSOM, also called Meta-PSOM.
We also demonstrated the potential of this
approach with the task of 2D and 3D visuo-motor mappings,
learnable with a single observation after changing the
underlying sensorimotor transformation, here e.g. by
repositioning the camera, or the pair of individual cameras.
After learning by a single observation, the achieved accuracy
compares rather well with the direct learning procedure. As more
data becomes available, the T-PSOM can be fine-tuned to improve
the performance to the level of the directly trained T-PSOM.
The presented arrangement of a basis T-PSOM and
two Meta-PSOMs further demonstrates the possibility to split the
hierarchical “mixture-of-expertise” architecture into modules for
independently changing parameter sets. When the number of
involved free context parameters is growing, this factorization
is increasingly crucial to keep the number of pre-trained
prototype mappings manageable. The two hierarchical
architectures, the “mixture-of-experts” and the introduced
“mixture-of-expertise” scheme, complement each other. While the
PSOM and the T-BOX/META-BOX approach are very efficient learning
modules for the continuous and smooth mapping domain, the
“mixture-of-experts” scheme is superior in managing mapping
domains which require non-continuous or non-smooth interfaces.
As pointed out, the T-BOX concept
is not restricted to a particular network type, and the
“mixture-of-expertise” can be considered as a learning module by
itself. As a result, the conceptual combination of the presented
building blocks opens many interesting possibilities and
applications.