Summary

The main concern of this work is the development and investigation of new building blocks aimed at rapid and efficient learning. We chose the domain of continuous, high-dimensional, non-linear mapping tasks, as they often play an important role in sensorimotor transformations in robotics. The design of better re-usable building blocks (not only adaptive neural network modules, but also hardware and software modules) can be regarded as the pursuit of efficient learning in a broader sense. The construction of those building blocks is driven by the given experimental situation. Similar to a training exercise, the procedural knowledge of, for example, interacting with a device is usually incorporated in a building block, e.g. a piece of software. The criterion for calling this activity “learning” is whether this “knowledge” can later be used, or more precisely re-used, in the form of “association” or “generalization” in a new, previously unexpected application situation.

The first part of this work was directed at the robotics infrastructure investment: the building and development of a test and research platform around an industrial robot manipulator Puma560 and a hydraulic multifinger hand. We were particularly concerned with making the complex hardware operable from general-purpose Unix computers, in order to gain the flexibility needed to interface the robots to distributed information processing architectures.

For more intelligent and task-oriented action schemata, the availability of fast and robust sensory feedback from the environment is a limiting factor. However, we found a significant lack of suitable, commercially available sensor sub-systems. As a consequence, we started to extend the robot's sensory equipment in the direction of force, torque, and haptic sensing. We developed a multi-layer tactile sensor that provides detailed information on the current contact state with respect to forces, locations and dynamic events. In particular, the detection of incipient slip and of changes in contact forces over time is important for improving stable fine control in multi-contact grasp and release operations of the articulated robot hand.

Returning to the narrower sense of rapid learning, what is important? To be practical, learning algorithms must provide solutions that can compete with solutions hand-crafted by a human who has analyzed the system. The criteria for success can vary, but usually the costs of gathering data and of teaching the system are a major factor on the side of the learning system, while the effort to analyze the problem and to design an algorithm is on the side of the hand-crafted solution.

Here we suggest the “Parameterized Self-Organizing Map” as a versatile module for the rapid learning of high-dimensional, non-linear, smooth relations. As shown in a series of application examples, the PSOM learning mechanism offers excellent generalization capabilities based on a remarkably small number of training examples. Internally, the PSOM builds an m-dimensional continuous mapping manifold, which is embedded in a higher d-dimensional task space (d > m). This manifold is supported by a set of reference vectors in conjunction with a set of basis functions. One favorable choice of basis functions is the class of (m-fold) products of Lagrange approximation polynomials. Then, the (m-dimensional) grid of reference vectors parameterizes a topologically structured data model.
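The construction described above can be sketched in a few lines. The following is a minimal illustration, not the implementation used in this work: it assumes a two-dimensional grid (m = 2) of 3 × 3 reference vectors embedded in a three-dimensional space (d = 3), and the names `lagrange_basis`, `psom_manifold` and `toy_task` are hypothetical.

```python
from itertools import product


def lagrange_basis(nodes, s):
    """Values l_i(s) of the 1D Lagrange polynomials defined by `nodes`."""
    values = []
    for i, a_i in enumerate(nodes):
        l_i = 1.0
        for j, a_j in enumerate(nodes):
            if j != i:
                l_i *= (s - a_j) / (a_i - a_j)
        values.append(l_i)
    return values


def psom_manifold(axes, ref_vectors, s):
    """Evaluate w(s) = sum over grid nodes a of H(a, s) * w_a.

    axes        -- list of m lists holding the node positions per axis
    ref_vectors -- dict: grid index tuple -> d-dimensional reference vector
    s           -- m-dimensional location in the parameter space
    """
    per_axis = [lagrange_basis(nodes, s_k) for nodes, s_k in zip(axes, s)]
    dim = len(next(iter(ref_vectors.values())))
    w = [0.0] * dim
    for idx in product(*(range(len(nodes)) for nodes in axes)):
        h = 1.0
        for axis, i in enumerate(idx):
            h *= per_axis[axis][i]  # m-fold product of 1D polynomials
        for c in range(dim):
            w[c] += h * ref_vectors[idx][c]
    return w


def toy_task(x, y):
    """Hypothetical smooth relation used only to generate training data."""
    return x * x + x * y


# Nine training points on a 3 x 3 grid; each reference vector is (x, y, z).
axes = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
ref_vectors = {
    (i, j): [x, y, toy_task(x, y)]
    for i, x in enumerate(axes[0])
    for j, y in enumerate(axes[1])
}

w = psom_manifold(axes, ref_vectors, [0.25, 0.75])
```

Because the toy relation is at most quadratic in each variable, three nodes per axis reproduce it exactly; for general smooth relations the manifold is only an approximation between the reference vectors.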

This topologically ordered model provides curvature information, which is not available within other learning techniques. If this assumed model is a good approximation, it contributes significantly to the presented generalization accuracy. The difference in information content, with and without such a topological order, was emphasized in the context of the robot finger kinematics example.

On the one hand, the PSOM is the continuous analog of the standard discrete “Self-Organizing Map” and inherits the well-known SOM's unsupervised learning capabilities (Kohonen 1995). On the other hand, the PSOM offers a most rapid form of “learning”, i.e. the immediate construction of the desired manifold. This requires assigning the training data set to the set of internal node locations. In other words, for this procedure the topology of the training data set must be known, or must be inferred (e.g. with the SOM scheme). The applicability is demonstrated in a number of examples employing training data sets with the known topology of a multi-dimensional Cartesian grid. The resulting PSOM is immediately usable, without any need for time-consuming adaptation sequences. This feature is extremely advantageous in all cases where the training data can be sampled actively. For example, in robotics, many sensorimotor transformations can be sampled in a structured manner without any additional cost. Irrespective of how the data model was initially generated, the PSOM can be fine-tuned on-line. Using the described error minimization procedure, a PSOM can be refined even when the data was coarsely sampled, when the original training data was corrupted by noise, or when the underlying task is changing. This is illustrated by the problem of adapting to sudden changes in the robot's geometry and its corresponding kinematics.

The PSOM manifold is also called a parameterized associative map, since it performs auto-associative completion of partial inputs. This facilitates multi-directional mapping, in contrast to uni-directional feed-forward networks. Which components of the embedding space are selected as inputs is simply determined by specifying the diagonal elements p_k of the projection matrix P. This mechanism makes it easy to augment the embedding space by further sub-spaces. As pointed out, the PSOM algorithm can be implemented such that inactive components do not affect the normal PSOM operation.
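The selection of inputs through the diagonal elements p_k can be illustrated with a small sketch. It assumes a one-dimensional manifold (m = 1) in a two-dimensional embedding space with components (x, y) and the toy relation y = x²; all names are hypothetical. The best-match search here is a coarse scan followed by a ternary search, a deliberately simple stand-in and not the minimization procedure used in this work.

```python
def lagrange_basis(nodes, s):
    """Values l_i(s) of the 1D Lagrange polynomials defined by `nodes`."""
    values = []
    for i, a_i in enumerate(nodes):
        l_i = 1.0
        for j, a_j in enumerate(nodes):
            if j != i:
                l_i *= (s - a_j) / (a_i - a_j)
        values.append(l_i)
    return values


def manifold(nodes, ref_vectors, s):
    """Point w(s) on the manifold spanned by the reference vectors."""
    basis = lagrange_basis(nodes, s)
    dim = len(ref_vectors[0])
    return [sum(l * w[c] for l, w in zip(basis, ref_vectors)) for c in range(dim)]


def complete(nodes, ref_vectors, x, p, steps=200, refinements=40):
    """Complete a partial input x; p holds the diagonal elements p_k of P.

    Components with p_k = 0 are ignored in the distance and are filled in
    from the best-matching manifold point.
    """
    def cost(s):
        w = manifold(nodes, ref_vectors, s)
        return sum(p_k * (x_k - w_k) ** 2 for p_k, x_k, w_k in zip(p, x, w))

    lo, hi = nodes[0], nodes[-1]
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    best = min(grid, key=cost)                     # coarse scan
    h = (hi - lo) / steps
    a, b = max(lo, best - h), min(hi, best + h)
    for _ in range(refinements):                   # ternary search refinement
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if cost(m1) < cost(m2):
            b = m2
        else:
            a = m1
    s_star = 0.5 * (a + b)
    return s_star, manifold(nodes, ref_vectors, s_star)


nodes = [0.0, 0.5, 1.0]
ref_vectors = [[s, s * s] for s in nodes]          # training pairs (x, y = x^2)

# Forward direction: x is given (p = [1, 0]), y is completed.
_, forward = complete(nodes, ref_vectors, [0.3, 0.0], [1.0, 0.0])

# Inverse direction with the same map: y is given (p = [0, 1]), x is completed.
_, inverse = complete(nodes, ref_vectors, [0.0, 0.49], [0.0, 1.0])
```

The same reference vectors serve both directions; only the choice of p changes. Values placed in the ignored components of `x` are placeholders and have no effect on the result.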

Several examples demonstrate how to profitably utilize the multi-way association capabilities: e.g. feature sets can be completed by a PSOM in such a manner that they are invariant against certain operations (e.g. a shifted or rotated object) and provide in the same pass the unknown operation parameters (e.g. translation, angles). The same mechanism offers a very natural and flexible way of sensor data fusion. The incremental availability of more and more results from different sensors can be used to improve the measurement accuracy and the confidence of recognition. Furthermore, the PSOM multi-way capability enables an effective way of inter-sensor coordination and sensor system guidance by predictions.

Generally, in robotics the availability of precise mappings from and to different variable spaces, including sensor, actuator, and reference coordinate spaces, plays a crucial role. The applicability of the PSOM is demonstrated in the robot finger application, where it solves the classical forward and inverse kinematics problem in Cartesian as well as in actuator piston coordinates, all within the same PSOM. Here, a set of only 27 training data points turns out to be sufficient to approximate the 3D inverse kinematics relation with a mean positioning deviation of about 1% of the entire workspace range.

The ability to augment the PSOM embedding space makes it easy to add a “virtual sensor” space to the usual sensorimotor map. In conjunction with the ability of rapid learning, this opens the interesting possibility of demonstrating the desired robot task performance. After this learning-by-demonstration phase, robot tasks can also be specified as perceptual expectations in this newly learned space.

The coefficients p_k can weight the components relative to each other, which is useful when input components differ in confidence, importance, or scale. This choice can be changed on demand and can even be modulated during the iterative completion process. Internally, the PSOM associative completion process performs an iterative search for the best-matching parameter location in the mapping manifold. This minimization procedure can be viewed as a recurrent network dynamics with a continuous attractor manifold instead of just attractor points, as in conventional recurrent associative memories. The required iteration effort is the price for rapid learning. Fortunately, it can be kept small by applying a suitable, adaptive second-order minimization procedure (Sect. 4.5). In conjunction with an algorithmic formulation optimized for efficient computation, also for high-dimensional problems, the completion procedure converges within a few iterations.

For special purposes, the search path in this procedure can be directed. By modulating the cost function during the best-match iteration, the PSOM algorithm can partly comply with an additional, second-rank goal function, which may even contradict the primary target function. By this means, a mechanism is available to flexibly optimize a mix of extra constraints on demand. For example, the six-dimensional inverse Puma kinematics can be handled by one PSOM in the given workspace. For under-specified positioning tasks, the same PSOM can implement several options to flexibly resolve the redundancy problem.

Despite the fact that the PSOM builds a global parametric model of the map, it also bears the aspect of a local model, which maps each reference point exactly (without any interference from other training points, due to the orthogonal set of basis functions). The PSOM's character of being a local learning method can be gradually enhanced by applying the “Local-PSOMs” scheme. The L-PSOM algorithm constructs the constant-sized PSOM on a dynamically determined sub-grid and keeps the computational effort constant when the number of training points increases. Our results suggest an excellent cost–benefit relation when using more than four nodes.
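The idea of a constant-sized sub-grid can be sketched for a single axis. This is a simplified illustration under stated assumptions, not the L-PSOM algorithm itself: the sub-grid is centered on the node closest to the input, the input coincides with the parameter axis, and the names `select_subgrid` and `local_psom_1d` as well as the sine example are hypothetical.

```python
import math


def lagrange_interpolate(nodes, values, x):
    """Lagrange interpolation through the points (nodes[i], values[i])."""
    total = 0.0
    for i, a_i in enumerate(nodes):
        l_i = 1.0
        for j, a_j in enumerate(nodes):
            if j != i:
                l_i *= (x - a_j) / (a_i - a_j)
        total += l_i * values[i]
    return total


def select_subgrid(n_nodes, center, size):
    """Indices of `size` consecutive nodes around `center`.

    The window is shifted inwards at the borders so that it always
    contains exactly `size` nodes (requires size <= n_nodes).
    """
    start = center - (size - 1) // 2
    start = max(0, min(start, n_nodes - size))
    return list(range(start, start + size))


def local_psom_1d(nodes, values, x, size=4):
    """Interpolate on a sub-grid of constant size near the input x."""
    center = min(range(len(nodes)), key=lambda i: abs(nodes[i] - x))
    idx = select_subgrid(len(nodes), center, size)
    return lagrange_interpolate(
        [nodes[i] for i in idx], [values[i] for i in idx], x
    )


# Eleven training points, but every query uses only four of them.
nodes = [math.pi * k / 10 for k in range(11)]
values = [math.sin(a) for a in nodes]
y = local_psom_1d(nodes, values, 1.0)
```

The cost of one evaluation depends on `size` only, so adding more training points refines the grid without increasing the interpolation effort per query.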

A further possibility to improve the mapping accuracy is the use of the “Chebyshev-spaced PSOM”. The C-PSOM exploits the superior approximation capabilities of the Chebyshev polynomials for the design of the internal basis functions. When using four or more nodes per axis, the data sampling and the associated node values are taken according to the distribution of the Chebyshev polynomial's zeros. This imposes no extra effort but offers a significant precision advantage.
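The benefit of Chebyshev spacing can be made concrete with the node polynomial ω(x) = Π (x − a_k), whose maximum magnitude enters the standard error bound of polynomial interpolation. The sketch below assumes the textbook definition of the Chebyshev zeros on an interval [a, b], which may differ in detail from the spacing used for the C-PSOM; the function names are hypothetical.

```python
import math


def chebyshev_nodes(n, a=-1.0, b=1.0):
    """Zeros of the degree-n Chebyshev polynomial, mapped to [a, b]."""
    return sorted(
        0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * k + 1) * math.pi / (2 * n))
        for k in range(n)
    )


def equidistant_nodes(n, a=-1.0, b=1.0):
    """n equally spaced nodes including both interval ends."""
    return [a + (b - a) * k / (n - 1) for k in range(n)]


def max_node_polynomial(nodes, a=-1.0, b=1.0, samples=2001):
    """Largest |prod_k (x - a_k)| over a dense sampling of [a, b]."""
    largest = 0.0
    for i in range(samples):
        x = a + (b - a) * i / (samples - 1)
        omega = 1.0
        for a_k in nodes:
            omega *= x - a_k
        largest = max(largest, abs(omega))
    return largest


n = 4  # four nodes per axis
cheb = max_node_polynomial(chebyshev_nodes(n))
equi = max_node_polynomial(equidistant_nodes(n))
```

For four nodes on [−1, 1] the Chebyshev spacing gives a maximum of 1/8, compared with 16/81 (about 0.198) for equidistant nodes, so the error bound shrinks without any additional training data.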

A further main concern of this work is how to structure learning systems such that learning can be efficient. Here, we demonstrated a hierarchical approach for context-dependent learning. It is motivated by a decomposition of the learning phase into two different stages: a longer, initial “investment learning” phase “invests” effort in the collection of expertise in prototypical context situations. In return, in the following “one-shot adaptation” stage the system is able to adapt extremely rapidly to a new or changed context situation. While PSOMs are very well suited for this approach, the underlying idea to “compile” the effect of a longer learning phase into a one-step learning architecture is more general and is independent of the PSOMs.

The META-BOX controls the parameterization of a set of context-specific “skills”, which are implemented in a parameterized box, denoted T-BOX. Iterative learning of a new context task is replaced by the dynamic re-parameterization through the META-BOX mapping, dependent on the characterizing observation of the context. This emphasizes an important point for the construction of more powerful learning systems: in addition to focusing on output value learning, we should enlarge our view towards mappings which produce other mappings as their result. A similar, more encompassing view has received increasing attention in the realm of functional programming languages.
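A mapping that produces another mapping can be written down directly as a higher-order function. The sketch below is only an illustration of the idea under strong simplifications: the T-BOX is assumed to be a linear map y = a·x + b, the context is a single scalar observation, and the META-BOX interpolates the prototype parameter sets with Lagrange polynomials. All names and numbers are hypothetical.

```python
def lagrange_basis(nodes, s):
    """Values l_i(s) of the 1D Lagrange polynomials defined by `nodes`."""
    values = []
    for i, a_i in enumerate(nodes):
        l_i = 1.0
        for j, a_j in enumerate(nodes):
            if j != i:
                l_i *= (s - a_j) / (a_i - a_j)
        values.append(l_i)
    return values


def make_meta_box(contexts, prototype_params):
    """Build a META-BOX from parameter sets learned in prototypical contexts."""
    def meta_box(observation):
        # Interpolate in parameter space between the prototype contexts.
        basis = lagrange_basis(contexts, observation)
        n_params = len(prototype_params[0])
        return [
            sum(l * params[k] for l, params in zip(basis, prototype_params))
            for k in range(n_params)
        ]
    return meta_box


def t_box(params):
    """Parameterized skill; here simply a linear map y = a * x + b."""
    a, b = params
    return lambda x: a * x + b


# "Investment learning": parameter sets for three prototypical contexts.
contexts = [0.0, 0.5, 1.0]
prototype_params = [[1.0 + c * c, 2.0 * c] for c in contexts]
meta_box = make_meta_box(contexts, prototype_params)

# "One-shot adaptation": a single context observation yields a new skill.
skill = t_box(meta_box(0.25))
y = skill(2.0)
```

No iterative training takes place in the adaptation step; the new skill results from one evaluation of the META-BOX. The interpolation is only meaningful if similar contexts lead to similar parameter sets.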

To implement this approach, we used a hierarchical architecture of mappings, called the “mixture-of-expertise” architecture. While in principle various kinds of network types could be used for these mappings, a practically feasible solution must be based on a network type that allows the required basis mappings to be constructed from a rather small number of training examples. In addition, since we use interpolation in weight/parameter space, similar mappings should give rise to similar weight sets to make the interpolation of expertise meaningful.

We illustrated three versions of this approach for the case where the output mapping is a coordinate transformation between the reference frame of the camera and the object-centered frame. They differed in the choice of the utilized T-BOX. The results showed that on the T-BOX level the learning PSOM network can fully compete with the dedicated engineering solution, while additionally offering multi-way mapping capabilities. At the META-BOX level the PSOM approach is a particularly suitable solution because, first, it requires only a small number of prototypical training situations, and second, the context characterization task can profit from the sensor fusion capabilities of the same PSOM, also called Meta-PSOM.

We also demonstrated the potential of this approach with the task of 2D and 3D visuo-motor mappings, learnable with a single observation after changing the underlying sensorimotor transformation, for example by repositioning the camera or the pair of individual cameras. After learning by a single observation, the achieved accuracy compares rather well with the direct learning procedure. As more data becomes available, the T-PSOM can be fine-tuned to improve the performance to the level of the directly trained T-PSOM.

The presented arrangement of a basis T-PSOM and two Meta-PSOMs further demonstrates the possibility to split the hierarchical “mixture-of-expertise” architecture into modules for independently changing parameter sets. When the number of involved free context parameters grows, this factorization becomes increasingly crucial to keep the number of pre-trained prototype mappings manageable. The two hierarchical architectures, the “mixture-of-expert” and the introduced “mixture-of-expertise” scheme, complement each other. While the PSOM as well as the T-BOX/META-BOX approach are very efficient learning modules for the continuous and smooth mapping domain, the “mixture-of-expert” scheme is superior in managing mapping domains which require non-continuous or non-smooth interfaces. As pointed out, the T-BOX concept is not restricted to a particular network type, and the “mixture-of-expertise” can be considered a learning module in itself. As a result, the conceptual combination of the presented building blocks opens many interesting possibilities and applications.