Over the last two months I ran several experiments with my iCub simulator.
I enhanced my reaching module and tested it with various parameters and state/action representations.
In each experiment I trained the reaching CACLA agent for 1000-2000 episodes and tested it on the fly every 50 episodes.
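For readers unfamiliar with CACLA (van Hasselt and Wiering's Continuous Actor Critic Learning Automaton), the core update can be sketched as follows. The critic is trained with the TD error δ, and the actor is pulled towards the action it actually took, but only when δ > 0. This is a minimal sketch with linear function approximators in plain Python. The class name `LinearCacla`, the learning rates, the discount factor and the Gaussian exploration width `sigma` are assumptions for illustration, not the networks used in these experiments.

```python
import random
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


class LinearCacla:
    """Minimal CACLA agent with a linear critic V(s) and a linear actor A(s) (hypothetical sketch)."""

    def __init__(self, state_dim: int, action_dim: int,
                 alpha: float = 0.01, beta: float = 0.01, gamma: float = 0.9):
        self.alpha = alpha  # critic learning rate (assumed value)
        self.beta = beta    # actor learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.w = [0.0] * state_dim                               # critic weights
        self.W = [[0.0] * state_dim for _ in range(action_dim)]  # actor weights

    def value(self, s: Vector) -> float:
        return dot(self.w, s)

    def act(self, s: Vector) -> Vector:
        # Greedy action clipped to <-1, 1>, e.g. for on-the-fly testing.
        return [max(-1.0, min(1.0, dot(row, s))) for row in self.W]

    def explore(self, s: Vector, sigma: float) -> Vector:
        # Gaussian exploration around the greedy action (assumed exploration scheme).
        return [max(-1.0, min(1.0, random.gauss(a, sigma))) for a in self.act(s)]

    def update(self, s: Vector, a: Vector, r: float,
               s_next: Vector, terminal: bool) -> float:
        target = r if terminal else r + self.gamma * self.value(s_next)
        delta = target - self.value(s)
        # Critic: TD(0) step towards the one-step target.
        self.w = [wi + self.alpha * delta * si for wi, si in zip(self.w, s)]
        # Actor: CACLA learns only when the taken action was better than expected.
        if delta > 0:
            for i, row in enumerate(self.W):
                error = a[i] - dot(row, s)  # distance to the unclipped actor output
                self.W[i] = [wij + self.beta * error * sj for wij, sj in zip(row, s)]
        return delta
```

The asymmetric actor update is what distinguishes CACLA from a plain policy-gradient actor: actions that turned out worse than expected are simply ignored rather than pushed away from.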
In most of the experiments I used an exploration factor λ = 1.0 that decreased after each episode: I multiplied λ by 0.995 after every episode, so it drops to about 0.37 after 200 episodes and to about 0.1 after 450 episodes. Therefore, iCub was effectively exploring during the first 350-400 episodes and afterwards was mostly exploiting and fine-tuning.
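The decay schedule is a geometric series, λ_n = λ_0 · 0.995^n. The helpers below (hypothetical names) evaluate it and solve for the first episode at which λ falls to a given threshold.

```python
import math


def exploration_factor(episode: int, lam0: float = 1.0, decay: float = 0.995) -> float:
    """Exploration factor after `episode` multiplicative decay steps."""
    return lam0 * decay ** episode


def episodes_until(threshold: float, lam0: float = 1.0, decay: float = 0.995) -> int:
    """Smallest episode count n with lam0 * decay**n <= threshold."""
    return math.ceil(math.log(threshold / lam0) / math.log(decay))
```

With λ_0 = 1.0 this gives about 0.37 at episode 200 and about 0.10 at episode 450; λ first reaches 0.1 or less at episode 460.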
The initial arm position was generated randomly before every episode from a predefined subspace. The final (target) position was fixed to a single point (given by 3D Cartesian coordinates) and was not changed in most of the experiments. This task is much easier than approximating reaching over the whole workspace with the robot arm; I chose it because I needed to test my implementation, observe the behaviour of the networks with different parameters, and so on.
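Sampling the initial arm configuration from a predefined subspace can be as simple as drawing each joint angle uniformly from a box, while the target stays at one fixed point. A minimal sketch, assuming the subspace is given as hypothetical per-joint `(low, high)` bounds; the exact subspace used in the experiments is not specified here.

```python
import random
from typing import List, Sequence, Tuple


def sample_initial_state(bounds: Sequence[Tuple[float, float]],
                         rng: random.Random) -> List[float]:
    """Draw one initial angle per joint uniformly from its (low, high) interval."""
    return [rng.uniform(lo, hi) for lo, hi in bounds]
```

Passing a seeded `random.Random` makes an experiment's sequence of starting positions reproducible.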
The reward function for CACLA was simply the Euclidean distance between the hand and the target position, scaled to <-1, 1>.
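How the distance is mapped to <-1, 1> is not spelled out above. One plausible reading, sketched here as an assumption, is a linear map in which reaching the target gives +1 and being at a hypothetical maximum distance `max_dist` (or farther) gives -1, so a smaller distance means a larger reward.

```python
import math
from typing import Sequence


def reaching_reward(hand: Sequence[float], target: Sequence[float],
                    max_dist: float) -> float:
    """Linearly map hand-target distance to <-1, 1> (assumed scaling).

    Distance 0 gives +1; distance >= max_dist gives -1.
    """
    d = min(math.dist(hand, target), max_dist)
    return 1.0 - 2.0 * d / max_dist
```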
I also tried a version where I squared the final reward (preserving its sign), because I thought that iCub becomes satisfied once it gets close to the object and is only very weakly motivated to get even closer. However, I found that learning was more difficult for iCub with this reward function.
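Squaring the reward while keeping its sign amounts to r · |r|. A small helper (hypothetical name) makes the transformation concrete:

```python
def signed_square(r: float) -> float:
    """Square a reward in <-1, 1> while preserving its sign: sign(r) * r**2."""
    return r * abs(r)
```

Because the slope of r · |r| is 2|r|, the squared reward is steeper than the linear one for |r| > 0.5 and flatter for |r| < 0.5: values near 0 get compressed, while values near ±1 change faster.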
The state representations I used were:
- 3D coordinates of the target position (x, y, z)
- the 4 controlled DoF and the 3D coordinates of the target position (a, b, c, d) (x, y, z), where the DoF values were scaled to <-1, 1>
- 3D coordinates of the hand center and 3D coordinates of the target position (hx, hy, hz) (x, y, z)
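The three state representations listed above can be assembled as flat vectors. A minimal sketch, assuming each joint angle is scaled linearly from hypothetical `(low, high)` joint limits to <-1, 1>; the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Limits = Sequence[Tuple[float, float]]


def scale_to_unit(value: float, lo: float, hi: float) -> float:
    """Linearly map value from [lo, hi] to [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def state_target_only(target: Sequence[float]) -> List[float]:
    # Representation 1: (x, y, z)
    return list(target)


def state_joints_and_target(joints: Sequence[float], limits: Limits,
                            target: Sequence[float]) -> List[float]:
    # Representation 2: (a, b, c, d) scaled to [-1, 1], followed by (x, y, z)
    scaled = [scale_to_unit(q, lo, hi) for q, (lo, hi) in zip(joints, limits)]
    return scaled + list(target)


def state_hand_and_target(hand: Sequence[float],
                          target: Sequence[float]) -> List[float]:
    # Representation 3: (hx, hy, hz) followed by (x, y, z)
    return list(hand) + list(target)
```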
The action generated by the actor was a 4-dimensional vector (a’, b’, c’, d’), interpreted as absolute target angles for the corresponding DoF. In the first experiments I also tried interpreting it as a relative change of angle; this took more time to learn, so I did not use it in later experiments.
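The two action interpretations differ only in how an actor output in <-1, 1> becomes a joint command. A minimal sketch, assuming hypothetical joint limits and a hypothetical maximum step `max_step` (in degrees) for the relative variant:

```python
from typing import List, Sequence, Tuple

Limits = Sequence[Tuple[float, float]]


def to_absolute(action: Sequence[float], limits: Limits) -> List[float]:
    """Read each actor output in [-1, 1] as an absolute target angle within the joint limits."""
    return [lo + (max(-1.0, min(1.0, a)) + 1.0) * (hi - lo) / 2.0
            for a, (lo, hi) in zip(action, limits)]


def to_relative(action: Sequence[float], current: Sequence[float],
                limits: Limits, max_step: float) -> List[float]:
    """Read each output as a change of at most max_step degrees, clipped to the limits."""
    return [min(hi, max(lo, q + max(-1.0, min(1.0, a)) * max_step))
            for a, q, (lo, hi) in zip(action, current, limits)]
```

With the absolute mapping, one action can reach any posture in a single step; with the relative mapping, the arm needs several steps to travel far, which is one plausible reason why that variant takes longer to learn.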
I have also run some first grasping experiments. In these, iCub learns to grasp a static object located in space (not on a table) by controlling 8 DoF, although I placed some constraints on the control.
The actor generates a 3-dimensional vector (t, p, f), where each component is in the range <-1, 1> and is rescaled to absolute angles of the iCub DoF. Component t controls thumb flexion, p controls palm flexion and f controls the flexion of all the other fingers simultaneously.
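The three actor outputs act like a hand synergy: one value drives several joints at once. A minimal sketch, assuming a hypothetical split of the 8 DoF into 2 thumb joints, 1 palm joint and 5 finger joints with placeholder names and limits (the real iCub joint assignment and ranges are not given here), maps each component linearly from <-1, 1> to every joint in its group.

```python
from typing import Dict, List, Tuple

# Hypothetical grouping of the 8 controlled DoF; names and limits (degrees)
# are illustrative placeholders, not iCub's actual joint table.
GROUPS: Dict[str, List[Tuple[str, float, float]]] = {
    "t": [("thumb_proximal", 0.0, 90.0), ("thumb_distal", 0.0, 90.0)],
    "p": [("palm", 0.0, 60.0)],
    "f": [("index_proximal", 0.0, 90.0), ("index_distal", 0.0, 90.0),
          ("middle_proximal", 0.0, 90.0), ("middle_distal", 0.0, 90.0),
          ("pinky", 0.0, 180.0)],
}


def grasp_command(t: float, p: float, f: float) -> Dict[str, float]:
    """Expand the 3-D actor output (t, p, f) into absolute angles for all 8 joints."""
    commands: Dict[str, float] = {}
    for key, value in (("t", t), ("p", p), ("f", f)):
        value = max(-1.0, min(1.0, value))  # keep the component inside <-1, 1>
        for name, lo, hi in GROUPS[key]:
            commands[name] = lo + (value + 1.0) * (hi - lo) / 2.0
    return commands
```

Coupling the joints this way shrinks the action space from 8 to 3 dimensions, at the cost of not being able to shape each finger independently.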