<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="http://calin.mocanu.info/feed.xml" rel="self" type="application/atom+xml" /><link href="http://calin.mocanu.info/" rel="alternate" type="text/html" /><updated>2026-03-31T21:47:19+00:00</updated><id>http://calin.mocanu.info/feed.xml</id><title type="html">Calin Mocanu</title><subtitle>My personal website / blog / resume.</subtitle><entry><title type="html">Hextech Mechahand v13</title><link href="http://calin.mocanu.info/2024/11/13/mechahand-loads-of-progress.html" rel="alternate" type="text/html" title="Hextech Mechahand v13" /><published>2024-11-13T00:00:00+00:00</published><updated>2024-11-13T00:00:00+00:00</updated><id>http://calin.mocanu.info/2024/11/13/mechahand-loads-of-progress</id><content type="html" xml:base="http://calin.mocanu.info/2024/11/13/mechahand-loads-of-progress.html"><![CDATA[<div style="display: flex;">

	<iframe width="240" height="420" src="https://www.youtube.com/embed/kJ4OLfqDy5U" title="It&#39;s happening!! Webcam control for 🤖 hand, lots of coding underneath to make it work" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>


	<div style="margin-left: 2em;">The robot hand design was iterated, now with a flexible circuit integrated into the palm and fingers powering in-place position sensors and touch/force sensing. The integrated sensors greatly simplify wiring and assembly as the flexible circuits modules plug into eachother and use a single conexion to the main driver board on the forearm. The main driver board has also dropped wires in favour of directly soldering the motor leads into the board. The tendons are carefully routed through a flexible wrist cable protector widget and channeled directly to the motor spindles in the forearm for easy assembly. The name of the game was wire management, for both electrical and tendons. The motor power and sizes have also doubled! ... and yet, they need to double once more for good usability! The hand is controlled via camera hand detection.</div>

</div>

<!--more-->

<h1 id="hextech-mechahand-version-13">Hextech Mechahand version 13</h1>

<p>Patent-pending sensor design! Feel free to check out the <a href="https://www.ipo.gov.uk/p-ipsum/Document/ApplicationNumber/GB2300185.2/8d5aacec-1162-4a5c-890c-138bb71cc5e2/GB2626014-20240710-Publication%20document.pdf">publication document</a> if you want to fill your head with superfluously worded design descriptions.</p>

<p><span><img src="/assets/images/hand-sensor-circuit-patent.png" width="360" alt="Hand sensor circuit pantent figure." /></span></p>

<p>Unfortunately the motors are once again the source of all grief. I am asking them for a lot more torque than their little gears are capable of giving, so the little gears give up. We definitely need a planetary gearbox to distribute the load across at least 3 micro gears. Unfortunately again, there are not many cheap choices for quiet, small DC motors with extremely sturdy planetary gears. The saving grace is that there are many options for brushless motors with planetary gears, but we need a brushless motor driver. In particular, we need a brushless motor driver with current control (aka FOC control), but of course there are no cheap options on the market. So we shall build our own!</p>

<p>And so we have done!</p>

<p><span><img src="/assets/images/hex-mini-drive-prototype.jpg" width="360" alt="Single brushless motor driver." /></span></p>

<p>But now we need to drive it with the codes, and the codes require complex maths running at crazy fast speeds (the control loop needs to run at over 20kHz to prevent audible noise). Thus I decided to also build a motor simulator that accurately models the physics at the 72MHz clock rate of the STM32 chip that we will use to drive the motor.</p>
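
<p>For the curious, here’s a rough numerical sketch of the per-cycle maths (Clarke/Park transforms plus PI current regulators). This is illustrative Python rather than the actual STM32 firmware; the gains, names and the 20kHz loop time below are made-up example values.</p>

<pre><code class="language-python">import math

def foc_step(i_a, i_b, theta, iq_target, state, dt=1/20000, kp=0.5, ki=200.0):
    """One current-control cycle: phase currents to d/q currents, PI regulators, back to d/q voltages.
    Illustrative sketch only; all gains and constants are example assumptions."""
    # Clarke transform: two measured phase currents into the stationary alpha/beta frame
    # (the third phase is implied since i_a + i_b + i_c = 0).
    i_alpha = i_a
    i_beta = (i_a + 2.0 * i_b) / math.sqrt(3.0)
    # Park transform: rotate into the d/q frame aligned with the rotor angle theta.
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    i_d = cos_t * i_alpha + sin_t * i_beta
    i_q = -sin_t * i_alpha + cos_t * i_beta
    # PI regulators: drive i_d to 0 and i_q to the torque-producing target.
    err_d, err_q = 0.0 - i_d, iq_target - i_q
    state["int_d"] += err_d * dt
    state["int_q"] += err_q * dt
    v_d = kp * err_d + ki * state["int_d"]
    v_q = kp * err_q + ki * state["int_q"]
    # Inverse Park transform: back to alpha/beta, ready for space-vector or sine PWM.
    v_alpha = cos_t * v_d - sin_t * v_q
    v_beta = sin_t * v_d + cos_t * v_q
    return v_alpha, v_beta

state = {"int_d": 0.0, "int_q": 0.0}
v_alpha, v_beta = foc_step(0.2, -0.1, theta=0.3, iq_target=0.5, state=state)
</code></pre>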

<p>Check out the <a href="https://csiz.observablehq.cloud/3-phase-motor-explainer/understanding_3_phase_motors">motor simulator app</a>, it’s interactive!</p>

<p><span><img src="/assets/images/motor-simulator-screenshot.png" width="720" alt="Motor simulator screenshot." /></span></p>]]></content><author><name></name></author><summary type="html"><![CDATA[The robot hand design was iterated, now with a flexible circuit integrated into the palm and fingers powering in-place position sensors and touch/force sensing. The integrated sensors greatly simplify wiring and assembly as the flexible circuits modules plug into eachother and use a single conexion to the main driver board on the forearm. The main driver board has also dropped wires in favour of directly soldering the motor leads into the board. The tendons are carefully routed through a flexible wrist cable protector widget and channeled directly to the motor spindles in the forearm for easy assembly. The name of the game was wire management, for both electrical and tendons. The motor power and sizes have also doubled! ... and yet, they need to double once more for good usability! The hand is controlled via camera hand detection.]]></summary></entry><entry><title type="html">Hextech Mechahand, Alpha Release</title><link href="http://calin.mocanu.info/2020/08/28/mechahand-mimic.html" rel="alternate" type="text/html" title="Hextech Mechahand, Alpha Release" /><published>2020-08-28T00:00:00+00:00</published><updated>2020-08-28T00:00:00+00:00</updated><id>http://calin.mocanu.info/2020/08/28/mechahand-mimic</id><content type="html" xml:base="http://calin.mocanu.info/2020/08/28/mechahand-mimic.html"><![CDATA[<p><img src="/assets/images/mk_9_object_pickup.png" alt="Video of mk 9 picking up squish toy." width="600" /></p>

<p>I designed an open source robotic hand as a low cost standard for reinforcement learning on dexterity.
The hand has 20 degrees of freedom, mimicking a human range of motion. Each of the joints has position
feedback and force sensing, with 6 extra pressure sensors in the fingertips and palm. All electronics feed
into a compact PCB with WiFi connectivity. The bill of materials is just under $300 amortized over 5 hands.</p>

<!--more-->

<h1 id="hextech-mechahand">Hextech Mechahand</h1>

<p>The robot hand is targeted as an accessible standard for reinforcement learning
research as it has full state feedback and complete range of movement with an
opposable thumb and a curling pinkie side of the palm. Compared to other 3D printed
hands, the Hextech hand has position sensors and bearings embedded in each of
the finger joints. Since the actuating motors are placed on the forearm and
tied with stretchy nylon wire tendons, the in-place position sensing allows for
accurate state feedback. The custom PCB includes current sensors for each motor
to judge power use and get rough force sensing, after adjusting for ohmic resistance and
tendon friction.</p>
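
<p>To spell out what I mean by rough force sensing (with made-up constants): motor torque is about the torque constant times the measured current, and the tendon tension is that torque divided by the spindle radius, minus whatever the tendon friction eats along the way. A toy sketch:</p>

<pre><code class="language-python">def estimate_tendon_force(current_a, k_t=0.005, spindle_radius=0.004, friction_factor=0.7):
    """Very rough tendon force estimate from measured motor current.
    The torque constant, spindle radius and friction factor are placeholder guesses, not measured values."""
    torque = k_t * current_a           # DC motor torque is roughly k_t * I (N*m)
    tension = torque / spindle_radius  # tendon tension at the motor spindle (N)
    return tension * friction_factor   # crude discount for tendon and tube friction

force_newtons = estimate_tendon_force(current_a=0.8)
</code></pre>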

<p>The demo uses a Leap Motion sensor, which is a little box with cameras under my hand
that uses machine learning to model my hand position. Unfortunately the sensor is
not very good, especially for fine finger movements where the fingers are partially
occluded due to the hand pose. For example when grabbing a pen the hand is sideways
to the camera, hiding half of the fingers. Nevertheless I manage to grab a pen and
scribble on a page.</p>

<p>The PID loop runs at 69Hz and limits burst and continuous current use for each
actuated joint. This limits the forces applied by the hand for safe handling as
well as for the safety of the hand itself. The hand was in fact strong enough to tear itself
apart, but fingers crossed that extra thickness and higher infill will fix the
weak point. The WiFi Websocket loop is currently limited to 10Hz. I’m working
on the MQTT interface and loop optimizations, expecting to improve the remote
loop to 30Hz and the inner loop to 200Hz.</p>
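
<p>For flavour, here’s a toy sketch of the kind of per-joint loop I mean: a position PID with a clamp on burst current and a back-off on continuous current. The real firmware is different; the gains, limits and names below are made-up examples.</p>

<pre><code class="language-python">def pid_current_limited(target, position, state, dt=1/69, kp=4.0, ki=0.5, kd=0.05,
                        burst_limit=1.2, continuous_limit=0.6, averaging=0.05):
    """One 69Hz control step for a single joint; all gains and limits are illustrative."""
    error = target - position
    state["integral"] += error * dt
    derivative = (error - state["prev_error"]) / dt
    state["prev_error"] = error
    command = kp * error + ki * state["integral"] + kd * derivative  # requested motor current
    # Hard clamp to the burst current limit.
    command = max(-burst_limit, min(burst_limit, command))
    # Track a running average and back off if we exceed the continuous limit.
    state["avg_current"] += averaging * (abs(command) - state["avg_current"])
    if state["avg_current"] > continuous_limit:
        command *= continuous_limit / state["avg_current"]
    return command

state = {"integral": 0.0, "prev_error": 0.0, "avg_current": 0.0}
current_command = pid_current_limited(target=0.8, position=0.5, state=state)
</code></pre>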

<p>The hand is built using off the shelf components and a 3D printed body at an amortized
cost of 300 USD per hand, requiring 30 hours of assembly time. Included in the cost
is the custom PCB designed for automated assembly. All design files for the mechanism,
the control circuit, and driver code are <a href="https://github.com/csiz/hextech-mecha-hand">open sourced on GitHub</a>.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[I designed an open source robotic hand as a low cost standard for reinforcement learning on dexterity. The hand has 20 degrees of freedom, mimicking a human range of motion. Each of the joints has position feedback and force sensing, with 6 extra pressure sensors in the fingertips and palm. All electronics feed into a compact PCB with WiFi connectivity. The bill of materials is just under $300 amortized over 5 hands.]]></summary></entry><entry><title type="html">Hextech Mecha-hand, First Assembly</title><link href="http://calin.mocanu.info/2019/06/11/mechahand-assemble.html" rel="alternate" type="text/html" title="Hextech Mecha-hand, First Assembly" /><published>2019-06-11T00:00:00+00:00</published><updated>2019-06-11T00:00:00+00:00</updated><id>http://calin.mocanu.info/2019/06/11/mechahand-assemble</id><content type="html" xml:base="http://calin.mocanu.info/2019/06/11/mechahand-assemble.html"><![CDATA[<p>Got my first 3D printer a month ago so naturally my first project after printing Benchy is to make a 20 degree of
freedom mecha-hand. Because simple beginner stuff, right?! Turns out the engineer was inside me all along, and I now
have a functioning, near full human movement, mechanical arm. Well, I have the physical components assembled, but
not the electronics to make it usable yet. I’m working on those now!</p>

<p><span style="display: flex;">
  <span><img src="/assets/images/mechahand-design.png" width="360" alt="Palm design including wrist." /></span><!--
  --><span><video autoplay="" muted="" loop="" width="360"><source src="/assets/images/mechahand-thumbsup.mp4" type="video/mp4" />Video of hand making a thumbs-up.</video></span>
</span></p>

<!--more-->

<p>I’m trying to create a low cost version of a mechanical robot hand with near human range of motion. The structure
is 3D printed and I’m using DC motors and cheap potentiometers to actuate it and get feedback. The fingers are driven
by a pulley system mounted on the arm, much like the tendons and muscles of a human hand. There are also a couple
of extra flexing points in the palm which give it nearly the full human range of motion. With those included it has
opposable thumbs and should be able to grasp pretty well. When unpowered it’s rigid because of the inherent tension
in the pulley system, but it feels like a strong handshake when grasping. I built custom pressure sensors for the
fingertips so it can sense when touching objects. I’ll also be implementing active compliance so it can react and
mold around objects. There’s some potential it can be used as a prosthetic hand, but I’m more focused on robotics
with deep reinforcement learning.</p>

<p>Here’s how it looks assembled with motors for the first time:</p>

<p><span style="display: flex;">
  <span><img src="/assets/images/mechahand-top.jpg" width="480" alt="Top of palm." /></span>
  <span><img src="/assets/images/mechahand-palm.jpg" width="480" alt="Bottom of palm." /></span>
</span></p>

<h2 id="modelling">Modelling</h2>

<p>This is a rundown of the initial hand design and how many iterations I’ve gone through for each part. As my first 3D
printed project I had to print quite a few iterations to understand what’s possible and what’s not with FDM printing.
The first lesson was to print parts sequentially, as the printer head leaves a lot of strings when it moves between
components. The second lesson was on how bridging works, or rather doesn’t work, when you want very precise dimensions.
Because we print layers of plastic on top of each other we always need something on the layer below for the plastic to
stick to. We can go up to 60° away from vertical before the plastic starts to droop down too much. And finally I learned
that tiny bits of plastic don’t stick enough to the printer bed and the Kraken hits you with a stringy mess. The last
issue is really annoying when you’re trying to make small mechanisms… But at least the parts print fast.</p>

<p>With that said, onwards through my design history. Every paragraph below is roughly a whole prototyping day:</p>

<ul>
  <li>Make a pulley to act like tendons and muscles for the hand:
    <ul>
      <li>First version really bad because I’m new to 3D printing, parts stuck together, and strings everywhere.</li>
      <li>Second version fits better, but needs more tweaking.</li>
      <li>3rd iteration works pretty nice.</li>
    </ul>
  </li>
</ul>

<p><img src="/assets/images/mechahand-pulley-deprecated.png" width="480" alt="First pulley design." /></p>

<p>Of course the “final” pulley was very far from final. Even if it looks like a simple part, it turned out to require a
lot of work in the end. The issue with the version above is that I’m routing the nylon wire all around the plastic,
introducing a lot of friction points unnecessarily. I’ll later fix this by simply pulling the strings sideways. But
another more challenging issue is that I lack any means to tension up the nylon wires. We’ll get back to that later.</p>

<ul>
  <li>Design a finger:
    <ul>
      <li>Initially I wanted to use <a href="/2019/01/19/xbox-joystick-model.html">xbox</a> joysticks as 2D joints, but they don’t work because they don’t allow much movement.</li>
      <li>I designed a couple of versions of the skeleton with double bearings for every joint.</li>
      <li>Can only find cheap potentiometers for adjusting volume instead of ones with a shaft hole, have to improvise a connector.</li>
      <li>Putting the potentiometer on the same axis with a bearing requires a tiny fitting part that doesn’t stay in, and really brittle construction.</li>
      <li>Flipping the potentiometer and having bridges span the axle makes for a huge 2D joint.</li>
      <li>Replaced one of the bearings with a potentiometer, much smaller, but still big. The 2nd axis bridge is still too large, flimsy and gets in the way.</li>
      <li>Put the bridge through the finger shaft. Can make the 2D joint as small as the potentiometer widths and much stronger.</li>
      <li>To be able to print it, I need an interlocking piece to affix the tip shaft to the bearing.</li>
      <li>Stringing the tendons was a bit tough, but I managed to route them around the finger and through the middle. The first test of the finger seems to work awesome.</li>
      <li>Need to improve stringing because any bend around plastic puts a lot of friction on the string. I have a plan to ditch most bends, including the servo housing shenanigans.</li>
    </ul>
  </li>
</ul>

<p><img src="/assets/images/mechahand-finger-iterations.jpg" width="480" alt="Finger iterations." /></p>

<p>Every finger iteration got smaller and smaller until I reached the size of the cheap potentiometers I bought. Turns out it’s really hard to search
for potentiometers with a hole in them. Especially if you want them to be extra cheap. After I built the hand I found some that are tinier, maybe
I’ll redesign it after the electronics.</p>

<p><img src="/assets/images/mechahand-thumbs.png" width="480" alt="Palm design with opposable thumbs." /></p>
<ul>
  <li>Modelling the palm:
    <ul>
      <li>Making the palm from 3 pieces so that the thumb and pinkie can pivot to get opposable thumbs.</li>
      <li>Custom bearing for the wrist joint. Needs to take axial loads, so can’t use the tiny bearings.</li>
      <li>Realized that hand/forearm rotation also needs to be measured, which means a rotation point, so we need another
bearing. Making the wrist a 3D joint using 2 custom plastic bearings.</li>
      <li>Made the joint and barely fit all the nuts.</li>
      <li>Added the pivots to all the moving parts. Maybe the thumb and pinkie springs are not that good of an idea.</li>
      <li>Printed everything and it fits awesome \<em>O.O</em>/ (on the 3rd try).</li>
      <li>Put a lot more holes in the palm. Not sure if I can make it thinner as I need space for bearings/pots/balls etc.</li>
      <li>Minor fixes:
        <ul>
          <li>Make sure the thinnest layers are 0.5mm as printing thinner leads to dotting.</li>
          <li>Chamfer the top layer of railing so there are no hidden overhangs. Especially useful to make the bearing rails smooth.</li>
          <li>Chamfer the bottom layer as the first layer is squished making holes too tight.</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

<p><img src="/assets/images/mechahand-design.png" width="480" alt="Palm design including wrist." /></p>

<ul>
  <li>Other:
    <ul>
      <li>Modelled finger tip as a flexible single layer plastic piece.</li>
      <li>Modelled arm part as 2 sticks supported by the motor mounts.</li>
      <li>Made a tiny plastic hook, to tie the nylon to the springs.</li>
      <li>Made a tube fork to split the 2 nylon wires from a single tube to a pair of tubes, when they need to go in different places.</li>
    </ul>
  </li>
</ul>

<p>The fingertips are made of a zig-zag plastic piece that’s meant to easily squeeze under pressure. A stress gauge is glued on
the part that bends the most, and a second gauge on a flat side completes one half of a Wheatstone bridge. On top of them I glued a
thin rubber sheet to protect the gauges and give the fingers some grip.</p>

<p><img src="/assets/images/mechahand-fingertip.jpg" width="480" alt="Finger tip." /></p>

<h2 id="iterating">Iterating</h2>

<p>Switched to 1mm tube diameter for flexibility, which perfectly fits the 2 wires needed from a motor. Also now routing wires directly to
the pivot points to avoid routing through plastic as the friction is too much. Also optimized the pulley housing by removing the
plastic tracks.</p>

<p>Holy Glob, gluing and soldering the stress gauges is an absolute pain. Tried getting a hot air gun but it melts the plastic, got to
stick with soldering with the hot stick. Also no way to put the wires on first as it melts the gauges. Must mount gauges then solder.</p>

<p>Turns out if the stress gauges are stretched too much, especially near the soldering pads, they break. Surprise! Had to thicken the
fingertip wall and invert the gauge, so the solder pads are compressed by the wires instead of stretched. Works like a charm now!</p>

<p>First assembly including wires! Absolute unit of a hand.</p>

<p><img src="/assets/images/mechahand-absolut-unit.jpg" width="480" alt="First hand assembly." /></p>

<p>And it’s garbage! The 1mm thin PTFE tubes are too thin to support any weight, have to redesign back to thicker tubes. Will also redesign to
be a tad smaller. Time to chop some fingers. Having springs all over the place was also a bad idea; I will change to some tensioning screws
near the motors and rely on the nylon elasticity. We have to pull the nylon wires pretty tight to overcome the compression in the PTFE tubes.</p>

<p>Got new tubes and redesigned/re-printed holes to fit them. Also made the hand smaller. But holy fucking Christ, tying up the nylon wires
and keeping tension on them is a major pain in the arse. Also even with thicker tubes, turns out long ones still don’t work so well, and
because I only added a tiny screw for tensioning, there’s no way to account for the deformation and there’s so much lag in movement. Need
shorter tubes and a better tensioning device.</p>

<p>Redesigned housing to fit 4 motors on one side. Placing them on alternating sides so we have space for wires. But it’s too wide and too
tall. I had to redesign it again with 3 motors per layer, but 2 on top and 1 on the bottom so it’s only tall on the top side.</p>

<p>Much work on the spindle and tensioner design. Opted for a single rail spindle, and just looping the wire around it and relying on
tension and friction. That should simplify wiring a bit. For the tensioner I tried 4 designs. The original one is tightened with an
8mm screw, but that’s too short to tighten significantly, and because the plastic is hard it’s also difficult to wire it, as it
needs tension while tying knots. The next 2 designs involved a wheel with one way teeth that would pull the string tight. But the wheel
needed to be small enough to fit on the spindle, which doesn’t allow enough length for the teeth to bend, so it can only turn one way.
Either the teeth are too small and they break, or they don’t bend at all. An alternative locking mechanism involved teeth perpendicular
to the surface. They lock in place by the tension of the wires. But this causes the wire to wrap around itself, which leads to breaking it.
Tried to place a rod to guide the wire around it, but it doesn’t work. The 4th and hopefully final design is a staircase with a knob. I
place the wire tube on the knob and pull it as high on the stairs as I can. It’s then held in place by wire tension.</p>

<p><img src="/assets/images/mechahand-motor-system.png" width="480" alt="Motor and pulley design." /></p>

<p>I did not include the nylon wires and tubes in the 3D model. But as you can see in the close-ups below, the fingers are actuated by a
pulley system. A nylon wire runs through a spindle around the motor shaft; it’s routed through the PTFE tubes to the fingers and then
tied on the finger sections from both sides. Spinning the motor one way tightens one side and loosens the other, driving the finger in
that direction. Spinning the motor the other way does the opposite and drives the finger back. This is the same principle that bicycles
use to drive the brakes and gears.</p>
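
<p>As a back-of-the-envelope sketch of the gearing this gives (the radii are made-up examples and I’m ignoring tendon stretch and friction): the tendon travel is the motor rotation times the spindle radius, and the joint rotates by that travel divided by its own pulley radius.</p>

<pre><code class="language-python">import math

def finger_joint_angle(motor_turns, spindle_radius=0.005, joint_radius=0.004):
    """Joint rotation for a given motor rotation; radii are example values, stretch and friction ignored."""
    tendon_travel = motor_turns * 2 * math.pi * spindle_radius  # metres of tendon wound onto the spindle
    return tendon_travel / joint_radius                         # joint rotation in radians

# A quarter turn of the motor, with these example radii, curls the joint by roughly 112 degrees.
curl_degrees = math.degrees(finger_joint_angle(motor_turns=0.25))
</code></pre>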

<p><span style="display: flex;">
  <span><img src="/assets/images/mechahand-finger-closeup.jpg" width="480" alt="Close-up of finger pulley system." /></span>
  <span><img src="/assets/images/mechahand-pulley-closeup.jpg" width="480" alt="Close-up of motor pulley system." /></span>
</span></p>

<h2 id="its-working">It’s Working</h2>

<p>First assembly with motors, and it works!</p>

<video controls="" width="480"><source src="/assets/images/mechahand-thumbsup.mp4" type="video/mp4" />Video of hand making a thumbs-up.</video>

<p>Up next, I need to wire everything to an ESP32 and an Arduino to measure all the joints and control the motors. Yes, I need both chips
to have enough GPIO pins to do everything. Even then I need a few extra ICs like the PCA9685 LED driver to control 16 motor speeds
using just 2 I2C wires, and L293D H-bridges to control the direction of each motor, with possibly a few shift registers if I still
run out of GPIO pins. By the way, the ESP32 chip looks amazing with 18 ADC inputs and WiFi and Bluetooth.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Got my first 3D printer a month ago so naturally my first project after printing Benchy is to make a 20 degree of freedom mecha-hand. Because simple beginner stuff, right?! Turns out the engineer was inside me all along, and I now have a functioning, near full human movement, mechanical arm. Well, I have the physical components assembled, but not the electronics to make it usable yet. I’m working on those now! Video of hand making a thumbs-up.]]></summary></entry><entry><title type="html">Really Good Papers (part 1)</title><link href="http://calin.mocanu.info/2019/01/29/papers-1.html" rel="alternate" type="text/html" title="Really Good Papers (part 1)" /><published>2019-01-29T00:00:00+00:00</published><updated>2019-01-29T00:00:00+00:00</updated><id>http://calin.mocanu.info/2019/01/29/papers-1</id><content type="html" xml:base="http://calin.mocanu.info/2019/01/29/papers-1.html"><![CDATA[<p>I’ve started a challenge to read a machine learning paper a day. Now after a few months and a lot of catching up to do I’ve gone through a couple hundred papers.
<img src="/assets/images/paper-stack-1.jpg" width="360" alt="Stack of read papers." /></p>

<p>These are my picks for the best ones I found:</p>

<!--more-->

<ol>
  <li>The Shattered Gradients Problem: If resnets are the answer, then what is the question? David Balduzzi, Marcus Frean, Lennox Leary, JP Lewis, Kurt Wan-Duo Ma, Brian McWilliams <a href="https://arxiv.org/abs/1702.08591">(Balduzzi et al., 2017)</a></li>
  <li>Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning. Jakob N. Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling <a href="https://arxiv.org/abs/1811.01458">(Foerster et al., 2018)</a></li>
  <li>Modular Networks: Learning to Decompose Neural Computation. Louis Kirsch, Julius Kunze, David Barber <a href="https://arxiv.org/abs/1811.05249">(Kirsch et al., 2018)</a></li>
  <li>Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin <a href="https://arxiv.org/abs/1706.03762">(Vaswani et al., 2017)</a></li>
  <li>Intrinsic Social Motivation via Causal Influence in Multi-Agent RL. Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas <a href="https://arxiv.org/abs/1810.08647">(Jaques et al., 2018)</a></li>
  <li>Episodic Curiosity through Reachability. Nikolay Savinov, Anton Raichuk, Raphaël Marinier, Damien Vincent, Marc Pollefeys, Timothy Lillicrap, Sylvain Gelly <a href="https://arxiv.org/abs/1810.02274">(Savinov et al., 2018)</a></li>
  <li>Optimizing Agent Behavior over Long Time Scales by Transporting Value. Chia-Chun Hung, Timothy Lillicrap, Josh Abramson, Yan Wu, Mehdi Mirza, Federico Carnevale, Arun Ahuja, Greg Wayne <a href="https://arxiv.org/abs/1810.06721">(Hung et al., 2018)</a></li>
  <li>Uncertainty in Neural Networks: Bayesian Ensembling. Tim Pearce, Mohamed Zaki, Alexandra Brintrup, Nicolas Anastassacos, Andy Neely <a href="https://arxiv.org/abs/1810.05546">(Pearce et al., 2018)</a></li>
  <li>The Laplacian in RL: Learning Representations with Efficient Approximations. Yifan Wu, George Tucker, Ofir Nachum <a href="https://arxiv.org/abs/1810.04586">(Wu et al., 2018)</a></li>
  <li>How Does Batch Normalization Help Optimization? Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry <a href="https://arxiv.org/abs/1805.11604">(Santurkar et al., 2018)</a></li>
  <li>Understanding disentangling in β-VAE. Christopher P. Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, Alexander Lerchner <a href="https://arxiv.org/abs/1804.03599">(Burgess et al., 2018)</a></li>
  <li>Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun <a href="https://arxiv.org/abs/1512.03385">(He et al., 2015)</a></li>
  <li>Understanding the difficulty of training deep feedforward neural networks. Xavier Glorot, Yoshua Bengio <a href="http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf">(Glorot &amp; Bengio, 2010)</a></li>
  <li>Combined Reinforcement Learning via Abstract Representations. Vincent François-Lavet, Yoshua Bengio, Doina Precup, Joelle Pineau <a href="https://arxiv.org/abs/1809.04506">(François-Lavet et al., 2018)</a></li>
  <li>MIT AGI: Building machines that see, learn, and think like people. Josh Tenenbaum <a href="https://www.youtube.com/watch?v=7ROelYvo8f0&amp;list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4&amp;index=2">(Tenenbaum @ MIT AGI, 2018)</a></li>
  <li>Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel <a href="https://deepmind.com/blog/capture-the-flag/">(Jaderberg et al., 2018)</a></li>
  <li>Scalable agent alignment via reward modeling: a research direction. Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg <a href="https://arxiv.org/abs/1811.07871">(Leike et al., 2018)</a></li>
  <li>Curiosity-driven Exploration by Self-supervised Prediction. Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell <a href="https://arxiv.org/abs/1705.05363">(Pathak et al., 2017)</a></li>
  <li>GAN Q-learning. Thang Doan, Bogdan Mazoure, Clare Lyle <a href="https://arxiv.org/abs/1805.04874">(Doan et al., 2018)</a></li>
  <li>Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning. Noah Frazier-Logue, Stephen José Hanson <a href="https://arxiv.org/abs/1808.03578">(Frazier-Logue &amp; Hanson, 2018)</a></li>
</ol>

<h2 id="my-interpretation">My interpretation</h2>

<p>That’s a chunky list in the classic style of <a href="https://twitter.com/Miles_Brundage">M. Brundage</a> or his <a href="https://twitter.com/BrundageBot">mechanical doppelganger</a>. Here it is again with my interpretation of each paper and why I think they are really good:</p>

<h3 id="1-shattered-gradients">1. Shattered Gradients</h3>

<p><a href="https://arxiv.org/abs/1702.08591">(Balduzzi et al., 2017)</a> analyse the surprising performance of residual networks compared to plain feed forward nets, especially at high layer count. They show that feed forward gradients approach white noise exponentially with layers. On the other hand resnets gradients transition to white noise at sublinear rate with layers. They theorize that white noise gradients provide little information when training and design the Looks Linear initialization to alleviate the problem. They successfully train a 200 layer feed forward net in an empirical experiment, obtaining the same performance as a resnet on CIFAR-10.</p>

<p>Deeper networks have exponentially increasing representation power. While residual networks can be deep they are also much more constrained than plain old networks. This paper thus greatly increases the design space available.</p>

<h3 id="2-bayesian-action-decoder">2. Bayesian Action Decoder</h3>

<p><a href="https://arxiv.org/abs/1811.01458">(Foerester et al., 2018)</a> describe a neat inductive bias for agents to learn what private information other collaborating agents have about the state. They condition a set of collaborating agents to deterministically choose policy based on the public information available. The agents then make their private observations and act according to the policy. Because the policy is based on the public state and known to all, other agents can then infer the private state based on the action taken. They successfully test it on Hanabi and get near optimal scores. Also Hanabi sounds like a fun game!</p>

<p>This method of inferring the private information of other agents is conceptually simple. It also avoids the intractable recurrence of modelling other agents’ beliefs, and beliefs about beliefs…</p>

<h3 id="3-modular-networks">3. Modular Networks</h3>

<p><a href="https://arxiv.org/abs/1811.05249">(Kirsch et al., 2018)</a> implement a principled modular neural net that succeeds in specializing sub-networks to different tasks, or different parts of a sequential task as in text processing. Without using an entropy regularization term they avoid the problem of “module collapse”, when a single module is selected for all tasks. Unlike mixture of experts models, where computation is still carried through for all experts, the modules sharply specialize for each task. The work shows a neat examples of a recurrent network using a specialised module for words following “the”. Unfortunately they report poor generalization when testing on CIFAR-10 images.</p>

<p>To move from small scale, focused, experiments to extremely large, generalist, networks I believe we <em>need</em> modules if we ever want them to run on commodity hardware. Similarly to how the brain saves on energy and chemical budgets by using <a href="https://www.imdb.com/title/tt2872732/">10%</a> at a time.</p>

<h3 id="4-attention-is-all-you-need">4. Attention Is All You Need</h3>

<p>A transformative 🐆 work by the Google team <a href="https://arxiv.org/abs/1706.03762">(Vaswani et al., 2017)</a>, achieving state of the art results in language tasks by doubling down on attention and foregoing recurrence and convolutions. The main intuition is that attention can connect distant parts of the source sequence through a constant number of neural net steps, $O(1)$. In contrast recurrent nets need to keep distant inputs in memory for each step of the sequence, $O(n)$. At an intermediate level sit convolutional nets which need $O(\log_k(n))$ layers for $k$ wide filters. The downside is that attention requires $O(n^2)$ compute, but this is trivially parallelizable.</p>
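
<p>To make the trade-off concrete, scaled dot-product attention is just a couple of matrix multiplications over the whole sequence at once. Here’s a minimal numpy sketch (single head, no masking, nowhere near the paper’s full model):</p>

<pre><code class="language-python">import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V; the n x n score matrix is the O(n^2) cost."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n, n) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (n, d_v) attention-weighted values

n, d = 5, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
</code></pre>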

<p>I believe most problems can be solved by analysing a few influential factors in a sea of clutter; in this sense, attention is the appropriate inductive bias, if only we could bypass the poor computational scaling.</p>

<h3 id="5-causal-influence-in-multi-agent-rl">5. Causal Influence in Multi-Agent RL</h3>

<p><a href="https://arxiv.org/abs/1810.08647">(Jaques et al., 2018)</a> develop an intrinsic reward to encourage cooperation between independently trained agents. Unlike related works the agents don’t rely on common training, they use no cheap communications channel, and do not share rewards. Instead each agent receives an intrinsic reward proportional to their empowerment on other agents, in less fancy words, how much they can influence the behaviour of others. Influence is measured by counterfactual reasoning in the learned model of other agents’ behaviour, where each agents checks which of its actions can greatly change the actions of others. In their toy collaborative example an agent learns to act as a scout by spinning around when there’s no high reward in sight, allowing others to keep collecting low rewards instead of exploring.</p>

<p>The paper presents empowerment as a proxy for coordination. Unlike other multi-agent works, acting, rewards, and training are all independent for each agent.</p>

<h3 id="6-episodic-curiosity-through-reachability">6. Episodic Curiosity through Reachability</h3>

<p><a href="https://arxiv.org/abs/1810.02274">(Savinov et al., 2018)</a> rephrase intrinsic curiosity by learning a model that orders states by how reachable they are from each other. A curiosity incentive is then given whenever a state is significantly different than any state in memory. With this method they overcome the noisy TV problem where an agent is drawn to naturally unpredictable states. With the reachability metric, a TV can be deemed close in action space and eventually considered explored. They learn the reachability metric by sampling 2 states from a trajectory and training a network to predict whether they are within $k$ time steps.</p>

<p>Consistent exploratory behaviour is essential in many situations, as random exploration is exponentially unlikely to find distant rewards. Curiosity has been proposed as a solution, but current approaches get stuck in naturally unpredictable states. The authors propose reachability as an effective solution.</p>

<h3 id="7-transporting-value-over-long-time-scales">7. Transporting Value over Long Time Scales</h3>

<p><a href="https://arxiv.org/abs/1810.06721">(Hung et al., 2018)</a> address the problem of long term credit assignment by transporting value to past states based on attention over memory. To prove their method, the authors design a difficult task where meaningful actions are separated by a distractor phase. The agent must first collect a key, then collect high rewards over a long time frame of 500 frames, and finally open the corresponding door for a small amount of extra reward. They solve the task by encoding all observations in memory and using an attention mechanism. When observations pass an attention threshold they add the current value to the attended time step. They solve the task using $\gamma = .96$ even though classic RL cannot solve it for $\gamma = 1$ due to the variable distraction reward.</p>

<p>Solving problems over extremely long time scales is an important feat for intelligent behaviour. We take meaningful actions across days, or years, despite the various distractions we encounter through the day. The authors propose a method to connect rewards and distant actions, bypassing the discount formalism.</p>

<h3 id="8-bayesian-ensembling">8. Bayesian Ensembling</h3>

<p><a href="https://arxiv.org/abs/1810.05546">(Pearce et al., 2018)</a> make the key observation that the neural network prior is over a random parametrization, as opposed to 0 centered parameters. They note that an ensemble of networks regularized towards 0 tend to collapse to similar solutions and produce over-confident error bounds. They use an ensemble of networks with $L^2$ loss anchored at different random initializations to produce a better approximation of the true posterior. In a small empirical study they find their ensembling method to produce very similar results to Gaussian processes.</p>

<p>The prior for neural networks is random initialization, instead of 0 centered parameters. Using this change, the authors achieve very good error bounds by just using an ensemble of networks.</p>

<h3 id="9-the-laplacian-in-rl">9. The Laplacian in RL</h3>

<p><a href="https://arxiv.org/abs/1810.04586">(Wu et al., 2018)</a> learn an approximation of the state-graph’s Laplacian eigenvectors, and use them to define a better distance heuristic for reward shaping. The authors find an embedding for the smallest eigenvalues by minimizing the distance between neighboring points in state space. Using an example of a maze, the distance in the embedding space would follow the contours of the maze similarly to how water would fill it up. An agent can solve the maze faster if the reward is shaped according to the Laplacian distance to the goal, as opposed to naive Euclidean distance.</p>

<p>The Laplacian distance is a more informative measure of the differences between states. It can be used for better reward shaping, or as an aid to exploration.</p>

<h3 id="10-how-does-batch-normalization-help-optimization">10. How Does Batch Normalization Help Optimization?</h3>

<p>Contrary to the assumption that Batch Normalization reduces internal covariate shift, <a href="https://arxiv.org/abs/1805.11604">(Santurkar et al., 2018)</a> show that the beneficial effect of BN is to smooth out the loss landscape. They define a couple of measures for internal covariate shift and empirically show that BN might even increase it. In fact, even when intentionally adding parameter noise to increase ICS, the accuracy is not affected. The authors investigate further to find that the beneficial effect is smoothing of the loss landscape and making gradients more predictable with respect to changes in parameters along the gradient direction. They theoretically prove that BN reduces the Lipschitz factor of the network’s loss and gradients.</p>

<p>Batch Normalization is a widely used technique in machine learning, but the reason it works is assumed to be reducing internal covariate shift. The authors empirically find the assumption wrong and alternatively propose, with proof, that BN helps by smoothing the loss landscape.</p>

<h3 id="11-understanding-disentangling-in-β-vae">11. Understanding disentangling in β-VAE</h3>

<p><a href="https://arxiv.org/abs/1804.03599">(Burgess et al., 2018)</a> explore information encoding in VAE and their successor β-VAE. They relate the objective to the information bottleneck principle, where the β-VAE objective is equivalent to maximizing mutual information between the latent encoding and the task, whilst minimizing information between the input and the encoding. To control the amount of information in the bottleneck they propose to set a target capacity C for the KL divergence and minimize the absolute difference from measured KL and C. They then increase the target capacity and show that the network starts encoding more information. For the dSprites dataset it first encodes only position as it is the most salient feature, then scale, orientation and shape.</p>

<p>β-VAE have produced interesting results, but the amount of information encoded in the latent bottleneck is hard to judge. I believe the biggest contribution of their paper is a method of quantifying and controlling the amount of information required to encode the various features of a dataset.</p>

<h3 id="12-deep-residual-learning-for-image-recognition">12. Deep Residual Learning for Image Recognition</h3>

<p><a href="https://arxiv.org/abs/1512.03385">(He et al., 2015)</a> introduce residual networks for image recognition. Noticing that very deep networks generally fail to obtain good performance, they propose an alternative architecture comprised of residual blocks. By providing a pass through connection for the inputs, gradients and training are well-behaved at all layers. Residual blocks each add a small amount of processing towards the result. Notably, each residual block is composed of at least 2 layers with a non-linearity in-between to prevent the whole network from being a simple affine transformation.</p>

<p>Residual networks won the COCO 2015 competitions with state of the art results in object detection and image processing. The architecture has since been widely expanded upon as the default method of training very deep networks.</p>

<h3 id="13-understanding-the-difficulty-of-training-deep-feedforward-neural-networks">13. Understanding the difficulty of training deep feedforward neural networks</h3>

<p><a href="http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf">(Glorot &amp; Bengio, 2010)</a> analyse statistical properties of gradients across the layers of a neural network. They propose the normalized initialization to maintain the singular values for the layer Jacobians close to 1. With this modification and using an alternative to sigmoid activations they improve training and observe that gradients are distributed more uniformly across layers.</p>

<p>The paper highlights the importance of good initialization for neural networks. Their initialization scheme and advice of monitoring gradient distributions has been widely adopted across the field.</p>

<h3 id="14-combined-reinforcement-learning-via-abstract-representations">14. Combined Reinforcement Learning via Abstract Representations</h3>

<p><a href="https://arxiv.org/abs/1809.04506">(François-Lavet et al., 2018)</a> integrate model-based and model-free reinforcement learning by using a shared abstract state. The authors split the neural network into an encoding network, an abstract state, and a model-free RL network built on the abstract state. At the same time they train a model-based network to predict rewards and transitions between abstract states. By using the abstract state they greatly reduce the state dimensionality for the model-based algorithm making it easier to train. Additionally it allows for fast planning and interpretation. They experiment with planning by simulating only the abstract state, with interpretability by adding a cosine similarity loss to state vector of interest, and with transfer learning by re-training the encoder on modified inputs.</p>

<p>The authors propose a very useful idea of training a shared embedding for both model-based and model-free reinforcement learning. By using the shared abstract state they improve training time, generalizability and allow for easy manipulation of the state representation.</p>

<h3 id="15-mit-agi-building-machines-that-see-learn-and-think-like-people">15. MIT AGI: Building machines that see, learn, and think like people</h3>

<p><a href="https://www.youtube.com/watch?v=7ROelYvo8f0&amp;list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4&amp;index=2">(Tenenbaum @ MIT AGI, 2018)</a> presents a very good overview of general intelligence. He talks about the key features that make humans intelligent and a rough plan for how we might engineer artificial intelligence. Our eye resolution is low, but we believe we can see the entire room well. Current AI has a lot of difficulty of understanding the world. Image description bots can give a reasonable description, but sometimes miss the point of the image. Humour is super hard to get. The most capable robots, of Boston Dynamics, have nothing to do with intelligence. Even young babies are really good at manipulating objects, and much better than robots. Humans as probabilistic physics simulators. Really good intuition oh how physical states advance and general description of the future.</p>

<p>A good overview of the feats that humans can achieve, current problems with AI, and a path and ideas to achieving general intelligence.</p>

<h3 id="16-human-level-performance-in-multiplayer-games-with-population-based-deep-reinforcement-learning">16. Human-level performance in multiplayer games with population-based deep reinforcement learning</h3>

<p>The Deepmind team, led by <a href="https://deepmind.com/blog/capture-the-flag/">(Jaderberg et al., 2018)</a>, combine a variety of techniques to train a very strong game playing agent. They design a multiplayer game environment and use population based training <a href="https://arxiv.org/abs/1711.09846">1</a> to obtain a diverse set of agents. Each agent also learns an internal reward function that is a dense proxy for the sparse win/lose reward at the end of a match. Further the agent architecture is comprised of a multi-timescale RNN, with a set of RNN cells updated slowly to encourage long-term learning, and another set updated frequently to fine tune the prior of the slow cells. Finally they visualize the behaviours of the agents by embedding their internal states in 2D using t-SNE <a href="https://ai.googleblog.com/2018/06/realtime-tsne-visualizations-with.html">2</a>, prescient of their work on AlphaStar <a href="https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/">3</a>.</p>

<p>Although many papers address a specific aspect of reinforcement learning, I believe it’s important to bring together the best techniques to ensure they work well when combined. With this experiment the Deepmind team showcase a proof of concept for their subsequent work on the Starcraft 2 agent <a href="https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/">3</a>.</p>

<h3 id="17-scalable-agent-alignment-via-reward-modeling-a-research-direction">17. Scalable agent alignment via reward modeling: a research direction</h3>

<p><a href="https://arxiv.org/abs/1811.07871">(Leike et al., 2018)</a> discuss the challenges in designing intelligent agents that do what we really want them to do. Games are widely used in reinforcement learning research as they are convenient to simulate, while providing a very dynamic environment. However games have a very clear objective of winning or maximising a score, whereas the real world is much more ambiguous. The authors summarize their desiderata for agent alignment as scalable, economic/efficient, as well as pragmatic. The same solution should be applicable on small scale experiments as well as large operations, it should satisfy safety and trust concerns, but not reduce performance, and finally it should be realizable. The authors also analyse a common failure mode where the effective reward function is far from the intended reward, leading to degenerate behaviours in RL agents.</p>

<p>The authors present a philosophical outline of the requirements and challenges of bringing AI agents from the laboratory into the real world. In particular arguing that significant effort should be made to ensure that agents learn what we want them to, and develop mechanisms to correct their behaviour when agents diverge.</p>

<h3 id="18-curiosity-driven-exploration-by-self-supervised-prediction">18. Curiosity-driven Exploration by Self-supervised Prediction</h3>

<p><a href="https://arxiv.org/abs/1705.05363">(Pathak et al., 2017)</a> develop the baseline curiosity implementation (as of early 2019). Although encouraging exploration based on the prediction error of a learned model has been attempted before, results were poor as predicting a high-dimensional state is a very challenging task. The authors design an architecture for an agent to predict only the features that is has control of. Their Internal Curiosity Module (ICM) consists of an embedding network to obtain a low-dimensional state representation. The embedding network is an inverse model trained to predict the action taken between 2 states, thus learning only features the agent can control. They then train a forward model on the embedded vectors and use the prediction error as an additional reward. Because the embedded state only features salient to the agent’s policy, the learnt curiosity reward is robust to noise.</p>

<p>Structured exploration of the environment is a great challenge for reinforcement learning. In this paper the authors present one of the first solutions that is robust to features outside of the agent’s control.</p>

<h3 id="19-gan-q-learning">19. GAN Q-learning</h3>

<p>In an innovative paper, <a href="https://arxiv.org/abs/1805.04874">(Doan et al., 2018)</a> combine generative adversarial networks and reinforcement learning. They pose the problem as generating optimal action-state pairs and discriminating between optimal and generated states. To approximate optimal, they use a single step of the Bellman operator on the states in the replay buffer. The generator eventually learns to sample actions near the equilibrium Q function, thus becoming an optimal policy. Although both GAN and RL are notoriously unstable, as the authors acknowledge, the combination succeeds in learning policies for CartPole and Acrobot that slightly improve on the DQN baseline.</p>

<p>The authors succeed in posing the reinforcement learning problem in a generator-discriminator framework. A really interesting combination of GAN and RL.</p>

<h3 id="20-dropout-is-a-special-case-of-the-stochastic-delta-rule-faster-and-more-accurate-deep-learning">20. Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning</h3>

<p><a href="https://arxiv.org/abs/1808.03578">(Frazier-Logue &amp; Hanson, 2018)</a> make the connection between dropout and the older stochastic delta rule. In SDR the weights are sampled from the normal distribution with learnable mean and standard deviation. The mean is updated by gradient descent, where the gradients are computed on the sampled weight values. σ is also updated the same way, but with a separate learning rate and exponential decay to eventually reduce to 0. By considering repeated dropout samples they show that it is equivalent to SDR. They compare the regularization methods on CIFAR-100 and obtain significant improvements in test error.</p>

<p>Dropout has shown several weaknesses in the past years as researchers have tried to analyse it. By re-framing it as the stochastic delta rule the authors explain some of the factors that make it work and greatly improve its results.</p>

<!-- References: -->

<script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ ['$','$'], ["\\(","\\)"] ],
      processEscapes: true
    }
  });
</script>

<script type="text/javascript" charset="utf-8" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

<script type="text/javascript" charset="utf-8" src="https://vincenttam.github.io/javascripts/MathJaxLocal.js">
</script>]]></content><author><name></name></author><summary type="html"><![CDATA[I’ve started a challenge to read a machine learning paper a day. Now after a few months and a lot of catching up to do I’ve gone through a couple hundred papers. These are my picks for the best ones I found:]]></summary></entry><entry><title type="html">3D Modelling a Joystick</title><link href="http://calin.mocanu.info/2019/01/19/xbox-joystick-model.html" rel="alternate" type="text/html" title="3D Modelling a Joystick" /><published>2019-01-19T00:00:00+00:00</published><updated>2019-01-19T00:00:00+00:00</updated><id>http://calin.mocanu.info/2019/01/19/xbox-joystick-model</id><content type="html" xml:base="http://calin.mocanu.info/2019/01/19/xbox-joystick-model.html"><![CDATA[<p>The joystick from your Xbox controller is made of a whopping 12 individual parts! To teach myself 3D modelling I’ve copied the design, with painstaking detail, in Fusion 360. Check my model out <a href="https://www.thingiverse.com/thing:3372106">on Thingiverse</a> or this animation:</p>

<video autoplay="" muted="" loop="" width="360">
  <source src="/assets/images/xbox-joystick.mp4" type="video/mp4" />
  Video of a joystick model.
</video>

<!--more-->

<p>I’ve modeled the parts to 0.1 mm precision to test out the digital caliper I just bought, as well as to fit it in a spider bot I’m designing. The legs will be connected to the body via the joystick to allow for 2 degrees of freedom, as well as feedback joint measurements to train a walking neural net.</p>

<p><img src="/assets/images/xbox-joystick-parts.jpg" width="360" alt="Joystick Parts." /></p>

<p>The joystick also uses a button and 2 potentiometers. Turns out even these guys are a lot more complex than they look:</p>

<video autoplay="" muted="" loop="" width="360">
  <source src="/assets/images/button.mp4" type="video/mp4" />
  Video of a button model.
</video>

<video autoplay="" muted="" loop="" width="360">
  <source src="/assets/images/potentiometer.mp4" type="video/mp4" />
  Video of a potentiometer model.
</video>]]></content><author><name></name></author><summary type="html"><![CDATA[The joystick from your Xbox controller is made of a whopping 12 individual parts! To teach myself 3D modelling I’ve copied the design, with painstaking detail, in Fusion 360. Check my model out on Thingiverse or this animation: Video of a joystick model.]]></summary></entry><entry><title type="html">Duolingo Chinese</title><link href="http://calin.mocanu.info/2019/01/05/chinese-duo.html" rel="alternate" type="text/html" title="Duolingo Chinese" /><published>2019-01-05T00:00:00+00:00</published><updated>2019-01-05T00:00:00+00:00</updated><id>http://calin.mocanu.info/2019/01/05/chinese-duo</id><content type="html" xml:base="http://calin.mocanu.info/2019/01/05/chinese-duo.html"><![CDATA[<p>After 6 months of practicing Chinese every day I’ve finally finished the course on Duolingo. Now with a lexicon of 1800 Chinese characters I can proudly say I’ve reached the proficiency of a <a href="https://chinese.stackexchange.com/questions/6000/how-many-characters-do-chinese-pupils-know-at-different-ages">1st grader</a>.</p>

<p><img src="/assets/images/duo-chinese.png" width="360" alt="Duolingo congrats." /></p>

<!--more-->]]></content><author><name></name></author><summary type="html"><![CDATA[After 6 months of practicing Chinese every day I’ve finally finished the course on Duolingo. Now with a lexicon of 1800 Chinese characters I can proudly say I’ve reached the proficiency of a 1st grader.]]></summary></entry><entry><title type="html">Cell Membrane Simulation</title><link href="http://calin.mocanu.info/2018/12/01/cells-settling.html" rel="alternate" type="text/html" title="Cell Membrane Simulation" /><published>2018-12-01T00:00:00+00:00</published><updated>2018-12-01T00:00:00+00:00</updated><id>http://calin.mocanu.info/2018/12/01/cells-settling</id><content type="html" xml:base="http://calin.mocanu.info/2018/12/01/cells-settling.html"><![CDATA[<p>I <a href="https://beta.observablehq.com/@csiz/cell-settling">simulated the behaviour of cell membranes</a> for a research collaboration with my wife, Catalina Spatarelu, and her colleague Dung Nguyen. It’s written in <a href="https://beta.observablehq.com/">Observable</a> using a literate programming style; that is prose, code and visualisation intermixed. Visualisations are all done using <a href="https://d3js.org/">d3</a> and the physics interactions are optimised using a <a href="https://en.wikipedia.org/wiki/Quadtree">quadtree</a>. The goal of the project is to analyse the jamming/unjamming transition and a poster has been accepted for presentation at <a href="https://www.bmes.org/cmbeconf2019">CMBE 2019</a>.</p>

<video autoplay="" muted="" loop="" width="360">
  <source src="/assets/images/simulation-fast.mp4" type="video/mp4" />
  Video of the cell simulation.
</video>

<!--more-->

<p>Check out the live <a href="https://beta.observablehq.com/@csiz/cell-settling">simulation</a> and play with the controls!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[I simulated the behaviour of cell membranes for a research collaboration with my wife, Catalina Spatarelu, and her colleague Dung Nguyen. It’s written in Observable using a literate programming style; that is prose, code and visualisation intermixed. Visualisations are all done using d3 and the physics interactions are optimised using a quadtree. The goal of the project is to analyse the jamming/unjamming transition and a poster has been accepted for presentation at CMBE 2019. Video of the cell simulation.]]></summary></entry><entry><title type="html">It’s a Stick-up (part 1)</title><link href="http://calin.mocanu.info/2018/10/01/its-a-stickup-1.html" rel="alternate" type="text/html" title="It’s a Stick-up (part 1)" /><published>2018-10-01T00:00:00+00:00</published><updated>2018-10-01T00:00:00+00:00</updated><id>http://calin.mocanu.info/2018/10/01/its-a-stickup-1</id><content type="html" xml:base="http://calin.mocanu.info/2018/10/01/its-a-stickup-1.html"><![CDATA[<p><a href="https://github.com/csiz/itsastickup">My first project</a> on my quest to build a walking robot using neural nets. It’s my physical build of the classic <a href="https://gym.openai.com/envs/Pendulum-v0/">pendulum</a> control problem. The goal is to swing a stick and keep it balanced so it stays upright. Turns out, <a href="https://www.alexirpan.com/2018/02/14/rl-hard.html">reinforcement learning is pretty hard</a>, so I can’t show a working version yet. The build I have so far is a Raspberry Pi connected to 4 servos through a control board <a href="https://github.com/csiz/pi_scripts/blob/master/servo.py">I programmed</a>. There’s a string tied to the servos on which a stick is held and on the stick there’s a <a href="https://github.com/csiz/pi_scripts/blob/master/gyro.py">gyroscope</a>. The Raspberry Pi connects through websockets to my computer and I sample the measurements and output the servo positions. Hopefully an actor-critic algorithm will control it soon, but for now, here’s me fiddling with it.</p>

<video autoplay="" muted="" loop="" width="360">
  <source src="/assets/images/itsastickup.mp4" type="video/mp4" />
  Video of my pendulum setup.
</video>

<!--more-->

<p>PS. Feel free to use my code for the async interface for the <a href="https://github.com/csiz/pi_scripts/blob/master/servo.py">servo</a> and <a href="https://github.com/csiz/pi_scripts/blob/master/gyro.py">gyro</a>, but beware that <a href="https://github.com/csiz/pi_scripts/blob/master/pi_setup.txt">setting up the Raspberry Pi</a> is a bit tricky.</p>
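
<p>For a flavour of the glue code, here’s a hedged sketch of the Pi-side loop, not the actual pi_scripts code: stream gyro readings over the websocket and apply whatever servo angles come back. The <code>read_gyro</code> and <code>set_servo</code> helpers, the message format and the laptop’s address are placeholders standing in for the real servo.py and gyro.py interfaces.</p>

<pre><code>import asyncio
import json
import websockets

def read_gyro():
    # Placeholder for the real gyro.py reader; returns pitch/roll/yaw.
    return [0.0, 0.0, 0.0]

def set_servo(channel, angle):
    # Placeholder for the real servo.py interface.
    pass

async def control_loop(uri="ws://192.168.0.2:8765"):  # laptop address is made up
    async with websockets.connect(uri) as ws:
        while True:
            # Send the latest orientation measurement upstream.
            await ws.send(json.dumps({"gyro": read_gyro()}))
            # The controller on the laptop replies with servo targets.
            command = json.loads(await ws.recv())
            for channel, angle in enumerate(command["servos"]):
                set_servo(channel, angle)
            await asyncio.sleep(0.02)  # roughly 50 Hz

asyncio.run(control_loop())
</code></pre>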

<p>Bonus <a href="https://github.com/csiz/pi_scripts/blob/master/shifty.py">pulse width modulation through a shift register</a>.</p>
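
<p>The trick is plain software PWM: shift an 8-bit pattern out to the register many times per period, and each LED’s brightness is the fraction of patterns in which its bit is on. Below is a rough sketch of that idea, not the shifty.py code itself; it assumes a 74HC595-style register, and the GPIO pin numbers are placeholders.</p>

<pre><code>import math
import time
import RPi.GPIO as GPIO

DATA, CLOCK, LATCH = 17, 27, 22  # placeholder BCM pin numbers

GPIO.setmode(GPIO.BCM)
GPIO.setup([DATA, CLOCK, LATCH], GPIO.OUT)

def shift_out(pattern):
    """Clock an 8-bit pattern into the register, MSB first, then latch it."""
    GPIO.output(LATCH, 0)
    for bit in range(7, -1, -1):
        GPIO.output(CLOCK, 0)
        GPIO.output(DATA, (pattern >> bit) & 1)
        GPIO.output(CLOCK, 1)
    GPIO.output(LATCH, 1)

def pwm_frame(duties, steps=32):
    """One PWM period: LED i stays lit for a fraction duties[i] of the steps."""
    for step in range(steps):
        pattern = 0
        for i, duty in enumerate(duties):
            if step < duty * steps:
                pattern |= 1 << i
        shift_out(pattern)

try:
    # Rolling brightness wave across 8 LEDs, roughly the effect in the video.
    while True:
        t = time.time()
        pwm_frame([0.5 + 0.5 * math.sin(t + i) for i in range(8)])
finally:
    GPIO.cleanup()
</code></pre>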

<video autoplay="" muted="" loop="" width="360">
  <source src="/assets/images/shift-register-pwm.mp4" type="video/mp4" />
  Video of rolling LED effect controlled by shift register.
</video>]]></content><author><name></name></author><summary type="html"><![CDATA[My first project on my quest to build a walking robot using neural nets. It’s my physical build of the classic pendulum control problem. The goal is to swing a stick and keep it balanced so it stays upright. Turns out, reinforcement learning is pretty hard, so I can’t show a working version yet. The build I have so far is a Raspberry Pi connected to 4 servos through a control board I programmed. There’s a string tied to the servos on which a stick is held, and on the stick there’s a gyroscope. The Raspberry Pi connects through websockets to my computer, where I sample the measurements and output the servo positions. Hopefully an actor-critic algorithm will control it soon, but for now, here’s me fiddling with it. Video of my pendulum setup.]]></summary></entry><entry><title type="html">Faces GAN</title><link href="http://calin.mocanu.info/2017/06/05/gan-faces.html" rel="alternate" type="text/html" title="Faces GAN" /><published>2017-06-05T00:00:00+00:00</published><updated>2017-06-05T00:00:00+00:00</updated><id>http://calin.mocanu.info/2017/06/05/gan-faces</id><content type="html" xml:base="http://calin.mocanu.info/2017/06/05/gan-faces.html"><![CDATA[<p>For the final project of <a href="https://www.udacity.com/course/deep-learning-nanodegree--nd101">Udacity’s Deep Learning course</a> I built a faces generator using a <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">generative adversarial network</a>. The gist of this approach is to build two competing neural networks, one to generate fake images and one to detect whether an image is fake or real. By training both networks in parallel we eventually end up with a generator network that produces images that look just like our dataset; and we discard the detector network. For this project we used the <a href="http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html">CelebA</a> dataset of celebrity pictures so the network learned to generate <del>celebrity</del> creepy faces.</p>

<p><img src="https://github.com/csiz/deep/blob/master/Project5/faces.gif?raw=true" width="360" alt="Generated faces." /></p>

<!--more-->]]></content><author><name></name></author><summary type="html"><![CDATA[For the final project of Udacity’s Deep Learning course I built a faces generator using a generative adversarial network. The gist of this approach is to build two competing neural networks, one to generate fake images and one to detect whether an image is fake or real. By training both networks in parallel we eventually end up with a generator network that produces images that look just like our dataset; and we discard the detector network. For this project we used the CelebA dataset of celebrity pictures so the network learned to generate celebrity creepy faces.]]></summary></entry><entry><title type="html">Older Projects</title><link href="http://calin.mocanu.info/2011/06/01/other-projects.html" rel="alternate" type="text/html" title="Older Projects" /><published>2011-06-01T00:00:00+00:00</published><updated>2011-06-01T00:00:00+00:00</updated><id>http://calin.mocanu.info/2011/06/01/other-projects</id><content type="html" xml:base="http://calin.mocanu.info/2011/06/01/other-projects.html"><![CDATA[<p>Single page animations and games.</p>

<p><span style="display: flex; flex-direction: row; flex-wrap: wrap;">
  <a href="/assets/projects/the_canvas/tree/tree.html">
    <img src="/assets/images/tree.png" style="width: 200px; height: 200px; margin: 5px;" alt="Tree in wind." />
  </a>
  <a href="/assets/projects/the_canvas/fish/fish.html">
    <img src="/assets/images/fish.png" style="width: 200px; height: 200px; margin: 5px;" alt="Fish shoal." />
  </a>
  <a href="/assets/projects/the_canvas/planetarium/planetarium.html">
    <img src="/assets/images/planetarium.png" style="width: 200px; height: 200px; margin: 5px;" alt="Planets." />
  </a>
  <!--more-->
  <a href="/assets/projects/c++/snake/snake.html">
    <img src="/assets/images/snake.png" style="width: 200px; height: 200px; margin: 5px;" alt="Snake game." />
  </a>
  <a href="/assets/projects/flash/orbital/orbital.html">
    <img src="/assets/images/orbital.png" style="width: 200px; height: 200px; margin: 5px;" alt="Orbital game." />
  </a>
</span></p>]]></content><author><name></name></author><summary type="html"><![CDATA[Single page animations and games.]]></summary></entry></feed>