To the Greeks, the word "character" first referred to the stamp upon a coin. By extension, man was the coin, and the character trait was the stamp imprinted upon him. To them, that trait, for example bravery, was a share of something all mankind had, rather than means of distinguishing one from the whole. - Edith Hamilton, The Greek Way
Deep reinforcement learning is surrounded by mountains and mountains of hype. And for good reason! Reinforcement learning is an incredibly general paradigm, and in principle, a robust and performant RL system should be great at everything. Merging this paradigm with the empirical power of deep learning is an obvious fit. Deep RL is one of the closest things we have to something that looks like AGI, and that’s the kind of dream that fuels billions of dollars of funding.
Unfortunately, it doesn’t really work yet.
Now, I believe it can work. If I didn’t believe in reinforcement learning, I wouldn’t be working on it. But there are a lot of problems in the way, many of which feel fundamentally difficult. The beautiful demos of learned agents hide all the blood, sweat, and tears that go into creating them.
Several times now, I’ve seen people get lured by recent work. They try deep reinforcement learning for the first time, and without fail, they underestimate deep RL’s difficulties. Without fail, the “toy problem” is not as easy as it looks. And without fail, the field destroys them a few times, until they learn how to set realistic research expectations.
This isn’t the fault of anyone in particular. It’s more of a systemic problem. It’s easy to write a story around a positive result. It’s hard to do the same for a negative one. The problem is that negative results are the ones researchers run into most often. In some ways, the negative cases are actually more important than the positives.
The rule of thumb is that, except in rare cases, domain-specific algorithms work faster and better than reinforcement learning. This isn’t a problem if you’re doing deep RL for deep RL’s sake, but I personally find it frustrating when I compare RL’s performance to, well, anything else. One reason I liked AlphaGo so much was that it was an unambiguous win for deep RL, and that doesn’t happen very often.
This makes it harder for me to explain to laypeople why my problems are cool and hard and interesting, because they often don’t have the context or experience to appreciate why they’re hard. There’s an explanation gap between what people think deep RL can do, and what it can really do. I’m working in robotics right now. Consider the company most people think of when you mention robotics: Boston Dynamics.
Their robots don’t use reinforcement learning. I’ve had a few conversations where people thought they did, but they don’t. If you look up research papers from the group, you find papers mentioning time-varying LQR, QP solvers, and convex optimization. In other words, they mostly apply classical robotics techniques. It turns out those classical techniques can work pretty well when you apply them right.
People speak sometimes about the "bestial" cruelty of man, but that is terribly unjust and offensive to beasts, no animal could ever be so cruel as a man, so artfully, so artistically cruel. - Fyodor Dostoyevsky
In modern physics, there is no such thing as "nothing." Even in a perfect vacuum, pairs of virtual particles are constantly being created and destroyed. The existence of these particles is no mathematical fiction. Though they cannot be directly observed, the effects they create are quite real. The assumption that they exist leads to predictions that have been confirmed by experiment to a high degree of accuracy. - Richard Morris
Animals are more than ever a test of our character, of mankind's capacity for empathy and for decent, honorable conduct and faithful stewardship. We are called to treat them with kindness, not because they have rights or power or some claim to equality, but in a sense because they don't; because they all stand unequal and powerless before us. - Matthew Scully