Thursday, December 18, 2014

Machine Learning - The High-Interest Credit Card of Technical Debt

Every ML expert must learn a thing or two from Daniel Kahneman's research. A very important paper, read the whole thing - here:

Abstract:

Machine learning offers a fantastically powerful toolkit for building complex sys- tems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is re- markably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is highlight several ma- chine learning specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.

[---]

Traditional software engineering practice has shown that strong abstraction boundaries using en- capsulation and modular design help create maintainable code in which it is easy to make isolated changes and improvements. Strict abstraction boundaries help express the invariants and logical consistency of the information inputs and outputs from an given component.

Unfortunately, it is difficult to enforce strict abstraction boundaries for machine learning systems by requiring these systems to adhere to specific intended behavior. Indeed, arguably the most im- portant reason for using a machine learning system is precisely that the desired behavior cannot be effectively implemented in software logic without dependency on external data. There is little way to separate abstract behavioral invariants from quirks of data. The resulting erosion of boundaries can cause significant increases in technical debt. In this section we look at several issues of this form.


No comments: