Taleb never ceases to fascinate me; from around 42 to 52 minutes into the Q&A session, he covers the precautionary principles that need to be followed with Big Data:
Big Data inevitably produces "unwanted" correlations (similar to the concepts of confirmation bias and echo chambers: you will inevitably find what you are looking for, and more, which translates to bullshit). The NSA has the most robust model for handling Big Data since its search criterion is simple: is the candidate a terrorist, yes or no?
The idea is to first structure what you are looking for in a simplified format and develop a strong stomach so as not to be distracted by unwanted correlations. Again, develop the stomach to drop variables and focus on the problem at hand.
The higher the number of variables, the higher the noise: the number of correlations grows non-linearly with the number of variables and shoots up.
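To make the point above concrete, here is a minimal sketch of my own (not something from the talk). It fills a table with pure, independent random noise and counts how many pairs of columns still look "correlated". The function names, the 50 observations per variable, and the 0.3 cutoff are all assumptions picked for illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def count_spurious(n_vars, n_obs=50, threshold=0.3, seed=0):
    """Return (number of pairs, number of pairs with |r| > threshold).

    Every column is independent Gaussian noise, so any pair that
    crosses the threshold is a spurious correlation by construction.
    The threshold and sample size are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(combinations(cols, 2))
    hits = sum(1 for a, b in pairs if abs(pearson(a, b)) > threshold)
    return len(pairs), hits


if __name__ == "__main__":
    for n_vars in (5, 20, 80):
        pairs, hits = count_spurious(n_vars)
        print(f"{n_vars:3d} variables -> {pairs:5d} pairs, {hits:4d} with |r| > 0.3")
```

The number of pairs is n(n-1)/2, so going from 5 to 80 variables takes you from 10 candidate correlations to 3,160, and the count of noise pairs that cross the cutoff tends to climb with it. Nothing in the data is real signal; dropping variables is what shrinks the haystack.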
This is a simple yet powerful concept in machine learning that we tend to forget.
This whole speech covers Big Data: