What is data science and what do data scientists do? There are numerous descriptions of what constitutes a data scientist. The definition that is most accessible to me is the following Venn diagram from Drew Conway:
Regarding the level of expertise, opinions range from requiring a PhD in math, machine learning, statistics or something similar to “just” knowing how a regression analysis works. As a company, I would be really careful about hiring a top data scientist with a PhD who is an absolute expert in all of the above-mentioned areas, a jack of all trades so to speak. First of all, recruiting and employing someone like this is really costly, because this breed of people is very rare. You are more likely to find someone with excellent math and hacking skills whose business acumen is lacking. Secondly, such an outstanding person will most likely develop into one of the most important colleagues in the whole company and act as a vortex, sucking all kinds of business processes into her “sphere of influence”. That’s good, as long as everything is running smoothly. But if this data scientist acts as a “data princeling”, a gatekeeper who lets no one besides her team near the data, and asks for a significant pay raise, then things get complicated. I’d rather have a team of colleagues who each have expertise in “only” two of the above-mentioned areas, as long as I can be sure that processes are backed up and access to data is available to everyone in the company.
The interview below with DJ Patil on being data-driven and on data scientists is a must-see. I love the part where he says that being data-smart means being street-smart: instead of throwing all your AI and machine learning algorithms at a problem, sometimes a simple heuristic is a smarter and more efficient way to solve it. In Patil’s opinion, being a good data scientist is more about personality than about specific skills.
For those interested in the topic, here are a couple of useful links: