Prof. Sander Klous started out in mechanical engineering, initially wanting to become a car mechanic. Yet he was quickly drawn to analytics and control system engineering, leading to an internship at Ballast Nedam (Saudi Arabia) and a second internship at Nikhef, designing one of the cryogenic elements for the CERN accelerator (Geneva). At first he worked for the design department, but found himself being drawn towards modelling and attaining a deeper understanding of physics. This led him to a degree in Physics and a PhD position in the LHCb experiment at CERN, where he was involved in research on the difference between matter and antimatter. After obtaining his PhD he joined the ATLAS experiment and worked on building the trigger and data acquisition system, in search of the Higgs boson. The particle was discovered by ATLAS and CMS in 2012, which resulted in the Nobel Prize for Higgs and Englert in 2013. Klous is one of the 3,500 authors of the publication on the discovery of the particle in the ATLAS experiment.
After his time at CERN, Klous decided to make a career switch and moved to a job at KPMG. He was asked to start a new team of data scientists that would make use of the knowledge and skills he had acquired while doing the experiments at CERN. At the time, big data was an up-and-coming field in information management, and many of the techniques that were used at CERN were also of great value to a company such as KPMG. Data processing at CERN can be seen as the Champions League of data processing - the accelerator produces 60 terabytes per second, which has to be brought down in real time to 300 MB per second and distributed to several data centres. The same basic toolkits can be applied to data questions in areas such as financial risk analysis, the maintenance of machinery, planning optimisation and consumer behaviour.
‘A good understanding of the basics is still essential, but large amounts of fixed knowledge are less important nowadays, as we can look up everything in an instant. A future data scientist needs the ability to make new connections, critically assess ongoing projects, and understand what level of certainty is required in a given data project. Sometimes a relatively large margin of error is perfectly fine; at other times, being very precise is essential.
And there isn’t just one type of data scientist that businesses or other organisations need in order to be successful. Instead, it’s about making sure all the relevant qualities are sufficiently present in the team as a whole. This goes for both professional skills and personal attributes. For every creative person you need a realist, and for every disorganised person you need someone who can help everyone stick to a tight schedule. Yet there are two things everyone needs: curiosity and decent communication skills. Curiosity speaks for itself, but I also really want to emphasise the ability to communicate, as you will often need to speak with both the business-minded people and the hard-core scientists in the organisation. You need to be comfortable with both frames of reference and be able to translate their interests both ways.’
‘A Master’s degree is really just the beginning. During your education, most of the problems you have to solve are relatively clearly defined, and the provided data is tailored to the problem. After graduating you’ll discover that data sets in real life are a lot messier.
What also came as a surprise when transferring from science to the business world is the difference in the time frames you have to work with. Where the schedule for large scientific projects can easily span multiple years, a pilot project for a small business may have to show results in as little as 8 weeks. If you’d asked me 10 years ago whether this was possible, I would have said no, but both the technology and the way we work have changed so dramatically that all sorts of time frames are possible.’
‘In the most immediate future, the data science field will likely have a shortage of people who can quickly translate large amounts of data into usable and valuable information, for example by making use of visualisation tools. Then, a little further down the road, I expect we will need experts who can build fully automated systems - the best-known example being self-driving cars. I think these advanced systems will quickly become dominant in our society.
In the more distant future I can see the progress made in deep learning and artificial intelligence come to fruition. One day, these self-learning systems may even take over some of the tasks that now require expertly trained data scientists.
The developments in all of these stages demand that data scientists know their responsibilities when it comes to shaping society. They don’t just solve data problems, but will also affect how people get their information, move through public space and behave on a day-to-day basis. Many developers are still unaware of the impact they will increasingly have. It is important that anyone who wishes to become a data scientist is aware that these dimensions come with the job as well.’