Big data: quality over quantity

Written by Marlène Verburg

These days, you hear a lot about big data, including in the manufacturing industry. Consultancy firms such as Capgemini and McKinsey and large IT vendors such as Infor and SAP publish impressive use cases and benefits, and researchers are interested in the topic as well [1] [2] [3]. There is no doubt that big data can do tremendous good for many sectors, including industry. So far, however, many companies are not benefiting from cost savings and process optimizations as much as promised. There are many possible explanations for this. This blog focuses on one highly relevant but often underestimated reason: the quality of the data.

Why is quality so important?

Quality is a subjective term. You can define KPIs to measure it, but before you can do that, you need to know what good quality is. A common assumption about big data is that if you collect lots of data, there will be interesting information in it. That is true in the sense that relevant information is in there somewhere. But imagine you have to assemble a machine: there are thousands of parts lying in front of you, you have no idea which ones you need, and some critical parts are missing. You will be unable to assemble the machine. The same goes for data. It is not about collecting all possible data; it is about collecting the right data to answer your business questions, and about making sure you have all the data that is relevant to those questions.

What is good-quality data?

The quality of data is therefore determined by what you need from it. It starts with a business question, not with the technique or the data that could potentially be collected. If your goal is to gain more insight into what your customers want, you will look at data collected from your website and sales processes. If you want to carry out repairs remotely, you will want to collect data about how the machine was used during production and what happened before it went into an error state. Even at this level of detail, it can be hard to determine what data you need. It is easy to collect too much irrelevant data while still missing some of the data needed to answer your questions. It is a fine line, and you can easily draw wrong conclusions or confuse cause and effect. However, if you carefully consider why you are collecting the data and which factors could potentially explain the answer, you have a much greater chance of success.
Does this mean that big data does not need to be big? No, volume still matters. You can learn much more from months of measured operating time than from a single hour, because your results are less subject to coincidence. But you can still learn far more from a small amount of relevant data than from a large amount of irrelevant data. Big data is not about collecting as much different information as possible, but about measuring the right things. So start with your business questions and what you want to learn from the data. Eventually, this will lead to the promised benefits!
[1] J. Lee, E. Lapira, B. Bagheri and H. Kao, “Recent advances and trends in predictive manufacturing systems in big data environment,” Manufacturing Letters, pp. 38–41, 2013.

[2] J. Lee, H. Kao and S. Yang, “Service Innovation and Smart Analytics for Industry 4.0 and Big Data Environment,” in 6th CIRP Conference on Industrial Product-Service Systems, 2014.

[3] P. O’Donovan, K. Leahy, K. Bruton and D. O’Sullivan, “Big data in manufacturing: a systematic mapping study,” Journal of Big Data, vol. 2, 2015.