Manufacturing misconceptions: The difficulties of tackling big data analysis

The advent of Industry 4.0 offers huge potential to explore new and exciting changes to the manufacturing floor, comments Dirk Ortloff, Department Manager, camLine

Intelligent machines capable of “speaking” to one another and collating a myriad of complex data promise huge improvements in productivity … and fundamental changes to the ways in which we view manufacturing efficiency.

However, the wealth of data available to manufacturing organisations is growing larger by the year, increasing the complexity of its analysis. Industrial equipment is becoming capable of storing and sharing types of data that were previously impossible, such as the capture of vibration data to contribute towards wear analysis, in increasingly intimidating volumes.

With the speed of development inherent in Industry 4.0 and the sheer volume of data at hand, many manufacturing organisations simply don’t have the know-how to handle big data storage and analysis. Facing data in more formats and higher volumes than ever before, it’s no surprise that they can be overwhelmed; it’s easy to miss the wood for the trees and fail to take full advantage of the resources to hand.

To avoid missing out on the benefits of appropriate analysis, manufacturers are increasingly looking to in-vogue data analysis techniques to benefit from the most up-to-date procedures.

In-vogue inaccuracies

For example, it’s common for manufacturers to begin with a “data lake” to analyse all of the available data at once. On the surface, the logic is sound; the more data in your analysis, the more insight you can potentially receive. If you consider everything, you don’t omit a crucial outlier or an interesting correlation.

However, this is going to lead to performance issues. Larger data sets take far longer to analyse, especially if online analysis is part of the remit. A company in high-volume manufacturing may produce millions of units in the time it takes to analyse its operational data, only to discover that its processes are far less cost-effective than it thought. This can have a huge impact on the company’s cost margins and will reflect poorly on those responsible.

However, if a data lake approach fails to deliver the desired benefits, we often see people turn to a range of so-called cutting-edge techniques that threaten similar drawbacks if not deployed correctly. As trends in analytics come to the fore, promising results and new ideas can make people overexcited. But it’s easy to apply them inappropriately and end up with unusable, inefficient or misleading results.

For instance, if the data lake approach fails to work, many opt for the polar opposite: a “gathering and systemising” approach. This involves merging as many data bins as possible with a very strong emphasis on systemising them, with data analysis only beginning once the bins have been systemised.

There’s a serious risk of falling off the other side of the horse here. In many cases, the systemisation doesn’t end, meaning that the data can’t be analysed. This makes it impossible to secure a quick win, with many organisations racking up high costs with no tangible benefit.

Another mistake that many make is opting to conduct data searches without a specific target. This technique selects a bulk of data and uses neural networks to search for anything interesting: standout results, repeating sequences, particular correlations, etc. It is attractive because it is cheap, and the work is often handed to a trainee.

Without an appropriate data set, this will often lead to unsatisfactory results. It’s difficult to glean any valuable insight without a clearly defined goal; as the process tends to do away with method selection, the results are often far below expectations.

Determining direction

This all demonstrates how unwise it is for companies to commit to in-vogue analytics trends without a serious appraisal of use cases, methodology and the options available to them. It’s understandable to look at successful examples when attempting to find a solution, but data is far more complicated than that.

Attempting to emulate others without a grounding in the logic behind their decision will do more harm than good, particularly when it comes to adding value and cost-benefit ratios.

This is being recognised by even the highest powers, who are investing in education and data analysis applications. One example is the PRO-OPT research and development project, funded by the Federal Ministry for Economic Affairs and Energy of Germany. The PRO-OPT project looked to help companies operating in “smart ecosystems.”

These ecosystems are immensely complex. Modern companies generating huge volumes of data will almost always have infrastructures spanning countries or even continents, as well as external partners to consider. With companies looking to outsource services such as manufacturing specialist parts, original equipment manufacturers (OEMs) are an example of specialist consultants who will invariably have a complex data infrastructure themselves, further complicating analysis.

Companies without experience in high-volume data analysis will find it extremely difficult to collate all of this data and properly investigate it. PRO-OPT aims to educate and support companies in the analysis of these huge volumes of data. The importance of its mission was recognised by backing from major German organisations including Audi and Fraunhofer IESE.

To examine one PRO-OPT use case, the project tested a wide variety of production data modelling approaches on the data of a leading automotive supplier. This exercise attempted to identify and demonstrate:

  • the difficulties of systematically merging different data buckets
  • the possible modelling of the data in databases that are specifically designed to help analysts tackle large sets of distributed data
  • the actual analysis of these large data collections.

Success would mean being able to apply and compare statistically reliable analyses and classification procedures, as well as new procedures from AI instruments.

Sticking up the data bank

This use case stands out as it clarifies the challenges that companies with limited expertise in data analytics face. Without comprehensive preparation, an awareness of the options available and experience of executing them, organisations are inevitably going to hit unexpected roadblocks.

These can start before the process even begins. Securing a tool that can analyse and manipulate data is obviously very important; new technologies, or means of analysing data, have exciting potential. But when you have a new hammer, it’s easy to forget that some things aren’t nails. It’s crucial not to overlook reliable means of exercising control over the data you have.

Statistical process control (SPC) is a tried-and-tested means of doing so. Defined as “the use of statistical techniques to control a process or production method,” SPC was pioneered in the 1920s. The modern iteration is a technology that offers huge synergy with new data analytic techniques.

The ability to make vital production data available ad hoc, for example, or to automate actions to take place in real-time when certain parameters are met, makes SPC an incredibly powerful tool through which to interact with data.
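As a rough illustration of the statistical core of SPC, the sketch below derives control limits for an individuals chart from a baseline run and flags new measurements that fall outside them. It is a minimal sketch rather than a description of any particular SPC product: the function names and sample values are hypothetical, and sigma is estimated from the average moving range divided by the standard constant 1.128.

```python
from statistics import mean


def control_limits(baseline):
    """Return (lower, centre, upper) limits for an individuals control chart."""
    centre = mean(baseline)
    # Moving range: absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # 1.128 is the d2 constant for subgroups of size 2.
    sigma = mean(moving_ranges) / 1.128
    return centre - 3 * sigma, centre, centre + 3 * sigma


def out_of_control(values, limits):
    """Return (index, value) pairs that fall outside the control limits."""
    lower, _, upper = limits
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical baseline measurements from a stable process.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = control_limits(baseline)

# Hypothetical new measurements; the third one drifts well away from the rest.
new_values = [10.0, 10.3, 11.5, 9.9]
flagged = out_of_control(new_values, limits)
print(flagged)
```

In a live system the flagged points would be the trigger for an automated action, such as holding a lot or notifying an operator.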

To get the most out of an SPC system, and to allow it to analyse and action changes based on data, it needs results to be loaded into a specialised database. The complex datasets required will often have thousands of variables, all of which need meaningful column names. Many databases don’t support the number of columns needed, or impose limits on the names you can give these columns. So how do you stop this seriously limiting your analysis capabilities?
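One possible way around column limits, offered here as an assumption rather than the approach the article has in mind, is to store measurements in a “long” layout: one row per unit and variable, with the variable name held as data instead of as a column name. The sketch below uses Python’s built-in sqlite3 module; the table, unit and variable names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Long layout: the variable name is a value, so thousands of variables
# need only three columns and names are not bound by column-name rules.
conn.execute(
    "CREATE TABLE measurement ("
    " unit_id TEXT NOT NULL,"
    " variable TEXT NOT NULL,"
    " value REAL,"
    " PRIMARY KEY (unit_id, variable))"
)


def store_unit(conn, unit_id, readings):
    """Insert one row per variable for a single produced unit."""
    conn.executemany(
        "INSERT INTO measurement (unit_id, variable, value) VALUES (?, ?, ?)",
        [(unit_id, name, value) for name, value in readings.items()],
    )


store_unit(conn, "unit-001", {"spindle vibration rms [mm/s]": 0.42, "coolant temp [C]": 21.5})
store_unit(conn, "unit-002", {"spindle vibration rms [mm/s]": 0.47, "coolant temp [C]": 22.1})

# Pull a single variable across all units for analysis.
rows = conn.execute(
    "SELECT unit_id, value FROM measurement WHERE variable = ? ORDER BY unit_id",
    ("coolant temp [C]",),
).fetchall()
print(rows)
```

The trade-off is that queries spanning many variables at once need pivoting, so this layout suits analyses that select a handful of variables at a time.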

Once the analysis is under way, does your organisation have the time and schedule to make it cost-effective? Most SPC solutions operate offline, analysing data retrospectively; only the most modern solutions are able to analyse online, in real-time. How do you balance the analysis you need with the manufacturing output that you require?
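To make the offline/online distinction concrete, the sketch below shows one common building block of online analysis: updating a mean and variance one reading at a time (Welford’s method) rather than re-reading the full history. It is an illustrative assumption about how such a system might work, not a description of any specific SPC solution; the class name and readings are hypothetical.

```python
class RunningStats:
    """Incrementally updated mean and sample variance (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new reading into the statistics in constant time."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two readings have arrived."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical readings arriving one by one from the line.
stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)

print(stats.count, stats.mean, stats.variance)
```

Because each update costs the same regardless of how much data came before, the analysis keeps pace with production instead of lagging behind it.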

Beyond this, even if you can employ a database that can handle the volume of data you are working with and have completed the analysis process, data needs to be presented in a digestible way. Datasets with millions of data points can’t be displayed in conventional XY scatter plots as they’ll almost completely fill the canvas; even transparent data points aren’t of any use. How do you go about translating a blob of X million data points into actionable insights?
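One widely used answer, sketched here as an assumption and not as the method the article proposes, is to aggregate before drawing: count how many points fall into each cell of a grid and plot the counts as a density map. The function name, ranges and synthetic data below are hypothetical.

```python
import random
from collections import Counter


def bin_points(points, x_range, y_range, bins):
    """Count points per cell of a bins x bins grid; out-of-range points are skipped."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Clamp so that a point exactly on the upper edge lands in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[(row, col)] += 1
    return counts


# Synthetic stand-in for a large production dataset.
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]

grid = bin_points(points, (-4, 4), (-4, 4), bins=50)
print(len(points), "points reduced to", len(grid), "non-empty cells")
```

However many points go in, the plot never has to draw more than bins x bins cells, and the cell counts show where the data is dense in a way that overlapping markers cannot.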

These examples are just the tip of the iceberg when it comes to the thinking required to perform effective analysis, which goes to show just how careful analysts need to be. Without a considered roadmap of the journey that the data needs to take, and how analysts will both identify the data they need and then break down the results, it is all too easy to fail.

However, with the right equipment and ethos, analysis that used to be inefficient owing to the sheer volume of data can offer completely new insights into production data.