So much has been said about using machine learning to predict failures, yet so little of it is actually used in practice. The number of papers, research articles, presentations and videos on predicting failure surely qualifies as VERY BIG DATA in its own right.
Over the last three years, we have been building out an analytics solution for wind, solar and hydro generation assets. Over a year ago, we set out to deliver predictions of impending failures from big energy generation data. As we interact with prospects and customers, a set of questions comes up again and again: some from the tech folks and some from the business folks. Below is a broad category of interactions that we often get dragged into. By getting to know these common traps, each of us can figure a way out of them, get to answering the real questions and move towards the real problems.
We have had clients telling us to use Hadoop, Cassandra, NoSQL, Spark, Pig, H2O and a bunch of other interesting tech rather than asking us how we can analyze or identify a gearbox failure. Discussions start either with historians that handle sub-second data or with NoSQL stores like MongoDB or Elasticsearch that can handle unstructured data. After the jargon has filled the room, we start by pointing out that most data related to machines is structured; unstructured data is limited to service reports, tickets/issues, manuals and the like. Most of the common data sources support SQL, and with the help of a few time series functions we should move on to extracting value from that data rather than pursuing jargon. We note down the requirements around scale and scalability and show that most elements of our technical stack offer 10x if not 100x more headroom than we currently operate at, so we are unlikely to hit those barriers anytime soon. We consider ourselves lucky if we manage to move the discussion to the business goals without hurting egos.
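To make the SQL point concrete, here is a minimal sketch of a time-series rollup in plain SQL. The table and column names (`readings`, `turbine_id`, `voltage`) are purely illustrative, and SQLite is used only because it ships with Python; the same GROUP BY rollup runs on essentially any SQL store a historian or plant database exposes.

```python
import sqlite3

# Hypothetical single-sensor table; names and sample values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, turbine_id TEXT, voltage REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2016-01-01 00:00:05", "T01", 690.2),
        ("2016-01-01 00:00:35", "T01", 688.9),
        ("2016-01-01 00:01:10", "T01", 691.4),
    ],
)

# Per-minute average voltage per turbine: one time function plus GROUP BY
# is often all the "time series" machinery the analysis needs to start.
rows = conn.execute(
    """
    SELECT strftime('%Y-%m-%d %H:%M', ts) AS minute,
           turbine_id,
           AVG(voltage) AS avg_voltage
    FROM readings
    GROUP BY minute, turbine_id
    ORDER BY minute
    """
).fetchall()
for row in rows:
    print(row)
```

This is the kind of extraction we mean: structured data, a standard query, no new stack required.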
Big data and BIG DATA
The nature of machine data is such that reaching a terabyte is a cakewalk: sample one sensor, say a voltage level, every millisecond and you can reach a terabyte within weeks. The problem starts when the goal becomes filling a terabyte of data rather than extracting a kilobyte of useful information. Sampling rates can be relaxed from milliseconds to seconds, minutes or hours, based on where the information actually lives. In our experience, sampling faster than every one or ten minutes is more an exercise in showing off data size than in extracting insight. So at times we wonder which approach can move the discussion forward. We could start a discussion on frequency domain analysis (sampling intervals below one millisecond) to take the focus into the BIGGER DATA realm, but that does not solve the problems we can finish in the ten-minute or one-hour realm. When we are stuck in this BIG DATA trap, we often convert it into an opportunity to upsell, providing data storage and visualization at lower frequencies and letting the client make the call. As someone said, “If you can’t beat them, join them.”
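The arithmetic behind the sampling-rate discussion is easy to sketch. The per-reading byte cost below is an assumption (value plus timestamp plus indexing overhead), and the totals scale linearly with it and with the number of sensors:

```python
# Back-of-envelope: how fast one sensor fills storage at a given sampling rate.
# ROW_BYTES is an assumed per-reading cost (value + timestamp + index overhead).
ROW_BYTES = 100

def gigabytes_per_day(interval_seconds, row_bytes=ROW_BYTES):
    """Storage produced per day by one sensor sampled every interval_seconds."""
    readings_per_day = 86_400 / interval_seconds
    return readings_per_day * row_bytes / 1e9

for label, interval in [("1 ms", 0.001), ("1 s", 1), ("1 min", 60), ("10 min", 600)]:
    print(f"{label:>6}: {gigabytes_per_day(interval):.6f} GB/day")
```

Under these assumptions one sensor at millisecond sampling produces several gigabytes a day, so a plant with a few hundred sensors crosses a terabyte in weeks, while the same sensor at ten-minute sampling stays in the tens of kilobytes per day.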
UI is everything
The next big trap we walk into is the belief that user interface, the look and feel, is everything. No UI is the future of UI, but that discussion would turn ugly, so let’s discuss what level of UI makes the cut. Take the example of a control room display for a client with 400 assets (turbines and inverters) to monitor. With a fixed amount of screen real estate in the control room, we agree that 50% of the display must provide an overview of the current status of all 400 assets, and the remaining 50% should be devoted to incident-based display. Now that we have a plan, we work out how to communicate the status of 400 assets, based on reviews by designers and end users. By the time we deliver this UI, the control room operator says he wants an email or SMS when a status change happens, so that he can initiate action rather than stare at the control room screen. This is a very valid requirement, as action cannot be taken based on 400 very pretty pixels on a screen. So off we go building a small workflow that sends an email/SMS to the control room operator and site staff, and over time the control room finds more merit in following only the status changes and tracking the workflow. Meanwhile, the 400-asset display along with maps and 3D models of wind turbines has been built to make the UI impressive. So where should we focus, UI or no UI? It’s another fine balance.
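The workflow the operator asked for reduces to a diff between consecutive status snapshots. A minimal sketch, with hypothetical asset IDs and statuses, and a print standing in for the email/SMS dispatch:

```python
# Instead of rendering 400 tiles, diff the current status snapshot against the
# previous one and emit one alert per change. Names are illustrative only.

def status_changes(previous, current):
    """Return (asset_id, old_status, new_status) for every asset that changed."""
    changes = []
    for asset_id, new_status in current.items():
        old_status = previous.get(asset_id)
        if old_status is not None and old_status != new_status:
            changes.append((asset_id, old_status, new_status))
    return changes

previous = {"WTG-001": "RUNNING", "WTG-002": "RUNNING", "INV-117": "CURTAILED"}
current  = {"WTG-001": "RUNNING", "WTG-002": "FAULT",   "INV-117": "RUNNING"}

alerts = status_changes(previous, current)
for asset_id, old, new in alerts:
    # Stand-in for the email/SMS dispatch to the operator and site staff.
    print(f"ALERT {asset_id}: {old} -> {new}")
```

Two alerts instead of 400 pixels: that is the whole of the workflow the control room ends up following.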
In the next post, we will look at a few other traps as we discuss our solution offering. The problems titled “my data on the Cloud could walk away” and “we want the MODEL” are another viewpoint on selling cloud services and machine learning.