Imagine a friend asks you to meet him at a particular location but doesn’t give you the complete address. Worse, imagine he gives you the wrong one. Or imagine solving a math problem where the given data leads you to an absurd answer, like a probability greater than one. The key message here is that complete and authentic “data” or information is important.
|Ignorance regarding data quality|
Data mining and analytics are growing more popular by the day, and many enterprises rely on them for lean operations and decision making. In all the hype and buzz about “big data” and the magic it can do, what often gets left out of the discussion is the availability of “high quality data”. Quality evangelist Joseph Juran defines high quality data as data that correctly represents its real-world construct and is fit for its intended purpose in business operations, planning and decision making.
In this article we will focus on the challenges in uncovering meaningful operational insights from wind turbine and solar PV SCADA data. A 2 MW wind turbine or a 5 MW solar PV plant would have close to 200 sensors with a sub-second refresh rate. Both of these renewable assets fall under the category of IoT devices generating high-volume data. The SCADA system consolidates and time-stamps this data, then logs it as 5- or 10-minute average values.
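To make the logging step concrete, here is a minimal Python sketch of how sub-second readings might be reduced to 10-minute averages. The function name `ten_minute_averages`, the convention of labelling each bin with its start time, and the synthetic sample data are all assumptions made for illustration; a real SCADA system may label bins differently or log additional statistics.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def ten_minute_averages(samples, bin_minutes=10):
    """Average (timestamp, value) samples into fixed-width time bins.

    Each bin is labelled with its start time (an assumed convention).
    bin_minutes is expected to divide 60 evenly, e.g. 5 or 10.
    """
    bins = defaultdict(list)
    for ts, value in samples:
        # Floor the timestamp to the start of its bin.
        floored_minute = ts.minute - ts.minute % bin_minutes
        bin_start = ts.replace(minute=floored_minute, second=0, microsecond=0)
        bins[bin_start].append(value)
    return {start: mean(values) for start, values in sorted(bins.items())}


# Synthetic data (hypothetical): one reading every 0.5 s for 20 minutes,
# with a constant value of 1.0 in the first 10 minutes and 3.0 in the next 10.
start = datetime(2024, 1, 1, 0, 0, 0)
samples = [
    (start + timedelta(seconds=0.5 * i), 1.0 if i < 1200 else 3.0)
    for i in range(2400)
]
averages = ten_minute_averages(samples)
for bin_start, value in averages.items():
    print(bin_start, value)
```

In this sketch, 2400 raw readings collapse into two logged values. That shows how much detail is already compressed away before any analysis begins, and why each logged value matters.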
Data from these renewable assets can be scrutinized using modern statistical tools for performance evaluation, anomaly detection, and condition and health monitoring. To do this effectively, the SCADA data needs to be of high fidelity. But that is rarely the case.
Very often the data set contains missing values or blanks, which may be randomly distributed, follow a pattern, or occur in chunks. Reasons for missing values include the following:
Another issue plaguing SCADA data is incorrect values, which, depending on the case, can be harder to detect than missing ones. Different factors cause incorrect values to be logged:
|Missing & degraded data|
Data corruption can also result from software issues, such as incorrect formatting of data or loss of data due to internal memory problems. Typically, systems also pre-process data before storing it, and any fault in the processing logic can result in incorrect data being stored. Imagine a simple glitch in the software code, like an incorrect formula applied when converting data from one unit to another. It can leave the stored value off by several decimal places, leading to erroneous results during analysis.
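One simple guard against glitches of this kind is a plausibility check that flags readings outside a physically sensible range. The sketch below is illustrative only: the field names (`power_kw`, `wind_speed_ms`) and the limits chosen for a 2 MW turbine are assumptions, not values taken from any manufacturer or SCADA vendor.

```python
def flag_implausible(records, limits):
    """Return (index, field, value) for every reading outside its allowed range."""
    flagged = []
    for i, record in enumerate(records):
        for field, (low, high) in limits.items():
            value = record.get(field)
            if value is None:
                continue  # missing values are a separate problem
            if not low <= value <= high:
                flagged.append((i, field, value))
    return flagged


# Assumed, illustrative limits for a 2 MW turbine (not manufacturer data).
LIMITS = {
    "power_kw": (-50.0, 2200.0),
    "wind_speed_ms": (0.0, 50.0),
}

# Hypothetical logged records.
records = [
    {"power_kw": 1450.0, "wind_speed_ms": 9.8},
    {"power_kw": 1450000.0, "wind_speed_ms": 9.9},  # watts stored in a kW field
    {"power_kw": 1500.0, "wind_speed_ms": -3.0},    # negative wind speed
]

flagged = flag_implausible(records, LIMITS)
for index, field, value in flagged:
    print(f"record {index}: {field} = {value} is outside the expected range")
```

A range check like this catches gross errors such as a factor-of-1000 unit mix-up. It will not catch values that are wrong but still plausible, which is why incorrect values are often harder to detect than missing ones.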
Effect on analytics:
Low quality and missing data are the bane of data analysts. Analyzing wrong data has numerous repercussions beyond wasted time and effort.
|A view of missing data in Algo Engines|
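Missing records of the kind discussed earlier can be located with very little code. The following is a minimal sketch, assuming the log is expected to contain one record every 10 minutes; the function name `find_gaps` and the sample timestamps are hypothetical. It detects missing rows only, not blank values inside rows that are present.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval=timedelta(minutes=10)):
    """Return (first_missing, last_missing, missing_count) for each gap.

    Assumes records are expected at a fixed interval.
    """
    gaps = []
    ordered = sorted(timestamps)
    for previous, current in zip(ordered, ordered[1:]):
        # Number of expected records that never arrived between two neighbours.
        missing = round((current - previous) / interval) - 1
        if missing > 0:
            gaps.append((previous + interval, current - interval, missing))
    return gaps


# Hypothetical log with one isolated missing record and one 50-minute chunk.
timestamps = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 0, 10),
    datetime(2024, 1, 1, 0, 30),
    datetime(2024, 1, 1, 0, 40),
    datetime(2024, 1, 1, 1, 40),
]

gaps = find_gaps(timestamps)
for first, last, count in gaps:
    kind = "isolated" if count == 1 else "chunk"
    print(f"{first} to {last}: {count} missing ({kind})")
```

Separating isolated gaps from longer chunks is useful because the two usually call for different handling in any later cleaning step.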
I have attempted here to give you a glimpse of the challenges that low quality SCADA data poses to analytics in the wind and solar energy space. If constraints prevent you from improving hardware-level components to get better data, there are “data cleaning” techniques that can improve the quality of the data already collected. Stay tuned to our blog to find out more.