I have been looking through many work orders for failures or anomalies, but at this point I’m not sure I am making the most of the shared information. As examples, I’ll take the following two events:
The work order 6516382362467974 has the following details available:
Description: Exterior leakage - liquid for the main process
Shutdown: Revision (full shutdown)
Planned Start: 2014-09-11
I also plotted some of the signals from around the planned start, as shown in the image below:
I have a few questions about the first work order:
It seems that the compressor turned off about two weeks before the planned start time (as the green dotted line shows). Is it safe to assume that this discrepancy is due to the two-week delay in sharing the data?
Is there a set of signals that I can look into for this work order? I’m assuming that, since there is a leakage, the time series related to lube oil could be affected, or maybe the pressure of the compressor.
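To make the first question concrete, this is roughly how I measured the gap between the shutdown and the planned start. It is only a minimal sketch: the daily speed signal, the running threshold, and the helper names `window` and `first_stop` are made up for illustration, and the real signal would come from the shared time series instead.

```python
from datetime import datetime, timedelta

def window(samples, center, days_before=30, days_after=7):
    """Return the (timestamp, value) samples within a window around `center`."""
    start = center - timedelta(days=days_before)
    end = center + timedelta(days=days_after)
    return [(t, v) for t, v in samples if start <= t <= end]

def first_stop(samples, running_threshold):
    """Return the timestamp of the first sample below the threshold that
    follows at least one running sample, or None if the machine never stops."""
    was_running = False
    for t, v in samples:
        if v >= running_threshold:
            was_running = True
        elif was_running:
            return t
    return None

# Planned start taken from the work order above.
planned_start = datetime(2014, 9, 11)

# Synthetic daily speed signal (assumption): running until 2014-08-28, then off.
samples = []
day = datetime(2014, 8, 1)
while day <= datetime(2014, 9, 30):
    speed = 9000.0 if day < datetime(2014, 8, 28) else 0.0
    samples.append((day, speed))
    day += timedelta(days=1)

near = window(samples, planned_start)
stop = first_stop(near, running_threshold=100.0)
gap_days = (planned_start - stop).days
print(f"Stopped on {stop.date()}, {gap_days} days before the planned start")
```

Repeating this for several work orders would show whether the offset is consistently about two weeks, which is what I would expect if it comes from the data-sharing delay.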
The second work order is 5486503211601545, which has the following details available:
Description: There is *** too much vibration *** *** to electricity. Engine for *** step ***. Pipes must ***. Ref *** ***.
Shutdown: During operation
Planned Start: 2013-10-03
As for this one, there are a few more work orders that mention high vibration values. My question is: does it make sense to look into vibration time series such as 1stStg Journ BRG DE, or is this referring to a different source of vibration?
Besides these two events, I also have a more general doubt about the feasibility of doing predictive maintenance with this dataset. Although a large volume of data has been shared, the failure events (if any) are very scarce, which is expected for this sort of process. It is not as if the motor bearings have broken down 10 or 20 times, so that I could train a classification model to detect the problem at an early stage. Is training a classification model feasible for this sort of data, or is it a naïve approach?
It seems that the best approach for this dataset is condition monitoring: estimating the wear of the assets, as mentioned in one of the courses shared by Cognite. The tricky part is that, even after linking some of the assets with maintenance events to reset the wear counter, I'm not sure what to do with the result, since I don't have a reference for the expected lifespan of the asset.
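By "wear counter" I mean something like the sketch below: accumulate running hours and reset to zero at each linked maintenance event. The hourly signal, the threshold, the maintenance date, and the function name `running_hours_since_maintenance` are all assumptions for illustration, not something taken from the course or from CDF.

```python
from datetime import datetime, timedelta

def running_hours_since_maintenance(samples, maintenance_times,
                                    running_threshold, step_hours):
    """Accumulate running hours over (timestamp, value) samples,
    resetting the counter at every maintenance event."""
    events = sorted(maintenance_times)
    next_event = 0
    counter = 0.0
    out = []
    for t, v in samples:
        # Reset for every maintenance event that has occurred by time t.
        while next_event < len(events) and events[next_event] <= t:
            counter = 0.0
            next_event += 1
        if v >= running_threshold:
            counter += step_hours
        out.append((t, counter))
    return out

# Synthetic example (assumption): four days of hourly samples, always running,
# with one maintenance event at the start of the third day.
start = datetime(2013, 10, 1)
samples = [(start + timedelta(hours=i), 9000.0) for i in range(96)]
maintenance = [datetime(2013, 10, 3)]

wear = running_hours_since_maintenance(
    samples, maintenance, running_threshold=100.0, step_hours=1.0
)
print(wear[47], wear[48], wear[-1])
```

Without an expected lifespan, a counter like this is only a relative indicator: it allows comparing intervals between maintenance events, but it cannot say how close the asset is to the end of its life.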
Best answer by Stig Harald Gustavsen
@Luis Ramon Ramirez Rodriguez, I agree with a lot of what you are saying.
One of the biggest issues we have in using supervised machine learning approaches on industrial data is having data from failure modes where both the failure mode and the time segment are specified, and this is not that easy. The work orders are extracted from an ERP system, where people have both notified and planned a work order, and they do not necessarily represent the reality seen in the SCADA systems or even the physical systems when it comes to classifying the time segments. Additionally, we don’t yet have significant amounts of failure data, especially from Valhall, because the Valhall asset has very high production reliability, and failures on bigger gas processing machinery are a rare occurrence. But having some data to augment or feature-engineer with can potentially be helpful, and having dimensionally independent datasets or models can also benefit cases such as this. This is where we need human creativity.
@Stig Harald Gustavsen
Data augmentation seems like an interesting approach. Off the top of my head, I can think of two ways of doing this:
The simpler one would be to edit the signals at some points to add values outside of the normal range, e.g. too-high temperatures or changes in frequency, depending on what kinds of algorithms are being tested. Ideally, this would allow training models to detect anomalies or failures, but it would hardly be useful for predictive maintenance.
The second one would be to simulate not only the anomalies themselves but also how the system would start to degrade over time, based on physical models. This would be valuable for predictive maintenance, but it also requires a deep understanding of the system being modeled, in which case ML knowledge would not be enough and subject matter expertise would be required.
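To illustrate the difference between the two, here is a minimal sketch on synthetic data. The spike magnitude, the exponential drift, the failure threshold, and all function names are assumptions I picked for the example; in particular, the exponential drift is a toy stand-in for a real physical degradation model, not a model of the actual compressor.

```python
import math
import random

def inject_spikes(values, n_spikes, magnitude, seed=0):
    """Approach 1: add out-of-range spikes at random positions.
    Returns the edited signal and a 0/1 anomaly label per sample."""
    rng = random.Random(seed)
    out = list(values)
    labels = [0] * len(out)
    for i in rng.sample(range(len(out)), n_spikes):
        out[i] += magnitude
        labels[i] = 1
    return out, labels

def simulate_degradation(n_steps, onset, baseline, growth_rate, noise_std, seed=0):
    """Approach 2 (toy model): a noisy baseline with an exponentially
    growing drift that starts at step `onset`."""
    rng = random.Random(seed)
    series = []
    for t in range(n_steps):
        drift = 0.0
        if t >= onset:
            drift = baseline * (math.exp(growth_rate * (t - onset)) - 1.0)
        series.append(baseline + drift + rng.gauss(0.0, noise_std))
    return series

def steps_to_failure(series, failure_threshold):
    """Steps remaining until the first threshold crossing, per sample
    (0 from the crossing onward), or None if the threshold is never reached."""
    failure = next((t for t, v in enumerate(series) if v >= failure_threshold), None)
    if failure is None:
        return None
    return [max(failure - t, 0) for t in range(len(series))]

# Approach 1: labelled point anomalies on a synthetic temperature signal.
rng = random.Random(42)
temperature = [60.0 + rng.gauss(0.0, 0.5) for _ in range(500)]
augmented, labels = inject_spikes(temperature, n_spikes=5, magnitude=15.0)

# Approach 2: a degrading vibration signal with a remaining-life target.
vibration = simulate_degradation(
    n_steps=1000, onset=600, baseline=20.0, growth_rate=0.005, noise_std=0.5
)
rul = steps_to_failure(vibration, failure_threshold=60.0)
```

The first approach only yields per-sample anomaly labels, while the second yields a target that changes over time (steps remaining until the threshold is crossed), which is what a predictive maintenance model would need to learn.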
Does that make sense to you?
I totally agree, especially with the second option: a domain-boosted computational science approach is near and dear to my heart. It also provides newer insight, and with it we can augment the data toward more actionable insight and create a more fact-based culture around industrial data, with not just real-time perspectives but also a view of how things behave over time. That's also the power of using CDF with industrial data: it democratizes access to industrial data through open-source SDKs and easy-to-use open APIs, giving a new creative domain in which to programmatically wrangle industrial data 😍