Studying the Effect of Class Imbalance in Ocean Turbine Fault Data on Reliable State Detection


J. Duhaney, T. Koshgoftaar, and A. Napolitano – 11th International Conference on Machine Learning and Applications (ICMLA), December, 2012

Abstract

Class imbalance is prevalent in many real world datasets. It occurs when there are significantly fewer examples in one or more classes in a dataset compared to the number of instances in the remaining classes. When trained on highly imbalanced datasets, traditional machine learning techniques can often simply ignore the minority class(es) and label all instances as being of the majority class to maximize accuracy. This problem has been studied in many domains but there is little or no research related to the effect of class imbalance in fault data for condition monitoring of an ocean turbine. This study makes the first efforts in bridging that gap by providing insight into how class imbalance in vibration data can impact a learner’s ability to reliably identify changes in the ocean turbine’s operational state. To do so, we empirically evaluate the performances of three popular, but very different, machine learning algorithms when trained on four datasets with varying class distributions (one balanced and three imbalanced) to distinguish between a normal and an abnormal state. All data used in this study were collected from the testbed for an ocean turbine and were under sampled to simulate the different levels of imbalance. We find here, as in other domains, that the three learners seemed to suffer overall when trained on data with a highly skewed class distribution (with 0.1% examples in a faulty/abnormal state while the remaining 99.9% were captured in a normal operational state). It was noted, however, that the Logistic Regression and Decision Tree classifiers performed better when only 5% of the total number of examples were representative of an abnormal state (the remaining 95% therefore indicating normal operation) than they did when there was no imbalance present.

Link

Advertisements

Leave a comment

Filed under other Analytics

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s