What if your car could warn you before something goes wrong? In this project, we used real car sensor data and machine learning to predict engine problems early. We cleaned messy data, tested different models, and built a simple mobile app that gives drivers clear and helpful alerts about their car’s health.
Every robust system starts with thoughtful design. In our project, that design took the form of an object-oriented model that mirrors how data flows in real vehicles. Before diving into machine learning, we focused on building a structured pipeline that could handle inputs, make predictions, and respond to user feedback.
We designed our object model around several core classes: Engine, Car, Sensor, and Predictor. The Sensor class was responsible for reading key engine parameters such as RPM, coolant temperature and pressure, oil temperature and pressure, and fuel pressure. These readings served as the input for our prediction system.
The Predictor class held the trained machine learning model. It accepted input from the sensors (in the form of VehicleData instances from the Dataset), handled internal pre-processing, and returned a health prediction. This modular approach ensured that if we later updated the pre-processing method or replaced the model, the rest of the system would continue working without changes.
We also modeled interactions with the car owner. A maintenance scheduler monitored prediction results and sent notifications to the user. It allowed users to accept or reject suggested maintenance times and updated the app interface based on their choices. This connection between backend logic and user interface made the system responsive, flexible, and user-friendly.
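The object model described above can be sketched in Python. The attribute names, method signatures, and the `as_features`/`predict_health` helpers below are illustrative assumptions, not the project's actual code; only the class names (`Sensor`, `Predictor`, `VehicleData`) come from the design itself:

```python
from dataclasses import dataclass

@dataclass
class VehicleData:
    """One snapshot of engine readings (the features fed to the model)."""
    rpm: float
    coolant_temp: float
    coolant_pressure: float
    oil_temp: float
    oil_pressure: float
    fuel_pressure: float

    def as_features(self):
        return [self.rpm, self.coolant_temp, self.coolant_pressure,
                self.oil_temp, self.oil_pressure, self.fuel_pressure]

class Sensor:
    """Reads engine parameters; backed by a static reading here for illustration."""
    def __init__(self, reading: VehicleData):
        self._reading = reading

    def read(self) -> VehicleData:
        return self._reading

class Predictor:
    """Wraps the trained model plus its pre-processing step.

    Because callers only see predict_health(), swapping the model or the
    scaler later leaves the rest of the system unchanged."""
    def __init__(self, model, scaler=None):
        self.model = model
        self.scaler = scaler

    def predict_health(self, data: VehicleData) -> int:
        features = [data.as_features()]
        if self.scaler is not None:
            features = self.scaler.transform(features)
        return int(self.model.predict(features)[0])
```

The key design choice is that `Predictor` owns both the model and its pre-processing, so the two can never drift out of sync.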
Real-world data is rarely clean, and our automotive dataset was no exception. With over 19,000 engine samples, we encountered the usual suspects: outliers, skewed distributions, and uneven scales. Left untreated, these issues would have seriously undermined the performance of any predictive model.
To tackle this, we began with outlier analysis using the Interquartile Range (IQR) method. For each feature (RPM, coolant temperature, and so on), we computed the 25th (Q1) and 75th (Q3) percentiles. Any value falling below Q1 − 1.5×IQR or above Q3 + 1.5×IQR was flagged as an outlier. These extreme values were often the result of sensor errors or unrealistic operating conditions and were subsequently removed from the dataset.
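The IQR filter can be written as a small pandas helper. This is a minimal sketch of the technique, not the project's exact code; the function name and the toy RPM values are made up for illustration:

```python
import pandas as pd

def remove_outliers_iqr(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Drop rows where any listed column falls outside
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        mask &= df[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[mask]

# Toy example with one feature: the extreme RPM reading is dropped.
data = pd.DataFrame({"rpm": [800, 850, 900, 875, 825, 9000]})
clean = remove_outliers_iqr(data, ["rpm"])
```

Flagging across all columns with a single mask means a row is removed if it is extreme in *any* sensor, which is the conservative choice for suspected sensor errors.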
But even after cleaning, we faced another challenge: feature scale imbalance. Some sensor readings spanned thousands of units (like RPM), while others were constrained to smaller ranges (like coolant temperature). To ensure no single feature dominated the model's learning process, we applied MinMax scaling using the MinMaxScaler from scikit-learn. This transformed all features to a 0–1 range, giving each equal weight during training.
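In scikit-learn this is a two-line transform. The tiny two-feature array below is invented purely to show the effect; the important detail is that the scaler is fitted once and then reused on new data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales: RPM (thousands) vs. coolant temp (~80-100).
X_train = np.array([[1000.0,  80.0],
                    [3000.0,  90.0],
                    [5000.0, 100.0]])

scaler = MinMaxScaler()              # maps each column to [0, 1] independently
X_scaled = scaler.fit_transform(X_train)

# Reuse the *fitted* scaler on unseen data so train and test share one scale.
X_new = scaler.transform([[4000.0, 95.0]])
```

Fitting only on training data and reusing the scaler at prediction time avoids leaking test-set statistics into the model.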
This two-step pre-processing strategy—IQR-based outlier removal followed by MinMax normalization—formed the bedrock of our data pipeline. It leveled the playing field, allowing our models to detect meaningful patterns without being skewed by noisy or overpowered inputs.
The dataset is available on Kaggle.
With clean data in hand, we turned our focus to modeling. But we didn’t want to rely on a single model. Instead, we took a comparative approach, testing six different machine learning classifiers, each with distinct strengths and weaknesses.
We began with Logistic Regression, our baseline model. It was fast, interpretable, and surprisingly competitive, achieving a training accuracy of 66.74% and a testing accuracy of 65.91%. Its simplicity made it a great diagnostic tool for spotting linearly separable relationships.
Next came the Decision Tree Classifier, which learned nonlinear patterns by recursively splitting features. It scored slightly better in training (68.95%) but dipped in testing (64.59%), a sign of mild overfitting. To address this, we tried the Random Forest Classifier, an ensemble of decision trees that averaged their outputs to reduce variance. This model achieved the highest training accuracy (79.33%) but saw only modest gains in testing (65.64%).
We also tested the k-Nearest Neighbors (k-NN) algorithm. While intuitive and simple, it struggled with the randomness in our data—overfitting to training samples (75.28%) but generalizing poorly (62.21%).
Interestingly, Gaussian Naive Bayes became the strongest model in our project. It assumed feature independence (which was not strictly true), but its probabilistic nature and low complexity gave it consistent performance: 67.09% training accuracy and 66.11% testing accuracy.
Finally, the Support Vector Machine (SVM) showed disappointing results, peaking at 64.88% training and 60.77% testing. Its sensitivity to hyperparameters like C and gamma, coupled with the high dimensionality of our feature space, likely hindered its performance.
In summary, every model taught us something about the data, about our assumptions, and about the tradeoff between complexity and generalization.
| Model | Training Accuracy | Testing Accuracy |
|---|---|---|
| Logistic Regression | 66.74% | 65.91% |
| Decision Tree Classifier | 68.95% | 64.59% |
| Random Forest Classifier | 79.33% | 65.64% |
| k-Nearest Neighbors | 75.28% | 62.21% |
| Gaussian Naive Bayes | 67.09% | 66.11% |
| Support Vector Machine | 64.88% | 60.77% |
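A comparison loop along these lines reproduces the structure of the table above. The sketch below uses a synthetic six-feature dataset as a stand-in for the cleaned engine data, so the accuracy numbers it produces will not match the reported ones:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the cleaned, scaled engine dataset (6 sensor features).
X, y = make_classification(n_samples=2000, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "k-Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "SVM": SVC(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train),
                     model.score(X_test, y_test))
```

Recording both training and testing accuracy side by side is what exposes overfitting patterns like the Random Forest's large train/test gap.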
While our modeling pipeline was sound, the results left room for improvement. Our top-performing model (Gaussian Naive Bayes) topped out at 66.11% testing accuracy: respectable, but far from industry-grade. Understanding this plateau became a project of its own.
One issue was data separability. Even after scaling, the difference between “healthy” and “faulty” classes was often subtle. Many samples had overlapping sensor profiles, suggesting either labeling inconsistencies or intrinsic ambiguity in real world engine faults. This made it hard for any classifier to draw clean boundaries.
Another challenge was feature dependency. Most models assumed feature independence or simple additive relationships, which may not capture the true causal mechanisms of engine behavior. Sequential models like LSTMs might perform better in future iterations by modeling temporal patterns.
Finally, the dataset itself was likely a limiting factor. Without access to richer sensor modalities (e.g., vibration, audio, or temporal sequences), our models were predicting health from snapshots rather than trends. This is a tough task, even for neural nets.
After building the prediction model, we needed to make it easy to use. So, we created a simple and clear user interface using Flet, a Python tool for making apps that work on Android, iOS, and desktop. This let us write one codebase that runs everywhere, matching our goal of keeping the app clean and easy to understand.
Car owners don’t need complex data and logs. They just need quick, clear answers. Our app delivers this with three main pages.
We tested the app and showed a full demo in class.