What if your car could warn you before something goes wrong? In this project, we used real car sensor data and machine learning to predict engine problems early. We cleaned messy data, tested different models, and built a simple mobile app that gives drivers clear and helpful alerts about their car’s health.
Every robust system starts with thoughtful design. In our project, that design took the form of an object-oriented model that mirrors how data flows in real vehicles. Before diving into machine learning, we focused on building a structured pipeline that could handle inputs, make predictions, and respond to user feedback.
We designed our object model around several core classes: Engine, Car, Sensor, and Predictor. The Sensor class was responsible for reading key engine parameters such as RPM, coolant temperature and pressure, oil temperature and pressure, and fuel pressure. These readings served as the input for our prediction system.
The Predictor class held the trained machine learning model. It accepted input from the sensors (in the form of VehicleData instances from the Dataset), handled internal pre-processing, and returned a health prediction. This modular approach ensured that if we later updated the pre-processing method or replaced the model, the rest of the system would continue working without changes.
We also modeled interactions with the car owner. A maintenance scheduler monitored prediction results and sent notifications to the user. It allowed users to accept or reject suggested maintenance times and updated the app interface based on their choices. This connection between backend logic and user interface made the system responsive, flexible, and user-friendly.
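The object model described above can be sketched in Python. The attribute names, method signatures, and the `as_features`/`predict_health` helpers below are illustrative assumptions, not the project's actual code; only the class names (`Sensor`, `Predictor`, `VehicleData`) come from the design itself:

```python
from dataclasses import dataclass

@dataclass
class VehicleData:
    """One snapshot of engine readings (the features fed to the model)."""
    rpm: float
    coolant_temp: float
    coolant_pressure: float
    oil_temp: float
    oil_pressure: float
    fuel_pressure: float

    def as_features(self):
        return [self.rpm, self.coolant_temp, self.coolant_pressure,
                self.oil_temp, self.oil_pressure, self.fuel_pressure]

class Sensor:
    """Reads engine parameters; backed by a static reading here for illustration."""
    def __init__(self, reading: VehicleData):
        self._reading = reading

    def read(self) -> VehicleData:
        return self._reading

class Predictor:
    """Wraps the trained model plus its pre-processing step.

    Because callers only see predict_health(), swapping the model or the
    scaler later leaves the rest of the system unchanged."""
    def __init__(self, model, scaler=None):
        self.model = model
        self.scaler = scaler

    def predict_health(self, data: VehicleData) -> int:
        features = [data.as_features()]
        if self.scaler is not None:
            features = self.scaler.transform(features)
        return int(self.model.predict(features)[0])
```

The key design choice is that `Predictor` owns both the model and its pre-processing, so the two can never drift out of sync.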
Real-world data is rarely clean, and our automotive dataset was no exception. With over 19,000 engine samples, we encountered the usual suspects: outliers, skewed distributions, and uneven scales. Left untreated, these issues would have seriously undermined the performance of any predictive model.
To tackle this, we began with outlier analysis using the Interquartile Range (IQR) method. For each feature (RPM, coolant temperature, and so on), we computed the 25th (Q1) and 75th (Q3) percentiles. Any value falling below Q1 − 1.5×IQR or above Q3 + 1.5×IQR was flagged as an outlier. These extreme values were often the result of sensor errors or unrealistic operating conditions and were subsequently removed from the dataset.
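The IQR filter can be written as a small pandas helper. This is a minimal sketch of the technique, not the project's exact code; the function name and the toy RPM values are made up for illustration:

```python
import pandas as pd

def remove_outliers_iqr(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Drop rows where any listed column falls outside
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        mask &= df[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[mask]

# Toy example with one feature: the extreme RPM reading is dropped.
data = pd.DataFrame({"rpm": [800, 850, 900, 875, 825, 9000]})
clean = remove_outliers_iqr(data, ["rpm"])
```

Flagging across all columns with a single mask means a row is removed if it is extreme in *any* sensor, which is the conservative choice for suspected sensor errors.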
But even after cleaning, we faced another challenge: feature scale imbalance. Some sensor readings spanned thousands of units (like RPM), while others were constrained to smaller ranges (like coolant temperature). To ensure no single feature dominated the model's learning process, we applied MinMax scaling using the MinMaxScaler from scikit-learn. This transformed all features to a 0–1 range, giving each equal weight during training.
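In scikit-learn this is a two-line transform. The tiny two-feature array below is invented purely to show the effect; the important detail is that the scaler is fitted once and then reused on new data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales: RPM (thousands) vs. coolant temp (~80-100).
X_train = np.array([[1000.0,  80.0],
                    [3000.0,  90.0],
                    [5000.0, 100.0]])

scaler = MinMaxScaler()              # maps each column to [0, 1] independently
X_scaled = scaler.fit_transform(X_train)

# Reuse the *fitted* scaler on unseen data so train and test share one scale.
X_new = scaler.transform([[4000.0, 95.0]])
```

Fitting only on training data and reusing the scaler at prediction time avoids leaking test-set statistics into the model.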
This two-step pre-processing strategy—IQR-based outlier removal followed by MinMax normalization—formed the bedrock of our data pipeline. It leveled the playing field, allowing our models to detect meaningful patterns without being skewed by noisy or overpowered inputs.
The dataset is available on Kaggle.
With clean data in hand, we turned our focus to modeling. But we didn’t want to rely on a single model. Instead, we took a comparative approach, testing six different machine learning classifiers, each with distinct strengths and weaknesses.
We began with Logistic Regression, our baseline model. It was fast, interpretable, and surprisingly competitive, achieving a training accuracy of 66.74% and a testing accuracy of 65.91%. Its simplicity made it a great diagnostic tool for spotting linearly separable relationships.
Next came the Decision Tree Classifier, which learned nonlinear patterns by recursively splitting features. It scored slightly better in training (68.95%) but dipped in testing (64.59%), a sign of mild overfitting. To address this, we tried the Random Forest Classifier, an ensemble of decision trees that averaged their outputs to reduce variance. This model achieved the highest training accuracy (79.33%) but saw only modest gains in testing (65.64%).
We also tested the k-Nearest Neighbors (k-NN) algorithm. While intuitive and simple, it struggled with the randomness in our data—overfitting to training samples (75.28%) but generalizing poorly (62.21%).
Interestingly, Gaussian Naive Bayes became the strongest model in our project. It assumed feature independence (which was not strictly true), but its probabilistic nature and low complexity gave it consistent performance: 67.09% training accuracy and 66.11% testing accuracy.
Finally, the Support Vector Machine (SVM) showed disappointing results, peaking at 64.88% training and 60.77% testing. Its sensitivity to hyperparameters like C and gamma, coupled with the high dimensionality of our feature space, likely hindered its performance.
In summary, every model taught us something about the data, about our assumptions, and about the tradeoff between complexity and generalization.
| Model | Training Accuracy | Testing Accuracy |
|---|---|---|
| Logistic Regression | 66.74% | 65.91% |
| Decision Tree Classifier | 68.95% | 64.59% |
| Random Forest Classifier | 79.33% | 65.64% |
| k-Nearest Neighbors | 75.28% | 62.21% |
| Gaussian Naive Bayes | 67.09% | 66.11% |
| Support Vector Machine | 64.88% | 60.77% |
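A comparison loop along these lines reproduces the structure of the table above. The sketch below uses a synthetic six-feature dataset as a stand-in for the cleaned engine data, so the accuracy numbers it produces will not match the reported ones:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the cleaned, scaled engine dataset (6 sensor features).
X, y = make_classification(n_samples=2000, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "k-Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "SVM": SVC(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train),
                     model.score(X_test, y_test))
```

Recording both training and testing accuracy side by side is what exposes overfitting patterns like the Random Forest's large train/test gap.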
While our modeling pipeline was sound, the results left room for improvement. Our top-performing model (Gaussian Naive Bayes) topped out at 66.11% testing accuracy: respectable, but far from industry-grade. Understanding this plateau became a project of its own.
One issue was data separability. Even after scaling, the difference between “healthy” and “faulty” classes was often subtle. Many samples had overlapping sensor profiles, suggesting either labeling inconsistencies or intrinsic ambiguity in real world engine faults. This made it hard for any classifier to draw clean boundaries.
Another challenge was feature dependency. Most models assumed feature independence or simple additive relationships, which may not capture the true causal mechanisms of engine behavior. Sequential models like LSTMs might perform better in future iterations by modeling temporal patterns.
Finally, the dataset itself was likely a limiting factor. Without access to richer sensor modalities (e.g., vibration, audio, or temporal sequences), our models were predicting health from snapshots rather than trends. This is a tough task, even for neural nets.
After building the prediction model, we needed to make it easy to use. So, we created a simple and clear user interface using Flet, a Python tool for making apps that work on Android, iOS, and desktop. This let us write one codebase that runs everywhere, matching our goal of keeping the app clean and easy to understand.
Car owners don’t need complex data and logs. They just need quick, clear answers. Our app delivers this with three main pages.
We tested the app and showed a full demo in class.