Car Mileage Prediction Using Simple Linear Regression: A Step-by-Step Guide

This project demonstrates predicting a car’s miles per gallon (MPG) using simple linear regression. By analyzing the relationship between weight and MPG, the model offers a beginner-friendly introduction to regression analysis with Python, pandas, and scikit-learn, providing insights for those exploring predictive modeling.

Keywords: car mileage prediction, simple linear regression, data science project, MPG prediction, regression model tutorial

Introduction: Why Predict Car Mileage?

Fuel efficiency (measured in miles per gallon, or MPG) is a critical factor for both car buyers and manufacturers. In this project, we’ll predict a vehicle’s MPG using simple linear regression, a foundational machine learning algorithm. This guide is designed for data science professionals and beginners looking to understand regression modeling in practice.

What is Simple Linear Regression?

Simple linear regression models the relationship between one independent variable (e.g., car weight) and a dependent variable (MPG) using a straight line. The equation:

MPG = slope × weight + intercept

Our goal is to find the best-fit line, the one that minimizes the sum of squared prediction errors.
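To make the equation concrete, here is a minimal sketch of how the slope and intercept can be computed by hand with the closed-form least-squares formulas. The five data points are made up for illustration and do not come from the article's dataset.

```python
import numpy as np

# Hypothetical toy data (not from CarData.csv): weight in lb and MPG
weight = np.array([2000.0, 2500.0, 3000.0, 3500.0, 4000.0])
mpg = np.array([34.0, 29.0, 25.0, 20.0, 16.0])

# Closed-form least-squares estimates for a single predictor:
# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
w_mean, m_mean = weight.mean(), mpg.mean()
slope = np.sum((weight - w_mean) * (mpg - m_mean)) / np.sum((weight - w_mean) ** 2)

# The best-fit line always passes through the point (x_mean, y_mean)
intercept = m_mean - slope * w_mean

print(f"MPG = {slope:.4f} * weight + {intercept:.2f}")
```

scikit-learn's LinearRegression, used later in this guide, arrives at the same line for a single feature; the manual version only shows what "best fit" means.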

Step-by-Step Implementation

1. Data Loading and Initial Exploration

We start by importing libraries and loading the dataset:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load dataset
df = pd.read_csv('CarData.csv')

Key observations:

- The dataset contains 398 entries with features like weight, cylinders, and horsepower.
- df.info() reveals 6 missing values in the horsepower column. Because we drop that column in the next step, they need no special handling here.
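A quick way to confirm which columns have gaps is to count missing values per column. The sketch below uses a small made-up DataFrame as a stand-in for CarData.csv; the column names match the article, but the values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for CarData.csv (values are made up)
sample = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0, 21.0],
    "weight": [3504, 3693, 2046, 2875],
    "horsepower": [130.0, 165.0, np.nan, np.nan],
})

# Number of rows and columns
print(sample.shape)

# Count of missing values in each column
missing = sample.isna().sum()
print(missing)
```

On the real dataset, the same isna().sum() call should report the missing horsepower entries that df.info() hints at.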

2. Data Preprocessing

To focus on the weight-MPG relationship, we retain only relevant columns:

# Drop every column except 'weight' and 'mpg'
df = df.drop(columns=['cylinders', 'displacement', 'horsepower',
                      'acceleration', 'model_year', 'origin', 'name'])

Why this step matters: Removing the columns we don't use keeps the example focused on a single predictor, which makes the model easier to follow for beginners.
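An equivalent approach, shown here as a sketch, is to select the two columns we want instead of listing the ones to drop. This also keeps working if the CSV gains extra columns. The DataFrame below is a made-up stand-in, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset (values are made up)
sample = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0],
    "cylinders": [8, 8, 4],
    "weight": [3504, 3693, 2046],
    "name": ["car a", "car b", "car c"],
})

# Keep only the predictor and the target; .copy() avoids modifying a view
subset = sample[["weight", "mpg"]].copy()
print(subset.columns.tolist())
```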

3. Visualizing the Relationship

A scatter plot reveals the negative correlation between weight and MPG:

plt.scatter(x=df['weight'], y=df['mpg'], c="#9ACBD0")
plt.xlabel('Weight')
plt.ylabel('MPG')
plt.show()

Insight: Heavier cars generally have lower fuel efficiency.
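The visual impression can be backed up with a number. As a small sketch, the Pearson correlation coefficient between the two columns should come out clearly negative. The values below are made up for illustration, so the exact coefficient will differ on the real dataset.

```python
import pandas as pd

# Hypothetical toy data (not from CarData.csv)
sample = pd.DataFrame({
    "weight": [2000, 2500, 3000, 3500, 4000],
    "mpg": [34.0, 29.0, 25.0, 20.0, 16.0],
})

# Pearson correlation: -1 is a perfect negative linear relationship
r = sample["weight"].corr(sample["mpg"])
print(f"Correlation between weight and MPG: {r:.3f}")
```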

4. Splitting Data for Training

We divide the data into training (75%) and testing (25%) sets:

x = df[['weight']]  # Independent variable
y = df['mpg']       # Dependent variable
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=66)
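It can help to check the sizes of the resulting sets. The sketch below runs the same split on synthetic data with 398 rows (the row count reported above); the weights, MPG values, and the coefficients used to generate them are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as the article's dataset
rng = np.random.default_rng(0)
n_rows = 398
weight = rng.uniform(1600, 5200, size=n_rows)
demo = pd.DataFrame({
    "weight": weight,
    # Made-up linear trend plus noise, only to have something to split
    "mpg": 46.0 - 0.0076 * weight + rng.normal(0, 4, size=n_rows),
})

x = demo[["weight"]]  # 2-D feature matrix, as scikit-learn expects
y = demo["mpg"]       # 1-D target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=66)

print(x_train.shape, x_test.shape)
```

Fixing random_state makes the split reproducible, so the same rows land in the test set on every run.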

5. Model Training and Evaluation

Training the Model

model = LinearRegression()
model.fit(x_train, y_train)
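After fitting, the learned slope and intercept are available as attributes on the model. The sketch below fits on synthetic, noise-free data generated from a known line, so the recovered values can be checked; the line itself (46.0 and -0.0076) is a made-up assumption, not a result from the article.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data from a known line (values are made up)
demo = pd.DataFrame({"weight": np.linspace(1600, 5200, 50)})
demo["mpg"] = 46.0 - 0.0076 * demo["weight"]

model = LinearRegression()
model.fit(demo[["weight"]], demo["mpg"])

# coef_ holds one slope per feature; intercept_ is the constant term
slope = model.coef_[0]
intercept = model.intercept_
print(f"MPG = {slope:.4f} * weight + {intercept:.2f}")
```

On the real training set, the same two attributes give the slope and intercept of the equation introduced earlier.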

Making Predictions

y_pred = model.predict(x_test)
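The same predict call works for cars that are not in the dataset at all. As a sketch, the example below fits a model on synthetic data (a made-up line, not the article's results) and predicts MPG for two hypothetical weights. Passing a DataFrame with the same column name used in training keeps the input format consistent.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic training data from a made-up line
train = pd.DataFrame({"weight": np.linspace(1600, 5200, 50)})
train["mpg"] = 46.0 - 0.0076 * train["weight"]

model = LinearRegression().fit(train[["weight"]], train["mpg"])

# Hypothetical new cars, described only by their weight
new_cars = pd.DataFrame({"weight": [2500, 3500]})
predicted_mpg = model.predict(new_cars)

for w, p in zip(new_cars["weight"], predicted_mpg):
    print(f"weight={w} -> predicted MPG={p:.1f}")
```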

Evaluating Accuracy

We use the R-squared (R²) score to measure how well the regression line fits the data:

r2 = r2_score(y_test, y_pred)
print(f"R2 = {r2:.1f}")  # Output: R2 = 0.7

Interpretation: An R² of 0.7 means that about 70% of the variance in MPG in the test set is explained by weight, a decent fit for a model with a single predictor.
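To see where the score comes from, R² can be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses a few made-up actual and predicted values and compares the manual result with r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted MPG values (made up for illustration)
y_true = np.array([30.0, 25.0, 20.0, 15.0])
y_hat = np.array([28.0, 26.0, 19.0, 17.0])

# Residual sum of squares: error left over after the model's predictions
ss_res = np.sum((y_true - y_hat) ** 2)

# Total sum of squares: error of always predicting the mean
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
print(f"manual R2 = {r2_manual:.2f}, sklearn R2 = {r2_score(y_true, y_hat):.2f}")
```

Read this way, R² tells you how much better the regression line does than a baseline that predicts the average MPG for every car.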

6. Visualizing the Regression Line

plt.scatter(x_test, y_test, color='#9ACBD0', label='Actual Data')
plt.plot(x_test, y_pred, color='#2973B2', label='Regression Line', linewidth=2)
plt.xlabel('Weight')
plt.ylabel('MPG')
plt.legend()
plt.title('Simple Linear Regression: Weight vs. MPG')
plt.show()

Conclusion

Results: Our model confirms a negative relationship between car weight and MPG, with weight alone explaining about 70% of the variance in MPG (R² ≈ 0.7).
Limitations: Simple linear regression ignores other factors (e.g., engine power). For better accuracy, consider multiple linear regression.
Next Steps:
- Experiment with polynomial regression for non-linear relationships.
- Include features like horsepower or origin for a richer model.
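As a starting point for the first suggestion, here is a rough sketch of polynomial regression using a scikit-learn pipeline. The data is synthetic and deliberately curved (a made-up quadratic, with weight in thousands of pounds), so the numbers say nothing about the real dataset; the point is only to show how the pieces fit together.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved relationship (made up); weight in thousands of lb
weight = np.linspace(1.6, 5.2, 80).reshape(-1, 1)
mpg = 60.0 - 16.0 * weight[:, 0] + 1.5 * weight[:, 0] ** 2

# Plain straight-line fit
linear_model = LinearRegression().fit(weight, mpg)

# Degree-2 polynomial: expand the feature, then fit a linear model on it
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(weight, mpg)

# score() returns R^2 on the data passed in
linear_r2 = linear_model.score(weight, mpg)
poly_r2 = poly_model.score(weight, mpg)
print(f"linear R2 = {linear_r2:.3f}, polynomial R2 = {poly_r2:.3f}")
```

On real data, compare the two models on the held-out test set rather than the training data, since a higher-degree polynomial will always fit the training points at least as well.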

GitHub Repo

Access the GitHub repository and run the Jupyter notebook to explore the code interactively.

Anshuman Bal
