
Car Mileage Prediction Using Simple Linear Regression: A Step-by-Step Guide

This project demonstrates predicting a car’s miles per gallon (MPG) using simple linear regression. By analyzing the relationship between weight and MPG, the model offers a beginner-friendly introduction to regression analysis with Python, pandas, and scikit-learn, providing insights for those exploring predictive modeling.

Keywords: car mileage prediction, simple linear regression, data science project, MPG prediction, regression model tutorial


Introduction: Why Predict Car Mileage?


Fuel efficiency (measured in miles per gallon, or MPG) is a critical factor for both car buyers and manufacturers. In this project, we’ll predict a vehicle’s MPG using simple linear regression, a foundational machine learning algorithm. This guide is designed for data science professionals and beginners looking to understand regression modeling in practice.



What is Simple Linear Regression?

Simple linear regression models the relationship between one independent variable (e.g., car weight) and a dependent variable (MPG) using a straight line. The equation:

MPG = slope × weight + intercept

Our goal is to find the best-fit line, meaning the slope and intercept that minimize the sum of squared prediction errors.
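To make the equation concrete, here is a minimal sketch of how the best-fit slope and intercept can be computed by hand with the ordinary least-squares formulas. The five data points and the helper name fit_line are invented for illustration; they are not taken from CarData.csv.

```python
# Toy data, invented for illustration only (not rows from CarData.csv).
weights = [2000.0, 2500.0, 3000.0, 3500.0, 4000.0]  # pounds
mpgs = [34.0, 30.0, 25.0, 21.0, 15.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(weights, mpgs)
print(f"MPG = {slope:.4f} * weight + {intercept:.2f}")
```

The negative slope matches the intuition that each extra pound of weight costs a little fuel efficiency. scikit-learn's LinearRegression solves the same least-squares problem, so it should arrive at the same line for a single feature.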



Step-by-Step Implementation



1. Data Loading and Initial Exploration

We start by importing libraries and loading the dataset:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load dataset
df = pd.read_csv('CarData.csv')

Key observations:


  • The dataset contains 398 entries with features like weight, cylinders, and horsepower.

  • df.info() reveals 6 missing values in the horsepower column. Because this model uses only weight, we drop that column in the next step instead of imputing it.
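One possible way to count missing values per column directly, rather than reading them off df.info(), is isna().sum(). The small frame below is a stand-in with invented values, used only so the sketch runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame; the values are invented, not rows from CarData.csv.
sample = pd.DataFrame({
    "mpg": [18.0, 25.0, 31.0, 22.0],
    "weight": [3504, 2600, 2100, 2900],
    "horsepower": [130.0, np.nan, 70.0, np.nan],
})

# isna() marks each missing cell as True; sum() counts them per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Run against the real DataFrame, the same call would be expected to report the 6 missing horsepower values mentioned above.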



2. Data Preprocessing

To focus on the weight-MPG relationship, we retain only relevant columns:


df = df.drop(columns=['cylinders', 'displacement', 'horsepower',
                      'acceleration', 'model_year', 'origin', 'name'])

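Why this step matters: Several of the dropped features also relate to MPG, but keeping a single predictor is what makes this a simple linear regression and keeps the example easy to follow for beginners.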
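An alternative to listing every column to drop is to select only the columns to keep. This is a sketch on an invented three-row frame (the name cars and its values are assumptions for illustration); the result is the same two-column table.

```python
import pandas as pd

# Invented stand-in data with a few of the columns named in this guide.
cars = pd.DataFrame({
    "mpg": [18.0, 25.0, 31.0],
    "cylinders": [8, 4, 4],
    "horsepower": [130.0, 95.0, 70.0],
    "weight": [3504, 2600, 2100],
})

# Keep only the predictor and the target; copy() avoids modifying a view.
slim = cars[["weight", "mpg"]].copy()
print(slim.columns.tolist())
```

Selecting by name keeps working if the CSV later gains extra columns, whereas a drop list would need updating.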



3. Visualizing the Relationship

A scatter plot reveals the negative correlation between weight and MPG:

plt.scatter(x=df['weight'], y=df['mpg'], c="#9ACBD0")
plt.xlabel('Weight')
plt.ylabel('MPG')
plt.show()

Scatter Plot: Weight vs MPG

Insight: Heavier cars generally have lower fuel efficiency.
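The visual impression can be backed with a number. As a sketch, the Pearson correlation coefficient between the two columns can be computed with pandas; the toy values below are invented, so the result only illustrates the call, not the real dataset.

```python
import pandas as pd

# Invented toy values; heavier cars are given lower MPG on purpose.
toy = pd.DataFrame({
    "weight": [2000, 2500, 3000, 3500, 4000],
    "mpg": [34.0, 30.0, 25.0, 21.0, 15.0],
})

# Series.corr defaults to the Pearson correlation coefficient.
r = toy["weight"].corr(toy["mpg"])
print(f"Pearson r = {r:.3f}")
```

A value close to -1 indicates a strong negative linear relationship; a value near 0 would suggest a straight line is a poor description of the data.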



4. Splitting Data for Training

We divide the data into training (75%) and testing (25%) sets:

x = df[['weight']]  # Independent variable
y = df['mpg']       # Dependent variable
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=66)
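To see what the 75/25 split means in row counts, here is a small sketch using placeholder arrays with 398 rows, the size reported for the dataset. The feature and target values are dummies; only the lengths matter.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 398  # row count reported for the dataset in this guide

# Placeholder arrays: the values are dummies, only the shapes matter here.
x_demo = np.arange(n_rows).reshape(-1, 1)
y_demo = np.arange(n_rows)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=66)

print(len(x_tr), len(x_te))
```

Because 25% of 398 is not a whole number, scikit-learn rounds the test set up, so the split is not exactly 75/25. Fixing random_state makes the same rows land in each set on every run.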


5. Model Training and Evaluation

Training the Model
model = LinearRegression()
model.fit(x_train, y_train)

Making Predictions
y_pred = model.predict(x_test)
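After fitting, the learned line can be read back from the model through its coef_ and intercept_ attributes. The sketch below fits a separate demo model on invented points that lie exactly on MPG = -0.01 × weight + 50, so the recovered values are easy to check; the real model's numbers will differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented points lying exactly on MPG = -0.01 * weight + 50.
weight = np.array([[2000.0], [2500.0], [3000.0], [3500.0], [4000.0]])
mpg = -0.01 * weight.ravel() + 50.0

demo_model = LinearRegression().fit(weight, mpg)

slope = demo_model.coef_[0]         # one coefficient per feature
intercept = demo_model.intercept_   # value of the line at weight = 0
pred = demo_model.predict(np.array([[3200.0]]))[0]

print(f"slope={slope:.4f}, intercept={intercept:.2f}, MPG at 3200 lb={pred:.1f}")
```

Inspecting the slope is a quick sanity check: for weight versus MPG it should be negative.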

Evaluating Accuracy

We use the R-squared (R²) score to measure how well the regression line fits the data:

r2 = r2_score(y_test, y_pred)
print(f"R2 = {r2:.1f}")  # Output: R2 = 0.7

Interpretation: An R² of 0.7 means the model explains roughly 70% of the variance in MPG on the held-out test set using weight alone, a decent fit for a single-predictor model.
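For readers who want to see where the score comes from, R² can be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. This sketch uses four invented actual/predicted pairs and compares the manual result with r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

# Invented actual and predicted MPG values, for illustration only.
y_true = np.array([30.0, 25.0, 20.0, 15.0])
y_hat = np.array([28.0, 26.0, 19.0, 17.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # error left after the model
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of always predicting the mean
r2_manual = 1 - ss_res / ss_tot

print(f"manual R2 = {r2_manual:.2f}, sklearn R2 = {r2_score(y_true, y_hat):.2f}")
```

Read this way, R² compares the model against a baseline that always predicts the mean MPG: 1.0 is a perfect fit, and 0 is no better than that baseline.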



6. Visualizing the Regression Line

plt.scatter(x_test, y_test, color='#9ACBD0', label='Actual Data')
plt.plot(x_test, y_pred, color='#2973B2', label='Regression Line', linewidth=2)
plt.xlabel('Weight')
plt.ylabel('MPG')
plt.legend()
plt.title('Simple Linear Regression: Weight vs. MPG')
plt.show()


Regression Line (Best Fit)

Conclusion


  1. Results: Weight alone explains about 70% of the variance in MPG on the test set (R² ≈ 0.7), which points to a fairly strong negative relationship between car weight and MPG.

  2. Limitations: Simple linear regression ignores other factors (e.g., engine power). For better accuracy, consider multiple linear regression.

  3. Next Steps:

    • Experiment with polynomial regression for non-linear relationships.

    • Include features like horsepower or origin for a richer model.
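As a rough sketch of the polynomial regression idea, a degree-2 model can be built by chaining PolynomialFeatures with LinearRegression in a pipeline. The data below is synthetic, generated from an assumed curved relationship with weight expressed in thousands of pounds; it is not the CarData.csv data, and the scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data from an assumed curve plus noise (illustration only).
rng = np.random.default_rng(0)
weight_k = np.linspace(1.6, 5.0, 60).reshape(-1, 1)  # thousands of pounds
w = weight_k.ravel()
mpg = 60.0 - 18.0 * w + 1.8 * w ** 2 + rng.normal(0.0, 1.0, size=w.size)

# Straight-line model versus a quadratic model on the same points.
linear = LinearRegression().fit(weight_k, mpg)
quadratic = make_pipeline(
    PolynomialFeatures(degree=2), LinearRegression()
).fit(weight_k, mpg)

linear_r2 = linear.score(weight_k, mpg)
quad_r2 = quadratic.score(weight_k, mpg)
print(f"linear R2 = {linear_r2:.3f}, quadratic R2 = {quad_r2:.3f}")
```

On training data the quadratic model can never score lower than the linear one, since it contains the straight line as a special case. A fair comparison should therefore use a held-out test set, as in the train/test split earlier in this guide.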



GitHub Repo


Access the GitHub repository and run the Jupyter notebook to explore the code interactively.


Contact Information

Anshuman Bal

anshbal06@gmail.com

Bhubaneswar 

Odisha, India 

anshumanbal.com

+91 943-856-0707

©2025 by Anshuman Bal. 
