Tutorial 2
Today I am working through the second tutorial of the course. Its goal is to learn the basics of regression analysis.
Here is the outline of the tutorial:
- Simulate data
- Fit a linear regression model
- Evaluate the model
1. Simulate data
We will simulate data from a simple linear regression model in which a person's weight depends linearly on their height. The model is defined as:
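\[\text{weight} = 50 + 0.5 \times \text{height} + \epsilon\]
where $\epsilon \sim \mathcal{N}(0, 5^2)$ is a random noise term. In the simulation, height is drawn from $\mathcal{N}(160, 10^2)$ for $n = 100$ people.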
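As a worked example, the model's parameters fix what the simulated data should look like on average. This assumes, as in the simulation code, that height has mean 160 and standard deviation 10, that the noise has standard deviation 5, and that height and noise are independent:

\[
\begin{aligned}
E[\text{weight}] &= 50 + 0.5 \times 160 = 130,\\
\operatorname{Var}[\text{weight}] &= 0.5^2 \times 10^2 + 5^2 = 25 + 25 = 50,\\
\operatorname{Corr}(\text{height}, \text{weight}) &= \frac{0.5 \times 10}{\sqrt{50}} = \frac{1}{\sqrt{2}} \approx 0.707.
\end{aligned}
\]

So a person of average height (160) has an expected weight of 130, and weight has a standard deviation of $\sqrt{50} \approx 7.07$.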
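import numpy as np
import pandas as pd

# Fix the random seed so the results are reproducible
np.random.seed(0)
n = 100  # number of simulated people
# Heights: normal with mean 160 and standard deviation 10
height = np.random.normal(160, 10, n)
# Weights: the linear model above plus N(0, 5^2) noise
weight = 50 + 0.5 * height + np.random.normal(0, 5, n)
data = pd.DataFrame({'height': height, 'weight': weight})
# Show the first five rows
data.head()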
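A quick sanity check is to compare the sample means with the means the model implies: 160 for height and 50 + 0.5 × 160 = 130 for weight. The sketch below re-creates the simulated data so it runs on its own. The names `expected_means` and `summary` are illustrative only. With n = 100 the sample means will not match exactly, but they should fall within a few units.

```python
import numpy as np
import pandas as pd

# Re-create the tutorial's simulated data so this snippet runs on its own
np.random.seed(0)
n = 100
height = np.random.normal(160, 10, n)
weight = 50 + 0.5 * height + np.random.normal(0, 5, n)
data = pd.DataFrame({'height': height, 'weight': weight})

# Means implied by the model: E[height] = 160, E[weight] = 50 + 0.5 * 160 = 130
expected_means = pd.Series({'height': 160.0, 'weight': 50 + 0.5 * 160})

# Put the sample means next to the model-implied means (aligned on column name)
summary = pd.DataFrame({'sample_mean': data.mean(), 'model_mean': expected_means})
print(summary)
```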
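Next, I plot histograms of height and weight side by side.
import matplotlib.pyplot as plt

# One histogram per variable, 20 bins each
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, col in zip(axes, ['height', 'weight']):
    ax.hist(data[col], bins=20)
    ax.set_xlabel(col.capitalize())
    ax.set_ylabel('Frequency')
plt.show()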
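matplotlib's `hist` bins the data with `np.histogram` before drawing the bars. Calling `np.histogram` directly therefore shows the numbers behind each histogram: the count in each bin and the bin edges. This sketch re-creates the data so it is self-contained. The dictionary `hist_counts` is a hypothetical name used only for illustration.

```python
import numpy as np
import pandas as pd

# Re-create the tutorial's simulated data so the snippet is self-contained
np.random.seed(0)
n = 100
height = np.random.normal(160, 10, n)
weight = 50 + 0.5 * height + np.random.normal(0, 5, n)
data = pd.DataFrame({'height': height, 'weight': weight})

# bins=20 gives 20 equal-width bins spanning each column's min to max;
# np.histogram returns (counts per bin, bin edges)
hist_counts = {}
for col in ['height', 'weight']:
    counts, edges = np.histogram(data[col], bins=20)
    hist_counts[col] = (counts, edges)
    print(f"{col}: {counts.sum()} values in {len(counts)} bins "
          f"from {edges[0]:.1f} to {edges[-1]:.1f}")
```

There are always one more edge than bins, and the counts add up to the number of observations.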
Next, I plot weight against height as a scatter plot.
plt.scatter(data['height'], data['weight'])
plt.xlabel('Height')
plt.ylabel('Weight')
plt.show()
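The scatter plot shows a positive linear relationship, and the Pearson correlation coefficient puts a number on it. Under the simulation's parameters, the model implies a correlation of 0.5 × 10 / √(0.5² × 10² + 5²) = 1/√2 ≈ 0.707. This assumes height and noise are independent, as they are in the code. The sketch below compares that value with the sample correlation computed by pandas' `Series.corr`. The data are re-created so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Re-create the tutorial's simulated data so this snippet runs on its own
np.random.seed(0)
n = 100
height = np.random.normal(160, 10, n)
weight = 50 + 0.5 * height + np.random.normal(0, 5, n)
data = pd.DataFrame({'height': height, 'weight': weight})

# Sample Pearson correlation between height and weight
r_sample = data['height'].corr(data['weight'])

# Model-implied correlation: slope * sd(height) / sd(weight)
r_model = 0.5 * 10 / np.sqrt(0.5**2 * 10**2 + 5**2)

print(f"sample r = {r_sample:.3f}, model r = {r_model:.3f}")
```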
Finally, I save the data to a CSV file.
data.to_csv('data.csv', index=False)
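One way to confirm that the file was written correctly is to read it back with `pd.read_csv` and compare it with the original DataFrame. In this sketch the file goes to a temporary directory so it does not overwrite the tutorial's `data.csv`. `np.allclose` is used because a float written to text and parsed back may differ in the last digit.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Re-create the tutorial's simulated data so this snippet runs on its own
np.random.seed(0)
n = 100
height = np.random.normal(160, 10, n)
weight = 50 + 0.5 * height + np.random.normal(0, 5, n)
data = pd.DataFrame({'height': height, 'weight': weight})

# Write to a temporary directory, then read the file back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.csv')
    data.to_csv(path, index=False)  # index=False: no extra index column
    loaded = pd.read_csv(path)

print(loaded.head())
```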