You need to install Python, NumPy, Pandas, Matplotlib and Seaborn. For that, you can follow the instructions from 06-environment.md.
What's the version of Pandas that you installed?
You can get the version information using the __version__ field:
pd.__version__
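As a complete snippet, this assumes Pandas is imported under the usual pd alias:

```python
import pandas as pd

# __version__ holds the installed Pandas version as a string
print(pd.__version__)
```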
For this homework, we'll use the Laptops Price dataset.
You can download it with wget:
wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv
Or just open it with your browser and click "Save as...".
Now read it with Pandas.
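A minimal sketch of the loading step. With the real file you would call load_laptops("laptops.csv"), assuming the CSV sits in the working directory. To keep the snippet runnable without the download, it reads a hypothetical two-row sample from an in-memory string instead. The helper name load_laptops and the column names in the sample (Brand, RAM, Storage, Screen, Final Price) are illustrative assumptions, not taken from the real file.

```python
import io

import pandas as pd


def load_laptops(source="laptops.csv"):
    # pd.read_csv accepts a file path, a URL, or a file-like object
    return pd.read_csv(source)


# Hypothetical sample rows (not real data); the empty Screen field is read as NaN
sample_csv = io.StringIO(
    "Brand,RAM,Storage,Screen,Final Price\n"
    "Dell,8,512,15.6,999.0\n"
    "Asus,16,1024,,1299.0\n"
)
df = load_laptops(sample_csv)
print(df.shape)  # (2, 5)
```

Since read_csv also accepts URLs, passing the raw GitHub URL should work too, provided you have network access.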
How many records are in the dataset?
- 12
- 1000
- 2160
- 12160
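One way to count records is to count rows. Here is a sketch on a hypothetical toy frame (not the real dataset); count_records is an illustrative helper name:

```python
import pandas as pd


def count_records(df: pd.DataFrame) -> int:
    # Each row is one record, so len(df) equals df.shape[0]
    return len(df)


# Hypothetical toy frame, not the real dataset
toy = pd.DataFrame({"Brand": ["Dell", "Asus", "Dell"]})
print(count_records(toy))  # 3
```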
How many laptop brands are present in the dataset?
- 12
- 27
- 28
- 2160
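A possible approach, assuming the brand is stored in a column named Brand: count its distinct values with nunique. The toy rows below are hypothetical.

```python
import pandas as pd


def count_brands(df: pd.DataFrame, column: str = "Brand") -> int:
    # nunique() counts distinct non-null values in the column
    return df[column].nunique()


# Hypothetical toy frame, not the real dataset
toy = pd.DataFrame({"Brand": ["Dell", "Asus", "Dell", "Innjoo"]})
print(count_brands(toy))  # 3
```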
How many columns in the dataset have missing values?
- 0
- 1
- 2
- 3
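One way to answer this is to count missing values per column with isnull().sum() and keep the columns whose count is above zero. The sketch below uses a hypothetical toy frame; the column names are assumptions.

```python
import numpy as np
import pandas as pd


def columns_with_missing(df: pd.DataFrame) -> pd.Series:
    # isnull().sum() gives the number of missing values in each column
    missing_per_column = df.isnull().sum()
    return missing_per_column[missing_per_column > 0]


# Hypothetical toy frame, not the real dataset
toy = pd.DataFrame({
    "Brand": ["Dell", "Asus", "Dell"],
    "Storage type": ["SSD", np.nan, "SSD"],
    "Screen": [15.6, np.nan, np.nan],
})
missing = columns_with_missing(toy)
print(missing)
print(len(missing))  # 2
```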
What's the maximum final price of Dell notebooks in the dataset?
- 869
- 3691
- 3849
- 3936
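A sketch of the filter-then-aggregate pattern. It assumes the columns are named Brand and Final Price and that the brand is spelled exactly "Dell", since the comparison is case-sensitive. The toy prices are hypothetical.

```python
import pandas as pd


def max_price_for_brand(df, brand, price_column="Final Price"):
    # The boolean mask keeps only rows of the given brand; max() ignores NaN
    return df.loc[df["Brand"] == brand, price_column].max()


# Hypothetical toy frame, not the real dataset
toy = pd.DataFrame({
    "Brand": ["Dell", "Asus", "Dell"],
    "Final Price": [899.0, 1299.0, 1499.0],
})
print(max_price_for_brand(toy, "Dell"))  # 1499.0
```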
- Find the median value of the Screen column in the dataset.
- Next, calculate the most frequent value of the same Screen column.
- Use the fillna method to fill the missing values in the Screen column with the most frequent value from the previous step.
- Now, calculate the median value of Screen once again.
Has it changed?
Hint: refer to the existing mode and median functions to complete the task.
- Yes
- No
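The four steps above can be sketched as follows. The helper name and the toy Screen values are hypothetical. mode() returns a Series because there can be ties, so the sketch takes its first element. fillna returns a new Series and leaves the original frame unchanged.

```python
import numpy as np
import pandas as pd


def screen_median_before_after(df, column="Screen"):
    median_before = df[column].median()        # median() skips NaN by default
    most_frequent = df[column].mode()[0]       # first value if several modes tie
    filled = df[column].fillna(most_frequent)  # new Series with NaN replaced
    median_after = filled.median()
    return median_before, most_frequent, median_after


# Hypothetical toy column with one missing value, not the real dataset
toy = pd.DataFrame({"Screen": [13.3, 15.6, 15.6, 17.3, np.nan]})
before, mode_value, after = screen_median_before_after(toy)
print(before, mode_value, after)
```

Comparing before and after answers the question directly.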
- Select all the "Innjoo" laptops from the dataset.
- Select only the columns RAM, Storage and Screen.
- Get the underlying NumPy array. Let's call it X.
- Compute the matrix-matrix multiplication between the transpose of X and X. To get the transpose, use X.T. Let's call the result XTX.
- Compute the inverse of XTX.
- Create an array y with values [1100, 1300, 800, 900, 1000, 1100].
- Multiply the inverse of XTX with the transpose of X, and then multiply the result by y. Call the result w.
- What's the sum of all the elements of the result?
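Written as a formula (our own notation for the steps above), the list computes:

```latex
$$
w = \left(X^{T} X\right)^{-1} X^{T} y, \qquad \text{answer} = \sum_{i} w_i
$$
```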
Note: You just implemented linear regression. We'll talk about it in the next lesson.
- 0.43
- 45.29
- 45.58
- 91.30
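A sketch of the whole computation in NumPy. It assumes the columns are named Brand, RAM, Storage and Screen, that the selected columns have no missing values, and that XTX is invertible. The y values come from the task. The feature values in the toy frame are hypothetical, so its output is not the homework answer.

```python
import numpy as np
import pandas as pd


def innjoo_weights(df, y):
    # Keep only Innjoo laptops and the three feature columns
    subset = df.loc[df["Brand"] == "Innjoo", ["RAM", "Storage", "Screen"]]
    X = subset.to_numpy()          # underlying NumPy array
    XTX = X.T @ X                  # 3x3 matrix X^T X
    XTX_inv = np.linalg.inv(XTX)   # raises LinAlgError if XTX is singular
    w = XTX_inv @ X.T @ y          # (X^T X)^-1 X^T y
    return w


# Hypothetical toy data: six Innjoo rows plus one row of another brand
toy = pd.DataFrame({
    "Brand": ["Innjoo"] * 6 + ["Dell"],
    "RAM": [4, 8, 8, 16, 4, 8, 32],
    "Storage": [64, 128, 256, 512, 128, 256, 1024],
    "Screen": [10.1, 11.6, 14.1, 15.6, 11.6, 14.1, 17.3],
})
y = np.array([1100, 1300, 800, 900, 1000, 1100])
w = innjoo_weights(toy, y)
print(w.sum())
```

The task explicitly asks for the inverse. Outside the homework, np.linalg.solve(XTX, X.T @ y) or np.linalg.lstsq is usually preferred because it is numerically more stable.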
- Submit your results here: https://courses.datatalks.club/ml-zoomcamp-2024/homework/hw01
- If your answer doesn't match any of the options exactly, select the closest one