DHUM 25A43 - Fall 2025 Course Introduction

Pandas practice on the IMDB 1000 movies dataset

Data dictionary

1. First Look at the Data

Q1. Display the first 5 rows of the dataset.

Q2. What are the column names?

Q3. What is the dimension of the dataset?

Q4. What is the data type of each column?

Q5. What is the number of missing values in each column?

Q6. What is the average of each numeric column?

Q7. Who is the most common director?

Q8. What is the most common genre?

This one is more complex because a movie can list several genres in a single cell: we need to gather the individual genres from every row and then count the most common ones.
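A minimal sketch of the first-look steps, ending with the genre count described above. It runs on a tiny made-up DataFrame rather than the real file. The `Director` and `Genre` column names and the ", " separator inside `Genre` are assumptions about the dataset, not facts taken from this worksheet.

```python
import pandas as pd

# Hypothetical miniature stand-in for the IMDB dataset. The Director and Genre
# column names, and the ", " separator inside Genre, are assumptions.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Director": ["Dir X", "Dir Y", "Dir X", "Dir Z"],
    "Genre": ["Crime, Drama", "Action, Crime, Drama", "Drama", "Action, Sci-Fi"],
    "IMDB_Rating": [9.3, 9.0, 8.6, None],
})

print(df.head())                    # first 5 rows (only 4 exist here)
print(df.columns.tolist())          # column names
print(df.shape)                     # (rows, columns)
print(df.dtypes)                    # data type of each column
missing = df.isna().sum()           # missing values per column
means = df.mean(numeric_only=True)  # averages, numeric columns only

# Most common director: count each name, then take the label with the top count.
top_director = df["Director"].value_counts().idxmax()

# Most common genre: split each genre string into a list, give every genre
# its own row with explode, then count how often each one appears.
genre_counts = df["Genre"].str.split(", ").explode().value_counts()
top_genre = genre_counts.idxmax()
print(top_director, top_genre)
```

On the real data, `df` would come from `pd.read_csv` on the course's CSV file instead of being built by hand.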

2. Selecting Columns (Projection)

Q3. Display only the Series_Title column.

Q4. Display the Series_Title and IMDB_Rating columns together.
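As a rough illustration of projection, the sketch below uses a small made-up DataFrame with the column names mentioned in the questions. Single brackets return a Series, while a list of names returns a DataFrame.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the worksheet.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B"],
    "IMDB_Rating": [9.3, 9.0],
    "Released_Year": [1994, 2008],
})

titles = df["Series_Title"]                          # one name -> Series
title_rating = df[["Series_Title", "IMDB_Rating"]]   # list of names -> DataFrame

print(titles)
print(title_rating)
```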

3. Simple Filtering (Masks on Rows)

Q5. Show all movies with an IMDb rating higher than 9.0.

Q6. Show all movies released after 2010.

Q7. Show movies with the certificate "PG-13".
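One possible way to build the masks, shown on made-up rows. The sketch assumes `Released_Year` might be stored as text, so it converts the column with `pd.to_numeric` before comparing. If the column is already numeric, that step is harmless.

```python
import pandas as pd

# Hypothetical sample rows; years are stored as text on purpose.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Released_Year": ["1994", "2008", "2014", "2019"],
    "Certificate": ["A", "PG-13", "PG-13", "R"],
    "IMDB_Rating": [9.3, 9.0, 8.6, 8.5],
})

# Assumption: years may be text; values that cannot be parsed become NaN.
year = pd.to_numeric(df["Released_Year"], errors="coerce")

# Each comparison yields a boolean Series (the mask); df[mask] keeps True rows.
high_rated = df[df["IMDB_Rating"] > 9.0]
after_2010 = df[year > 2010]
pg13 = df[df["Certificate"] == "PG-13"]

print(high_rated)
print(after_2010)
print(pg13)
```

Masks can be combined with `&` and `|`, with each condition wrapped in parentheses.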

4. Exploring the Data

Q8. What is the highest IMDb rating in the dataset?

Q9. Which movie has the highest IMDb rating?

Q10. Count how many movies are in each Certificate category.

Q11. What is the average IMDb rating of all movies?

Q12. What is the most common genre?

Q5. Do recent movies have better ratings?
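A hedged sketch of these exploration steps on made-up rows. The question about recent movies is open-ended. The code takes one possible reading, comparing the mean rating before and after an arbitrary 2010 cutoff, and assumes `Released_Year` is numeric.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the worksheet.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Released_Year": [1994, 2008, 2014, 2019],
    "Certificate": ["A", "PG-13", "PG-13", "R"],
    "IMDB_Rating": [9.3, 9.0, 8.6, 8.5],
})

best_rating = df["IMDB_Rating"].max()
# idxmax gives the row label of the maximum; loc then reads that row's title.
best_movie = df.loc[df["IMDB_Rating"].idxmax(), "Series_Title"]
per_certificate = df["Certificate"].value_counts()
mean_rating = df["IMDB_Rating"].mean()

# Assumption: "recent" means released in 2010 or later (arbitrary cutoff).
period = (df["Released_Year"] >= 2010).map(
    {True: "2010 and later", False: "before 2010"}
)
mean_by_period = df.groupby(period)["IMDB_Rating"].mean()

print(best_rating, best_movie, mean_rating)
print(per_certificate)
print(mean_by_period)
```

A correlation between year and rating, or a scatter plot, would be other reasonable ways to approach the same question.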

5. Sorting Data

Q13. List the top 10 movies with the highest IMDb rating.

Q14. List the top 10 movies with the highest number of votes.
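Two equivalent ways to get a top-N list, sketched on made-up rows. The `No_of_Votes` column name is an assumption about the dataset. With only four sample rows, asking for 10 simply returns all of them in sorted order.

```python
import pandas as pd

# Hypothetical sample rows; No_of_Votes is an assumed column name.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "IMDB_Rating": [9.3, 9.0, 8.6, 8.5],
    "No_of_Votes": [2300000, 2600000, 1700000, 900000],
})

# Option 1: sort in descending order, then keep the first 10 rows.
top_rated = df.sort_values("IMDB_Rating", ascending=False).head(10)

# Option 2: nlargest does the sort-and-truncate in one call.
most_voted = df.nlargest(10, "No_of_Votes")

print(top_rated)
print(most_voted)
```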

6. Visualization

(Use matplotlib or pandas' built-in plotting.)

Q15. Plot a histogram of IMDb ratings.

Q16. Create a scatter plot of IMDB_Rating (y-axis) vs. Released_Year (x-axis).

Q17. Create a bar plot of the top 5 genres by number of movies.
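A minimal sketch of the three plots using pandas' plotting methods, which draw with matplotlib underneath. The sample rows are made up, and the `Genre` column with its ", " separator is an assumption about the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows; Genre and its ", " separator are assumptions.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
    "Released_Year": [1994, 2008, 2014, 2019, 2001],
    "Genre": ["Crime, Drama", "Action, Crime, Drama", "Drama, Romance",
              "Action, Sci-Fi", "Comedy"],
    "IMDB_Rating": [9.3, 9.0, 8.6, 8.5, 8.1],
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram of ratings.
df["IMDB_Rating"].plot.hist(bins=5, ax=axes[0], title="IMDb ratings")

# Scatter plot: year on the x-axis, rating on the y-axis.
df.plot.scatter(x="Released_Year", y="IMDB_Rating", ax=axes[1])

# Bar plot: count individual genres, keep the 5 most frequent.
top_genres = df["Genre"].str.split(", ").explode().value_counts().head(5)
top_genres.plot.bar(ax=axes[2], title="Top 5 genres")

fig.tight_layout()
```

In a notebook the figure is normally rendered after the cell runs. In a plain script, `plt.show()` or `fig.savefig(...)` would be needed to see it.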

7. String to Minutes

Q18. Convert the Runtime column (e.g. “142 min”) into numeric minutes and find the average runtime.

Q19. What are the top 5 movies with the highest gross revenue?
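One way to approach both conversions, sketched on made-up rows. The `Runtime` format follows the example in the question. The `Gross` column name and its comma-separated text format are assumptions. If `Gross` is already numeric in the file, the cleaning step can be skipped.

```python
import pandas as pd

# Hypothetical sample rows; the Gross name and format are assumptions.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Runtime": ["142 min", "175 min", "152 min", "96 min"],
    "Gross": ["28,341,469", "134,966,411", "534,858,444", None],
})

# Pull the leading digits out of strings such as "142 min", then convert.
df["Runtime_Min"] = df["Runtime"].str.extract(r"(\d+)", expand=False).astype(int)
mean_runtime = df["Runtime_Min"].mean()

# Assumption: Gross is text with thousands separators. Strip the commas,
# then convert; missing or unparseable values become NaN.
df["Gross_Num"] = pd.to_numeric(
    df["Gross"].str.replace(",", "", regex=False), errors="coerce"
)
top_gross = df.nlargest(5, "Gross_Num")[["Series_Title", "Gross_Num"]]

print(mean_runtime)
print(top_gross)
```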