Data is available here: imdb_top_1000.csv
Data is also available at: https://www.kaggle.com/datasets/mayankray/imdb-top-1000-movies-dataset
Data dictionary
Q1. Display the first 5 rows of the dataset.
Hint: .head()
Q2. What are the column names?
Hint: .columns (no parentheses)
Q3. What is the dimension of the dataset?
Hint: .shape
Q4. What is the data type of each column?
Hint: .dtypes or .info()
Q5. What is the number of missing values in each column?
Hint: .isnull().sum()
Q6. What is the average in each column?
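Hint: .mean(numeric_only=True) for the column averages (.isnull().mean() gives the share of missing values in each column instead)
Q7. Who is the most common director?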
Hint: .value_counts()
Q8. What is the most common genre?
This is more complex: a movie can list several genres in a single string, so we need to split them apart, gather them into one series, and then count the most common ones.
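One possible way to count genres is sketched below. It runs on a small made-up table rather than the real file, and it assumes the genre column is named Genre and holds comma-separated strings such as "Crime, Drama"; check .columns and .head() on the real data before relying on that.

```python
import pandas as pd

# Toy stand-in for imdb_top_1000.csv. The titles are placeholders, and the
# "Genre" column name and its comma-separated format are assumptions.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C"],
    "Genre": ["Crime, Drama", "Drama", "Action, Crime, Drama"],
})

genre_counts = (
    df["Genre"]
    .str.split(",")   # "Crime, Drama" -> ["Crime", " Drama"]
    .explode()        # one genre per row
    .str.strip()      # remove the space left after each comma
    .value_counts()   # count occurrences, most frequent first
)

most_common_genre = genre_counts.idxmax()
print(genre_counts)
print("Most common genre:", most_common_genre)
```

On the full dataset, replace the toy table with pd.read_csv("imdb_top_1000.csv").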
Q3. Display only the Series_Title column.
Q4. Display the Series_Title and IMDB_Rating columns together.
Q5. Show all movies with an IMDb rating higher than 9.0.
Q6. Show all movies released after 2010.
Q7. Show movies with the certificate "PG-13".
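The selection and filtering questions above could be approached as in the sketch below. The table is made up for illustration, and the sketch assumes that Released_Year may be stored as text in the real file, which is why it is converted with pd.to_numeric before comparing.

```python
import pandas as pd

# Toy stand-in for the real dataset; all rows are placeholders.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Released_Year": ["1994", "2014", "unknown", "2019"],
    "Certificate": ["A", "PG-13", "U", "PG-13"],
    "IMDB_Rating": [9.3, 8.6, 7.6, 8.4],
})

# A single column name returns a Series.
titles = df["Series_Title"]

# A list of column names returns a DataFrame.
title_rating = df[["Series_Title", "IMDB_Rating"]]

# A boolean mask keeps only the rows where the condition is True.
high_rated = df[df["IMDB_Rating"] > 9.0]

# Non-numeric years become NaN, and NaN > 2010 is False, so those rows drop out.
year = pd.to_numeric(df["Released_Year"], errors="coerce")
after_2010 = df[year > 2010]

pg13 = df[df["Certificate"] == "PG-13"]

print(high_rated)
print(after_2010)
print(pg13)
```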
Q8. What is the highest IMDb rating in the dataset?
Hint: .max()
Q9. Which movie has the highest IMDb rating?
Hint: .loc[] and .max()
Q10. Count how many movies are in each Certificate category.
Hint: .value_counts()
Q11. What is the average IMDb rating of all movies?
Hint: .mean()
Q12. What is the most common genre?
Hint: .mode() or .value_counts()
Q5. Do recent movies have better ratings?
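One way to look at ratings over time is to average the rating per decade. The sketch below uses made-up rows and assumes Released_Year has already been converted to integers; it shows the mechanics only and says nothing about what the real data will show.

```python
import pandas as pd

# Toy stand-in; the years and ratings are placeholders, not real results.
df = pd.DataFrame({
    "Released_Year": [1994, 1999, 2008, 2014, 2019],
    "IMDB_Rating": [9.0, 8.0, 9.0, 8.6, 8.4],
})

# Integer division maps each year to its decade, e.g. 1994 -> 1990.
decade = (df["Released_Year"] // 10) * 10

# Mean rating per decade, indexed by decade.
rating_by_decade = df.groupby(decade)["IMDB_Rating"].mean()
print(rating_by_decade)
```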
Q13. List the top 10 movies with the highest IMDb rating.
Hint: .sort_values()
Q14. List the top 10 movies with the highest number of votes.
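The aggregation and ranking questions can be sketched as follows. The table is made up, the vote column name No_of_Votes is an assumption about the real file, and n is set to 2 only because the toy table is small (use 10 on the full dataset).

```python
import pandas as pd

# Toy stand-in; "No_of_Votes" is an assumed column name.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Certificate": ["A", "PG-13", "U", "PG-13"],
    "IMDB_Rating": [9.3, 8.6, 7.6, 8.4],
    "No_of_Votes": [2000, 3500, 1200, 900],
})
n = 2  # use 10 on the full dataset

best_rating = df["IMDB_Rating"].max()

# idxmax() returns the row label of the maximum; .loc looks up the title there.
best_movie = df.loc[df["IMDB_Rating"].idxmax(), "Series_Title"]

certificate_counts = df["Certificate"].value_counts()
mean_rating = df["IMDB_Rating"].mean()

# Two equivalent ways to take the top n rows by a column.
top_rated = df.sort_values("IMDB_Rating", ascending=False).head(n)
most_voted = df.nlargest(n, "No_of_Votes")

print(best_rating, best_movie)
print(top_rated[["Series_Title", "IMDB_Rating"]])
print(most_voted[["Series_Title", "No_of_Votes"]])
```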
(use matplotlib or pandas built-in plotting)
Q15. Plot a histogram of IMDb ratings.
Q16. Create a scatter plot of IMDB_Rating (y-axis) vs. Released_Year (x-axis).
Q17. Create a bar plot of the top 5 genres by number of movies.
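The three plots could be drawn with pandas' built-in plotting, which uses matplotlib underneath. This is a sketch on made-up rows; it assumes Released_Year is numeric and that the genre column is named Genre with comma-separated values.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in; all values are placeholders.
df = pd.DataFrame({
    "Released_Year": [1994, 1999, 2008, 2014, 2019],
    "IMDB_Rating": [9.0, 8.0, 9.0, 8.6, 8.4],
    "Genre": ["Crime, Drama", "Drama", "Action, Crime, Drama", "Action", "Drama"],
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram of ratings.
df["IMDB_Rating"].plot.hist(bins=5, ax=axes[0], title="IMDb ratings")

# Scatter plot: year on the x-axis, rating on the y-axis.
df.plot.scatter(x="Released_Year", y="IMDB_Rating", ax=axes[1])

# Bar plot of the five most frequent genres.
top_genres = (
    df["Genre"].str.split(",").explode().str.strip().value_counts().head(5)
)
top_genres.plot.bar(ax=axes[2], title="Top genres")

fig.tight_layout()
```

In a notebook the figure is displayed automatically; in a script, call plt.show() at the end.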
Q18. Convert the Runtime column (e.g. "142 min") into numeric minutes and find the average runtime.
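A minimal sketch of the conversion, assuming every value follows the "142 min" pattern from the question. The new column name Runtime_min is chosen here for illustration only.

```python
import pandas as pd

# Toy stand-in; only "142 min" comes from the exercise text.
df = pd.DataFrame({"Runtime": ["142 min", "175 min", "96 min"]})

# Pull out the leading digits and convert them to numbers.
df["Runtime_min"] = (
    df["Runtime"].str.extract(r"(\d+)", expand=False).astype(float)
)

average_runtime = df["Runtime_min"].mean()
print(average_runtime)
```

Using float rather than int keeps the conversion from failing if some rows have a missing runtime.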
Q19. What are the top 5 movies with the highest gross revenue?
Hint: convert Gross to numeric and sort.
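A sketch of that conversion, assuming Gross is stored as text with thousands separators (for example "1,500,000") and may contain missing values. The rows and the helper column name Gross_num are made up for illustration.

```python
import pandas as pd

# Toy stand-in; the values and the comma-separated format are assumptions.
df = pd.DataFrame({
    "Series_Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Gross": ["1,500,000", None, "250,000,000", "42,000"],
})

# Remove the commas, then convert; anything unparseable becomes NaN.
df["Gross_num"] = pd.to_numeric(
    df["Gross"].str.replace(",", "", regex=False), errors="coerce"
)

# Sort from highest to lowest and keep the first five rows.
top_gross = df.sort_values("Gross_num", ascending=False).head(5)
print(top_gross[["Series_Title", "Gross_num"]])
```

sort_values places missing values last by default, so movies without a gross figure end up at the bottom of the ranking.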