top of page

Movies Rental Analysis

  • Writer: Aroyewun Airat
    Aroyewun Airat
  • Dec 13, 2022
  • 8 min read

Updated: Sep 30, 2024

Structured Query Language is a language that allows users to get information from data. There are various Relational Database Management Systems, For the purpose of this analysis, I'll be making use of MySQL Workbench to query the movie rental dataset.


ree

About Dataset

This data contains information about movie rentals of the films released in the year 2006. The dataset consists of 11 tables. The following is the list of tables and their content.

  1. Actor: This table provides information about the actors, their first name, last name, and last update.

  2. Actor_Award: This table provides information about awards received by actors. It contains the first and last name of the actor, the award received, date last updated.

  3. Customer: This table contains information about the customers. Customer's first_name, last_name, email, Address_id, and so on.

  4. Film: This contains information about the film. The film title, film description, release year, language ID, rental duration, rental length, film rating, special features, and so on.

  5. Film_Actor: This table contains the list of actor IDs and the films they featured in.

  6. Payment: This table contains payment information. Customer ID, Staff ID, Rental ID, Amount, Payment Date.

  7. Rental: This table contains rental information. Rental date, Inventory ID, Customer ID, Return date, Staff ID, and the Last date updated.

  8. Staff: This table contains information about staff, Staff First and Last name, Address ID, Email, Store ID, Username, Passport,

  9. Store: This table contains store information. Store ID, Manager Staff ID, Address ID, and so on.

  10. Investor: This table contains the list of investors their first name, last name, and company name.

  11. Advisor: This table contains information about the first and last names of advisors.


Creating Database

I started by creating a Database. A database is like a data warehouse, which stores structured data in an organized way.

CREATE SCHEMA moviesrentage;

Creating Tables

After creating the database, I went ahead to create tables.

Table Structure for Actor:

CREATE TABLE actor (
  actor_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  first_name VARCHAR(45) NOT NULL,
  last_name VARCHAR(45) NOT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (actor_id),
  KEY idx_actor_last_name (last_name)
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

Table Structure for Actor_Award:

CREATE TABLE actor_award (
  actor_award_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  actor_id SMALLINT, 
  first_name VARCHAR(45) NOT NULL,
  last_name VARCHAR(45) NOT NULL,
  awards VARCHAR(45) NOT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (actor_award_id),
  KEY idx_actor_last_name (last_name)
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

Table Structure for store:

CREATE TABLE store (
  store_id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
  manager_staff_id TINYINT UNSIGNED NOT NULL,
  address_id SMALLINT UNSIGNED NOT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (store_id),
  UNIQUE KEY idx_unique_manager (manager_staff_id),
  CONSTRAINT fk_store_staff FOREIGN KEY (manager_staff_id) REFERENCES staff (staff_id) ON DELETE RESTRICT ON UPDATE CASCADE
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

Table Structure for Customer

CREATE TABLE customer (
  customer_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  store_id TINYINT UNSIGNED NOT NULL,
  first_name VARCHAR(45) NOT NULL,
  last_name VARCHAR(45) NOT NULL,
  email VARCHAR(50) DEFAULT NULL,
  active BOOLEAN NOT NULL DEFAULT TRUE,
  create_date DATETIME NOT NULL,
  last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (customer_id),
  KEY idx_fk_store_id (store_id),
  KEY idx_last_name (last_name),
  CONSTRAINT fk_customer_store FOREIGN KEY (store_id) REFERENCES store (store_id) ON DELETE RESTRICT ON UPDATE CASCADE
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

Table Structure for Film

CREATE TABLE film (
  film_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(255) NOT NULL,
  description TEXT DEFAULT NULL,
  release_year YEAR DEFAULT NULL,
  rental_duration TINYINT UNSIGNED NOT NULL DEFAULT 3,
  rental_rate DECIMAL(4,2) NOT NULL DEFAULT 4.99,
  length SMALLINT UNSIGNED DEFAULT NULL,
  replacement_cost DECIMAL(5,2) NOT NULL DEFAULT 19.99,
  rating ENUM('G','PG','PG-13','R','NC-17') DEFAULT 'G',
  special_features SET('Trailers','Commentaries','Deleted Scenes','Behind the Scenes') DEFAULT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (film_id),
  KEY idx_title (title)
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

Table structure for Film_Actor

CREATE TABLE film_actor (
  actor_id SMALLINT UNSIGNED NOT NULL,
  film_id SMALLINT UNSIGNED NOT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (actor_id,film_id),
  KEY idx_fk_film_id (`film_id`),
  CONSTRAINT fk_film_actor_actor FOREIGN KEY (actor_id) REFERENCES actor (actor_id) ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT fk_film_actor_film FOREIGN KEY (film_id) REFERENCES film (film_id) ON DELETE RESTRICT ON UPDATE CASCADE
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

Table Structure for Payment

CREATE TABLE payment (
  payment_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  customer_id SMALLINT UNSIGNED NOT NULL,
  staff_id TINYINT UNSIGNED NOT NULL,
  rental_id INT DEFAULT NULL,
  amount DECIMAL(5,2) NOT NULL,
  payment_date DATETIME NOT NULL,
  last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (payment_id),
  KEY idx_fk_staff_id (staff_id),
  KEY idx_fk_customer_id (customer_id),
  CONSTRAINT fk_payment_rental FOREIGN KEY (rental_id) REFERENCES rental (rental_id) ON DELETE SET NULL ON UPDATE CASCADE,
  CONSTRAINT fk_payment_customer FOREIGN KEY (customer_id) REFERENCES customer (customer_id) ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT fk_payment_staff FOREIGN KEY (staff_id) REFERENCES staff (staff_id) ON DELETE RESTRICT ON UPDATE CASCADE
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

Table structure for Rental

CREATE TABLE rental (
  rental_id INT NOT NULL AUTO_INCREMENT,
  rental_date DATETIME NOT NULL,
  customer_id SMALLINT UNSIGNED NOT NULL,
  return_date DATETIME DEFAULT NULL,
  staff_id TINYINT UNSIGNED NOT NULL,
  inventory_id SMALLINT UNSIGNED NOT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (rental_id),
  UNIQUE KEY  (rental_date,inventory_id,customer_id),
  KEY idx_fk_customer_id (customer_id),
  KEY idx_fk_staff_id (staff_id),
  CONSTRAINT fk_rental_staff FOREIGN KEY (staff_id) REFERENCES staff (staff_id) ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT fk_rental_inventory FOREIGN KEY (customer_id) REFERENCES customer (customer_id) ON DELETE RESTRICT ON UPDATE CASCADE
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

Table structure for Staff

CREATE TABLE staff (
  staff_id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
  first_name VARCHAR(45) NOT NULL,
  last_name VARCHAR(45) NOT NULL,
  picture BLOB DEFAULT NULL,
  email VARCHAR(50) DEFAULT NULL,
  store_id TINYINT UNSIGNED NOT NULL,
  active BOOLEAN NOT NULL DEFAULT TRUE,
  username VARCHAR(16) NOT NULL,
  password VARCHAR(40) BINARY DEFAULT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (staff_id),
  KEY idx_fk_store_id (store_id),
  CONSTRAINT fk_staff_store FOREIGN KEY (store_id) REFERENCES store (store_id) ON DELETE RESTRICT ON UPDATE CASCADE
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

Table structure for Store

CREATE TABLE store (
  store_id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
  manager_staff_id TINYINT UNSIGNED NOT NULL,
  address_id SMALLINT UNSIGNED NOT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (store_id),
  UNIQUE KEY idx_unique_manager (manager_staff_id),
  KEY idx_fk_address_id (address_id),
  CONSTRAINT fk_store_staff FOREIGN KEY (manager_staff_id) REFERENCES staff (staff_id) ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT fk_store_address FOREIGN KEY (address_id) REFERENCES address (address_id) ON DELETE RESTRICT ON UPDATE CASCADE
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

Table structure for Investor

CREATE TABLE investor (
  investor_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  first_name VARCHAR(45) NOT NULL,
  last_name VARCHAR(45) NOT NULL,
  company_name VARCHAR(45) NOT NULL,
  PRIMARY KEY  (investor_id)
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

Table structure for Advisor

CREATE TABLE advisor (
  advisor_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  first_name VARCHAR(45) NOT NULL,
  last_name VARCHAR(45) NOT NULL,
  is_chairmain SMALLINT UNSIGNED NOT NULL,
  PRIMARY KEY  (advisor_id)
)ENGINE=InnoDB DEFAULT CHARSET=UTF8MB4;

After creating each table I imported the values from a CSV file by Right-clicking on the table in the schemas pane to import wizard.


Data Analysis

Now that I have the database setup, I went straight to analyzing the data. I made sure to check the tables for duplicates.

-- checking Actor tables for duplicates
SELECT 
      actor_id, 
      COUNT(actor_id)
FROM actor
GROUP BY 1
HAVING COUNT(actor_id) > 1;
-- checking Film table for duplicates
SELECT
      film_id,
      count(film_id)
FROM film
GROUP BY film_id
HAVING count(film_id) > 1;
-- checking Payment table for duplicates
SELECT
     payment_id,
     COUNT(payment_id)
FROM payment
GROUP BY payment_id
HAVING COUNT(payment_id) >1;
-- checking the Rental table for duplicates
SELECT 
      rental_id,
      COUNT(rental_id)
FROM rental
GROUP BY rental_id
HAVING COUNT(rental_id) > 1;
ree



Good! Seems like the tables are free from duplicates.


Top Customer Analysis: This analysis is to know the customer with the highest payment/revenue. The query below will extract the customer's name, the sum of all the payment customer has made, and the number of times a customer has rented films.

SELECT
      customer.customer_id,
      customer.first_name,
      customer.last_name,
      SUM(payment.amount) as "Total Payment",
      COUNT(rental.rental_id) as "Total Rent"
FROM customer
LEFT JOIN rental
ON rental.customer_id = customer.customer_id
LEFT JOIN payment
ON payment.customer_id = rental.rental_id
GROUP BY customer_id
ORDER BY 4 DESC;
ree

After creating a query to extract the customer's performance, I created a view that contains the customer analysis query.

A view is like a virtual table. Views can aggregate data from multiple tables into one view. An advantage of view is security. If you want to grant a person assess to view just a limited data, the view is a great way to do that.

CREATE VIEW Top_Customer AS 
SELECT 
      customer.customer_id,
      customer.first_name,
      customer.last_name,
      SUM(payment.amount) as "Total Payment",
      COUNT(rental.rental_id) as "Total Rent"
FROM customer
LEFT JOIN rental
ON rental.customer_id = customer.customer_id
LEFT JOIN payment
ON payment.customer_id = rental.rental_id
GROUP BY customer_id
ORDER BY 4 DESC;

After creating the Top_Customer view, I tested the view.

SELECT * FROM Top_Customer;
ree

Actor Performance Analysis: This analysis is to know the number of awards each actor has received. The query below will display the list of actors and the number of awards received.

Firstly, I wrote a script to know the types of awards won by actors.

SELECT DISTINCT awards FROM actor_award;
ree

Then I went straight to writing a query that extracts the names of Actors and the number of awards won.

SELECT
         actor.actor_id,  
         actor.first_name,
         actor.last_name,
        CASE WHEN actor_award.awards = "Emmy, Oscar, Tony " then "3 Awards"
		    WHEN actor_award.awards IN("Emmy, Oscar","Emmy, Tony","Tony, Oscar") then "2 Awards"
			 ELSE "1 Award" END as "Number of Awards Won"
FROM actor
LEFT JOIN actor_award
ON actor_award.actor_id = actor.actor_id;
ree

After writing the query, I decided to create a view containing the Actor's performance.

CREATE VIEW Actor_performance as
SELECT  actor.actor_id,  
	  actor.first_name,
        actor.last_name,
        COUNT(film_actor.film_id) as number_of_films_featuredin,
        CASE WHEN actor_award.awards = "Emmy, Oscar, Tony " then "3 Awards"
		 WHEN actor_award.awards IN("Emmy, Oscar","Emmy, Tony","Tony, Oscar") then "2 Awards"
		 ELSE "1 Award" END as "Number of Awards Won"
FROM actor
LEFT JOIN actor_award
ON actor_award.actor_id = actor.actor_id
LEFT JOIN film_actor
ON film_actor.actor_id = actor.actor_id
GROUP BY 1
ORDER BY 4 DESC;
SELECT * FROM Actor_Performance

The query below will join the Investor and Advisor table. I first wrote a query to preview both tables.

Preview Investor table;

SELECT * FROM investor;
ree

Preview Advisor table;

SELECT * FROM advisor;
ree

Using the UNION function, I joined the investor and advisor table.

SELECT "investor" AS type,
	   first_name,
       last_name,
       company_name
FROM investor
     UNION
SELECT "advisor" AS type,
	   first_name,
       Last_name,
       null
FROM Advisor;
ree

Store Performance Analysis: This Analysis is to compare the store's performance, To know the Amount made in each store, and The number of active customers in each store.

SELECT
	store.store_id,
      COUNT(CASE WHEN customer.active = 1 and customer.store_id = 1 then "Store_1_active"
		     WHEN customer.active = 1 and customer.store_id = 2 then "Store_2_active"
                 Else "Inactive" End) AS "Active Status",
	SUM(payment.amount) AS "Amount made in each store",
      MAX(payment.last_update) AS Last_update
FROM customer
INNER JOIN store
ON store.store_id = customer.store_id
INNER JOIN payment
ON payment.customer_id = customer.customer_id
GROUP BY 1;
ree

Staff Performance Analysis: This analysis is to compare how active each staff is. This query is to extract the Staff ID, Staff name, Total number of films rented out, and the Total Revenue for each staff.

SELECT 
     staff.staff_id,
    staff.first_name,
     COUNT(rental.rental_id) as "Number of items rented out",
     SUM(payment.amount) AS "Total Amount",
     MAX(rental.last_update)
FROM rental
LEFT JOIN payment
ON rental.rental_id = payment.rental_id
INNER JOIN staff
ON staff.staff_id = rental.staff_id
GROUP BY 1;
ree

After extracting the values, I created a stored procedure to store staff performance, where you can insert a particular staff_id and the staff performance for the particular store ID will pop out.

A Stored procedure can share complex procedures more easily between Analysts and other database users. The IN parameter can take in a value, which will also output.

Delimiter //
CREATE PROCEDURE Staff_Activity(IN staf_id int)
BEGIN
	 SELECT 
           staff.staff_id,
           staff.first_name,
	       COUNT(rental.rental_id) as "Total Rent served",
		  SUM(payment.amount) AS "Total Amount",
           MAX(rental.last_update)
     FROM rental
     LEFT JOIN payment
     ON rental.rental_id = payment.rental_id
	 INNER JOIN staff
     ON staff.staff_id = rental.staff_id
     WHERE staff.staff_id = staf_id
     GROUP BY 1;
END //

After creating the store procedure, I tested it. I called the procedure to extract the staff performance for staff with ID 1.

CALL Staff_Activity(1);
ree

Return duration Analysis: This analysis is to know how long it takes each customer to return a film, the customer that takes a long time to return a film. The query below will extract the avg days it takes for each customer to return a film after rent.

SELECT 
       customer.customer_id,
       ROUND(AVG(DATEDIFF(rental.return_date,rental.rental_date)),1) AS Avg_days_to_return,
       MAX(rental.last_update) AS Last_updated
FROM Rental
LEFT JOIN customer
ON customer.customer_id = rental.customer_id
GROUP BY 1
ORDER BY 2 desc;
ree


Comments


  • Github
  • email
  • LinkedIn
bottom of page