SQL | Interviews
Challenge introduction
This challenge offers a valuable exercise in constructing complex queries, aggregating and transforming data, and handling missing data. This problem enhances SQL skills, especially in areas like CTEs, JOIN operations, and data aggregation. These skills are useful in real-world scenarios, where data from multiple sources needs consolidation and analysis.
The problem is sourced from HackerRank and is available at the following link. I recommend attempting to solve the challenge independently before reviewing the provided solution.
Problem statement
Samantha interviews many candidates from different colleges using coding challenges and contests. Write a query to print the contest_id, hacker_id, name, and the sums of total_submissions, total_accepted_submissions, total_views, and total_unique_views for each contest sorted by contest_id. Exclude the contest from the result if all four sums are zero.
Note: A specific contest can be used to screen candidates at more than one college, but each college only holds one screening contest.
Input format.The data needed for this challenge is spread across the following tables:
Column | Type |
---|---|
contest_id | Integer |
hacker_id | Integer |
name | String |
contests
The contest_id is the id of the contest, hacker_id is the id of the hacker who created the contest, and name is the name of the hacker.
Column | Type |
---|---|
college_id | Integer |
contest_id | Integer |
colleges
The college_id is the id of the college, and contest_id is the id of the contest that Samantha used to screen the candidates.
Column | Type |
---|---|
challenge_id | Integer |
college_id | Integer |
challenges
The challenge_id is the id of the challenge that belongs to one of the contests whose contest_id Samantha used, and college_id is the id of the college where the challenge was given to candidates.
Column | Type |
---|---|
challenge_id | Integer |
total_views | Integer |
total_unique_views | Integer |
view_stats
The challenge_id is the id of the challenge, total_views is the number of times the challenge was viewed by candidates, and total_unique_views is the number of times the challenge was viewed by unique candidates.
Column | Type |
---|---|
challenge_id | Integer |
total_submissions | Integer |
total_accepted_submissions | Integer |
submission_stats
The challenge_id is the id of the challenge, total_submissions is the number of submissions for the challenge, and total_accepted_submission is the number of submissions that achieved full scores.
Sample input.
Take as an example the data contained in the following tables:
contest_id | hacker_id | name |
---|---|---|
66406 | 17973 | Rose |
66556 | 79153 | Angela |
94828 | 80275 | Frank |
college_id | contest_id |
---|---|
11219 | 66406 |
32473 | 66556 |
56685 | 94828 |
challenge_id | college_id |
---|---|
18765 | 11219 |
47127 | 11219 |
60292 | 32473 |
72974 | 56685 |
challenge_id | total_views | total_unique_views |
---|---|---|
47127 | 26 | 19 |
47127 | 15 | 14 |
18765 | 43 | 10 |
18765 | 72 | 13 |
75516 | 35 | 17 |
60292 | 11 | 10 |
72974 | 41 | 15 |
75516 | 75 | 11 |
challenge_id | total_submissions | total_accepted_submissions |
---|---|---|
75516 | 34 | 12 |
47127 | 27 | 10 |
47127 | 56 | 18 |
75516 | 74 | 12 |
75516 | 83 | 8 |
72974 | 68 | 24 |
72974 | 82 | 14 |
47127 | 28 | 11 |
Expected output.
The expected output given the sample input is the following:
66406 | 17973 | Rose | 111 | 39 | 156 | 56 |
66556 | 79153 | Angela | 0 | 0 | 11 | 10 |
94828 | 80275 | Frank | 150 | 38 | 41 | 15 |
Explanation.
The contest 66406 is used in the college 11219. In this college challenges 18765 and 47127 are asked, so from the view and submission stats we get:
- Sum of total submissions = 27 + 56 + 28 = 111
- Sum of total accepted submissions = 10 + 18 + 11 = 39
- Sum of total views = 43 + 72 + 26 + 15 = 156
- Sum of total unique views = 10 + 13 + 19 + 14 = 56
Similarly, we can find the sums for contests 66556 and 94828.
Solution
The critical aspect of solving this problem lies in recognizing the specific sequence of aggregation operations that must be executed. With the exception of the view_stats and submission_stats tables, all other tables primarily serve as intermediaries connecting the contest_id (our primary identifier) with the challenge_id, which, in turn, links to view and submission statistics.
The initial step involves acknowledging that views and submissions are distinct aspects stemming from the challenge. Therefore, the aggregation of both tables by challenge_id should be carried out independently before merging them based on this identifier. Subsequently, the final step entails joining the tables as interconnections and conducting the overarching aggregation based on contest_id.
-- Generate auxiliary CTEs
WITH submissions_agg AS (
SELECT challenge_id,
SUM(total_submissions) AS sum_subs,
SUM(total_accepted_submissions) AS sum_acc_subs
FROM submission_stats
GROUP BY challenge_id),
views_agg AS (
SELECT challenge_id,
SUM(total_views) AS sum_views,
SUM(total_unique_views) AS sum_unique_views
FROM view_stats
GROUP BY challenge_id),
subs_views_agg AS (
SELECT COALESCE(s.challenge_id, v.challenge_id) AS challenge_id,
COALESCE(sum_subs, 0) AS sum_subs,
COALESCE(sum_acc_subs, 0) AS sum_acc_subs,
COALESCE(sum_views, 0) AS sum_views,
COALESCE(sum_unique_views, 0) AS sum_unique_views
FROM submissions_agg AS s
FULL JOIN views_agg AS v
ON s.challenge_id = v.challenge_id)
-- Main query
SELECT cont.contest_id, hacker_id, name,
SUM(sum_subs) AS sum_subs,
SUM(sum_acc_subs) AS sum_acc_subs,
SUM(sum_views) AS sum_views,
SUM(sum_unique_views) AS sum_unique_views
FROM subs_views_agg AS sv
LEFT JOIN challenges as chall
ON sv.challenge_id = chall.challenge_id
LEFT JOIN colleges AS coll
ON chall.college_id = coll.college_id
LEFT JOIN contests as cont
ON coll.contest_id = cont.contest_id
GROUP BY cont.contest_id, hacker_id, name
ORDER BY cont.contest_id ASC;