SQL | 15 days of learning SQL

Challenge introduction

This challenge offers a valuable exercise in constructing complex queries, aggregating and transforming data, working with dates and generating running statistics. This problem enhances SQL skills, especially in areas like CTEs, JOIN operations, window functions and data aggregation. These skills are useful in real-world scenarios, where data from multiple sources needs consolidation and analysis.

The problem is sourced from HackerRank and is available at the following link. I recommend attempting to solve the challenge independently before reviewing the provided solution.

Problem statement

Julia conducted a 15 days of learning SQL contest. The start date of the contest was March 01, 2016 and the end date was March 15, 2016.

Write a query to print total number of unique hackers who made at least one submission each day (starting on the first day of the contest), and find the hacker_id and name of the hacker who made maximum number of submissions each day. If more than one such hacker has a maximum number of submissions, print the lowest hacker_id. The query should print this information for each day of the contest, sorted by the date.

Input format.
The data needed for this challenge is spread across the following tables:

ColumnType
hacker_idInteger
nameString

hackers

The hacker_id is the id of the hacker, and name is the name of the hacker

ColumnType
submission_dateDate
submission_idInteger
hacker_idInteger
scoreInteger

submissions

The submission_date is the date of the submission, submission_id is the id of the submission, hacker_id is the id of the hacker who made the submission, and score is the score of the submission.

Sample input.

Take as an example (assuming that the end date of the contest was March 06, 2016) the data contained in the following tables:

hacker_idname
15758Rose
20703Angela
36396Frank
38289Patrick
44065Lisa
53473Kimberly
62529Bonnie
79722Michael
submission_datesubmission_idhacker_idscore
2016-03-018494207030
2016-03-01224035347315
2016-03-01239657972260
2016-03-01301733639670
2016-03-0234928207030
2016-03-02387401575860
2016-03-02427697972225
2016-03-02443647972260
2016-03-0345440207030
2016-03-03490503639670
2016-03-0350273797225
2016-03-0450344207030
2016-03-04513604406590
2016-03-04544045347365
2016-03-04615337972245
2016-03-0572852207030
2016-03-0574546382890
2016-03-0576487625290
2016-03-05824393639610
2016-03-05900063639640
2016-03-0690404207030

Expected output.

The expected output given the sample input is the following:

2016-03-01420703Angela
2016-03-02279722Michael
2016-03-03220703Angela
2016-03-04220703Angela
2016-03-05136396Frank
2016-03-06120703Angela

Explanation.

On March 01, 2016 hackers 20703, 36396, 53473, and 79722 made submissions. There are 4 unique hackers who made at least one submission each day. As each hacker made one submission, 20703 is considered to be the hacker who made maximum number of submissions on this day. The name of the hacker is Angela.

On March 02, 2016 hackers 15758, 20703, and 79722 made submissions. Now 20703 and 79722 were the only ones to submit every day, so there are 2 unique hackers who made at least one submission each day. 79722 made two submissions, and name of the hacker is Michael.

On March 03, 2016 hackers 20703, 36396, and 79722 made submissions. Now 20703 and 79722 were the only ones, so there are 2 unique hackers who made at least one submission each day. As each hacker made one submission, 20703 is considered to be the hacker who made maximum number of submissions on this day. The name of the hacker is Angela.

On March 04, 2016 hackers 20703, 36396, 38289, and 65529made submissions. Now only 20703 and 79722 submitted each day, so there are 2 unique hackers who made at least one submission each day. As each hacker made one submission, 20703 is considered to be the hacker who made maximum number of submissions on this day. The name of the hacker is Angela.

On March 05, 2016 hackers 20703, 36396, 38289 and 65529 made submissions. Now only 20703 submitted each day, so there is only 1 unique hacker who made at least one submission each day. 36396 made two submissions and name of the hacker is Frank.

On March 06, 2016 only 20703 made a submission, so there is only 1 unique hacker who made at least one submission each day. 20703 made one submission and the name of the hacker is Angela.

Solution

To approach this problem effectively, it's crucial to recognize that it involves two distinct challenges, each requiring a different set of logical skills. However, both complexities can be simplified by initially aggregating the submission data by date and hacker, creating a foundational auxiliary table for subsequent operations.

The first, more straightforward challenge involves identifying the hacker with the most submissions each day. This can be accomplished by sorting hackers within each date based on submission counts descending and hacker IDs ascending, consistently selecting the top-ranked hacker for each submission date block.

Conversely, the second challenge, which demands tracking the count of hackers submitting daily, is more intricate. To tackle this, the chosen approach utilizes window functions to streamline the task of calculating consecutive submission days for each hacker. This involves employing DENSE_RANK to determine the index of the submission day's and comparing it to the ROW_NUMBER for a given hacker_id for this date. It's important to note that this approach assumes at least one hacker submits every day; otherwise, DENSE_RANK may not function as intended. Additionally, while DATEPART could replace DENSE_RANK under the specific conditions of this problem where dates begin at the month's first day, it would result in a less versatile solution that would not work if start or end dates change.

Finally, the solution integrates all components and incorporates the hackers' table to extract their names, completing the task.


-- Generate auxiliary CTEs
WITH daily_subs AS (
        SELECT hacker_id, submission_date, COUNT(*) AS num_subs
        FROM submissions
        GROUP BY hacker_id, submission_date),
    
    all_days_subs AS (
        SELECT submission_date, hacker_id
        FROM (SELECT submission_date, 
                     hacker_id,
                     DENSE_RANK() OVER(ORDER BY submission_date ASC) AS day_num, 
                     ROW_NUMBER() OVER (PARTITION BY hacker_id
                                        ORDER BY submission_date ASC) AS streak
              FROM daily_subs) AS streak_check
        WHERE day_num = streak),
        
    num_all_day_subs AS (
        SELECT submission_date, COUNT(*) AS num_subs
        FROM all_days_subs
        GROUP BY submission_date),
    
    daily_ranking AS (
        SELECT submission_date, hacker_id,
               ROW_NUMBER() OVER(PARTITION BY submission_date 
                                 ORDER BY num_subs DESC, hacker_id ASC) AS rank
        FROM daily_subs),
    
    daily_winners AS (
        SELECT submission_date, hacker_id
        FROM daily_ranking
        WHERE rank = 1)

-- Main query
SELECT ads.submission_date, num_subs, h.hacker_id, name
FROM num_all_day_subs AS ads
LEFT JOIN daily_winners AS dw
    ON ads.submission_date = dw.submission_date
LEFT JOIN hackers as h
    ON dw.hacker_id = h.hacker_id
ORDER BY ads.submission_date ASC;