Coding interview (StrataScratch)

Table of contents

  1. Coding Questions
    1. Easy
      1. Python
        1. Amazon - Return unique list of unique id of April or May
        2. Spotify - Aggregate Listening Data (ID: 10367)
  2. Non-coding Questions (System Designs, Modeling, Statistics…)

Coding Questions

Easy

Python

Amazon - Return unique list of unique id of April or May

transaction_id  signup_id  transaction_start_date  amt
int             int        datetime                float

Question:

Solution:

transactions["signup_id"][transactions["transaction_start_date"].dt.month.isin([4,5])].unique()

'''
The returned value is a NumPy array (not a pandas Series), so it has NO column name
- transaction_start_date contains dates as YYYY-MM-dd -> we only care about April and May, so .dt.month.isin([4,5]) -> returns a Series of True/False that acts as the row mask in the second []
- signup_id: the target column to be returned
- unique(): keeps only the distinct values and drops the Series name and index

'''
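
To check the one-liner above, here is a minimal sketch on made-up data. The sample rows and the names in_april_or_may and unique_ids are assumptions for illustration only; the column names follow the schema above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the schema above.
transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4, 5],
    "signup_id": [10, 11, 10, 12, 13],
    "transaction_start_date": pd.to_datetime(
        ["2020-04-03", "2020-05-17", "2020-05-20", "2020-06-01", "2020-03-15"]
    ),
    "amt": [9.99, 24.50, 9.99, 5.00, 12.00],
})

# Boolean mask: True where the transaction month is April (4) or May (5).
in_april_or_may = transactions["transaction_start_date"].dt.month.isin([4, 5])

# Keep signup_id for the matching rows, then drop duplicates.
unique_ids = transactions.loc[in_april_or_may, "signup_id"].unique()

print(type(unique_ids).__name__)  # ndarray
print(unique_ids.tolist())        # [10, 11]
```

Splitting the mask into its own variable makes it easier to inspect which rows were kept before unique() collapses them.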

Spotify - Aggregate Listening Data (ID: 10367)

user_id  song_id  listen_duration
int      int      float

Question: Capture Spotify users' listening habits

  • Per user, find the total listening time and the count of unique songs they listened to
  • Return the total listening time in minutes, not seconds
  • The returned columns should be: user_id, total_listen_duration, unique_song_count

Solution:

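  • First set all missing listening durations to 0 (i.e. replace NaN values with 0)
  • Group by user_id, summing listen_duration and counting the distinct song_id values, then reset_index so user_id becomes a regular column again
  • Convert the total listening duration from seconds to minutes, rounded to the nearest minute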

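# Replace NaN values with 0 (assign back rather than using inplace=True on a column selection)
listening_habits['listen_duration'] = listening_habits['listen_duration'].fillna(0)

# Group by user_id: total listening time (still in seconds) and number of distinct songs
df = listening_habits.groupby('user_id').agg(
    total_listen_duration=('listen_duration', 'sum'),
    unique_song_count=('song_id', 'nunique')
).reset_index()

# Convert the total from seconds to minutes, rounded to the nearest minute
df['total_listen_duration'] = df['total_listen_duration'].apply(lambda x: round(x / 60))

df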
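
As a quick sanity check, the same steps can be run on a small made-up table. The sample rows and the name summary are assumptions for illustration; this sketch also uses a vectorized division and .round() in place of apply, which should give the same minute values here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the schema above.
listening_habits = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "song_id": [100, 101, 100, 200, 201],
    "listen_duration": [120.0, np.nan, 240.0, 90.0, 30.0],  # seconds
})

# Treat missing durations as 0 seconds.
listening_habits["listen_duration"] = listening_habits["listen_duration"].fillna(0)

# One row per user: total seconds listened and number of distinct songs.
summary = (
    listening_habits.groupby("user_id")
    .agg(
        total_listen_duration=("listen_duration", "sum"),
        unique_song_count=("song_id", "nunique"),
    )
    .reset_index()
)

# Seconds -> minutes, rounded to the nearest minute.
summary["total_listen_duration"] = (summary["total_listen_duration"] / 60).round()

print(summary)
```

With this data, user 1 has 360 seconds (6 minutes) across 2 distinct songs, and user 2 has 120 seconds (2 minutes) across 2 distinct songs.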

Non-coding Questions (System Designs, Modeling, Statistics…)

