Week 7 - Valentine’s Day Consumer Data
(ns tidy-tuesdays.2024.week07.valentines
  (:require
   [aerial.hanami.templates :as ht]
   [clojure.string :as str]
   [scicloj.noj.v1.vis.hanami :as hanami]
   [tablecloth.api :as tc]
   [scicloj.kindly.v4.kind :as kind]))
Introduction
This week’s datasets, from the National Retail Federation in the United States, cover consumer spending on Valentine’s Day. There are three datasets, loaded below.
I discovered later that the CSV files contain a BOM (byte order mark), so the clean-keyword function strips it from the column names.
(defn clean-keyword [keyword]
  (str/replace keyword #"[\"\" \p{C}]" ""))
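As a minimal sketch of what this cleaning step does, the snippet below applies the same regex to a header string that starts with a BOM. The header value and the name `strip-header-noise` are assumptions for illustration; the real headers come from the CSV files.

```clojure
(require '[clojure.string :as str])

;; Same regex as clean-keyword: strips double quotes, spaces, and any
;; character in Unicode category C (which includes the BOM, U+FEFF).
(defn strip-header-noise [s]
  (str/replace s #"[\"\" \p{C}]" ""))

;; Hypothetical first header as it might be read from a BOM-prefixed CSV.
(def raw-header "\uFEFFYear")

(def cleaned (strip-header-noise raw-header))

(println (count raw-header) (count cleaned)) ; prints 5 4
(println (keyword cleaned))                  ; prints :Year
```

Without the cleaning step, the first column would be keyed by a keyword containing an invisible character, so a lookup such as `(:Year row)` would quietly return nil.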
(def historical_spending (-> "data/2024/week07/historical_spending.csv"
                             (tc/dataset {:key-fn (comp keyword clean-keyword)})))
(def gifts_age (-> "data/2024/week07/gifts_age.csv"
                   (tc/dataset {:key-fn (comp keyword clean-keyword)})))
(def gifts_gender (-> "data/2024/week07/gifts_gender.csv"
                      (tc/dataset {:key-fn (comp keyword clean-keyword)})))
Spending Trends
Let’s start with something very basic - the trend for average spending per year.
(-> historical_spending
    (hanami/plot ht/line-chart
                 {:TITLE "Avg Spend Per Year"
                  :Y :PerPerson :YTYPE :quantitative
                  :X :Year :XTYPE :temporal
                  :SIZE 3}))
A dip in 2021, presumably due to the pandemic.
Next, the average spend per item type.
First let’s restructure the data from a format that looks like {:Year 2020 :Candy 23 :Flowers 56 ...}
to a format like this: [{:year 2020 :type "Candy" :value 23} ...]
(def restructured-historical
  (reduce (fn [acc row]
            (let [year (:Year row)
                  vals (dissoc row :Year)]
              (into acc
                    (for [v vals
                          :let [[type val] v]]
                      {:year year
                       :type (name type)
                       :value val}))))
          []
          (-> historical_spending
              (tc/drop-columns [:PerPerson :PercentCelebrating])
              (tc/rows :as-maps))))
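To make the wide-to-long reshaping concrete, here is a small sketch in plain Clojure that converts a single row. The row reuses the illustrative values from the example format above rather than the real dataset, and `row->long` is a hypothetical helper name.

```clojure
;; Hypothetical single row, using the values from the example format.
(def row {:Year 2020 :Candy 23 :Flowers 56})

;; Emit one {:year :type :value} map per item column in the row.
(defn row->long [row]
  (let [year (:Year row)]
    (for [[k v] (dissoc row :Year)]
      {:year year :type (name k) :value v})))

(def long-rows (vec (row->long row)))

(doseq [r long-rows]
  (println r))
```

The reduce in the notebook code does the same thing for every row and accumulates the results into one flat vector.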
(-> restructured-historical
    (hanami/plot ht/line-chart
                 {:X :year :XTYPE :ordinal
                  :Y :value :YTYPE :quantitative :YTITLE "Avg Spend"
                  :COLOR {:field :type :type :nominal :sort "-y" :title "Item"}
                  :SIZE 3
                  :WIDTH 500}))
Let’s rank the items by average spend over the period:
(def items-avg-spend
  (-> restructured-historical
      (tc/dataset)
      (tc/group-by :type)
      (tc/aggregate {:avg-spend #(/ (reduce + (% :value)) (count (% :value)))})
      (tc/rename-columns {:$group-name :Item})))
(-> items-avg-spend
    (hanami/plot ht/bar-chart
                 {:X :Item :XTYPE :nominal :XSORT "-y"
                  :Y :avg-spend :YTYPE :quantitative}))
Gender
(kind/table gifts_gender)
:Gender | :SpendingCelebrating | :Candy | :Flowers | :Jewelry | :GreetingCards | :EveningOut | :Clothing | :GiftCards |
---|---|---|---|---|---|---|---|---|
Men | 27 | 52 | 56 | 30 | 37 | 33 | 20 | 18 |
Women | 27 | 59 | 19 | 14 | 43 | 29 | 24 | 24 |
(defn items-split [data color]
  (reduce (fn [result entry]
            (let [colr (color entry)
                  ks (keys (dissoc entry color))]
              (conj result
                    (for [k ks]
                      {color colr
                       :Percent (k entry)
                       :Item (name k)}))))
          [] data))
(kind/vega
 {:$schema "https://vega.github.io/schema/vega-lite/v5.json"
  :data {:values
         (reduce concat
                 (items-split (-> gifts_gender
                                  (tc/drop-columns :SpendingCelebrating)
                                  (tc/rows :as-maps))
                              :Gender))}
  :mark :bar
  :width 500
  :height 400
  :encoding {:x {:field :Item :sort "-y"
                 :axis {:labelAngle -45}}
             :y {:field :Percent :type :quantitative}
             :xOffset {:field :Gender}
             :color {:field :Gender :scale {:scheme "category20"}}}
  :title {:text "Valentine's Day"
          :subtitle "Spending Trends By Gender"}})
Age
(kind/md
 (let [most-likely-to-celebrate (-> gifts_age
                                    (tc/order-by :SpendingCelebrating :desc)
                                    :Age
                                    first)
       least-likely-to-celebrate (-> gifts_age
                                     (tc/order-by :SpendingCelebrating)
                                     :Age
                                     first)]
   (str "According to the data, the **"
        most-likely-to-celebrate
        "** age group were the most likely to celebrate Valentine's day, while the **"
        least-likely-to-celebrate
        "** age group were the least likely.")))
According to the data, the 18-24 age group were the most likely to celebrate Valentine’s day, while the 65+ age group were the least likely.
(kind/vega
 {:$schema "https://vega.github.io/schema/vega-lite/v5.json"
  :data {:values
         (reduce concat
                 (items-split (-> gifts_age
                                  (tc/drop-columns :SpendingCelebrating)
                                  (tc/rows :as-maps))
                              :Age))}
  :mark :bar
  :width 500
  :height 400
  :encoding {:x {:field :Item :sort "-y"
                 :axis {:labelAngle -45}}
             :y {:field :Percent :type :quantitative}
             :xOffset {:field :Age}
             :color {:field :Age :scale {:scheme "blues"}}}
  :title {:text "Valentine's Day"
          :subtitle "Spending Trends By Age Group"}})
As we can see, the most popular item was Candy, most likely to be bought by an 18-24 year old.
Combining some of the data
So far, we know that Jewelry has the highest average spend, but Candy is the most popular choice (at a lower cost). Let’s imagine we are doing market research: which item is the best to invest in, combining both of these data points? First, let’s get the average popularity of each item across the age groups. I will also multiply each item’s percentage by the ‘SpendingCelebrating’ column to get a ‘weighted’ average. (I’m making this up as I go along, so I’m not 100% sure about the methodology here.)
(def valentines-items (-> gifts_age
                          (tc/drop-columns [:Age :SpendingCelebrating])
                          keys))
(defn average [coll]
  (float
   (/ (reduce + coll) (count coll))))
(def item-average-popularity
  (tc/dataset
   (for [item valentines-items]
     (let [weighted-avgs
           (-> gifts_age
               (tc/map-columns :target
                               [:SpendingCelebrating item]
                               #(* %1 %2))
               :target)]
       {:Item (name item)
        :Average-popularity
        (/
         (average weighted-avgs) 100)}))))
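The weighting can be traced by hand with a small plain-Clojure sketch. The two age groups and their percentages below are made up for illustration and are not taken from the dataset; `weighted-popularity` is a hypothetical helper that mirrors the calculation above for a single item.

```clojure
;; Hypothetical percentages for two age groups; not real survey values.
(def groups [{:Age "A" :SpendingCelebrating 50 :Candy 40}
             {:Age "B" :SpendingCelebrating 30 :Candy 60}])

(defn average [coll]
  (float (/ (reduce + coll) (count coll))))

;; Multiply each group's item percentage by its SpendingCelebrating
;; percentage, average the products, then scale back down by 100.
(defn weighted-popularity [rows item]
  (let [weighted (map #(* (:SpendingCelebrating %) (item %)) rows)]
    (/ (average weighted) 100)))

(println (weighted-popularity groups :Candy)) ; prints 19.0
```

Here the products are 2000 and 1800, their average is 1900, and dividing by 100 gives 19.0. Because `average` coerces to a single-precision float, results that are not exactly representable will probably show trailing noise, which likely explains values such as 7.766666870117188 in the table below.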
Next, let’s join with the spending dataset…
(kind/table
 (-> item-average-popularity
     (tc/inner-join items-avg-spend :Item)))
:Item | :Average-popularity | :avg-spend |
---|---|---|
Candy | 17.39 | 12.83769230769231 |
Flowers | 11.975 | 14.653076923076924 |
Jewelry | 7.766666870117188 | 32.54615384615384 |
GreetingCards | 10.748333740234376 | 7.676153846153847 |
EveningOut | 9.873333129882813 | 27.46769230769231 |
Clothing | 7.4116668701171875 | 14.935384615384619 |
GiftCards | 6.12 | 11.503076923076922 |
Now, let’s calculate a ‘score’ (popularity * cost), and sort the items by the highest score:
(kind/table
 (-> item-average-popularity
     (tc/inner-join items-avg-spend :Item)
     (tc/map-columns :score [:Average-popularity :avg-spend]
                     #(int (* %1 %2))) ;; Truncating to whole numbers for aesthetic purposes in the table.
     (tc/order-by :score :desc)))
:Item | :Average-popularity | :avg-spend | :score |
---|---|---|---|
EveningOut | 9.873333129882813 | 27.46769230769231 | 271 |
Jewelry | 7.766666870117188 | 32.54615384615384 | 252 |
Candy | 17.39 | 12.83769230769231 | 223 |
Flowers | 11.975 | 14.653076923076924 | 175 |
Clothing | 7.4116668701171875 | 14.935384615384619 | 110 |
GreetingCards | 10.748333740234376 | 7.676153846153847 | 82 |
GiftCards | 6.12 | 11.503076923076922 | 70 |
Evening Out is the winner!🎉
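As a rough cross-check of the ranking, the score can be recomputed in plain Clojure for the top three items. The values are copied from the joined table and rounded to two decimals, so this is only a sketch of the calculation, not a replacement for the tablecloth pipeline; the `:popularity` key is a shortened name used here for brevity.

```clojure
;; Values copied (rounded to two decimals) from the joined table.
(def joined [{:Item "Candy"      :popularity 17.39 :avg-spend 12.84}
             {:Item "Jewelry"    :popularity 7.77  :avg-spend 32.55}
             {:Item "EveningOut" :popularity 9.87  :avg-spend 27.47}])

;; score = popularity * cost; int truncates toward zero rather than rounding.
(defn score [{:keys [popularity avg-spend]}]
  (int (* popularity avg-spend)))

(def ranked
  (->> joined
       (map #(assoc % :score (score %)))
       (sort-by :score >)
       (mapv (juxt :Item :score))))

(println ranked) ; prints [[EveningOut 271] [Jewelry 252] [Candy 223]]
```

Jewelry's product is roughly 252.9, so the 252 in the table reflects truncation by `int`; rounding would have given 253 without changing the order.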