Live Analysis · Complete

K9 Resorts Customer
Lifetime Value

ContextPet Daycare & Boarding — Bethlehem, PA
Data3 Years · Gingr PMS Backup
InfrastructureAWS RDS · EC2 · S3
MySQLAWS RDSAWS EC2 AWS S3PythonPandas RFM SegmentationBG/NBD Modeling Cohort AnalysisCLV Modeling Price ElasticityLabor Efficiency

Overview

This is a real-world, end-to-end analytics engagement on three years of operational data from a pet daycare and boarding facility in Bethlehem, PA — a business I help manage. Starting from a raw 2.4GB MySQL backup exported from the Gingr pet business management platform, I built a full cloud data pipeline on AWS, cleaned and imported the data, and ran a complete CLV and customer analytics suite across 2,757 customers and $4.4M in revenue.

2,757 Total Customers
$4.4M Total Revenue (3 Yrs)
$1,606 Avg Historical CLV
Median: $676 — heavy right skew
228 Avg Tenure (days)
Median: 95 days
17.1 Avg Transactions / Customer
$186.89 Avg Transaction Value
43,601 Total Transactions

Infrastructure & Data Pipeline

The entire pipeline runs on AWS. The raw Gingr backup — a 2.4GB MySQL dump with DEFINER clauses incompatible with RDS — was staged in S3, stripped and preprocessed on EC2, and loaded into a MySQL 8.4 database on RDS. The analysis layer connects directly to RDS via Python, ensuring reproducibility and a clean separation between storage, compute, and analytics.

01
Gingr Export → S3
2.4GB raw MySQL dump staged in Amazon S3 for durable, versioned storage
02
EC2 Preprocessing
DEFINER stripping, character set normalization, archive table skip — Python on EC2
03
RDS Import
MySQL 8.4 on RDS · us-east-2 · ~3,950 owners · ~43,601 reservations loaded
04
Analysis Layer
Pandas + Python queries RDS directly · outputs CSVs for modeling and visualization
Engineering Note

The hardest part wasn't the analysis — it was the import. The Gingr dump used MySQL DEFINER syntax that RDS rejects, and a stalled import at the archive tables required writing a selective import script that skipped those tables entirely. The result mirrors a production data warehouse pattern: raw in S3, compute on EC2, persistent analytical store on RDS.


RFM Segmentation & Revenue Concentration

I built an RFM (Recency, Frequency, Monetary) model by scoring each of the 2,757 customers on three dimensions — days since last purchase, total transaction count, and total revenue — and bucketing each into quintiles. Composite RFM scores map to five business segments with distinct behavioral and revenue profiles.

Segment Definitions

SegmentRFM ScoreCustomers% of CustomersRevenue% of RevenueAvg CLVAvg Recency
Champion 13–1558721.3% $2,909,49765.7% $4,95748 days
Loyal 10–1264523.4% $972,75122.0% $1,508197 days
Needs Attention 7–973526.7% $443,42010.0% $603354 days
At Risk 4–663723.1% $103,3082.3% $162594 days
Lost 31535.5% $86~0% $1797 days
Revenue Concentration by Customer Segment
Revenue by segment bar chart

The single most important finding in this analysis: 587 Champions — just 21% of the customer base — generate 65.7% of total revenue. The remaining 2,170 customers split the other 34.3%. Losing one Champion costs roughly as much as acquiring five average customers. Protecting this segment is the highest-ROI activity the business can pursue.

Revenue Concentration — Pareto Analysis
Pareto curve

The Pareto curve confirms extreme revenue concentration. The top 29% of customers by revenue drive 80% of total revenue — well beyond the classic 80/20 rule. This is consistent with premium pet service businesses where a core group of frequent boarding customers accounts for a disproportionate share of annual spend.

CLV Distribution by Segment (capped at $15K)
CLV distribution box plot

Champion CLV variance is enormous — the IQR spans roughly $2,400 to $6,000, with outliers exceeding $15K. This means there are Champions worth 20–30× the median customer. Loyal customers show a tighter distribution centered around $1,200–$1,500, suggesting a more predictable, mid-tier booking pattern. Needs Attention, At Risk, and Lost customers cluster near zero, confirming these segments represent lapsed or single-visit relationships.

Customer Frequency vs. Revenue — RFM Scatter
RFM scatter plot

Champions cluster in the upper-right quadrant — high frequency (many transactions) and high total revenue — with the most extreme outliers reaching $40K+ in lifetime revenue. The Needs Attention (pink) and At Risk (orange) clusters concentrate in the bottom-left, with low transaction counts and low revenue. The visual separation of segments confirms that the RFM scoring is meaningfully discriminating customer behavior.

Single-Pet vs. Multi-Pet Household Analysis
Pet analysis comparison

Multi-pet households (2+ animals) generate 65% more CLV ($2,257 vs. $1,364) and equivalent total revenue advantage. Interestingly, average transaction count is slightly higher for single-pet households (17.6 vs. 15.7), suggesting multi-pet households may book longer stays rather than more frequent visits — a meaningful distinction for package design. Targeting multi-pet households in acquisition and retention campaigns is a clear lever.


Cohort Retention Analysis

I built a monthly cohort table tracking what percentage of customers acquired in each month were still active 1–12 months later. The heatmap covers acquisition cohorts from June 2023 through March 2025, and the retention curve averages across all cohorts to surface the structural churn pattern.

Customer Tenure Distribution
Tenure distribution histogram

Tenure is dramatically right-skewed. The modal customer has tenure under 50 days — meaning the single most common customer profile is someone who visited once or twice and never returned. The mean (228 days) is pulled far above the median (95 days) by a long-tenured tail of loyal repeat customers. This is a churn problem, not a revenue problem. The revenue is there — it's concentrated in a small cohort of long-tenure customers who stayed.

Cohort Retention Heatmap — % Still Active by Month
Cohort retention heatmap

The heatmap reveals two structural retention patterns. Early cohorts (Jun–Aug 2023) show unusually high month-1 retention (81% for Aug 2023 cohort) — these were founding-era customers with novelty-driven repeat visits. Later cohorts stabilize at 30–50% month-1 retention, which is closer to the structural baseline. The dark-red (low retention) dominance across all cohorts after month 3 confirms the churn cliff happens early — within the first quarter of the customer relationship.

Average Customer Retention Curve (Month 0–12)
Retention curve

The average retention curve shows a steep initial drop: 100% at acquisition, 42% at month 1, 33% at month 2, and plateauing around 20–23% from months 6–12. This is a classic "fast churn / stable core" pattern — the majority of customers churn within the first two visits, but those who reach month 6 tend to stay. The implication is clear: the highest-ROI retention intervention is in months 1–3, before the plateau sets in.

Retention Opportunity

If the business could shift average month-1 retention from 42% to 52% — a 10-point improvement — and the retention curve held its shape thereafter, the projected impact on 12-month cumulative retention would be meaningful. A post-first-visit follow-up sequence (SMS + email at day 14, day 30, day 60) targeting the drop-off window directly addresses the observed churn cliff.

New Customer Acquisition by Month
Acquisition trend

Acquisition peaked during the facility's growth phase in late 2023 (August 2023 saw 239 new customers — the single highest month). Since then, monthly acquisition has trended down toward a 50–90 customer baseline, consistent with a maturing local market. The seasonal pattern shows summer months driving acquisition spikes, likely from owners needing boarding for vacation travel. The Feb 2026 acquisition level of ~45/month suggests the business has largely captured its immediate catchment area and growth will require either expanded marketing reach or geographic expansion.


Service Mix & CLV by Category

K9 Resorts offers three primary service lines: boarding (overnight stays priced by room tier), daycare (flat daily rate), and grooming (add-on service). I segmented each customer's primary service category by their highest-revenue service line and compared CLV, total revenue, and tenure across categories.

CLV by Primary Service Category
CLV by service mix

Boarding customers generate the highest average CLV ($2,109) and comparable tenure to daycare customers (282 vs. 287 days). Daycare customers are close behind at $1,692 CLV — and their tenure is effectively identical to boarding, suggesting daycare builds just as durable a relationship. Grooming and Other categories generate a fraction of the CLV ($433–$437) with roughly half the tenure, consistent with their role as single-service or occasional-use customers rather than relationship anchors.

Total Revenue by Service Category (3 Years)
Revenue by service

Boarding dominates total revenue at $0.29M in the analysis window, with daycare contributing $0.03M. Grooming and Other categories are negligible at this scale. This confirms that boarding is the core business — the product with the highest price point, highest CLV, and highest total revenue contribution. Daycare is a meaningful relationship-builder but represents a secondary revenue stream. Any pricing, retention, or capacity strategy should be anchored in the boarding segment.


Price Sensitivity & Rate Analysis

K9 Resorts implemented two rate increases: February 2025 and February 2026. I analyzed whether these increases caused volume loss (price-elastic demand) or were absorbed without behavioral change (price-inelastic). The data covers all four boarding room tiers: Executive Room, Luxury Suite, Double Compartment, and Single Compartment.

Feb 2025 Rate Increase — Before vs. After (3-Month Avg)

Room TypePre RatePost RateRate ChangePre VolumePost VolumeVol Change
Executive Room$76$85+11.5%183/mo213/mo+16.0%
Luxury Suite$85$95+12.1%40/mo45/mo+10.7%
Double Compartment$64$70+10.2%28/mo41/mo+43.5%
Single Compartment$61$65+6.9%24/mo25/mo+5.6%
Nightly Rate by Room Type — With Price Change Events
Nightly rate trend

The Feb 2025 rate increase is clearly visible as a step-change across all four room types. The Luxury Suite shows the largest absolute rate increase (~$10/night) and has maintained its new pricing level consistently through the post-increase period. The Feb 2026 increase is also visible, pushing the Luxury Suite above $100/night for the first time.

Total Boarding Reservation Volume — With Price Change Events
Volume trend

This is the critical chart for pricing strategy: volume did not decline after the Feb 2025 rate increase. In fact, monthly reservations grew in the post-increase period, reaching a new peak of ~450 by mid-2025. This is strong evidence of price-inelastic demand — K9 Resorts' customers are not price-sensitive at these rate levels, likely because premium pet boarding is perceived as a necessity rather than a luxury for this customer base.

Feb 2025 Price Increase — Before vs. After Detail
Before vs after comparison

The before/after comparison across all three metrics — nightly rate, reservation volume, and revenue per stay — is uniformly positive. Every room type shows rate increases of 7–12%, volume increases of 6–44%, and stay revenue increases of 6–13%. The combination of higher prices and higher volume means revenue per stay grew on every tier. The Double Compartment's 43.5% volume increase is likely capacity-driven rather than purely price-driven, but the absence of any volume decline across any tier is the key takeaway.

Pricing Conclusion

Demand for K9 Resorts boarding is price-inelastic at current rate levels. Both the Feb 2025 and Feb 2026 increases were absorbed without volume loss. This validates a continued premium pricing strategy and suggests room for further rate increases — particularly in the Luxury Suite tier, which has the most price-sensitive customer profile but maintained volume through both increases.


Labor Efficiency Analysis

I joined 31 months of ADP payroll data to monthly revenue to build a labor efficiency model. The key metrics are labor cost as a percentage of revenue, revenue generated per labor dollar spent, and revenue per labor hour — all of which proxy for operational leverage.

43.9% Avg Labor % of Revenue
Range: 23.8% – 96.7%
$2.59 Revenue per Labor Dollar
Avg across 31 months
$47.30 Revenue per Labor Hour
Based on ADP hours data
Labor Cost vs. Revenue — 31 Months

Labor as % of revenue varies dramatically month to month — from a best-case 23.8% to a worst-case 96.7%. This volatility is driven by seasonality: low-revenue winter months maintain fixed staffing, compressing margins. High-revenue summer months with peak boarding demand spread the same labor base across more revenue, improving efficiency. The goal is to minimize labor% in low-season months through scheduling optimization and shift toward more variable labor structures (part-time, on-call) during trough periods.

Labor Risk

The worst-case month (96.7% labor-to-revenue) is essentially a break-even month after accounting only for labor. With fixed overhead (rent, utilities, franchise fees, debt service) on top, months below ~40% revenue capacity are likely operating at a loss. Building a labor scheduling model that flags when projected revenue drops below the break-even threshold — and triggers proactive hour-reduction — is the highest-impact operational improvement available.


BG/NBD Predictive Modeling

The BG/NBD (Beta Geometric / Negative Binomial Distribution) model estimates the probability that each customer is still "alive" (has not permanently churned) and predicts their expected purchases over the next 12 months. Unlike the historical CLV model, BG/NBD is forward-looking — it tells us who to focus retention resources on today.

1,102 Customers Scored
Active-enough for BG/NBD input
168 Prob Alive > 50%
15.2% of scored customers
$165 Avg Predicted CLV / Year
Across all scored customers

The model scores each customer on probability of still being active, expected purchase frequency over the next year, and expected revenue. The 168 customers with probability-alive above 50% represent the core active base — the group most likely to generate near-term revenue and most worth protecting with proactive outreach.

Top 10 Customers by Predicted 1-Year CLV
CustomerProb AlivePredicted Purchases / YrPredicted CLV / YrHistorical Revenue
Courtney85.6%19.0$5,821$1,548
Jenene78.9%3.1$4,961$2,960
Rae85.8%19.5$3,968$374
Morgan88.1%4.3$3,604$2,537
Pat83.3%6.4$3,462$2,269
Marilyn90.0%36.3$3,308$670
Jane56.6%4.2$3,278$4,907
Jim90.5%11.9$2,817$989
Denise93.8%9.5$2,728$3,364
Jeffrey76.6%11.8$2,618$612

Several customers (Courtney, Rae, Marilyn) have high predicted future CLV despite relatively low historical revenue — the model detects high-frequency, recent visit patterns that signal strong forward purchase intent. Jane has the opposite profile: high historical revenue but only 56.6% probability alive, suggesting a customer who may be drifting. These asymmetries between historical and predicted CLV are exactly what makes forward-looking modeling valuable for prioritizing retention outreach.


Strategic Recommendations

Across all analyses — RFM segmentation, cohort retention, service mix, pricing, labor efficiency, and BG/NBD modeling — four high-priority recommendations emerge for the business.

01 · High Priority
Protect the Champions
587 customers = 65.7% of revenue. These are the business. Build a VIP tier with early booking windows, dedicated check-in, proactive scheduling outreach before their dogs' typical travel dates, and loyalty recognition. The cost of losing one Champion — avg $4,957 CLV — exceeds the cost of retaining 10 average customers. Use the BG/NBD "prob alive" scores to flag Champions showing early churn signals before they're gone.
02 · High Priority
Win Back Needs Attention
735 customers with avg recency of 354 days and $603 avg CLV represent the largest winback opportunity. Converting just 20% (147 customers) back to Loyal status preserves ~$150K in annual revenue. Run a personalized reactivation campaign — "We miss [dog name]" — timed to the anniversary of their last visit. The first return visit is the conversion event; the offer should remove friction (package deal, complimentary assessment), not just discount price.
03 · Medium Priority
Target Multi-Pet Households
Multi-pet households generate 65% more CLV ($2,257 vs. $1,364). Adjust paid acquisition targeting to emphasize households with 2+ pets — Google/Meta audience filters, vet office partnerships, local dog park flyering. Design a multi-pet referral incentive where the refer-a-friend discount scales with the referred household's pet count. The 20% multi-dog boarding discount already exists; make it more visible in acquisition channels.
04 · Medium Priority
Build a CLV Monitoring Dashboard
The analysis is a snapshot; the value is in continuous tracking. Instrument Gingr with monthly RFM re-scoring: automate the customer_base query → score → segment pipeline to run on the 1st of each month. Build an alert when a Champion's recency exceeds 60 days or their RFM score drops below 13. Intervene before churn — not after. The data infrastructure (RDS → Python → CSV) already exists; this is an automation task, not a new analytical build.
← back to all projects