sql - Selecting a subset of rows that exceed a percentage of total values -


i have table customers, users , revenue similar below (in reality thousands of records):

customer   user    revenue 001        james   500 002        james   750 003        james   450 004        sarah   100 005        sarah   500 006        sarah   150 007        sarah   600 008        james   150 009        james   100 

what want return highest spending customers make 80% of total revenue user.

to manually order james' customers revenue, work out percentage of total , running total percentage, return records point running total hits 80%:

customer    user    revenue     % of total  running total % 002         james   750         0.38        0.38  001         james   500         0.26        0.64  003         james   450         0.23        0.87  <- greater 80%, last record 008         james   150         0.08        0.95  009         james   100         0.05        1.00  

i've tried using cte far have come blank. there way via single query rather manually in excel sheet?

sql server 2012+ only

you use windowed sum:

with cte (    select *,           1.0 * revenue/sum(revenue) over(partition [user]) percentile,           1.0 * sum(revenue) over(partition [user] order [revenue] desc)                 /sum(revenue) over(partition [user]) running_percentile    tab ) select * cte  running_percentile <= 0.8; 

livedemo


sql server 2008:

with cte (     select *, row_number() over(partition [user] order revenue desc) rn     t     ), cte2 (     select c.customer, c.[user], c.[revenue]            ,percentile         = 1.0 * revenue / nullif(c3.s,0)            ,running_percentile = 1.0 * c2.s    / nullif(c3.s,0)     cte c     cross apply          (select sum(revenue) s           cte c2           c.[user] = c2.[user]             , c2.rn <= c.rn) c2     cross apply          (select sum(revenue) s           cte c2           c.[user] = c2.[user]) c3 )  select * cte2 running_percentile <= 0.8; 

livedemo2

output:

╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗ ║ customer ║ user  ║ revenue ║   percentile   ║ running_percentile ║ ╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣ ║        2 ║ james ║     750 ║ 0,384615384615 ║ 0,384615384615     ║ ║        1 ║ james ║     500 ║ 0,256410256410 ║ 0,641025641025     ║ ║        7 ║ sarah ║     600 ║ 0,444444444444 ║ 0,444444444444     ║ ╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝ 

edit 2:

that looks there, niggle it's missing last row, third row james takes him on 0.80 needs included.

with cte (     select *, row_number() over(partition [user] order revenue desc) rn     t     ), cte2 (     select c.customer, c.[user], c.[revenue]            ,percentile         = 1.0 * revenue / nullif(c3.s,0)            ,running_percentile = 1.0 * c2.s    / nullif(c3.s,0)     cte c     cross apply          (select sum(revenue) s           cte c2           c.[user] = c2.[user]             , c2.rn <= c.rn) c2     cross apply          (select sum(revenue) s           cte c2           c.[user] = c2.[user]) c3 )  select a.* cte2 cross apply (select min(running_percentile) rp              cte2              running_percentile >= 0.8                , cte2.[user] = a.[user]) s a.running_percentile <= s.rp; 

livedemo3

output:

╔══════════╦═══════╦═════════╦════════════════╦════════════════════╗ ║ customer ║ user  ║ revenue ║   percentile   ║ running_percentile ║ ╠══════════╬═══════╬═════════╬════════════════╬════════════════════╣ ║        2 ║ james ║     750 ║ 0,384615384615 ║ 0,384615384615     ║ ║        1 ║ james ║     500 ║ 0,256410256410 ║ 0,641025641025     ║ ║        3 ║ james ║     450 ║ 0,230769230769 ║ 0,871794871794     ║ ║        7 ║ sarah ║     600 ║ 0,444444444444 ║ 0,444444444444     ║ ║        5 ║ sarah ║     500 ║ 0,370370370370 ║ 0,814814814814     ║ ╚══════════╩═══════╩═════════╩════════════════╩════════════════════╝ 

looks perfect, translated big table , returns need, spent 5 minutes working through , still can't follow you've done!

sql server 2008 not support in over() clause, row_number does.

first cte calculate position within group:

╔═══════════╦════════╦══════════╦════╗ ║ customer  ║ user   ║ revenue  ║ rn ║ ╠═══════════╬════════╬══════════╬════╣ ║        2  ║ james  ║     750  ║  1 ║ ║        1  ║ james  ║     500  ║  2 ║ ║        3  ║ james  ║     450  ║  3 ║ ║        8  ║ james  ║     150  ║  4 ║ ║        9  ║ james  ║     100  ║  5 ║ ║        7  ║ sarah  ║     600  ║  1 ║ ║        5  ║ sarah  ║     500  ║  2 ║ ║        6  ║ sarah  ║     150  ║  3 ║ ║        4  ║ sarah  ║     100  ║  4 ║ ╚═══════════╩════════╩══════════╩════╝ 

second cte:

  • c2 subquery calculate running total based on rank row_number
  • c3 calculate full sum per user

in final query s subquery finds lowest running total exceeds 80%.

edit 3:

using row_number redundant.

with cte (     select c.customer, c.[user], c.[revenue]            ,percentile         = 1.0 * revenue / nullif(c3.s,0)            ,running_percentile = 1.0 * c2.s    / nullif(c3.s,0)     t c     cross apply          (select sum(revenue) s           t c2           c.[user] = c2.[user]             , c2.revenue >= c.revenue) c2     cross apply          (select sum(revenue) s           t c2           c.[user] = c2.[user]) c3 )  select a.* cte cross apply (select min(running_percentile) rp              cte c2              running_percentile >= 0.8                , c2.[user] = a.[user]) s a.running_percentile <= s.rp order [user], revenue desc; 

livedemo4


Comments