Filter according to cumulative value
Task:Find the top n customers whose sales accounts for half of the total sales, and rank them by sales in descending order.
Python
1 | import pandas as pd |
2 | sale_file = "E:\\txt\\sales_volume.csv" |
3 | sale_info = pd.read_csv(sale_file) |
4 | sale_info.sort_values(by='Amount',inplace=True,ascending=False) |
5 | half_amount = sale_info['Amount'].sum()/2 |
6 | vip_list = [] |
7 | amount = 0 |
8 | for client_info in sale_info.itertuples(): |
9 | amount += getattr(client_info, 'Amount') |
10 | if amount < half_amount: |
11 | vip_list.append(getattr(client_info, 'Client')) |
12 | else: |
13 | vip_list.append(getattr(client_info, 'Client')) |
14 | break |
15 | print(vip_list) |
Pandas does not have an existing loop function to calculate the position that meets the conditions, so it can only use for loop to complete the calculation.
esProc
A | ||
1 | E:\\txt\\sales_volume.csv | Data storage path |
2 | =file(A1).import@tc().sort@z(Amount) | Import data and sort by amount in descending order |
3 | =A2.sum(Amount)/2 | Calculate half of total sales |
4 | =A2.pselect(cum(Amount)>=A3) | Find the position where the cumulative sales is more than half of total sales |
5 | =A2.(Client).m(:A4) | Take the client names before the position |
It is recommended to use loop function to calculate in esProc, and filter and record the position information according to the conditions.