Binary file, simple format, no need to define data structure
Application scenarios
- Mediation for data dumping
- Temporary storage
- Storage of small data in calculations (such as dimension tables)
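As a minimal sketch of the dumping / temporary-storage scenarios above (the file names and the aggregation are assumed for illustration; export@b writes a bin file and import@b reads it back):
A | |
1 | =file("bigSource.txt").cursor@t().groups(state;sum(amount):amount) | aggregate the source data (file name assumed) |
2 | =file("temp.btx").export@b(A1) | dump the intermediate result to a bin file |
3 | =file("temp.btx").import@b() | read it back later without defining any data structure |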
select id, name from T where id = 1
select area,sum(amount) from T group by area
select max(consecutive_day)
from (select count(*) consecutive_day
      from (select sum(rise_or_fall) over(order by trade_date) day_no_gain
            from (select trade_date,
                         case when close_price > lag(close_price) over(order by trade_date)
                              then 0 else 1 end rise_or_fall
                  from stock_price))
      group by day_no_gain)
with ranked_prices as (
    select closing,
           lag(closing) over (order by closing) as previous_closing,
           row_number() over (order by closing desc) as rank
    from share62)
select closing as closing_price,
       closing / previous_closing - 1 as growth_rate
from ranked_prices
where rank <= 3
order by rank
with e1 as (
    select uid, 1 as step1, min(etime) as t1
    from event
    where etime >= to_date('2021-01-10') and etime < to_date('2021-01-25')
          and eventtype = 'eventtype1' and …
    group by 1),
e2 as (
    select uid, 1 as step2, min(e1.t1) as t1, min(e2.etime) as t2
    from event as e2
    inner join e1 on e2.uid = e1.uid
    where e2.etime >= to_date('2021-01-10') and e2.etime < to_date('2021-01-25')
          and e2.etime > t1 and e2.etime < t1 + 7
          and eventtype = 'eventtype2' and …
    group by 1),
e3 as (
    select uid, 1 as step3, min(e2.t1) as t1, min(e3.etime) as t3
    from event as e3
    inner join e2 on e3.uid = e2.uid
    where e3.etime >= to_date('2021-01-10') and e3.etime < to_date('2021-01-25')
          and e3.etime > t2 and e3.etime < t1 + 7
          and eventtype = 'eventtype3' and …
    group by 1)
select sum(step1) as step1, sum(step2) as step2, sum(step3) as step3
from e1
left join e2 on e1.uid = e2.uid
left join e3 on e2.uid = e3.uid
select top 10 x from T order by x desc
select * from (
    select *, row_number() over (partition by Area order by Amount desc) rn
    from Orders
) where rn <= 10
# positions of the top 3 closing prices (descending)
sort_pos = (-stock_62["CLOSING"]).argsort()
max3pos = sort_pos.iloc[:3]
# previous day's closing prices, aligned by shifting one row
stock_62s = stock_62.shift(1)
max3CL = stock_62["CLOSING"].iloc[max3pos]
max3CLs = stock_62s["CLOSING"].iloc[max3pos]
# growth rate of the top 3 closing prices over the previous day
max3_rate = max3CL / max3CLs - 1
print(max3_rate)
# longest run of consecutively rising prices
max_days = 0
current_days = 1
for i in range(1, len(stock_prices)):
    if stock_prices[i] > stock_prices[i - 1]:
        current_days += 1
    else:
        max_days = max(max_days, current_days)
        current_days = 1
max_days = max(max_days, current_days)
with ranked_prices as (
    select closing,
           lag(closing) over (order by closing) as previous_closing,
           row_number() over (order by closing desc) as rank
    from share62)
select closing as closing_price,
       closing / previous_closing - 1 as growth_rate
from ranked_prices
where rank <= 3
order by rank
# positions of the top 3 closing prices (descending)
sort_pos = (-stock_62["CLOSING"]).argsort()
max3pos = sort_pos.iloc[:3]
# previous day's closing prices, aligned by shifting one row
stock_62s = stock_62.shift(1)
max3CL = stock_62["CLOSING"].iloc[max3pos]
max3CLs = stock_62s["CLOSING"].iloc[max3pos]
# growth rate of the top 3 closing prices over the previous day
max3_rate = max3CL / max3CLs - 1
print(max3_rate)
A | |
1 | =share62.psort@z(CLOSING) |
2 | =A1.m(:3) |
3 | =share62.calc(A2,if(#==1,null,CLOSING/CLOSING[-1]-1)) |
select max(consecutive_day)
from (select count(*) consecutive_day
      from (select sum(rise_or_fall) over(order by trade_date) day_no_gain
            from (select trade_date,
                         case when close_price > lag(close_price) over(order by trade_date)
                              then 0 else 1 end rise_or_fall
                  from stock_price))
      group by day_no_gain)
# longest run of consecutively rising prices
max_days = 0
current_days = 1
for i in range(1, len(stock_prices)):
    if stock_prices[i] > stock_prices[i - 1]:
        current_days += 1
    else:
        max_days = max(max_days, current_days)
        current_days = 1
max_days = max(max_days, current_days)
A | |
1 | =stock.sort(tradeDate) |
2 | =0 |
3 | =A1.max(A2=if(closePrice>closePrice[-1],A2+1,0)) |
with e1 as (
    select uid, 1 as step1, min(etime) as t1
    from event
    where etime >= to_date('2021-01-10') and etime < to_date('2021-01-25')
          and eventtype = 'eventtype1' and …
    group by 1),
e2 as (
    select uid, 1 as step2, min(e1.t1) as t1, min(e2.etime) as t2
    from event as e2
    inner join e1 on e2.uid = e1.uid
    where e2.etime >= to_date('2021-01-10') and e2.etime < to_date('2021-01-25')
          and e2.etime > t1 and e2.etime < t1 + 7
          and eventtype = 'eventtype2' and …
    group by 1),
e3 as (
    select uid, 1 as step3, min(e2.t1) as t1, min(e3.etime) as t3
    from event as e3
    inner join e2 on e3.uid = e2.uid
    where e3.etime >= to_date('2021-01-10') and e3.etime < to_date('2021-01-25')
          and e3.etime > t2 and e3.etime < t1 + 7
          and eventtype = 'eventtype3' and …
    group by 1)
select sum(step1) as step1, sum(step2) as step2, sum(step3) as step3
from e1
left join e2 on e1.uid = e2.uid
left join e3 on e2.uid = e3.uid
A | |
1 | =["etype1","etype2","etype3"] |
2 | =file("event.ctx").open() |
3 | =A2.cursor(id,etime,etype;etime>=date("2021-01-10") && etime < date("2021-01-25") && A1.contain(etype) && …) |
4 | =A3.group(id).(~.sort(etime)) |
5 | =A4.new(~.select@1(etype==A1(1)):first,~:all).select(first) |
6 | =A5.(A1.(t=if(#==1,t1=first.etime,if(t,all.select@1(etype==A1.~ && etime>t && etime < t1+7).etime, null)))) |
7 | =A6.groups(;count(~(1)):STEP1,count(~(2)):STEP2,count(~(3)):STEP3) |
A | |
1 | =d:\normalData.btx |
2 | =file(A1).import@b() |
3 | =A2.groups(state;count(custid):count) |
A | |
1 | =d:\bigData.ctx |
2 | =file(A1).open().cursor@m() |
3 | =A2.groups@o(cust;sum(amount):amount) |
A | |
1 | =d:\smallData.txt |
2 | =file(A1).import@t() |
3 | =A2.groups(state;sum(amount):amount) |
A | |
1 | =d:\bigData.txt |
2 | =file(A1).cursor@t() |
3 | =A2.groups(state;sum(amount):amount) |
A | |
1 | =d:\bigData.txt |
2 | =file(A1).cursor@t() |
3 | =A2.groups(state;sum(amount):amount) |
A | |
1 | =d:\bigData.txt |
2 | =file(A1).cursor@tm() |
3 | =A2.groups(state;sum(amount):amount) |
select top 10 x from T order by x desc
select * from (
    select *, row_number() over (partition by Area order by Amount desc) rn
    from Orders
) where rn <= 10
Orders.groups(;top(10;-Amount))
Orders.groups(Area;top(10;-Amount))
Binary file, simple format, no need to define data structure
Mixed row-wise and columnar storage, supports indexing, requires pre-defined structure
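As a minimal sketch of what "requires pre-defined structure" means for a composite table (the file names, fields, and the create/append calls here are assumptions for illustration, not code taken from the examples above):
A | |
1 | =file("bigData.ctx").create(#id,state,amount) | declare the structure first; #id marks the ordered key (field names assumed) |
2 | =A1.append(file("bigData.txt").cursor@t()) | append the source data through a cursor (source file assumed) |
3 | =file("bigData.ctx").open().cursor().groups(state;sum(amount):amount) | then compute from the composite table |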
There is no ready-made method to implement this task
import pandas as pd

lst = [7,5,5,7,13,15,16,17,13,5,6,7,8,17,6,10,11,11,18,15]
# value minus position is constant within a run of consecutively increasing numbers
diffs = [j - i for i, j in enumerate(lst)]
groups = pd.Series(diffs).diff().ne(0).cumsum()
g_list = pd.Series(lst).groupby(groups).agg(list)
result = g_list[g_list.apply(lambda x: len(x) > 1)].tolist()
A | |
1 | [7,5,5,7,13,15,16,17,13,5,6,7,8,17,6,10,11,11,18,15] |
2 | =A1.group@o(~-#).select(~.len()>1) |
SELECT d.DEPT, COUNT(GENDER) AS MaleCount
FROM (SELECT COLUMN_VALUE AS DEPT
      FROM TABLE(SYS.ODCIVARCHAR2LIST('Administration', 'HR', 'Marketing', 'Sales'))) d
LEFT JOIN EMPLOYEE e
       ON d.DEPT = e.DEPT AND e.GENDER = 'M'
GROUP BY d.DEPT;
There is no specialized method; it needs to be hard-coded
A | |
1 | =file("EMPLOYEE.csv").import@tc().select(GENDER=="M") |
2 | [Administration, HR, Marketing, Sales] |
3 | =A1.align@a(A2,DEPT).new(A2(#):DEPT,~.len():NUM) |
WITH sales AS (
    SELECT COLUMN_VALUE AS sales_amount, ROWNUM AS month_number
    FROM TABLE(SYS.ODCINUMBERLIST(123,345,321,345,546,542,874,234,543,983,434,897))),
lagged_sales AS (
    SELECT sales_amount, month_number,
           LAG(sales_amount) OVER (ORDER BY NULL) AS prev_sales_amount
    FROM sales)
SELECT MAX(sales_amount - prev_sales_amount) AS max_monthly_growth
FROM lagged_sales
import pandas as pd

sales = [123,345,321,345,546,542,874,234,543,983,434,897]
month = [i for i in range(1, 13)]
df = pd.DataFrame({'month': month, 'sales': sales})
# month-over-month growth, then take the maximum
max_growth = (df['sales'] - df['sales'].shift(1)).max()
A | |
1 | [123,345,321,345,546,542,874,234,543,983,434,897] |
2 | =A1.(if(#>1,~-~[-1],0)).max() |
SELECT rn
FROM (SELECT EMPLOYEE.*, ROWNUM AS rn FROM EMPLOYEE)
WHERE SALARY = (SELECT MAX(SALARY) FROM EMPLOYEE)
import pandas as pd
import numpy as np

# positions of all employees whose salary equals the maximum
df = pd.read_csv('../EMPLOYEE.csv')
highest_salary_employee_index_all = np.where(df['SALARY'] == df['SALARY'].max())[0]
A | |
1 | =file("EMPLOYEE.csv").import@tc() |
2 | =A1.pmax@a(SALARY) |
SELECT closing/closing_pre-1 AS raise
FROM (SELECT dt, closing, ROWNUM AS rn,
             MAX(closing) OVER (ORDER BY dt ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS max_pre,
             MAX(closing) OVER (ORDER BY dt ROWS BETWEEN 1 FOLLOWING AND 5 FOLLOWING) AS max_suf,
             LAG(closing,1) OVER (ORDER BY dt) AS closing_pre
      FROM stock)
WHERE rn > 5 AND rn <= (SELECT count(*) FROM stock) - 5
  AND CLOSING > max_pre AND CLOSING > max_suf
import pandas as pd

stock_price_df = pd.read_csv('../STOCK.csv')
price_increase_list = []
for i in range(5, len(stock_price_df) - 5):
    if stock_price_df['CLOSING'][i] > max(stock_price_df['CLOSING'][i-5:i]) and \
       stock_price_df['CLOSING'][i] > max(stock_price_df['CLOSING'][i+1:i+6]):
        price_increase = stock_price_df['CLOSING'][i] / stock_price_df['CLOSING'][i-1] - 1
        price_increase_list.append(price_increase)
result = price_increase_list
A | |
1 | =file("STOCK.csv").import@tc() |
2 | =lth=A1.len(),A1.pselect@a(#>4&&#<=A1.len()-4&&CLOSING>max(CLOSING[-4:-1])&&CLOSING>max(CLOSING[1:4])) |
3 | =A1.calc(A2,CLOSING/CLOSING[-1]-1) |
WITH first_login AS (
    SELECT userid, MIN(TRUNC(ts)) AS first_login_date
    FROM login_data
    GROUP BY userid),
next_day_login AS (
    SELECT DISTINCT(fl.userid), fl.first_login_date, TRUNC(ld.ts) AS next_day_login_date
    FROM first_login fl
    LEFT JOIN login_data ld ON fl.userid = ld.userid
    WHERE TRUNC(ld.ts) = fl.first_login_date + 1),
day_new_users AS (
    SELECT first_login_date, COUNT(*) AS new_user_num
    FROM first_login
    GROUP BY first_login_date),
next_new_users AS (
    SELECT next_day_login_date, COUNT(*) AS next_user_num
    FROM next_day_login
    GROUP BY next_day_login_date),
all_date AS (
    SELECT DISTINCT(TRUNC(ts)) AS login_date
    FROM login_data)
SELECT all_date.login_date + 1 AS dt,
       dn.new_user_num,
       nn.next_user_num,
       (CASE WHEN nn.next_day_login_date IS NULL THEN 0 ELSE nn.next_user_num END) / dn.new_user_num AS ret_rate
FROM all_date
JOIN day_new_users dn ON all_date.login_date = dn.first_login_date
LEFT JOIN next_new_users nn ON dn.first_login_date + 1 = nn.next_day_login_date
ORDER BY all_date.login_date
import pandas as pd

df = pd.read_csv("../login_data.csv")
df["ts"] = pd.to_datetime(df["ts"]).dt.date
gp = df.groupby('userid')
row = []
for uid, g in gp:
    fst_dt = g.iloc[0].ts
    sec_dt = fst_dt + pd.Timedelta(days=1)
    all_dt = g.ts.values
    wether_sec_login = sec_dt in all_dt
    row.append([uid, fst_dt, sec_dt, wether_sec_login])
user_wether_ret_df = pd.DataFrame(row, columns=['userid', 'fst_dt', 'sec_dt', 'wether_sec_login'])
result = user_wether_ret_df.groupby('sec_dt').apply(lambda x: x['wether_sec_login'].sum() / len(x))
A | |
1 | =file("login_data.csv").import@tc() |
2 | =A1.group(userid;fst=date(ts):fst_login,~.(date(ts)).pos(fst+1)>0:wether_sec_login) |
3 | =A2.groups(fst_login+1:dt;count(wether_sec_login)/count(1):ret_rate) |
esProc is based on a brand-new computing model; there is no other open-source technology it could reference, and all of its code is self-developed.
SPL has a very different theoretical basis from SQL and cannot be built on any database technology; it is impossible to implement SPL's computations using SQL.
Yes, of course! esProc supports almost all common data sources in the industry and can work directly through the interfaces and syntax of the data sources themselves, without needing to map them to relational tables.
However, esProc cannot guarantee high performance in this situation, because database I/O is inefficient and databases can hardly provide the storage schemes needed by low-complexity algorithms. For high performance, it is recommended to store data in esProc's own file formats.
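For example, a minimal sketch under assumed names (data source mydb, table orders): query the database directly, then optionally dump the result to a bin file for later, faster computation:
A | |
1 | =connect("mydb") | connect to the database (data source name assumed) |
2 | =A1.query("select area,amount from orders") | compute directly over the database data |
3 | =file("orders.btx").export@b(A2) | dump to esProc's own format for high-performance reuse |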
esProc does not own the data and, in principle, is not responsible for data storage; any accessible data can be computed.
In particular, esProc has excellent support for files. Data files can be stored in any file system, including NFS and object storage on the cloud, which naturally implements the separation between computation and storage.
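For instance, pointing file() at a path on a mounted NFS share or object-storage mount is all that is needed; the path below is assumed for illustration:
A | |
1 | =file("/nfs/data/orders.btx").import@b() | read a bin file from shared storage (path assumed) |
2 | =A1.groups(state;sum(amount):amount) | the computation runs wherever esProc runs |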
SPL supports common regression-fitting algorithms as well as classification and clustering algorithms such as SVM and K-means, and provides basic vector and matrix operation capabilities.
SPL lacks complex machine learning models such as XGBoost, so it still falls somewhat short of Python in this regard.
If complex machine learning or deep learning models are needed, SPL can call YModel to build the models and perform prediction. YModel is automatic modeling software that preprocesses data and builds models with one click.
SPL syntax is very easy to learn; those with a Java/SQL foundation can get started in just a few hours and become proficient in a few weeks.
SPL is designed specifically for low code and high performance. It offers more universal data objects and operations than SQL and Python and is more in line with natural human thinking, so problems can be solved directly, without convoluted workarounds. SPL code is usually more concise than equivalent SQL or Python code.
To develop high-performance code in SPL, it is necessary to learn more about algorithms. SPL provides a high-performance function library that encapsulates these algorithms; once they are learned, many high-performance tasks become routine.
Concise code that conforms to natural thinking
High development efficiency
Professional IDE is perfect for exploration and analysis
Convenient to debug
Lightweight and portable for desktop computing
Easy data movement
Built-in big data and parallel computing syntax
High-performance algorithms
Direct integration into enterprise applications
Rich data sources