Binary file, simple format, no need to define data structure
Application scenarios
- Mediation for data dumping
- Temporary storage
- Storage of small data in calculations (such as dimension tables)
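As a minimal sketch of the dumping / temporary-storage scenarios above (the file names and the aggregation are assumed for illustration; export@b writes a bin file and import@b reads it back):
A | |
1 | =file("bigSource.txt").cursor@t().groups(state;sum(amount):amount) | aggregate the source data (file name assumed) |
2 | =file("temp.btx").export@b(A1) | dump the intermediate result to a bin file |
3 | =file("temp.btx").import@b() | read it back later without defining any data structure |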
select id, name from T where id = 1
select area,sum(amount) from T group by area
select max(consecutive_day)
from (select count(*) consecutive_day
      from (select sum(rise_or_fall) over(order by trade_date) day_no_gain
            from (select trade_date,
                         case when close_price > lag(close_price) over(order by trade_date)
                              then 0 else 1 end rise_or_fall
                  from stock_price))
      group by day_no_gain)
with ranked_prices as (
    select closing,
           lag(closing) over (order by closing) as previous_closing,
           row_number() over (order by closing desc) as rank
    from share62)
select closing as closing_price,
       closing / previous_closing - 1 as growth_rate
from ranked_prices
where rank <= 3
order by rank
with e1 as (
    select uid, 1 as step1, min(etime) as t1
    from event
    where etime >= to_date('2021-01-10') and etime < to_date('2021-01-25')
          and eventtype = 'eventtype1' and …
    group by 1),
e2 as (
    select uid, 1 as step2, min(e1.t1) as t1, min(e2.etime) as t2
    from event as e2
    inner join e1 on e2.uid = e1.uid
    where e2.etime >= to_date('2021-01-10') and e2.etime < to_date('2021-01-25')
          and e2.etime > t1 and e2.etime < t1 + 7
          and eventtype = 'eventtype2' and …
    group by 1),
e3 as (
    select uid, 1 as step3, min(e2.t1) as t1, min(e3.etime) as t3
    from event as e3
    inner join e2 on e3.uid = e2.uid
    where e3.etime >= to_date('2021-01-10') and e3.etime < to_date('2021-01-25')
          and e3.etime > t2 and e3.etime < t1 + 7
          and eventtype = 'eventtype3' and …
    group by 1)
select sum(step1) as step1, sum(step2) as step2, sum(step3) as step3
from e1
left join e2 on e1.uid = e2.uid
left join e3 on e2.uid = e3.uid
select top 10 x from T order by x desc
select * from (
    select *, row_number() over (partition by Area order by Amount desc) rn
    from Orders
) where rn <= 10
# positions of the top 3 closing prices (descending)
sort_pos = (-stock_62["CLOSING"]).argsort()
max3pos = sort_pos.iloc[:3]
# previous day's closing prices, aligned by shifting one row
stock_62s = stock_62.shift(1)
max3CL = stock_62["CLOSING"].iloc[max3pos]
max3CLs = stock_62s["CLOSING"].iloc[max3pos]
# growth rate of the top 3 closing prices over the previous day
max3_rate = max3CL / max3CLs - 1
print(max3_rate)
# longest run of consecutively rising prices
max_days = 0
current_days = 1
for i in range(1, len(stock_prices)):
    if stock_prices[i] > stock_prices[i - 1]:
        current_days += 1
    else:
        max_days = max(max_days, current_days)
        current_days = 1
max_days = max(max_days, current_days)
with ranked_prices as (
    select closing,
           lag(closing) over (order by closing) as previous_closing,
           row_number() over (order by closing desc) as rank
    from share62)
select closing as closing_price,
       closing / previous_closing - 1 as growth_rate
from ranked_prices
where rank <= 3
order by rank
# positions of the top 3 closing prices (descending)
sort_pos = (-stock_62["CLOSING"]).argsort()
max3pos = sort_pos.iloc[:3]
# previous day's closing prices, aligned by shifting one row
stock_62s = stock_62.shift(1)
max3CL = stock_62["CLOSING"].iloc[max3pos]
max3CLs = stock_62s["CLOSING"].iloc[max3pos]
# growth rate of the top 3 closing prices over the previous day
max3_rate = max3CL / max3CLs - 1
print(max3_rate)
A | |
1 | =share62.psort@z(CLOSING) |
2 | =A1.m(:3) |
3 | =share62.calc(A2,if(#==1,null,CLOSING/CLOSING[-1]-1)) |
select max(consecutive_day)
from (select count(*) consecutive_day
      from (select sum(rise_or_fall) over(order by trade_date) day_no_gain
            from (select trade_date,
                         case when close_price > lag(close_price) over(order by trade_date)
                              then 0 else 1 end rise_or_fall
                  from stock_price))
      group by day_no_gain)
# longest run of consecutively rising prices
max_days = 0
current_days = 1
for i in range(1, len(stock_prices)):
    if stock_prices[i] > stock_prices[i - 1]:
        current_days += 1
    else:
        max_days = max(max_days, current_days)
        current_days = 1
max_days = max(max_days, current_days)
A | |
1 | =stock.sort(tradeDate) |
2 | =0 |
3 | =A1.max(A2=if(closePrice>closePrice[-1],A2+1,0)) |
with e1 as (
    select uid, 1 as step1, min(etime) as t1
    from event
    where etime >= to_date('2021-01-10') and etime < to_date('2021-01-25')
          and eventtype = 'eventtype1' and …
    group by 1),
e2 as (
    select uid, 1 as step2, min(e1.t1) as t1, min(e2.etime) as t2
    from event as e2
    inner join e1 on e2.uid = e1.uid
    where e2.etime >= to_date('2021-01-10') and e2.etime < to_date('2021-01-25')
          and e2.etime > t1 and e2.etime < t1 + 7
          and eventtype = 'eventtype2' and …
    group by 1),
e3 as (
    select uid, 1 as step3, min(e2.t1) as t1, min(e3.etime) as t3
    from event as e3
    inner join e2 on e3.uid = e2.uid
    where e3.etime >= to_date('2021-01-10') and e3.etime < to_date('2021-01-25')
          and e3.etime > t2 and e3.etime < t1 + 7
          and eventtype = 'eventtype3' and …
    group by 1)
select sum(step1) as step1, sum(step2) as step2, sum(step3) as step3
from e1
left join e2 on e1.uid = e2.uid
left join e3 on e2.uid = e3.uid
A | |
1 | =["etype1","etype2","etype3"] |
2 | =file("event.ctx").open() |
3 | =A2.cursor(id,etime,etype;etime>=date("2021-01-10") && etime < date("2021-01-25") && A1.contain(etype) && …) |
4 | =A3.group(id).(~.sort(etime)) |
5 | =A4.new(~.select@1(etype==A1(1)):first,~:all).select(first) |
6 | =A5.(A1.(t=if(#==1,t1=first.etime,if(t,all.select@1(etype==A1.~ && etime>t && etime < t1+7).etime, null)))) |
7 | =A6.groups(;count(~(1)):STEP1,count(~(2)):STEP2,count(~(3)):STEP3) |
A | |
1 | =d:\normalData.btx |
2 | =file(A1).import@b() |
3 | =A2.groups(state;count(custid):count) |
A | |
1 | =d:\bigData.ctx |
2 | =file(A1).open().cursor@m() |
3 | =A2.groups@o(cust;sum(amount):amount) |
A | |
1 | =d:\smallData.txt |
2 | =file(A1).import@t() |
3 | =A2.groups(state;sum(amount):amount) |
A | |
1 | =d:\bigData.txt |
2 | =file(A1).cursor@t() |
3 | =A2.groups(state;sum(amount):amount) |
A | |
1 | =d:\bigData.txt |
2 | =file(A1).cursor@t() |
3 | =A2.groups(state;sum(amount):amount) |
A | |
1 | =d:\bigData.txt |
2 | =file(A1).cursor@tm() |
3 | =A2.groups(state;sum(amount):amount) |
select top 10 x from T order by x desc
select * from (
    select *, row_number() over (partition by Area order by Amount desc) rn
    from Orders
) where rn <= 10
Orders.groups(;top(10;-Amount))
Orders.groups(Area;top(10;-Amount))
Binary file, simple format, no need to define data structure
Mixed row-wise and columnar storage, supports indexing, requires pre-defined structure
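As a minimal sketch of what "requires pre-defined structure" means for a composite table (the file names, fields, and the create/append calls here are assumptions for illustration, not code taken from the examples above):
A | |
1 | =file("bigData.ctx").create(#id,state,amount) | declare the structure first; #id marks the ordered key (field names assumed) |
2 | =A1.append(file("bigData.txt").cursor@t()) | append the source data through a cursor (source file assumed) |
3 | =file("bigData.ctx").open().cursor().groups(state;sum(amount):amount) | then compute from the composite table |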
There is no ready-made method to implement this task
import pandas as pd

lst = [7,5,5,7,13,15,16,17,13,5,6,7,8,17,6,10,11,11,18,15]
# value minus position is constant within a run of consecutively increasing numbers
diffs = [j - i for i, j in enumerate(lst)]
groups = pd.Series(diffs).diff().ne(0).cumsum()
g_list = pd.Series(lst).groupby(groups).agg(list)
result = g_list[g_list.apply(lambda x: len(x) > 1)].tolist()
A | |
1 | [7,5,5,7,13,15,16,17,13,5,6,7,8,17,6,10,11,11,18,15] |
2 | =A1.group@o(~-#).select(~.len()>1) |
SELECT d.DEPT, COUNT(GENDER) AS MaleCount
FROM (SELECT COLUMN_VALUE AS DEPT
      FROM TABLE(SYS.ODCIVARCHAR2LIST('Administration', 'HR', 'Marketing', 'Sales'))) d
LEFT JOIN EMPLOYEE e
       ON d.DEPT = e.DEPT AND e.GENDER = 'M'
GROUP BY d.DEPT;
There is no specialized method; it needs to be hard-coded
A | |
1 | =file("EMPLOYEE.csv").import@tc().select(GENDER=="M") |
2 | [Administration, HR, Marketing, Sales] |
3 | =A1.align@a(A2,DEPT).new(A2(#):DEPT,~.len():NUM) |
WITH sales AS (
    SELECT COLUMN_VALUE AS sales_amount, ROWNUM AS month_number
    FROM TABLE(SYS.ODCINUMBERLIST(123,345,321,345,546,542,874,234,543,983,434,897))),
lagged_sales AS (
    SELECT sales_amount, month_number,
           LAG(sales_amount) OVER (ORDER BY NULL) AS prev_sales_amount
    FROM sales)
SELECT MAX(sales_amount - prev_sales_amount) AS max_monthly_growth
FROM lagged_sales
import pandas as pd

sales = [123,345,321,345,546,542,874,234,543,983,434,897]
month = [i for i in range(1, 13)]
df = pd.DataFrame({'month': month, 'sales': sales})
# month-over-month growth, then take the maximum
max_growth = (df['sales'] - df['sales'].shift(1)).max()
A | |
1 | [123,345,321,345,546,542,874,234,543,983,434,897] |
2 | =A1.(if(#>1,~-~[-1],0)).max() |
SELECT rn
FROM (SELECT EMPLOYEE.*, ROWNUM AS rn FROM EMPLOYEE)
WHERE SALARY = (SELECT MAX(SALARY) FROM EMPLOYEE)
import pandas as pd
import numpy as np

# positions of all employees whose salary equals the maximum
df = pd.read_csv('../EMPLOYEE.csv')
highest_salary_employee_index_all = np.where(df['SALARY'] == df['SALARY'].max())[0]
A | |
1 | =file("EMPLOYEE.csv").import@tc() |
2 | =A1.pmax@a(SALARY) |
SELECT closing/closing_pre-1 AS raise
FROM (SELECT dt, closing, ROWNUM AS rn,
             MAX(closing) OVER (ORDER BY dt ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS max_pre,
             MAX(closing) OVER (ORDER BY dt ROWS BETWEEN 1 FOLLOWING AND 5 FOLLOWING) AS max_suf,
             LAG(closing,1) OVER (ORDER BY dt) AS closing_pre
      FROM stock)
WHERE rn > 5 AND rn <= (SELECT count(*) FROM stock) - 5
  AND CLOSING > max_pre AND CLOSING > max_suf
import pandas as pd

stock_price_df = pd.read_csv('../STOCK.csv')
price_increase_list = []
for i in range(5, len(stock_price_df) - 5):
    if stock_price_df['CLOSING'][i] > max(stock_price_df['CLOSING'][i-5:i]) and \
       stock_price_df['CLOSING'][i] > max(stock_price_df['CLOSING'][i+1:i+6]):
        price_increase = stock_price_df['CLOSING'][i] / stock_price_df['CLOSING'][i-1] - 1
        price_increase_list.append(price_increase)
result = price_increase_list
A | |
1 | =file("STOCK.csv").import@tc() |
2 | =lth=A1.len(),A1.pselect@a(#>4&&#<=A1.len()-4&&CLOSING>max(CLOSING[-4:-1])&&CLOSING>max(CLOSING[1:4])) |
3 | =A1.calc(A2,CLOSING/CLOSING[-1]-1) |
WITH first_login AS (
    SELECT userid, MIN(TRUNC(ts)) AS first_login_date
    FROM login_data
    GROUP BY userid),
next_day_login AS (
    SELECT DISTINCT(fl.userid), fl.first_login_date, TRUNC(ld.ts) AS next_day_login_date
    FROM first_login fl
    LEFT JOIN login_data ld ON fl.userid = ld.userid
    WHERE TRUNC(ld.ts) = fl.first_login_date + 1),
day_new_users AS (
    SELECT first_login_date, COUNT(*) AS new_user_num
    FROM first_login
    GROUP BY first_login_date),
next_new_users AS (
    SELECT next_day_login_date, COUNT(*) AS next_user_num
    FROM next_day_login
    GROUP BY next_day_login_date),
all_date AS (
    SELECT DISTINCT(TRUNC(ts)) AS login_date
    FROM login_data)
SELECT all_date.login_date + 1 AS dt,
       dn.new_user_num,
       nn.next_user_num,
       (CASE WHEN nn.next_day_login_date IS NULL THEN 0 ELSE nn.next_user_num END) / dn.new_user_num AS ret_rate
FROM all_date
JOIN day_new_users dn ON all_date.login_date = dn.first_login_date
LEFT JOIN next_new_users nn ON dn.first_login_date + 1 = nn.next_day_login_date
ORDER BY all_date.login_date
import pandas as pd

df = pd.read_csv("../login_data.csv")
df["ts"] = pd.to_datetime(df["ts"]).dt.date
gp = df.groupby('userid')
row = []
for uid, g in gp:
    fst_dt = g.iloc[0].ts
    sec_dt = fst_dt + pd.Timedelta(days=1)
    all_dt = g.ts.values
    wether_sec_login = sec_dt in all_dt
    row.append([uid, fst_dt, sec_dt, wether_sec_login])
user_wether_ret_df = pd.DataFrame(row, columns=['userid', 'fst_dt', 'sec_dt', 'wether_sec_login'])
result = user_wether_ret_df.groupby('sec_dt').apply(lambda x: x['wether_sec_login'].sum() / len(x))
A | |
1 | =file("login_data.csv").import@tc() |
2 | =A1.group(userid;fst=date(ts):fst_login,~.(date(ts)).pos(fst+1)>0:wether_sec_login) |
3 | =A2.groups(fst_login+1:dt;count(wether_sec_login)/count(1):ret_rate) |
esProc is based on a brand-new computing model; there is no other open-source technology it could reference, and all of its code is self-developed.
SPL has a very different theoretical basis from SQL and cannot be built on any database technology; it is impossible to implement SPL's computations using SQL.
Yes, of course! esProc supports almost all common data sources in the industry and can work directly through the interfaces and syntax of the data sources themselves, without needing to map them to relational tables.
However, esProc cannot guarantee high performance in this situation, because database I/O is inefficient and databases can hardly provide the storage schemes needed by low-complexity algorithms. For high performance, it is recommended to store data in esProc's own file formats.
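For example, a minimal sketch under assumed names (data source mydb, table orders): query the database directly, then optionally dump the result to a bin file for later, faster computation:
A | |
1 | =connect("mydb") | connect to the database (data source name assumed) |
2 | =A1.query("select area,amount from orders") | compute directly over the database data |
3 | =file("orders.btx").export@b(A2) | dump to esProc's own format for high-performance reuse |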
esProc does not own the data and, in principle, is not responsible for data storage; any accessible data can be computed.
In particular, esProc has excellent support for files. Data files can be stored in any file system, including NFS and object storage on the cloud, which naturally implements the separation between computation and storage.
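For instance, pointing file() at a path on a mounted NFS share or object-storage mount is all that is needed; the path below is assumed for illustration:
A | |
1 | =file("/nfs/data/orders.btx").import@b() | read a bin file from shared storage (path assumed) |
2 | =A1.groups(state;sum(amount):amount) | the computation runs wherever esProc runs |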
SPL supports common regression-fitting algorithms as well as classification and clustering algorithms such as SVM and K-means, and provides basic vector and matrix operation capabilities.
SPL lacks complex machine learning models such as XGBoost, so it still falls somewhat short of Python in this regard.
If complex machine learning or deep learning models are needed, SPL can call YModel to build the models and perform prediction. YModel is automatic modeling software that preprocesses data and builds models with one click.
SPL syntax is very easy to learn; those with a Java/SQL foundation can get started in just a few hours and become proficient in a few weeks.
SPL is designed specifically for low code and high performance. It offers more universal data objects and operations than SQL and Python and is more in line with natural human thinking, so problems can be solved directly, without convoluted workarounds. SPL code is usually more concise than equivalent SQL or Python code.
To develop high-performance code in SPL, it is necessary to learn more about algorithms. SPL provides a high-performance function library that encapsulates these algorithms; once they are learned, many high-performance tasks become routine.
Concise code that conforms to natural thinking
High development efficiency
Professional IDE is perfect for exploration and analysis
Convenient to debug
Lightweight and portable for desktop computing
Easy data movement
Built-in big data and parallel computing syntax
High-performance algorithms
Direct integration into enterprise applications
Rich data sources