Managing Endless Report Development

esProc for Reporting

Simple development · Rich formats · Diverse sources · Lightweight · High performance

esProc Reporting Solution

Technical Architecture

Seamless integration with reporting tools/Java applications

esProcPure Java
  • Lightweight
  • Embedded, no independent server
  • JVM,JDK1.8 or above
  • VM/Container/Android

Development Efficiency Improvement

Easy to develop and debug grid programming

Simplify complex SQL of report

Report objectiveTo query the status of major customers and orders that account for half of the total sales revenue

SQL

SELECT CUSTOMER, AMOUNT, SUM_AMOUNT
	FROM (SELECT CUSTOMER, AMOUNT,
		SUM(AMOUNT) OVER(ORDER BY AMOUNT DESC) SUM_AMOUNT
			FROM (SELECT CUSTOMER, SUM(AMOUNT) AMOUNT
	   			FROM ORDERS GROUP BY CUSTOMER))
	   			WHERE 2 * SUM_AMOUNT < (SELECT SUM(AMOUNT) TOTAL FROM ORDERS)

SPL

AB
1=db.query("select customer,amount from orders order by amount desc")
2=A1.sum(amount)/2 =0
3=A1.pselect((B1+=amount)>=A2)return A1.to(A3)

More complex SQL

Report objectiveTo track the retention of new users for the next day on a daily basis

SQL

WITH first_login AS ( 
    SELECT userid, MIN(TRUNC(ts)) AS first_login_date  FROM login_data GROUP BY userid),
next_day_login AS (
    SELECT DISTINCT(fl.userid), fl.first_login_date, TRUNC(ld.ts) AS next_day_login_date
    FROM first_login fl LEFT JOIN login_data ld ON fl.userid = ld.userid WHERE TRUNC(ld.ts) = fl.first_login_date + 1),
day_new_users AS (
    SELECT first_login_date,COUNT(*) AS new_user_num FROM first_login GROUP BY first_login_date),
next_new_users AS (
    SELECT next_day_login_date, COUNT(*) AS next_user_num FROM next_day_login GROUP BY next_day_login_date),
all_date AS (
    SELECT DISTINCT(TRUNC(ts)) AS login_date FROM login_data)
SELECT all_date.login_date+1 AS dt,dn. new_user_num,nn. next_user_num,
      (CASE  WHEN nn. next_day_login_date IS NULL THEN 0  ELSE nn.next_user_num END)/dn.new_user_num AS ret_rate
FROM all_date JOIN day_new_users dn ON all_date.login_date=dn.first_login_date
    LEFT JOIN next_new_users nn ON dn.first_login_date+1=nn. next_day_login_date
ORDER BY all_date.login_date;

SPL

A
1=file(“login_data.csv”).import@tc()
2=A1.group(userid;fst=date(ts):fst_login,~.(date(ts)).pos(fst+1)>0:w_sec_login)
3=A2.groups(fst_login+1:dt;count(w_sec_login)/count(1):ret_rate)

Enrich report formats

Creating complex reports is simpler than BIRT/JasperReport

Multi Data Source Computing

Multi data source report

  • Access data sources directly through the native interface, without predefined mapping, while preserving their features
  • Lightweight, avoiding heavy logical data warehouses

Application of multi source mixed computing - Real-time report

  • Historical cold data is calculated and read from the AP database
  • Transaction hot data is read from the TP database in real time
  • Mixed computing to implement real-time report of whole data

Cross database migration

One SQL statement that works across all databases, allowing migration without altering the report.

Cross database migration – Conversion example

SELECT EID, NAME, BIRTHDAY, ADDMONTHS(BIRTHDAY,10) DAY10 FROM EMP
⇩ esProc conversion ⇩
SELECT EID, NAME, BIRTHDAY, BIRTHDAY+NUMTOYMINTERVAL(10,'MONTH') DAY10 FROM EMP
SELECT EID, NAME, BIRTHDAY, DATEADD(MM,10,BIRTHDAY) DAY10 FROM EMP
SELECT EID, NAME, BIRTHDAY, BIRTHDAY+10 MONTHS DAY10 FROM EMP
SELECT EID, NAME, BIRTHDAY, BIRTHDAY+INTERVAL 10 MONTH DAY10 FROM EMP
SELECT EID, NAME, BIRTHDAY, BIRTHDAY+interval '10 months' DAY10 FROM EMP
SELECT EID, NAME, BIRTHDAY, ADD_MONTHS(BIRTHDAY, 10) DAY10 FROM EMP

The file data source

Even without a database, SQL remains usable
$select * from Orders.csv where Client like '%bro%’ ;

$select o.OrderId,o.Client,e.Name e.Dept,e.EId from Orders.txt o
left join Employee.txt e on o.SellerId=e.Eid;

$select * from {file("Orders.txt").import@t(;":")}
where Amount>=100 and Client like 'bro' or OrderDate is null;

$select * from Orders.xlsx
where Amount>=100 and Client like 'bro' or OrderDate is null

$select * from {json(file("data.json").read())}
where Amount>=100 and Client like 'bro' or OrderDate is null

$select * from { httpfile("http://127.0.0.1:6868/Orders.csv").import@tc() }
where Amount>=100 and Client like 'bro' or OrderDate is null

Architecture Optimization

Report microservices

Report microservices

  • "Report data sources are delivered as microservices, simplifying extension and migration
  • esProc scripts are interpreted, enabling hot swapping by default
  • Adapt to the ever-changing reporting business.

Replace stored procedures

Replace stored procedures

  • Move stored procedures outside the database, making them independent and easily migratable
  • Reduce coupling between applications
  • No need for compile privileges on stored procedures, enhancing security and reliability

Eliminate intermediate tables in the database

Eliminate intermediate tables in the database

  • Move intermediate tables to files for storage and processing, easing the database load
  • The tree-structured file system simplifies management and reduces coupling between applications

Performance Boosting

Multi threaded parallel data retrieval

Multi threaded parallel data retrieval, compatible with both homogeneous and heterogeneous databases, boosts query performance

Single table parallel data retrieval:

AB
1fork to(12)=connect("oracle")
2=B1.query@x("SELECT * FROM   CUSTOMER WHERE MOD(C_CUSTKEY,?)=?", n, A1-1)
3=A1.conj()

Parallel data retrieval from multiple tables (databases):

AB
1SELECT * FROM SUPPLIER
2SELECT * FROM PART
3SELECT * FROM CUSTOMER
4SELECT * FROM PARTSUPP
5SELECT * FROM ORDERS
6fork [A1:A5]=connect("oracle")
7=B6.query@x(A6)

Simple Big Data and Parallel Computing

In-memory
External storage

Big data

A
1=file(sales.txt).import@t()
2=A1.select(od>=20240101)
3=A2.groups(area,emp;sum(amount):amount)
A
1=file(sales.txt).cursor@t()
2=A1.select(od>=20240101)
3=A2.groups(area,emp;sum(amount):amount)

Parallel computing

A
1=file(sales.txt).import@tm()
2=A1.select(od>=20240101)
3=A2.groups(area,emp;sum(amount):amount)
A
1=file(sales.txt).cursor@tm()
2=A1.select(od>=20240101)
3=A2.groups(area,emp;sum(amount):amount)
An option to enable parallelism can significantly enhance computing performance

Local file data cache

Local file data cache

  • Faster file system IO
  • Binary format does not require parsing
  • High performance support: compression, columnar storage, indexing …

High performance algorithms

Traversal technique

  • Delayed cursor
  • Aggregate Understanding
  • Ordered cursor
  • Multi-purpose traversal
  • Prefilter traversal

Highly efficient Joins

  • Foreign key as pointer
  • Numbering of foreign keys
  • Order-based merge
  • Attached table
  • Unilateral HASH Join

High performance storage

  • Orderly Compressed Storage
  • Columnar storage
  • Hierarchical Numbering positioning
  • Index and Caching
  • Double increment segmentation
Improve the performance of report computation

Table of Contents

  1. esProc Reporting Solution
  2. Development Efficiency Improvement
  3. Multi Data Source Computing
  4. Architecture Optimization
  5. Performance Boosting