From time to time a customer asks me how to join multiple tables with temporal intervals (e.g. defined by two columns such as valid_from and valid_to per row). The solution is quite simple if you may limit your query to a certain point in time like now, yesterday or similar. Such a time point becomes just an additional filter criterion per temporal table (e.g. t1.some_date between t2.valid_from and t2.valid_to). But in some cases, this approach is not feasible, e.g. if you have to provide all relevant time intervals. I’ll explain in this post a solution approach based on the following data model.
This model is based on the famous EMP and DEPT tables. I added a reference table for job just to get an additional table to join. You find the SQL script to create and populate the model here.
I’d like to query data from the tables EMPV, DEPTV, JOBV and EMPV (manager). Here is the content of the tables reduced to the data relevant for empno 7788 (SCOTT).
SQL> SELECT * FROM empv WHERE empno = 7788 ORDER BY valid_from;
EMPVID EMPNO ENAME JOBNO MGR HIREDATE SAL COMM DEPTNO VALID_FROM VALID_TO
------ ----- ----- ----- ---- ----------- ---- ---- ------ ----------- -----------
8 7788 SCOTT 5 7566 19-APR-1987 3000 20 19-APR-1987 31-DEC-1989
22 7788 Scott 5 7566 19-APR-1987 3000 20 01-JAN-1990 31-MAR-1991
36 7788 Scott 5 7566 19-APR-1987 3300 20 01-APR-1991 31-DEC-9999
SQL> SELECT * FROM jobv WHERE jobno = 5 ORDER BY valid_from;
JOBVID JOBNO JOB VALID_FROM VALID_TO
------ ----- ------- ----------- -----------
5 5 ANALYST 01-JAN-1980 20-JAN-1990
10 5 Analyst 21-JAN-1990 31-DEC-9999
SQL> SELECT * FROM deptv WHERE deptno = 20 ORDER BY valid_from;
DEPTVID DEPTNO DNAME LOC VALID_FROM VALID_TO
------- ------ -------- ------ ----------- -----------
2 20 RESEARCH DALLAS 01-JAN-1980 28-FEB-1990
6 20 Research DALLAS 01-MAR-1990 31-MAR-1990
10 20 Research Dallas 01-APR-1990 31-DEC-9999
SQL> SELECT * FROM empv WHERE empno = 7566 ORDER BY valid_from;
EMPVID EMPNO ENAME JOBNO MGR HIREDATE SAL COMM DEPTNO VALID_FROM VALID_TO
------ ----- ----- ----- ---- ----------- ------ ---- ------ ----------- -----------
4 7566 JONES 4 7839 02-APR-1981 2975 20 02-APR-1981 31-DEC-1989
18 7566 Jones 4 7839 02-APR-1981 2975 20 01-JAN-1990 31-MAR-1991
32 7566 Jones 4 7839 02-APR-1981 3272.5 20 01-APR-1991 31-DEC-9999
The following figure visualizes the expected result of a temporal join using the data queried previously.
In this case, six result records (intervals) are expected. As you see the result is dependent on the number of different intervals or the distinct VALID_FROM values. The driving object is valid from 19-APR-1987 until 31-DEC-9999. VALID_FROM values outside of the validity are irrelevant (e.g. 01-JAN-1980 and 02-APR-1981).
Based on this information we can write the query. The highlighted inline view g produces a list of all distinct VALID_FROM values which will be used as additional join criterion for all temporal tables.
SELECT e.empno,
MIN(g.valid_from) AS valid_from,
LEAD(MIN(g.valid_from) - 1, 1, DATE '9999-12-31') OVER(
PARTITION BY e.empno ORDER BY MIN(g.valid_from)
) AS valid_to,
e.ename,
j.job,
e.mgr,
m.ename AS mgr_ename,
e.hiredate,
e.sal,
e.comm,
e.deptno,
d.dname,
d.loc
FROM empv e
INNER JOIN (SELECT valid_from FROM empv
UNION
SELECT valid_from FROM deptv
UNION
SELECT valid_from FROM jobv) g
ON g.valid_from BETWEEN e.valid_from AND e.valid_to
INNER JOIN deptv d
ON d.deptno = e.deptno
AND g.valid_from BETWEEN d.valid_from AND d.valid_to
INNER JOIN jobv j
ON j.jobno = e.jobno
AND g.valid_from BETWEEN j.valid_from AND j.valid_to
LEFT JOIN empv m
ON m.empno = e.mgr
AND g.valid_from BETWEEN m.valid_from AND m.valid_to
WHERE e.empno = 7788
GROUP BY e.empno,
e.ename,
j.job,
e.mgr,
m.ename,
e.hiredate,
e.sal,
e.comm,
e.deptno,
d.dname,
d.loc
ORDER BY empno, valid_from;
EMPNO VALID_FROM VALID_TO ENAME JOB MGR MGR_ENAME HIREDATE SAL COMM DEPTNO DNAME LOC
----- ----------- ----------- ----- ------- ---- --------- ----------- ---- ---- ------ -------- ------
7788 19-APR-1987 31-DEC-1989 SCOTT ANALYST 7566 JONES 19-APR-1987 3000 20 RESEARCH DALLAS
7788 01-JAN-1990 20-JAN-1990 Scott ANALYST 7566 Jones 19-APR-1987 3000 20 RESEARCH DALLAS
7788 21-JAN-1990 28-FEB-1990 Scott Analyst 7566 Jones 19-APR-1987 3000 20 RESEARCH DALLAS
7788 01-MAR-1990 31-MAR-1990 Scott Analyst 7566 Jones 19-APR-1987 3000 20 Research DALLAS
7788 01-APR-1990 31-MAR-1991 Scott Analyst 7566 Jones 19-APR-1987 3000 20 Research Dallas
7788 01-APR-1991 31-DEC-9999 Scott Analyst 7566 Jones 19-APR-1987 3300 20 Research Dallas
The beauty of this approach is that it works with any granularity and it automatically merges identical intervals. In this example, I use a granularity of a day, but this approach works also for the granularity of seconds or even a fraction of a second, e.g. if you are using a TIMESTAMP data type to define the interval boundaries.
It’s important to notice that I’ve used an including semantic for VALID_TO in this example. If you use an excluding semantic (VALID_TO = VALID_FROM of the subsequent interval) you have to amend the calculation of the VALID_TO and the join criteria (BETWEEN is not feasible with excluding semantic). Furthermore, this example does not cover gaps in the historization. If you have gaps you need to amend the calculation of the VALID_TO column and ensure that you do not merge gaps. Merging intervals with a simple group by will produce wrong results if “disconnected” intervals have the same content. Issues are addressed in part 2 of this post.
Updated on 2012-12-28, emphasized the possibility of wrong results and added a link to part 2 of this post.
1 Comment
Nice Blog entry – transparent and systematic approach. I got it after a few moments of (hard) thinking ;-) -Thanks.