The solution I’ve provided in Joining Temporal Intervals produces wrong results if one or more temporal tables have gaps in their history or if disconnected intervals have the same content. In this post I’ll address both problems.
Test Data
The example queries are based on the same model as described in Joining Temporal Intervals. For the join of the tables EMPV, DEPTV, JOBV and EMPV (manager) I’ve amended the history to contain some gaps which are highlighted in the following listing.
SQL> SELECT * FROM empv WHERE empno = 7788 ORDER BY valid_from;
EMPVID EMPNO ENAME JOBNO MGR HIREDATE SAL COMM DEPTNO VALID_FROM VALID_TO
------ ----- ----- ----- ---- ----------- ------ ---- ------ ----------- -----------
8 7788 SCOTT 5 7566 19-APR-1987 3000.0 20 19-APR-1987 31-DEC-1989
22 7788 Scott 5 7566 19-APR-1987 3000.0 20 01-JAN-1990 31-MAR-1991
36 7788 Scott 5 7566 19-APR-1987 3300.0 20 01-APR-1991 31-JUL-1991
43 7788 Scott 5 7566 01-JAN-1992 3500.0 20 01-JAN-1992 31-DEC-9999
SQL> SELECT * FROM jobv WHERE jobno = 5 ORDER BY valid_from;
JOBVID JOBNO JOB VALID_FROM VALID_TO
---------- ----- --------- ----------- -----------
5 5 ANALYST 01-JAN-1980 20-JAN-1990
10 5 Analyst 22-JAN-1990 31-DEC-9999
SQL> SELECT * FROM deptv WHERE deptno = 20 ORDER BY valid_from;
DEPTVID DEPTNO DNAME LOC VALID_FROM VALID_TO
---------- ------ -------------- ------------- ----------- -----------
2 20 RESEARCH DALLAS 01-JAN-1980 28-FEB-1990
6 20 Research DALLAS 01-MAR-1990 31-MAR-1990
10 20 Research Dallas 01-APR-1990 31-DEC-9999
SQL> SELECT * FROM empv WHERE empno = 7566 ORDER BY valid_from;
EMPVID EMPNO ENAME JOBNO MGR HIREDATE SAL COMM DEPTNO VALID_FROM VALID_TO
------ ----- ----- ----- ---- ----------- ------ ---- ------ ----------- -----------
4 7566 JONES 4 7839 02-APR-1981 2975.0 20 02-APR-1981 31-DEC-1989
18 7566 Jones 4 7839 02-APR-1981 2975.0 20 01-JAN-1990 31-MAR-1991
32 7566 Jones 4 7839 02-APR-1981 3272.5 20 01-APR-1991 31-DEC-9999
SQL> SELECT * FROM empv WHERE mgr = 7788 ORDER BY valid_from;
EMPVID EMPNO ENAME JOBNO MGR HIREDATE SAL COMM DEPTNO VALID_FROM VALID_TO
------ ----- ----- ----- ---- ----------- ------ ---- ------ ----------- -----------
11 7876 ADAMS 1 7788 23-MAY-1987 1100.0 20 23-MAY-1987 31-DEC-1989
25 7876 Adams 1 7788 23-MAY-1987 1100.0 20 01-JAN-1990 31-MAR-1991
39 7876 Adams 1 7788 23-MAY-1987 1210.0 20 01-APR-1991 31-DEC-9999
From a business point of view Scott left the company on 31-JUL-1991 and came back on 01-JAN-1992 with a better salary. It’s important to notice, that Scott is Adams manager and Adams is therefore leaderless from 01-AUG-1991 until 31-DEC-1991. Additionally I fabricated a gap for JOBNO 5 on 21-JAN-1990.
You find the SQL script to create and populate the model here.
Gap-Aware Temporal Join
The following figure visualizes the expected result of the temporal join. The raw data intervals queried perviously are represented in blue and the join result in red. The yellow bars highlight the gaps in the source and result data set.
Here is the query and the join result for EMPNO = 7788. Please note that the column LOC from table DEPV is not queried, which will reduce the number of final result intervals from 7 to 6.
SELECT e.empno,
g.valid_from,
LEAST(
e.valid_to,
d.valid_to,
j.valid_to,
NVL(m.valid_to, e.valid_to),
LEAD(g.valid_from - 1, 1, e.valid_to) OVER(
PARTITION BY e.empno ORDER BY g.valid_from
)
) AS valid_to,
e.ename,
j.job,
e.mgr,
m.ename AS mgr_ename,
e.hiredate,
e.sal,
e.comm,
e.deptno,
d.dname
FROM empv e
INNER JOIN (SELECT valid_from FROM empv
UNION
SELECT valid_from FROM deptv
UNION
SELECT valid_from FROM jobv
UNION
SELECT valid_to + 1 FROM empv
WHERE valid_to != DATE '9999-12-31'
UNION
SELECT valid_to + 1 FROM deptv
WHERE valid_to != DATE '9999-12-31'
UNION
SELECT valid_to + 1 FROM jobv
WHERE valid_to != DATE '9999-12-31') g
ON g.valid_from BETWEEN e.valid_from AND e.valid_to
INNER JOIN deptv d
ON d.deptno = e.deptno
AND g.valid_from BETWEEN d.valid_from AND d.valid_to
INNER JOIN jobv j
ON j.jobno = e.jobno
AND g.valid_from BETWEEN j.valid_from AND j.valid_to
LEFT JOIN empv m
ON m.empno = e.mgr
AND g.valid_from BETWEEN m.valid_from AND m.valid_to
WHERE e.empno = 7788
ORDER BY 1, 2;
EMPNO VALID_FROM VALID_TO ENAME JOB MGR MGR_E HIREDATE SAL COMM DEPTNO DNAME
----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ --------
7788 19-APR-1987 22-MAY-1987 SCOTT ANALYST 7566 JONES 19-APR-1987 3000.0 20 RESEARCH
7788 23-MAY-1987 31-DEC-1989 SCOTT ANALYST 7566 JONES 19-APR-1987 3000.0 20 RESEARCH
7788 01-JAN-1990 20-JAN-1990 Scott ANALYST 7566 Jones 19-APR-1987 3000.0 20 RESEARCH
7788 22-JAN-1990 28-FEB-1990 Scott Analyst 7566 Jones 19-APR-1987 3000.0 20 RESEARCH
7788 01-MAR-1990 31-MAR-1990 Scott Analyst 7566 Jones 19-APR-1987 3000.0 20 Research
7788 01-APR-1990 31-MAR-1991 Scott Analyst 7566 Jones 19-APR-1987 3000.0 20 Research
7788 01-APR-1991 31-JUL-1991 Scott Analyst 7566 Jones 19-APR-1987 3300.0 20 Research
7788 01-JAN-1992 31-DEC-9999 Scott Analyst 7566 Jones 01-JAN-1992 3500.0 20 Research
The highlighted inline view g produces a list of all distinct VALID_FROM values which will be used as additional join criterion for all temporal tables. Unlike in Joining Temporal Intervals all interval endpoints need also be considered to identify gaps.
The calculation of the VALID_TO column is a bit laborious (see highlighted lines 3 to 11). You need to get the lowest value for VALID_TO of all involved intervals, including outer joined intervals like m. Also, the subsequent VALID_FROM has to be considered since the inline view g is providing all VALID_FROM values to be probed and they may be completely independent of the involved intervals.
The remaining part of the query is quite simple. I’ve highlighted the rows in the result set which should be merged in a subsequent step (see lines 4 and 8).
If you change line 46 to “WHERE e.mgr = 7788” you get the following result:
EMPNO VALID_FROM VALID_TO ENAME JOB MGR MGR_E HIREDATE SAL COMM DEPTNO DNAME
----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ --------
7876 23-MAY-1987 31-DEC-1989 ADAMS CLERK 7788 SCOTT 23-MAY-1987 1100.0 20 RESEARCH
7876 01-JAN-1990 20-JAN-1990 Adams CLERK 7788 Scott 23-MAY-1987 1100.0 20 RESEARCH
7876 21-JAN-1990 21-JAN-1990 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 RESEARCH
7876 22-JAN-1990 28-FEB-1990 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 RESEARCH
7876 01-MAR-1990 31-MAR-1990 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 Research
7876 01-APR-1990 31-MAR-1991 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 Research
7876 01-APR-1991 31-JUL-1991 Adams Clerk 7788 Scott 23-MAY-1987 1210.0 20 Research
7876 01-AUG-1991 31-DEC-1991 Adams Clerk 7788 23-MAY-1987 1210.0 20 Research
7876 01-JAN-1992 31-DEC-9999 Adams Clerk 7788 Scott 23-MAY-1987 1210.0 20 Research
This result is interesting for several reasons. First, there are two highlighted records which should be merged in a subsequent step. Second, line 10 represents the time when Scott was not employed by this company. Third, the records in lines 9 and 11 must not be merged in a subsequent step. They are identical (beside VALID_FROM and VALID_TO) but the intervals are not connected.
Merging Temporal Intervals
If you look at the result of the previous query you might be tempted to avoid the merging step since there are just a few intervals which need merging. However, in real-life scenarios, you might easily end up with daily intervals for large tables since the inline-view g considers all valid_from and valid_to columns of all involved tables. Sooner or later you will think about merging temporal intervals or about other solutions to reduce the result set. – If you’re sceptical, then querying for “e.empno = 7369” might give you an idea of what I’m talking about (21 intervals before merge, 6 intervals after merge).
Since I covered this topic in Merging Temporal Intervals with Gaps I’ll provide the query to produce the final result and explain the specialities only.
WITH
joined AS (
-- gap-aware temporal join
-- produces result_cols to calculate new_group in the subsequent query
SELECT e.empno,
g.valid_from,
LEAST(
e.valid_to,
d.valid_to,
j.valid_to,
NVL(m.valid_to, e.valid_to),
LEAD(g.valid_from - 1, 1, e.valid_to) OVER(
PARTITION BY e.empno ORDER BY g.valid_from
)
) AS valid_to,
(
e.ename
|| ',' || j.job
|| ',' || e.mgr
|| ',' || m.ename
|| ',' || TO_CHAR(e.hiredate,'YYYY-MM-DD')
|| ',' || e.sal
|| ',' || e.comm
|| ',' || e.deptno
|| ',' || d.dname
) AS result_cols,
e.ename,
j.job,
e.mgr,
m.ename AS mgr_ename,
e.hiredate,
e.sal,
e.comm,
e.deptno,
d.dname
FROM empv e
INNER JOIN (SELECT valid_from FROM empv
UNION
SELECT valid_from FROM deptv
UNION
SELECT valid_from FROM jobv
UNION
SELECT valid_to + 1 FROM empv
WHERE valid_to != DATE '9999-12-31'
UNION
SELECT valid_to + 1 FROM deptv
WHERE valid_to != DATE '9999-12-31'
UNION
SELECT valid_to + 1 FROM jobv
WHERE valid_to != DATE '9999-12-31') g
ON g.valid_from BETWEEN e.valid_from AND e.valid_to
INNER JOIN deptv d
ON d.deptno = e.deptno
AND g.valid_from BETWEEN d.valid_from AND d.valid_to
INNER JOIN jobv j
ON j.jobno = e.jobno
AND g.valid_from BETWEEN j.valid_from AND j.valid_to
LEFT JOIN empv m
ON m.empno = e.mgr
AND g.valid_from BETWEEN m.valid_from AND m.valid_to
),
calc_various AS (
-- produces columns has_gap, new_group
SELECT empno,
valid_from,
valid_to,
result_cols,
ename,
job,
mgr,
mgr_ename,
hiredate,
sal,
comm,
deptno,
dname,
CASE
WHEN LAG(valid_to, 1, valid_from - 1) OVER(
PARTITION BY empno ORDER BY valid_from
) = valid_from - 1 THEN
0
ELSE
1
END AS has_gap,
CASE
WHEN LAG(result_cols, 1, result_cols) OVER (
PARTITION BY empno ORDER BY valid_from
) = result_cols THEN
0
ELSE
1
END AS new_group
FROM joined
),
calc_group AS (
-- produces column group_no
SELECT empno,
valid_from,
valid_to,
ename,
job,
mgr,
mgr_ename,
hiredate,
sal,
comm,
deptno,
dname,
SUM(has_gap + new_group) OVER(
PARTITION BY empno ORDER BY valid_from
) AS group_no
FROM calc_various
),
merged AS (
-- produces the final merged result
-- grouping by group_no ensures that gaps are honored
SELECT empno,
MIN(valid_from) AS valid_from,
MAX(valid_to) AS valid_to,
ename,
job,
mgr,
mgr_ename,
hiredate,
sal,
comm,
deptno,
dname
FROM calc_group
GROUP BY empno,
group_no,
ename,
job,
mgr,
mgr_ename,
hiredate,
sal,
comm,
deptno,
dname
ORDER BY empno,
valid_from
)
-- main
select * FROM merged WHERE empno = 7788;
EMPNO VALID_FROM VALID_TO ENAME JOB MGR MGR_E HIREDATE SAL COMM DEPTNO DNAME
----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ --------
7788 19-APR-1987 31-DEC-1989 SCOTT ANALYST 7566 JONES 19-APR-1987 3000.0 20 RESEARCH
7788 01-JAN-1990 20-JAN-1990 Scott ANALYST 7566 Jones 19-APR-1987 3000.0 20 RESEARCH
7788 22-JAN-1990 28-FEB-1990 Scott Analyst 7566 Jones 19-APR-1987 3000.0 20 RESEARCH
7788 01-MAR-1990 31-MAR-1991 Scott Analyst 7566 Jones 19-APR-1987 3000.0 20 Research
7788 01-APR-1991 31-JUL-1991 Scott Analyst 7566 Jones 19-APR-1987 3300.0 20 Research
7788 01-JAN-1992 31-DEC-9999 Scott Analyst 7566 Jones 01-JAN-1992 3500.0 20 Research
The named query “joined” produces an additional column RESULT_COLS (see highlighted lines 16-26). It’s simply a concatenation of all columns used in the group by clause of the named query “merged”. RESULT_COLS is used in the named query “calc_various” (see highlighted lines 86-88) to calculate the column NEW_GROUP. NEW_GROUP is set to 1 if the value of RESULT_COLS is different for the current and previous rows. NEW_GROUP ensures that Adam’s intervals valid from 01-APR-1991 and valid from 01-JAN-1992 are not merged. See lines 7 and 9 in the following listing.
EMPNO VALID_FROM VALID_TO ENAME JOB MGR MGR_E HIREDATE SAL COMM DEPTNO DNAME
----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ --------
7876 23-MAY-1987 31-DEC-1989 ADAMS CLERK 7788 SCOTT 23-MAY-1987 1100.0 20 RESEARCH
7876 01-JAN-1990 20-JAN-1990 Adams CLERK 7788 Scott 23-MAY-1987 1100.0 20 RESEARCH
7876 21-JAN-1990 28-FEB-1990 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 RESEARCH
7876 01-MAR-1990 31-MAR-1991 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 Research
7876 01-APR-1991 31-JUL-1991 Adams Clerk 7788 Scott 23-MAY-1987 1210.0 20 Research
7876 01-AUG-1991 31-DEC-1991 Adams Clerk 7788 23-MAY-1987 1210.0 20 Research
7876 01-JAN-1992 31-DEC-9999 Adams Clerk 7788 Scott 23-MAY-1987 1210.0 20 Research
The ORA_HASH function over RESULT_COLS could give me a shorter representation of all result columns. But since I do not detect hash collisions and this would lead to wrong results in some rare data constellations, I decided not to use a hash function.
The named queries “calc_various”, “calc_group” and “merged” are based on the query in scenario A of Merging Temporal Intervals with Gaps. The columns HAS_GAP, NEW_GROUP and GROUP_NO are explained in this post.
Conclusion
Joining and merging temporal intervals is indeed very challenging. Even if I showed in this post that it is doable I recommend choosing a simpler solution whenever feasible. E.g. limiting the query to a certain point in time for all involved temporal tables, since this eliminates the need to merge temporal intervals and even simplifies the gap-aware temporal join.
3 Comments
[…] results if “disconnected” intervals have the same content. Issues are addressed in part 2 of this […]
[…] time ago Philipp Salvisberg has posted several thoughts about joining and merging temporal intervals. Recently I was looking for some examples of using the new MATCH_RECOGNIZE clause introduced in 12c […]
[…] einiger Zeit hat mein Kollege Philipp Salvisberg ein paar interessante Beiträge zum Thema Joinen und Mergen der Zeitintervalle gepostet. Neulich war ich auf der Suche nach Anwendungsbeispielen für die neue MATCH_RECOGNIZE […]