Introduction
In the last episode, we looked at some features in PostgreSQL which I miss in the Oracle Database. The IslandSQL grammar now covers PL/SQL and the related DDL statements. The implementation was more complex than expected, mainly because of the incompatibilities between PostgreSQL and the Oracle Database. I will probably deal with this topic in a future blog post.
Now I’d like to talk about the new features in Oracle Database 23ai. Not about all features in the New Features Guide, but only about some changes since the release of 23.3, which was known as 23c. It’s hard to find changes from 23.3 to 23.4 in the documentation. So, I guess it’s worth a blog post. I focus on the features that are relevant to the IslandSQL grammar. In other words, the interesting ones from a developer’s perspective.
1. Vector Data Type
A vector is a number array for which you can optionally define the number of dimensions (size) and the data type (int8, float32 or float64) of the dimension values. This data type is the basis for the vector search functionality.
Here’s a slightly amended example from the documentation creating and populating a table with a vector data type.
drop table if exists galaxies purge;
create table galaxies (
id number not null primary key,
name varchar2(10 char) not null unique,
embedding vector(5, int8) not null, -- 5 dimensions, stored as int8
doc varchar2(120 char) not null
);
insert into galaxies
(id, name, embedding, doc)
values
(1, 'M31', '[0,2,2,0,0]', 'Messier 31 is a barred spiral galaxy in the Andromeda constellation which has a lot of barred spiral galaxies.'),
(2, 'M33', '[0,0,1,0,0]', 'Messier 33 is a spiral galaxy in the Triangulum constellation.'),
(3, 'M58', '[1,1,1,0,0]', 'Messier 58 is an intermediate barred spiral galaxy in the Virgo constellation.'),
(4, 'M63', '[0,0,1,0,0]', 'Messier 63 is a spiral galaxy in the Canes Venatici constellation.'),
(5, 'M77', '[0,1,1,0,0]', 'Messier 77 is a barred spiral galaxy in the Cetus constellation.'),
(6, 'M91', '[0,1,1,0,0]', 'Messier 91 is a barred spiral galaxy in the Coma Berenices constellation.'),
(7, 'M49', '[0,0,0,1,1]', 'Messier 49 is a giant elliptical galaxy in the Virgo constellation.'),
(8, 'M60', '[0,0,0,0,1]', 'Messier 60 is an elliptical galaxy in the Virgo constellation.'),
(9, 'NGC1073', '[0,1,1,0,0]', 'NGC 1073 is a barred spiral galaxy in Cetus constellation.');
commit;
The vector in this example has 5 dimensions with the following meaning:
- Number of occurrences of
intermediate
in thedoc
column - Number of occurrences of
barred
in thedoc
column - Number of occurrences of
spiral
in thedoc
column - Number of occurrences of
giant
in thedoc
column - Number of occurrences of
elliptical
in thedoc
column
2. Vector Functions and PL/SQL Packages
The vector_distance function is a key functionality for similarity searches. Let’s say we want to see galaxies that are similar to NGC1073
(barred
and spiral
). The following query shows how vector functions can help to get this result:
with ngc1073 as (select vector('[0,1,1,0,0]', 5, int8) as query_vector)
select name,
round(vector_distance(embedding, query_vector, cosine), 3) as cosine_distance,
round(vector_distance(embedding, query_vector, dot), 3) as inner_product,
round(vector_distance(embedding, query_vector, euclidean), 3) as l2_distance,
round(vector_distance(embedding, query_vector, euclidean_squared), 3) as l2_squared,
round(vector_distance(embedding, query_vector, hamming), 3) as hamming_distance,
round(vector_distance(embedding, query_vector, manhattan), 3) as l1_distance
from galaxies, ngc1073
order by cosine_distance;
NAME COSINE_DISTANCE INNER_PRODUCT L2_DISTANCE L2_SQUARED HAMMING_DISTANCE L1_DISTANCE
---------- --------------- ------------- ----------- ---------- ---------------- -----------
M31 0 -4 1.414 2 2 2
M77 0 -2 0 0 0 0
M91 0 -2 0 0 0 0
NGC1073 0 -2 0 0 0 0
M58 .184 -2 1 1 1 1
M63 .293 -1 1 1 1 1
M33 .293 -1 1 1 1 1
M60 1 0 1.732 3 3 3
M49 1 0 2 4 4 4
9 rows selected.
M77
and M31
have identical vectors and they are expected to be very similar. However, M31
is interesting. It has a similar shape to NGC1073
but the words barred
and spiral
appear twice. This is a good match only for some distance metrics.
A conventional query, e.g. based on Oracle Text might be good enough to find similar galaxies. However, if you have vectors with tons of dimensions then a vector similarity search might become appealing. Especially, since you can index vector columns to speed up your queries.
The following SQL functions are provided for the vector data type in the Oracle Database:
- vector_chunks
- vector or to_vector
- vector_dimension_count or vector_dims
- vector_dimension_format
- vector_distance or cosine_distance or inner_product or l1_distance or l2_distance
- vector_embedding
- vector_norm
- vector_serialize or from_vector
Furthermore, the following supplied PL/SQL packages provide additional functionality related to the vector data type:
3. Shorthand Operators for Distances
For some vector_distance
metrics shorthand operators (<=>
, <->
, <#>
) are available. Here’s an example:
with ngc1073 as (select vector('[0,1,1,0,0]', 5, int8) as query_vector)
select name,
round(embedding <=> query_vector, 3) as cosine_distance,
round(embedding <-> query_vector, 3) as l2_distance,
round(embedding <#> query_vector, 3) as inner_product
from galaxies, ngc1073
order by cosine_distance;
NAME COSINE_DISTANCE L2_DISTANCE INNER_PRODUCT
---------- --------------- ----------- -------------
M31 0 1.414 -4
M77 0 0 -2
M91 0 0 -2
NGC1073 0 0 -2
M58 .184 1 -2
M63 .293 1 -1
M33 .293 1 -1
M60 1 1.732 0
M49 1 2 0
9 rows selected.
4. Approximate Similarity Searches
You can imagine producing a top-N result of the previous queries using the row_limiting_clause. This clause was introduced in 12.1 to limit search results. Exactly to be precise. In 23.4 the clause was extended to support approximate similarity searches. The idea is to get a good enough result with a better performance when using vector indexes. Here’s an example:
select name, embedding
from galaxies
order by embedding <=> vector('[0,1,1,0,0]', 5, int8)
fetch approx first 3 rows only
with target accuracy 80 percent;
NAME EMBEDDING
---------- --------------------
M31 [0,2,2,0,0]
M91 [0,1,1,0,0]
M77 [0,1,1,0,0]
5. Source and Destination Predicates in Graph Operator
The graph operator is a new 23ai feature. In 23.4 it got two additional predicates: source_predicate and destination_predicate. It allows us to test if a vertex is a source or a destination of an edge. The direction of the arrow so to speak. Here’s a formatted example from the documentation, based on this model.
select *
from graph_table (students_graph
match (p1 is person) -[e is friends]- (p2 is person)
where p1.name = 'Mary'
columns (
e.friendship_id,
e.meeting_date,
case
when p1 is source of e then
p1.name
else
p2.name
end as from_person,
case
when p1 is destination of e then
p1.name
else
p2.name
end as to_person
)
)
order by friendship_id;
FRIENDSHIP_ID MEETING_DATE FROM_PERSON TO_PERSON
------------- ------------------- ----------- ----------
1 19.09.2000 00:00:00 Mary Alice
5 19.09.2000 00:00:00 Mary John
7 10.07.2001 00:00:00 Bob Mary
6. Breaking Change for Inlined MLE Call Specification
In 23.3 the following code works.
create or replace function get42 return number is
mle language javascript q'[return 42;]';
/
select get42();
Function GET42 compiled
GET42()
----------
42
In 24.4 the same code produces this error:
Function GET42 compiled
LINE/COL ERROR
--------- -------------------------------------------------------------
2/28 PLS-00881: missing closing delimiter 'q'[return' for MLE language code
Errors: check compiler log
The reason is a change in the syntax of the inlined MLE call specification. The JavaScript code cannot be passed as a string anymore. Instead, a new type of delimiter must be used as documented here. You can use almost any sequence of characters as a delimiter for the JavaScript code. The same character sequence must be used for the start and the end delimiter. Exceptions are the pairs ()
, []
, {}
and <>
.
PostgreSQL dollar-quoted string constants are also valid delimiters. This works in 23.4:
create or replace function get42 return number is
mle language javascript $code$ return 42;$code$;
/
select get42();
Function GET42 compiled
GET42()
----------
42
Please note that the space after the first $code$
is required for the Oracle Database to recognize the end of the delimiter.
This change might simplify the implementation of additional MLE languages. Nevertheless, this is a breaking change that requires an amendment of the existing code base.
Outlook
In the next episode, the IslandSQL grammar will be extended to cover the missing statements with a query block. This means the
create view
,create materialized view
andcreate table
statement
for PostgreSQL and the Oracle Database. Only two episodes are left until the end of the first IslandSQL season.
1 Comment
[…] the last episode, we looked at some new features in Oracle Database 23.4. The IslandSQL grammar now covers all […]