Announcing ppx_mysql 1.0
A saner way to query MySQL
Readers of Issuu’s engineering blog will know that we use the OCaml programming language in many of our production systems. As members of the small but vibrant OCaml community, we are proud to contribute to the health of this ecosystem by releasing as open-source many of the libraries and tools developed for internal use. This blog post describes the genesis and development of one such tool: ppx_mysql, a syntax extension aimed at reducing the pain and boilerplate traditionally associated with using MySQL bindings from OCaml.
The Genesis of ppx_mysql
Issuu makes extensive use of MySQL for relational data storage.
Fortunately, ocaml-mysql offers OCaml bindings to
libmysqlclient
, and we have relied on those for many years. Though grateful that we could find in
the OCaml community a ready-made way to communicate with MySQL from OCaml, we have long suffered the
disadvantages of the fairly low-level interface offered by these bindings.
Our primary source of frustration with the low-level interface is the copious amount of boilerplate that it requires. Statements must be prepared, the input parameters must be serialised into strings, the result set must be looped through, and the output parameters must be deserialised into the correct types. This is cumbersome to write, tedious to review, and prone to errors. There was a strong consensus that we needed a better solution!
In many language ecosystems, the common approach that sidesteps the abovementioned issues is to rely instead on a library offering some form of Object Relational Mapping (ORM). ORMs bring their own set of problems, however. For one, there is typically some impedance mismatch between the relational model and the object model used by the ORM. Moreover, effective use of an ORM demands learning yet another API that typically encompasses only a subset of SQL — and seldom the subset you actually require! This is particularly galling at a company like Issuu, where our backend engineers are comfortable with SQL and want to make practical use of its features and expressiveness.
Fortunately, there is a third way. When it comes to interfacing with SQL databases — though alas
restricted to PostgreSQL — the best known example in the OCaml community is arguably
PG’OCaml. Originally developed by Richard W. M. Jones,
and maintained and upgraded over the years by various members of the OCaml community, PG’OCaml
leveraged the meta-programming facilities of OCaml to offer the best of both worlds: it allows you
to write SQL directly within your program (with some special sigils for identifying input
parameters), and automatically generates the boilerplate code required for interfacing with the
PostgreSQL server. Moreover, it takes advantage of the Describe
command in PostgreSQL’s wire
protocol to determine the types of the query parameters at compile-time, and is this way able to
automatically invoke the appropriate (de)serialisation functions.
Even though MySQL does not have the same Describe
feature offered by PostgreSQL, and therefore
some degree of manual type annotations would always be required, we felt that a syntax extension in
the spirit of PG’OCaml’s could nevertheless offer a substantial improvement to the way we used the
MySQL bindings for OCaml.
Design considerations
Introduced with OCaml 4.02, PPX rewriters quickly replaced the old Camlp4 system as the standard way to annotate and extend OCaml’s grammar for compile-time code generation. It was therefore obvious from the get-go that some sort of PPX would be the way forward for us as well. The space of design possibilities is nevertheless vast, so in the following sections we discuss and justify the decisions we took. These sections also serve as an introduction to the various features of the PPX.
Overall format
The first design decision concerns the type of code that the PPX should generate. Interestingly, the two well-known projects using syntax extensions to facilitate communication with SQL databases (the aforementioned PG’OCaml and OCaml-sqlexpr) each took a different route. Our approach was inspired by PG’OCaml, but it is not quite the same.
Basically, our PPX expects its extension points to take the form [%mysql ACTION SQL]
, where
ACTION
is one of execute
/ select_all
/ select_one
/ select_opt
, and SQL
is a string
containing the actual SQL statement. The code it generates is a function that takes at least one
parameter (the database handle) and potentially more depending on the number of input parameters
declared within the SQL query. The function’s return type depends on the output parameters in the
query, and on the ACTION
. The latter also determines how the result set from MySQL should be
processed.
The existence of the various actions is based on the realisation that one invariably performs
one of four different kinds of post-processing on a query’s result set. Action execute
is
meant for queries whose sole purpose is to modify the database, and whose result set is meant
to be empty. Actions select_all
, select_one
, and select_opt
are meant for SELECT
queries
whose result set should contain a list with an arbitrary number of rows, a single row, or either
zero or one rows, respectively.
Use the result type instead of exceptions
At Issuu, we avoid using
exceptions
for signaling errors, preferring to rely instead on the result type using polymorphic variants for
errors. It should not come as a surprise
that we have adopted this same policy for ppx_mysql
. Note that in practice this means that the
set of error variants in the generated function’s result type will depend on its action and on the
parameters declared in the SQL string. As an example, a query using select_one
as its action
includes Expected_one_found_none
among the possible error variants, which is absent if the action
is instead select_all
. In a similar vein, the error variant Column_errors
— used to signal
deserialisation errors — is only present if the SQL string actually contains any output parameters.
The only variant common to all possible generated functions is Mysql_error
, which is returned in
case an internal error is encountered.
Manual type annotations
Unfortunately, MySQL does not provide anything similar to the Describe
feature offered by
PostgreSQL, and therefore manual type annotations are required. Superficially, our conventions
resemble those used by OCaml-sqlexpr, where input parameters
are prefixed with %
, output parameters are prefixed with @
, and where a ?
after the type
specifier indicates we are dealing with a nullable column. The similarities end there, however.
Instead of single-letter type specifications, we use the full name of the OCaml type. Moreover, we
also require that input parameters include the name of a variable inside curly brackets. This is
because the function generated by the PPX uses named parameters for all the inputs. As an example,
consider the signature of the function generated by the PPX in the code below. Note how the names
and types of the function’s parameters match those provided within the query.
# [%mysql execute "INSERT INTO users (id, name, phone) VALUES (%int{id}, %string{name}, %string?{phone})"];;
- : Mysql.dbd -> id:int -> name:string -> phone:string option ->
(unit, [> `Expected_none_found_one | `Mysql_error of exn ]) IO_result.t
= <fun>
Compared with PG’OCaml, we lose the compile-time verification that the column types in the database match those used in the OCaml code. We even lose the ability to check at compile-time that the SQL statement is valid in the first place. On the other hand, the lack of coupling between our PPX and the database does have the advantage that compile-time access to the MySQL server is not required.
Support for custom types and (de)serialisation functions
As
described
in a previous post about Issuu’s best OCAML practices, we tend to wrap all sorts of identifiers in
modules, and we make the type itself opaque. This practice has many advantages when it comes to
safety and abstraction, but it did pose a problem for the first versions of ppx_mysql
. Suppose we
have an opaque type Foobar.t
, represented internally as just an integer, and that module Foobar
offered functions to_int
and of_int
. If ppx_mysql
supported only OCaml’s base types, one
would need to use the int
type specifier within the SQL expression, and manually call
Foobar.to_int
or Foobar.of_int
depending on whether we were dealing with an input or output
parameter. Not only does this force us to write boring boilerplate code, it also breaks the
abstraction offered by the Foobar
module. A better solution was in order.
The solution we converged upon is simple yet effective. Whenever the PPX encounters a type
specification whose first letter is capitalised, it assumes it is dealing with a module that
satisfies the signature below. Note that the reason why functions of_mysql
and to_mysql
use
strings for (de)serialisation is because that is the representation used by MySQL’s wire protocol.
module type SERIALIZABLE = sig
type t
val of_mysql : string -> (t, string) result
val to_mysql : t -> string
end
As an example, suppose the PPX encounters the type specification %Foobar{foobar}
. It will generate
a function with a named parameter foobar
of type Foobar.t
, and it will call function
Foobar.to_mysql
to serialise it. No more boilerplate needed and abstraction is preserved.
Support for lists as input parameters
Consider the SQL statements below, which exemplify two common use cases for lists in SQL: Inserting multiple rows with a single statement, and selecting based on list membership.
INSERT INTO users (id, name) VALUES (1, 'john'), (2, 'mary'), (3, 'claire');
SELECT COUNT(*) FROM users WHERE id IN (1, 2, 3);
Our PPX supports both use cases via a special list
type specification available for
input parameters. The code below illustrates its use for each of the SQL statements
under discussion:
# [%mysql execute "INSERT INTO users (id, name) VALUES %list{(%int{id}, %string{name})}"];;
- : Mysql.dbd -> (int * string) list -> (unit, [> ... ]) IO_result.t = <fun>
# [%mysql select_one "SELECT @int{COUNT(*)} FROM users WHERE id IN (%list{%int{id}})"];;
- : Mysql.dbd -> int list -> (int, [> ... ]) IO_result.t = <fun>
Note that for the sake of clarity and brevity we replaced the lengthy list of error variants with an
ellipsis (...
). More to the point, notice the placement of the parentheses in each of the SQL
statements: inside the %list
curly brackets in the first example, but outside the %list
in the
second one. The reason for this becomes obvious once you realise that the PPX replaces whatever it
encounters within the %list
by a comma-separated list.
PG’OCaml users will find parallels between the usage of %list
in the second example and PG’OCaml’s
$@
notation for list expressions. There is no PG’OCaml equivalent to the first example, however.
Being able to provide a list of tuples as input parameter is a novel and useful feature offered by
our PPX.
Notice also that there is no way to specify a name for the list parameter, and that it shows up as a positional argument to the generated function. This reflects an underlying limitation of this feature: only one list parameter may be used per statement, and you will get a compile-time error if multiple uses are attempted. In fairness though, we have yet to encounter a situation where this limitation impacted us in practice.
A discussion of this feature is not complete without mentioning an important caveat. The generated
function must always be provided with a non-empty list, and it will return an Empty_input_list
error variant otherwise. We briefly considered creating an explicit type for non-empty lists (such
as type 'a nelist = 'a * 'a list
) and using that in the generated function. However, experience
tells us that such types end up being in practice too much of a hassle, so we opted for plain lists
in the end.
Not tied to any particular I/O monad or MySQL binding
There is a well know schism in the OCaml community between the two main libraries offering monadic concurrency for I/O, Lwt and Async. Therefore, being able to abstract over the I/O monad is an almost mandatory requirement, which our PPX satisfies. Moreover, though the PPX itself depends neither on Lwt or Async, we do include a simple “Hello World” example illustrating how to use the PPX with projects based on Lwt or Async.
Note that the code generated by the PPX does not even have a dependency on
ocaml-mysql itself. Instead, it expects the presence in the
environment of the signature
PPX_MYSQL_CONTEXT
described in the project’s README, and therefore the declaration of the three modules (IO
,
IO_result
, and Prepared
) used by the generated code. This is the mechanism used to ensure that
the generated code can be used with any I/O monad. It has the additional benefit of not tying the
PPX to any particular MySQL client library. In fact, it should be easy to use a MariaDB client
library instead.
Also part of the project though packaged separately is ppx_mysql_identity
. This library offers an
implementation of the PPX_MYSQL_CONTEXT
signature described above, using the identity monad for
I/O and ocaml-mysql as the client library.
Automatic caching of prepared statements
The code generated by our PPX relies solely on MySQL prepared statements. There are two reasons behind this choice. The first is about security: SQL injection attacks are the poster child of bad security practices, and using prepared statements avoids them. The second reason concerns performance: there is some overhead associated with parsing a SQL statement, and by using prepared statements we need only incur that cost the first time the statement is used.
At this point, the astute reader may be wondering how the code generated by the PPX knows that a given statement has already been prepared. The approach we chose is similar to the one used by PG’OCaml: the database handle required by the PPX is not the raw handle used by the underlying MySQL client library. Instead, it’s an opaque type whose internal representation is a record that includes the original raw handle and a hash table that maps SQL query strings into the corresponding prepared statements.
There is one important caveat with the approach described above. If the SQL query contains list parameters, each list size will result in the preparation of a different statement. Should you use list parameters where the list size varies dramatically, you may end up with a very large cache of prepared statements taking up resources both on the client and on the MySQL server. We are fully aware that this is not a desirable situation. Nevertheless, we also feel that it would be draconian to simply disable statement caching for all queries with list parameters. After all, there are use cases where the list size is bounded and small, and therefore where it makes sense to cache prepared statements for all possible list sizes.
Obviously the PPX cannot guess whether it is reasonable or not to cache any given statement. The burden of this decision should fall on the user. Therefore, we plan to release soon a 1.1 version of our PPX offering per-query control of whether or not the resulting statement(s) should be cached.
Conclusion
Looking back at the development of this project, some lessons stand out. First, we were fortunate
to have started working on this PPX after ppxlib came into
existence. Not only does ppxlib
bring the disparate libraries and tools required for writing a
PPX under the same umbrella, it also provides a framework that abstracts away the low-level details
of hooking into the compiler, and allows for multiple PPXs to coexist peacefully. Unfortunately,
most of the resources found on the Internet were written in the age before ppxlib
, and are
therefore out of date and a potential source of confusion for newcomers to the ecosystem.
A perhaps more important lesson concerns the PPX mechanism itself. It started out as a clever hack, but a hack nonetheless. In an ideal world, a principled way to generate code at compile-time would’ve been designed from the ground up together with the language. Some of the pitfalls associated with writing a PPX — such as forgetting to fully qualify identifiers in the generated code and thus fall prey to the lack of hygiene — could have been sidestepped altogether in this alternate world. We are nevertheless cognisant of the Nirvana fallacy. Though not perfect, the PPX mechanism has made a positive contribution to the OCaml ecosystem, and we encourage those on the fence to give it a try.
The reader looking for more information about ppx_mysql
should refer to the project’s home
page, which contains a fairly detailed README. And there are
of course OPAM packages for both
ppx_mysql and
ppx_mysql_identity.