Tarantool 2.11.0-rc1 is out! #8371
Totktonada announced in Announcements
Update 2023-04-04: added the new [Tarantool Enterprise] WAL extensions for stateless CDC tools section.
Update 2023-07-11: added the new Downgrade database schema section.
Minor update 2023-10-28: fixed the last but one example in the HTTP client enhancements section.
Overview
We’re happy to announce the first Tarantool 2.11 release candidate!
Tarantool 2.11 development started in April 2022 and incorporates more than a thousand commits from 42 authors. 2.11.0-rc1 offers several important features for administrators and application developers.
The release aims to solve major maintenance problems our users experienced in the past. We’re glad to introduce circuit breakers that prevent unintentional outages caused by long queries: fiber slices and the explicit SQL `SEQSCAN` directive. The Enterprise Edition has received a lot of security enhancements. And of course there are many improvements for developers, for example modular logging, an improved HTTP client, manageable compatibility and so on.
The goal of the release candidate is to collect feedback from our customers and keep in touch with the community. If you run into a problem with the release candidate, reach us using the dedicated issue form. It helps us not to miss any feedback.
Tarantool 2.11 aims to be a worthy successor to the long-supported 1.10 releases.
Enhancements for administration and maintenance
Limitation of fiber execution time slice
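As a quick orientation before the details, the sketch below shows how the slice API described in this section might be used in application code. The loop and the chosen limits are illustrative assumptions; `fiber.set_max_slice()` and `fiber.check_slice()` are the functions named in this section.

```lua
local fiber = require('fiber')

-- Assumed limits: warn after 0.5 s and raise an error after 1.0 s
-- of running without a yield.
fiber.set_max_slice({warn = 0.5, err = 1.0})

-- A hypothetical CPU-bound loop that never touches spaces or indexes,
-- so nothing would check the slice implicitly.
local function sum_squares(n)
    local acc = 0
    for i = 1, n do
        acc = acc + i * i
        if i % 1000 == 0 then
            -- Raises an error once the error slice is exceeded.
            fiber.check_slice()
        end
    end
    return acc
end

print(sum_squares(10000))
```

Calling `fiber.check_slice()` only every thousandth iteration keeps the overhead of the check small in this sketch; the interval is arbitrary.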
A mechanism that prevents fibers from running for too long without yielding was added. Each fiber is now assigned an execution slice which is the max time it can run without yielding execution to other fibers. There are two kinds of slices: warning and error. If the warning slice is exceeded, a warning is logged, but the fiber continues to run. If the error slice is exceeded, an error is raised by any function that checks the fiber slice. The fiber slice is checked by all functions operating on spaces and indexes (`index.select()`, `space.replace()`, and others). It can also be checked explicitly by the application code with the new `fiber.check_slice()` function. The max fiber slice is set with the `fiber.set_max_slice()` function. The default value is controlled by the `compat` option `fiber_slice_default`: the old default is unlimited (no warnings or errors); the new default is 0.5 sec for the warning slice and 1.0 sec for the error slice.
Explicit sequential scanning in the SQL engine
Tarantool is primarily designed for OLTP workloads: it means that data reads are supposed to be relatively small. However, with such a complex tool as SQL it is easy to overlook a suboptimal query and flood the database with heavy requests.
Now it is possible to ask the SQL engine to ensure that a query doesn’t lead to a full sequential scan of a table.
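A minimal sketch of how this might look from Lua, using the session setting and the keyword introduced in this section. The table `t` is hypothetical, and the exact error text is not shown because it may differ between versions.

```lua
box.cfg{}

-- Hypothetical table used only for illustration.
box.execute([[CREATE TABLE IF NOT EXISTS t (id INT PRIMARY KEY, a INT);]])

-- Forbid implicit full scans for the current session.
box.execute([[SET SESSION "sql_seq_scan" = false;]])

-- Expected to be rejected: a full scan without the SEQSCAN keyword.
local scan_res, scan_err = box.execute([[SELECT * FROM t WHERE a > 10;]])
print(scan_res, scan_err)

-- Explicitly allowed full scan.
local ok_res, ok_err = box.execute([[SELECT * FROM SEQSCAN t WHERE a > 10;]])
print(ok_res, ok_err)
```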
The new session setting `sql_seq_scan` is added to explicitly forbid full table scanning.
The new `SEQSCAN` keyword is added to explicitly allow table scanning.
The keyword will be required for scanning queries since Tarantool 3.0, with the ability to disable the check using the session setting. The new behavior can be enabled using a `compat` option (see the section at the bottom of the announcement).
Replica join retry
If a join error is non-critical, the instance now retries the join automatically. For example, if automatic election is used, a new instance can find the leader, but while it is sending a join request, the leader can resign. The join will fail saying the former leader is read-only. Waiting and retrying makes sense in this case.
Strict fencing in RAFT
There is a situation when an old leader doesn't resign its leadership before a new leader is elected. Because of this, several "leaders" might coexist in a replicaset for some time.
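The remedy is the `election_fencing_mode` option described in the rest of this section. As a rough sketch of the timeouts it implies, the hypothetical helper below (not part of Tarantool) computes the connection death timeout from `replication_timeout` for each mode, using the multipliers stated in this section.

```lua
-- Hypothetical helper, not a Tarantool API: returns the connection death
-- timeout (in seconds) for a node, following the multipliers given in
-- this section.
local function death_timeout(mode, is_leader, replication_timeout)
    assert(mode == 'off' or mode == 'soft' or mode == 'strict',
           'unknown fencing mode')
    if mode == 'strict' and is_leader then
        -- The leader gives up twice as fast as followers notice its death.
        return 2 * replication_timeout
    end
    return 4 * replication_timeout
end

print(death_timeout('soft', true, 1))    -- 4
print(death_timeout('strict', true, 1))  -- 2
print(death_timeout('strict', false, 1)) -- 4
```

On a real instance the mode itself would be selected with `box.cfg{election_fencing_mode = 'strict'}`.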
Fencing (when enabled) makes the leader resign its leadership when the quorum of connections is lost. A connection is considered lost after being inactive (not responding) for longer than the death timeout. The death timeout is not set directly, but calculated from `replication_timeout`.
The new strict fencing mode is implemented. Practically, it makes the disconnect timeout for the current RAFT leader twice as short as for followers. Assuming that `replication_timeout` is the same for every replica in the replicaset, this makes it less probable that a new leader can be elected before the old one resigns its leadership.
The new `election_fencing_mode` option in `box.cfg` is added:
- `'off'`: fencing turned off (the leader won’t resign). Connection death timeout is `4*replication_timeout` for all nodes.
- `'soft'` (default): fencing turned on, but connection death timeout is the same for the leader and followers in a replicaset. This is enough to solve the problem of a cluster being read-only and unable to elect a new leader in some situations because of pre-vote. Connection death timeout is `4*replication_timeout` for all nodes.
- `'strict'`: fencing turned on. In this mode the leader tries its best to resign leadership before a new leader can be elected. This is achieved by halving the death timeout on the leader. Connection death timeout is `4*replication_timeout` for followers and `2*replication_timeout` for the current leader.
The `election_fencing_enabled` option is deprecated in favor of `election_fencing_mode`.
New sane startup and bootstrap defaults
Initially `replication_connect_quorum` was designed to simplify replicaset bootstrap, but in fact it brings a lot of complexity and problems during cluster lifetime and maintenance operations.
Users who didn’t touch the option encountered problems with partial cluster bootstrap. Users who did touch the option encountered problems during instance restart.
In a search for sane defaults for `replication_connect_quorum` we found that such an option can’t have one default that will satisfy all requirements. So, we decided to deprecate it in favour of `bootstrap_strategy`, which works during replicaset bootstrap and implies sane default values for other parameters based on the replicaset configuration.
- On replica set bootstrap, the nodes will refuse to boot unless a majority is reached (this would mean `replication_connect_quorum = 3` when `#box.cfg.replication` is 4 or 5, for example, or `replication_connect_quorum = 2` when `#box.cfg.replication` is 2 or 3). Moreover, the bootstrap leader will fail to boot unless it sees that every connected node chose it as the bootstrap leader.
- On new replica join to an existing cluster, the replica will fail to boot only if it couldn't connect to anyone. As long as at least one connection is established, the replica will try to join like before. Moreover, the replica will check that its `box.cfg.replication` table contains every registered node in the cluster, thus ensuring that it has tried to connect to everyone and chose the best bootstrap leader possible.
- On replication reconfiguration on a working instance and recovery from local WAL files, the node will try to connect to everyone specified in `box.cfg.replication`. Any number of connections (even no connections) will be deemed a success, but the replica will stay in orphan mode until it is synced with everyone connected.
If you wish to return to the old behavior, the `'legacy'` mode of the `bootstrap_strategy` option in `box.cfg` is available. In this case the node behaves exactly like before: quorum for both connection and synchronisation is determined by `replication_connect_quorum`, and neither the bootstrap leader nor joining replicas perform any additional checks on bootstrap.
Downgrade database schema
The usual Tarantool upgrade process assumes database schema upgrading at the final stage of the upgrade. This action is required to enable all the new functionality.
There are rare cases when a problem is revealed only some time after the upgrade, for example in a periodically running task.
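A sketch of what a rollback might look like with the function described next. The target version string `'2.8.4'` is an assumption chosen for illustration; use the version you actually intend to run.

```lua
-- Run on the instance that is still on the new Tarantool version,
-- before switching the binary back.
box.cfg{}

-- Assumed target: make the schema usable by Tarantool 2.8.4.
box.schema.downgrade('2.8.4')
```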
Before this release there was no way to return to the old Tarantool version after calling `box.schema.upgrade()`. Now there is `box.schema.downgrade(version)`, which makes the database suitable for use with the given Tarantool version.
[Tarantool Enterprise] Encrypted SSL/TLS keys
Tarantool Enterprise Edition now supports password-protected SSL/TLS private key files.
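A configuration sketch for the parameters described in this section. File paths, the port and the password are placeholders, and the surrounding `transport`, `ssl_key_file` and `ssl_cert_file` parameters are assumed to be configured as in an ordinary SSL setup of the Enterprise Edition.

```lua
box.cfg{
    listen = {
        uri = 'localhost:3301',
        params = {
            transport = 'ssl',
            ssl_key_file = '/path/to/server.key',   -- password-protected key
            ssl_cert_file = '/path/to/server.crt',
            -- Either pass the password inline...
            ssl_password = 'placeholder-password',
            -- ...or point to a text file that contains it:
            -- ssl_password_file = '/path/to/passwords.txt',
        },
    },
}
```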
A password can be provided either in the new URI parameter `ssl_password` or in a text file specified in the new URI parameter `ssl_password_file`.
[Tarantool Enterprise] Security enforcement options
A set of security enforcement options was introduced to Tarantool Enterprise Edition.
First, it is now possible to enforce the strength of a user password. Tarantool will refuse to set a user password (`box.schema.user.passwd()`) if it doesn’t meet the specified requirements.
Second, the admin may restrict authentication attempts over the network protocol (IPROTO).
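The announcement does not list the options themselves. As an illustration only, the sketch below uses option names as we understand them from the Enterprise Edition documentation (`password_min_length`, `password_enforce_digits`, `auth_delay`); treat the names and values as assumptions and check them against the documentation for your version. The user `alice` is hypothetical.

```lua
box.cfg{
    -- Password strength requirements (assumed option names).
    password_min_length = 12,
    password_enforce_digits = true,
    -- Delay, in seconds, before a user may retry after a failed
    -- authentication over IPROTO (assumed option name).
    auth_delay = 1,
}

-- Hypothetical user for the illustration.
box.schema.user.create('alice', {if_not_exists = true})

-- With such a policy, a weak password is expected to be refused.
local ok, err = pcall(box.schema.user.passwd, 'alice', 'short')
print(ok, err)
```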
[Tarantool Enterprise] PAP-SHA256 authentication method
Tarantool Enterprise Edition now supports the PAP-SHA256 authentication method.
In contrast to CHAP-SHA1, the default and so far the only available authentication method, PAP-SHA256 applies a random salt to a user password before hashing and storing it in the database, which makes this method robust against brute force attacks using rainbow tables. It also uses the SHA256 hash function, which doesn’t have any known vulnerabilities.
However, the PAP-SHA256 authentication method implies passing the user password as plain text over the network channel so it’s safe to use only as long as the connection is encrypted. Tarantool connectors refuse to use this method unless SSL/TLS is enabled.
Features for application developers
Linearizable (quorum) read
Linearizability is a property of operations: an operation performed on any node sees all the operations performed earlier on any other node of the cluster.
The property makes it possible to send read requests to any node, regardless of whether it is a leader or a follower, and obtain data from the last successful write. It reduces the leader’s load and is simply very convenient for balancing requests on a client.
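A minimal sketch of such a read, using the `txn_isolation` value introduced in this section. The space name `accounts` and the key are hypothetical, and the prerequisites listed below are assumed to be met.

```lua
-- Can be run on a leader or a follower.
local ok, res = pcall(function()
    -- Yields until the instance knows it has seen all committed writes.
    box.begin({txn_isolation = 'linearizable'})
    local tuple = box.space.accounts:get({1})
    box.commit()
    return tuple
end)

if ok then
    print(res)
else
    -- For example, not enough peers are reachable or the wait timed out.
    box.rollback()
    print('linearizable read failed: ' .. tostring(res))
end
```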
In addition to the existing `box.begin()`'s `txn_isolation` options `read-committed`, `read-confirmed` and `best-effort`, the new one is added: `linearizable`.
There are several prerequisites to perform a linearizable transaction:
- `box.cfg.memtx_use_mvcc_engine` to be on.
- `N` is the count of registered nodes in the cluster and `Q` is `replication_synchro_quorum`. So, for example, you can’t perform a linearizable transaction on anonymous replicas.
When called with `{txn_isolation = 'linearizable'}`, `box.begin()` yields until the instance receives enough data from remote peers to be sure that the transaction is linearizable. This call may fail in case the node can't contact enough remote peers to determine which data is committed or if waiting for committed data times out.
Pagination
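A short sketch of paging through an index with the options described in this section. The space `items` and its contents are made up for the example.

```lua
box.cfg{}

local s = box.schema.space.create('items', {if_not_exists = true})
s:create_index('pk', {if_not_exists = true})
for i = 1, 5 do
    s:replace({i, 'item-' .. i})
end

-- First page: two tuples plus an opaque position (a base64 string).
local page1, pos = s.index.pk:select({}, {limit = 2, fetch_pos = true})

-- Next page: continue from the returned position.
local page2 = s.index.pk:select({}, {limit = 2, after = pos})

-- `after` also accepts a tuple instead of a position.
local page3 = s.index.pk:select({}, {limit = 2, after = page2[#page2]})

print(page1[1][1], page2[1][1], page3[1][1])
```

Passing the position rather than the whole tuple keeps the value that a client has to hold between requests small and opaque.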
The `index.select()` and `index.pairs()` methods now support pagination: you can continue iteration from where the last call stopped by passing the last returned tuple via the new `after` option.
The `after` option takes either a tuple containing the indexed key parts or an iteration position. An iteration position is represented by a base64 string. It’s returned by `index.select()` in the second value if the `fetch_pos` option is set. Note that `index.pairs()` doesn’t support `fetch_pos`.
The new `after` and `fetch_pos` options are also available via the network protocol IPROTO and implemented by the builtin connector `net.box`.
Per-module logging
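A minimal sketch of the API described in this section. The module names `foo.bar` and `expirationd` and the chosen levels are illustrative assumptions.

```lua
local log = require('log')

box.cfg{
    log_level = 'info',
    -- Per-module overrides of the global level.
    log_modules = {
        ['foo.bar'] = 'verbose',
        expirationd = 'warn',
    },
}

-- A named logger: its messages carry the module name.
local foo_log = log.new('foo.bar')
foo_log.info('started')
foo_log.verbose('visible because foo.bar is set to verbose')
```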
The logging subsystem now supports modules. A log module can be created with the `require('log').new()` function. The function takes the module name, which will be included into all messages printed using the new module.
It is possible to configure a different log level for each module with the `box.cfg.log_modules` configuration option.
If the global logger is used, the log module name is deduced automatically from the Lua module name. For example, suppose you have the `foo/bar.lua` module with the following content:
Even though this module uses the global logger, messages logged from this module will include the module name `foo.bar`, as if a logger module created with `log.new('foo.bar')` was used:
HTTP client enhancements
Nowadays HTTP + JSON and REST are the default protocols for microservices communication.
The built-in HTTP client gains several usability improvements and a new streaming API.
The client now automatically encodes provided data into JSON or another format specified by the `Content-Type` HTTP header. It supports all the Tarantool built-in types.
Query parameters and data forms are now encoded from the new `params` request option.
The query parameters encoding is RFC 3986 compliant. The data forms encoding follows the `application/x-www-form-urlencoded` format from WHATWG’s URL standard.
The HTTP client now supports chunked writing of request data and reading a response piece by piece. Examples of use cases are uploading a large file to a server or subscribing to changes in etcd via the gRPC-JSON gateway.
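A sketch of the request-side improvements. The URL is a placeholder for a service you control, and the requests are expected to fail if nothing listens there. `params` and the automatic JSON encoding are the features described above; `decode()` on the response is assumed to be available in this release.

```lua
local http_client = require('http.client').new()

-- Placeholder endpoint.
local url = 'http://localhost:8080/users'

-- Query parameters are taken from `params` and URL-encoded.
local r = http_client:get(url, {params = {name = 'John Doe', page = 2}})
print(r.status)

-- A Lua table body is encoded automatically (JSON by default).
r = http_client:post(url, {name = 'John Doe', age = 30})
if r.status == 200 then
    -- Decode the body according to the response Content-Type.
    local body = r:decode()
    print(body)
end
```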
Interactive debugger for Lua code
It used to be very hard to debug Lua sources using standard Tarantool console facilities. Usually one ends up with a number of debugging prints inserted here and there. In this release we introduce a console debugger shell which significantly simplifies debugging of Lua modules.
The Lua debugger supports all the basic operations you may expect from a debugger: step-by-step execution, inspection of variables in the local or global context, traversing stack frames, setting breakpoints, and so on.
Demo https://asciinema.org/a/560039
In addition to the instrumented approach of calling the debugger via the `require('luadebug')()` call mentioned above, one may use a less invasive approach, using the new `-d` option. This way you may debug Lua scripts directly, without any prior instrumentation:
In addition to the console shell accessible from the command line, there is a published VSCode extension, “Tarantool Lua Debugger” (download source), which supports scenarios similar to those of the console debugger described above.
NB! Both the console and the VSCode debuggers do not yet support debugging of fiber-enabled code. This limitation will be lifted in future releases.
NB! A developer who is interested in debugging Lua code may also be curious about the built-in memory profiler tool.
[Tarantool Enterprise] Read views
It is now possible to create a consistent read view of the entire database in Tarantool Enterprise Edition. A read view object provides access to a frozen image of all spaces that existed at the time when the read view was created.
A read view may be useful for performing long-running queries while ensuring data consistency. We use the copy-on-write mechanism to implement read views so memory consumption should be limited by the size of the data set updated after read view creation.
Note that read views are currently supported only by the memtx engine while vinyl spaces are silently ignored.
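A sketch of how a read view might be used, assuming the Enterprise Edition `box.read_view.open()` API and a hypothetical memtx space `bands`; the read view name is also made up.

```lua
-- Freeze the current state of all memtx spaces.
local rv = box.read_view.open({name = 'report'})

-- Later writes do not affect what the read view returns.
box.space.bands:replace({1, 'Roxette', 1986})

-- Iterate over the frozen image of the space.
for _, tuple in rv.space.bands:pairs() do
    print(tuple)
end

-- Release the copy-on-write memory held by the read view.
rv:close()
```

Closing the read view promptly matters because every tuple updated while it is open has to be kept in memory in its old version.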
[Tarantool Enterprise] Data compression improvements
A new data compression algorithm was built into Tarantool Enterprise Edition. Now, in addition to zstd and lz4, Tarantool also supports zlib for tuple compression.
Also, the new Lua module `compress` was introduced. The module provides an API for compressing and decompressing arbitrary data strings with the same algorithms that are available for tuple compression.
[Tarantool Enterprise] WAL extensions for stateless CDC tools
Tarantool writes write-ahead logs (WAL, `*.xlog` files) to recover data after an incident (such as a power loss). A WAL is basically a sequence of operations: INSERT, REPLACE, UPDATE, DELETE and so on.
The same representation is used to feed data from master to replica. It is a stream of operations too.
Let’s check an example.
The UPDATE operation contains the key and the operation itself (in the `tuple` field). There is no need to store all the fields of the original tuple: just the operation is enough to apply it again after an incident or to apply it on a replica after receiving it from the master.
However, there are situations when we want to store all the tuple’s fields in WAL files and/or send all the fields to a replica.
The replication stream is a powerful concept to construct data transformation tools (often called CDC or change data capture tools). For example, we may want to replicate data from Tarantool to another storage in order to perform analytic queries. We can write an external service, which subscribes to Tarantool as an anonymous replica and executes the operations on the other storage.
Let’s assume that the other storage supports only REPLACE and DELETE operations. We can’t execute an update statement as is: it is necessary to know all the fields of the tuple either before or after the operation.
We could accumulate all the tuples that pass through the service, but this solution has many disadvantages: enormous resource consumption, at the very least, and implementation complexity.
Now, it is possible to ask Tarantool to store the old or the new tuple in WAL files and feed it to replicas. This makes it possible to keep such a service stateless.
Let’s correct the previous example and see what changes.
The new `new_tuple` and `old_tuple` fields are added into each DML operation, not only to UPDATE operations. It is particularly useful for UPDATE/UPSERT, but may also be needed for REPLACE (to know what was replaced) and DELETE (to know what was deleted).
The new `wal_ext` `box.cfg()` option allows storing/sending all the tuple’s fields for all DML operations:
The `old` and the `new` fields control whether to store/send the tuple before and/or after the operation. It can be enabled selectively for certain spaces:
Or, in the exclusive manner:
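The three configurations mentioned above can be sketched as follows. The space names `space1` and `space2` are hypothetical, and only one of the calls would be used on a real instance.

```lua
-- Store/send both the old and the new tuple for all DML operations.
box.cfg{
    wal_ext = {old = true, new = true},
}

-- Selectively: enable the extensions only for certain spaces.
box.cfg{
    wal_ext = {
        spaces = {
            space1 = {old = true},
            space2 = {new = true},
        },
    },
}

-- Exclusively: enable for everything, then switch off per space.
box.cfg{
    wal_ext = {
        old = true,
        new = true,
        spaces = {
            space1 = {old = false},
        },
    },
}
```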
Under the hood: stability, security, compatibility
There were many valuable activities in the Tarantool development team that can’t be presented as new features but are important for the success of the technology.
In 2022 (and early 2023) the team put significant effort into ensuring good quality of the product.
We’re working on refactoring the code to make it safer and easier to maintain. We added the CodeQL analyzer to the development workflow and analyzed a lot of reports from this and other analyzers to improve stability and security.
We’re working with internal and external security experts to perform in-depth analysis of possible problems.
We’re spending hours and days in discussions about proper ways to implement new functionality in a compatible and extensible manner. Now, many more features are designed from scratch with seamless upgrading and backward and forward compatibility in mind.
It is not a finish line but a road, and we’re proud to say that we’re moving along it!
Preview of a future behavior: `compat`
As developers of a development platform we’re responsible for seamless upgrading for existing users and providing good safe defaults for new users.
The usual way to move forward is to use semantic versioning to set up proper expectations from a release. A major version bump means a possibly breaking change and may require the attention of an application developer.
However, major versions are rare and upgrading between them is often a headache. Semantic versioning makes human involvement expected, but it doesn’t make upgrading of the application itself easier.
In order to make the upgrades less like showstoppers we introduced a set of compatibility options.
Each such option represents a future change of a default behavior. The list of 2.11.0-rc1 `compat` options is shown in the listing above. All the options are set to `old`, which means the old behavior: nothing is changed without human involvement.
A developer may test the application against the new behavior:
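A sketch of switching options to the new behavior, assuming the `compat` module interface. `fiber_slice_default` is the option named earlier in this announcement; `sql_seq_scan_default` is an assumed option name.

```lua
local compat = require('compat')

-- Switch a single option to the future default.
compat.fiber_slice_default = 'new'

-- Or switch several options at once.
compat({
    fiber_slice_default = 'new',
    sql_seq_scan_default = 'new',  -- assumed option name
})
```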
And verify that it works well on a future major Tarantool version. After this, it is easy to dump the options and keep them in an instance file (init.lua or so):
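For example, assuming the `compat.dump()` helper, the current settings can be printed as Lua code and pasted into the instance file:

```lua
local compat = require('compat')

-- Prints a snippet that reproduces the current compat settings.
print(compat.dump())
```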
In turn, it will be possible to set an option to `old` on the future Tarantool 3.0 to return to 2.11’s behavior. So, if an application is not ready for handling the new default, a developer may postpone updating the application’s code, but still use Tarantool 3.0.
All in all, we hope that all this effort will make it possible to plan developers’ time for upgrading flexibly and, as a result, make the upgrades smooth.
Further reading
This announcement presents only some of the enhancements. A full list of changes is in the release notes. Many bugfixes landed in the 2.10 bugfix releases and are listed there: 2.10.5, 2.10.4, 2.10.3, 2.10.2, 2.10.1 (they’re, of course, present in the release candidate too).
We’re glad to see your feedback on the features presented above and all the other new ones. We’re collecting reports regarding the release candidate using a dedicated issue form.