Welcome to itemdb’s documentation!¶
The itemdb library allows you to store and retrieve Python dicts in a database on the local filesystem, in an easy, fast, and reliable way.
Based on the rock-solid and ACID compliant SQLite, but with easy and
explicit transactions using a with
statement. It provides a simple
object-based API, with the flexibility to store (JSON-compatible) items
with arbitrary fields, and add indices when needed.
Guide¶
Introduction¶
The itemdb library allows you to store and retrieve Python dicts in a database on the local filesystem, in an easy, fast, and reliable way.
To be more precise: it is an ACID compliant transactional database for storage and retrieval of JSON-compatible dict items. That sounds very technical; let’s break it down:
- ACID means it has desirable database features of atomicity, consistency, isolation and durability. We’ll get back to these when we talk about transactions.
- The itemdb API focuses on making transactions easy and explicit.
- JSON is used to serialize the dict items, so the values in the dicts
are limited to:
None
,bool
,int
,float
,str
,list
,dict
.
In practice, itemdb uses the rock-solid SQLite, and provides an object-based API that requires no knowledge of SQL.
You can use itemdb in a wide variety of applications. This includes web-servers, though once your traffic scales up, you may want to consider something like PostgreSQL or perhasps a hosted db.
Opening a database¶
In itemdb (like SQLite) each database is represented as a file. One can also
use ":memory:"
to create an in-memory database for testing/demo purposes.
db = ItemDB(filename)
db = ItemDB(":memory:")
Creating tables and indices¶
Each database consists of tables, and the tables contain the items. A “table” is what is also called “table” in SQL databases, a “collection” in MongoDB, and an “object store” in IndexedDB.
You can create a table using ensure_table()
. It is safe to call this
before every time that you use the database, because it returns fast if the
table already exist:
db.ensure_table("some_table_name")
In the same call we can also specify indices. Indicices represent fields in the
items that are indexed, so that they can be used to retrieve items fast,
using select()
, count()
and delete()
.
Indices can also be prefixed with a “!”, marking the field as mandatory and unique, making it possible to identify items.
db.ensure_table("persons", "!name", "age")
We can now select()
items based on the name
and age
fields, and no
two items can have the same value for name
.
Note
No new fields can be marked unique once the table has been created.
Note
In the examples below we mark the “name” field as unique, but strictly speaking this is wrong, because different persons can have the same name. Another form of ID would be more appropriate in real use-cases.
Add some items¶
An “item” is what is called a “row” in SQL databases, a “document” in MongoDB, and an “object” in IndexedDB. Let’s add some to our table!
with db:
db.put_one("persons", name="Jane", age=22)
db.put_one("persons", name="John", age=18, fav_number=7)
db.put("persons", {"name": "Guido"}, {"name": "Anne", "age": 42})
You can see how we use with db
here. This is because itemdb requires using
a transaction when making changes to the database. Everything inside
the with statement is a single transaction. More on that later.
You can also see that with put_one()
we can use keyword arguments to specify fields,
while with put()
we can specify multiple items, each items a dict.
The dictionary can contain as many fields as you want, including sub-dicts and lists.
Although the age
field is indexed, it is not mandatory (you can
select items with missing age using db.select("persons", "age is NULL")
).
Since the name
field is unique, if we put
an item with an existing name,
it will simply update it:
# John had his birthday and changed his favourite number
with db:
db.put_one("persons", name="John", age=19, fav_number=8)
Make some queries¶
Use e.g. count()
, select()
to query the database:
>>> db.count_all("persons")
4
>>> db.select("persons", "age > ?", 20)
[{'name': 'Jane', 'age': 22}, {'name': 'Anne', 'age': 42}]
>>> select_name = "John"
>>> db.select_one("persons", "name = ?", select_name)
{'name': 'John', 'age': 19, 'fav_number': 8}
Avoiding SQL injection¶
SQL injection is a technique by which a potential hacker could access your database to get access or destroy data. The common path for SQL injection is to write SQL code in an end-user input field.
For example, imagine a website that sells paintings, which may have a field for the minimum size. You’ll want to use
the given size in a query (e.g. a db.select()
) so you can show the user all paintings that
qualify. Now imagine that an attacker writes SQL code in that input field.
If the input is not sanitized, your db is compromised!
This is the reason for the ?
notation used throughout these docs - the actual arguments
are passed to SQLite in a safe way. It’s a good habit to always provide query arguments this way.
Transactions¶
Transactions are an important concept in databases. In ACID databases (like itemdb) it has a number of features:
- A transaction is atomic (either the whole transaction is applied, or the whole transaction is not applied)
- A transaction is applied in isolation, even when multiple processes are interacting with the database at the same time. This means that when a transaction is in progress, another process/thread that wants to apply a transaction that “intersects” with the ongoing operation, it will wait. (This even works for multiple Docker containers operating on the same SQLite database.)
- The remaining elements of ACID (consistency and durability) mean that the database always remains in a healthy state. Even on a power outage or if the system crashes halfway a transaction.
In itemdb, transactions are easy, using a context manager. Let’s have a look at some examples:
# Increasing a value is recommended to do in a transaction.
with db:
player = db.select("players", "name == ?", player_name)
player["position"] += 2
db.put("players", player)
# The below has no effect: the transaction fails and is rolled back
with db:
db.put_one("persons", name="John", age=21, fav_number=8)
raise RuntimeError()
Database maintenance¶
Sometimes, you may want to add unique keys to a table or remove existing indices. This is possible by copying the items to a new table and then replacing the new table with the old. By doing this inside a transaction, it can be done safely:
with db:
db.ensure_table("persons2", "!id", "name", "age")
for i, person in enumerate(db.select_all("persons")):
# Make sure each person has an id, e.g.:
person["id"] = i
db.put("persons2", person)
db.delete_table("persons")
db.rename_table("persons2", "persons")
At the time of writing, itemdb does not provide an API for backups or vacuuming, but it’s just SQLite under the hood, so you can use the common methods.
Going Async¶
The API of ItemDB
is synchronous. It operates with the filesystem, so
it can benefit from async use a lot.
There are two ways to make your code async. The first is by using the
AsyncItemDB
class. It has the exact same API as ItemDB
, but all its
methods are async. Note that you must also use async with
.
The second approach is to asyncify a synchronous function. The idea of this approach is to do all itemdb operations inside a function and then wrap that function if you want to use it in an async environment. Consider the following example of a web server:
@itemdb.asycify
def push_items(filename, items):
db = ItemDB(filename)
db.ensure_table("my_table", "!id", "mtime")
with db:
...
db.put("my_table", items)
async def my_request_handler(request):
...
# Because we decorated the function with asyncify,
# we can now await it, while the db interaction
# occurs in a separate thread.
await push_items(filename, items)
...
Of the two mentioned approaches, the asyncify-approach is slightly
more efficient, because it makes use of a thread pool, and only switches
to a thread for the duration of the function you’ve asyncified. However,
using AsyncItemDB
probably makes your code easier to read and
maintain, which is probably worth more.
Reference¶
The ItemDB class¶
-
class
itemdb.
ItemDB
(filename)¶ A transactional database for storage and retrieval of dict items.
Parameters: filename (str) – The file to open. Use “:memory:” for an in-memory db. The items in the database can be any JSON serializable dictionary. Indices can be defined for specific fields to enable fast selection of items based on these fields. Indices can be marked as unique to make a field mandatory and identify items based on that field.
Transactions are done by using the
with
statement, and are mandatory for all operations that write to the database.-
mtime
¶ The time that the database file was last modified, as a Unix timestamp.
Is -1 if the file did not exist, or if the filename is not represented on the filesystem.
-
close
()¶ Close the database connection.
This will be automatically called when the instance is deleted. But since it can be held e.g. in a traceback, consider using
with closing(db):
.
-
get_table_names
()¶ Return a (sorted) list of table names present in the database.
-
get_indices
(table_name)¶ Get a set of index names for the given table.
Parameters: table_name (str) – The name of the table to get the indices for. To avoid SQL injection, this arg should not be based on unsafe data. Names prefixed with “!” represent fields that are required and unique. Raises KeyError if the table does not exist.
-
ensure_table
(table_name, *indices)¶ Ensure that the given table exists and has the given indices.
Parameters: - table_name (str) – The name of the table to make sure exists. To avoid SQL injection, this arg should not be based on unsafe data.
- indices (varargs) – A sequence of strings, representing index names. Fields that are
indexed can be queried with e.g.
select()
. To avoid SQL injection, this arg should not be based on unsafe data.
If an index name is prefixed with “!”, it indicates a field that is mandatory and unique. Note that new unique indices cannot be added when the table already exist.
This method returns as quickly as possible when the table already exists and has the appropriate indices. Returns the ItemDB object, so calls to this method can be stacked.
Although this call may modify the database, one does not need to call this in a transaction.
-
delete_table
(table_name)¶ Delete the table with the given name.
Parameters: table_name (str) – The name of the table to delete. To avoid SQL injection, this arg should not be based on unsafe data. Be aware that this deletes the whole table, including all of its items.
This method must be called within a transaction. Can raise KeyError if an invalid table is given, or IOError if not used within a transaction
-
rename_table
(table_name, new_table_name)¶ Rename a table.
Parameters: - table_name (str) – The current name of the table. To avoid SQL injection, this arg should not be based on unsafe data.
- new_table_name (str) – The new name. To avoid SQL injection, this arg should not be based on unsafe data.
This method must be called within a transaction. Can raise KeyError if an invalid table is given, or IOError if not used within a transaction
-
count_all
(table_name)¶ Get the total number of items in the given table.
-
count
(table_name, query, *save_args)¶ Get the number of items in the given table that match the given query.
Parameters: - table_name (str) – The name of the table to count items in. To avoid SQL injection, this arg should not be based on unsafe data.
- query (str) – The query to select items on. To avoid SQL injection, this arg should not be based on unsafe data; use save_args for end-user input.
- save_args (varargs) – The values to select items on.
Examples:
# Count the persons older than 20 db.count("persons", "age > ?", 20) # Count the persons older than a given value db.count("persons", "age > ?", min_age) # Use AND and OR for more precise queries db.count("persons", "age > ? AND age < ?", min_age, max_age)
See
select(
) for details on queries.Can raise KeyError if an invalid table is given, IndexError if an invalid field is used in the query, or sqlite3.OperationalError for an invalid query.
-
select_all
(table_name)¶ Get all items in the given table. See
select()
for details.
-
select
(table_name, query, *save_args)¶ Get the items in the given table that match the given query.
Parameters: - table_name (str) – The name of the table to select items in. To avoid SQL injection, this arg should not be based on unsafe data.
- query (str) – The query to select items on. To avoid SQL injection, this arg should not be based on unsafe data; use save_args for end-user input.
- save_args (varargs) – The values to select items on.
The query follows SQLite syntax and can only include indexed fields. If needed, use ensure_table() to add indices. The query is always fast (which is why this method is called ‘select’, and not ‘search’).
Examples:
# Select the persons older than 20 db.select("persons", "age > ?", 20) # Select the persons older than a given age db.select("persons", "age > ?", min_age) # Use AND and OR for more precise queries db.select("persons", "age > ? AND age < ?", min_age, max_age)
There is no method to filter items bases on non-indexed fields, because this is easy using a list comprehension, e.g.:
items = db.select_all("persons") items = [i for i in items if i["age"] > 20]
Can raise KeyError if an invalid table is given, IndexError if an invalid field is used in the query, or sqlite3.OperationalError for an invalid query.
-
select_one
(table_name, query, *args)¶ Get the first item in the given table that match the given query.
Parameters: - table_name (str) – The name of the table to select an item in. To avoid SQL injection, this arg should not be based on unsafe data.
- query (str) – The query to select the item on. To avoid SQL injection, this arg should not be based on unsafe data; use save_args for end-user input.
- save_args (varargs) – The values to select the item on.
Returns None if there was no match. See
select()
for details.
-
put
(table_name, *items)¶ Put one or more items into the given table.
Parameters: - table_name (str) – The name of the table to put the item(s) in. To avoid SQL injection, this arg should not be based on unsafe data.
- items (varargs) – The dicts to add. Keys that match an index can later be used for fast querying.
This method must be called within a transaction. Can raise KeyError if an invalid table is given, IOError if not used within a transaction, TypeError if an item is not a (JSON serializable) dict, or IndexError if an item does not have a required field.
-
put_one
(table_name, **item)¶ Put an item into the given table using kwargs.
Parameters: - table_name (str) – The name of the table to put the item(s) in. To avoid SQL injection, this arg should not be based on unsafe data.
- item (kwargs) – The dict to add. Keys that match an index can later be used for fast querying.
This method must be called within a transaction.
-
delete
(table_name, query, *save_args)¶ Delete items from the given table.
Parameters: - table_name (str) – The name of the table to delete items from. To avoid SQL injection, this arg should not be based on unsafe data.
- query (str) – The query to select the items to delete. To avoid SQL injection, this arg should not be based on unsafe data; use save_args for end-user input.
- save_args (varargs) – The values to select the item on.
Examples:
# Delete the persons older than 20 db.delete("persons", "age > ?", 20) # Delete the persons older than a given age db.delete("persons", "age > ?", min_age) # Use AND and OR for more precise queries db.delete("persons", "age > ? AND age < ?", min_age, max_age)
See
select()
for details on queries.This method must be called within a transaction. Can raise KeyError if an invalid table is given, IOError if not used within a transaction, IndexError if an invalid field is used in the query, or sqlite3.OperationalError for an invalid query.
-
The AsyncItemDB class¶
-
class
itemdb.
AsyncItemDB
¶ An async version of ItemDB. The API is exactly the same, except that all methods are async, and one must use async with instead of the normal with.
The asyncify function¶
-
itemdb.
asyncify
(func)¶ Wrap a normal function into an awaitable co-routine. Can be used as a decorator.
The original function will be executed in a separate thread. This allows async code to execute io-bound code (like querying a sqlite database) without stalling.
Note that the code in func must be thread-safe. It’s probably best to isolate the io-bound parts of your code and only wrap these.