Datasette Office Hours, Personal Data Warehouses, datasette-ripgrep, datasette-indieauth, Datasette Client for Observable and more
Datasette Weekly(ish) volume 4
In this edition: book Datasette office hours, build a personal data warehouse, deploy a regular expression code search engine, plus updates on Datasette project releases from the past month.
Datasette Office Hours
One of the toughest things about running open source projects is the challenge of getting feedback: users often assume they are doing you a favor by leaving you alone, so generally you only hear from people if they find a bug.
I’m always really keen to talk to people who are using Datasette - or who aren’t using it yet, since I want to understand what it’s missing.
So I’m going to try a new approach: I’m setting aside time every Friday for Datasette Office Hours, where anyone can book a 20-minute Zoom call with me to talk about the project.
I’d love to hear from you if:
You’re solving problems with Datasette
You have a problem you think Datasette might be able to help with
You’ve run into problems using Datasette
You have thoughts on the direction you think the project should go
You’d like to see some demos of things I’m working on
You’re just interested in having a chat!
You can sign up for office hours at calendly.com/swillison/datasette-office-hours
Personal Data Warehouses
I gave a talk for the GitHub OCTO Speaker Series a couple of weeks ago about my Datasette and Dogsheep projects, called Personal Data Warehouses: Reclaiming Your Data.
GitHub shared the video on their YouTube channel, and I’ve prepared an extended, annotated version of the talk with additional screenshots, links and notes.
The talk shows how I built my own personal data warehouse on top of Datasette that imports, analyzes and visualizes my data from Twitter, Swarm, GitHub, 23AndMe, Apple HealthKit, Apple Photos and more.
datasette-ripgrep
datasette-ripgrep is a web application I built for running regular expression searches against source code, built on top of the amazing ripgrep command-line tool.
Here are some example searches, running across around 100 Datasette and Datasette-adjacent GitHub repositories:
with.*AsyncClient - regular expression search for with.*AsyncClient
.plugin_config( literal=on - a non-regular expression search for .plugin_config(
with.*AsyncClient glob=datasette/** - search for that pattern only within the datasette/ top folder
"sqlite-utils[">] glob=setup.py - a regular expression search for packages that depend on either sqlite-utils or sqlite-utils>=some-version
test glob=!*.html - search for the string test but exclude results in HTML files
I wrote about the project on my blog, in datasette-ripgrep: deploy a regular expression search engine for your source code
It’s an interesting use case for Datasette in that it doesn’t use SQLite at all - the tool works by running the “rg” executable against a folder full of source code. It does benefit from Datasette’s “datasette publish” mechanism - the following one-liner will deploy a Datasette instance to Google Cloud Run pre-configured to run searches against everything in the “all/” directory, which is uploaded as part of the deployment:
datasette publish cloudrun \
    --metadata metadata.json \
    --static all:all \
    --install=datasette-ripgrep \
    --service datasette-ripgrep \
    --apt-get-install ripgrep
The official demo is deployed by this GitHub Actions workflow which pulls a list of repos using github-to-sqlite, filters down to just the ones I want to include in the demo, then deploys them using the above pattern.
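To make the "run rg against a folder" idea concrete, here is a rough Python sketch. It is not the plugin's actual code: the function names, the 3-second timeout and the choice of JSON output are my assumptions for illustration. The only things it relies on are real ripgrep flags (--json, --fixed-strings, --glob and -e).

```python
import json
import subprocess


def build_rg_command(pattern, path, literal=False, globs=None):
    """Build the argument list for one ripgrep search (hypothetical helper)."""
    args = ["rg", "--json"]
    if literal:
        # Treat the pattern as a plain string rather than a regular expression
        args.append("--fixed-strings")
    for glob in globs or []:
        # Each --glob includes matching files; a leading "!" excludes them
        args.extend(["--glob", glob])
    # -e marks the next argument as the pattern, even if it starts with "-"
    args.extend(["-e", pattern, path])
    return args


def search(pattern, path, timeout=3.0, **options):
    """Run rg and return its "match" messages (requires rg to be installed)."""
    result = subprocess.run(
        build_rg_command(pattern, path, **options),
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if rg runs too long
    )
    messages = [json.loads(line) for line in result.stdout.splitlines() if line]
    return [m for m in messages if m.get("type") == "match"]
```

With ripgrep installed, search("with.*AsyncClient", "all", globs=["datasette/**"]) would correspond to the third example search above. Passing the arguments as a list, rather than building a shell string, avoids any quoting problems with user-supplied patterns.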
datasette-indieauth
My other big plugin project this month was datasette-indieauth, an authentication plugin which adds support for the emerging IndieAuth single sign-on standard.
You can read more about this project in Implementing IndieAuth for Datasette on my blog. IndieAuth is a spiritual successor to OpenID which allows users to sign in using a website address that they control. It’s a particularly good fit for Datasette as it allows you to deploy single sign-on without first registering your site with a central authority.
Here’s an animation showing what the experience looks like signing in to the official demo at datasette-indieauth-demo.datasette.io/-/indieauth:
Datasette Client for Observable
Alex Garcia built a beautiful JavaScript client library for interacting with data hosted by Datasette from Observable notebooks. His demo at observablehq.com/@asg017/datasette-client shows how to use the client and demonstrates it integrating with Observable’s form elements and visualizing data as a stacked area chart using D3.
Other significant releases this month
Datasette 0.52 (and 0.52.1) - a relatively small release which adds a new database_actions(datasette, actor, database) plugin hook and renames the --config option to --setting. The old --config option still works but shows a deprecation message, and will be removed in Datasette 1.0.
github-to-sqlite 2.8 adds a new “github-to-sqlite workflows” command which imports GitHub Actions workflow YAML files and uses them to populate new workflows, jobs and steps tables.
datasette-graphql 1.2 and 1.3 add support for the Datasette view-instance and view-database permissions, and use the new table actions plugin hook to add example GraphQL queries to the cog action menu on every table page. You can try that out against this commits table created using github-to-sqlite.
sqlite-utils 3.0 adds a new command-line tool and Python method for executing full-text searches against a table, and returning the results ordered by relevance. It also adds a new --tsv output option and makes some small changes to other command-line options, hence the 3.0 major version bump.
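To show what a full-text search ordered by relevance looks like at the SQLite level, here is a minimal sketch using only Python's standard sqlite3 module. It is not how sqlite-utils implements the feature: the docs table, its rows and the search() helper are invented for illustration, and it assumes your SQLite build includes the FTS5 extension.

```python
import sqlite3


def search(conn, query):
    """Return matching titles from the hypothetical docs table, best match first."""
    # FTS5 exposes a hidden "rank" column (BM25 by default);
    # sorting by it in ascending order puts the most relevant rows first.
    sql = "select title from docs where docs match ? order by rank"
    return [row[0] for row in conn.execute(sql, (query,))]


conn = sqlite3.connect(":memory:")
# A standalone FTS5 table with a single indexed column
conn.execute("create virtual table docs using fts5(title)")
conn.executemany(
    "insert into docs (title) values (?)",
    [
        ("Datasette plugin hooks",),
        ("Full-text search with SQLite",),
        ("Regular expression code search",),
    ],
)

results = search(conn, "search")
```

Here results holds the two titles that contain the word "search", in whatever order SQLite's ranking function scores them.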