Datasette Desktop - a macOS application version of Datasette

Datasette on your laptop without installing Python

The big news this month is Datasette Desktop, a brand new Mac application I've been building to help people run Datasette - and Datasette plugins - on their own computers.

An ongoing goal I have for Datasette is to make it usable by as many people as possible.

I've tried a bunch of different strategies for this in the past:

  • Datasette is built using Python, so if you have a working Python environment you can install it using pip install datasette

  • I packaged it for Homebrew, so brew install datasette on macOS should work too

  • I have documentation for running it online using Glitch, where it can be installed using just your web browser

  • The datasette publish set of commands are designed to help deploy Datasette to Cloud Run, Heroku, Vercel or fly.io as easily as possible

Unfortunately, each of these options requires some degree of technical knowledge before you can use it - familiarity with the command line, or knowledge of how to deploy websites. These are significant barriers to entry!

My goal with Datasette Desktop is to make installing Datasette as easy as downloading and installing any other Mac application.

You can try that out here: click the Download link, open the zip file and drag the Datasette application to your /Applications folder.

The application comes with its own copy of Python 3.9 tucked away inside it, which means it will run just fine even if you don’t have Python installed. It also pulls off some tricks to ensure that existing Datasette plugins can be installed using the “Plugins → Install and Manage Plugins…” menu option.

Once installed, the application can be used to open existing CSV or SQLite files on your computer. It can also open CSV files by URL, and uses that capability to offer examples that you can open to try it out (currently using data from the Central Park Squirrel Census and the London Fire Brigade’s list of animal rescue incidents).

How I built it

Like Datasette itself, Datasette Desktop is open source. I’ve been building it entirely in the open in the simonw/datasette-app GitHub repository.

I’ve been writing up details of how it works as I’ve gone along:

  • Building a desktop application for Datasette describes the initial research into the project, and talks about how I’m building it using Electron.

  • Datasette Desktop—a macOS desktop application for Datasette was the initial launch announcement for the first installable version. It describes how the application works in some detail, including how I bundled Python inside the application and how I figured out signing and distribution of macOS apps using Electron.

  • Datasette Desktop 0.2.0: The annotated release notes describes the second release, which introduced some significant new features including a plugin management interface for installing, upgrading and uninstalling plugins and the ability to open SQLite and CSV files directly using the application.

I’ve also been collecting numerous TILs (Today I Learned) about Electron.

Feedback welcome

The application so far is just a starting point: I’m very keen to hear from people who have tried it out. Please leave questions, suggestions and feedback on the GitHub Discussions forum for the project.

Everything new in Datasette since January, plus Django SQL Dashboard

What's new in the Datasette ecosystem since January 2021

It’s been a while since the last edition of this newsletter, so plenty to cover today - starting with what’s new in Datasette and sqlite-utils, and then introducing a new Datasette-like tool aimed at Django + PostgreSQL developers called Django SQL Dashboard.

What’s new in Datasette

Datasette 0.55 added support for cross-database SQL queries: you can now start Datasette with the new “--crossdb” option which enables cross-database joins! More details in the documentation or you can try out an example cross-database query here.
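Cross-database queries in SQLite work by attaching multiple databases to a single connection, which is the mechanism the “--crossdb” option builds on. Here's a minimal sketch of that idea using Python's standard library sqlite3 module - the tables and data are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second, independent in-memory database to the same
# connection - ATTACH is what makes cross-database joins possible
conn.execute("attach database ':memory:' as other")

conn.execute("create table people (id integer primary key, name text)")
conn.execute("insert into people values (1, 'Cleo')")
conn.execute("create table other.orders (person_id integer, item text)")
conn.execute("insert into other.orders values (1, 'ball')")

# Tables in the attached database are referenced as other.tablename
rows = conn.execute("""
    select people.name, orders.item
    from people join other.orders as orders
      on people.id = orders.person_id
""").fetchall()
print(rows)  # [('Cleo', 'ball')]
```

Datasette's --crossdb option handles the attaching for you, exposing every attached database under a schema name in a single /_memory connection.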

0.56 was mainly small bug fixes and documentation improvements, though it did also introduce a handle you can drag to resize the SQL editor.

0.57 brought a critical security fix - if you have any Datasette instances running on the public web you should upgrade to 0.57 or later (or to 0.56.1, which applies the same fix to 0.56) as soon as possible.

It also fixed a long-standing issue where SQL errors were displayed without letting you edit your original SQL, and added mechanisms for selectively showing and hiding columns on a table page (“?_col=” and “?_nocol=”).

0.58 added Unix domain socket support, useful for people running Datasette behind an Apache or NGINX proxy server. It also added two new plugin hooks and a significant performance boost for faceting - all of which are covered in detail in the Datasette 0.58 annotated release notes.

What’s new in sqlite-utils

Datasette’s close companion is sqlite-utils - a combination CLI tool and Python library for creating and manipulating SQLite databases.

sqlite-utils gained two key new features this year.

sqlite-utils memory is a new command that lets you pipe JSON and CSV data directly into a temporary, in-memory SQLite database, execute a SQL query that filters or joins that data, and return the results as tabular, CSV or JSON output.

It turns sqlite-utils into an ad-hoc querying tool that works against all sorts of data, and can be used to combine data from different sources - effectively running joins between JSON, CSV and other data sources.
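The underlying pattern - load JSON records into a temporary in-memory SQLite table, then query them with SQL - can be sketched with just the Python standard library. The table name stdin mirrors what sqlite-utils memory uses for piped input; the records here are abbreviated sample data:

```python
import json
import sqlite3

records = json.loads("""[
    {"full_name": "dogsheep/twitter-to-sqlite", "stargazers_count": 225},
    {"full_name": "dogsheep/github-to-sqlite", "stargazers_count": 139}
]""")

# Load the JSON into a throwaway in-memory database...
conn = sqlite3.connect(":memory:")
conn.execute("create table stdin (full_name text, stargazers_count integer)")
conn.executemany(
    "insert into stdin values (:full_name, :stargazers_count)", records
)

# ...then query it with regular SQL
rows = conn.execute(
    "select full_name from stdin order by stargazers_count desc limit 1"
).fetchall()
print(rows)  # [('dogsheep/twitter-to-sqlite',)]
```

sqlite-utils memory adds the ergonomics on top of this: automatic schema detection, CSV parsing, and multiple input files attached as separate tables.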

I recorded a YouTube video showing the new tool in action:

Here’s an example that fetches JSON from the GitHub API, executes a SQL query to select three columns and sort by the number of stars, then outputs the result to the console as CSV:

$ curl -s 'https://api.github.com/users/dogsheep/repos' \
  | sqlite-utils memory - '
      select full_name, forks_count, stargazers_count as stars
      from stdin order by stars desc limit 3
    ' --csv
full_name,forks_count,stars
dogsheep/twitter-to-sqlite,12,225
dogsheep/github-to-sqlite,14,139
dogsheep/dogsheep-photos,5,116

I wrote more about this in Joining CSV and JSON data with an in-memory SQLite database.

The sqlite-utils convert command is the other big new feature: it lets you apply a conversion function written in Python to every value in a column in your SQLite database.

Here’s how to remove commas from a column with 123,456 style numbers:

sqlite-utils convert states.db states count \
    'value.replace(",", "")'

And this will split a “location” column into two separate “latitude” and “longitude” columns (using the --multi option):

sqlite-utils convert data.db places location '
latitude, longitude = value.split(",")
return {
    "latitude": float(latitude),
    "longitude": float(longitude),
}' --multi

Read Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool for more.
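The core trick behind convert is registering a Python function with SQLite so an UPDATE statement can call it on every value. A minimal stdlib sketch of the single-column case, mirroring the comma-removal example above with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table states (count text)")
conn.execute("insert into states values ('123,456')")

# Register a Python function so SQL can call it, then update the
# column in place - essentially what sqlite-utils convert automates
conn.create_function(
    "convert_value", 1, lambda value: value.replace(",", "")
)
conn.execute("update states set count = convert_value(count)")
print(conn.execute("select count from states").fetchone())  # ('123456',)
```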

Django SQL Dashboard

I spent the first few months of the year mainly not working on Datasette at all: I joined the team running VaccinateCA / Vaccinate The States to build a Django application to help crowdsource information about places that people could go to get vaccinated against COVID-19 in the USA.

My first order of business was to help replace their Airtable infrastructure - which was rapidly reaching the size limits of what could be kept in Airtable - with a Django equivalent.

Since a major benefit of building on Airtable was its ad-hoc reporting abilities, I found myself really needing something similar to Datasette that could work against a PostgreSQL database.

So I built exactly that: Django SQL Dashboard - an open source package that adds a Datasette-like querying interface to a Django+PostgreSQL project, protected by the Django authentication layer.

I recorded this video with a demo of the project:

Lots more detail in the Django SQL Dashboard launch announcement.

I’m still deciding if and when to bring the lessons from that project back into Datasette itself - having Datasette work against PostgreSQL in addition to SQLite would open up a huge new set of possibilities for the project, and my work on Django SQL Dashboard has helped me start to scope out how much work it would be to make that a reality.

If you’re interested in reading more about my work at VaccinateCA I’ve imported my writing from my internal blog at the organization into my regular blog - you can find that series of posts here: simonwillison.net/series/vaccinateca

New Datasette Plugins

I decided to check how many new Datasette plugins I’ve released since the last newsletter on January 26th. The answer turned out to be seven! Here’s the query I used:

select
  repos.full_name,
  tag_name,
  min(releases.created_at) as launched_at
from
  releases
  join repos on releases.repo = repos.id
where
  repos.full_name like 'simonw/datasette-%'
  and repos.full_name not in (
  -- These are older but the first tagged release was this year
    'simonw/datasette-jellyfish',
    'simonw/datasette-haversine'
  )
group by
  repo
having
  launched_at > '2021-01-26'
order by
  launched_at

Run that here.

datasette-tiles is a plugin for serving map tiles directly out of a SQLite database, using the MBTiles standard for bundling tiles in a DB file.

datasette-basemap is a related plugin that bundles the first four layers of tiles from OpenStreetMap in a SQLite database file, which can be installed where Datasette can find it using “pip install datasette-basemap”.

Here’s a demo of the plugin, or you can browse the underlying database of tile images.

I wrote more about these plugins in Serving map tiles from SQLite with MBTiles and datasette-tiles.

datasette-block is a plugin that lets you block all access to specific path prefixes within Datasette. You probably don’t need this one!

datasette-placekey adds SQL functions for working with the placekey mechanism for creating identifiers for locations. I built this for some analysis at VaccinateCA.

datasette-remote-metadata lets you define your Datasette metadata in an external file and have Datasette fetch it on-demand while it is running. This is useful if you have a large (1GB+) database running somewhere like Cloud Run where deploying new metadata can take several minutes - with the plugin you can edit the remote file and have the changes go live a few seconds later.

I built this as part of working with Stanford’s Big Local News to help launch the Stanford School Enrollment Project, which makes available detailed school enrollment figures for schools and districts across the USA from this year back to 2015.

datasette-pyinstrument adds the ability to add ?_pyinstrument=1 to any Datasette URL in order to see the output of a profile run while constructing the page.

datasette-query-links is a highly experimental plugin which looks out for SQL queries that return strings that are themselves valid SQL queries, and turns those strings into links that will execute the SQL. I thought this was a neat and unique idea, until I found out that PostgreSQL already has a version of this as a built in feature called \gexec!

What’s next

I’ve decided to try and get this newsletter back onto a roughly monthly cadence. In the meantime, I suggest keeping an eye on the official Datasette website at datasette.io - you can even subscribe to its Atom feeds!

And a reminder: I’m still running Datasette Office Hours every Friday, so if you’d like to have a 25 minute video chat with me about Datasette to talk about things you’re working on or provide feedback on the project, sign up for a slot!

Datasette 0.54, and querying a database by drawing on a map

Datasette Weekly(ish) volume 6

Since the last edition of this newsletter I’ve made 33 releases across 13 different Datasette ecosystem projects. Here are some of the highlights.

datasette-leaflet-freedraw

Datasette supports parameterized SQL queries: if you compose a SQL query with a placeholder such as “select * from counties where population > :number” Datasette will spot the :number parameter and turn it into a form field on a page:
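The same named-parameter mechanism is available directly from SQLite's Python driver - the :number placeholder is filled in by the driver rather than by string concatenation, which is what makes this safe. A sketch with an invented counties table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table counties (name text, population integer)")
conn.executemany("insert into counties values (?, ?)", [
    ("San Francisco", 873965),
    ("Alpine", 1129),
])

# Named parameters are bound safely - no SQL injection possible
rows = conn.execute(
    "select name from counties where population > :number",
    {"number": 100000},
).fetchall()
print(rows)  # [('San Francisco',)]
```

Datasette spots these :name placeholders in a query and renders one form field per parameter.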

I’ve been planning to add custom form field widgets for a while. datasette-leaflet-freedraw is a new plugin which starts exploring this concept, by enhancing any fields ending in “freedraw” with a map widget that lets users draw a shape, which will be converted to a GeoJSON MultiPolygon ready to be sent back to Datasette.

Here’s an interactive demo:

You can read more about this plugin in Drawing shapes on a map to query a SpatiaLite database.

Datasette 0.54

I released Datasette 0.54 yesterday. You can read the annotated release notes on my blog, but the two big new features in this release are the _internal SQLite in-memory database storing details of all connected databases and tables, and support for JavaScript modules in plugins and additional scripts.

The _internal database

I want Datasette to be able to handle dozens of attached database files containing potentially thousands of tables. To support this, the Datasette homepage will need to get a lot smarter: it needs to provide pagination and search and filtering against the list of tables themselves.

Datasette is already pretty good at providing paginated search and filtering! The new _internal database is the first step to providing this feature: it's an in-memory database which is automatically updated with details of all of the attached databases, plus their columns, foreign keys and indexes.

As part of building this I added support for shared, named in-memory databases - so plugins can now create an in-memory database that will persist for the lifetime of the Datasette process and can be accessed from multiple connections.
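SQLite supports this through URI filenames with shared cache: any connection that opens the same named URI sees the same in-memory database. A minimal stdlib sketch (the database name and table are invented for illustration):

```python
import sqlite3

# A named in-memory database: every connection that opens this URI
# shares the same data, for as long as at least one connection is open
uri = "file:shared_demo?mode=memory&cache=shared"

conn1 = sqlite3.connect(uri, uri=True)
conn1.execute("create table tables (name text)")
conn1.execute("insert into tables values ('my_table')")
conn1.commit()

# A second, separate connection sees the table created by the first
conn2 = sqlite3.connect(uri, uri=True)
rows = conn2.execute("select name from tables").fetchall()
print(rows)  # [('my_table',)]
```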

You can browse a demo of the _internal database by signing into the demo as root and then accessing latest.datasette.io/_internal

JavaScript modules

JavaScript modules are supported by modern browsers and allow you to use the “import” and “export” keywords from ECMAScript 2015 without needing a build step.

They only work in <script> blocks with the type="module" attribute, so Datasette now has the ability to include scripts with that attribute, for both plugins and extra scripts loaded through configuration.

A big benefit of native modules is that they help avoid accidentally loading the same script twice. There are three Datasette plugins that use Leaflet maps now - datasette-cluster-map, datasette-leaflet-geojson and the new datasette-leaflet-freedraw - so I've started working on a new base plugin called datasette-leaflet which makes the latest Leaflet available for all of them to use via the modules system.

datasette-css-properties and datasette-export-notebook

Two more new plugins: these use Datasette's register_output_renderer hook to add extra options for getting data back out of Datasette.

I wrote about datasette-css-properties here - it's a very weird plugin! It was inspired by Custom Properties as State by Chris Coyier, which proposed server APIs that return CSS files defining custom properties that can be used to style pages.

I'm not sure if this is a good idea, but the joy of plugins is that you can try out ideas like this without risking damage to the integrity of the core project!

datasette-export-notebook is more obviously useful: it adds instructions for exporting data from Datasette into Jupyter and Observable notebooks. You can try that out here against the Covid-19 Datasette.

Datasette Office Hours

I've now had 17 conversations with people about Datasette in office hours sessions. It's been absolutely wonderful! Talking to people about what they're building, or what they might build, is incredibly valuable for helping me decide what to work on next, and it's also just really fun talking to people about the project.

I plan to continue running office hours sessions on Friday for the foreseeable future. If you'd like to chat about the project you can grab a slot with me here.

Official project website for Datasette, building a search engine with Dogsheep Beta, sqlite-utils analyze-tables

Datasette Weekly(ish) volume 5

Datasette finally has an official project website!

datasette.io is the new home for Datasette. The site features news about the project, explains the problems that Datasette and its ecosystem of tools try to solve and gathers together a complete directory of plugins and tools for using with Datasette.

The site itself is built using Datasette - it’s actually a heavily themed installation of Datasette, making extensive use of the datasette-template-sql plugin and custom pages to implement the different sections of the site. The code for the site is available on GitHub at simonw/datasette.io.

I wrote more about how the site works in datasette.io, an official project website for Datasette.

Adding search to datasette.io

datasette.io also includes a search engine for the project.

It provides faceted search across:

  • Every section of the latest documentation (416 total)

  • 48 plugin READMEs

  • 22 tool READMEs

  • 64 news items posted on the Datasette website

  • 212 items from my blog

  • Release notes from 557 package releases

That’s 1,319 items total! The overall Datasette project continues to grow in all kinds of directions, so having a single search engine to tie it all together felt like a useful addition.

I built the search engine using Dogsheep Beta, the cross-table SQLite search engine I originally built for my Dogsheep Personal Analytics project.

You can read more about how the search engine works, including a breakdown of the YAML configuration used to create the unified search index, in Building a search engine for datasette.io.

sqlite-utils analyze-tables

I released sqlite-utils 3.1 with a useful feature for getting to grips with new data: the “sqlite-utils analyze-tables” command.

Run this against any SQLite database and it will spit out a summary of every column in every table, showing you common and least-common values, the number of distinct items and how many rows contain nulls or blanks.

The analyze-tables documentation has more. In addition to outputting the summary to the command-line you can add the “--save” option to store the generated summary in a SQLite table called _analyze_tables_ - which means you can then further examine it using Datasette.

Here’s an example of the output when running analyze-tables against a database created using my github-to-sqlite tool.
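The kind of per-column summary analyze-tables produces - row counts, distinct values, nulls and most-common values - can be sketched with plain SQL. This is an illustrative stdlib version using an invented single-column table, not the plugin's actual implementation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table repos (language text)")
conn.executemany("insert into repos values (?)",
                 [("Python",), ("Python",), ("JavaScript",), (None,)])

# Per-column summary: total rows, distinct values, null count,
# and the most common value with its frequency
total = conn.execute("select count(*) from repos").fetchone()[0]
distinct = conn.execute(
    "select count(distinct language) from repos").fetchone()[0]
nulls = conn.execute(
    "select count(*) from repos where language is null").fetchone()[0]
most_common = conn.execute("""
    select language, count(*) as n from repos
    where language is not null
    group by language order by n desc limit 1
""").fetchone()

print(total, distinct, nulls, most_common)  # 4 2 1 ('Python', 2)
```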

Office hours for 2021

Every Friday I run Datasette Office Hours sessions, where people who use Datasette (or who are interested in using it) can grab a 20 minute video chat with me to talk about the project. I’ve done a couple of weeks of these now and they’ve been fantastic - it’s so interesting hearing how people are using or plan to use the tool, and I’ve already had a bunch of great feedback to help me plan the next steps for the project.

If you’re interested in talking to me, slots in January 2021 are available now!

Datasette Office Hours, Personal Data Warehouses, datasette-ripgrep, datasette-indieauth, Datasette Client for Observable and more

Datasette Weekly(ish) volume 4

In this edition: book Datasette office hours, build a personal data warehouse, deploy a regular expression code search engine, plus updates on Datasette project releases from the past month.

Datasette Office Hours

One of the toughest things about running open source projects is the challenge of getting feedback: users often assume they are doing you a favor by leaving you alone, so generally you only hear from people if they find a bug.

I’m always really keen to talk to people who are using Datasette - or who aren’t using it yet, since I want to understand what it’s missing.

So I’m going to try a new approach: I’m setting aside time every Friday for Datasette Office Hours, where anyone can book a 20 minute Zoom call with me to talk about the project.

I’d love to hear from you if:

  • You’re solving problems with Datasette

  • You have a problem you think Datasette might be able to help with

  • You’ve run into problems using Datasette

  • You have thoughts on the direction you think the project should go

  • You’d like to see some demos of things I’m working on

  • You’re just interested in having a chat!

You can sign up for office hours at calendly.com/swillison/datasette-office-hours

Personal Data Warehouses

I gave a talk for the GitHub OCTO Speaker Series a couple of weeks ago about my Datasette and Dogsheep projects called Personal Data Warehouses: Reclaiming Your Data.

GitHub shared the video on their YouTube channel, and I’ve prepared an extended, annotated version of the talk with additional screenshots, links and notes.

The talk shows how I built my own personal data warehouse on top of Datasette that imports, analyzes and visualizes my data from Twitter, Swarm, GitHub, 23AndMe, Apple HealthKit, Apple Photos and more.

datasette-ripgrep

datasette-ripgrep is a web application I built for running regular expression searches against source code, built on top of the amazing ripgrep command-line tool.

Here are some example searches, running across around 100 Datasette and Datasette-adjacent GitHub repositories:

I wrote about the project on my blog, in datasette-ripgrep: deploy a regular expression search engine for your source code

It’s an interesting use-case for Datasette in that it doesn’t use SQLite at all - the tool works by running the “rg” executable against a folder full of source code. It does benefit from Datasette’s “datasette publish” mechanism - the following one-liner will deploy a Datasette instance to Google Cloud Run pre-configured to run searches against everything in the “all/” directory, which is uploaded as part of the deployment:

datasette publish cloudrun \
    --metadata metadata.json \
    --static all:all \
    --install=datasette-ripgrep \
    --service datasette-ripgrep \
    --apt-get-install ripgrep

The official demo is deployed by this GitHub Actions workflow which pulls a list of repos using github-to-sqlite, filters down to just the ones I want to include in the demo, then deploys them using the above pattern.

datasette-indieauth

My other big plugin project this month was datasette-indieauth, an authentication plugin which adds support for the emerging IndieAuth single sign-on standard.

You can read more about this project in Implementing IndieAuth for Datasette on my blog. IndieAuth is a spiritual successor to OpenID which allows users to sign-in using a website address that they control. It’s a particularly good fit for Datasette as it allows you to deploy single sign-on without first registering your site with a central authority.

Here’s an animation showing what the experience looks like signing in to the official demo at datasette-indieauth-demo.datasette.io/-/indieauth:

Datasette Client for Observable

Alex Garcia built a beautiful JavaScript client library for interacting with data hosted by Datasette from Observable notebooks. His demo at observablehq.com/@asg017/datasette-client shows how to use the client and demonstrates it integrating with Observable’s form elements and visualizing data as a stacked area chart using D3.

Other significant releases this month

Datasette 0.52 (and 0.52.1) - a relatively small release, this adds a new database_actions(datasette, actor, database) plugin hook and renames the --config option to --setting - --config still works but shows a deprecation message, and will be removed in Datasette 1.0.

github-to-sqlite 2.8 adds a new “github-to-sqlite workflows” command which imports GitHub Actions workflow YAML files and uses them to populate new workflows, jobs and steps tables.

datasette-graphql 1.2 and 1.3 add support for the Datasette view-instance permission and view-database permissions, and use the new table actions plugin hook to add example GraphQL queries to the cog action menu on every table page. You can try that out against this commits table created using github-to-sqlite.

sqlite-utils 3.0 adds a new command-line tool and Python method for executing full-text searches against a table, and returning the results ordered by relevance. It also adds a new --tsv output option and makes some small changes to other command-line options, hence the 3.0 major version bump.
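The relevance-ordered search in sqlite-utils builds on SQLite's full-text search support. Here's a minimal stdlib sketch using an FTS5 virtual table and its built-in rank column, assuming your Python's SQLite was compiled with FTS5 (true of most modern builds); the table and rows are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create virtual table docs using fts5(title, body)")
conn.executemany("insert into docs values (?, ?)", [
    ("About Datasette", "Datasette is a tool for exploring data"),
    ("Other", "Nothing relevant here"),
])

# rank is FTS5's built-in relevance score (lower = more relevant)
rows = conn.execute(
    "select title from docs where docs match ? order by rank",
    ("datasette",),
).fetchall()
print(rows)  # [('About Datasette',)]
```

sqlite-utils wraps this up as “sqlite-utils search” on the command line and a table.search() method in Python, handling FTS4 as well as FTS5.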
