Datasette 0.51 - new visual design, smarter plugins

Datasette Weekly volume 3

In this edition: annotated release notes for Datasette 0.51, and datasette-copyable as plugin of the week.

(You may have seen these annotated release notes on my blog - if so, feel free to skip straight to plugin of the week.)

Datasette 0.51, the annotated release notes

I shipped Datasette 0.51 at the weekend, with a new visual design, plugin hooks for adding navigation options, better handling of binary data, URL building utility methods and better support for running Datasette behind a proxy. It’s a lot of stuff! Here are the annotated release notes.

New visual design

Datasette is no longer white and grey with blue and purple links! Natalie Downe has been working on a visual refresh, the first iteration of which is included in this release. (#1056)

It’s about time Datasette grew beyond its clearly-designed-by-a-mostly-backend-engineer roots. Natalie has been helping me start adding some visual polish: we’ve started with an update to the colour scheme and will be continuing to iterate on the visual design as the project evolves towards the 1.0 release.

The new design makes the navigation bar much more obvious, which is important for this release since the new navigation menu (tucked away behind a three-bar icon) is a key new feature.

Plugins can now add links within Datasette

A number of existing Datasette plugins add new pages to the Datasette interface, providig tools for things like uploading CSVs, editing table schemas or configuring full-text search.

Plugins like this can now link to themselves from other parts of Datasette interface. The menu_links(datasette, actor) hook (#1064) lets plugins add links to Datasette’s new top-right application menu, and the table_actions(datasette, actor, database, table) hook (#1066) adds links to a new “table actions” menu on the table page.

This feature has been a long time coming. I’ve been writing an increasing number of plugins that add new pages to Datasette, and so far the main way of using them has been to memorise and type in their URLs!

The new navigation menu (which only displays if it has something in it) provides a global location to add new links. I’ve already released several plugin updates that take advantage of this.

The new “table actions” menu imitates Datasette’s existing column header menu icon—it’s a cog. Clicking it opens a menu of actions relating to the current table.

Want to see a demo?

The demo at latest.datasette.io now includes some example plugins. To see the new table actions menu first sign into that demo as root and then visit the facetable table to see the new cog icon menu at the top of the page.

Here’s an animated GIF demo showing the new menus in action.

Binary data

SQLite tables can contain binary data in BLOB columns. Datasette now provides links for users to download this data directly from Datasette, and uses those links to make binary data available from CSV exports. See Binary data for more details. (#1036 and #1034).

I spent a ton of time on this over the past few weeks. The initial impetus was a realization that Datasette CSV exports included ugly Python b'\x15\x1c\x02\xc7\xad\x05\xfe' strings, which felt like the worst possible way to display binary in a CSV file, out of universally bad options.

Datasette’s main interface punted on binary entirely—it would show a <Binary data: 7 bytes> label which didn’t help much either.

The only way to get at binary data stored in a Datasette instance was to request the JSON version and then manually decode the Base-64 value within it!

This is now fixed: binary columns can be downloaded directly to your computer, using a new .blob output renderer. The approach is described on this new page in the documentation.

Security was a major consideration when building this feature. Allowing the download of arbitrary byte payloads from a web server is dangerous business: it can easily result in XSS holes where HTML with dangerous <script> content can end up hosted on the primary domain.

After some research, I decided to serve up binary content for download using the following headings:

content-type: application/binary x-content-type-options: nosniff content-disposition: attachment; filename="data-f30889.blob" 

application/binary is a safer Content-Type option than the more common application/octet-stream, according to Michal Zalewski’s renowned web application security book The Tangled Web (quoted here)

x-content-type-options: nosniff disables the XSS-tastic content sniffing feature in older versions of Internet Explorer, where IE would helpfully guess that you intended to serve HTML based on the first few bytes of the response.

The content-disposition: attachment header causes the browser to show a “download this file” dialog, using the suggested filename.

If you know of a reason that this isn’t secure enough, please let me know!

URL building

The new datasette.urls family of methods can be used to generate URLs to key pages within the Datasette interface, both within custom templates and Datasette plugins. See Building URLs within plugins for more details. (#904)

Datasette’s base_url configuration setting was the forcing factor around this piece of work.

It allows you to configure Datasette to serve content starting at a path other than /—for example:

datasette --config base_url:/path-to-datasette/ 

This will serve all Datasette pages at locations starting with /path-to-datasette/.

Why would you want to do this? It’s useful if you are proxying traffic to Datasette from within the URL hierarchy of an existing website.

The feature didn’t work properly, and enough people care about it that I had a steady stream of bug reports. For 0.51 I gathered them all into a single giant tracking issue and worked through them all one by one.

It quickly became apparent that the key challenge was building URLs within Datasette—not just within HTML template pages, but also for things like HTTP redirects.

Datasette itself needed to generate URLs that took the base_url setting into account, but so do Datasette plugins. So I built a new datasette.urls collection of helper methods and made them part of the documented internals API for plugins. The Building URLs within plugins documentation shows how these should be used.

I also added documentation on Running Datasette behind a proxy with example configs (tested on my laptop) for both nginx and Apache.

The datasette.client mechanism from Datasette 0.50 allows plugins to make calls to Datasette’s internal JSON API without the overhead of an HTTP request. This is another place where plugins need to be able to construct valid URLs to internal Datasette pages.

I added this example to the documentation showing how the two features can work together:

table_json = (     await datasette.client.get(         datasette.urls.table("fixtures", "facetable", format="json")     ) ).json()

One final weird detail on this: Datasette now has various methods that automatically add the base_url prefix to a URL. I got worried about what would happen if these were applied more than once (as above, where datasette.urls.table() applies the prefix so does datasette.client.get()).

I fixed this using the same trick that Django and Jinja use to avoid appliying auto-escaping twice to content that will be displayed in HTML: the datasette.urls methods actually return a PrefixedUrlString object which is a subclass of str that knows that the prefix has been applied! Code for that lives here.

Smaller changes

A few highlights from the “smaller changes” in Datasette 0.51:

  • Wide tables shown within Datasette now scroll horizontally (#998). This is achieved using a new <div class="table-wrapper"> element which may impact the implementation of some plugins (for example this change to datasette-cluster-map).

I think this is a big improvement: if your database table is too wide, it now scrolls horizontally on the page (rather than blowing the entire page out to a wider width). You can see that in action on the global-power-plants demo.

If you are signed in as root the new navigation menu links to a whole plethora of previously-undiscoverable Datasette debugging tools. This new permission controls the display of those items.

  • Link: HTTP header pagination. (#1014)

Inspired by GitHub and WordPress, which both use the HTTP Link header in this way. It’s an optional extra though: Datasette will always offer in-JSON pagination information.

  • Edit SQL button on canned queries, (#1019)

Suggested by Jacob Fenton in this issue. The implementation had quite a few edge cases since there are certain categories of canned query that can’t be executed as custom SQL by the user. See the issue comments for details and a demo.

  • --load-extension=spatialite shortcut. (#1028)

Inspired by a similar feature in sqlite-utils.

  • datasette -o option now opens the most relevant page. (#976)

This is a fun little feature. If your Datasette only loads a single database, and that database only has a single table (common if you’ve just run a single CSV import) then running this will open your browser directly to that table page:

datasette data.db -o 
  • datasette --cors option now enables access to /database.db downloads. (#1057)

This was inspired by Mike Bostock’s Observable Notebook that uses the Emscripten-compiled JavaScript version of SQLite to run queries against SQLite database files.

It turned out you couldn’t use that notebook against SQLite files hosted in Datasette because they weren’t covered by Datasette’s CORS option. Now they are!

Recommendations for plugin authors, inspired by a question from David Kane on Twitter. David has been building datasette-reconcile, a Datasette plugin that offers a reconciliation API endpoint that can be used with OpenRefine. What a brilliant idea!

Plugin of the week: datasette-copyable

One of my goals with Datasette is to make getting data out of it as easy as possible. I want Datasette to be the obvious place to store your data because you’ll be able to transform it (either manually or automatically) into any other format you could possible want, often by taking advantage of plugins.

datasette-copyable is an acknowledgement that often the easiest way to export data is using copy and paste! It adds the ability to export Datasette tables in 21 different formats - from TSV to Jira table markup to LaTeX, Mediawiki markup or HTML.

The most useful of these is the first one: TSV. The great thing about tab-separated data is that you can paste it directly into Microsoft Excel, Apple Numbers or Google Sheets and it will be automatically arranged into cells that correspond to the original table. This makes copy-as-TSV by far the easiest way to get small amounts of data out of Datasette and into a spreadsheet.

The other flavours of export are provided by the excellent Tabulate Python library, which is also used for the different command-line export options provided by sqlite-utils.

You can try out datasette-copyable on fivethirtyeight.datasettes.com - this linked example provides exports of airline safety data.

Since copy-and-paste is such a useful way of moving data around, I’m starting to percolate the idea of supporting copy-and-paste of data into Datasette as well.