Publish Interactive Data Visualizations for Free with Python and Marimo



Working in data science, it can be hard to share insights from complex datasets using only static figures. The many facets that describe the shape and meaning of interesting data are rarely captured in a handful of pre-generated figures. While we now have powerful technologies for presenting interactive figures — where a viewer can rotate, filter, zoom, and generally explore complex data — they always come with tradeoffs.

Here I present my experience using a recently released Python library — marimo — which opens up exciting new opportunities for publishing interactive visualizations across the entire field of data science.

Interactive Data Visualization

The tradeoffs to consider when choosing an approach for presenting data visualizations can be broken into three categories:

  • Capabilities — what visualizations and interactivity am I able to present to the user?
  • Publication Cost — what resources are needed to display this visualization to users (e.g. running servers, hosting websites)?
  • Ease of Use — how much of a new skillset / codebase do I need to learn upfront?

JavaScript is the foundation of portable interactivity. Every user has a web browser installed on their computer, and there are numerous frameworks available for displaying any degree of interactivity or visualization you might imagine (for instance, this gallery of amazing things people have made with three.js). Since the application runs on the user's computer, no costly servers are needed. However, a big drawback for the data science community is ease of use, as JavaScript doesn't have many of the high-level (i.e. easy-to-use) libraries that data scientists use for data manipulation, plotting, and interactivity.

Python provides a useful point of comparison. Due to its continually growing popularity, some have called this the "Era of Python". For data scientists in particular, Python stands alongside R as one of the foundational languages for quickly and effectively wielding complex data. While Python may be easier to use than JavaScript, it has fewer options for presenting interactive visualizations. Popular projects providing interactivity and visualization include Flask, Dash, and Streamlit (also worth mentioning: bokeh, HoloViews, Altair, and Plotly). The biggest tradeoff for using Python has been the cost of publishing, i.e. delivering the tool to users. In the same way that Shiny apps require a running computer to serve up the visualization, these Python-based frameworks have exclusively been server-based. That is by no means prohibitive for authors with a budget to spend, but it does limit the number of users who can take advantage of a particular project.

Pyodide is an intriguing middle ground: Python code running directly in the web browser using WebAssembly (WASM). There are resource limitations (only one thread and 2GB of memory) that make it impractical for the heavy lifting of data science. However, this can be more than sufficient for building visualizations and updating them based on user input. Since it runs in the browser, no servers are required for hosting. Tools that build on Pyodide are interesting to explore because they give data scientists an opportunity to write Python code that runs directly on users' computers, without their having to install or run anything outside of the web browser.

As an aside, I have previously been interested in one project that tried this approach: stlite, an in-browser implementation of Streamlit that lets you deploy these flexible and powerful apps to a broad range of users. However, a core limitation is that Streamlit itself is distinct from stlite (the port of Streamlit to WASM), which means that not all features are supported and that advancement of the project depends on two separate groups working along compatible lines.

Introducing: Marimo

This brings us to Marimo.

The first public announcements of marimo were in January 2024, so the project is very new, and it has a unique combination of features:

  • The interface resembles a Jupyter notebook, which will be familiar to many users.
  • Execution of cells is reactive, so that updating one cell reruns all cells that depend on its output.
  • User input can be captured with a flexible set of UI components.
  • Notebooks can be quickly converted into apps, hiding the code and showing only the input/output elements.
  • Apps can be run locally or converted into static webpages using WASM/Pyodide.

marimo balances the tradeoffs of technology in a way that is well suited to the skill set of the typical data scientist:

  • Capabilities — user input and visual display features are relatively extensive, including support for user input via Altair and Plotly plots.
  • Publication Cost — deploying as static webpages is essentially free, with no servers required.
  • Ease of Use — for users familiar with Python notebooks, marimo will feel very familiar and be easy to pick up.

Publishing Marimo Apps on the Web

The best place to start with marimo is its extensive documentation.

As a simple example of the style of display that can be useful in data science (explanatory text interspersed with interactive displays), I have created a barebones GitHub repository. Try it out yourself here.

Using just a little bit of code, users can:

  • Attach source datasets
  • Generate visualizations with flexible interactivity
  • Write narrative text describing their findings
  • Publish to the web for free (e.g. using GitHub Pages)

For more details, read their documentation on web publishing and their template repository for deploying to GitHub Pages.
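The deployment itself can be automated with a GitHub Actions workflow along these lines (a sketch based on marimo's `export html-wasm` command; the notebook filename and exact steps in the official template may differ):

```yaml
name: Deploy marimo app to GitHub Pages
on:
  push:
    branches: [main]
permissions:
  pages: write
  id-token: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install marimo
      # Export the notebook as a static, WASM-powered site (no server needed)
      - run: marimo export html-wasm app.py -o _site --mode run
      - uses: actions/upload-pages-artifact@v3
        with:
          path: _site
      - uses: actions/deploy-pages@v4
```

With `--mode run`, the exported page shows the app view; `--mode edit` would instead publish an editable notebook.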

Public App / Private Data

This new technology offers an exciting new opportunity for collaboration: publish the app publicly to the world, while each user can only see the specific datasets they have permission to access.

Rather than building a dedicated data backend for each app, user data can be stored in a generic backend that is securely authenticated and accessed using a Python client library, all contained within the user's web browser. For example, the user is given an OAuth login link that authenticates them with the backend and allows the app to temporarily access input data.

As a proof of concept, I built a simple visualization app that connects to the Cirro data platform, which is used at my institution to manage scientific data. Full disclosure: I was part of the team that built this platform before it spun out as an independent company. In this way, users can:

  • Load the public visualization app, hosted on GitHub Pages
  • Connect securely to their private data store
  • Load the appropriate dataset for display
  • Share a link that directs authorized collaborators to the same data

Try it out yourself here.

Example visualization app sourcing user-controlled data (image created by the author)

As a data scientist, I find this approach of publishing free and open-source visualization apps that can interact with private datasets incredibly exciting. Building and publishing a new app can take hours or days instead of weeks or years, letting researchers quickly share their insights with collaborators and then publish them to the broader world.
