PhilipMat

Automating TIL Posts from GitHub Issues

I read a lot of things on the internet and rarely write about them. The friction of opening an editor, writing front matter, crafting a summary, and opening a PR is just high enough that it doesn’t happen. So I set up a workflow to do most of it for me.

The idea: create a GitHub issue with a link (or just some notes), and a GitHub Action kicks off, fetches the article, asks an LLM to summarize it and generate tags, then opens a PR with a ready-to-publish TIL post. All I have to do is review and merge.

How It Works

The workflow lives in .github/workflows/til.yml and triggers on any issue I open on the repository (it checks the issue author against the repo owner, so it ignores issues from anyone else).

When it fires, a Python script (.github/scripts/create_til.py) does the following:

  1. Fetches the issue via the GitHub API – title and body.
  2. Extracts a URL from the body if one is present. If found, trafilatura fetches the page and strips it down to the main article text, dropping nav, ads, and boilerplate.
  3. Calls OpenRouter (GPT-4o) with the article content and any notes I added to the issue body. The LLM returns a JSON object with a title, URL slug, 1-2 paragraph summary, a one-line snippet, and a set of tags.
  4. Assembles the post – front matter plus body – and writes it to _posts/YYYY-MM-DD-{slug}.md.
  5. Opens a PR and posts a comment on the original issue with a link to it.

If there’s no URL in the issue, the body text is used directly as the post content and the LLM only generates the metadata (title, slug, tags).

The Post Format

Each generated post gets the standard layout: post front matter plus a few extra fields:

---
layout: post
title: "TIL: How Vector Databases Work"
tags: [databases, ai, embeddings]
source_url: https://example.com/the-article
snippet: A look at approximate nearest neighbor search in vector databases.
---

The snippet field is already used by the archive page template, so TIL posts show up with descriptions in the archive without any template changes. The tags are stored for future use – the site doesn’t render them yet.

What I Had to Set Up

One secret in the repo: OPENROUTER_API_KEY. That’s it. The GITHUB_TOKEN is provided automatically by Actions.

The full plan for this workflow is in plan-opus.md in the repo root, including edge case handling, prompt design, and cost estimates.

This work was done using Claude Opus for planning, Codex for implementation (with an alternative by Kimi), and Sonnet for review.

TIL: Building a self-updating profile README for GitHub

Summary

Today I learned about a new feature on GitHub that allows you to create a profile README by setting up a repository with the same name as your GitHub account.

  1. Create a repo with the same name as your account: github.com/foobar/foobar;
  2. Add a README.md to it; and
  3. GitHub will render the contents at the top of your personal profile page.

Bonus: automate update with GH Actions

Source: Building a self-updating profile README for GitHub

Categories: TIL, Github, Automation, Python, Graphql, Github Actions

Connect From PyODBC to SQL Server in DataBricks Cluster

It’s a shame that clusters created in Azure DataBricks don’t have the pyodbc drivers for Microsoft SQL Server installed by default.
Perhaps the people who normally use DataBricks don’t typically connect to SQL Server, but it feels that it’d be easier if the images provided by Microsoft would enable that by default.

Then, to make things more difficult, a lot of the answers on the Internet on how to install these drivers are for the older versions of DataBricks.
The newer versions change slighly the approach used to install them.

The gist of the install process is as following:

  1. Create an install shell-script to install the msodbcsql17 package
  2. Save it as a file in DataBricks, whether as a Workspace script or in a connected repository.
  3. When configuring the cluster, add this file as an “init script”.

Install Script

The script below is centered around installing the msodbcsql17 driver.
For this, we need to import the Ubuntu sources for apt and then install the package along with the unixodbc-dev:

curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get install msodbcsql17
apt-get -y install unixodbc-dev
sudo apt-get install python3-pip -y

Saving Script in DataBricks

Save the script as a file, say pyodb-setup.sh, in either a DataBricks repository:

PyODBC Setup File in Connected Repo

or directly in a Workspace file (in the example below, stored in Workspace/Shared/pyodbc-setup.sh):

PyODBC Setup File in Workspace Shared Folder

Configure DataBricks Cluster

Either during creation, or after, configure the DataBricks cluster by expanding the “Advanced Options” section on the “Configuration” tab, then by selecting the “Init Scripts” tab.
There will be three options to add a new script: “Workspace”, “ABFSS”, and “DBFS”.

The “DBFS” is deprecated, and the “ABFSS” (Azure Blob File System or Azure Data Lake Storage Gen) is a bit more complicated to set up; the “Workspace” approach outlined above is the simplest.

The path to the pyodbc-setup.sh script is relative to the root of “Workspace”, so /Shared/pyodb-init.sh or /Repos/user/repo/pyodb-install.sh if in the repository.

Setting up init scripts in DataBricks cluster config

Connecting

This whole setup allows pyodbc to connect to a SQL Server using a connection string specifying the SQL Server 17 driver, "DRIVER=ODBC Driver 17 for SQL Server;...", like so:

server = "example-server.database.windows.net"
db_name = "example-db"
user = "example_user"
pwd = "example password"
conn_string = f"DRIVER=;SERVER={server};DATABASE={db_name};UID={user};PWD={pwd}"
conn = pyodbc.connect(conn_string)
with conn:
  conn.execute("select * from table")

As a note, there’s a newer version, msodbcsql18 – see here the whole list.
The script remains the same, save for pointing to a different Ubuntu, for example: https://packages.microsoft.com/config/ubuntu/18.04/prod.list.

Cached Claims when Using Windows Authentication in ASP.NET Core

In Loading Claims when Using Windows Authentication in ASP.NET Core we examined an approach for injecting Claims into the ClaimsPrincipal in order to enable policy usage – [Authorization(Policy = "SomePolicy")] – on controller actions.

One of the purposes of the IClaimsTransformation implementation is to provide an easier, and somewhat efficient, way to use authorization policies. As such, we wouldn’t be wrong to perform some expensive operations in the class implementing this interface. For example, querying a database.

Having than happen on every request is a bit more than annoying while in development.

While we cannot avoid the calls to the claims transformer, we can avoid the expensive calls by using a caching approach.
The title is misleading a bit at this point. We will be caching the expensive calls and not the claims.

In the Windows claims example, we have MagicPowersInfoProvider as a way to provide information to the claims transformer, MyClaimsLoader, which in turn determines whether a claim needs to be added to the ClaimsIdentity (in TransformAsync).

MagicPowersInfoProvider is registered as a singleton, which makes is a good place to handle caching.

services.AddSingleton<Auth.MagicPowersInfoProvider>();

It only makes sense to cache when running under IIS Express.
Luckily, we don’t need to perform any complex detection of IIS Express. We just need to modify the launchSettings.json file to add an environment variable:

{
  "profiles": {
    "IIS Express": {
      "commandName": "IISExpress",
      "launchBrowser": true,
      "environmentVariables": {
        "ASPNETCORE_ENVIRONMENT": "Development",
        "CacheClaims": "true"
      }
    },
  ...

Then MagicPowersInfoProvider can make use of an injected IMemoryCache when the "CacheClaims" key is true, which would only be when running the application from Visual Studio and under IIS Express.

public class MagicPowersInfoProvider
{
    private const string CacheClaimsKey = "CacheClaims";
    private const int ClaimCacheInSeconds = 5 * 60;
    private readonly bool _cacheClaims;
    private readonly IMemoryCache _memoryCache;

    public MagicPowersInfoProvider(IConfiguration config, IMemoryCache memoryCache)
    {
        _memoryCache = memoryCache;
        _cacheClaims = config.GetValue<bool>(CacheClaimsKey);
    }

    public async Task<bool> CanHasPowerAsync(string userId)
    {
        if (!_cacheClaims)
        {
            return await ExpensiveHasPowerOperation(userId);
        }

        return await _memoryCache.GetOrCreateAsync<bool>(
            $"power-{userId}",
            async cacheEntry =>
            {
                cacheEntry.SlidingExpiration = TimeSpan.FromSeconds(ClaimCacheInSeconds);
                bool hasPower = await ExpensiveHasPowerOperation(userId);
                return hasPower;
            });
    }

    private Task<bool> ExpensiveHasPowerOperation(string userId)
        => Task.FromResult(true);
}

For a full example, containing all the code, see this repo.

Replicating macOS's say command in Windows

Is say, macOS has a wonderful command line utility which I found to be useful to use in conjuction with long-running processes or even debugging to help draw attention, more so that typical beeping would do.

In short, say speaks text - is a Text-to-Speech (TTS) program.

say "Hello, there"

I wanted something similar on Windows and while there’s no direct equivalent, luckily .NET provides an entire host of utilities through the System.Speech.Synthesis namespace.

The say command has a number of parameter, mostly dealing with technical attributes such as voice selection (the speaker), output of spoken text, quality, e.t.c.

For this example, we’ll stick with the default voice of the speech synthesizer. As such, the solution is really simple using a Powershell script:

[cmdletbinding()]
param(
    [Parameter(Position = 1, Mandatory = $true)]
    [String]
    $message
)
Add-Type -AssemblyName System.Speech
$synth = New-Object -TypeName System.Speech.Synthesis.SpeechSynthesizer
$synth.Speak($message)

The Position-al parameter binding allow us to either call it directly:

say.ps1 'Hello there'

Or pass is with a switch argument:

say.ps1 -message 'Hello there'

I wish Visual Studio still had the ability to call macros on breakpoint because the code could translate into a one-liner in C#:

new System.Speech.Synthesis.SpeechSynthesizer()
  .Speak("Breakpoint hit");

As such, one would have to wrap it first into a method that can then get called when a breakpoint is hit.

Debug action with speech synthesis

It would be interesting to replicate the rest of the commands, in particular the voices since that would also allow for proper I18N speech synthesis.