Reviving GraphFrames: Notes from Maintaining a 10-Year-Old OSS Project

About a year ago, I got involved in reviving and maintaining GraphFrames, a 10-year-old OSS project with a lot of history and not enough active maintenance. I did it neither for money nor to sell anything, but out of a perhaps old-fashioned belief in free software and in the idea that it is worth spending time on software that is genuinely useful to others. This post reflects on what that experience has actually looked like: a constant tension between building new features and doing unglamorous maintenance, inherited code and forgotten assumptions everywhere, and the persistent fear of breaking backward compatibility in a library that was wired into important production processes long ago.

April 8, 2026 · 30 min · Sem Sinchenko

Dreaming of Graphs in the Open Lakehouse

Open Lakehouse platforms now natively support tables, geospatial data, vectors, and more, but property graphs are still missing. In the age of AI and growing interest in Graph RAG, graphs are becoming especially relevant: Knowledge Graphs need to be delivered to RAG systems, and that requires standards, ETL, and frameworks for different scenarios. A young project, Apache GraphAr (incubating), aims to define a storage standard. For processing, good tooling already exists. GraphFrames is to graphs what Spark is to Iceberg: batch processing that scales on distributed clusters. Kuzu is to graphs what DuckDB is to Iceberg: fast, in-memory, and in-process. Apache HugeGraph plays the role of ClickHouse or Doris: a standalone server for queries. I'm also currently working on graphframes-rs to bring Apache DataFusion and its ecosystem into this picture. All the pieces seem to be in place; it just remains to put them together. More thoughts in the full post.

June 26, 2025 · 13 min · Sem Sinchenko