Jonas Wilinski

DRUID 2024 and the Challenge of Explainable Semantic Networks

#conference #thoughts #science of science

For the past eighteen months, my focus has been largely confined to the local HPC cluster at TUHH—wrestling with data ingestion, optimizing prefix trees, and fine-tuning transformer models. However, computational models are only as valuable as the insights they provide to the broader scientific community.

In June, I traveled to Nice, France, to present our paper, “The AI Innovation Compass: Constructing Semantic Networks from AI Concepts to Identify and Measure Technology Innovation”, at the DRUID 2024 conference. DRUID is a premier forum for scholars of innovation, entrepreneurship, and technical change, making it the ideal proving ground for our methodology.

Presenting the Compass

The premise of our paper addresses a critical measurement problem: AI is acting as a General Purpose Technology (GPT), but traditional bibliometrics fail to capture its nuanced integration into domains like biology or environmental science.

I presented our end-to-end unsupervised machine learning pipeline. We detailed how we started with foundational AI concepts from textbook indices and the Computer Science Ontology, and dynamically extended this list by analyzing semantic similarities in hundreds of thousands of abstracts from the PapersWithCode dataset.
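The dictionary-extension step can be sketched roughly as a nearest-seed similarity filter. This is my own minimal illustration, not the paper's implementation: the function name, the cosine-similarity criterion against the nearest seed concept, and the 0.75 threshold are all assumptions for the sake of the example.

```python
import numpy as np

def extend_concepts(seed_vecs, cand_phrases, cand_vecs, threshold=0.75):
    """Keep candidate phrases whose embedding is close to any seed AI concept.

    seed_vecs: (n_seeds, d) array of seed-concept embeddings
    cand_vecs: (n_cands, d) array of candidate-phrase embeddings
    """
    # Cosine similarity via row-normalised dot products
    s = seed_vecs / np.linalg.norm(seed_vecs, axis=1, keepdims=True)
    c = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    sims = c @ s.T                  # (n_cands, n_seeds) similarity matrix
    best = sims.max(axis=1)         # similarity to the nearest seed concept
    return [p for p, m in zip(cand_phrases, best) if m >= threshold]
```

In practice the candidate phrases would come from the PapersWithCode abstracts and the seeds from the textbook indices and the Computer Science Ontology; the sketch only shows the filtering logic.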

To validate these concepts, we generated a phrase-document matrix with over 1.6 million entries and utilized a logistic regression model against a massive negative sample from OpenAlex. The result is a robust, weighted dictionary of 10,797 AI Concept Phrases.
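As a toy stand-in for this validation step, one can score each phrase by its smoothed log-odds of appearing in AI versus non-AI abstracts. To be clear, this is a deliberate simplification: the paper fits a full logistic regression over a 1.6-million-entry phrase-document matrix, whereas the sketch below computes per-phrase log-odds independently, and all names and the substring-matching shortcut are mine.

```python
import math
from collections import Counter

def phrase_weights(docs, labels, phrases, alpha=1.0):
    """Weight phrases by smoothed log-odds of occurring in AI vs. non-AI docs.

    docs   : abstracts from the AI corpus plus the negative sample
    labels : 1 for the AI corpus, 0 for the negative (e.g. OpenAlex) sample
    phrases: candidate AI concept phrases
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pos_hits, neg_hits = Counter(), Counter()
    for doc, y in zip(docs, labels):
        text = doc.lower()
        for p in phrases:
            if p in text:                       # crude substring match
                (pos_hits if y else neg_hits)[p] += 1
    weights = {}
    for p in phrases:
        p_rate = (pos_hits[p] + alpha) / (pos + 2 * alpha)
        n_rate = (neg_hits[p] + alpha) / (neg + 2 * alpha)
        weights[p] = math.log(p_rate / n_rate)  # > 0: over-represented in AI docs
    return weights
```

A phrase with a strongly positive weight is over-represented in AI abstracts relative to the negative sample, which is the intuition behind the weighted dictionary.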

By treating these concepts as nodes and their co-occurrences within documents as edges, we constructed an “AI Semantic Network”. It was highly rewarding to show the audience empirical data visualizing AI’s expanding footprint across disciplines over time.

The Epiphany: Prediction vs. Explanation

The feedback from the DRUID audience—comprising predominantly economists, sociologists, and innovation management researchers—was excellent, but it highlighted a fundamental tension between Computer Science and the Social Sciences.

In CS, we often optimize for predictive accuracy or computational efficiency. If the contrastive learning model clusters patents cleanly, or the logistic regression reaches 87% accuracy under cross-validation, we consider the system a success.

However, for an innovation scholar, the algorithm is merely a lens. They are interested in causality. Why did a specific Graph Neural Network architecture suddenly appear in materials science in 2021? Was it driven by a specific funding grant, a high-profile co-authorship, or an algorithmic breakthrough?

High-dimensional embeddings are incredibly powerful for identifying that a link exists, but they are notoriously “black-box” when it comes to explaining why.

Returning to the Lab

The discussions overlooking the French Riviera were a necessary reminder of the ultimate goal of this PhD. We are not just building pipelines to process 100 million documents efficiently; we are building instruments to understand human knowledge creation.

The next phase of my research back in Hamburg must bridge this gap. We need to take the raw semantic networks generated by our AI Compass and overlay them with institutional, geographical, and author-level metadata. The models need to move from merely tracking diffusion to helping us explain it.
