AI Infrastructure for Digital Autonomy in Universities

Benjamin Paaßen, Stefanie Go, Maximilian Mayer, Benjamin Kiesewetter, Anne Krüger, Jonas Leschke, Christian M. Stracke for the Research network Artificial Intelligence and Digital Autonomy in Research and Education (AIDARE).

A German version of this document is available.

For future research and teaching in universities, access to large language models (LLMs) will be crucial. As such, universities ought to avoid dependencies on proprietary LLM suppliers and, instead, build a diversified AI infrastructure that supports rather than undermines the digital autonomy of students, teachers, researchers, and the university as a whole. This document is directed at university leadership to support strategic steps toward such an infrastructure that can be implemented in the short and mid term with tangible benefits for digital autonomy.

Why digital autonomy?

With respect to artificial intelligence (AI), digital autonomy is core to the purpose of universities as teaching and research institutions: As institutions, they ought to be independent from the influence of AI hyperscalers; university researchers ought to be free to choose their own research tools and objectives instead of being limited by the constraints of proprietary AI systems; university teachers ought to be able to choose whether and how to integrate AI systems into their learning design and didactics without sending personal data of their students to servers abroad; and university students ought to become responsible citizens and autonomous experts in their field without offloading their cognition and academic responsibility to AI tools.

Therefore, universities should build an AI infrastructure that promotes, rather than undermines, digital autonomy in the sense of self-determination, epistemic agency and epistemic justice, academic responsibility, and competencies (within and beyond specific academic subjects). Promoting digital autonomy in this ambitious sense involves many facets of universities, including teaching, research, administration, governance (such as guidelines), and ethos. This document focuses on AI infrastructure, meaning the technological foundations for autonomy, such as algorithmic transparency and flexibility, that enable university members to manifest autonomy in the first place. We frame universities and their members as responsible actors that can shape the future of AI usage – instead of treating AI as an overwhelming external force – and we emphasize short- and mid-term steps universities can take to achieve tangible benefits for digital autonomy.

The status quo: LLM Chat-Interfaces

Many German universities have already taken first steps toward more digital autonomy: They host their own website interfaces to chat with LLMs, such as HAWKI[1] or KI:connect.nrw[2]. These websites ensure that the account information of university members stays internal, while only the chat messages are forwarded to the external providers who host the actual LLMs. This is a crucial first step toward more data privacy and less dependency that universities can implement at almost no additional cost (beyond what needs to be paid for LLM tokens anyway). We recommend that universities open such interfaces to all their members, including students, teachers, researchers, and administrators, to provide a meaningful alternative to proprietary systems. However, we emphasize that AI use must remain voluntary and can even be discouraged in certain contexts (e.g., in teaching, when building foundational knowledge and skills that are needed to competently judge AI outputs).
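The proxy principle behind such interfaces can be illustrated with a minimal sketch. This is not the actual implementation of HAWKI or KI:connect.nrw; all names are hypothetical. The point is architectural: the university authenticates users locally and forwards only the chat content, never account data, to the external provider.

```python
# Illustrative sketch (not the actual HAWKI / KI:connect.nrw code):
# the university server holds the user's identity; only the chat
# messages cross the boundary to the external LLM provider.

def to_provider_payload(internal_request: dict, model: str) -> dict:
    """Strip all identifying fields; keep only what the LLM needs."""
    return {
        "model": model,
        "messages": internal_request["messages"],  # chat content only
        # Deliberately omitted: user id, name, email, institution --
        # these stay on the university's own servers.
    }

# Hypothetical internal request as the university server might see it.
internal = {
    "user_id": "student-4711",
    "email": "jane.doe@uni.example",
    "messages": [{"role": "user", "content": "Explain gradient descent."}],
}
outbound = to_provider_payload(internal, model="example-llm")
```

Note that this only keeps account data internal; the chat messages themselves still leave the university, which is why the privacy concerns discussed below remain.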

The next step: OpenWeight LLM Hosting

Offering a chat interface alone is an insufficient basis for digital autonomy. If no further steps are taken, universities remain dependent on external providers of proprietary LLMs. The competitive position of these providers is further strengthened by high-volume contracts with universities as well as by the valuable research- and teaching-related chat data universities provide – thus potentially deepening dependencies and leading to lock-in effects. Finally, privacy concerns remain, as the chat messages themselves may leak personal data. Therefore, a diversified AI infrastructure is needed, meaning a diversity of hosting providers and a diversity of LLMs.

Some universities have therefore partnered with high performance computing (HPC) centers which host OpenWeight LLMs, such as Meta’s Llama models, DeepSeek models, or even more open models such as Apertus[3]. Such arrangements have crucial advantages for universities: They can alleviate privacy concerns, guarantee access to transparent models with known parameters, and control costs more reliably. Germany already has working best-practice examples, most notably the GWDG[4], which serves dozens of universities, but other initiatives, such as Open Source-KI.nrw[5], have started in this direction as well. Universities should secure contracts with such OpenWeight LLM hosting providers to give their members LLM access without privacy or dependency concerns. Where such providers are not yet available, universities should partner with HPC institutions to enable OpenWeight LLM hosting on their servers. Such partnerships have also been recommended by the GWDG paper on AI basic infrastructure (“KI-Grundversorgung”)[4]. To promote digital autonomy, we recommend hosting at multiple HPC centers, which means more technical redundancy, LLM hosting capabilities in more locations, and less dependency on single providers.

Supporting research with LLM API access

A chatbot interface supports research processes at very small scales but is insufficient for larger research applications, such as automatically annotating or classifying large amounts of text data, transcription tasks, or building custom systems that need LLMs as a component. Research applications of LLMs are not limited to fields like computer science and computational linguistics but cross disciplinary boundaries, including the natural sciences, social sciences, and humanities. Establishing valid scientific methods with LLM involvement is an ongoing process. Developing and applying such methods requires reliable LLM access with full transparency.

For such research use, an application programming interface (API) is required. Currently, API access to LLMs is (almost) exclusively offered by proprietary vendors. However, depending on vendors who keep training data, LLM architecture, and surrounding software secret severely limits researchers’ epistemic agency in the sense of critically engaging with the underlying biases of LLMs, as well as good scientific practice in the sense of transparency and reproducibility of research. To build an alternative that promotes the digital autonomy of researchers, HPC centers must be equipped to offer APIs to researchers, which means handling a large expected volume of research-related LLM inference queries, far exceeding the ten queries per user per day currently estimated[4]. Importantly, this inference infrastructure is separate from and additional to classic scientific computing, which is also situated at HPC centers: Inference APIs are intended for prototyping and small-compute research tasks, whereas classic scientific computing typically requires a proposal to apply for a large, multi-hour to multi-week computing effort. Classic scientific computing will still be needed, not least to fine-tune and train LLMs.

Universities should strategically apply for and politically demand investments in HPC centers to equip them with hardware and personnel to handle large volumes of research-related LLM inference queries and to provide API access to all researchers. This ensures that researchers have full freedom to choose whether and which LLM to use and can research every aspect of the models.
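To make the research use case concrete, the sketch below shows how a researcher might query such an API for automatic text annotation. It assumes an OpenAI-compatible chat completions endpoint, the de facto interface exposed by most open-weight serving stacks; the URL, model name, and function names are hypothetical placeholders, not an actual HPC offering.

```python
import json
import urllib.request

# Hypothetical endpoint and model identifier -- replace with the values
# provided by your HPC center. The request format follows the widely
# adopted OpenAI-compatible chat completions API.
API_URL = "https://llm.example-hpc.de/v1/chat/completions"
MODEL = "apertus-70b-instruct"  # placeholder model name

def build_annotation_request(text: str, labels: list[str]) -> dict:
    """Build a chat completion payload that asks the LLM to classify
    one document into exactly one of the given labels."""
    prompt = (
        "Classify the following text into exactly one of these "
        f"categories: {', '.join(labels)}.\n"
        "Answer with the category name only.\n\n"
        f"Text: {text}"
    )
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # deterministic output aids reproducibility
    }

def annotate(text: str, labels: list[str], api_key: str) -> str:
    """Send one annotation query; return the predicted label."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_annotation_request(text, labels)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"].strip()
```

A research pipeline would loop `annotate` over thousands of documents, which is exactly the query volume, far beyond chat-interface usage, that HPC inference infrastructure must be sized for.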

LLM integration in open source digital teaching tools

In teaching, chatbot interfaces are already useful for students and provide an alternative to personal accounts with proprietary vendors. However, many useful teaching applications require additional functionality, such as tutoring chatbots that can answer questions and provide feedback and hints based on the material of a specific course. If such teaching applications are offered, they should not force students (or teachers) to transmit teaching material or student data to proprietary vendors; and teachers as well as students should have full autonomy over how the prompts to LLMs are configured and which data is used to support teaching and learning. Open Source-KI.nrw[5] and the GWDG have already developed prototype systems in this direction; the practice projects of KI:edu.nrw have shown how such infrastructure can be used in teaching[6]. Universities should support open source developments that equip digital learning tools with freely configurable open-weight LLM functionality, including retrieval-augmented generation (RAG) based on teaching material, and give their teachers and students the choice to integrate this functionality into their courses.

We emphasize that we advocate for an autonomy-respecting option to use LLMs if desired. Universities should facilitate discussions so that teachers and students can make informed decisions regarding LLM use in specific learning contexts (i.e., a specific course in a specific subject for a specific learner).
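The RAG principle behind such course-grounded tutoring can be sketched in a few lines. This is a deliberately minimal illustration under simplifying assumptions: real systems retrieve passages with embedding models rather than the word-overlap scoring used here, and the grounded prompt would then be sent to an open-weight LLM; all names are hypothetical.

```python
# Minimal retrieval-augmented generation (RAG) sketch: find the
# course-material passage most relevant to a student question and
# build a prompt grounded in that passage. Word overlap stands in
# for the embedding-based retrieval of real systems.

def score(question: str, passage: str) -> int:
    """Count shared (lower-cased) words between question and passage."""
    return len(set(question.lower().split()) & set(passage.lower().split()))

def retrieve(question: str, passages: list[str]) -> str:
    """Return the passage with the highest word overlap."""
    return max(passages, key=lambda p: score(question, p))

def build_prompt(question: str, passages: list[str]) -> str:
    """Ground the LLM's answer in the retrieved course material."""
    context = retrieve(question, passages)
    return (
        "Answer the student's question using only the course material "
        f"below.\n\nCourse material: {context}\n\nQuestion: {question}"
    )

# Hypothetical course material provided by the teacher.
course_material = [
    "Gradient descent updates parameters against the gradient.",
    "A confusion matrix summarizes classification errors.",
]
prompt = build_prompt("What is a confusion matrix?", course_material)
```

Because the teaching material and the retrieval logic stay under the teacher's control, and the prompt template is fully visible, this kind of design keeps prompt configuration and data use in the hands of teachers and students rather than a vendor.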

The timeline

We believe that university chat interfaces and OpenWeight LLM hosting are steps that can be taken immediately or within months. For API access for researchers and open source teaching tools, prototypes already exist; universities should take steps to facilitate investments and developments (e.g., via proposals and political advocacy) and build partnerships with institutions (such as HPC centers) that can become their suppliers of API access and open source teaching tools. With coordinated effort, we believe that even these mid-term goals can be achieved within two years. We emphasize that these are only short- and mid-term steps to provide a technological foundation for digital autonomy at the university level. Universities will need to take additional steps in education, research, administration, governance, etc. Further, policy action at the national or even European level will be needed to build an autonomy-promoting infrastructure for training LLMs and for gathering training data in a way that respects autonomy.

Related Initiatives

We are not the first to propose such activities. The recommendations in this paper are particularly well aligned with the strategy paper of KI:edu.nrw[7], the “KI-Zukunftsfonds Hochschule”[8], the “KI-Grundversorgung”[4], and the expert hearings on “Souveräne KI-Infrastrukturen” of the Hochschulforum Digitalisierung in Germany. Other initiatives toward high performance computing hardware for AI are the AI (Giga-)Factories (e.g., HammerHAI[9]), the JUPITER system[10] at FZ Jülich, and supercomputing for LLM training in Darmstadt[11]. Related initiatives for the training of fully open LLMs in Europe are OpenEuroLLM[12], the Swiss AI Initiative[3], and Open GPT-X[13]. In terms of promoting the digital autonomy of all university members and remaining skeptical toward AI hype, our document aligns with the guidelines “Ethical AI in Higher Education” for teachers[14] and for students[15] (both by the Network “Ethical Use of AI”[16]). All these initiatives (and many more) play a role in building an infrastructure that promotes digital autonomy in universities.

 


[1] https://hawki.hawk.de/ 

[2] https://kiconnect.pages.rwth-aachen.de/pages/ 

[3] https://ethz.ch/de/news-und-veranstaltungen/eth-news/news/2025/09/medienmitteilung-apertus-ein-vollstaendig-offenes-transparentes-und-mehrsprachiges-sprachmodell.html 

[4] https://kisski.gwdg.de/dok/grundversorgung.pdf 

[5] https://www.oski.nrw/ 

[6] https://ki-edu-nrw.ruhr-uni-bochum.de/ueber-das-projekt/phase-2/praxis-transferprojekte/aktuelle-praxisprojekte/ 

[7] https://ki-edu-nrw.ruhr-uni-bochum.de/wp-content/uploads/2025/07/2025_07_09_KI-Strategiepapier_NRW.pdf 

[8] https://www.stifterverband.org/sites/default/files/2025-02/ki-zukunftsfords_hochschulen_2026-2030.pdf 

[9] https://www.hlrs.de/press/detail/hammerhai-to-create-an-ai-factory-for-science-and-industry 

[10] https://www.fz-juelich.de/de/aktuelles/news/pressemitteilungen/2025/europas-ki-turbo-jupiter-ai-factory 

[11] https://hessian.ai/supercomputer-for-cutting-edge-ai-research-in-hesse/ 

[12] https://openeurollm.eu/ 

[13] https://opengpt-x.de/en/ 

[14] https://doi.org/10.5281/zenodo.10995669 (German version: https://doi.org/10.5281/zenodo.10793844)

[15] https://doi.org/10.5281/zenodo.15880726 

[16] https://ethischeki.ecompetence.eu