Keynotes
Fifty-one years ago, Jim Gray and his IBM colleagues published their first papers that defined the transaction abstraction and the mechanisms to support it: two-phase locking for isolation, and logging for atomicity and durability. Three years later, I published my first paper on the topic. By the early 1990s, the transaction problem seemed to be solved. Yet research continues. One reason is that platform shifts expose new problems and present new optimization opportunities. Today’s big platform shift is the cloud. In this talk, I will start by reviewing some early history of transaction research. I’ll then focus on recent work by me and others on transaction mechanisms for cloud databases. I’ll close with a few topics where I’d like to see future work.
Bio: Philip A. Bernstein is a Distinguished Scientist at Microsoft Research. Over the past 50 years, he has been a product architect at Microsoft and Digital Equipment Corp., a professor at Harvard University and the Wang Institute of Graduate Studies, and VP of Software at Sequoia Systems. During that time, he has co-authored over 200 papers on database management and two books on the theory and implementation of transaction processing systems. He is a Fellow of the ACM and AAAS, a winner of the E.F. Codd SIGMOD Innovations Award, a member of the Washington State Academy of Sciences, and a member of the U.S. National Academy of Engineering. He received a B.S. degree from Cornell and M.Sc. and Ph.D. degrees from the University of Toronto. More details are at https://research.microsoft.com/~philbe.
The last few years have shown that making language models reliable requires rethinking how they retrieve information. Our 2021 ACL paper, "Database reasoning over text", demonstrated that transformer models struggle with database-style queries requiring joins and aggregations, and that a modular architecture combining retrieval with reasoning over multiple spans could answer complex questions and scale to thousands of facts. This can be considered a form of retrieval-augmented generation (RAG) ante litteram. Follow-up studies revisited the retrieval component itself. In "Rethinking relevance", we analyzed how relevant, distracting, and random documents influence RAG, and found that adding random noise often improves the accuracy of large language models, whereas high-scoring distractors can harm it. "The power of noise" extended this analysis, showing systematically that the retriever's top-scored non-answer passages reduce effectiveness, while injecting random documents can unexpectedly boost performance. Our latest work, ECLIPSE, applies this intuition to dense retrieval: by using irrelevant documents as a reference for identifying noisy embedding dimensions, it learns which components of the vector space carry signal and improves ranking without costly query-dependent pruning. Together with other recent papers, we argue that embracing noise and irrelevance, rather than discarding them, is an essential step toward building more robust RAG systems.
Bio: Fabrizio Silvestri is a full professor in the Department of Computer, Control and Management Engineering (DIAG) at Sapienza University of Rome. His research spans natural language processing, web search, graph learning, and explainable AI, with a focus on social-media integrity and trustworthy machine learning. Before returning to academia, he worked at Facebook AI and Yahoo Research and, earlier, developed algorithms for web search at the Italian National Research Council; he also serves on program committees and editorial boards for major AI venues. He co-authored a series of pioneering studies examining how retrieval strategies influence retrieval-augmented generation (RAG) systems, revealing that carefully curating retrieval pipelines, and sometimes even including seemingly irrelevant documents, can paradoxically improve the accuracy of large language models. To elevate the often-overlooked role of information retrieval in generative AI, he co-organized the Information Retrieval's Role in RAG Systems workshop. His early monograph, Mining Query Logs: Turning Search Usage Data into Knowledge, explored techniques for extracting insights from search logs to improve retrieval systems; this foundation in query-log analysis informs his current efforts to design more effective retrieval components for RAG models.
Invited Lecturer
As AI systems in general, and LLMs in particular, increasingly mediate consequential decisions, meeting the demand for explainable and trustworthy results remains a major challenge. In this talk, I argue that two classical, well-understood fields, namely database provenance and computational argumentation, offer complementary foundations for a principled answer. Provenance tells us why (or why not) a result was derived; argumentation tells us which defensible position survives challenge, and how. Bringing the two together yields a dialogical, evidence-based account of AI outputs in which claims can be examined, attacked, defended, and traced back to their data and assumptions. I will sketch this perspective along three threads: (i) game- and provenance-based explanations of stable solutions in abstract argumentation frameworks; (ii) visual and interactive tools for resolving ambiguity in legal argumentation (AF-XRAY); and (iii) a vision for "Trustworthy AI Results" (TAIR) via evidence structures and certificates that loosely couple symbolic reasoning with LLM-enhanced knowledge-management pipelines. The aim is a modest but useful step toward neurosymbolic AI that "knows what it knows" and can show its work.
Bio: Bertram Ludäscher is Professor in the School of Information Sciences (iSchool) at the University of Illinois at Urbana-Champaign, with affiliate appointments at NCSA and the Siebel School of Computing & Data Science. He directs the Center for Informatics Research in Science & Scholarship (CIRSS). His research spans declarative data science, database provenance, scientific workflows, and, most recently, computational argumentation and explainable, trustworthy AI. He received his Dipl.-Inform. from Universität Karlsruhe (1992) and his Dr.rer.nat. from Universität Freiburg, Germany (1998). He was previously Professor of Computer Science at UC Davis and a research scientist at the San Diego Supercomputer Center (SDSC, UCSD).