Data Quality for LLMs: Building a Reliable Data Foundation
April 24, 2024 | 11:00 AM – 12:00 PM
Virtual
If you use large language models (LLMs) in your cancer research, register for this seminar to hear Elucidata’s Dr. Abhishek Jha discuss how data quality impacts LLM performance.
A reliable data foundation, one that is well annotated and accessible to an LLM, plays a major role in the value of the model's results.
You’ll see examples of how LLM-powered artificial intelligence (AI) agents query across three versions of the same gene expression corpus with differing results, including:
- unstructured data from the public repository Gene Expression Omnibus.
- structured data from the Crowd Extracted Expression of Differential Signatures project (a tool developed by the Ma’ayan Lab at the Icahn School of Medicine at Mount Sinai).
- clean, linked, and harmonized data.
Dr. Jha will use these examples to discuss how differences in quality among these data sources affect LLM performance.