Denny’s top session picks for Data + AI Summit 2025

Data + AI Summit 2025 is only a few weeks away! This year, we’re offering our largest selection of sessions ever, with more than 700 to choose from. Register to join us in person in San Francisco or virtually.

With a career rooted in open source, I’ve seen firsthand how open technologies and formats are increasingly central to enterprise strategy. As a long-time contributor to Apache Spark™ and MLflow, a maintainer and committer for Delta Lake and Unity Catalog, and most recently a contributor to Apache Iceberg™, I’ve had the privilege of working alongside some of the brightest minds in the industry.

For this year’s sessions, I’m focusing on the intersection of open source and AI, with a particular interest in multimodal AI. Specifically, how open table formats like Delta Lake and Iceberg, combined with unified governance through Unity Catalog, are powering the next wave of real-time, trusted AI and analytics.

My Top Picks

The Upcoming Apache Spark 4.1: The Next Chapter in Unified Analytics

Apache Spark™ has long been recognized as the leading open-source unified analytics engine, combining a simple yet powerful API with a rich ecosystem and top-notch performance. In the upcoming Spark 4.1 release, the community reimagines Spark to excel at both large cluster deployments and local laptop development. Hear from and ask questions of:

  • Xiao Li, an Engineering Director at Databricks, an Apache Spark committer, and a PMC member.
  • DB Tsai, an engineering leader on the Databricks Spark team, and an Apache Spark Project Management Committee (PMC) member and committer.

Iceberg Geo Type: Transforming Geospatial Data Management at Scale

Geospatial is becoming more and more important for lakehouse formats. Learn from Jia Yu, Co-founder and Chief Architect of Wherobots Inc., and Szehon Ho, Software Engineer at Databricks, about the latest and greatest around the geospatial data types in Apache Iceberg™.

Let’s Save Tons of Money with Cloud-native Data Ingestion!

R. Tyler Croy from Scribd, Delta Lake maintainer and shepherd of delta-rs since its inception, will dive into the cloud-native architecture Scribd has adopted to ingest data from AWS Aurora, SQS, Kinesis Data Firehose, and more. By using off-the-shelf open source tools like kafka-delta-ingest, oxbow, and Airbyte, Scribd has redefined its ingestion architecture to be more event-driven, reliable, and most importantly: cheaper. No jobs needed!
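If you want a taste of that Rust-backed, Spark-free tooling before the session, here is a minimal sketch using the delta-rs Python bindings (`deltalake`). This is my illustration rather than Scribd’s pipeline code; the table path and event fields are placeholders.

```python
# A minimal sketch, not Scribd's pipeline: append a micro-batch of events to a
# Delta table with the Rust-backed `deltalake` bindings from delta-rs,
# no Spark cluster required. Assumes `pip install deltalake pyarrow`;
# the table path and columns are placeholders.
import pyarrow as pa
from deltalake import DeltaTable, write_deltalake

events = pa.table({
    "event_id": [101, 102],
    "event_type": ["page_view", "click"],
    "ts": ["2025-05-01T12:00:00Z", "2025-05-01T12:00:01Z"],
})

# Append to the table; the first write creates it.
write_deltalake("/tmp/events_delta", events, mode="append")

dt = DeltaTable("/tmp/events_delta")
print(dt.version(), dt.to_pyarrow_table().num_rows)
```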

This session will dig into the value props of a lakehouse architecture and the cost efficiencies within the Rust/Arrow/Python ecosystems. A few recommended videos to watch beforehand:

Daft and Unity Catalog: a multimodal/AI-native lakehouse

Multimodal AI will fundamentally change the landscape, as data is more than just tables. Workflows now often involve documents, images, audio, video, embeddings, URLs, and more.

This session from Jay Chia, Co-founder of Eventual, will show how Daft, a popular multimodal data framework, combined with Unity Catalog can help unify authentication, authorization, and data lineage, providing a holistic view of governance.
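As a rough illustration of that combination (my sketch based on Daft’s documented Unity Catalog integration, not the session’s code), where the workspace URL, token, and table/column names are placeholders:

```python
# A minimal sketch: read a Unity Catalog-governed Delta table with Daft, then
# fetch multimodal content referenced by a URL column. Assumes
# `pip install daft`; the endpoint, token, and names below are placeholders.
import daft
from daft.unity_catalog import UnityCatalog

unity = UnityCatalog(
    endpoint="https://dbc-xxxxxxxx.cloud.databricks.com",  # placeholder workspace
    token="dapi...",                                       # placeholder token
)

# Resolve the table through Unity Catalog so access stays centrally governed.
table = unity.load_table("demo_catalog.demo_schema.product_images")
df = daft.read_deltalake(table)

# Daft treats URLs and bytes as first-class data: download each referenced image.
# Assumes the table has a string column `image_url`.
df = df.with_column("image_bytes", df["image_url"].url.download())
df.show()
```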

Bridging Big Data and AI: Empowering PySpark with Lance Format for Multi-Modal AI Data Pipelines

PySpark has long been a cornerstone of big data processing, but the rise of multimodal AI and vector search introduces challenges beyond its capabilities. Spark’s new Python data source API enables integration with emerging AI data lakes built on the multi-modal Lance format.
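To make the new API concrete, here is a minimal read-only sketch of a Python data source (my illustration, not the speakers’ connector). It assumes Spark 4.0+, `pip install pylance`, and an illustrative Lance dataset at /tmp/demo.lance with columns `id` and `caption`:

```python
# A minimal sketch (not the speakers' code) of a read-only Python data source
# for Spark that scans a Lance dataset. Assumes Spark 4.0+ and pylance;
# the dataset path and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.datasource import DataSource, DataSourceReader


class LanceSketchSource(DataSource):
    @classmethod
    def name(cls):
        return "lance_sketch"

    def schema(self):
        # Hardcoded for the sketch; a real connector would derive this
        # from the Lance dataset's Arrow schema.
        return "id bigint, caption string"

    def reader(self, schema):
        return LanceSketchReader(self.options["path"])


class LanceSketchReader(DataSourceReader):
    def __init__(self, path):
        self.path = path

    def read(self, partition):
        import lance  # imported here so executor processes resolve the package

        ds = lance.dataset(self.path)
        for batch in ds.scanner(columns=["id", "caption"]).to_batches():
            for row in batch.to_pylist():
                yield (row["id"], row["caption"])


spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(LanceSketchSource)
df = spark.read.format("lance_sketch").option("path", "/tmp/demo.lance").load()
df.show()
```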

This session will dive into how the Lance format works and why it is a critical component for multimodal AI data pipelines. Allison Wang, Apache Spark™ committer, and Li Qiu, LanceDB Database Engineer and Alluxio PMC member, will walk through how combining Apache Spark (PySpark) and LanceDB lets you advance multi-modal AI data pipelines.

Streamlining DSPy Development: Track, Debug and Deploy with MLflow

Chen Qian, Senior Software Engineer at Databricks, will show how to integrate MLflow with DSPy to bring full observability to your DSPy development.

You’ll get to see how to track DSPy module calls, evaluations, and optimizers using MLflow’s tracing and autologging capabilities. Combining these two tools makes it easier to debug, iterate on, and understand your DSPy workflows, and then deploy your DSPy program end-to-end.
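If you want to try this before the session, here is a minimal sketch based on MLflow’s documented DSPy autologging. It assumes `pip install mlflow dspy` and an OpenAI API key in the environment; the model, experiment name, and question are placeholders.

```python
# A minimal sketch of MLflow's DSPy integration; placeholders noted below.
import dspy
import mlflow

# Turn on MLflow's DSPy autologging: module calls are captured as traces.
mlflow.dspy.autolog()
mlflow.set_experiment("dspy-observability-demo")  # placeholder experiment name

# Configure DSPy with a language model (placeholder model name).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

qa = dspy.ChainOfThought("question -> answer")
with mlflow.start_run():
    prediction = qa(question="What open table format does Delta Lake use?")
    print(prediction.answer)
# Inspect the captured traces in the MLflow UI (run `mlflow ui`).
```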

From Code Completion to Autonomous Software Engineering Agents

Kilian Lieret, Research Software Engineer at Princeton University, was recently a guest on the Data Brew videocast for a fascinating discussion on new tools for evaluating and improving AI in software engineering.

This session is an extension of that conversation, where Kilian will dig into SWE-bench (a benchmarking tool) and SWE-agent (an agent framework), the current frontier of agentic AI for developers, and how to experiment with AI agents.
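For a quick hands-on start (my illustration, not Kilian’s material), SWE-bench is published as a Hugging Face dataset; this sketch assumes `pip install datasets`:

```python
# A minimal sketch: load SWE-bench Lite from Hugging Face and inspect one
# task instance an agent would be asked to resolve.
from datasets import load_dataset

# SWE-bench Lite is the smaller split commonly used for quick experiments.
swebench = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")

example = swebench[0]
print(example["instance_id"])               # e.g. "astropy__astropy-12907"
print(example["repo"])                      # the GitHub repository of the issue
print(example["problem_statement"][:500])   # the issue text the agent must fix
```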

Composing high-accuracy AI systems with SLMs and mini-agents

The always-amazing Sharon Zhou, CEO and Founder of Lamini, discusses how to utilize small language models (SLMs) and mini-agents to reduce hallucinations using Mixture of Memory Experts (i.e., MoME knows best)!

Find out a little bit more about MoME in this fun Data Brew by Databricks episode featuring Sharon: Mixture of Memory Experts.

Beyond the Tradeoff: Differential Privacy in Tabular Data Synthesis

Differential privacy is a critical tool for providing mathematical guarantees around protecting the privacy of the individuals behind the data. This talk by Lipika Ramaswamy of Gretel.ai (now part of NVIDIA) explores the use of Gretel Navigator to generate differentially private synthetic data that maintains high fidelity to the source data and high utility on downstream tasks across heterogeneous datasets.
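As a quick refresher (the standard definition, not specific to this talk): a randomized mechanism M is ε-differentially private if, for any two datasets D and D′ that differ in a single individual’s record and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. In other words, no one person’s data can shift the mechanism’s output distribution by more than a factor of e^ε.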

Some good pre-reads on the topic:

Building Knowledge Agents to Automate Document Workflows

One of the biggest promises of LLM agents is automating all knowledge work over unstructured data; we call these “knowledge agents.” Jerry Liu, Founder of LlamaIndex, dives into how to create knowledge agents to automate document workflows, showcasing how what can often be complex to implement becomes a simplified flow for a fundamental enterprise process.
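For a flavor of what a document workflow looks like in code (my sketch, not Jerry’s demo), here is a minimal LlamaIndex example; it assumes `pip install llama-index`, an OpenAI API key in the environment, and a placeholder ./contracts folder of documents:

```python
# A minimal sketch of a document workflow with LlamaIndex: index a folder of
# documents, then ask a question over them. Paths and the query are placeholders.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./contracts").load_data()  # parse local files
index = VectorStoreIndex.from_documents(documents)            # embed and index them

query_engine = index.as_query_engine()
response = query_engine.query("Which agreements auto-renew, and when?")
print(response)
```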
