MinneBOS has ended
Presented by MinneAnalytics. Hosted by Boston University Questrom School of Business.
Thursday, August 23 • 10:30am - 11:00am
Analyzing Massive Genomics Datasets using Apache Spark

Driven by the steadily falling cost of sequencing a single human genome, "big data" sequencing studies (>10,000 samples) are becoming common in both industrial and research settings. To work with datasets of this size, we need to let bioinformaticians write genomic analysis queries that can be distributed across large compute clusters. Recently, several prominent libraries, including GATK4, ADAM, and Hail, have used Apache Spark to achieve this goal. Apache Spark is a "map-reduce"-like system that allows code written in Scala, Java, Python, R, or SQL to run in parallel across a cluster with hundreds to thousands of cores. In this talk, we will briefly explain what Apache Spark is and how it works. Then, we will look at a few genomic analyses where Apache Spark cuts latency from hours to minutes, which enables a human-in-the-loop data science workflow. As part of these analyses, we will also explore how Apache Spark can be used to integrate other data sources (clinical measurements, imaging) with genomics data, and we will extract best practices for architecting scientific analyses on Apache Spark.
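
As a rough illustration of the "map-reduce"-like pattern the abstract mentions, the sketch below counts variants per chromosome in plain Python. It does not use Apache Spark or any of the libraries named above: the records are made-up toy data, and the helper names (partition, map_partition, merge_counts, count_by_chromosome) are hypothetical, chosen only for this example. The idea is that each partition is summarized independently (the step a cluster could run in parallel) and the partial results are then merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy records: (sample_id, chromosome, position). Not real data.
VARIANTS = [
    ("s1", "chr1", 101),
    ("s1", "chr2", 202),
    ("s2", "chr1", 101),
    ("s2", "chr1", 303),
    ("s3", "chr2", 202),
    ("s3", "chrX", 404),
]


def partition(records, num_partitions):
    """Split records into chunks, standing in for how a cluster shards input."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_partition(records):
    """Map step: count variants per chromosome within a single partition."""
    return Counter(chrom for _sample, chrom, _pos in records)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def count_by_chromosome(records, num_partitions=3):
    """Run the map step on every partition, then reduce the partial counts."""
    partials = [map_partition(p) for p in partition(records, num_partitions)]
    return dict(reduce(merge_counts, partials, Counter()))


if __name__ == "__main__":
    print(count_by_chromosome(VARIANTS))
```

Because the map step touches only its own partition and the merge is associative, the result should not depend on how many partitions are used, which is the property that lets this style of computation spread across many cores.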

Frank Austin Nothaft, PhD

Core Committer, Big Data Genomics ADAM project; GTM Lead, Genomics, Databricks
Prior to joining Databricks, Frank was a lead developer on the Big Data Genomics/ADAM project at UC Berkeley, and worked at Broadcom Corporation on design automation techniques for industrial-scale wireless communication chips. Frank holds a PhD and Masters of Science in Computer...

Thursday August 23, 2018 10:30am - 11:00am EDT
Room 306, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston