There have never been more software companies focused on creating amazing technologies to help marketers achieve success. But although choice and change are good — marketers shine best at inflection points that open up new opportunities — too much choice and change is, well, challenging. In this candid talk, martech pioneer Scott Brinker will take you through the latest martech landscape and explain how businesses can harness it to their advantage.
Is there consolidation in the martech landscape – and if so, how will that impact you?
What are the best ways to manage the incredible rate of change in marketing technology?
How can you leverage your stack to build trust and scale the art of marketing?
I will work through examples of how different business goals can lead to very different machine learning models, even if the use cases look similar. I will use these examples to highlight principles of how data scientists and product managers can better communicate to increase the impact of data and machine learning on a product.
My MIT consortium (trust.mit.edu) has established and test-deployed methods of analyzing data that are dramatically better from the perspectives of user privacy, commercial secrets, and cybersecurity. The work is supported by the EU and Chinese governments and by major multinational companies. Test deployments are underway in several countries.
Professor, MIT’s Human Dynamics Laboratory and the MIT Media Lab Entrepreneurship Program, MIT Sloan
Alex 'Sandy' Pentland directs MIT’s Human Dynamics Laboratory and the MIT Media Lab Entrepreneurship Program, and co-leads the World Economic Forum Big Data and Personal Data initiatives.
Thursday August 23, 2018 9:30am - 10:15am EDT
Exec Ed: 426, 428, 430, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston
This talk will explore the competitive landscape among graph database solution providers, discuss key trends in the graph database industry, and present criteria for selecting graph database solutions. More than thirty graph databases are currently available, with a smaller subset focused on RDF graphs. The optimal solution often depends on the specific use case, but common characteristics such as openness, horizontal scalability, query language support, deployment options, and community support can help identify it.
In healthcare, a machine learning model can mean the difference between prescribing a lifesaving medical treatment or death. This session will cover high and low stakes models in production, how they change over time, how to monitor their performance, and what to do when they underperform.
In an era where the dream of precision medicine is poised to become a reality, the primary challenge is enabling researchers to manage and analyze the rapidly growing sea of data available to them. To meet this challenge, the Data Sciences Platform of the Broad Institute of MIT and Harvard has developed a suite of web-based services and applications. This suite, dubbed Workbench, provides a set of secure API-accessible capabilities including data management, batch and interactive data analysis, a repository for analytical tools, and management of users and billing.
We have data everywhere…in databases and data warehouses, in files and spreadsheets, in a data lake and on servers, in the cloud and on-premise. Finding the data that you need can be difficult in a modern data ecosystem, and cloud adoption can make this downright challenging. Data Virtualization is a technology that allows users to find and access the data that they need without having to replicate it into a consolidated central repository, such as a data lake. Data Virtualization provides a semantic layer on top of your existing data stores - whether on-premise or in the cloud - that allows users to find and access the data that they need, in real-time, as and how they need it. This session explains how Data Virtualization achieves this promise of data accessibility for everyone - self-service with guard rails - even with the largest of analytical data sets.
The Startup Showcase features a lineup of promising startups related to data and tech. Each startup gets six minutes to pitch to an audience of peers, investors and media, followed by a Q&A. More startups to be announced soon! If your startup would like to participate, contact graeme@minneanalytics.org.
This research-based presentation highlights the rising prominence of machine learning as an enterprise capability, juxtaposed with the surfacing challenges surrounding ML development itself.
Do you see lots of data but not the story buried in it? Do your dashboards, reports and infographics look like ransom notes? Then this is the presentation you need to attend!
Too often, the tables and graphs used to communicate health and healthcare data are poorly designed, at best failing to communicate, and at worst actively miscommunicating, the critical information used to measure performance, educate and inform patients, and identify the right opportunities for change and improvement in our health and healthcare systems.
In this interactive session you will learn the science behind how we see and understand information, and simple yet powerful ways to display and communicate information so that the opportunities are clear and people are moved to action.
Machine learning and AI models now outperform humans on tasks ranging from image recognition to language translation. However, sending video, audio, and other sensor data up to the cloud and back is too slow for apps like Snapchat, features like “Hey, Siri!”, and autonomous machines like self-driving cars. Artificial Intelligence has transitioned from a moonshot idea to accessible technology. Developers seeking to provide seamless user experiences must now move their models down to devices on the edge of the network where they can run faster, at lower cost, and with greater privacy. As a result, the way we design, build, and market technology products is playing catch-up. To deliver the best experiences we must create AI-first applications, in the same way that we transitioned to create mobile-first applications.
Dr. Jameson Toole is the CEO and cofounder of Fritz—building tools to help developers optimize, deploy, and manage machine learning models on mobile devices. He holds undergraduate degrees in Physics, Economics, and Applied Mathematics from the University of Michigan as well as...
Thursday August 23, 2018 10:30am - 11:00am EDT
Room 404, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston
In this talk, we will trace the development of Wayfair’s deep learning capability and how deep learning is driving strong results in projects across merchandising, personalization and marketing for the company. Wayfair has invested in building deep learning experience over multiple years and has a large team of data scientists, machine learning engineers, product managers and more dedicated to deploying these solutions at scale. We will discuss learnings from this growth process, highlight successes (and failures) and how deep learning has evolved to be a core aspect of the work the Wayfair team is doing.
Dan oversees Wayfair’s Data Science team, which works on projects stretching across the customer experience in Marketing, Merchandising & Storefront. Prior to Wayfair, Dan was a strategy consultant at the Boston Consulting Group in Chicago. Dan holds a B.A. from Columbia University...
Thursday August 23, 2018 10:30am - 11:00am EDT
Room 406, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston
Organizations of various sizes and across verticals face challenges with how their data evolves as they scale. One of the most significant challenges as they look to integrate data from additional systems is duplication of entities across systems. These duplicated entities may not have clear criteria to help engineers identify and combine them. However, as organizations are increasingly looking to aggregate data sources—both internal and external—addressing this challenge is critical to realizing value from the data. At medium- to large-scale data, combining the data sets can't be done without the aid of software. Traditional techniques for tackling this challenge involved rule-based systems to look at every pairwise combination and determine whether there was a match or not. More recently, machine learning approaches that block and score probable matches allow engineers to set an acceptable probability threshold, as well as validate and tune the algorithm to help it continuously improve.
In this presentation, we will present entity resolution in the context of a detailed example scenario using physician data matching. We will walk through a machine learning model we created by forking and building on an open source entity resolution library, dedupe.io. We will describe how the model initially performed, how we evaluated it, and how we improved it. Finally, we will walk through high-level workflows for performing entity resolution on your own data.
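The block-and-score approach described above can be sketched in a few lines of plain Python. This is a toy illustration, not dedupe.io's actual API: the records, blocking key, similarity function, and threshold are all invented for illustration, and the stdlib's difflib stands in for a learned scoring model.

```python
from difflib import SequenceMatcher

# Toy physician records from two source systems; no shared identifier exists,
# so entities must be matched on name and city.
records = [
    {"id": "a1", "name": "Jonathan Smith", "city": "Boston"},
    {"id": "a2", "name": "Maria Gonzalez", "city": "Cambridge"},
    {"id": "b1", "name": "Jon Smith", "city": "Boston"},
    {"id": "b2", "name": "M. Gonzales", "city": "Cambridge"},
]

def block_key(rec):
    # Blocking: only compare records sharing the name's first letter and city,
    # avoiding the full pairwise comparison of every record against every other.
    return (rec["name"][0].lower(), rec["city"].lower())

def score(r1, r2):
    # Score candidate pairs with a simple string-similarity ratio in [0, 1].
    return SequenceMatcher(None, r1["name"].lower(), r2["name"].lower()).ratio()

def resolve(records, threshold=0.6):
    # Group records into blocks, then score only within-block pairs and keep
    # those above the acceptable probability threshold.
    blocks = {}
    for rec in records:
        blocks.setdefault(block_key(rec), []).append(rec)
    matches = []
    for group in blocks.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                if score(group[i], group[j]) >= threshold:
                    matches.append((group[i]["id"], group[j]["id"]))
    return matches

print(resolve(records))  # the two cross-system duplicates are paired up
```

In a real pipeline the threshold would be validated against labeled pairs and tuned, as the abstract describes.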
Powered by the continuous decrease of the cost of sequencing a single human genome, "big data" sequencing studies (>10,000 samples) are becoming common in both industrial and research settings. To work with datasets at this size and scale, we need to allow bioinformaticians to write genomic analysis queries that can be distributed across large compute clusters. Recently, several prominent libraries like GATK4, ADAM, and Hail have used Apache Spark to achieve this goal. Apache Spark is a "map-reduce"-like system that allows code written in Scala, Java, Python, R, or SQL to be run in parallel across a cluster with hundreds to thousands of cores. In this talk, we will briefly explain what Apache Spark is and how it works. Then, we will look at a few genomic analyses where Apache Spark drops latency from hours to minutes, which enables a human-in-the-loop data science workflow. As part of these analyses, we will also explore how Apache Spark can be used to integrate other data sources (clinical measurements, imaging) with genomics data, and we will extract best practices for architecting scientific analyses on Apache Spark.
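As a rough illustration of the "map-reduce" model the abstract refers to, here is a single-machine analogue in plain Python. In actual PySpark the same per-chromosome counting would be expressed with `rdd.map` and `rdd.reduceByKey` over millions of rows read from VCF or ADAM files; the toy variant records below are invented for illustration.

```python
from collections import Counter

# Toy variant calls: (chromosome, position, genotype). In a real Spark job
# these would be an RDD/DataFrame distributed across the cluster.
variants = [
    ("chr1", 10177, "0/1"),
    ("chr1", 10352, "1/1"),
    ("chr2", 45895, "0/1"),
    ("chr1", 16567, "0/1"),
]

# "map" step: emit (key, 1) pairs keyed by chromosome -- the analogue of
# rdd.map(lambda v: (v[0], 1)) in PySpark.
pairs = map(lambda v: (v[0], 1), variants)

# "reduce" step: sum counts per key -- the analogue of
# rdd.reduceByKey(lambda a, b: a + b), which Spark runs in parallel per key.
counts = Counter()
for key, n in pairs:
    counts[key] += n

print(dict(counts))  # {'chr1': 3, 'chr2': 1}
```

Spark's win is that both steps shard transparently across hundreds of cores, which is what collapses hours of genomic analysis into minutes.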
Core Committer, Big Data Genomics ADAM project; GTM Lead, Genomics, Databricks
Prior to joining Databricks, Frank was a lead developer on the Big Data Genomics/ADAM project at UC Berkeley, and worked at Broadcom Corporation on design automation techniques for industrial-scale wireless communication chips. Frank holds a PhD and Master of Science in Computer...
Thursday August 23, 2018 10:30am - 11:00am EDT
Room 306, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston
AI is forcing companies to think differently about how they make decisions and run their operations. One of the most challenging parts of becoming an AI-driven enterprise is deploying predictive models to production, and moving to production is becoming the primary roadblock to getting value out of AI. There are several reasons for this; for example, existing systems often lack a robust and straightforward process for supporting predictions, so companies frequently must develop new applications and pathways to support new workflows. This can be costly and time-consuming. In this talk, you will learn how to easily operationalize and move different versions of your AI models to production.
AI & Machine Learning Cloud Developer Advocate, Microsoft
Francesca Lazzeri, PhD is AI & Machine Learning Cloud Developer Advocate at Microsoft. Francesca has 8 years of experience as a data scientist and data-driven business strategy expert; she is passionate about innovations in big data technologies and the applications of machine learning-based...
Thursday August 23, 2018 10:30am - 11:00am EDT
Exec Ed: 426, 428, 430, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston
The revolution in baseball analytics didn't just start with Moneyball. This talk will review how data and technology have changed the game of baseball and how they have impacted winning decisions both on and off the field. PITCHf/x and Statcast technology and data will be discussed.
Andy Andres teaches Biology and Mathematics at Boston University (mostly introductory biology, physics, and human genetics), but also developed and teaches the highly successful MOOC “Sabermetrics 101: An Introduction to Baseball Analytics” to about 50,000 registered learners...
Thursday August 23, 2018 11:15am - 12:00pm EDT
Room 408, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston
Data Scientists work on almost every aspect of a business nowadays. This is true both for technical innovations that are unthinkable without the use of machine learning algorithms and AI solutions, and for industries where “business as usual” existed for decades, even centuries, and can now be dramatically optimized using Data Science.
Some of these applications relate to areas and industries that are not the first to come to mind when thinking about Data Science problems. This presentation will go over some examples of Data Science and Machine Learning being applied to the HR space, where a company's employees are the data being analyzed. It’s common knowledge that employees are a company's most valuable asset, which suggests that talent analytics may ultimately deliver the highest return on data science investments.
This presentation won’t dive deep into the various techniques and algorithms that can be used to solve applied Data Science problems in the HR space, but will mention some of the possible paths scientists might consider taking in their work.
Lily graduated last November from Boston University with a Ph.D. in Statistics. Her thesis was mainly focused on Time Series analysis and the way serial dependency can be incorporated for statistical inference. Upon graduation, she joined Amazon’s Alexa team as a Data Scientist...
Thursday August 23, 2018 11:15am - 12:00pm EDT
Room 324, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston
DevOps is a fairly well-understood domain. Analytic Ops, on the other hand, is an emerging area that attempts to address the challenges of bringing process, people, and technology together to create the foundation of an organization’s analytic framework. This session will discuss how crucial it is for an organization to have a well-defined analytics process to retain its competitive edge, and how to go about setting one up. It will also include a short demo of Think Big’s Analytic Ops Accelerator, which provides an end-to-end, flexible framework for the orchestration, deployment, and management of analytic models at scale.
What if I told you I had evidence of a serious threat to American national security – a terrorist attack in which a jumbo jet will be hijacked and crashed every 12 days? Thousands will continue to die unless we act now. This is the question before us today – but the threat doesn’t come from terrorists. It comes from climate change and air pollution.
We have developed an artificial neural network model that uses on-the-ground air-monitoring data and satellite-based measurements to estimate daily pollution levels across the continental U.S., breaking the country up into 1-square-kilometer zones. We have paired that information with health data contained in Medicare claims records from the last 12 years, covering 97% of the population ages 65 or older. We have developed statistical methods and computationally efficient algorithms for the analysis of over 460 million health records.
Our research shows that short- and long-term exposure to air pollution is killing thousands of senior citizens each year. This data science platform tells us that federal limits on the nation’s most widespread air pollutants are not stringent enough.
This type of data is the sign of a new era for the role of data science in public health, and also for the associated methodological challenges. For example, with enormous amounts of data, the threat of unmeasured confounding bias is amplified, and causality is even harder to assess with observational studies. These and other challenges will be discussed.
Unstructured free text is plentiful and valuable, for example: Doctor's notes, news stories, call-center notes. We would like to take advantage of such text for warnings of serious medical conditions, classifying reports, and identifying customers likely to leave for a competitor.
Text can be hard to use for predictive modeling, but in some respects it is also easier to use than structured numeric data. This talk will give a high-level overview of opportunities and challenges in using text in predictive models, and survey technical approaches for representing and modeling text.
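One of the simplest representations such a survey typically starts from is the bag-of-words term-count vector. Here is a minimal stdlib-only sketch; the example notes and the tokenizer are illustrative assumptions, and production pipelines would usually reach for something like scikit-learn's CountVectorizer or a TF-IDF weighting instead.

```python
import re
from collections import Counter

# Two invented doctor's-note-style snippets of free text.
notes = [
    "pt reports chest pain, SOB",
    "no chest pain today, pt stable",
]

def tokenize(text):
    # Lowercase and keep alphabetic runs; real pipelines would also handle
    # negation, abbreviations, and misspellings.
    return re.findall(r"[a-z]+", text.lower())

# Build a shared vocabulary, then represent each note as a term-count vector.
vocab = sorted({tok for note in notes for tok in tokenize(note)})

def vectorize(text):
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]

vectors = [vectorize(n) for n in notes]
print(vocab)
print(vectors)
```

The resulting fixed-length numeric vectors can feed any standard classifier, which is what makes free text usable for predictive modeling at all; the challenges the talk covers (word order, negation, sparsity) are exactly what this naive representation throws away.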
We are in a mobile online world, which creates a massive wealth of data for all kinds of purposes. Virgin Pulse, a Branson-founded wellbeing software provider, engages millions of its members daily on ways to get and keep them healthy, happy and productive. And it aims to make the experience fun and personalized, and to be seen as a trusted personal coach to each user, using gamification, machine learning and advanced models to create the best outcomes for all. At the same time, balancing critical data security and privacy with supporting the optimization of the daily experience for members is a key to optimizing the macro needs of the employers that invest in these programs. In this session, you’ll see both strategic perspectives and practical examples of how we leverage previously untapped “big data” that creates value and impact for both individuals and the companies they work for.
Time series data is ubiquitous: weekly initial unemployment claims, the daily term structure of interest rates, tick-level stock prices, weekly company sales, daily foot traffic recorded by mobile devices, and numerous Google search terms, just to name a few.
Some of the most important and commonly used data science techniques in time series forecasting are those developed in the field of machine learning and statistics. Data scientists and quants should have at least a few basic time series statistical and machine learning modeling techniques in their toolkit.
Given the resurgence of neural network-based techniques in recent years, it is important for data science practitioners to understand how to apply these techniques and the tradeoffs between neural network-based and traditional statistical methods. This lecture discusses two specific techniques: Vector Autoregressive (VAR) models and Recurrent Neural Networks (RNNs). The former is one of the most important classes of multivariate time series statistical models applied in finance, while the latter is a neural network architecture well suited to time series forecasting. I will demonstrate how they are implemented in practice and compare their advantages and disadvantages. Real-world applications are used to illustrate these techniques.
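To make the VAR side concrete, here is a minimal VAR(1) fit by ordinary least squares on simulated data, sketched with NumPy. The coefficient matrix and noise level are invented for the demonstration, and a real analysis would use a dedicated implementation such as statsmodels' VAR class, which also handles lag selection and inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a bivariate VAR(1): y_t = A @ y_{t-1} + noise.
A_true = np.array([[0.6, 0.2],
                   [0.1, 0.5]])
T = 2000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + 0.1 * rng.standard_normal(2)

# Fit by ordinary least squares: regress y_t on y_{t-1}.
# Stacking rows gives Y = X @ A^T, so lstsq recovers A transposed.
X, Y = y[:-1], y[1:]
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = B.T

print(np.round(A_hat, 2))  # close to A_true for a long, stationary sample
```

With enough data the least-squares estimate lands very near the true coefficients; the RNN alternative trades this closed-form interpretability for the ability to capture nonlinear dynamics.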
Jeffrey is the Chief Data Scientist at AllianceBernstein, a global investment firm managing over $500 billion. In his current role, Jeffrey leads the centralized data science team, partnering with investment professionals to create investment signals using data science, and...
Thursday August 23, 2018 11:15am - 12:00pm EDT
Exec Ed: 426, 428, 430, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston
Many data science projects have failed due to a mismatch between the business opportunity and the staff's skills and expectations. In this session you'll learn how to assess whether it's time to hire a data scientist, or whether you should pursue another path such as hiring business analysts or building a blended team with consultants instead. You'll also learn techniques for evaluating candidates who have more technical knowledge than you, and I'll explain why cutting-edge machine learning techniques like deep neural networks are usually the wrong tool for most new initiatives. Students and job seekers will learn how to identify a company and project life-cycle stage that aligns with their expertise and interests.
Many AI initiatives fail. Not because IT picked the wrong technology or hired the wrong AI whiz kid; instead, failure is often a function of simply being unable to train the AI with the right data and content. The good news is that enabling your enterprise data for AI is not a mysterious process, and many of the assets needed by AI-driven apps are also the ones that make employees more productive. So it is a win-win.
In this talk, Seth makes the case to executives for enterprise information architecture – a foundational exercise that has the power to make or break your AI dreams.
Many important care quality measures, especially those that relate to communication, are not properly operationalized in the clinical context because they exist as unstructured free text in notes and other places. Our work aims to develop techniques to extract documentation of some of these important conversations for seriously ill patients. Our deep neural network attained over 90% precision and recall for the documentation of care preferences.
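The precision and recall figures cited above come straight from confusion counts; a small self-contained sketch, with invented toy labels (1 meaning a note documents a care-preference conversation):

```python
def precision_recall(y_true, y_pred):
    # Precision: of the notes we flagged, how many truly document the
    # conversation? Recall: of the true documentations, how many did we find?
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy gold labels and model predictions for six notes.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(precision_recall(y_true, y_pred))  # (0.75, 0.75)
```

Reporting both matters here: a model that flags every note gets perfect recall with poor precision, and vice versa.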
The advent of deep learning has turned feature engineering into a lost art. However, there are still many cases where clever features can outperform deep learning models, especially in terms of efficiency, predictability, and training time.
Robert is a Machine Learning Engineer and Data Scientist. He contracts with data-driven companies both in Boston and around the world. He previously worked on Natural Language Processing at Google and has degrees in Math, CS, and Linguistics from Columbia University.
Thursday August 23, 2018 1:00pm - 1:45pm EDT
Room 406, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston
A look at the state of hospital & physician ranking systems in the US healthcare marketplace, with comparison and contrast to the opportunity assessment approach taken by Medtronic IHS.
No-Magic-Wands: Benefits and Dangers of Automated Machine Learning
Automated machine learning promised data scientists a better, faster way to build models, but so far the reality has not matched the hype. Most automated machine learning solutions are black boxes that restrict data scientists' ability to understand how the models work. Putting models like these into production is reckless and sometimes even dangerous. Is this the end of the citizen data scientist, then? Not necessarily. But we need a new approach to data science, machine learning, and artificial intelligence. Automated machine learning needs to guide analysts, not overrule their decisions. Novel approaches need to focus on productivity first and on democratization second. Most importantly, they need to deliver reliable models that do not put organizations or people at risk.
Join Dr. Ingo Mierswa, RapidMiner Founder, for an in-depth discussion on automated machine learning. We will explore:
What automated machine learning can do to help accelerate building machine learning models
What the dangers are and what could go wrong
How the different groups of data scientists can use automated machine learning
How to do it right and get the most benefit out of automated machine learning
The increasing number of Electronic Health Record (EHR) clinical free-text documents has driven the need for novel clinical Natural Language Processing (NLP) solutions aimed at optimizing patient outcomes. Deep Learning (DL) techniques have demonstrated superior performance over other Machine Learning (ML) approaches for various NLP tasks in recent years. This talk will present a brief overview of various DL-driven clinical NLP algorithms developed in the Artificial Intelligence lab at Philips Research - such as diagnostic inferencing from unstructured clinical narratives, clinical paraphrase generation, and medical image caption generation.
Senior Scientist, Artificial Intelligence Lab, Philips Research North America
Sadid Hasan is a Senior Scientist at the Artificial Intelligence Lab in Philips Research, Cambridge, MA. His recent work involves solving problems related to clinical question answering, paraphrase generation, and medical image caption generation using Deep Learning. Before joining...
Thursday August 23, 2018 2:00pm - 2:45pm EDT
Auditorium, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston
There has been tremendous progress in the compute infrastructure that forms the backbone of emerging machine learning applications for digital transformation in our society. Nascent storage-class memory with low latency, high endurance, and low power, coupled with memory-centric computer architecture, will provide the next boost in computing power to accelerate digital transformation, with many new applications emerging in edge computing.
Extracting actionable insights from unstructured data is a major challenge facing Data Science. In addition, without unbiased and high-quality labeled data, one cannot train AI models to perform new tasks with high accuracy and fairness. We introduce the first automated data labeling (ADL) technology that fuels AI by identifying core concepts (topics) from raw text data across several languages. The alternative is to use human-based data labeling via crowdsourcing that takes months vs. minutes using ADL. We discuss the major use cases of data labeling and ontology discovery technology for automated large-scale data cleaning; semantic search; sentiment and emotion analysis; automated feature engineering; predictive text analytics; and conversational AI with application to healthcare, finance (banking & investment management), and insurance. Our proprietary ADL technology relies on Unsupervised Learning plus recent advances in Deep Learning and Natural Language Processing (NLP). Several informative examples with data visualization will be presented.
When dealing with any analytics problem, the first challenge is identifying and collecting data. When dealing with analytics problems with individual data, it is equally important to identify the data you don't need. Privacy by Design requires that you not only preserve the privacy of individuals in the data but also purposefully design your systems around principles that protect the privacy of the individual.
Ride-sharing platforms employ surge pricing to match anticipated capacity spillover with demand. We develop an optimization model to characterize the relationship between surge price and spillover, and test the predicted relationships using a spatial panel model on a dataset from Uber's operations. Results reveal that Uber's pricing accounts for both capacity and price spillover. There is a debate in the management community on the efficacy of labor welfare mechanisms associated with shared capacity. We conduct counterfactual analysis to provide guidance regarding this debate, for managing congestion while accounting for consumer and labor welfare on this online platform.
Essential administrative functions (aka back-office functions) for health care are very well suited for assistance and automation with Deep Learning. These important functions are often too complex for rules, but with lots of well-labeled data they are a good fit for Deep Learning. The OptumLabs Center for Applied Data Science will share how they design and train Deep Learning Neural Networks for these use cases.
The recent hype about automated machine learning has focused on the modeling part of the pipeline, while the critical step of feature engineering is left out of the discussion. Feature engineering, the process of extracting predictor variables from a dataset, largely determines the success or failure of a machine learning project. That is why automating feature engineering has long been a dream for machine learning practitioners. In this talk, we will walk through several real-world use cases using Featuretools, an open source library for automated feature engineering developed and maintained by Boston-based startup Feature Labs. We will show how automated feature engineering can reduce development time by 10x, deliver better predictive performance, create meaningful features with real-world insights, and prevent data leakage.
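To illustrate the kind of aggregation features that deep feature synthesis stacks automatically, here is a hand-rolled analogue in plain Python. The transaction table and feature names are invented for illustration; actual Featuretools usage would define entities and call `ft.dfs`, which composes these primitives across related tables for you.

```python
from statistics import mean

# Toy transaction log: a "child" table related to a customers "parent" table
# through customer_id.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 5.0},
]

def aggregate_features(transactions):
    # For each customer, derive COUNT / MEAN / MAX aggregates -- the same kind
    # of primitives deep feature synthesis applies and stacks automatically.
    by_customer = {}
    for t in transactions:
        by_customer.setdefault(t["customer_id"], []).append(t["amount"])
    return {
        cid: {"count": len(a), "mean": mean(a), "max": max(a)}
        for cid, a in by_customer.items()
    }

print(aggregate_features(transactions))
```

The point of automating this is scale: across many tables and primitives, the combinatorics of useful aggregates quickly exceed what anyone hand-writes, and a framework can also enforce time cutoffs to prevent the data leakage the talk mentions.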
While the medical domain has a wealth of data, much of it is unstructured and poorly formatted. Doctors’ and nurses’ notes are notoriously noisy and often contain incomplete sentences, run-ons, and invented acronyms. This lack of structure in medical text confounds traditional NLP methods that rely on sentence structure to infer entities and relationships. However, this problem is not unique to the medical domain: anyone trying to analyze web searches, social media posts, or any poorly formatted text faces similar challenges. The core issue is that many NLP methods depend on well-formatted text for analysis and break down when dealing with poorly formatted text. In this talk, John and Krishna will present a number of methods that can be used to wrangle messy text for various machine learning and NLP use cases, within and outside of the medical domain.
Krishna Srihasam is a senior data scientist at Wolters Kluwer Health. He has been applying ML and AI techniques to health and patient data for more than 3 years. He holds a Ph.D. in Computational Neuroscience and has published several articles on applying ML techniques to neuroscience...
Thursday August 23, 2018 3:00pm - 3:30pm EDT
Exec Ed: 426, 428, 430, Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston
We all are interested in finding new and disruptive technologies that can change the future and better our lives. In this era of big data overflow, how can we more efficiently and precisely analyze the big data to get truly meaningful insights backed by AI? I will present a new way of viewing and understanding 100,000 documents at a time to predict the future of companies and industries, find new business opportunities, and uncover white spaces by using unsupervised machine learning and innovative visualization algorithms.
Data Scientists like to tackle problems, and sometimes the results can be beautiful through visualization. Join Liza Duffy to explore some works of “art” and get inspired to create!
This presentation will focus on showing how supervised and unsupervised learning methods can work with claims data and complement each other. A supervised method will look at CKD (chronic kidney disease) patients at risk of developing ESRD (end-stage renal disease), and an unsupervised approach will look at classification of patients who tend to develop this disease faster than others.
This session sets the stage for the investigation of a classifier, hoping to extract an explanation of why it classified a specific point the way it did. Marc Light will review a handful of methods for such an interrogation and compare and contrast them.