Follow my blog for updates.
Highly effective data science teams: A list of seven habits
July 21, 2022.
Challenges in data transfer: Case 1—Evaluating MLOps platforms
March 5, 2022.
Citations and coverage. Data Engineering Weekly #80 (Ananth Packkildurai), What I learned from attending Tecton’s apply(meetup) 2022 (James Le).
Batter plate discipline: It doesn’t pay to swing at every pitch
September 9, 2021.
Models for integrating data science teams within organizations: A comparative analysis
July 31, 2019.
Citations and coverage. University of Virginia Data Science, 97 Ways (Matt Wright), Projects to Know (Amplify Partners — Sarah Catanzaro), The Data Science Roundup (Fishtown Analytics — Tristan Handy), Normcore Tech (Vicki Boykis), Femstreet (Sarah Nöckel), Linear Digressions, Analytical IQ (Adam Lorton), Hex Blog (Hex — Barry McCardel), Full Stack Deep Learning, The ML Times, nibble dispatch, Hiring Data Scientists and Machine Learning Engineers: A Practical Guide (a book by Roy Keyes), Blog Cast (Sam Bail), dbt Blog (Erin Vaughan & Janessa Lantz), Building The Modern Data Team (Pedram Navid), On Search Leadership (Daniel Tunkelang), Modern Data Teams Hub (Amplify Partners — Emilie Schario).
Data science team sizing and allocation: An algorithm
July 29, 2019.
Management best practices: A list of 20 things
December 18, 2018.
Q&A with Steven Sinofsky at Twitter HQ: Developing cross-functional teams
November 16, 2018.
Abstract. After introducing General Folders, I'll go over three impactful data projects. The first is designing OKRs to encourage collaboration among product teams at Twitter. The second is building the feature creation pipeline for fast updates to the fraud detection engine at Paytm. The last is sales enablement at Carbon Health by quantifying risk and presenting helpful data to prospects.
Abstract. Not so long ago, I met with over 30 MLOps companies to learn about their workflows at the very first step of the evaluation process: data collection and transfer. I had a hunch this part of the pipeline posed challenges. In this talk, I review the many roadblocks that companies evaluating MLOps platforms face in providing access to their data for evaluation purposes. Then I discuss potential solutions.
Abstract. The first part of the talk is an overview of the Data Science team roadmap and infrastructure decisions, with a tour of the clinical decision support system and covidclinicaldata.org. The second part is a review of our efforts for the COVID-Ready program. We report on recommendations that can be made to employers, based on simulations surfacing how testing cadence and other policies affect outbreaks in the workplace.
Abstract. In this talk, we review concepts from the audio signal processing field. We then show how familiarity with these concepts allows for a better understanding of DJing tools and techniques, and vice versa.
Abstract. Traditional approaches to managing technical projects can be at odds with achieving success with machine learning. In this session, we discuss how ML and AI executives can build effective teams, support them with the right processes and tools, and shift the broader organizational culture in ways that reinforce innovation in machine learning.
Abstract. In this meetup, we hear about data science projects that succeeded in spite of the limitations of existing methodology.
Abstract. Hear people who have worked at startups and large corporations across a range of industries share tips for working faster and more efficiently, and for creating an org-wide culture that supports effective ML.
Abstract. Meet women in data science from all over the Bay Area at this WiDS post-conference screening. The event will be an opportunity to meet like-minded women as well as listen to the great lineup of panelists.
Abstract. Thanks to Grant, the episode has turned into a good review of my work history.
Abstract. Online social networks are ubiquitous graphs. To test algorithms that scale with the size and order of these networks, we require synthetic samples. In this talk, we go over several methods for generating random graphs representative of online social networks. We are especially interested in the M-GEOP model (Bonato et al., 2014), and in assessing the fit of these models to the Facebook dataset.
Abstract. Master's thesis defense slides.
Abstract. We propose sparse approximation weighted regression (SPARROW), a nonparametric method of regression that takes advantage of the sparse linear approximation of a query point. SPARROW employs weights based on sparse approximation in the context of locally constant, locally linear, and locally quadratic regression to generate better estimates than, e.g., k-nearest neighbor regression and, more generally, kernel-weighted local polynomial regression. Our experimental results show that SPARROW performs competitively.
Abstract. Sparse coding is achieved by solving an under-determined system of linear equations under sparsity constraints. We briefly look at several algorithms that solve the resulting optimization problem (exactly or approximately). We then see how this optimization principle can be applied in both a supervised and unsupervised context: multiclass classification and feature learning, respectively. Next, we talk about dictionary learning and some of its well-known instances. Applications of dictionary learning include image denoising and inpainting.
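For readers unfamiliar with the setup, the optimization problems mentioned in this abstract are commonly written as follows. This is only a sketch in assumed notation (the symbols $x$, $D$, $\alpha$, $X$, $A$, and $\lambda$ are not taken from the talk): $x \in \mathbb{R}^n$ is a signal, $D \in \mathbb{R}^{n \times K}$ with $K > n$ is an overcomplete dictionary, and $\alpha \in \mathbb{R}^K$ is the sparse code.

```latex
% Exact sparse coding: the sparsest solution of an under-determined system
\min_{\alpha} \; \|\alpha\|_0 \quad \text{subject to} \quad x = D\alpha

% A common convex relaxation (basis pursuit denoising / lasso form)
\min_{\alpha} \; \tfrac{1}{2}\,\|x - D\alpha\|_2^2 + \lambda\,\|\alpha\|_1

% Dictionary learning: D is learned jointly with the codes A = [\alpha_1, \dots, \alpha_N]
\min_{D,\,A} \; \tfrac{1}{2}\,\|X - DA\|_F^2 + \lambda \sum_{i=1}^{N} \|\alpha_i\|_1
\quad \text{subject to} \quad \|d_k\|_2 \le 1, \;\; k = 1, \dots, K
```

Greedy pursuit methods target the first problem approximately, while the $\ell_1$ relaxation trades exactness for convexity; which algorithms the talk covers is not stated in the abstract.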
Abstract. An image can be represented at different levels, starting from pixels, going on to edges, to parts, to objects, and beyond. Over the years, many attempts have been made at engineering useful descriptors that are able to extract low-to-high level features from images. But what if we could make this process automatic? What if we could "learn" to detect layer after layer of features of increasing abstraction and complexity? After all, it would be impossible for us to foresee and hard-code all the kinds of invariances necessary to build features for our ever more complicated tasks. In this talk, we go over several unsupervised feature learning methods that have been in the making since 2006.
Details. This is a brief tutorial on learning theory for a machine learning class.
Details. This is a brief tutorial on applying the EM algorithm for estimating the parameters of a Gaussian mixture model.
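As a minimal sketch of what such a tutorial typically covers (not the tutorial's own code), the following fits a one-dimensional Gaussian mixture with plain EM. The function name `em_gmm_1d`, the initialization scheme, and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=100, seed=0):
    """Fit a k-component 1-D Gaussian mixture to x with plain EM (illustrative)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Assumed initialization: means at random data points,
    # shared variance, uniform mixing weights.
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i),
        # computed in log space for numerical stability.
        log_p = (np.log(pi)
                 - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum-likelihood updates.
        nk = r.sum(axis=0) + 1e-12          # guard against empty components
        pi = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return pi, mu, var
```

Calling `em_gmm_1d(x, k=2)` on a one-dimensional NumPy array returns the mixing weights, means, and variances. A fixed number of iterations keeps the sketch short; in practice one would usually stop when the log-likelihood stops improving.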
Abstract. Apart from raising computational costs, high-dimensional data behave in counterintuitive ways. In this seminar, we talk about why, in some situations, more features fail to result in increased accuracy in clustering and classification tasks. To deal with the "curses of dimensionality", many dimensionality reduction (DR) methods have been proposed. These methods map the data points to a lower-dimensional space, while preserving the important properties of the data in its original space. We go over one linear and two nonlinear DR methods. Then, through some examples, we see how the prior assumptions and computational complexity of each method affect its application in reducing the dimensionality of certain datasets.
Details. This is an overview of the split Bregman method for solving an $\ell_1$-regularized problem arising from TV denoising.
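For orientation, the split Bregman iteration for TV denoising is usually stated along the following lines. This is a sketch in assumed notation, following the standard anisotropic formulation rather than the overview itself: $f$ is the noisy image, $u$ the estimate, $\nabla$ a discrete gradient, $d$ the auxiliary variable standing in for $\nabla u$, $b$ the Bregman variable, and $\mu, \lambda > 0$ are parameters.

```latex
% TV denoising as an \ell_1-regularized problem
\min_{u} \; \|\nabla u\|_1 + \tfrac{\mu}{2}\,\|u - f\|_2^2

% Splitting: introduce d = \nabla u and iterate
u^{k+1} = \arg\min_{u} \; \tfrac{\mu}{2}\,\|u - f\|_2^2
          + \tfrac{\lambda}{2}\,\|d^{k} - \nabla u - b^{k}\|_2^2
\quad\Longleftrightarrow\quad
(\mu I + \lambda \nabla^{\top}\nabla)\,u^{k+1} = \mu f + \lambda \nabla^{\top}(d^{k} - b^{k})

d^{k+1} = \operatorname{shrink}\!\left(\nabla u^{k+1} + b^{k},\; 1/\lambda\right),
\qquad
\operatorname{shrink}(z, \gamma) = \operatorname{sign}(z)\,\max(|z| - \gamma,\, 0)

b^{k+1} = b^{k} + \nabla u^{k+1} - d^{k+1}
```

The appeal of the splitting is that the $u$-update is a linear solve and the $d$-update is an elementwise soft threshold, so the non-smooth $\ell_1$ term never has to be differentiated.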
On automatic music genre recognition by sparse representation classification using auditory temporal modulations
with Bob L. Sturm
Computer Music Modeling and Retrieval: Lecture Notes in Computer Science (LNCS). Springer, 2012.