Startup advice: Some lessons I learned while building General Folders
August 12, 2024.
The state of data exchange: A survey of data transfer and sharing methodology
April 3, 2023.
Public speaking best practices: Towards more engaging conference talks
February 22, 2023.
Interfaces and bundle boundaries: Categorizing design decisions in enterprise product development
August 31, 2022.
Highly effective data science teams: A list of seven habits
July 21, 2022.
Challenges in data sharing and transfer: Case I. Evaluating AI platforms
March 5, 2022.
Citations and coverage. Data Engineering Weekly #80 (Ananth Packkildurai), What I learned from attending Tecton’s apply(meetup) 2022 (James Le).
Batter plate discipline: It doesn’t pay to swing at every pitch
September 9, 2021.
(Data ∩ Water) Terms: All at sea, exploring the data dictionary
July 19, 2021.
Citations and coverage. Building Recommendation Systems in Python and JAX, Chapter 5.
Models for integrating data science teams within organizations: A comparative analysis
July 31, 2019.
Citations and coverage. University of Virginia Data Science, 97 Ways (Matt Wright), Beyond the POC: How to Make Machine Learning Real in the Enterprise (Sam Charrington), Projects to Know (Amplify Partners — Sarah Catanzaro), The Data Science Roundup (Fishtown Analytics — Tristan Handy), Normcore Tech (Vicki Boykis), Femstreet (Sarah Nöckel), Linear Digressions, Analytical IQ (Adam Lorton), Hex Blog (Hex — Barry McCardel), Full Stack Deep Learning, The ML Times, nibble dispatch, Hiring Data Scientists and Machine Learning Engineers: A Practical Guide (a book by Roy Keyes), Blog Cast (Sam Bail), dbt Blog (Erin Vaughan & Janessa Lantz), Building The Modern Data Team (Pedram Navid), Data Science Org Design for Startups (Nirant Kasliwal), On Search Leadership (Daniel Tunkelang), Building A Data Platform From Scratch At Collectors (Sam Bail), Modern Data Teams Hub (Amplify Partners — Emilie Schario).
Data science team sizing and allocation: An algorithm
July 29, 2019.
SF Engineering Leadership Community Summit 2019: Some notes from the gathering
January 28, 2019.
Management best practices: A list of 20 things
December 18, 2018.
Q&A with Steven Sinofsky at Twitter HQ: Developing cross-functional teams
November 16, 2018.
Abstract. This presentation explores the concept of data collaboration and its use cases in marketing. We'll review how marketers leverage data collaboration to drive decision-making, enhance customer experiences, and achieve business outcomes. We'll also discuss current practices, challenges, and potential solutions for streamlining data sharing and enabling seamless collaboration across organizations.
Abstract. Businesses collaborate through data — every contract includes a data sharing or transfer clause. However, data collaboration tools have a long way to go to serve modern enterprise needs. In this talk, we will discuss some of the macro trends and practices impacting products in the data collaboration space. Some of these topics remain open and evolving debates.
Abstract. Join us at Snapdragon Stadium for the first ever Techstars San Diego powered by San Diego State University Demo Day. Meet the incredible cohort of companies as they showcase their progress.
Abstract. Data exchange is integral to business collaboration. However, data exchange pipelines are time consuming to build, prone to leaks, difficult to monitor, and costly to audit. In this talk, we present an overview of the methods companies use to exchange data. We then discuss solutions that better match the efficiency and security standards of today.
Abstract. Data exchange is integral to every business relationship. Yet data exchange practices are highly manual, prone to leaks, difficult to validate, impossible to monitor, and costly to audit. In this talk, we present an overview of the methods enterprises use to exchange data and the outstanding challenges. We conclude by enumerating the properties of a good solution.
Abstract. After introducing General Folders, we'll review three impactful data projects. First, the design of OKRs to encourage collaboration among product teams at Twitter; second, the feature creation pipeline for fraud detection at Paytm; and finally, sales enablement at Carbon Health via risk quantification.
Abstract. Not so long ago, I met with over 30 AI companies to learn about their workflows at the very first step of the evaluation process: data collection and transfer. I had a hunch this part of the pipeline posed challenges. In this talk, I review the myriad roadblocks companies face in providing access to their data. Then I discuss potential solutions.
Abstract. The first part of the talk is an overview of the Data Science team roadmap and infrastructure decisions, with a tour of the clinical decision support system and covidclinicaldata.org. The second part is a review of our efforts for the COVID-Ready program. We report on recommendations that can be made to employers, based on simulations surfacing how testing cadence and other policies affect outbreaks in the workplace.
Abstract. In this talk, we review concepts from the audio signal processing field. We then show how familiarity with these concepts allows for a better understanding of DJing tools and techniques, and vice versa.
Abstract. Traditional approaches to managing technical projects can be at odds with achieving success with machine learning. In this session, we discuss how ML and AI executives can build effective teams, support them with the right processes and tools, and shift the broader organizational culture in ways that reinforce innovation in machine learning.
Abstract. In this meetup, we hear about data science projects that succeeded in spite of the limitations of existing methodology.
Abstract. Hear people who have worked at both startups and large corporations, across a range of industries, share tips for working faster and more efficiently and for creating an org-wide culture that supports effective ML.
Abstract. Meet women in data science from all over the Bay Area at this WiDS post-conference screening. The event will be an opportunity to meet like-minded women and to hear from a great lineup of panelists.
Abstract. Join us for an engaging conversation with Pardis Noorzad, Founder and CEO of General Folders. Learn how she is revolutionizing B2B data collaboration and transforming the way businesses handle data logistics.
Abstract. The conversation covers the importance of data collaboration and sharing, the challenges and complexities of data sharing across industries, the need for efficient and secure solutions, and the underlying definitions and dimensions of the data exchange problem, including infrastructure, security, economics, user needs, and more!
Abstract. Thanks to Grant, the episode has turned into a good review of my work history.
Abstract. Online social networks are ubiquitous graphs. To test algorithms that scale with the size and order of these networks, we require synthetic samples. In this talk, we go over several methods for generating random graphs representative of online social networks. We are especially interested in the M-GEOP model (Bonato et al., 2014), and in assessing the fit of these models to the Facebook dataset.
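The M-GEOP model itself is not reproduced here, but as a minimal illustration of generating a synthetic graph with geometric structure (an assumption for illustration, not the model from the talk), a networkx sketch might look like this:

```python
# Illustrative sketch only: a plain random geometric graph via networkx,
# not the M-GEOP model discussed in the talk.
import networkx as nx

n, radius = 1000, 0.05  # number of nodes and connection radius (arbitrary values)
G = nx.random_geometric_graph(n, radius, dim=2, seed=42)

# Basic summary statistics one might compare against a real social graph sample.
print(G.number_of_nodes(), G.number_of_edges())
degrees = [d for _, d in G.degree()]
print(sum(degrees) / len(degrees))  # average degree
```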
Abstract. Master's thesis defense slides.
Abstract. We propose sparse approximation weighted regression (SPARROW), a nonparametric method of regression that takes advantage of the sparse linear approximation of a query point. SPARROW employs weights based on sparse approximation in the context of locally constant, locally linear, and locally quadratic regression to generate better estimates than, for example, k-nearest neighbor regression and, more generally, kernel-weighted local polynomial regression. Our experimental results show that SPARROW performs competitively.
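As a rough sketch of the locally constant variant, assuming the weights are taken as the magnitudes of the sparse coefficients obtained by approximating the query with the training points via orthogonal matching pursuit (the paper's exact formulation may differ in details):

```python
# Minimal locally constant sketch in the spirit of SPARROW (details are assumptions,
# not the paper's exact formulation): weight each training target by the magnitude
# of its coefficient in a sparse approximation of the query point.
import numpy as np
from sklearn.linear_model import orthogonal_mp

def sparrow_predict(X_train, y_train, x_query, n_nonzero_coefs=5):
    # Columns of the dictionary are the training points, normalized to unit norm
    # as OMP assumes; the query point is the signal to approximate.
    D = X_train.T                                    # shape: (n_features, n_train)
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    coefs = orthogonal_mp(D, x_query, n_nonzero_coefs=n_nonzero_coefs)
    w = np.abs(coefs)
    if w.sum() == 0:
        return float(y_train.mean())                 # fall back when no atoms are selected
    return float(np.dot(w, y_train) / w.sum())       # locally constant (weighted average) estimate

# Toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
x_new = rng.normal(size=5)
print(sparrow_predict(X, y, x_new, n_nonzero_coefs=5))
```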
Abstract. Sparse coding is achieved by solving an under-determined system of linear equations under sparsity constraints. We briefly look at several algorithms that solve the resulting optimization problem (exactly or approximately). We then see how this optimization principle can be applied in both a supervised and unsupervised context: multiclass classification and feature learning, respectively. Next, we talk about dictionary learning and some of its well-known instances. Applications of dictionary learning include image denoising and inpainting.
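A minimal sparse coding example, using scikit-learn's SparseCoder over a randomly generated dictionary (purely illustrative, not tied to any specific algorithm covered in the talk):

```python
# Small sparse-coding example (illustrative only): encode signals over a fixed,
# randomly generated dictionary using orthogonal matching pursuit.
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)
n_components, n_features = 64, 16

# Dictionary rows are atoms; normalize each atom to unit norm.
D = rng.normal(size=(n_components, n_features))
D /= np.linalg.norm(D, axis=1, keepdims=True)

coder = SparseCoder(dictionary=D, transform_algorithm="omp", transform_n_nonzero_coefs=3)
X = rng.normal(size=(5, n_features))          # five signals to encode
codes = coder.transform(X)                    # shape: (5, n_components), at most 3 nonzeros per row

print((codes != 0).sum(axis=1))               # number of selected atoms per signal
print(np.linalg.norm(X - codes @ D, axis=1))  # reconstruction error per signal
```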
Abstract. An image can be represented at different levels, starting from pixels, going on to edges, to parts, to objects, and beyond. Over the years, many attempts have been made at engineering useful descriptors that are able to extract low-to-high level features from images. But what if we could make this process automatic? What if we could "learn" to detect layer after layer of features of increasing abstraction and complexity? After all, it would be impossible for us to foresee and hard-code all the kinds of invariances necessary to build features for our ever more complicated tasks. In this talk, we go over several unsupervised feature learning methods that have been in the making since 2006.
Details. This is a brief tutorial on learning theory for a machine learning class.
Details. This is a brief tutorial on applying the EM algorithm for estimating the parameters of a Gaussian mixture model.
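A compact numpy sketch of EM for a one-dimensional, two-component Gaussian mixture, included only as an illustration of the tutorial's topic:

```python
# Compact EM for a one-dimensional Gaussian mixture (illustrative sketch).
import numpy as np

def em_gmm_1d(x, k=2, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    pi = np.full(k, 1.0 / k)                      # mixing weights
    mu = rng.choice(x, size=k, replace=False)     # initialize means at random data points
    var = np.full(k, x.var())                     # shared initial variance
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances from the responsibilities
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Toy usage: a mixture of two well-separated Gaussians.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])
print(em_gmm_1d(data))
```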
Abstract. Apart from raising computational costs, high-dimensional data behave in counterintuitive ways. In this seminar, we talk about why in some situations, more features fail to result in increased accuracy in clustering and classification tasks. To deal with the "curses of dimensionality", many dimensionality reduction (DR) methods have been proposed. These methods map the data points to a lower-dimensional space, while preserving the important properties of the data in its original space. We go over one linear and two nonlinear DR methods. Then, through some examples, we see how the prior assumptions and computational complexities of each method affect its application in reducing the dimensionality of certain datasets.
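Since the seminar's specific methods are not listed here, the following sketch simply pairs one linear method (PCA) with one nonlinear method (Isomap) from scikit-learn as stand-ins:

```python
# Illustrative only: one linear (PCA) and one nonlinear (Isomap) reduction of the
# digits dataset to two dimensions; the seminar's exact methods may differ.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

X, _ = load_digits(return_X_y=True)  # 64-dimensional digit images

X_pca = PCA(n_components=2).fit_transform(X)
X_iso = Isomap(n_components=2, n_neighbors=10).fit_transform(X)

print(X.shape, X_pca.shape, X_iso.shape)
```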
Details. This is an overview of the split Bregman method for solving an $\ell_1$-regularized problem arising from TV denoising.
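For reference, one common formulation of split Bregman for anisotropic TV denoising (notation is assumed and may differ from the slides):

```latex
% Original problem:
%   \min_u \, \|\nabla u\|_1 + \tfrac{\mu}{2}\, \|u - f\|_2^2
% Introduce d \approx \nabla u and alternate:
\begin{align*}
(u^{k+1}, d^{k+1}) &= \arg\min_{u,\, d} \; \|d\|_1
  + \frac{\mu}{2}\, \|u - f\|_2^2
  + \frac{\lambda}{2}\, \|d - \nabla u - b^{k}\|_2^2, \\
b^{k+1} &= b^{k} + \nabla u^{k+1} - d^{k+1},
\end{align*}
% where the d-subproblem has the closed-form shrinkage solution
\begin{align*}
d^{k+1} = \operatorname{shrink}\!\left(\nabla u^{k+1} + b^{k},\, \tfrac{1}{\lambda}\right),
\qquad
\operatorname{shrink}(z, \gamma) = \frac{z}{|z|}\,\max(|z| - \gamma,\, 0).
\end{align*}
```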
Efficient classification based on sparse regression
MSc Thesis, Amirkabir University of Technology, July 2012.
📔 thesis, 📕 translation, 📽 slides
Regression with sparse approximations of data
with Bob L. Sturm
European Signal Processing Conference (EUSIPCO), 2012.
📃 paper, 📰 poster
On automatic music genre recognition by sparse representation classification using auditory temporal modulations
with Bob L. Sturm
Computer Music Modeling and Retrieval: Lecture Notes in Computer Science (LNCS). Springer, 2012.
📃 paper