Pardis Noorzad ☊ Pubs

Essays

Follow my blog or subscribe via RSS or email for updates.

The state of data exchange: A survey of data transfer and sharing methodology
April 3, 2023.
Public speaking best practices: Towards more engaging conference talks
February 22, 2023.
Interfaces and bundle boundaries: Categorizing design decisions in enterprise product development
August 31, 2022.
Highly effective data science teams: A list of seven habits
July 21, 2022.
Challenges in data sharing and transfer: Case I. Evaluating AI platforms
March 5, 2022.
Citations and coverage. Data Engineering Weekly #80 (Ananth Packkildurai), What I learned from attending Tecton’s apply(meetup) 2022 (James Le).
Batter plate discipline: It doesn’t pay to swing at every pitch
September 9, 2021.
(Data ∩ Water) Terms: All at sea, exploring the data dictionary
July 19, 2021.
Citations and coverage. Building Recommendation Systems in Python and JAX, Chapter 5.
Models for integrating data science teams within organizations: A comparative analysis
July 31, 2019.
Citations and coverage. University of Virginia Data Science, 97 Ways (Matt Wright), Beyond the POC: How to Make Machine Learning Real in the Enterprise (Sam Charrington), Projects to Know (Amplify Partners — Sarah Catanzaro), The Data Science Roundup (Fishtown Analytics — Tristan Handy), Normcore Tech (Vicki Boykis), Femstreet (Sarah Nöckel), Linear Digressions, Analytical IQ (Adam Lorton), Hex Blog (Hex — Barry McCardel), Full Stack Deep Learning, The ML Times, nibble dispatch, Hiring Data Scientists and Machine Learning Engineers: A Practical Guide (a book by Roy Keyes), Blog Cast (Sam Bail), dbt Blog (Erin Vaughan & Janessa Lantz), Building The Modern Data Team (Pedram Navid), Data Science Org Design for Startups (Nirant Kasliwal), On Search Leadership (Daniel Tunkelang), Building A Data Platform From Scratch At Collectors (Sam Bail), Modern Data Teams Hub (Amplify Partners — Emilie Schario).
Data science team sizing and allocation: An algorithm
July 29, 2019.
SF Engineering Leadership Community Summit 2019: Some notes from the gathering
January 28, 2019.
Management best practices: A list of 20 things
December 18, 2018.
Q&A with Steven Sinofsky at Twitter HQ: Developing cross-functional teams
November 16, 2018.

Talks

A new era in B2B data collaboration
MDS Fest 2.0, April 10, 2024.
📹 video
Abstract. Businesses collaborate through data — every contract includes a data sharing or transfer clause. However, data collaboration tools have a long way to go to serve modern enterprise needs. In this talk, we will discuss some of the macro trends and practices impacting products in the data collaboration space. Some of these topics remain open and evolving debates.
General Folders: The first AI-powered data logistics company
Demo Day: Techstars San Diego Powered by SDSU, December 7, 2023.
📰 coverage 1, 2
Abstract. Join us at Snapdragon Stadium for the first ever Techstars San Diego powered by San Diego State University Demo Day. Meet the incredible cohort of companies as they showcase their progress.
Cross-company data exchange for the cloud
Scale By the Bay: Code and Data in the Age of AI, November 15, 2023.
📹 video, 📰 coverage 1, 2
Abstract. Data exchange is integral to business collaboration. However, data exchange pipelines are time consuming to build, prone to leaks, difficult to monitor, and costly to audit. In this talk, we present an overview of the methods companies use to exchange data. We then discuss solutions that better match the efficiency and security standards of today.
Rethinking B2B data exchange and collaboration
Crunch Conference Budapest, October 6, 2023.
📹 video, 📰 coverage
Abstract. Data exchange is integral to business collaboration. However, data exchange pipelines are time consuming to build, prone to leaks, difficult to monitor, and costly to audit. In this talk, we present an overview of the methods companies use to exchange data. We then discuss solutions that better match the efficiency and security standards of today.
The state of cross-company data exchange
Data Council Austin, March 30, 2023.
📹 video, 📽 slides, 📃 blog post
Abstract. Data exchange is integral to every business relationship. Yet data exchange practices are highly manual, prone to leaks, difficult to validate, impossible to monitor, and costly to audit. In this talk, we present an overview of the methods enterprises use to exchange data and the outstanding challenges. We conclude by enumerating the properties of a good solution.
Making an impact with data
with Gorkem Yurtseven and Britt Allen, moderated by Elizabeth Dlha
Data Mash #2, June 2, 2022.
📽 slides
Abstract. After introducing General Folders, we'll review three impactful data projects. First, the design of OKRs to encourage collaboration among product teams at Twitter; second, the feature creation pipeline for fraud detection at Paytm; and finally, sales enablement at Carbon Health via risk quantification.
Data transfer challenges in evaluating AI platforms
apply(meetup), February 10, 2022.
📹 video, 📃 blog post
Abstract. Not so long ago, I met with over 30 AI companies to learn of their workflows at the very first step in the evaluation process — that of data collection and transfer. I had a hunch this part of the pipeline posed challenges. In this talk, I review the myriad roadblocks faced by companies in providing access to their data. Then I discuss potential solutions.
Data Science for tech-enabled healthcare
with Rebekkah Ismakov
The AI Summit, October 1, 2020.
📹 video, 📃 blog post, 📊 data, 🎙 discussion
Abstract. The first part of the talk is an overview of the Data Science team roadmap and infrastructure decisions, with a tour of the clinical decision support system and covidclinicaldata.org. The second part is a review of our efforts for the COVID-Ready program. We report on recommendations that can be made to employers, based on simulations surfacing how testing cadence and other policies affect outbreaks in the workplace.
DJing and the art of audio signal processing
Twitter HQ, September 6, 2017.
Abstract. In this talk, we review concepts from the audio signal processing field. We then show how familiarity with these concepts allows for a better understanding of DJing tools and techniques, and vice versa.

Panels

Building teams and culture that support ML innovation
with Ziad Asghar and Ameen Kazerouni, moderated by Sam Charrington
TWIMLcon, January 22, 2021.
📹 video
Abstract. Traditional approaches to managing technical projects can be at odds with achieving success with machine learning. In this session, we discuss how ML and AI executives can build effective teams, support them with the right processes and tools, and shift the broader organizational culture in ways that reinforce innovation in machine learning.
Making an impact in data science: when traditional methods fail
with Eric Glover, Halim Abbas, Kevin Stumpf, and Sean McPherson
Branch HQ, February 27, 2020.
📹 video
Abstract. In this meetup, we hear about data science projects that succeeded in spite of the limitations of existing methodology.
Culture & organization for effective ML at scale
with Eric Colson and Jennifer Prendki, moderated by Maribel Lopez
TWIMLcon, September 27, 2019.

Abstract. Hear people who have worked at startups and large corporations across a range of industries share tips for working faster and more efficiently, and for creating an org-wide culture that supports effective ML.
Women in Data Science meetup: Growing a career in data science
with Laura Pruitt, Alexandra Johnson, and Kasia Rachuta, moderated by Chloe Tseng
Airbnb HQ, March 8, 2018.

Abstract. Meet women in data science from all over the Bay Area at this WiDS post-conference screening. The event will be an opportunity to meet like-minded women as well as listen to the great lineup of panelists.

Podcasts

Pardis Noorzad of General Folders: Transforming B2B Data Collaboration
with Jake Villarreal
Born in Silicon Valley by Match Relevant, September 6, 2023.
🎙 podcast episode
Abstract. Join us for an engaging conversation with Pardis Noorzad, Founder and CEO of General Folders. Learn how she is revolutionizing B2B data collaboration and transforming the way businesses handle data logistics.
Making Cross-Company Data Exchange Easy
with Kostas Pardalis and Eric Dodds
The Data Stack Show, September 6, 2023.
🎙 podcast episode
Abstract. The conversation includes the importance of data collaboration and sharing, the challenges and complexities of data sharing in various industries, the need for efficient and secure solutions, and the underlying definitions and dimensions of the data exchange problem—including infrastructure, security, economics, user needs, and more!
Head of Data Science at Healthcare Tech #93
with Grant Ingersoll
Develomentor, June 29, 2020.
🎙 podcast episode
Abstract. Thanks to Grant, the episode has turned into a good review of my work history.

Academic talks

Modeling the Facebook social network: The memoryless GEO-P graph model
SOGMSC, May 21, 2014.
📽 slides
Abstract. Online social networks are ubiquitous graphs. To test algorithms that scale with the size and order of these networks, we require synthetic samples. In this talk, we go over several methods for generating random graphs representative of online social networks. We are especially interested in the M-GEOP model (Bonato et al., 2014), and in assessing the fit of these models to the Facebook dataset.
Efficient classification based on sparse regression
AUT, July 17, 2012.
📽 slides
Abstract. Master's thesis defense slides.
SPARROW: SPARse appROximation Weighted regression
UdeM, March 12, 2012 and SUT, February 22, 2012.
📽 slides
Abstract. We propose sparse approximation weighted regression (SPARROW), a nonparametric method of regression that takes advantage of the sparse linear approximation of a query point. SPARROW employs weights based on sparse approximation in the context of locally constant, locally linear, and locally quadratic regression to generate better estimates than, e.g., k-nearest neighbor regression and, more generally, kernel-weighted local polynomial regression. Our experimental results show that SPARROW performs competitively.
Sparse coding and dictionary learning
SUT, October 5, 2011.
📽 slides
Abstract. Sparse coding is achieved by solving an under-determined system of linear equations under sparsity constraints. We briefly look at several algorithms that solve the resulting optimization problem (exactly or approximately). We then see how this optimization principle can be applied in both a supervised and unsupervised context: multiclass classification and feature learning, respectively. Next, we talk about dictionary learning and some of its well-known instances. Applications of dictionary learning include image denoising and inpainting.
Feature learning with deep networks for image classification
SUT, May 18, 2011.
📽 slides
Abstract. An image can be represented at different levels, starting from pixels, going on to edges, to parts, to objects, and beyond. Over the years, many attempts have been made at engineering useful descriptors that are able to extract low-to-high level features from images. But what if we could make this process automatic? What if we could "learn" to detect layer after layer of features of increasing abstraction and complexity? After all, it would be impossible for us to foresee and hard-code all the kinds of invariances necessary to build features for our ever more complicated tasks. In this talk, we go over several unsupervised feature learning methods that have been in the making since 2006.
Computational learning theory
AUT, April 26, 2011.
📽 slides
Details. This is a brief tutorial on learning theory for a machine learning class.
Parametric density estimation using GMMs
AUT, February 1, 2011.
📽 slides
Details. This is a brief tutorial on applying the EM algorithm for estimating the parameters of a Gaussian mixture model.
High dimensional data and dimensionality reduction
IPM, November 4, 2010.
📽 slides
Abstract. Apart from raising computational costs, high-dimensional data behave in counterintuitive ways. In this seminar, we talk about why, in some situations, more features fail to result in increased accuracy in clustering and classification tasks. To deal with the "curses of dimensionality", many dimensionality reduction (DR) methods have been proposed. These methods map the data points to a lower-dimensional space, while preserving the important properties of the data in its original space. We go over one linear and two nonlinear DR methods. Then, through some examples, we see how the prior assumptions and computational complexity of each method affect its application in reducing the dimensionality of certain datasets.
The split Bregman method for total variation denoising
AUT, May 30, 2010.
📽 slides
Details. This is an overview of the split Bregman method for solving an $\ell_1$-regularized problem arising from TV denoising.

Publications

Efficient classification based on sparse regression
MSc Thesis, Amirkabir University of Technology, July 2012.
📔 thesis, 📕 translation, 📽 slides
Regression with sparse approximations of data
with Bob L. Sturm
European Signal Processing Conference (EUSIPCO), 2012.
📃 paper, 📰 poster
On automatic music genre recognition by sparse representation classification using auditory temporal modulations
with Bob L. Sturm
Computer Music Modeling and Retrieval: Lecture Notes in Computer Science (LNCS). Springer, 2012.
📃 paper
