I am a product data scientist at Twitter, the service that enables users to inform and stay informed about what's happening.

Formerly, I was a data scientist at Paytm Labs, where I built data products for Paytm, a mobile recharge and payments platform. Before the Labs, I was a data scientist and early engineer at Rubikloud Technologies, a retail analytics company.

Prior to Rubikloud, I was a graduate student in Applied Mathematics at Ryerson University. At Ryerson, I worked with Prof. Anthony Bonato on methods for assessing how well random graph models fit online social networks. Before Ryerson, I studied Artificial Intelligence at Amirkabir University of Technology and defended my thesis in July 2012. At Amirkabir, I was a member of the Image Processing and Pattern Recognition Lab led by Prof. Mohammad Rahmati. From 2011 to 2012, I had the great opportunity of working with Prof. Bob L. Sturm, then at Aalborg University Copenhagen, on a number of machine learning projects in feature learning and audio classification. I studied Software Engineering at the University of Tehran and graduated in June 2009. [CV]


  • Efficient classification based on sparse regression [thesis, translation, slides]
    MSc Thesis, Department of CEIT, Amirkabir University of Technology, July 2012.
  • Regression with sparse approximations of data [paper, code, poster]
    with B. L. Sturm
    European Signal Processing Conference (EUSIPCO), 2012.
  • On automatic music genre recognition by sparse representation classification using auditory temporal modulations [paper, data+code, discussion, recognition]
    with B. L. Sturm
    Computer Music Modeling and Retrieval, Lecture Notes in Computer Science series. Springer, 2012.


  • Efficient classification based on sparse regression
    AUT, July 17, 2012. [slides]

    Details. Slides from my Master's thesis defense.

  • SPARROW: SPARse appROximation Weighted regression
    UdeM, March 12, 2012 and SUT, February 22, 2012. [slides]

    Abstract. We propose sparse approximation weighted regression (SPARROW), a nonparametric method of regression that takes advantage of the sparse linear approximation of a query point. SPARROW employs weights based on sparse approximation in the context of locally constant, locally linear, and locally quadratic regression to generate better estimates than, for example, k-nearest neighbor regression and, more generally, kernel-weighted local polynomial regression. Our experimental results show that SPARROW performs competitively.
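
    Sketch. As a hedged illustration of the idea (not the paper's exact algorithm), here is a minimal locally constant variant in Python: the query point is sparsely approximated over the training inputs, and the absolute coefficients serve as regression weights. The use of scikit-learn's Lasso, the alpha value, and the normalization are illustrative assumptions.

        import numpy as np
        from sklearn.linear_model import Lasso

        def sparrow_constant(X_train, y_train, x_query, alpha=0.01):
            """Estimate y at x_query as a weighted average of training targets,
            with weights from the sparse approximation of x_query over the
            (normalized) training inputs."""
            D = X_train.T / np.linalg.norm(X_train, axis=1)   # columns = training points
            lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
            lasso.fit(D, x_query)
            w = np.abs(lasso.coef_)            # sparse coefficients -> nonnegative weights
            if w.sum() == 0:                   # fall back to a plain average
                w = np.ones_like(w)
            return np.dot(w, y_train) / w.sum()

        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(200, 2))
        y = np.sin(X[:, 0]) + np.cos(X[:, 1]) + 0.1 * rng.normal(size=200)
        print(sparrow_constant(X, y, np.array([1.0, -0.5])))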

  • Sparse coding and dictionary learning
    SUT, October 5, 2011. [slides]

    Abstract. Sparse coding is achieved by solving an under-determined system of linear equations under sparsity constraints. We briefly look at several algorithms that solve the resulting optimization problem (exactly or approximately). We then see how this optimization principle can be applied in both supervised and unsupervised settings: multiclass classification and feature learning, respectively. Next, we talk about dictionary learning and some of its well-known instances. Applications of dictionary learning include image denoising and inpainting.
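
    Sketch. A minimal illustration of the under-determined-system-with-sparsity setup, using the lasso formulation solved with ISTA (one of several possible solvers; the dictionary size, signal, and lambda below are illustrative):

        import numpy as np

        def soft_threshold(v, t):
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        def ista(D, y, lam=0.1, n_iter=500):
            """Minimize 0.5*||y - D x||^2 + lam*||x||_1 by iterative shrinkage."""
            L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
            x = np.zeros(D.shape[1])
            for _ in range(n_iter):
                grad = D.T @ (D @ x - y)           # gradient of the smooth term
                x = soft_threshold(x - grad / L, lam / L)
            return x

        rng = np.random.default_rng(0)
        D = rng.normal(size=(20, 100))             # under-determined: 20 equations, 100 unknowns
        D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
        x_true = np.zeros(100)
        x_true[[3, 40, 77]] = [1.0, -2.0, 0.5]
        y = D @ x_true
        print(np.flatnonzero(np.abs(ista(D, y)) > 0.1))   # approximate support recovery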

  • Feature learning with deep networks for image classification
    SUT, May 18, 2011. [slides]

    Abstract. An image can be represented at different levels, starting from pixels, going on to edges, to parts, to objects, and beyond. Over the years, many attempts have been made at engineering useful descriptors that are able to extract low-to-high level features from images. But what if we could make this process automatic? What if we could "learn" to detect layer after layer of features of increasing abstraction and complexity? After all, it would be impossible for us to foresee and hard-code all the kinds of invariances necessary to build features for our ever more complicated tasks. In this talk, we go over several unsupervised feature learning methods that have been in the making since 2006.

  • Computational learning theory
    AUT, April 26, 2011. [slides]

    Details. This is a brief tutorial on learning theory for a machine learning class.

  • Parametric density estimation using GMMs
    AUT, February 1, 2011. [slides]

    Details. This is a brief tutorial on applying the EM algorithm to estimate the parameters of a Gaussian mixture model.
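
    Sketch. A compact EM loop for a one-dimensional, two-component Gaussian mixture (the initialization and toy data are illustrative; no convergence check is included):

        import numpy as np
        from scipy.stats import norm

        def em_gmm_1d(x, k=2, n_iter=100):
            rng = np.random.default_rng(0)
            pi = np.full(k, 1.0 / k)                    # mixing weights
            mu = rng.choice(x, size=k, replace=False)   # initial means
            sigma = np.full(k, x.std())                 # initial standard deviations
            for _ in range(n_iter):
                # E-step: responsibilities r[n, j] = P(component j | x_n)
                r = np.stack([pi[j] * norm.pdf(x, mu[j], sigma[j]) for j in range(k)], axis=1)
                r /= r.sum(axis=1, keepdims=True)
                # M-step: re-estimate weights, means, and variances
                nk = r.sum(axis=0)
                pi = nk / len(x)
                mu = (r * x[:, None]).sum(axis=0) / nk
                sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
            return pi, mu, sigma

        rng = np.random.default_rng(1)
        x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)])
        print(em_gmm_1d(x))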

  • High dimensional data and dimensionality reduction
    IPM, November 4, 2010. [slides]

    Abstract. Apart from raising computational costs, high-dimensional data behave in counterintuitive ways. In this seminar, we talk about why, in some situations, more features fail to result in increased accuracy in clustering and classification tasks. To deal with the "curses of dimensionality", many dimensionality reduction (DR) methods have been proposed. These methods map the data points to a lower-dimensional space, while preserving the important properties of the data in its original space. We go over one linear and two nonlinear DR methods. Then, through some examples, we see how the prior assumptions and computational complexities of each method affect its application in reducing the dimensionality of certain datasets.
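
    Sketch. The specific DR methods are not named here, so as a stand-in, a minimal PCA via the SVD (using PCA as the representative linear method is my assumption):

        import numpy as np

        def pca(X, n_components=2):
            """Project the rows of X onto the top principal directions."""
            Xc = X - X.mean(axis=0)                         # center the data
            U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
            return Xc @ Vt[:n_components].T                 # scores in the reduced space

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 50))                      # 500 points in 50 dimensions
        print(pca(X).shape)                                 # (500, 2)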

  • The split Bregman method for total variation denoising
    AUT, May 30, 2010. [slides]

    Details. This is an overview of the split Bregman method for solving an $\ell_1$-regularized problem arising from TV denoising.
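
    Sketch. For reference, the standard Goldstein and Osher form of the method for anisotropic TV denoising (the notation below is mine, not necessarily the slides'): the problem

        $$\min_u \; \|\nabla_x u\|_1 + \|\nabla_y u\|_1 + \frac{\mu}{2}\|u - f\|_2^2$$

    is split by introducing $d_x \approx \nabla_x u$, $d_y \approx \nabla_y u$ with Bregman variables $b_x, b_y$, and then alternating

        $$(\mu I - \lambda \Delta)\, u^{k+1} = \mu f + \lambda \nabla_x^{T}(d_x^{k} - b_x^{k}) + \lambda \nabla_y^{T}(d_y^{k} - b_y^{k}),$$
        $$d^{k+1} = \operatorname{shrink}\!\big(\nabla u^{k+1} + b^{k}, \tfrac{1}{\lambda}\big), \qquad \operatorname{shrink}(v, \gamma) = \operatorname{sign}(v)\,\max(|v| - \gamma, 0),$$
        $$b^{k+1} = b^{k} + \nabla u^{k+1} - d^{k+1}.$$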