Applied Scientist at Amazon.com (July 2022 - Present)
Staff Data Scientist at Betterworks (March 2021 - May 2022)
Computational Social Scientist at Humanyze (Late 2017 - March 2021)
Earned PhD in Computational Social Science from GMU (Late 2020)
Graduate Research Assistant in Center for Social Complexity (2016 - 2017)
Graduate Research Assistant in Machine Learning and Inference Lab (2012 - 2016)

toz <at> gmu <dot> edu
tozCSS

Bio. Talha earned his PhD in Computational Social Science from George Mason University and has worked as a data scientist at two People Analytics companies. During his graduate studies, he worked in the Center for Social Complexity and the Machine Learning and Inference Lab, and published papers in venues such as SBP-BRIMS (the International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation).
At Humanyze, an MIT Media Lab spinoff, Talha designed new systems, algorithms, and metrics to reveal how work gets done in teams and organizations, using face-to-face and digital communication metadata collected from wearable sensors and IoT devices as well as from workplace communication and collaboration tools such as calendars, Skype, and email. The theories and methods on which his algorithms relied came primarily from organizational behavior and social network analysis.
Talha joined Betterworks as its first data scientist and developed several machine learning models for the product. His competency prediction model takes a piece of feedback text and identifies the competencies discussed in it as well as their contexts (whether each competency is a strength of, or an opportunity for, the feedback receiver). In another project, Talha developed a solution to infer the topics and sentiments of free-text survey responses.

At Amazon. I have completed several projects at Amazon:

  • Improved the main algorithm used in email marketing (now productionized), leading to a $31M+/yr increase in profits as measured by experiment results
  • Calculated the cost of opting out of emails (and push notifications) using Double ML and propensity score matching, and integrated it into our marketing program valuation model (Spark Scala, AWS Step Functions); see the sketch after this list
  • Built a neural-network-based push notification tap propensity model (soon to be deployed) (PyTorch, SageMaker)
  • Mentored an Applied Scientist intern to build an email click propensity model (soon to be deployed)
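
The opt-out cost bullet above relies on double/debiased ML. Below is a minimal sketch of that idea only, not the production pipeline (which ran in Spark Scala on AWS Step Functions): nuisance models for the outcome and the treatment are cross-fitted, and the effect is estimated from a residual-on-residual regression. All column names, models, and data here are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

def dml_opt_out_effect(df: pd.DataFrame,
                       outcome: str = "downstream_value",
                       treatment: str = "opted_out",
                       covariates=("tenure", "past_purchases", "emails_received")):
    """Cross-fitted, partially linear double ML: returns theta, the estimated effect
    of opting out on the outcome after flexibly controlling for the covariates."""
    X = df[list(covariates)].to_numpy()
    y = df[outcome].to_numpy(dtype=float)
    d = df[treatment].to_numpy(dtype=float)

    y_res = np.empty_like(y)
    d_res = np.empty_like(d)

    # Cross-fitting: nuisance models predict out-of-fold to avoid overfitting bias.
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        m_y = GradientBoostingRegressor().fit(X[train], y[train])    # E[Y | X]
        m_d = GradientBoostingClassifier().fit(X[train], d[train])   # E[D | X] (propensity)
        y_res[test] = y[test] - m_y.predict(X[test])
        d_res[test] = d[test] - m_d.predict_proba(X[test])[:, 1]

    # Final stage: residual-on-residual regression gives the treatment effect.
    return float(np.dot(d_res, y_res) / np.dot(d_res, d_res))
```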
At Betterworks. I have worked on four projects at Betterworks (in chronological order):
  • "Can we identify good employees with our data. If so, how?" was the CPO's question that started this project. I conceptualized and developed a PoC for the insights module based on Google's Project Aristotle. I showed how Psychological Safety (the number one determinant of team effectiveness), and the other four factors can be modeled, modeled them, and shared the results.
  • "How can we show the impact of our product on employee performance with our data?" The second project was about empirically demonstrating the product's benefits to support customers' ROI cases. I formed five hypotheses (e.g., employees who use our product properly and often make better goal progress) and tested them on our data. Some of the tests were simple statistical tests and correlations, but I also applied causal inference techniques such as difference-in-differences (a minimal sketch follows this list).
  • "Given a feedback text, can you create a model that identifies the competencies of the feedback receiver and their contexts (strength/opportunity)?" I built a model that identifies the competencies in a feedback text with 89% precision and recall.
  • "Given a free-text survey response, find its topic and sentiment." My topic model (40-class classifier) had about 86% precision and 99% recall. And I came up with a 3-class sentiment model solution that beat the old solution by 14%.

At Humanyze. I worked as the computational social scientist (data scientist) and led the research area at Humanyze (a startup born out of Sandy Pentland's Human Dynamics group at the MIT Media Lab). The main problem I worked on was how to model indicators of employee engagement, team productivity, and organizational adaptability using the metadata (no content) of workplace technologies such as calendar, email, and Slack, plus wearable sensors where available. The theories and methods on which my algorithms relied came primarily from organizational behavior and social network analysis. My responsibilities included writing papers, collaborating with researchers, and supervising interns.
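
To make the approach concrete, here is an illustrative sketch (not Humanyze's actual, proprietary pipeline) of turning communication metadata, sender, recipient, and timestamp only, into simple social-network indicators; the file and column names are made up.

```python
import pandas as pd
import networkx as nx

# Hypothetical metadata export: sender, recipient, timestamp -- no message content.
meta = pd.read_csv("email_metadata.csv")

# Weight each directed edge by the number of messages exchanged.
edges = meta.groupby(["sender", "recipient"]).size().reset_index(name="weight")
G = nx.DiGraph()
G.add_weighted_edges_from(edges.itertuples(index=False, name=None))

# Simple indicators that team- and organization-level metrics could build on.
volume = dict(G.degree(weight="weight"))   # communication volume per person
bridging = nx.betweenness_centrality(G)    # who connects otherwise-separate groups
reciprocity = nx.reciprocity(G)            # share of two-way communication

print(sorted(bridging.items(), key=lambda kv: -kv[1])[:10], reciprocity)
```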

At George Mason University. While studying towards my PhD in Computational Social Science with Andrew Crooks, I was a graduate research assistant in the Center for Social Complexity (for one year) and the Machine Learning and Inference Lab (for five years). My work there includes:

  • My PhD Defense Presentation, Collective Stress in the Digital Age (2020): Lost productivity due to stress costs an estimated US$1 trillion per year (WHO, 2019). To help address this problem, in my dissertation I proposed and showcased a strategy that leverages communication (meta)data and computational techniques to identify the behaviors that lead to and result from collective stress.
  • Computational Social Science of Disasters: Opportunities and Challenges (2019): We extensively reviewed the roles of the subfields of the social sciences, crisis informatics, and computational social science in disaster research (260 citations!). By discussing opportunities and challenges, the paper invites (i) CSS researchers to work on disasters and (ii) traditional disaster researchers to pay more attention to CSS.
  • Attribution of Blame and Responsibility - #FlintWaterCrisis (2018): Tested the generalizability of a set of theories from the sociology of disasters using online communication data. We formed and tested hypotheses on (i) who gets blamed, (ii) the role of partisan predisposition, (iii) the geographies concerned, and (iv) the contagion of complaining. An earlier version of this work was presented at the Social Web for Disaster Management (SWDM'16) workshop.
  • Generation of Realistic Mega-Cities (2017): As part of the project What Happens If a Nuclear Bomb Goes Off in Manhattan?, we proposed a method for synthesizing populations and social networks for agent-based modeling. I created a synthetic population of two New York counties, their social networks, and road networks to simulate responses to a nuclear attack. (Jupyter Notebook)
  • Doomsayers or Pollyannas? News Sentiment and Public Gatekeeping on Twitter: Does the public tend to favor more positive news when retweeting? To investigate the public's power to reshape the news flow, we conducted sentiment analyses of published news, tweeted news, and retweeted news from eight mainstream news organizations. (Jupyter Notebook)
  • Politicians Busted while Agenda-setting on Social Media: To what extent do Members of Congress (MCs) use social media for agenda building? That is, do they tweet about some topics while avoiding others? I created the co-commentation network of the MCs of the 113th Congress and detected two emergent communities in this network, which correspond to the two parties in Congress; the group memberships overlapped by 95%+ (a minimal sketch of this community-detection step follows the list). I presented this work at the PolNet workshop (2015) and the CSSS summit (2015).
  • AirBnB++: Search Listings by Reputation and Description: Airbnb lacks (i) filtering of search results by seller reputation and (ii) keyword search within listing contents. I developed a geo-web app to add these features. Please see the video demo towards the end of the notebook.
  • Twlets: Twitter→Excel: I think this Chrome browser extension is the most convenient way to download data from Twitter. With a single click, you can download tweets, followers, etc., as an MS Excel file.
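
A minimal sketch of the community-detection step used in the agenda-setting project above, under hypothetical data: nodes are MCs, edge weights count how often two MCs commented on the same topic, and modularity-based clustering recovers the emergent groups.

```python
import pandas as pd
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical edge list: mc_a, mc_b, weight (co-commentation counts).
pairs = pd.read_csv("co_commentation_edges.csv")

G = nx.Graph()
G.add_weighted_edges_from(pairs[["mc_a", "mc_b", "weight"]].itertuples(index=False, name=None))

# Modularity-based community detection; comparing the detected communities to
# party labels is what showed the 95%+ overlap reported above.
communities = greedy_modularity_communities(G, weight="weight")
for i, members in enumerate(communities):
    print(f"community {i}: {len(members)} MCs")
```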