Talha earned his PhD in Computational Social Science from George Mason University and has worked at two People Analytics companies as a Data Scientist. During his graduate studies, he worked in the Center for Social Complexity and the Machine Learning and Inference Lab, and published papers in venues such as SBP-BRIMS (International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation).
At Humanyze, an MIT Media Lab spinoff, Talha designed new systems, algorithms, and metrics to reveal how work gets done in teams and organizations, using face-to-face and digital communication metadata collected from wearable sensors and IoT devices, as well as from workplace communication and collaboration tools such as calendar, Skype, and email. The theories and methods on which his algorithms relied came primarily from the fields of organizational behavior and social network analysis.
Talha joined BetterWorks as their first data scientist and developed several machine learning models for use in the product. His competency prediction model accepts a feedback text and identifies the competencies discussed in the text as well as their contexts (whether the competency is a strength of, or an opportunity for, the feedback receiver). In another project, Talha developed a solution to infer the topics and sentiments of free-text survey responses.
I have completed several projects at Amazon:
- Increased profits by $31M+/yr (as measured in experiments) by improving the main algorithm used in email marketing (productionized)
- Calculated the cost of opting out of emails (and push notifications) using Double ML and propensity score matching, and integrated it into our marketing program valuation model (Spark Scala, AWS Step Functions)
- Built an NN-based push tap propensity model that outperformed and replaced the production model (PyTorch, SageMaker)
- Mentored an Applied Scientist intern in building a click propensity model (productionized)
I have worked on four projects at Betterworks (in chronological order):
"Can we identify good employees with our data. If so, how?" was the CPO's question that started this project. I conceptualized and developed a PoC for the insights module based on Google's Project Aristotle. I showed how Psychological Safety (the number one determinant of team effectiveness), and the other four factors can be modeled, modeled them, and shared the results.
“How can we show the impact of our product on employee performance with our data?” The second project was about empirically demonstrating the benefits of the product to support the customers' ROI. I formed five hypotheses (for example, employees who use our product properly and often have better goal progress) and tested them on our data. Some of the tests were simple statistical tests and correlations, but I also applied causal inference techniques such as difference-in-differences.
“Given a feedback text, can you create a model that can identify the competencies of the feedback receiver and their contexts (strength/opportunity)?” I built a model that can identify the competencies in a feedback text with 89% precision and recall.
"Given a free-text survey response, find its topic and sentiment." My topic model (40-class classifier) had about 86% precision and 99% recall. And I came up with a 3-class sentiment model solution that beat the old solution by 14%.
At Humanyze. I worked as the computational social scientist (data scientist) and led the research area at Humanyze (a startup born out of Sandy Pentland's Human Dynamics group at MIT Media Lab). The main problem I was trying to solve was how to model indicators of employee engagement, team productivity, and organizational adaptability using the metadata (no content) of workplace technologies such as calendar, e-mail, and Slack, and of sensors (if available). The theories and methods on which my algorithms relied came primarily from the fields of organizational behavior and social network analysis. My responsibilities included writing papers, collaborating with researchers, and supervising interns. Some of my research output:
- Exploring the Impact of Mandatory Remote Work during the COVID-19 Pandemic (Oz and Crooks, 2020). Using workplace communication metadata, I examined the heterogeneous effects of mandatory remote work. Presented this work at SBP-BRIMS.
- Digital Trails of Work Stressors (Oz, 2020). This paper proposes a strategy that makes stressor measurement less disruptive and more cost-effective. My solution-oriented paper (Watts, 2017) was accepted to SBP-BRIMS and gives a sense of my work at Humanyze.
- How Can Organizational Network Analysis (ONA) Help Improve Company Performance? My whitepaper, featured by People Analytics @ Harvard.
- The effects of temporal distance on communication patterns: We exploit a natural experiment, the annual change of clocks from Daylight Saving Time (DST), to measure this. With Tommy Fang (HBS), Jasmina Chauvin (Georgetown), and Raj Choudhury (HBS).
- Modeling communication styles of leaders with a network embeddings model and analyzing them pre/post promotions. With Avi Goyal (Stanford CS), Amir Goldberg (GSB), and Sameer Srivastava (Haas).
- Modeling temporal change in communication patterns using graphlets (motifs) and social sequence analysis. With Ryan Compton, PhD (now with c3.ai).
At George Mason University. While studying towards my PhD in Computational Social Science with Andrew Crooks, I was a graduate research assistant in the Center for Social Complexity (for one year) and the Machine Learning and Inference Lab (for five years). My work there includes:
- My PhD Defense Presentation (2020): US$1 trillion is the annual cost of lower productivity due to stress (WHO, 2019). To help solve this problem, in my PhD I proposed and showcased a strategy leveraging communication (meta)data and computational techniques to identify behaviors leading to and resulting from collective stress.
- Computational Social Science of Disasters: Opportunities and Challenges (2019): We extensively reviewed the roles of subfields of social sciences, crisis informatics, and computational social sciences in disaster research (260 citations!). By discussing opportunities and challenges, this paper invites (i) CSS researchers to work on disasters and (ii) traditional disaster researchers to pay more attention to CSS.
- Attribution of Blame and Responsibility - #FlintWaterCrisis (2018): Tested the generalizability of a set of theories in the sociology of disasters using online communication data. Formed and tested our hypotheses on (i) who to blame, (ii) the role of partisan predisposition, (iii) concerned geographies, and (iv) contagion of complaining. An earlier version of this work was presented at the Social Web for Disaster Management (SWDM'16) workshop.
- Generation of Realistic Mega-Cities (2017): As part of the project What Happens If a Nuclear Bomb Goes Off in Manhattan?, we propose a method for synthesizing populations and social networks for agent-based modeling. I created a synthetic population (of two NY counties), their social networks, and road networks to simulate the responses to a nuclear attack. (Jupyter Notebook).
- Doomsayers or Pollyannas? News Sentiment and Public Gatekeeping on Twitter: Does the public tend to favor more positive news when retweeting? To investigate the power of the public to reshape the news flow, we conducted sentiment analyses of published news, tweeted news, and retweeted news from eight mainstream news organizations. (Jupyter Notebook)
- Politicians Busted while Agenda-setting on Social Media: To what extent do Members of Congress (MCs) use social media for agenda building? That is, do they talk (tweet) about some topics while avoiding others? I created the co-commentation network of MCs of the 113th Congress and detected two emergent communities in this network, which correspond to the two parties in Congress. Community membership overlapped with party membership by 95%+. I presented this work at the PolNet (2015) workshop and the CSSS summit (2015).
- AirBnB++: Search Listings by Reputation and Description: AirBnB lacks (i) filtering of search results by seller reputation and (ii) keyword search within listing contents. I developed a geo-web app to fill these gaps. Please see the video demo towards the end of the notebook.
- Twlets: Twitter→Excel: I think this Chrome browser extension is the most convenient way to download data from Twitter. With a single click, you can download tweets, followers, etc. as an MS Excel file.