Talha Oz

Applied Scientist at Amazon.com (July 2022 - Present)
Staff Data Scientist at Betterworks (March 2021 - May 2022)
Computational Social Scientist at Humanyze (Late 2017 - Mar 2021)
Earned PhD in Computational Social Science from GMU (Late 2020)
Graduate Research Assistant in Center for Social Complexity (2016 - 2017)
Graduate Research Assistant in Machine Learning and Inference Lab (2012 - 2016)

Social: LinkedIn - GitHub - X (Twitter)

Bio. Talha earned his PhD in Computational Social Science from George Mason University and, before joining Amazon, worked at two People Analytics companies as a Data Scientist. During his graduate studies, Talha published papers in venues like SBP-BRIMS (International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation) and worked as a research assistant in the Center for Social Complexity and the Machine Learning and Inference Lab.
At Humanyze, Talha developed innovative solutions that empowered leaders to assess the effectiveness of communication and collaboration in their organizations. These solutions enabled leaders to enhance employee engagement, boost team productivity, and foster organizational agility. His algorithms were grounded in the principles of organizational behavior and social network analysis, utilizing metadata from workplace communication and collaboration platforms like calendars, Slack, and email -- see paper1, paper2, patent.
Talha joined Betterworks as their inaugural Data Scientist, where he spearheaded the development of several machine learning models to enhance product features. One notable solution focused on analyzing feedback exchanged between employees, identifying the competencies mentioned within the feedback, and categorizing their context as either a strength or an opportunity. In another project, Talha built a sentiment analysis model that outperformed the existing solution, Google Sentiment Analysis API, by 13% in accurately interpreting the sentiment of free-text survey responses.

AWS. Joined Q Developer Code Transformation Team:

Developed an LLM-powered agentic system that transforms Java 8/11 projects to Java 17.
Multiple patents and publications submitted -- currently in review.

Amazon. Have completed several projects at Amazon:

Improved the main algorithm used in email marketing (productionized), leading to a $31M+/yr increase in profits as measured by experiment results
Calculated the cost of opting out from emails (and push notifications) using Double ML & propensity score matching, and integrated it into our marketing program valuation model (Spark Scala, AWS Step Functions)
Built a neural-network-based push notification tap propensity model (soon to be deployed) (PyTorch, SageMaker)
Mentored an Applied Scientist intern to build an email click propensity model (soon to be deployed)

Betterworks. I have worked on four projects at Betterworks (in chronological order):

"Can we identify good employees with our data? If so, how?" was the CPO's question that started this project. I conceptualized and developed a PoC for the insights module based on Google's Project Aristotle. I showed how Psychological Safety (the number one determinant of team effectiveness) and the other four factors can be modeled, modeled them, and shared the results.
“How can we show the impact of our product on employee performance with our data?” The second project was about justifying the benefits of the product empirically to support the customers' ROI. I formed five hypotheses (e.g., employees who use our product properly/often have better goal progress) and tested them on our data. Some of the tests were simple statistical tests and correlations, but I also applied causal inference techniques such as difference-in-differences.
“Given a feedback text, can you create a model that can identify the competencies of the feedback receiver and their contexts (strength/opportunity)?” I built a model that can identify the competencies in a feedback text with 89% precision and recall.
"Given a free-text survey response, find its topic and sentiment." My topic model (40-class classifier) had about 86% precision and 99% recall. I also built a 3-class sentiment model that beat the old solution by 14%.

Humanyze. I worked as the computational social scientist (data scientist) and led the research area at Humanyze (a startup that was born out of Sandy Pentland's Human Dynamics group at MIT Media Lab). The main problem I was trying to solve was how to model indicators of employee engagement, team productivity, and organizational adaptability using the metadata (no content) of workplace technologies such as calendar, e-mail, Slack, etc., and sensors (if available). The theories and methods on which my algorithms relied are primarily from the fields of organizational behavior and social network analysis. My responsibilities included writing papers, collaborating with researchers, and supervising interns. Some of my research output:

Exploring the Impact of Mandatory Remote Work during the COVID-19 Pandemic (Oz and Crooks, 2020). Using workplace communication metadata, I examined the heterogeneous effects of mandatory remote work. Presented this work at SBP-BRIMS.
Digital Trails of Work Stressors (Oz, 2020). This paper proposes a strategy with which stressor measurement becomes less disruptive and more cost-effective. My solution-oriented paper (Watts, 2017) has been accepted to SBP-BRIMS, and gives a sense of my work at Humanyze.
How Can Organizational Network Analysis (ONA) Help Improve Company Performance? My whitepaper, featured by People Analytics @ Harvard.
The effects of temporal distance on communication patterns: We exploit a natural experiment, the annual change of clocks from Daylight Saving Time (DST), to measure this. With Tommy Fang (HBS), Jasmina Chauvin (Georgetown), and Raj Choudhury (HBS).
Modeling communication styles of leaders with a network embeddings model and analyzing them pre/post promotions. With Avi Goyal (Stanford CS), Amir Goldberg (GSB), and Sameer Srivastava (Haas).
Modeling temporal change in communication patterns using graphlets (motifs) and social sequence analysis. With Ryan Compton, PhD (now with c3.ai).

George Mason University. While studying towards my PhD in Computational Social Science with Andrew Crooks, I was a graduate research assistant in the Center for Social Complexity (one year) and the Machine Learning and Inference Lab (for five years). My work there includes:

My PhD Defense Presentation (2020): US$1 trillion is the annual cost of lower productivity due to stress (WHO, 2019). To help solve this problem, in my PhD, I proposed and showcased a strategy leveraging communication (meta)data and computational techniques to identify behaviors leading to and resulting from collective stress.
Computational Social Science of Disasters: Opportunities and Challenges (2019): We extensively reviewed the roles of subfields of social sciences, crisis informatics, and computational social sciences in disaster research (260 citations!). By discussing opportunities and challenges, this paper invites (i) CSS researchers to work on disasters and (ii) traditional disaster researchers to pay more attention to CSS.
Attribution of Blame and Responsibility - #FlintWaterCrisis (2018): Tested the generalizability of a set of theories in the sociology of disasters using online communication data. Formed and tested our hypotheses on (i) who to blame, (ii) role of partisan predisposition, (iii) concerned geographies, and (iv) contagion of complaining. An earlier version of this work was presented at the Social Web for Disaster Management (SWDM'16) workshop.
Generation of Realistic Mega-Cities (2017): As part of the project What Happens If a Nuclear Bomb Goes Off in Manhattan?, we propose a method for synthesizing populations and social networks for agent-based modeling. I created a synthetic population (of two NY counties), their social networks, and road networks to simulate the responses to a nuclear attack. (Jupyter Notebook).
Doomsayers or Pollyannas? News Sentiment and Public Gatekeeping on Twitter: Does the public tend to favor more positive news when retweeting? To investigate the power of the public to reshape the news flow, we conducted sentiment analyses of published news, tweeted news, and retweeted news from eight mainstream news organizations. (Jupyter Notebook)
Politicians Busted while Agenda-setting on Social Media: To what extent do Members of Congress (MCs) use social media for agenda building? That is, do they talk (tweet) about some topics while avoiding others? I created the co-commentation network of MCs of the 113th Congress and detected two emergent communities in this network, which correspond to the two parties in Congress. The community memberships overlapped with party memberships by 95%+. I presented this work at the PolNet workshop (2015) and the CSSS summit (2015).
AirBnB++: Search Listings by Reputation and Description: AirBnB lacks (i) filtering of search results by seller reputation and (ii) keyword search within listing contents. I developed a geo-web app to fix these issues. Please see the video demo towards the end of the notebook.
Twlets: Twitter→Excel: I think this Chrome Web Browser Extension is the most convenient way to download data from Twitter. With a single click you can download tweets, followers, etc. as an MS Excel file.