We’re seeking an Applied Research & Data Scientist who’s deeply curious about how people learn.
You’ll lead research and modeling efforts across handwriting and voice datasets, combining regression analysis, causal inference, and machine learning to explore learning behaviors, uncover cognitive patterns, and build predictive models.
This role sits at the intersection of data science, behavioral research, and applied AI — where statistical rigor meets creative experimentation. You’ll collaborate with product and pedagogy teams to transform data into insights that meaningfully enhance real-world learning outcomes.
Your mission: pioneer how AI understands human learning. By analyzing handwriting strokes, speech tone, and timing, you’ll uncover how students think, feel, and learn — turning raw, multimodal data into personalized insights that guide each learner’s next best step.
Key Responsibilities
Analytical Research & Modeling
- Conduct exploratory data analysis (EDA) on handwriting and voice datasets to identify behavioral patterns and anomalies.
- Build and interpret regression models (linear, logistic, hierarchical, LASSO, Ridge) to isolate key performance and engagement drivers.
- Use multivariate and non-linear regression to explore relationships between behavioral variables (e.g., writing acceleration vs. tone modulation).
- Apply causal inference methods (backdoor criterion, DAGs, propensity scoring, mediation analysis) to uncover true cause–effect relationships.
- Engineer features from raw data, such as hesitation indices, cognitive delay markers, or pitch variability, to enrich models.
- Employ dimensionality reduction (PCA, UMAP, t-SNE) and unsupervised learning (clustering, mixture models) to discover latent learning traits.
- Quantify uncertainty (e.g., confidence intervals) and perform model diagnostics (VIF, residual analysis, cross-validation).
Machine Learning & AI Integration
- Develop predictive models to forecast engagement, confidence, and completion speed.
- Experiment with speech emotion recognition, sequence models, and multimodal fusion networks combining handwriting and audio data.
- Collaborate with engineers to build LLM-driven insight layers that summarize and explain behavioral findings.
- Use NLP and embedding models to analyze transcribed speech and open-ended responses for affective and cognitive insights.
Research & Hypothesis Testing
- Design and execute experiments, quasi-experiments, or A/B tests to validate research hypotheses.
- Apply statistical testing (ANOVA, chi-square, t-tests, permutation testing) to verify observed effects.
- Use causal reasoning to determine which variables most strongly influence learning efficiency.
- Maintain reproducible research pipelines (Jupyter, MLflow, Weights & Biases) with proper version control and documentation.
Data Infrastructure & Visualization
- Collaborate with engineers to maintain clean, reliable multimodal data pipelines.
- Optimize ETL workflows for handwriting and audio data ingestion.
- Create interactive dashboards and visualizations (Streamlit, Plotly, Tableau) to communicate insights effectively.
- Document datasets, assumptions, and model findings in both technical and narrative formats.
Collaboration & Communication
- Partner with educators and product managers to interpret insights within the learning context.
- Translate analytical outputs into actionable recommendations for curriculum and product design.
- Present complex findings in clear, visual, and accessible formats for non-technical stakeholders.
- Provide leadership with data-driven metrics to shape AI models, pedagogy strategies, and product direction.
Qualifications
Must-Have
- Bachelor’s or Master’s degree in Data Science, Statistics, Machine Learning, AI, Cognitive Science, or a related quantitative field.
- Proven experience in regression modeling (linear, logistic, hierarchical, regularized) and causal inference.
- Proficiency in Python (pandas, NumPy, scikit-learn, statsmodels, PyTorch/TensorFlow, causalml, DoWhy).
- Strong grasp of statistical hypothesis testing, experimental design, and model diagnostics.
- Experience with feature engineering, dimensionality reduction, and unsupervised learning.
- Excellent analytical reasoning, hypothesis formulation, and data storytelling skills.
- High proficiency in English for research communication and cross-functional collaboration.
Good-to-Have
- Exposure to LLMs, prompt engineering, or multimodal representation learning.
- Familiarity with Bayesian modeling, causal ML, or hierarchical models.
- Background in educational data mining, learning analytics, or human performance modeling.
- Experience integrating analytical models into AI-driven dashboards or feedback systems.
- Hands-on experience with speech signal processing (librosa, openSMILE) or handwriting trajectory modeling (CNN/RNN-based stroke analysis).