Broadly, I'm an interdisciplinary data scientist with special interests in psycholinguistics, computational linguistics, statistics, and Japanese. More specifically, I apply statistical and natural language processing techniques to solve problems and enrich my own and my team's understanding of whatever data is at hand. I bring a strong statistical background to my work, whether in the form of data handling, ad hoc statistical analyses, or machine learning applications.
- Programming (R, Python, SQL, Ruby, Julia, Go)
- Data analysis (Minitab, SPSS, Tableau, KNIME, Google Data Studio, Qualtrics, Excel, Access, Power BI)
- Research (research design, data visualization, scientific writing)
- Statistics (Bayesian & frequentist approaches, multiple regression, ANOVA, SEM/MLM)
- Language (advanced Japanese, language instruction, natural language processing)
Take a look at my Resumes & CV.
Some of my NLP and Japanese language projects:
- particles, contextual particle frequency in written Japanese, taking a swing at the age-old question of は vs が
- thesis, the scripts and analyses from my graduate thesis work, "Investigating Emotion-label and Emotion-laden Words in a Semantic Satiation Paradigm"
- aozora_corpus, a compilation of Japanese texts pulled from 青空文庫 (Aozora Bunko), also available on Kaggle
- embs, a project providing tools that streamline sentence embedding and clustering workflows
- siftr, a Shiny app using SIF sentence embeddings to separate out unwanted text data
- priors, an experiment combining pretrained and bag-of-words embeddings to incorporate prior semantic knowledge
- bowts, an experiment combining pretrained and bag-of-words embedding approaches to manipulate embedding vector spaces
- iterate, iterative clustering for sklearn clusterers (a minimal sketch follows this list)
- topics, experiments and utilities for text topic extraction using decision trees
- simsort, sorting texts by semantic similarity (also sketched after this list)
- nlt, representing tabular data in natural language
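To give a flavor of iterate, here's a minimal sketch of one way iterative clustering can work: cluster the data, then keep re-splitting any cluster that is still too large. The `max_size` stopping rule and the choice of KMeans are illustrative assumptions here, not necessarily the package's actual design.

```python
import numpy as np
from sklearn.cluster import KMeans

def iterative_cluster(X, max_size=50, k=2, random_state=0):
    """Recursively split clusters until none exceeds max_size points."""
    labels = np.zeros(len(X), dtype=int)
    next_label = 1
    queue = [0]  # cluster ids that may still be too large
    while queue:
        cid = queue.pop()
        idx = np.where(labels == cid)[0]
        if len(idx) <= max_size:
            continue  # small enough; leave it alone
        sub = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X[idx])
        if np.bincount(sub, minlength=k).max() == len(idx):
            continue  # degenerate split (e.g. duplicate points); stop here
        for j in range(1, k):  # sub-cluster 0 keeps the old id
            labels[idx[sub == j]] = next_label
            queue.append(next_label)
            next_label += 1
        queue.append(cid)  # re-check what remains under cid
    return labels

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
print(np.bincount(iterative_cluster(X, max_size=40)))  # every cluster ends up with <= 40 points
```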
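And the core idea behind simsort, reduced to a few lines: embed each text, then order the collection by cosine similarity to a query. The project itself works with sentence embeddings; TF-IDF vectors stand in below to keep the sketch dependency-light.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def sort_by_similarity(texts, query):
    """Return (text, score) pairs, most similar to the query first."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(texts)  # (n_docs, n_terms)
    query_vector = vectorizer.transform([query])   # (1, n_terms)
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    order = scores.argsort()[::-1]                 # descending similarity
    return [(texts[i], scores[i]) for i in order]

docs = [
    "Bayesian regression with weakly informative priors",
    "A recipe for matcha lattes",
    "Hierarchical models for repeated-measures data",
]
for text, score in sort_by_similarity(docs, "bayesian hierarchical modeling"):
    print(f"{score:.3f}  {text}")
```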
Some stats/NLP/dataviz side work I've done, partly out of personal interest, partly to learn different data techniques, and partly to serve as a quick reference in my day-to-day work:
- aozora_annotator, text annotator for Aozora Bunko corpus texts
- probs, a Bayesian modeling quick reference for pymc, bambi, rstan, and rstanarm (a sample pymc snippet follows this list)
- dists, simple reference and tools for working with probability distributions
- trendsim, simulating social media traffic for Japanese authors using Markov chains, MongoDB, and Kafka
- genji, character networks in The Tale of Genji
- shrimp, a Bayesian time series analysis of some very specific tweets
- radicals, some experiments with embedding kanji in vector spaces based on radical composition, readings, and meanings
- hanakotoba, a project looking at the use of 花言葉 (flower language) in literature
- yoji, applying neural networks to generate novel 四字熟語 (four-character idioms)
- ebook_tokenizer, a command-line tool that adds spaces between Japanese words in eBooks so they work with Kindle Word Wise
- kyoto, an exploration of restaurants around train/subway stations in Kyoto
- manyogana, an application that translates Japanese text to a modern implementation of manyogana and converts Arabic numerals to kansuuji (the numeral conversion is sketched below)
- movies, a dataviz/exploration dashboard for the 10,000 movies dataset
- michelin, exploration of Michelin star restaurants
- tea_temps, a quick dataviz and reference for getting a good cup of tea
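In the spirit of probs, here's the kind of minimal pymc snippet such a quick reference collects: estimating the mean and spread of a sample under weakly informative priors. The model and the priors are illustrative choices, not taken from the repo.

```python
import arviz as az
import numpy as np
import pymc as pm

# Simulated data: 50 draws from Normal(2, 1)
rng = np.random.default_rng(42)
y = rng.normal(loc=2.0, scale=1.0, size=50)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)    # weakly informative priors
    sigma = pm.HalfNormal("sigma", sigma=5.0)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=42)

print(az.summary(idata, var_names=["mu", "sigma"]))  # posterior means, HDIs, diagnostics
```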
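And the numeral half of manyogana fits in a short function. This simplified sketch handles integers below 10,000; the actual converter presumably handles 万 and larger units as well.

```python
DIGITS = "〇一二三四五六七八九"
UNITS = ["", "十", "百", "千"]

def to_kansuuji(n: int) -> str:
    """Convert a non-negative integer below 10,000 to kansuuji."""
    if not 0 <= n < 10_000:
        raise ValueError("this sketch only handles 0-9999")
    if n == 0:
        return DIGITS[0]
    parts = []
    for power in range(3, -1, -1):
        digit, n = divmod(n, 10 ** power)
        if digit == 0:
            continue
        if power > 0 and digit == 1:
            parts.append(UNITS[power])            # 十, not 一十
        else:
            parts.append(DIGITS[digit] + UNITS[power])
    return "".join(parts)

for n in (7, 10, 42, 2024):
    print(n, to_kansuuji(n))  # 七, 十, 四十二, 二千二十四
```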
If you're in need of tutoring or consultation on any of the following topics, please get in touch! I've worked with students ranging from high school to PhD level, both in person and online.
- Statistics
- Psychology
- Japanese
- Linguistics
- Natural Language Processing
- R, Python, Julia, SQL, Ruby
- Machine Learning