Crystal (Tianhui) Hu

NLP Research Scientist / AI Engineer
Tokyo

Summary

Results-driven AI and data scientist with over six years of experience specializing in credit risk, fraud detection, and retrieval-augmented generation (RAG) for natural language processing. Australian citizen who grew up in Australia, with a global perspective built over five professional years in Sydney, working in native-level English, and two years in Tokyo, collaborating directly with Japanese stakeholders in Japanese. Expertise includes building predictive models and engineering advanced NLP solutions, highlighted by the development of a patented RAG-based 'Evidence Tracker' designed to eliminate LLM hallucinations and ensure reliable information delivery. Thrives in cross-functional environments, aligning technical solutions with business objectives and applying a user-focused mindset to shape product direction and improve tool robustness through innovative NLP methods.

Overview

6 years of professional experience
2020 years of post-secondary education
6 Certifications
4 Languages

Work History

NLP Research Engineer

Money Forward
06.2025 - Current
  • Developed a patented 'Evidence Tracker' tool based on Corrective Retrieval-Augmented Generation (CRAG) to eliminate hallucinations from large language models (LLMs), supporting the customer support team in finding accurate answers. This involved building and optimizing deep learning-based embeddings for document retrieval.
  • Identified promising research topics from the latest studies on reducing hallucination in critical, high-risk fields such as finance, law, and tax.
  • Evaluated the new system's performance against self-designed evaluation metrics and industry benchmarks, and fine-tuned it to optimize results.
  • Developed the core algorithm and built the UI in Streamlit.

Research Scientist in Credit Risk

Money Forward
06.2024 - Current
  • Conducted research on credit limit determination for factoring services, focusing on improving accuracy through statistical analysis and machine learning.
  • Investigated cutting-edge credit control methods within the B2B lending sector, facilitating long-term strategy formulation for the Risk Department.
  • Scrutinized large datasets, highlighting essential variables like lead desired amount and purchased amount, and advocated for new methodologies in setting credit limits.
  • Coordinated with BFW (BizForward) to integrate research goals with business priorities, articulating and demonstrating findings to stakeholders.
  • Evaluated limitations of current credit limit estimation approaches and devised pre-screening measures to inform users of lending limits ahead of application evaluation.
  • Facilitated strategic planning for the entire research project.

Data Scientist

Commonwealth Bank of Australia
12.2021 - 10.2023
  • Developed and deployed advanced credit risk assessment models (PD model), applying statistical techniques and advanced analytics to drive data-driven decisions.
  • Conducted comprehensive data pre-processing, ensuring data quality by handling unbalanced datasets, missing value imputation, and one-hot encoding. Built and deployed ETL pipelines in Azure/AWS for data cleaning, transformation, and model training.
  • Designed new metrics for home loan-related parameters and requirements, and built robust data science roadmaps.
  • Gathered reporting requirements for construction loans, developed documentation, and created reports using Tableau and Power BI.
  • Assisted with A/B testing and causal inference to improve product effectiveness and identify target segments for product launches.
  • Developed predictive models using machine learning algorithms to optimize product offerings.
  • Collaborated with cross-functional teams to integrate data analytics into business strategies.
  • Mentored junior data scientists in best practices for data analysis and model development.

Data Analyst and Developer

Hub 24
02.2021 - 12.2021
  • Automated manual reporting processes using Microsoft Report Builder and SSRS, streamlining data analysis and insight generation. Generated actionable business insights through data modeling techniques.
  • Designed and deployed reporting solutions using SSRS, providing insights to support decision making. Supported ad-hoc data requests through SQL queries, drawing on database management skills and hands-on experience with GCP and AWS.
  • Led database redesign initiatives, ensuring stable staging and implementing data warehousing technologies. Successfully managed data migration from legacy systems to new platforms and designed ETL pipelines.
  • Employed data visualization tools such as Yellowfin, Tableau, Data Studio, and Salesforce to create clear and informative dashboards and reports.

Software (Web) Developer

Hub 24
06.2020 - 02.2021
  • Demonstrated expertise in modern and responsive web UI design, implementing and optimizing user interfaces.
  • Applied in-depth knowledge of jQuery UI, jQuery DataTables, Bootstrap, Tabulator, Cleave.js, and other relevant technologies.
  • Conducted front-end UI performance optimization, user experience improvement testing, and security testing.
  • Collaborated with the team to design feature solutions, ensuring seamless integration and optimal functionality.
  • Developed responsive web applications using HTML, CSS, and JavaScript frameworks.
  • Led back-end data structure design, data flow design, API development, testing, and database operations.

Web Developer and IT Tutor

Navigator Union
06.2019 - 06.2020
  • Tutored IT courses, tested the performance of the company webpage, and designed and optimized the webpage UI.

Education

Master of Information Technology - Software Engineering and Data Analytics and Management

The University of Sydney

Master of Science - Computer Science

The University of Sydney
Sydney Australia

Skills

React.js development

Projects

Kanji Generation with Stable Diffusion (NLP Research Project, Aug 2025 - Present)
Goal: The goal of this project was to train a Stable Diffusion model to generate novel Japanese Kanji characters from English definitions, reproducing the viral experiment that demonstrated AI's ability to "hallucinate" new cultural symbols for modern concepts like "YouTube", "Gundam", and "Elon Musk" that don't have existing Kanji representations.

  • Data engineering and dataset creation: Built a comprehensive dataset of 6,410 Kanji characters by parsing KANJIDIC2 XML files to extract English meanings and KanjiVG SVG files for stroke data. Converted vector SVG drawings to 128x128 pixel images with pure black strokes (#000000) on white backgrounds, ensuring no stroke order numbers were rendered. Created complete metadata mapping with 16,692 English meanings (average 2.6 per Kanji) and proper text prompts for training. Implemented a quality control pipeline to validate image consistency and remove artifacts.
  • Generation and results: Developed an advanced concept generation system capable of creating Kanji for modern concepts like "iPhone", "Bitcoin", "Netflix", "Tesla", "Instagram", "COVID-19", and "artificial intelligence".

Backprop NEAT: implementation for neural architecture search (Kaggle project, Jun 2025 - Aug 2025)
Goal: Develop a comprehensive Backprop NEAT system combining evolutionary architecture search with gradient-based weight optimization to solve 2D classification tasks and game environments, using JAX for high-performance computing.
Key achievements:
• Implemented the complete NEAT algorithm with JAX optimization for simultaneous evolution of network topologies and gradient-based weight training for 2D classification tasks (Circle, Spiral, XOR) and the SlimeVolley game environment.
• Designed advanced network initialization strategies including multi-layer seed networks and domain-specific expert networks with specialized hidden nodes for different input types.
• Developed curriculum learning with 5 progressive difficulty levels, mixed opponent training against 5 types, and targeted sub-skill training for 7 specific abilities.
• Implemented behavioral diversity tracking and novelty search to prevent strategy convergence and encourage exploration of different network architectures.
• Built comprehensive network evolution analysis tools with complexity tracking, structural innovation metrics, and multi-dimensional visualization.
Technologies used: Python, JAX, NumPy, Matplotlib, EvoJAX, NEAT algorithm, neural architecture search, evolutionary computation, gradient descent, GPU computing


Loan Prediction (Commonwealth Bank, Feb. 2022 – Apr. 2022)

Goal: The goal of the loan prediction model was to accurately predict whether a loan would be successfully approved or not based on the selected metrics. By utilizing feature selection techniques and building both Random Forest and logistic regression models, we aimed to achieve high prediction accuracy and validate the performance of the models.

  • For feature selection, selected the most meaningful metrics according to business scope and end goal: LoanId, gender, marriage status, dependents, education, applicantIncome, coapplicantIncome, LoanAmount, LoanAmountTerm, credithistory.
  • Filled missing values using mode or mean value, and split the data into 70% training data and 30% testing data.
  • Built a Random Forest model and fit it to the dataset, achieving an accuracy of 77.2% with a model that is fast and simple enough to predict whether a loan would be approved.
  • Built a logistic regression model and fit it to the dataset, achieving an accuracy of 82.2%, and compared prediction accuracy and validation scores across the two models.
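
The preprocessing steps in the Loan Prediction bullets above (mean or mode imputation followed by a 70/30 split) can be sketched roughly as follows. This is a minimal, standard-library illustration, not the project's actual code: the toy rows, the `impute` and `train_test_split` helpers, and the choice of mean for numeric columns and mode for categorical ones are all assumptions made for the example.

```python
import random
from statistics import mean, mode


def impute(rows, numeric_cols, categorical_cols):
    """Return copies of rows with None replaced by the column mean or mode."""
    filled = [dict(r) for r in rows]
    for col in numeric_cols:
        col_mean = mean(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = col_mean
    for col in categorical_cols:
        col_mode = mode(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = col_mode
    return filled


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle with a fixed seed, then hold out test_fraction of the rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: ten applicants, two with a missing value.
rows = [
    {"applicantIncome": 4000, "gender": "Male"},
    {"applicantIncome": None, "gender": "Female"},
    {"applicantIncome": 6000, "gender": None},
    {"applicantIncome": 5000, "gender": "Male"},
] + [{"applicantIncome": 5000, "gender": "Male"} for _ in range(6)]

filled = impute(rows, ["applicantIncome"], ["gender"])
train, test = train_test_split(filled, test_fraction=0.3)
```

In practice the imputation statistics would usually be computed on the training portion only, so that information from the test rows does not leak into training.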


Probability of default (PD) model development (Commonwealth Bank, Apr 2023 - Sep 2023)
Goal: Develop a probability of default (PD) model to estimate the likelihood of default for credit card holders based on historical data and relevant features.
• Built the PD model on historical data and relevant features such as customer demographics, credit history, and financial indicators.
• Conducted exploratory data analysis to understand the underlying patterns and relationships in the dataset.
• Preprocessed and cleaned the data through missing value handling, outlier detection, and feature engineering.
• Selected appropriate machine learning algorithms, including logistic regression, decision trees, and gradient boosting, to build and train the PD model.
• Evaluated the model's performance using metrics such as accuracy, precision, recall, and F1 score, and fine-tuned the model to optimize its predictive power.
• Incorporated model interpretability techniques, such as feature importance analysis and partial dependence plots, to gain insights into the factors driving the default probability.
• Collaborated with stakeholders, including risk management teams and business analysts, to validate and refine the PD model's performance and ensure its alignment with business objectives.
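
The evaluation metrics named in the PD model bullets above (accuracy, precision, recall, and F1 score) can be computed from binary predictions as in the short sketch below. The `classification_metrics` helper and the toy labels are hypothetical and only illustrate the definitions; they are not the project's data or code.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-defaults cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed defaults

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy labels: 3 actual defaults among 8 accounts.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Because defaults are typically rare, accuracy alone can look high even when many defaults are missed, which is one reason to report precision, recall, and F1 alongside it.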

Certification

Google Analytics Certificate, Google

Timeline

NLP Research Engineer

Money Forward
06.2025 - Current

Research Scientist in Credit Risk

Money Forward
06.2024 - Current

Data Scientist

Commonwealth Bank of Australia
12.2021 - 10.2023

Data Analyst and Developer

Hub 24
02.2021 - 12.2021

Software (Web) Developer

Hub 24
06.2020 - 02.2021

Web Developer and IT Tutor

Navigator Union
06.2019 - 06.2020

Master of Science - Computer Science

The University of Sydney

Master of Information Technology - Software Engineering and Data Analytics and Management

The University of Sydney