Physics Research

My data science experience in physics research includes finding the most suitable astronomical data to answer a scientific question, setting up astronomical data and simulation code on remote Linux clusters in a clean format (data implementation and cleaning), developing and applying statistical methods to large data sets (data modeling and statistical analysis), designing and running Monte Carlo simulations to test causal relationships (data experiment design), visualizing and interpreting the results of those simulations (data visualization and interpretation), and drawing new scientific questions from the results (data insights).

Constructing Higher Order Statistics with Separate Universe Simulations

  • Read scientific statement for the project here
  • Investigated various simulation codes in search of the most suitable program to answer the scientific questions.
  • Configured the simulation codes on a remote Linux cluster. Ran a large set of simulations on the cluster with varying parameters. Streamlined the simulation process using shell scripting.
  • Developed numerical analysis programs in Python for calculating the higher order statistics of structure formation from simulation results.
  • Read my report here

Differentiating f(R) Gravity from General Relativity Cosmology Using Cosmic Velocity Field

  • Read scientific statement for the project here
  • Collaborated remotely with researchers at Durham University (UK) who possessed a large set of cosmological simulation data for both Modified Gravity (MG) cosmology and Lambda Cold Dark Matter (LCDM) cosmology.
  • Crafted statistical algorithms in Python for particle interpolation (cloud-in-cell and nearest-neighbor algorithms), velocity field decomposition (fast Fourier transform and particle-mesh method), and matter and velocity power spectra calculation. Compared the power spectra of the MG and LCDM cosmologies.
  • Wrote a report on my scientific findings. Gave an award-winning presentation on my research at the Midstates Consortium for Math and Science.
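
To illustrate the cloud-in-cell interpolation and power spectrum steps listed above, here is a minimal 1D sketch in Python with NumPy. It is not the project code: the function names, the periodic 1D box, unit-mass particles, and the normalization P(k) = L |delta_k|^2 are all assumptions chosen to keep the example short.

```python
import numpy as np

def cic_deposit_1d(positions, n_cells, box_size):
    """Deposit unit-mass particles on a periodic 1D grid with cloud-in-cell weights."""
    dx = box_size / n_cells
    # Particle position in cell units, measured from the centre of cell 0.
    s = np.asarray(positions, dtype=float) / dx - 0.5
    left = np.floor(s).astype(int)   # cell whose centre lies just left of the particle
    w_right = s - left               # share of the mass that goes to the right-hand cell
    grid = np.zeros(n_cells)
    # np.add.at accumulates correctly when several particles land in the same cell.
    np.add.at(grid, left % n_cells, 1.0 - w_right)
    np.add.at(grid, (left + 1) % n_cells, w_right)
    return grid

def power_spectrum_1d(grid, box_size):
    """Return (k, P(k)) of the density contrast, excluding the k = 0 mode."""
    delta = grid / grid.mean() - 1.0
    delta_k = np.fft.rfft(delta) / grid.size
    k = 2.0 * np.pi * np.fft.rfftfreq(grid.size, d=box_size / grid.size)
    power = box_size * np.abs(delta_k) ** 2   # assumed convention: P(k) = L |delta_k|^2
    return k[1:], power[1:]
```

A particle sitting exactly on the boundary between two cells contributes half of its mass to each, and the total deposited mass always equals the number of particles. The same weighting extends to 3D by multiplying the per-axis weights.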

Forecasting Weak Lensing Observation for LSST

How do we measure the matter and structure of the universe? Imagine peeling an onion layer by layer: we can do the same to the universe! This process is called tomographic lensing. But how do we make sure we peel the onion in a way that always gives us new information about it? That's what I've worked on!

  • Read scientific statement for the project here
  • Built a robust Python package to forecast the weak lensing observational schedule for LSST (Legacy Survey of Space and Time).
  • Developed an optimization algorithm to maximize the new information obtained from trial weak lensing survey runs, using Bayesian statistics and Fisher information matrices.
  • Automated the iterative process of finding the optimal observational schedule based on all previously obtained data.
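
The idea of choosing the next observation by the information it adds can be sketched with Fisher matrices. The snippet below is a toy illustration, not the package itself: it assumes Gaussian data with a parameter-independent covariance, and the function names and the log-determinant selection rule are hypothetical choices made for this example.

```python
import numpy as np

def fisher_matrix(jacobian, data_cov):
    """F = J^T C^-1 J, where J holds derivatives of the observables w.r.t. the parameters."""
    jacobian = np.atleast_2d(np.asarray(jacobian, dtype=float))
    data_cov = np.atleast_2d(np.asarray(data_cov, dtype=float))
    return jacobian.T @ np.linalg.solve(data_cov, jacobian)

def marginalized_errors(fisher):
    """Forecast 1-sigma error on each parameter after marginalizing over the others."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

def best_candidate(current_fisher, candidate_fishers):
    """Index of the candidate whose added information maximizes log det F (an assumed criterion)."""
    gains = [np.linalg.slogdet(current_fisher + f)[1] for f in candidate_fishers]
    return int(np.argmax(gains))
```

Fisher matrices from independent observations add, so each trial run can be scored by how much it tightens the combined constraints before it is scheduled.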

Galactic Dynamics, Compact Objects, and Mergers with Cluster-Monte-Carlo Simulation

Wonder what astrophysical events produce the gravitational waves that LIGO observes and that LISA is designed to observe? Recently, scientists confirmed through observation that a massive black hole resides at the center of our Milky Way galaxy. That's the kind of result we can infer from simulations before it is observed. In this project, we simulate stellar dynamics in different galactic environments (some in dense globular clusters, some in sparse galactic fields). When a merger occurs in a simulation, we calculate its gravitational waveform. By doing this, we build a catalogue of compact object mergers.

  • Read scientific statement for the project here
  • One of the first team members to implement and troubleshoot the CMC-COSMIC (Cluster Monte Carlo) simulation code for cluster dynamics and stellar evolution.
  • Identified issues in the CMC source code. Improved program correctness and efficiency using unit, regression, and integration tests. Developed software engineering solutions to answer scientific questions on galactic dynamics.
  • Implemented a streamlined workflow for running 80 CMC simulations on a remote Linux cluster using shell scripting and Python subprocesses.
  • Created visualizations of black hole mergers from simulation results with matplotlib and seaborn.
  • Formalized scientific solutions to galactic dynamics problems by calculating cluster profiles from simulation outputs using various Python data science packages (NumPy, SciPy, pandas, scikit-learn).

Optical Coherence Tomography

  • Built a modified Mach-Zehnder interferometer, with the ultimate goal of enhancing optical coherence tomography technology in the medical imaging field.
  • Wrote numerical analysis scripts in Matlab to analyze the interference pattern and hysteresis effects.
  • Designed LabVIEW interface programs to control and calibrate the interferometer arms.
  • Click here to read the presentation slides

Detailed scientific statements coming later.

Ising Model of Ferromagnetism

  • Wrote a Metropolis-Hastings Markov chain Monte Carlo algorithm in Matlab to simulate spin orientation changes during the phase transition.
  • Created 3D visualizations using Matlab.
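
For readers unfamiliar with the method, here is a minimal Metropolis sketch of the 2D Ising model. It is written in Python with NumPy rather than the Matlab used in the project, and the lattice size, coupling J = 1, zero external field, cold start, and function names are assumptions for illustration only.

```python
import numpy as np

def metropolis_sweep(spins, beta, rng):
    """One Metropolis sweep over a periodic square lattice (J = 1, no external field)."""
    size = spins.shape[0]
    for _ in range(spins.size):
        i, j = rng.integers(0, size, size=2)
        neighbours = (spins[(i + 1) % size, j] + spins[(i - 1) % size, j]
                      + spins[i, (j + 1) % size] + spins[i, (j - 1) % size])
        delta_e = 2.0 * spins[i, j] * neighbours   # energy change if spin (i, j) flips
        # Always accept moves that lower the energy; otherwise accept with exp(-beta * dE).
        if delta_e <= 0 or rng.random() < np.exp(-beta * delta_e):
            spins[i, j] *= -1
    return spins

def mean_abs_magnetization(size, temperature, n_sweeps, seed=0):
    """Absolute magnetization per spin after n_sweeps, starting from all spins up."""
    rng = np.random.default_rng(seed)
    spins = np.ones((size, size), dtype=int)
    for _ in range(n_sweeps):
        metropolis_sweep(spins, 1.0 / temperature, rng)
    return abs(spins.mean())
```

Running this at temperatures below and above the critical temperature (about 2.27 in these units) shows the lattice staying ordered at low temperature and losing its net magnetization at high temperature.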

Detailed scientific statements coming later.

Data Science Projects

On top of the data science experience I gained from conducting research in physics, I have also worked on a few side projects for fun and enrichment. I have extensive experience with many different types of simulations and statistical analysis, including classical regression, clustering, logistic regression, Markov chain Monte Carlo, Brownian motion, principal component analysis, Fourier analysis, forecasting, Bayesian statistics, and network analysis.

Investment Portfolio Optimization Using Monte Carlo Simulation

  • Prepared stock market data from various sources. Performed portfolio analysis using CAPM (Capital Asset Pricing Model).
  • Performed portfolio optimization using 2000 Monte Carlo simulations.
  • Optimized arbitrary initial portfolio weights by maximizing sharpe_ratio using SLSQP (Sequential Least Squares Programming).
  • On average, the optimizer increased the expected annual return by 73.99% and the expected Sharpe ratio by 25.40%, making the investment more profitable and less volatile at the same time.
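
The Monte Carlo step above can be sketched as a random search over portfolio weights. This is an illustration under stated assumptions, not the project code: the function name is hypothetical, the weights are long-only and fully invested (Dirichlet draws), and the expected returns and covariance matrix are made-up numbers rather than market data.

```python
import numpy as np

def best_random_portfolio(mean_returns, cov, n_trials=2000, risk_free=0.0, seed=0):
    """Draw random weight vectors and return the one with the highest Sharpe ratio."""
    mean_returns = np.asarray(mean_returns, dtype=float)
    cov = np.asarray(cov, dtype=float)
    rng = np.random.default_rng(seed)
    # Dirichlet samples are non-negative and sum to 1.
    weights = rng.dirichlet(np.ones(mean_returns.size), size=n_trials)
    returns = weights @ mean_returns
    # Portfolio volatility sqrt(w^T C w), evaluated for every trial at once.
    vols = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov, weights))
    sharpe = (returns - risk_free) / vols
    best = int(np.argmax(sharpe))
    return weights[best], returns[best], vols[best], sharpe[best]

# Made-up annualized inputs for three hypothetical assets.
example_means = [0.08, 0.12, 0.15]
example_cov = [[0.04, 0.01, 0.00],
               [0.01, 0.09, 0.02],
               [0.00, 0.02, 0.16]]
w, r, v, s = best_random_portfolio(example_means, example_cov)
```

A random search like this gives a good starting point; a constrained optimizer such as SLSQP (available in SciPy through scipy.optimize.minimize with method="SLSQP") can then refine the weights.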

Bank Customer Segmentation Using Unsupervised Learning

  • Used unsupervised learning (K-means clustering) to perform customer segmentation on a bank customer data set.
  • Visualized high-dimensional data in 2D using Principal Component Analysis (PCA), Multidimensional Scaling (MDS), and t-distributed Stochastic Neighbor Embedding (t-SNE).
  • Applied dimensionality reduction by training an autoencoder neural network.
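
As a rough picture of the first two steps, the sketch below implements K-means (Lloyd's algorithm) and a PCA projection with plain NumPy. It is not the project code: the function names and the optional init_centers argument are hypothetical, and it skips refinements such as feature scaling and smarter initialization.

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0, init_centers=None):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre updates."""
    data = np.asarray(data, dtype=float)
    if init_centers is None:
        rng = np.random.default_rng(seed)
        centers = data[rng.choice(len(data), size=k, replace=False)]
    else:
        centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centre if a cluster ends up empty.
        new_centers = np.array([
            data[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def pca_project(data, n_components=2):
    """Project mean-centred data onto its leading principal components via SVD."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Plotting the two PCA coordinates colored by the K-means label gives a quick 2D view of how well the segments separate.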

Simulate Trading Process and Stock Market Behavior Using Stochastic Dealer Model

  • Explored the stochastic dealer model from a paper published in Physical Review E in 2008, during a global economic recession. The study aims to understand the economic impacts of different trading behaviors.
  • Fetched real-time stock data from Yahoo using the pandas_datareader API. Calculated statistical properties of the 1-, 30-, 90-, and 180-day logarithmic change in price. Forecast future stock prices using regression and LSTM.
  • Used stochastic process theories to simulate and visualize the trading process as a 2D random walk. Characterized the impacts of different trend-following and trend-contrarian behaviors on market prices and returns.
  • Created a dealer model that resembles real market behavior in its statistical properties.

Does Knowledge Change Fate: A Study on Education Equality and Social Mobility in China

  • This research was the final product of a second-year undergraduate intermediate statistics course.
  • Collected data from multiple sources and constructed dummy variables for analysis.
  • Conducted statistical analysis in R using multiple logistic regression and ANOVA.
  • Identified key factors for social mobility from statistical analysis and from past research, including Thirteen Economic Facts about Social Mobility and the Role of Education (2013). Constructed a statistical profile of factors for social mobility specifically for Chinese data.
  • We found that getting a bachelor's degree increases people's upward social mobility by 64% compared to getting a technical education or junior college degree.

Data Engineering and DevOps

IBM Data Engineering Professional [Github repo] [Certificate]

  1. Introduction to Data Engineering (finished)
  2. Python for Data Science, AI & Development (finished)
  3. Python Project for Data Engineering (finished)
  4. Introduction to Relational Databases (RDBMS) (finished)
  5. Databases and SQL for Data Science with Python (finished)
  6. Hands-on Introduction to Linux Commands and Shell Scripting (finished)
  7. Relational Database Administration (DBA) (finished)
  8. ETL and Data Pipelines with Shell, Airflow and Kafka (finished)
  9. Getting Started with Data Warehousing and BI Analytics (finished)
  10. Introduction to NoSQL Databases (finished)
  11. Introduction to Big Data with Spark and Hadoop (finished)
  12. Data Engineering and Machine Learning using Spark (finished)
  13. Data Engineering Capstone Project (finished)

HealthCare Prescriber Data ETL Pipeline on Google Cloud Using pyspark

  • Used software engineering practices (error handling, logging, encapsulation, inheritance) to create a data pipeline (ingestion, preprocessing, transformation, storage, persistence, and transfer) in PySpark.
  • Used pyspark, logging, datetime, os, sys, re (regular expressions), and pyspark.sql.functions in building the pipeline.
  • Installed a single-node cluster on Google Cloud and integrated the cluster with Spark to run the pipeline.
  • Performed data transfer from HDFS storage to local storage, then to AWS S3 and Azure Blob Storage. Added data persistence in a Hive database.

About Me

A physicist by training, Xiaoqi is interested in technologies and cultures that make the world better. Xiaoqi is autistic. Outside of her professional life, she is a neurodiversity advocate and a content contributor at https://au-ti.com/.

Originally from Wuhan, China (the city that changed your life forever; also, Xiaoqi's mom worked at the Wuhan Institute of Virology from 1990 to 1998), Xiaoqi studied in New Hampshire during high school, attended college in Minnesota, and went to graduate school in Pennsylvania. She obtained her bachelor's degree in Physics and Mathematics and her master's degree in Physics. She now lives in Amsterdam and hopes to go into industry in Data Science and DevOps.

If you are interested in talking with Xiaoqi about physics, data science, neurodiversity, technology, or anything else, or if you are interested in working with Xiaoqi, please contact her using the form below.

Harvard Business Review on "Neurodiversity is a competitive advantage"
