Project Start Date: November 2020
Latest Update: 11th June, 2022
This project aims to create my personal resume website, both for practising my web development skills and for job-hunting purposes.
All the skills used in this project were obtained through self-learning; they include:
The most challenging part was probably making the website responsive. At the beginning of the project, I felt a bit frustrated trying to understand and apply the concepts of media queries and responsive layout. After searching for explanations on YouTube and reading through tutorials on W3Schools, I felt much more comfortable with the concepts. In the actual coding, I made use of the Bootstrap 5 framework to manipulate the layout easily, for example hiding elements in the navigation bar when the screen is small, or adjusting the number of columns on the webpage as the screen size changes.

It was good to see myself start from scratch and create a website that I think looks good, and to be able to record my career journey with a personalized site. All the challenges I faced in this project helped me grow my skills and knowledge in web development!
Project Start Date: April 2021
Latest Update: June 2021
This project started for the purpose of getting hands-on experience in mobile application development and understanding the concepts of software engineering. It was guided by software engineers at Credit Suisse and was the project for the INSPIRE Women in Technology Program, which I joined from April to June 2021.
The project aims to create a task-tracking mobile application (i.e. a TO-DO app) that has a login screen and a task screen. The login screen validates user input, and the task screen stores tasks. Here's a simple demo of it: Go to YouTube
Mentored by Dave and Waqas, as mentioned on my Resume Page, and through self-learning, I used the following tech stack in the project:
The biggest challenge I faced in those 2 months was probably when I tried to use SQLite to store the tasks that the user inputs into a database. I spent so much time writing the code and debugging the warnings in my program. Those 2 weeks were really frustrating: I just kept getting errors and warnings, and the app didn't work as expected at all. I got really stuck and didn't know how to fix it.

So I asked for help from Dave and Waqas, my mentor and buddy. They guided me through it: reviewing my code, showing me how to do trial and error, and figuring out which part of the code was wrong by commenting some parts out (i.e. the debugging mindset). They also explained some new concepts that I found quite difficult to understand (e.g. Promises, async functions) as well as some common practices (e.g. how a developer normally stores data). In the end, everything worked well, and I really enjoyed that feeling of satisfaction. I hope I will have more opportunities to use databases in my projects!
Project Start Date: April 2021
Project End Date: May 2021
This project was assigned by a viAct.ai AI Engineer as my internship task. I was given a classification model file in ONNX format and was asked to create a model inference workflow in Python.
In the code, I created a function that takes in an image of a person, preprocesses it into the required format, passes it to the classification model, and applies a softmax function to the output, which then represents the probabilities of a person wearing a helmet, a person without a helmet, and undetermined, respectively.
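As a rough sketch of that post-processing step (the class order and label names here are assumptions for illustration; the real workflow would first run the ONNX model via `onnxruntime.InferenceSession(...).run(...)` to obtain the logits):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def postprocess(logits):
    """Map raw model outputs to class probabilities.
    Label names and ordering are assumed for this sketch."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    labels = ["helmet", "no_helmet", "undetermined"]
    return dict(zip(labels, probs.tolist()))
```

For example, `postprocess([2.0, 0.5, -1.0])` returns a dictionary of three probabilities summing to 1, with "helmet" as the most likely class.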
In addition, I created a task scheduled at 2 am every day using Prefect. It first looks at the user-specific directories where all the images are located, then performs inference on each image found with the function described above, and saves the inference results in JSON format. Finally, it moves all the processed images to user-specific directories after the inference.
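Leaving out the Prefect scheduling wrapper, the nightly job could be sketched roughly like this (the directory layout, file extension, and `classify` callback are illustrative assumptions, not the actual internship code):

```python
import json
import shutil
from pathlib import Path

def run_nightly_inference(input_dir, output_dir, classify):
    """Scan input_dir for images, run `classify` on each one,
    save all results as a JSON file, then move the processed
    images into output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for image in sorted(input_dir.glob("*.jpg")):
        results[image.name] = classify(image)  # e.g. class probabilities
        shutil.move(str(image), str(output_dir / image.name))
    with open(output_dir / "results.json", "w") as f:
        json.dump(results, f, indent=2)
    return results
```

In Prefect, a function like this would be wrapped as a task/flow and attached to a 2 am cron-style schedule.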
At the beginning of this project, I knew only some Python basics and had no knowledge of ONNX or of libraries like prefect, json, os, pathlib, etc. I started with all of them by reading through the documentation and figuring out the code on my own. The full tech stack used is shown below:
I found every part of it challenging, since I had no idea what to do when I received the task from my supervisor. Many of the tools mentioned in the task description were new to me, and I started from scratch. I was really lost at the time, but eventually, after dividing the task into smaller parts (e.g. learning about onnxruntime first, then prefect, then file import and export, ...), I was able to work out a solution and accomplish the task. I found it a very fruitful learning experience!
Project Start Date: 11th August 2021
Project End Date: 13th August 2021
This project was done in my coding bootcamp organized by the FDM Group. It enabled me to consolidate the Python knowledge I learnt at university and in the bootcamp.
In the game, there are 3 difficulty levels (easy, medium, and hard), with fewer lives accordingly. Player 1 writes down a word for Player 2 to guess, and Player 2 keeps guessing until his/her lives are all used up. I used the Python library Turtle to create a graphical interface for a better gaming experience. Here's a simple demo of it: Go to YouTube
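The core guessing loop (without the Turtle graphics) might look something like this; the number of lives per difficulty level here is a made-up value for illustration:

```python
# Assumed lives per difficulty, for illustration only.
LIVES = {"easy": 10, "medium": 7, "hard": 5}

def play_round(secret, guesses, difficulty="easy"):
    """Run a hangman round over an iterable of guessed letters.
    Returns (won, lives_left, revealed_word)."""
    lives = LIVES[difficulty]
    revealed = set()
    for letter in guesses:
        if lives == 0:
            break
        if letter in secret:
            revealed.add(letter)
            if set(secret) <= revealed:  # every letter found
                return True, lives, secret
        else:
            lives -= 1  # wrong guess costs one life
    shown = "".join(c if c in revealed else "_" for c in secret)
    return False, lives, shown
```

For instance, `play_round("cat", ["c", "x", "a", "t"])` wins with one life lost, while ten wrong guesses on easy mode exhaust all lives.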
The biggest challenge of this project was creating the graphical interface with Turtle. It was hard to know which coordinates the turtle had to go to in order to draw a good-looking interface. In the end, I managed it by first designing the graphics on online graph paper and then coding the coordinates accordingly.
Project Start Date: February 2022
Project End Date: March 2022
This project started for the purpose of getting hands-on experience in Android mobile application development, after self-learning Kotlin and app development during the winter of 2021, and to let me revise course materials on my phone with a nicely designed app. Here's a simple demo of it: Go to YouTube
I started like a blank sheet of paper, so initially I was lost about where to start learning. There were too many things to learn to build a good application: the 4 basic app components (Activities, Services, Broadcast Receivers, and Content Providers), app architecture and how to divide an app into UI and data layers, how to debug, how to write unit and instrumented tests, how to do navigation, etc. I also had to learn a new language, Kotlin, because from my research, Google had announced that Kotlin would be the official language for Android, and I wanted to follow the latest trend.

All of this made the learning process tough: not only did I have to keep my motivation up, but I also had to plan my learning step by step and implement the plan well. I am proud that I was able to make a cool-looking Android application, and I think project planning was the most challenging part of this project.
Project Start Date: March 2022
Project End Date: April 2022
This project was the 1st course project for 'Advanced Data Mining for Risk Management and Business Intelligence'. The aim was to perform sentiment analysis and rank the attitude of each review in the data set from 1 to 5.
First, exploratory data analysis was performed, for example checking whether the data is balanced, examining characteristics of the review text (e.g., UPPERCASE, punctuation, tagging, etc.), N-gram frequencies, the correlation between features and the label using a correlation matrix, and the distributions of the reviews' word counts, sentence counts, etc.
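For instance, the N-gram frequency check can be sketched with a simple counter (whitespace tokenization here is a simplification of whatever preprocessing the project actually used):

```python
from collections import Counter

def ngram_counts(texts, n=2):
    """Count word-level n-grams across a list of reviews."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization
        counts.update(tuple(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    return counts
```

Calling `ngram_counts(reviews).most_common(20)` then surfaces the most frequent bigrams in the corpus.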
Then 4 different model architectures were experimented with, including a 1-layer perceptron, a CNN, an RNN, and an RNN-CNN. Tuning was conducted on features and hyperparameters to improve model performance. In the end, our group passed the strong baseline given by our instructor, with a Macro-F1 score of 0.5609 (> 0.5370).
This could be said to be my first project covering the full machine learning workflow: from feature extraction to exploratory data analysis, then training 4 different model architectures, hyperparameter tuning, and performance evaluation. The whole process was challenging because I was unfamiliar with coding for machine learning tasks. This project gave me very good hands-on experience in machine learning, and I was so absorbed in it that I kept working on it for more than 10 hours a day for a whole week!
Project Start Date: April 2022
Project End Date: April 2022
This project was the 2nd course project for 'Advanced Data Mining for Risk Management and Business Intelligence'. The aim of this project was to perform link prediction between 2 users in a social network, where link prediction is done by computing node (user) similarity in the embedding space. Model performance was evaluated by the AUC-ROC score.
The pipeline of the project is as follows:
The final model of our project was the DeepWalk model with parameters node dimension = 10, walk length = 17, and number of walks = 17, which gave us a validation AUC-ROC score of 0.9346 (> strong baseline 0.9290).
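Once node embeddings are trained (by DeepWalk or otherwise), scoring candidate links by embedding similarity can be sketched as follows; this is a minimal illustration, not the project's actual code, and the AUC-ROC would then be computed from these scores against the true links:

```python
import numpy as np

def cosine_link_scores(emb, pairs):
    """Score candidate links by cosine similarity of node embeddings.
    emb: dict mapping node id -> 1-D numpy vector.
    pairs: list of (u, v) candidate links."""
    scores = []
    for u, v in pairs:
        a, b = emb[u], emb[v]
        scores.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    return scores
```

Pairs with similar embeddings score near 1 (likely links); orthogonal embeddings score near 0.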
The most challenging part of the project was visualizing the performance of different models during hyperparameter tuning. Since I wanted to tune multiple hyperparameters all at once to reduce the human monitoring effort, a heatmap was not enough to visualize the models' performance. In the end, after some research, I was able to construct a parallel coordinates plot using the plotly library, and it was a rewarding process!
Project Start Date: May 2022
Project End Date: May 2022
This project was the 3rd course project for 'Advanced Data Mining for Risk Management and Business Intelligence'. The aim of this project was to predict user ratings on items based on the available ratings. We trained and tuned 2 model architectures: Neural Collaborative Filtering (NCF) and Wide and Deep Learning (WDL). Root Mean Squared Error (RMSE) was used to evaluate prediction performance.
Our group's final model was the NCF model with epoch = 1, embedding size = 5, and an output layer using a multi-layer perceptron. We achieved an RMSE of 1.0568 (< strong baseline 1.09) on the validation set.
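The evaluation metric, together with the simplest embedding-based prediction step, can be sketched as follows (a simplified illustration, not our actual training code):

```python
import numpy as np

def predict_rating(user_vec, item_vec):
    """Simplest collaborative-filtering score: the dot product of a
    user embedding and an item embedding (NCF replaces this with a
    learned MLP on top of the embeddings)."""
    return float(np.dot(user_vec, item_vec))

def rmse(y_true, y_pred):
    """Root Mean Squared Error between true and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

A perfect prediction gives an RMSE of 0; our models were compared on this metric against the 1.09 baseline.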
This project was challenging, not because of the task, but because we had the wrong mindset at the beginning. In our course, we learnt both the NCF and the WDL models, but in our project, we initially ignored the NCF model because the WDL model used many more features and we assumed it would give better performance. We exhausted ourselves tuning the WDL model to meet the baseline provided by our instructor, and we were still far from it even with only 1 day left before the project deadline. In the end, one of my groupmates realized that the NCF model could easily pass the baseline, and we were shocked. This project taught me that a complex model does not equal good performance: in practical tasks, we have to consider which model suits our problem, not how complex it is. I have learnt that, sometimes, simple is best.
Project Start Date: April 2022
Latest Update: April 2022
This is a course assignment for 'Cloud Computing and Big Data Systems'.
Business Problem: The challenge for a power grid operator is how to handle a shortfall in available resources versus actual demand. One solution is to turn on small Peaker (Peaking) Power Plants, which have a high cost per kilowatt-hour; another is to buy expensive power from another grid. In order to make better economic trade-offs about the number of peaker plants to turn on, or whether to buy from another grid, the grid operator would like an estimate of the power output of a peaker power plant (which depends on the environmental conditions).
Task (Supervised Regression Problem): Predict power output given a set of environmental readings from various sensors (i.e., Atmospheric Temperature in °C, Exhaust Vacuum Speed, Atmospheric Pressure, and Relative Humidity) in a natural gas-fired power generation plant.
Dataset:
The pipeline of the project is as follows:
The best model in this assignment was the Random Forest model, using an ensemble of 25 trees with a depth of 8.
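A rough scikit-learn equivalent of that final model is sketched below; the assignment itself used Spark ML, and the data here is synthetic (a made-up noisy function of the four sensor readings), not the real plant dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the plant data: 4 feature columns playing the
# role of temperature, vacuum, pressure, and humidity.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))
y = (480 - 30 * X[:, 0] - 10 * X[:, 1] + 5 * X[:, 2] - 3 * X[:, 3]
     + rng.normal(0, 0.5, 500))  # power output with sensor noise

# Same ensemble shape as the assignment's best model: 25 trees, depth 8.
model = RandomForestRegressor(n_estimators=25, max_depth=8, random_state=0)
model.fit(X[:400], y[:400])
preds = model.predict(X[400:])
```

In Spark ML the analogous step is `RandomForestRegressor(numTrees=25, maxDepth=8)` inside a Pipeline with a VectorAssembler.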
The most challenging part of this assignment was understanding the key concepts, since some of the terms used were not taught in class, and some do not appear in machine learning outside a cloud environment. For example, I had to read through the documentation to understand the concept of a Transformer, an algorithm that transforms a DataFrame into another DataFrame (e.g., a trained ML model), and an Estimator, an algorithm that can be fit on a DataFrame to produce a Transformer (i.e., a learning algorithm that trains on a DataFrame to produce a model). On the other hand, learning Spark APIs like ParamMap and VectorAssembler was also a bit challenging, and fun XD
Project Start Date: April 2022
Project End Date: May 2022
This is the course project for 'Cloud Computing and Big Data Systems'. The aim of this project was to conduct a descriptive analysis of global COVID-19 data using the cloud computing tools and platforms learnt in class.
Problem Description: COVID-19, first identified in 2019, has grown into a global pandemic that has caused vast and irreparable devastation to all parts of the world. Though the situation has improved with the development of vaccines and measures such as mask-wearing and social distancing, there is still a long road ahead before we can truly rid ourselves of the disease. It is important to learn from past data and understand the current situation in order to adopt the best measures to fight the disease. Hence, our project analyses the global COVID-19 data and draws insights from it.
For the detailed insights obtained, please refer to the PDF Summary above.
I think the most challenging part of the project was gaining insights from the tables and charts obtained through coding, and writing the report. It was not difficult to extract the information we wanted from the dataset; simply understanding the API could do the job. However, extracting meaningful information and explaining the reasons behind what we obtained was the hardest part. In this project, I realized the importance of good English writing skills. I was responsible for the coding part, and I wrote a draft report just to explain to my groupmates what I had got from my code. They wrote another one on top of mine, and their report was far better than mine, because they explained everything more clearly, in a more organized way, and with more accurate vocabulary. After this project, I saw the need to further improve my language skills, and that was the biggest reward from this project.
Project Start Date: May 2022
Project End Date: May 2022
This is the course project for 'Computer Organization'. The aim of this project was to construct the NS-Shaft game using the MIPS assembly language taught in class, for practising procedure calls.
The most challenging part of the project was understanding the whole flow of the skeleton code given by the course instructor. Since assembly language is less readable than a high-level programming language, understanding it and creating a program flow chart was quite time-consuming. However, by being patient and reading the code step by step, it was still manageable.
Project Start Date: March 2022
Project End Date: March 2022
This is an assignment for the course 'Cloud Computing and Big Data Systems'. The aim of this assignment was to measure the CPU, memory, and network performance of different AWS EC2 instances.
Using SysBench, an open-source benchmark utility that evaluates CPU, memory, I/O, and database performance, I tested the CPU and memory performance of different AWS EC2 instances, for example t3.medium, m5.large, and c5d.large. In addition, using iPerf and Ping, I measured the TCP bandwidth and round-trip time (RTT) between 2 instances to observe the difference in network performance between instances deployed in the same region and instances deployed in different regions.
The most challenging part was learning to use the Linux command line to establish connections with and between different instances, since I was not familiar with Linux. I also struggled a lot to learn how to use SysBench for the measurements and to understand its output. For example, to measure the CPU performance of an instance, SysBench calculates prime numbers up to 10000 for 10 seconds, and I had to figure out which figure to look at in order to compare the CPU performance of different instances (i.e., the CPU speed, representing how many times it calculated primes up to 10000 within the 10 seconds). Since I hadn't learnt anything about CPUs, memory, or networking at that time, I struggled to understand what SysBench was doing. However, after some struggles, I was able to complete the tasks, and it was fulfilling.
Project Start Date: June 2022
Project End Date: August 2022
This is the research project that I took in the summer of 2022. The research topic was: Evaluation of FinBERT Performance on a Multi-Class ESG Classification Task based on the MSCI Framework.
Supervised by Professor Huang Allen Hao and Professor Yang Yi, and with the help of postgraduate students Miss Hui Wang and Mr. Srijith Kannan, I completed the research on the evaluation of FinBERT performance for multi-class ESG classification. The project included more than a month of data labelling and another month of model fine-tuning. It was a valuable experience that allowed me to write data preprocessing, training, fine-tuning, and evaluation code for as many as 11 machine learning models: Naïve Bayes, Logistic Regression, Linear SVM, Random Forest, MLP, CNN, LSTM, Bi-directional LSTM, GRU, BERT, and FinBERT, where FinBERT is a language model developed this year that is adapted to financial texts.
Beyond the ML part, it was also a fresh experience to label such a huge amount of data. I was responsible for ensuring the quality of the data labels, and it was tough work. However, I enjoyed it a lot, because the opportunity pushed me to read companies' CSR reports and understand more about different companies' ESG operations.
The most challenging part was preparing the code for BERT fine-tuning. I had never learnt about the BERT model before, and I had never used PyTorch either, so it took me some time to understand the process of loading the model and tokenizer, as well as defining the training arguments for fine-tuning. The problem was solved unexpectedly easily, because my supervisor provided sample code for fine-tuning the FinBERT model, and in the end only a few amendments to the sample code were needed to fine-tune BERT. Although the solution was simple, it led me to do a lot of research on the BERT model and gain some basic knowledge about it. It was a rewarding learning process and I enjoyed it a lot!
Project Start Date: June 2022
Project End Date: August 2022
This is a commercial project that I developed in collaboration with our HKUST RMBI + COSC alumnus Hayden Chiu during the summer of 2022. The project aims to help musicians flip their sheet music during a performance without having to flip it by hand and worry about how that would affect their playing. It makes use of computer vision technology to let users flip their sheet music with a simple head gesture, like nodding or winking.
I was responsible for the settings page, icon, splash screen, and the notification layouts and functionality. In particular, I gained practical experience with PreferenceScreen, setting the locale, themes, sending intents, etc. I was also responsible for collecting head-gesture data for training the machine learning model. Although I was not responsible for the model training part, we held meetings to share our development progress with each other for learning purposes. It was a great learning experience, and I learnt the whole process of embedding computer vision into Android mobile application development, from data collection to model training and deployment. It was fun, and I really appreciate him inviting me to this interesting project!