Project Start Date: November 2020
Latest Update: 11th June, 2022
This project aims to create my personal resume website, both for practising my web development skills and for job-hunting purposes.
All the skills used in this project were obtained through self-learning; they include:
The most challenging part was probably making the website responsive. At the beginning of the project, I felt a bit frustrated trying to understand and apply the concepts of media queries and responsive layout. After searching for explanations on YouTube and reading through tutorials on W3Schools, I felt much more comfortable with the concepts. In the actual coding, I made use of the Bootstrap 5 framework to manipulate the layout easily, for example hiding elements in the navigation bar when the screen is small, or adjusting the number of columns on the webpage as the screen size changes.

It was good to see myself start from scratch and create a website that I think looks good, and to be able to record my career journey with a personalized site. All the challenges I faced in this project helped me grow my skills and knowledge in web development!
Project Start Date: April 2021
Latest Update: June 2021
This project started for the purpose of getting hands-on experience in mobile application development and understanding the concepts of software engineering. It was guided by software engineers at Credit Suisse and was the project for the INSPIRE Women in Technology Program, which I joined from April to June 2021.
The project aims to create a task-tracking mobile application (i.e. a TO-DO app) that has a login screen and a task screen. The login screen validates user input, and the task screen stores tasks. Here's a simple demo of it: Go to YouTube
Mentored by Dave and Waqas, as mentioned on my Resume Page, and through self-learning, I used the following tech stack in the project:
The biggest challenge I faced in those 2 months was probably when I tried to use SQLite to store the tasks that the user inputs into a database. I spent so much time writing the code and debugging the warnings in my program. Those 2 weeks were really frustrating: I just kept getting errors and warnings, and the app didn't work as expected at all. I got really stuck and didn't know how to fix it.

So I asked for help from Dave and Waqas, my mentor and buddy. They guided me through it: reviewing my code, showing me how to do trial and error, and figuring out which part of the code was wrong by commenting some parts out (i.e. the debugging mindset). They also explained some new concepts that I found quite difficult to understand (e.g. Promises, async functions) as well as some common practices (e.g. how a developer normally stores data). In the end, everything worked well, and I really enjoyed that feeling of satisfaction. I hope I will have more opportunities to use databases in my projects!
Project Start Date: April 2021
Project End Date: May 2021
This project was assigned by a viAct.ai AI Engineer as my internship task. I was given a classification model file in ONNX format and was asked to create a model inference workflow in Python.
In the code, I created a function that takes in an image of a person, preprocesses it into the required format, passes it to the classification model, and applies a softmax function to the output, which then represents the probabilities of a person wearing a helmet, a person without a helmet, and undetermined, respectively.
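As a rough sketch of that post-processing step (the class order and label names here are assumptions for illustration; the real workflow would first run the ONNX model via `onnxruntime.InferenceSession(...).run(...)` to obtain the logits):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def postprocess(logits):
    """Map raw model outputs to class probabilities.
    Label names and ordering are assumed for this sketch."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    labels = ["helmet", "no_helmet", "undetermined"]
    return dict(zip(labels, probs.tolist()))
```

For example, `postprocess([2.0, 0.5, -1.0])` returns a dictionary of three probabilities summing to 1, with "helmet" as the most likely class.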
In addition, I created a task scheduled at 2 am every day using Prefect. It first looks at the user-specific directories where all the images are located, then performs inference on each image found with the function described above, and saves the inference results in JSON format. Finally, it moves all the processed images to user-specific directories after the inference.
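Leaving out the Prefect scheduling wrapper, the nightly job could be sketched roughly like this (the directory layout, file extension, and `classify` callback are illustrative assumptions, not the actual internship code):

```python
import json
import shutil
from pathlib import Path

def run_nightly_inference(input_dir, output_dir, classify):
    """Scan input_dir for images, run `classify` on each one,
    save all results as a JSON file, then move the processed
    images into output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for image in sorted(input_dir.glob("*.jpg")):
        results[image.name] = classify(image)  # e.g. class probabilities
        shutil.move(str(image), str(output_dir / image.name))
    with open(output_dir / "results.json", "w") as f:
        json.dump(results, f, indent=2)
    return results
```

In Prefect, a function like this would be wrapped as a task/flow and attached to a 2 am cron-style schedule.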
At the beginning of this project, I knew only some Python basics and had no knowledge of ONNX or of libraries like prefect, json, os, pathlib, etc. I started with all of them by reading through the documentation and figuring out the code on my own. The full tech stack used is shown below:
I found every part of it challenging, since I had no idea what to do when I received the task from my supervisor. Many of the tools mentioned in the task description were new to me, and I started from scratch. I was really lost at the time, but eventually, after dividing the task into smaller parts (e.g. learning about onnxruntime first, then prefect, then file import and export, ...), I was able to work out a solution and accomplish the task. I found it a very fruitful learning experience!
Project Start Date: 11th August 2021
Project End Date: 13th August 2021
This project was done in my coding bootcamp organized by the FDM Group. It enabled me to consolidate the Python knowledge I learnt at university and in the bootcamp.
In the game, there are 3 difficulty levels (easy, medium, and hard), with fewer lives accordingly. Player 1 writes down a word for Player 2 to guess, and Player 2 keeps guessing until his/her lives are all used up. I used the Python library Turtle to create a graphical interface for a better gaming experience. Here's a simple demo of it: Go to YouTube
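The core guessing loop (without the Turtle graphics) might look something like this; the number of lives per difficulty level here is a made-up value for illustration:

```python
# Assumed lives per difficulty, for illustration only.
LIVES = {"easy": 10, "medium": 7, "hard": 5}

def play_round(secret, guesses, difficulty="easy"):
    """Run a hangman round over an iterable of guessed letters.
    Returns (won, lives_left, revealed_word)."""
    lives = LIVES[difficulty]
    revealed = set()
    for letter in guesses:
        if lives == 0:
            break
        if letter in secret:
            revealed.add(letter)
            if set(secret) <= revealed:  # every letter found
                return True, lives, secret
        else:
            lives -= 1  # wrong guess costs one life
    shown = "".join(c if c in revealed else "_" for c in secret)
    return False, lives, shown
```

For instance, `play_round("cat", ["c", "x", "a", "t"])` wins with one life lost, while ten wrong guesses on easy mode exhaust all lives.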
The biggest challenge of this project was creating the graphical interface with Turtle. It was hard to know which coordinates the turtle had to go to in order to draw a good-looking interface. In the end, I managed it by first designing the graphics on online graph paper and then coding the coordinates accordingly.
Project Start Date: February 2022
Project End Date: March 2022
This project started for the purpose of getting hands-on experience in Android mobile application development, after self-learning Kotlin and app development during the winter of 2021, and to let me revise course materials on my phone with a nicely designed app. Here's a simple demo of it: Go to YouTube
I started like a blank sheet of paper, so initially I was lost about where to start learning. There were too many things to learn to build a good application: the 4 basic app components (Activities, Services, Broadcast Receivers, and Content Providers), app architecture and how to divide an app into UI and data layers, how to debug, how to write unit and instrumented tests, how to do navigation, etc. I also had to learn a new language, Kotlin, because from my research, Google had announced that Kotlin would be the official language for Android, and I wanted to follow the latest trend.

All of this made the learning process tough: not only did I have to keep my motivation up, but I also had to plan my learning step by step and implement the plan well. I am proud that I was able to make a cool-looking Android application, and I think project planning was the most challenging part of this project.
Project Start Date: March 2022
Project End Date: April 2022
This project was the 1st course project for 'Advanced Data Mining for Risk Management and Business Intelligence'. The aim was to perform sentiment analysis and rank the attitude of each review in the data set from 1 to 5.
First, exploratory data analysis was performed, for example checking whether the data is balanced, examining characteristics of the review text (e.g., UPPERCASE, punctuation, tagging, etc.), N-gram frequencies, the correlation between features and the label using a correlation matrix, and the distributions of the reviews' word counts, sentence counts, etc.
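For instance, the N-gram frequency check can be sketched with a simple counter (whitespace tokenization here is a simplification of whatever preprocessing the project actually used):

```python
from collections import Counter

def ngram_counts(texts, n=2):
    """Count word-level n-grams across a list of reviews."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization
        counts.update(tuple(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    return counts
```

Calling `ngram_counts(reviews).most_common(20)` then surfaces the most frequent bigrams in the corpus.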
Then 4 different model architectures were experimented with, including a 1-layer perceptron, a CNN, an RNN, and an RNN-CNN. Tuning was conducted on features and hyperparameters to improve model performance. In the end, our group passed the strong baseline given by our instructor, with a Macro-F1 score of 0.5609 (> 0.5370).
This could be said to be my first project covering the full machine learning workflow: from feature extraction to exploratory data analysis, then training 4 different model architectures, hyperparameter tuning, and performance evaluation. The whole process was challenging because I was unfamiliar with coding for machine learning tasks. This project gave me very good hands-on experience in machine learning, and I was so absorbed in it that I kept working on it for more than 10 hours a day for a whole week!
Project Start Date: April 2022
Project End Date: April 2022
This project was the 2nd course project for 'Advanced Data Mining for Risk Management and Business Intelligence'. The aim of this project was to perform link prediction between 2 users in a social network, where link prediction is done by computing node (user) similarity in the embedding space. Model performance was evaluated by the AUC-ROC score.
The pipeline of the project is as follows:
The final model of our project was the DeepWalk model with parameters node dimension = 10, walk length = 17, and number of walks = 17, which gave us a validation AUC-ROC score of 0.9346 (> strong baseline 0.9290).
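Once node embeddings are trained (by DeepWalk or otherwise), scoring candidate links by embedding similarity can be sketched as follows; this is a minimal illustration, not the project's actual code, and the AUC-ROC would then be computed from these scores against the true links:

```python
import numpy as np

def cosine_link_scores(emb, pairs):
    """Score candidate links by cosine similarity of node embeddings.
    emb: dict mapping node id -> 1-D numpy vector.
    pairs: list of (u, v) candidate links."""
    scores = []
    for u, v in pairs:
        a, b = emb[u], emb[v]
        scores.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    return scores
```

Pairs with similar embeddings score near 1 (likely links); orthogonal embeddings score near 0.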
The most challenging part of the project was visualizing the performance of different models during hyperparameter tuning. Since I wanted to tune multiple hyperparameters all at once to reduce the human monitoring effort, a heatmap was not enough to visualize the models' performance. In the end, after some research, I was able to construct a parallel coordinates plot using the plotly library, and it was a rewarding process!
Project Start Date: May 2022
Project End Date: May 2022
This project was the 3rd course project for 'Advanced Data Mining for Risk Management and Business Intelligence'. The aim of this project was to predict user ratings on items based on the available ratings. We trained and tuned 2 model architectures: Neural Collaborative Filtering (NCF) and Wide and Deep Learning (WDL). Root Mean Squared Error (RMSE) was used to evaluate prediction performance.
Our group's final model was the NCF model with epoch = 1, embedding size = 5, and an output layer using a multi-layer perceptron. We achieved an RMSE of 1.0568 (< strong baseline 1.09) on the validation set.
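The evaluation metric, together with the simplest embedding-based prediction step, can be sketched as follows (a simplified illustration, not our actual training code):

```python
import numpy as np

def predict_rating(user_vec, item_vec):
    """Simplest collaborative-filtering score: the dot product of a
    user embedding and an item embedding (NCF replaces this with a
    learned MLP on top of the embeddings)."""
    return float(np.dot(user_vec, item_vec))

def rmse(y_true, y_pred):
    """Root Mean Squared Error between true and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

A perfect prediction gives an RMSE of 0; our models were compared on this metric against the 1.09 baseline.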
This project was challenging, not because of the task, but because we had the wrong mindset at the beginning. In our course, we learnt both the NCF and the WDL models, but in our project, we initially ignored the NCF model because the WDL model used many more features and we assumed it would give better performance. We exhausted ourselves tuning the WDL model to meet the baseline provided by our instructor, and we were still far from it even with only 1 day left before the project deadline. In the end, one of my groupmates realized that the NCF model could easily pass the baseline, and we were shocked. This project taught me that a complex model does not equal good performance: in practical tasks, we have to consider which model suits our problem, not how complex it is. I have learnt that, sometimes, simple is best.
Project Start Date: April 2022
Latest Update: April 2022
This is a course assignment for 'Cloud Computing and Big Data Systems'.
Business Problem: The challenge for a power grid operator is how to handle a shortfall in available resources versus actual demand. One solution is to turn on small Peaker (Peaking) Power Plants, which have a high cost per kilowatt-hour; another is to buy expensive power from another grid. In order to make better economic trade-offs about the number of peaker plants to turn on, or whether to buy from another grid, the grid operator would like an estimate of the power output of a peaker power plant (which depends on the environmental conditions).
Task (Supervised Regression Problem): Predict power output given a set of environmental readings from various sensors (i.e., Atmospheric Temperature in °C, Exhaust Vacuum Speed, Atmospheric Pressure, and Relative Humidity) in a natural gas-fired power generation plant.
Dataset:
The pipeline of the project is as follows:
The best model in this assignment was the Random Forest model, using an ensemble of 25 trees with a depth of 8.
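A rough scikit-learn equivalent of that final model is sketched below; the assignment itself used Spark ML, and the data here is synthetic (a made-up noisy function of the four sensor readings), not the real plant dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the plant data: 4 feature columns playing the
# role of temperature, vacuum, pressure, and humidity.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))
y = (480 - 30 * X[:, 0] - 10 * X[:, 1] + 5 * X[:, 2] - 3 * X[:, 3]
     + rng.normal(0, 0.5, 500))  # power output with sensor noise

# Same ensemble shape as the assignment's best model: 25 trees, depth 8.
model = RandomForestRegressor(n_estimators=25, max_depth=8, random_state=0)
model.fit(X[:400], y[:400])
preds = model.predict(X[400:])
```

In Spark ML the analogous step is `RandomForestRegressor(numTrees=25, maxDepth=8)` inside a Pipeline with a VectorAssembler.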
The most challenging part of this assignment was understanding the key concepts, since some of the terms used were not taught in class, and some do not appear in machine learning outside a cloud environment. For example, I had to read through the documentation to understand the concept of a Transformer, an algorithm that transforms a DataFrame into another DataFrame (e.g., a trained ML model), and an Estimator, an algorithm that can be fit on a DataFrame to produce a Transformer (i.e., a learning algorithm that trains on a DataFrame to produce a model). On the other hand, learning Spark APIs like ParamMap and VectorAssembler was also a bit challenging, and fun XD
Project Start Date: April 2022
Project End Date: May 2022
This is the course project for 'Cloud Computing and Big Data Systems'. The aim of this project was to conduct a descriptive analysis of global COVID-19 data using the cloud computing tools and platforms learnt in class.
Problem Description: COVID-19, first identified in 2019, has grown into a global pandemic that has caused vast and irreparable devastation to all parts of the world. Though the situation has improved with the development of vaccines and measures such as mask-wearing and social distancing, there is still a long road ahead before we can truly rid ourselves of the disease. It is important to learn from past data and understand the current situation in order to adopt the best measures to fight the disease. Hence, our project analyses the global COVID-19 data and draws insights from it.
For the detailed insights obtained, please refer to the PDF Summary above.
I think the most challenging part of the project was gaining insights from the tables and charts obtained through coding, and writing the report. It was not difficult to extract the information we wanted from the dataset; simply understanding the API could do the job. However, extracting meaningful information and explaining the reasons behind what we obtained was the hardest part. In this project, I realized the importance of good English writing skills. I was responsible for the coding part, and I wrote a draft report just to explain to my groupmates what I had got from my code. They wrote another one on top of mine, and their report was far better than mine, because they explained everything more clearly, in a more organized way, and with more accurate vocabulary. After this project, I saw the need to further improve my language skills, and that was the biggest reward from this project.
Project Start Date: May 2022
Project End Date: May 2022
This is the course project for 'Computer Organization'. The aim of this project was to construct the NS-Shaft game using the MIPS assembly language taught in class, for practising procedure calls.
The most challenging part of the project was understanding the whole flow of the skeleton code given by the course instructor. Since assembly language is less readable than a high-level programming language, understanding it and creating a program flow chart was quite time-consuming. However, by being patient and reading the code step by step, it was still manageable.
Project Start Date: March 2022
Project End Date: March 2022
This is an assignment for the course 'Cloud Computing and Big Data Systems'. The aim of this assignment was to measure the CPU, memory, and network performance of different AWS EC2 instances.
Using SysBench, an open-source benchmark utility that evaluates CPU, memory, I/O, and database performance, I tested the CPU and memory performance of different AWS EC2 instances, for example t3.medium, m5.large, and c5d.large. In addition, using iPerf and Ping, I measured the TCP bandwidth and round-trip time (RTT) between 2 instances to observe the difference in network performance between instances deployed in the same region and instances deployed in different regions.
The most challenging part was learning to use the Linux command line to establish connections with and between different instances, since I was not familiar with Linux. I also struggled a lot to learn how to use SysBench for the measurements and to understand its output. For example, to measure the CPU performance of an instance, SysBench calculates prime numbers up to 10000 for 10 seconds, and I had to figure out which figure to look at in order to compare the CPU performance of different instances (i.e., the CPU speed, representing how many times it calculated primes up to 10000 within the 10 seconds). Since I hadn't learnt anything about CPUs, memory, or networking at that time, I struggled to understand what SysBench was doing. However, after some struggles, I was able to complete the tasks, and it was fulfilling.
Project Start Date: June 2022
Project End Date: August 2022
This is the research project that I took in the summer of 2022. The research topic was: Evaluation of FinBERT Performance on a Multi-Class ESG Classification Task based on the MSCI Framework.
Supervised by Professor Huang Allen Hao and Professor Yang Yi, and with the help of postgraduate students Miss Hui Wang and Mr. Srijith Kannan, I completed the research on the evaluation of FinBERT performance for multi-class ESG classification. The project included more than a month of data labelling and another month of model fine-tuning. It was a valuable experience that allowed me to write data preprocessing, training, fine-tuning, and evaluation code for as many as 11 machine learning models: Naïve Bayes, Logistic Regression, Linear SVM, Random Forest, MLP, CNN, LSTM, Bi-directional LSTM, GRU, BERT, and FinBERT, where FinBERT is a language model developed this year that is adapted to financial texts.
Beyond the ML part, it was also a fresh experience to label such a huge amount of data. I was responsible for ensuring the quality of the data labels, and it was tough work. However, I enjoyed it a lot, because the opportunity pushed me to read companies' CSR reports and understand more about different companies' ESG operations.
The most challenging part was preparing the code for BERT fine-tuning. I had never learnt about the BERT model before, and I had never used PyTorch either, so it took me some time to understand the process of loading the model and tokenizer, as well as defining the training arguments for fine-tuning. The problem was solved unexpectedly easily, because my supervisor provided sample code for fine-tuning the FinBERT model, and in the end only a few amendments to the sample code were needed to fine-tune BERT. Although the solution was simple, it led me to do a lot of research on the BERT model and gain some basic knowledge about it. It was a rewarding learning process and I enjoyed it a lot!
Project Start Date: June 2022
Project End Date: August 2022
This is a commercial project that I developed in collaboration with our HKUST RMBI + COSC alumnus Hayden Chiu during the summer of 2022. The project aims to help musicians flip their sheet music during a performance without having to flip it by hand and worry about how that would affect their playing. It makes use of computer vision technology to let users flip their sheet music with a simple head gesture, like nodding or winking.
I was responsible for the settings page, icon, splash screen, and the notification layouts and functionality. In particular, I gained practical experience with PreferenceScreen, setting the locale, themes, sending intents, etc. I was also responsible for collecting head-gesture data for training the machine learning model. Although I was not responsible for the model training part, we held meetings to share our development progress with each other for learning purposes. It was a great learning experience, and I learnt the whole process of embedding computer vision into Android mobile application development, from data collection to model training and deployment. It was fun, and I really appreciate him inviting me to this interesting project!