Biography
I am a third-year PhD student in Computer Science at the University of Texas at Arlington (recipient of the John S. Schuchman Outstanding Doctoral Student Award), advised by Prof. Junzhou Huang. I received my M.S. in Software Engineering from UTA and my B.E. in Software Engineering from Xi'an Jiaotong University City College.
Research
- Large Foundation Models & Multimodal Large Language Models
- Graph Neural Networks
- Deep AUC Maximization
- Long-tailed Learning
Publications
- Feng Jiang, Yuzhi Guo, Hehuan Ma, Saiyang Na, Weizhi An, Bing Song, Yi Han, Jean Gao, Tao Wang and Junzhou Huang, "AlphaEpi: Enhancing B Cell Epitope Prediction with AlphaFold 3", In Proc. of the 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB'24, Shenzhen, China, November 2024.
- Feng Jiang, Yuzhi Guo, Hehuan Ma, Saiyang Na, Wenliang Zhong, Yi Han, Tao Wang, Junzhou Huang, "GTE: a graph learning framework for prediction of T-cell receptors and epitopes binding specificity." Briefings in Bioinformatics, Volume 25, Issue 4, July 2024.
- Weizhi An, Wenliang Zhong, Feng Jiang, Hehuan Ma and Junzhou Huang, "Causal Subgraphs and Information Bottlenecks: Redefining OOD Robustness in Graph Neural Networks", In Proc. of the 18th European Conference on Computer Vision, ECCV'24, Milan, Italy, October 2024.
- Hehuan Ma, Feng Jiang, Yuzhi Guo and Junzhou Huang, "Towards Robust Self-training for Molecular Biology Prediction Tasks", Journal of Computational Biology, Volume 31, Issue 3, pp. 213-228, March 2024.
- Saiyang Na, Yuzhi Guo, Feng Jiang, Hehuan Ma and Junzhou Huang, "Segment Any Cell: A SAM-based Auto-prompting Fine-tuning Framework for Nuclei Segmentation", arXiv preprint arXiv:2401.13220, 2024.
- Hehuan Ma, Feng Jiang, Yu Rong, Yuzhi Guo and Junzhou Huang, "Robust Self-training Strategy for Various Molecular Biology Prediction Tasks", In Proc. of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, ACM BCB'22, Chicago, Illinois, USA, August 2022.
Teaching Experience
- 2248-CSE-6324-980-ADV TOPS SOFTWARE ENGINEERING Fall 2024
- 2245-CSE-3314-002-PROFESSIONAL PRACTICES Summer 2024
- 2242-CSE-6324-001-ADV TOPS SOFTWARE ENGINEERING Spring 2024
- 2238-CSE-6324-980-ADV TOPS SOFTWARE ENGINEERING Fall 2023
- 2232-CSE-6324-001-ADV TOPS SOFTWARE ENGINEERING Spring 2023
- 2228-CSE-6324-980-ADV TOPS SOFTWARE ENGINEERING Fall 2022
- 2228-CSE-5325-003-SFWR ENG II: MGMT, MAIN, & QA Fall 2022
Honors
- John S. Schuchman Outstanding Doctoral Student Award, UTA (April 2023)
Research Experience
(2024.2-Now) Multimodal Large Language Model
- LLaPA: We leverage multimodal approaches using large language models to enhance protein function prediction. Our method integrates protein sequences and functional descriptions from UniProt with contrastive learning techniques to predict the potential functions of unknown sequences. The model demonstrates superior performance in protein-text retrieval tasks and downstream predictions, while also showing reliable capabilities in both protein sequence and functional description generation.
- AlphaEpi: We combine sequence and structure information by using the protein language model ESM-2 for sequence data and AlphaFold 3 for structural data. A dynamic alignment module merges these modalities, leading to exceptional performance in B-cell epitope prediction. This work has been accepted to the ACM BCB 2024 Conference.
- Toxmm: We developed a multi-modal molecular embedding model for toxicity prediction by integrating SMILES structures and their textual descriptions. We curated a comprehensive dataset of over 50,000 molecular SMILES-text pairs from the PubChem database for pre-training. Our approach achieved state-of-the-art performance on toxicity benchmark datasets.
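The sequence-text alignment used in these multimodal projects can be illustrated with a CLIP-style symmetric InfoNCE objective. This is a minimal sketch, not the actual LLaPA training code; the function name, embedding shapes, and temperature are illustrative assumptions:

```python
import numpy as np

def info_nce_loss(seq_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning protein-sequence and text embeddings.
    Matching pairs share a row index; every other row acts as a negative."""
    seq = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = seq @ txt.T / temperature            # (batch, batch) similarities

    def xent(l):
        # Cross-entropy with the diagonal (the true pair) as the positive class.
        l = l - l.max(axis=1, keepdims=True)      # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Contrast in both directions: sequence-to-text and text-to-sequence.
    return (xent(logits) + xent(logits.T)) / 2
```

Minimizing this loss pulls each sequence embedding toward its paired description and away from all other descriptions in the batch, which is what enables the protein-text retrieval behavior described above.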
(2023.7-2024.3) A Graph Framework for TCR and Epitopes Binding Prediction
- Developed a novel graph-based framework that models TCR-epitope binding as a topology learning problem, utilizing residue interaction patterns and structural information to capture complex binding mechanisms.
- Enhanced the model through deep AUC maximization and dynamic graph attention mechanisms, achieving state-of-the-art results on multiple benchmark datasets.
- Our method significantly outperforms existing approaches in both accuracy and efficiency for TCR-epitope binding prediction. Published in Briefings in Bioinformatics.
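The deep AUC maximization idea can be sketched with a pairwise squared-hinge surrogate for AUC. This is an illustrative stand-in, not the exact objective used in GTE; the function name and margin value are assumptions:

```python
import numpy as np

def pairwise_auc_loss(scores, labels, margin=1.0):
    """Pairwise squared-hinge surrogate for AUC maximization.
    Penalizes every (positive, negative) pair whose score gap falls below
    the margin; driving this loss to zero pushes AUC toward 1."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]            # all pos-minus-neg score gaps
    return np.mean(np.maximum(0.0, margin - diff) ** 2)
```

Unlike plain cross-entropy, this objective directly targets the ranking of positives above negatives, which matters on the heavily imbalanced binding datasets mentioned above.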
(2023.3-2024.1) SAM-based Auto-prompting Fine-tuning Framework
- Segment Any Cell is an innovative framework designed to enhance the Segment Anything Model (SAM) for nuclei segmentation.
- Low-Rank Adaptation (LoRA) is integrated within the attention layers of the Transformer to improve fine-tuning, and an auto-prompt generator is introduced to guide the segmentation process.
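The LoRA adapter shape can be sketched as a frozen linear layer plus a trainable low-rank residual. This is a minimal sketch of the general technique, not the Segment Any Cell implementation; the class name, rank, and scaling are illustrative assumptions:

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer with a trainable low-rank update: W + (alpha/r) * B @ A."""
    def __init__(self, weight, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                        # frozen pretrained weight
        self.A = rng.normal(0, 0.01, (rank, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, rank))            # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base path plus scaled low-rank residual path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T
```

Because B is zero-initialized, the adapted layer starts out identical to the frozen model, so fine-tuning only gradually deviates from the pretrained attention weights while training far fewer parameters.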
(2022.3-2023.4) Protein Language Models in Protein Structure Prediction
- Developed a novel teacher-student framework to address the challenge of limited structural data in protein research. The Teacher model, trained on over 20,000 high-quality labeled structures, generates reliable pseudo labels to guide the training of a Student protein language model across millions of sequences.
- Published in BCB'22 and accepted by the Journal of Computational Biology 2024.
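One round of the teacher-student self-training loop can be sketched as follows. This is a schematic of the general pseudo-labeling recipe, not the published implementation; the function names, confidence threshold, and callback interfaces are illustrative assumptions:

```python
import numpy as np

def self_training_round(teacher_predict, student_fit, labeled, unlabeled,
                        confidence=0.9):
    """One self-training round: the teacher labels the unlabeled pool and
    the student trains on gold labels plus confident pseudo labels.

    teacher_predict(X) -> (n, n_classes) class probabilities.
    student_fit(X, y)  -> a trained student (opaque to this sketch).
    """
    X_l, y_l = labeled
    probs = teacher_predict(unlabeled)              # teacher's soft labels
    keep = probs.max(axis=1) >= confidence          # keep only confident ones
    X_p = unlabeled[keep]
    y_p = probs[keep].argmax(axis=1)                # hard pseudo labels
    X = np.concatenate([X_l, X_p])
    y = np.concatenate([y_l, y_p])
    return student_fit(X, y)
```

Filtering by teacher confidence is what keeps the pseudo labels "reliable" as the student is scaled from ~20,000 labeled structures to millions of unlabeled sequences.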
*Last updated in November 2024.