Feng Jiang

PhD Student in Computer Science

University of Texas at Arlington
Arlington, TX

Email: alanfengjiang@gmail.com

[Google Scholar]    [GitHub]    [Curriculum Vitae]

Biography

I am a third-year PhD student in Computer Science at the University of Texas at Arlington (John S. Schuchman Outstanding Doctoral Student). My PhD advisor is Prof. Junzhou Huang. I received my M.S. in Software Engineering from UTA and B.E. in Software Engineering from Xi'an Jiaotong University City College.


Research

- Large Foundation Models & Multimodal Large Language Models
- Graph Neural Networks
- Deep AUC Maximization
- Long-tailed Learning


Publications

Ten selected publications, including papers at NeurIPS (Spotlight), ICCV, ECCV, and AAAI, as well as journal articles.

  1. Feng Jiang, Mangal Prakash, Hehuan Ma, Jianyuan Deng, Yuzhi Guo, Amina Mollaysa, Tommaso Mansi, Rui Liao and Junzhou Huang, "TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence", In Proc. of the 39th Annual Conference on Neural Information Processing Systems, NeurIPS'25, San Diego, CA, USA, December 2025. (NeurIPS Spotlight, 3% acceptance rate)

  2. Feng Jiang, Amina Mollaysa, Hehuan Ma, Tommaso Mansi, Junzhou Huang, Mangal Prakash and Rui Liao, "GRAM-DTI: Adaptive Multimodal Representation Learning for Drug–Target Interaction Prediction", NeurIPS 2025 2nd Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences.

  3. Feng Jiang, Yuzhi Guo, Hehuan Ma, Saiyang Na, Weizhi An, Bing Song, Yi Han, Jean Gao, Tao Wang and Junzhou Huang, "AlphaEpi: Enhancing B Cell Epitope Prediction with AlphaFold 3", In Proc. of the 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB'24, Shenzhen, China, November 2024.

  4. Feng Jiang, Yuzhi Guo, Hehuan Ma, Saiyang Na, Wenliang Zhong, Yi Han, Tao Wang, Junzhou Huang, "GTE: A Graph Learning Framework for Prediction of T-Cell Receptors and Epitopes Binding Specificity", Briefings in Bioinformatics, Volume 25, Issue 4, July 2024.

  5. Wenliang Zhong, Rob Barton, Weizhi An, Feng Jiang, Hehuan Ma, Yuzhi Guo, Abhishek Dan, Shioulin Sam, Karim Bouyarmane and Junzhou Huang, "Zero-Shot Composed Image Retrieval via Dual-Stream Instruction-Aware Distillation", In Proc. of the IEEE/CVF International Conference on Computer Vision, ICCV'25, Honolulu, Hawaii, USA, October 2025.

  6. Y. Miao, Yuzhi Guo, Hehuan Ma, J. Yan, Feng Jiang, Rui Liao and Junzhou Huang, "GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction", In Proc. of the 39th AAAI Conference on Artificial Intelligence, AAAI'25, pp. 622-630, 2025.

  7. Weizhi An, Wenliang Zhong, Feng Jiang, Hehuan Ma and Junzhou Huang, "Causal Subgraphs and Information Bottlenecks: Redefining OOD Robustness in Graph Neural Networks", In Proc. of the 18th European Conference on Computer Vision, ECCV'24, Milan, Italy, October 2024.

  8. Hehuan Ma, Feng Jiang, Yuzhi Guo and Junzhou Huang, "Towards Robust Self-training for Molecular Biology Prediction Tasks", Journal of Computational Biology, Volume 31, Issue 3, pp. 213-228, March 2024.

  9. Saiyang Na, Yuzhi Guo, Feng Jiang, Hehuan Ma, Jean Gao and Junzhou Huang, "Segment Any Cell: A SAM-based Auto-prompting Fine-tuning Framework for Nuclei Segmentation", IEEE Transactions on Neural Networks and Learning Systems, To Appear.

  10. Hehuan Ma, Feng Jiang, Yu Rong, Yuzhi Guo and Junzhou Huang, "Robust Self-training Strategy for Various Molecular Biology Prediction Tasks", In Proc. of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, ACM BCB'22, Chicago, Illinois, USA, August 2022.

Industry Experience

Janssen Research & Development, LLC, NJ, USA
R&D Data Science & Digital Health DSAI Intern (May 2025 - Aug 2025)

  • Developed multimodal large language models for molecular toxicity prediction and drug-target binding affinity prediction, integrating molecular structures with textual descriptions to enhance prediction accuracy.
  • Built and curated large-scale molecular datasets from PubChem for pre-training foundation models, achieving state-of-the-art performance on toxicity prediction benchmarks.
  • Collaborated with cross-functional teams to deploy machine learning models for early-stage drug discovery pipelines.
  • Recognition: Work selected as a team highlight for its research impact and innovation.
  • Publications: 1 NeurIPS workshop poster, 1 ICML workshop paper, and 1 manuscript submitted to ICLR'26.


Teaching Experience

  • CSE 6324 Advanced Topics in Software Engineering, Fall 2024
  • CSE 3314 Professional Practices, Summer 2024
  • CSE 6324 Advanced Topics in Software Engineering, Spring 2024
  • CSE 6324 Advanced Topics in Software Engineering, Fall 2023
  • CSE 6324 Advanced Topics in Software Engineering, Spring 2023
  • CSE 6324 Advanced Topics in Software Engineering, Fall 2022
  • CSE 5325 Software Engineering II: Management, Maintenance, and Quality Assurance, Fall 2022

Honors

  • John S. Schuchman Outstanding Doctoral Student Award, UTA (April 2023)

Research Experience

(2024.12-Now) Advanced Multimodal Alignment for Molecular Representation Learning

  • Proposed TRIDENT, a volume-based alignment approach that transforms Gram (Gramian) matrices derived from multimodal representations into volumetric features, enabling more effective cross-modal correspondence learning and superior performance on protein-text retrieval tasks. Accepted to NeurIPS'25 as a Spotlight (3% acceptance rate).
  • Developed novel multimodal alignment techniques to improve drug-target interaction (DTI) prediction by effectively integrating molecular graphs and protein sequences. Manuscript submitted to ICLR'26.
  • Designed a disentangled multimodal alignment framework that separates modality-specific and modality-invariant features, enhancing model interpretability and generalization on downstream molecular property prediction tasks.
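For intuition on the volume-based alignment above: the volume of the parallelotope spanned by k modality embeddings is the square root of the determinant of their Gram matrix, and it shrinks toward zero as the embeddings become aligned. The sketch below is a minimal pure-Python illustration of that computation under this interpretation; the function names are mine, not TRIDENT's actual implementation.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def det(m):
    """Determinant by Laplace expansion (fine for k = 2 or 3 modalities)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def gram_volume(vectors):
    """Volume of the parallelotope spanned by k embedding vectors:
    sqrt(det(G)) where G[i][j] = <v_i, v_j> is the Gram matrix.
    For unit vectors the volume drops toward 0 as they align."""
    k = len(vectors)
    G = [[dot(vectors[i], vectors[j]) for j in range(k)] for i in range(k)]
    return math.sqrt(max(det(G), 0.0))
```

For three orthogonal unit vectors the volume is 1; for three copies of the same vector the Gram matrix is singular and the volume is 0.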

(2024.2-12) Multimodal Large Language Model

  • LLaPA: A multimodal approach that uses large language models to enhance protein function prediction. The method integrates protein sequences and functional descriptions from UniProt through contrastive learning to predict the functions of unknown sequences, performs strongly on protein-text retrieval and downstream prediction tasks, and can reliably generate both protein sequences and functional descriptions.
  • AlphaEpi: We combine sequence and structure information by using the protein language model ESM-2 for sequence data and AlphaFold 3 for structural data. A dynamic alignment module merges these modalities, leading to exceptional performance in B-cell epitope prediction. This work has been accepted to the ACM BCB 2024 Conference.
  • Toxmm: A multimodal molecular embedding model for toxicity prediction that integrates SMILES structures with their textual descriptions. We curated over 50,000 molecular SMILES-text pairs from the PubChem database for pre-training and achieved state-of-the-art performance on toxicity prediction benchmarks.
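The contrastive learning mentioned for LLaPA is commonly realized as an InfoNCE-style loss over paired sequence/text embeddings, where matched pairs sit on the diagonal of a similarity matrix. The sketch below is a generic version of that loss, not the model's actual training code; the temperature value is an illustrative assumption.

```python
import math

def info_nce(sim, temperature=0.07):
    """InfoNCE contrastive loss for a batch of paired embeddings.
    sim[i][j] is the similarity between item i of modality A and
    item j of modality B; matched pairs lie on the diagonal.
    Minimizing the loss pulls matched pairs together and pushes
    mismatched pairs apart."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # -log softmax at the matched pair
    return total / n
```

When all similarities are equal the loss is log(n) per row (chance level); when diagonal similarities dominate, it approaches zero.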

(2023.7-2024.3) A Graph Framework for TCR and Epitopes Binding Prediction

  • Developed a novel graph-based framework that models TCR-epitope binding as a topology learning problem, utilizing residue interaction patterns and structural information to capture complex binding mechanisms.
  • Enhanced model performance through deep AUC maximization and dynamic graph attention mechanisms, achieving state-of-the-art performance on multiple benchmark datasets.
  • Our method significantly outperforms existing approaches in both accuracy and efficiency for TCR-epitope binding prediction. Published in Briefings in Bioinformatics.
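For intuition on the deep AUC maximization mentioned above: since AUC counts correctly ranked (positive, negative) score pairs and is non-differentiable, a common surrogate replaces the indicator with a squared hinge over the pairwise score gap. The sketch below shows that generic surrogate alongside the empirical AUC; it is illustrative only, not the exact objective from the paper.

```python
def auc_surrogate_loss(pos_scores, neg_scores, margin=1.0):
    """Pairwise squared-hinge surrogate for AUC: penalize every
    (positive, negative) pair whose score gap falls below the margin.
    Minimizing this pushes positives above negatives, which is what
    AUC measures (useful under heavy class imbalance)."""
    total, pairs = 0.0, 0
    for sp in pos_scores:
        for sn in neg_scores:
            gap = margin - (sp - sn)
            total += max(gap, 0.0) ** 2
            pairs += 1
    return total / pairs

def auc(pos_scores, neg_scores):
    """Empirical AUC: fraction of correctly ranked (pos, neg) pairs."""
    wins = sum((sp > sn) + 0.5 * (sp == sn)
               for sp in pos_scores for sn in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

A perfectly ranked batch has AUC 1 and zero surrogate loss once every gap clears the margin, so gradient descent on the surrogate directly improves the ranking metric.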

(2023.3-2024.1) SAM-based Auto-prompting Fine-tuning Framework

  • Segment Any Cell is a framework that adapts the Segment Anything Model (SAM) to nuclei segmentation.
  • Low-Rank Adaptation (LoRA) is integrated into the Transformer attention layers to make fine-tuning parameter-efficient, and an auto-prompt generator is introduced to guide the segmentation process.
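A minimal sketch of the LoRA idea referenced above: the frozen weight W is augmented with a low-rank update (alpha / r) * B @ A, and only the small matrices A and B are trained. This is a generic pure-Python illustration of the technique, not the framework's actual code; names and defaults are assumptions.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """LoRA forward pass: frozen weight W plus a low-rank update
    (alpha / r) * B @ A, where A is (r x d_in) and B is (d_out x r).
    Only A and B are trained, so fine-tuning touches far fewer
    parameters than updating W itself."""
    base = matvec(W, x)      # frozen pretrained path
    low = matvec(A, x)       # project down to rank r
    update = matvec(B, low)  # project back up to d_out
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]
```

Because B is initialized to zeros in standard LoRA, the adapted layer starts out identical to the pretrained one and only drifts as A and B are trained.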

(2022.3-2023.4) LLM in Protein Structure Prediction

  • Developed a novel teacher-student framework to address the challenge of limited structural data in protein research. The Teacher model, trained on over 20,000 high-quality labeled structures, generates reliable pseudo labels to guide the training of a Student protein language model across millions of sequences.
  • Published in BCB'22 and accepted by the Journal of Computational Biology 2024.
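The teacher-student loop above hinges on a pseudo-label selection step: only sequences the teacher labels with high confidence are passed to the student. A generic sketch of that step, where the threshold and data layout are illustrative assumptions rather than the paper's settings:

```python
def select_pseudo_labels(teacher_probs, threshold=0.9):
    """Keep only sequences the teacher labels confidently; the
    student model is then trained on these pseudo labels.
    `teacher_probs` maps a sequence id to its class-probability list."""
    pseudo = {}
    for seq_id, probs in teacher_probs.items():
        conf = max(probs)
        if conf >= threshold:
            pseudo[seq_id] = probs.index(conf)  # hard pseudo label
    return pseudo
```

Raising the threshold trades pseudo-label coverage for reliability, which is the central tuning knob in self-training pipelines of this kind.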


*Last updated: November 2024.