Work Location > Beijing, China
School Address > Dalian, China
Work Mail > r1228240468@gmail.com
Official Mail > diaohw@mail.dlut.edu.cn

Haiwen Diao

CS PhD Student at DLUT

About Me

PROFESSIONAL PATH

Third-year Ph.D. student advised by Prof. Huchuan Lu at Dalian University of Technology.
Research Interests:   Deep Learning, Machine Learning, Computer Vision
Domains:   Vision-and-Language, Parameter-efficient Transfer Learning, Large Multimodal Model
1. Vision-Language Retrieval: SGRAF (AAAI'21), RCAR (TIP'23), DBL (TIP'24), GSSF (TIP'24)
2. Parameter-Efficient Tuning: UniPT (CVPR'24), SHERL (ECCV'24), KARST (2024)
3. Large Multimodal Model: EVE (NeurIPS'24), DenseFusion (NeurIPS'24), PathWeave (NeurIPS'24)
4. AI Generated Content: MoTrans (ACMMM'24), NOVA (2024)


Research Pursuits:   Develop efficient and reliable mechanisms that can perceive visual semantics, contextualize fine-grained interactions across modalities, and mimic human-like judgement and decision-making.
Open Resources:   [Awesome_Matching_Pretraining_Transfering]
[Awesome_Image_Text_Retrieval_Benchmark]

Sep. 2023 -- Present:   Research intern at BAAI with Dr. Xinlong Wang on Large Multimodal Models for Understanding and Generation.
Jan. 2023 -- Aug. 2023:   Remote collaboration with Ph.D. Bo Wan from KU Leuven and Asst. Prof. Long Chen from HKUST on Parameter-efficient Transfer Learning.
Jun. 2020 -- Mar. 2021:   Research intern at Tencent AI Lab with Dr. Ying Zhang and Dr. Lin Ma on Image-Text Retrieval, Cross-modal Boosting and Metric Learning.

News

Sep. 2023:   Started a research internship at BAAI.

Jun. 2020:   Started a research internship at Tencent AI Lab.

Publication

MY PAPER

2024

Autoregressive Video Generation without Vector Quantization

KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification

Unveiling Encoder-Free Vision-Language Models

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

LLMs Can Evolve Continually on Modality for X-Modal Reasoning

MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models

Exploring Dynamic Transformer for Efficient Object Tracking

2023

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory

2022

GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning

Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching

Plug-and-Play Regulators for Image-Text Matching

2021

Similarity Reasoning and Filtration for Image-Text Matching

Experience

MY WORK

Research Intern

BAAI
Sep. 2023 - Present

Research Intern

Tencent AI Lab
Jun. 2020 - Mar. 2021

Education

ACADEMIC CAREER

Ph.D. Student

Dalian University of Technology - Information and Communication
Sep. 2021 - Present

Master of Science

Dalian University of Technology - Information and Communication
Sep. 2018 - Jun. 2021

Bachelor of Science

Dalian University of Technology - Electronic Information Engineering
Sep. 2014 - Jun. 2018

Service

REVIEWER CAREER

Journal Reviewer:

IEEE TPAMI, IEEE TIP, IEEE TNNLS

Conference Reviewer:

CVPR, ICCV, ECCV, AAAI, ACMMM