I am a senior machine learning researcher at Apple AI/ML - Foundation Model.
I received my PhD from the University of Southern California, advised by Prof. Fei Sha.
Previously, I was a research intern and student researcher at Google Brain, Google Research Language, AWS Rekognition, and Tencent AI Lab in Seattle. I was also a visiting student at SIAT MMLab.
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang
ECCV 2024.
[Paper]
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai, Haotian Zhang, Bowen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao
ECCV 2024.
[Paper] [Code]
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang
ICLR 2024.
[Paper] [Code]
Compressing LLMs: The Truth is Rarely Pure and Never Simple
Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang
ICLR 2024.
[Paper]
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Erik Daxberger, Floris Weers, Bowen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du
arXiv 2023.
[Paper]
MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang
ICLR 2024.
[Paper]
Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness
Liangliang Cao, Bowen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng
arXiv 2023.
[Paper] [Dataset]
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang
EMNLP 2023.
[Paper]
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha
arXiv 2021.
[Paper] [Post]
Visually Grounded Concept Composition
Bowen Zhang, Hexiang Hu, Linlu Qiu, Pete Shaw, Fei Sha
Findings of EMNLP 2021.
[Paper]
Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?
Linlu Qiu, Hexiang Hu, Bowen Zhang, Pete Shaw, Fei Sha
EMNLP 2021 (Oral Presentation).
[Paper]
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus
Bowen Zhang*, Hexiang Hu*, Joonseok Lee, Ming Zhao, Sheide Chammas, Vihan Jain, Eugene Ie, Fei Sha
arXiv 2020.
[Paper]
Online Action Detection in Streaming Videos with Time Buffers
Bowen Zhang, Hao Chen, Meng Wang, Yuanjun Xiong
arXiv 2020.
[Paper]
Learning to Represent Image and Text with Denotation Graph
Bowen Zhang*, Hexiang Hu*, Vihan Jain, Eugene Ie, Fei Sha
EMNLP 2020 (Oral Presentation).
[Paper]
Visual Storytelling via Predicting Anchor Word Embeddings in the Stories
Bowen Zhang, Hexiang Hu, Fei Sha
ICCV 2019 Workshop on Closing the Loop Between Vision and Language.
[Paper]
Topic Augmented Generator for Abstractive Summarization
Melissa Ailem, Bowen Zhang, Fei Sha
BayLearn 2019.
[Paper]
A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images
Melissa Ailem, Bowen Zhang, Aurélien Bellet, Pascal Denis, Fei Sha
EMNLP 2018.
[Paper]
Cross-Modal and Hierarchical Modeling of Video and Text
Bowen Zhang*, Hexiang Hu*, Fei Sha
ECCV 2018.
[Paper] [Project] [Code]
Real-Time Action Recognition with Deeply-Transferred Motion Vector CNNs
Bowen Zhang, Limin Wang, Zhe Wang, Yu Qiao, Hanli Wang
IEEE Transactions on Image Processing, 2018.
[Paper] [Project]
Real-time Action Recognition with Enhanced Motion Vector CNNs
Bowen Zhang, Limin Wang, Zhe Wang, Yu Qiao, Hanli Wang
CVPR 2016.
[Paper] [Project]