[Google Scholar] [Semantic Scholar] [DBLP]

Gemini Team, Google. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. Technical Report, 2024.
One of the core contributors within Multimodal-Vision.
[arXiv][pdf][blog]

Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut. PaLI-X: On Scaling up a Multilingual Vision and Language Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[arXiv]

Gemini Team, Google. Gemini: A Family of Highly Capable Multimodal Models. Technical Report, 2023.
One of the core contributors within Multimodal-Vision.
[arXiv][pdf][blog][intro page]

Soravit Changpinyo, Linting Xue, Michal Yarom, Ashish V. Thapliyal, Idan Szpektor, Julien Amelot, Xi Chen, Radu Soricut. MaXM: Towards Multilingual Visual Question Answering. In Findings of the Association for Computational Linguistics: Empirical Methods in Natural Language Processing (EMNLP-Findings), 2023.
[arXiv][project page with data]

Yang Chen, Hexiang Hu, Yi Luan, Haitian Sun, Soravit Changpinyo, Alan Ritter, Ming-Wei Chang. Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
[arXiv][project page with data][related benchmark]

Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor. What You See is What You Read? Improving Text-Image Alignment Evaluation. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
[arXiv][project page][code and data]

Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut. PreSTU: Pre-Training for Scene-Text Understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
[arXiv]

Paul Voigtlaender, Soravit Changpinyo, Jordi Pont-Tuset, Radu Soricut, Vittorio Ferrari. Connecting Vision and Language with Video Localized Narratives. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. (Highlight, top 2.5%)
[arXiv][project page with data]

Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani. MetaCLUE: Towards Comprehensive Visual Metaphors Research. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[arXiv][project page]

Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut. PaLI: A Jointly-Scaled Multilingual Language-Image Model. In International Conference on Learning Representations (ICLR), 2023. (Oral presentation, Notable top 5%)
[arXiv][openreview][blog]

Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong. 2.5D Visual Relationship Detection. In Computer Vision and Image Understanding (CVIU), 2022.
[arXiv][project page with data]

Khyathi Raghavi Chandu, Piyush Sharma, Soravit Changpinyo, Ashish Thapliyal, Radu Soricut. Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models. In Proceedings of the International Conference on Computational Linguistics (COLING), 2022.
[arXiv]

Nan Ding, Xi Chen, Tomer Levinboim, Soravit Changpinyo, Radu Soricut. PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks. In Proceedings of the European Conference on Computer Vision (ECCV), 2022. (Oral presentation)
[arXiv][code]

Soravit Changpinyo, Doron Kukliansky, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut. All You May Need for VQA are Image Captions. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022. (Oral presentation)
[arXiv][aclanthology][project page with data][blog][presentation]

Arjun Akula, Varun Jampani, Soravit Changpinyo, Song-Chun Zhu. Robust Visual Reasoning via Language Guided Neural Module Networks. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
[pdf]

Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao. On Model Calibration for Long-Tailed Object Detection and Instance Segmentation. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
[arXiv][code]

Arjun Akula, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu, Radu Soricut. CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. (Oral presentation)
[aclanthology][pdf]

Soravit Changpinyo, Jordi Pont-Tuset, Vittorio Ferrari, Radu Soricut. Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[arXiv][trace processing colab]

Cheng Zhang, Tai-Yu Pan, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao. MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[arXiv][code]

Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut. Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[arXiv][project page with data]

Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, Vittorio Ferrari. Connecting Vision and Language with Localized Narratives. In Proceedings of the European Conference on Computer Vision (ECCV), 2020. (Spotlight presentation)
[arXiv][project page with data]

Ben McCamish, Vahid Ghadakchi, Arash Termehchy, Behrouz Touri, Eduardo Cotilla-Sanchez, Liang Huang, Soravit Changpinyo. A Game-theoretic Approach to Data Interaction. In ACM Transactions on Database Systems (TODS), 2020.
[pdf][acm]

Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, Fei Sha. Classifier and Exemplar Synthesis for Zero-Shot Learning. In International Journal of Computer Vision (IJCV), 2020.
[arXiv][springer][code]

Soravit Changpinyo, Bo Pang, Piyush Sharma, Radu Soricut. Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
[pdf][supp][arXiv][aclanthology][poster]
Also presented at the 3rd Workshop on Closing the Loop Between Vision and Language, International Conference on Computer Vision (ICCV 2019 CLVL). [spotlight slides]

Soravit Changpinyo, Hexiang Hu, Fei Sha. Multi-Task Learning for Sequence Tagging: An Empirical Study. In Proceedings of the International Conference on Computational Linguistics (COLING), 2018.
[pdf][supp][arXiv][poster][code]

Soravit Changpinyo, Wei-Lun Chao, Fei Sha. Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.
[pdf][supp][arXiv][poster][code]

Soravit Changpinyo, Mark Sandler, Andrey Zhmoginov. The Power of Sparsity in Convolutional Neural Networks. arXiv preprint, 2017.
[arXiv]

Wei-Lun Chao, Soravit Changpinyo, Boqing Gong, Fei Sha. An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild. In Proceedings of the European Conference on Computer Vision (ECCV), 2016. (Spotlight presentation)
[pdf][supp][arXiv][poster][spotlight slides][code]

Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, Fei Sha. Synthesized Classifiers for Zero-Shot Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. (Oral presentation)
[pdf][supp][arXiv][poster][slides][code]

Soravit Changpinyo, Kuan Liu, Fei Sha. Similarity Component Analysis. In Advances in Neural Information Processing Systems (NIPS), 2013.
[pdf][supp][poster][code]

Theses

Soravit Changpinyo. Learning Image Attributes using the Indian Buffet Process. Undergraduate Honors Thesis. Brown University, 2012.
[pdf]