DetCLIP: Scalable Open-Vocabulary Object Detection via Fine-grained Visual-language Alignment
If you have a question about this talk, please contact Dr Mark Leadbeater.

For online attendance, register at: https://eng-cam.zoom.us/j/81023462406

Abstract

We will present an efficient and scalable training framework that incorporates large-scale image-text pairs to achieve open-vocabulary object detection (OVD). Unlike previous OVD frameworks, which typically rely on a pre-trained vision-language model (e.g., CLIP) or exploit image-text pairs via a pseudo labelling process, DetCLIP directly learns fine-grained word-region alignment from massive image-text pairs in an end-to-end manner. We use the maximum word-region similarity between region proposals and textual words to guide the contrastive objective. To enable the model to gain localization capability while learning broad concepts, DetCLIP is trained with hybrid supervision from detection, grounding and image-text pair data under a unified data formulation. By jointly training with an alternating scheme and adopting low-resolution input for image-text pairs, DetCLIP exploits image-text pair data efficiently and effectively.

Biography

Dr Wei Zhang joined Huawei in 2012. Before that, he was an assistant researcher at the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, and at The Chinese University of Hong Kong (CUHK). He received his Ph.D. degree in computer science from CUHK in 2010, his M.S. degree from Tsinghua University in 2005 and his B.S. degree from Nankai University in 2002. He co-organized the “Self-supervised Learning for Next-Generation Industry-level Autonomous Driving” workshop at ECCV 2022 and ICCV 2021.

This talk is part of the CAPE Advanced Technology Lecture Series.
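
As a rough illustration of the maximum word-region similarity mentioned in the abstract, the sketch below scores an image-text pair by matching each word to its best region and then plugs that score into a standard contrastive (InfoNCE-style) loss. It is a minimal NumPy sketch under stated assumptions, not the speaker's implementation: the function names, the mean aggregation over words, the text-to-image direction of the loss and the temperature value are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def text_to_image_similarity(word_feats, region_feats):
    """Score one text against one image.

    word_feats:   (n_words, d) embeddings of the words in the caption
    region_feats: (n_regions, d) embeddings of the region proposals
    """
    w = l2_normalize(word_feats)
    r = l2_normalize(region_feats)
    sim = w @ r.T                        # (n_words, n_regions) cosine similarities
    best_per_word = sim.max(axis=1)      # maximum word-region similarity per word
    return float(best_per_word.mean())   # assumption: average over words


def contrastive_loss(batch_words, batch_regions, temperature=0.07):
    """Text-to-image InfoNCE loss; text i is assumed to be paired with image i."""
    b = len(batch_words)
    logits = np.array([
        [text_to_image_similarity(batch_words[i], batch_regions[j]) for j in range(b)]
        for i in range(b)
    ]) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

With correctly paired captions and images, the diagonal of the similarity matrix dominates and the loss is small; shuffling the pairing raises it, which is the signal that pushes words towards their matching regions.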