As a major application scenario of the future Web 3.0, the metaverse has become a hot topic in many fields, including IT. Starting from basic concepts, this talk focuses on the R&D progress of the national key research program project led by the speaker, presenting the speaker's own understanding of and views on the current opportunities and state of development of the metaverse. It then turns to the future application needs of the Industrial Internet, introducing relevant technologies and development trends of the industrial metaverse, and further discusses the practical deployment of intelligent technologies in more industrial scenarios.
Video Moment Retrieval (VMR) aims to retrieve a temporal moment that semantically corresponds to a language query from an untrimmed video. Connecting computer vision and natural language, VMR has drawn significant attention from researchers in both communities. The existing solutions for this problem can be roughly divided into two categories based on whether candidate moments are generated: moment-based approaches and clip-based approaches. Both frameworks have respective shortcomings: moment-based models suffer from heavy computation, while the performance of clip-based models is generally inferior to their moment-based counterparts. To this end, we design an intuitive and efficient Dual-Channel Localization Network (DCLN) to balance computational cost and retrieval performance. Meanwhile, despite their effectiveness, moment-based and clip-based methods mostly focus only on aligning the query with single-level clip or moment features, and ignore the different granularities present in the video itself, such as clip, moment, and video, resulting in insufficient cross-modal interaction. To this end, we also propose a Temporal Localization Network with Hierarchical Contrastive Learning (HCLNet) for the VMR task. This report will detail these two works and also share our deeper insights.
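The moment-based vs. clip-based trade-off described above can be illustrated with a toy sketch (this is not the authors' DCLN or HCLNet; the feature shapes, dot-product scoring, and boundary readout are simplified assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 8, 4                                # T clips, D-dim features (toy sizes)
clip_feats = rng.standard_normal((T, D))   # per-clip video features
query_feat = rng.standard_normal(D)        # pooled language-query feature

# Moment-based: enumerate all O(T^2) candidate moments (s, e), pool each
# candidate, and score it against the query -- accurate but heavy.
moment_scores = {}
for s in range(T):
    for e in range(s, T):
        pooled = clip_feats[s:e + 1].mean(axis=0)
        moment_scores[(s, e)] = float(pooled @ query_feat)
best_moment = max(moment_scores, key=moment_scores.get)

# Clip-based: predict per-clip boundary scores in O(T) and read off the
# start/end -- cheap, but typically less accurate than enumeration.
start_scores = clip_feats @ query_feat     # (T,) toy start logits
end_scores = clip_feats @ query_feat       # (T,) toy end logits
s = int(start_scores.argmax())
e = int(end_scores[s:].argmax()) + s       # constrain end >= start
print("moment-based:", best_moment, "clip-based:", (s, e))
```

The sketch makes the computational gap concrete: the moment-based loop scores T(T+1)/2 candidates, while the clip-based pass needs only two length-T score vectors.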
Host: CCF
Organizers: CCF Technical Committee on Cooperative Computing; Taiyuan University of Science and Technology