2016 IEEE Workshop on Multimedia Signal Processing (MMSP 2016)

21-23 September 2016, Montreal, Canada

Keynote and Industry Speakers

Prof. Shih-Fu Chang
Columbia University

Dr. Phil Chou
Microsoft Research

Prof. Min Wu
University of Maryland

Dr. Poppy Crum
Dolby Laboratories

Dr. Brian Kingsbury
IBM Watson Group

Prof. Shih-Fu Chang

New Frontiers of Large Scale Multimedia Information Retrieval

Multimedia information retrieval aims to automatically extract useful information from large collections of images and videos, in combination with other data such as text and speech. As reported in recent news, it is now possible to search over millions of products using just an example image taken on a mobile phone. Intelligent apps deployed by major companies can automatically generate keywords or even captions for an image at a level of sophistication that could not be imagined before. In this talk, I will review the core technologies involved and discuss challenges and opportunities ahead. First, to address the complexity bottleneck when scaling up the data size, I will present extremely compact hash codes and deep learning image classification models that can reduce complexity by orders of magnitude while approximately preserving accuracy. Second, to support easy extension of recognition systems to new domains, instead of relying on fixed image categories, we introduce a new paradigm that automatically discovers unique multimodal concepts and structures from the large amounts of multimedia data available. Last, to support emerging applications beyond basic image categorization, I will discuss ongoing efforts in understanding how images are used to express sentiments and emotions in online social media, and how languages and cultures may influence such online multimedia communication.
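The idea behind compact hash codes for large-scale search can be illustrated with a minimal random-projection (LSH-style) sketch: nearby feature vectors tend to receive binary codes with small Hamming distance, so candidate matches can be found by comparing short bit strings instead of full vectors. This is an illustrative example only, not the specific method from the talk; the dimensions and bit lengths are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hasher(dim, n_bits, rng):
    """Random hyperplanes: each bit of the code is the sign of one projection."""
    planes = rng.standard_normal((n_bits, dim))
    def hash_vec(x):
        return (planes @ x > 0).astype(np.uint8)
    return hash_vec

def hamming(a, b):
    return int(np.count_nonzero(a != b))

hasher = make_hasher(dim=512, n_bits=64, rng=rng)

# Nearby feature vectors tend to receive nearby binary codes.
x = rng.standard_normal(512)
y = x + 0.05 * rng.standard_normal(512)   # small perturbation of x
z = rng.standard_normal(512)              # unrelated vector

print(hamming(hasher(x), hasher(y)))      # small
print(hamming(hasher(x), hasher(z)))      # roughly n_bits / 2
```

Comparing 64-bit codes is orders of magnitude cheaper than comparing 512-dimensional float vectors, which is the complexity reduction the abstract alludes to; real systems learn the projections from data rather than drawing them at random.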

Biography:

Shih-Fu Chang is the Senior Executive Vice Dean and the Richard Dicker Professor of Columbia Engineering. His research is focused on multimedia information retrieval, computer vision, machine learning, and signal processing, with the goal of turning unstructured multimedia data into searchable information. His work on content-based visual search in the early 1990s, VisualSEEk and VideoQ, set the foundation of this vibrant area. Over the years, he has continued to create innovative techniques for image/video recognition, multimodal analysis, visual information ontology, image authentication, and compact hashing for large-scale image databases. For his long-term pioneering contributions, he has been awarded the IEEE Signal Processing Society Technical Achievement Award, the ACM Multimedia SIG Technical Achievement Award, an Honorary Doctorate from the University of Amsterdam, the IEEE Kiyo Tomiyasu Award, and an IBM Faculty Award. For his dedicated contributions to education, he received the Great Teacher Award from the Society of Columbia Graduates. He served as Chair of the Columbia Electrical Engineering Department (2007-2010), Editor-in-Chief of the IEEE Signal Processing Magazine (2006-2008), and advisor for several research institutions and companies. In his current capacity in Columbia Engineering, he plays a key role in the School's strategic planning, special research initiatives, international collaboration, and faculty development. He is a Fellow of the American Association for the Advancement of Science (AAAS) and IEEE.

Dr. Phil Chou

Telepresence: From Virtual to Reality - A Reprise

Six years ago, at the time of my last talk at MMSP, Immersive Telepresence was soon to become a reality. This year, it can be claimed that Immersive Telepresence has already arrived, in the form of Augmented and Virtual Reality. However, there is as yet no commonly agreed-upon paradigm for its format, never mind how to code it. In this talk, I will discuss candidate representations for AR/VR content and approaches to coding it. Graph Signal Processing emerges as an important toolset for representing, coding, and processing AR/VR.
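The Graph Signal Processing toolset mentioned above can be sketched in a few lines: the "graph Fourier transform" of a signal living on the nodes of a graph is its projection onto the eigenvectors of the graph Laplacian, and smooth signals (like attributes on a point cloud or mesh) concentrate their energy in the low "graph frequencies", which is what makes transform coding of such data possible. The 4-node path graph and the signal below are purely illustrative.

```python
import numpy as np

# Adjacency matrix of a 4-node path graph (an illustrative toy graph).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A          # combinatorial graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues act as graph "frequencies"

signal = np.array([1.0, 1.1, 0.9, 1.0])  # a smooth signal on the nodes
coeffs = eigvecs.T @ signal               # graph Fourier transform

# A smooth graph signal concentrates its energy in the low-frequency
# coefficients, so most of `coeffs` is near zero and can be coded cheaply.
print(np.round(eigvals, 3))
print(np.round(coeffs, 3))
```

Real AR/VR codecs operate on far larger graphs (e.g. nearest-neighbor graphs over point clouds), but the transform-then-quantize principle is the same.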

Biography:

Philip A. Chou is a Principal Researcher at Microsoft Research. He received a BSE from Princeton, an MS from Berkeley, and a PhD from Stanford. He was a Member of Technical Staff at both AT&T Bell Laboratories and the Xerox Palo Alto Research Center. He has been Affiliate Faculty at Stanford University, the University of Washington, and the Chinese University of Hong Kong. He was a Manager at the startup VXtreme (later acquired by Microsoft), where he led the compression team for the first commercial video on demand over the Internet. Key ideas in video compression today have come from his work, including rate-distortion optimization, multiple reference frame coding, tree-structured coding, and multiple bit rate streaming. He was Technical Program Co-Chair of MMSP'09 and Chair of the MMSP TC in 2010-2011. He is a Fellow of the IEEE.

Prof. Min Wu

When Power Meets Multimedia

The R&D on power grid and multimedia signal processing did not seem to cross paths, until recently. An emerging line of research exploits novel signatures induced by the power network to answer intriguing questions about the time, location, and integrity of multimedia recordings and provide evidence and trust in journalism, crime solving, infrastructure monitoring, and other informational operations.

Owing to the dynamic control process to match the electricity supplies with the demands in the grid, the instantaneous electric network frequency (ENF) exhibits small random-like fluctuations. It forms signatures reflecting the attributes and conditions of the power grid, which become naturally “embedded” into various types of sensing signals. These signatures carry time and location information and may facilitate the integrity verification of the primary sensing data.
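The ENF signature described above can be estimated from a recording by tracking the dominant spectral peak near the nominal mains frequency (60 Hz in North America, 50 Hz elsewhere) frame by frame. The following is a minimal sketch on a synthetic hum signal; the frame length, frequency band, and zero-padding factor are illustrative choices, not the MAST group's exact pipeline.

```python
import numpy as np

def estimate_enf(signal, fs, nominal=60.0, band=1.0, frame_s=1.0):
    """Track the per-frame spectral peak near the nominal mains
    frequency; the sequence of peak frequencies is the ENF estimate."""
    frame_len = int(frame_s * fs)
    n_fft = 16 * frame_len                     # zero-pad for finer frequency bins
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    mask = (freqs >= nominal - band) & (freqs <= nominal + band)
    enf = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        mag = np.abs(np.fft.rfft(frame, n=n_fft))
        enf.append(freqs[mask][np.argmax(mag[mask])])
    return np.array(enf)

# Synthetic mains hum whose frequency wanders slightly around 60 Hz,
# mimicking the random-like grid fluctuations described above.
fs = 1000
t = np.arange(0, 10, 1 / fs)
drift = 0.3 * np.sin(2 * np.pi * 0.1 * t)      # slow +/- 0.3 Hz wander
hum = np.sin(2 * np.pi * np.cumsum(60.0 + drift) / fs)
print(estimate_enf(hum, fs))                   # values near 60 Hz
```

Matching such an extracted ENF trace against logged grid-frequency records is what allows the time (and, to some extent, the location) of a recording to be inferred or verified.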

This talk will provide an overview of recent research on ENF carried out by our Media and Security Team (MAST) at University of Maryland for a variety of media applications, and discuss on-going and open research issues.

Biography:

Min Wu is a Professor of Electrical and Computer Engineering and a Distinguished Scholar-Teacher at the University of Maryland, College Park. She received her Ph.D. degree in electrical engineering from Princeton University in 2001. At UMD, she leads the Media and Security Team (MAST), with main research interests in information security and forensics and multimedia signal processing. Her research and education have been recognized by an NSF CAREER Award, a TR100 Young Innovator Award from the MIT Technology Review Magazine, an ONR Young Investigator Award, a Computer World "40 Under 40" IT Innovator Award, a University of Maryland Invention of the Year Award, an IEEE Mac Van Valkenburg Early Career Teaching Award, and several paper awards from IEEE SPS, ACM, and EURASIP. She was elected IEEE Fellow for contributions to multimedia security and forensics. Dr. Wu chaired the IEEE Technical Committee on Information Forensics and Security (2012-2013), and has served as Vice President-Finance of the IEEE Signal Processing Society (2010-2012) and Founding Chief Editor of the IEEE SigPort initiative (2013-2014). Currently, she is serving as Editor-in-Chief (2015-2017) of the IEEE Signal Processing Magazine and as an IEEE Distinguished Lecturer.

Dr. Poppy Crum

Data-Rich Optimization of Our Technologies – Let's Make It Personal

Biography:

Poppy Crum is Head Scientist at Dolby Laboratories and a Consulting Professor at Stanford University in the Center for Computer Research in Music and Acoustics and the Program in Symbolic Systems. At Dolby, Poppy directs the growth of internal science. She is responsible for integrating neuroscience and sensory data science into algorithm design, technological development, and technology strategy. At Stanford, Poppy's work focuses on the impact and feedback potential of new technologies with gaming and immersive environments on neuroplasticity. Poppy is also a U.S. representative to the International Telecommunication Union (ITU), and was a fellow of the US Defense Science Research Council. Prior to joining Dolby Laboratories, Poppy was Research Faculty in the Department of Biomedical Engineering at Johns Hopkins School of Medicine, where her research focused on the functional circuitry of the auditory cortex. Poppy is a Fellow of the Audio Engineering Society. She completed her post-doctoral work at Johns Hopkins Medical School in Biomedical Engineering, her PhD at UC Berkeley in Neuroscience/Psychology, her M.A. at McGill University in Experimental Psychology, and her B.Mus. at the University of Iowa in Violin Performance.

Dr. Brian Kingsbury

Low-Resource Speech Processing in Many Languages

Speech technology has become ubiquitous: people rely on speech-enabled assistants and voice search to access information using their mobile phones, and companies rely on speech interfaces to handle routine customer service interactions. A key enabler behind this is the availability of vast amounts of labeled and unlabeled speech and text data that can be used to train speech models. However, with thousands of languages in the world that we would like to process automatically, it is not realistic to count on having access to thousands of hours of speech and billions of words of text in all of them. How does one proceed when developing a speech processing system under a "small data" constraint? In this talk, I will describe techniques for developing useful audio keyword search and speech-to-text systems for any language, given only limited amounts of target-language training data, and a limited amount of development time. The work I will describe was performed by IBM and its academic partners under the IARPA Babel program.

Biography:

Brian Kingsbury is a research scientist in the IBM Watson Group. He earned a BS in electrical engineering from Michigan State University and a PhD in computer science from the University of California, Berkeley. His research interests include deep learning, large-vocabulary speech transcription, and keyword search. He is currently co-PI and technical lead for LORELEI, an IBM-led consortium participating in the IARPA Babel program, which is focused on the rapid development of audio search in many languages with limited resources. Brian has contributed to IBM's entries in numerous competitive evaluations of speech technology, including Switchboard, SPINE, EARS, Spoken Term Detection, and GALE. He has served as a member of the Speech and Language Technical Committee of the IEEE Signal Processing Society (2009-2011); as an ICASSP speech area chair (2010-2012); as an associate editor for IEEE Transactions on Audio, Speech, and Language Processing (2012-2016); and as a program chair for the International Conference on Learning Representations (2014-2016). He is a senior member of the IEEE.