Content note: The SCIEN colloquium series is available exclusively to SCIEN members and Stanford students. Please log in or visit the SCIEN site for information regarding membership.
SCIEN Colloquia 2022–2023
Jung-Hoon Park
(Ulsan National Institute of Science and Technology (UNIST))
"Shaping Light for Bio-Imaging"

Date: May 31, 2023
Description:
Because we only perceive the intensity of light, the importance of its phase is not obvious in everyday life. Here, we describe our recent developments in controlling the wavefront, in other words the phase, of light to realize advanced optical systems. First, we demonstrate that controlling the wavefront can enable high-resolution imaging deep inside aberrating tissue and realize functional imaging of brain dynamics in live mice. We also demonstrate that hardware-based wavefront shaping is not the only solution and describe a computational approach, based on the principles of speckle interferometry, for imaging through dynamic turbid media. From a different perspective, we show how shaping the illumination beam can provide different imaging characteristics for different regions of interest within the field of view (FOV) simultaneously. For example, by projecting selected sinusoidal illumination patterns, we realize tunable SIM, a dynamic adaptive structured illumination microscope that delivers varying spatiotemporal resolution across the FOV, adapted to the user's needs. Tunable SIM is implemented with a DMD, and the simplicity of the setup makes it well suited to broad dissemination. We further discuss how custom illumination can enable stroboscopic, high-dynamic-range, or selective-depth-excitation imaging for advanced bio-imaging applications.
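As a concrete illustration of the kind of illumination control described above, the following minimal sketch (illustrative only, with made-up parameters rather than values from the talk) generates the phase-shifted binary sinusoidal fringe patterns a DMD could display for structured illumination, with the spatial frequency, orientation, and phase left tunable.

    import numpy as np

    def dmd_sim_pattern(shape=(768, 1024), period_px=12, angle_deg=30.0, phase=0.0):
        """Binary sinusoidal fringe pattern for a DMD (1 = mirror on, 0 = mirror off)."""
        h, w = shape
        y, x = np.mgrid[0:h, 0:w]
        theta = np.deg2rad(angle_deg)
        u = x * np.cos(theta) + y * np.sin(theta)      # coordinate along the fringe direction
        fringe = 0.5 * (1.0 + np.cos(2.0 * np.pi * u / period_px + phase))
        return (fringe >= 0.5).astype(np.uint8)        # DMD mirrors are binary, so threshold

    # Three phase-shifted patterns, as used in a classical three-step SIM reconstruction.
    patterns = [dmd_sim_pattern(phase=k * 2.0 * np.pi / 3.0) for k in range(3)]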
Further Information:
Jung-Hoon is an Associate Professor in the Department of Biomedical Engineering at UNIST. He received his Ph.D. in Physics from KAIST and conducted postdoctoral research at Janelia Research Campus. His research focuses on building novel optical systems and computational methods to enable high-resolution, high-speed deep-tissue imaging, especially of the brain.
https://bme.unist.ac.kr/portfolio/faculty/faculty_jung-hoon-park/
Francois Chaubard
(Focal Systems)
“Lesson in Productizing Deep Learning Computer Vision: Automating Brick & Mortar Retail with State of the Art Computer Vision”

Date: May 24, 2023
Description:
We will discuss the journey I have been on for the last 8 years trying to productize deep learning computer vision. Focal Systems takes the same concepts and AI used to power self-driving cars to automate and optimize brick-and-mortar retail, which represents 26% of US GDP. We call this the “Self-Driving Store”. Focal does this by deploying hundreds of tiny, inexpensive shelf cameras that digitize retail shelves hourly with the most accurate computer vision on the market; FocalOS then uses that data to automate many in-store processes, such as order writing, out-of-stock scanning, planogram creation, schedule writing, labor management, and more, resulting in 30-100% EBITDA growth.
Further Information:
Francois Chaubard is the CEO of Focal Systems, a deep learning computer vision company focused on automating brick-and-mortar retail. Prior to founding Focal Systems, Mr. Chaubard worked at Apple as a deep learning researcher on confidential projects. Before that, he earned dual master's degrees in CS and EE at Stanford University, where he researched deep learning and computer vision under Fei-Fei Li, Director of the Stanford Artificial Intelligence Lab. Before that, he was a missile guidance algorithm researcher at Lockheed Martin on the U.S. AEGIS Ballistic Missile Defense Program. He received his bachelor's degree in Mathematics/Mechanical Engineering from the University of Delaware.
Christoph Leuze
(Stanford)
“Headset design choices for augmented reality applications in industry and medicine”

Date: May 17, 2023
Description:
Although no current Augmented Reality (AR) headset possesses all the desired specifications, such as high resolution, wide field of view, a spacious eyebox, powerful sensing capabilities, compact size, and extended battery life, customized design choices can still be made to meet the specific needs of users in various applications. In this talk I will present our work on AR applications in medicine, for guidance of medical procedures, and in industry, for guidance of frontline workers. Beyond the different use cases, I will focus on the users, how they use the headset, and which headset features matter for these scenarios. Applications that focus on the real world require very different sensor and display specifications than applications that focus on virtual content, and an application where a surgeon is treating a patient and errors have critical consequences has very different requirements than a virtual training application for students. From these points I will explain our preferred headset choices and recommend design choices for future headset generations. After the talk, in-person attendees will have the opportunity to try our software demo on the HoloLens 2, where we leverage sensor data to make creating virtual instructors as easy as taking a video.
Further Information:
Dr. Christoph Leuze is director of the Visualization Core at the Stanford Wu Tsai Institute, where his research focuses on virtual and augmented reality technologies for medical applications, and founder of Nakamir, a startup creating augmented reality training and guidance solutions for frontline workers. He published one of the first virtual reality viewers for MRI data, which has taken over 40,000 users on a virtual tour through his own brain. He has received multiple prizes for his work in augmented reality, including the IEEE VR People's Choice Award for the best AR demo, the TechConnect Award for one of the most promising technological innovations for national security, and the prize for the best 3D video at the Ars Electronica Art and Science Festival. Dr. Leuze studied at Leipzig University, Germany, and Chiba University, Japan, and received the Otto Hahn Medal of the Max Planck Society for his PhD thesis at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig.
Rajesh Menon
(University of Utah)
“Non-anthropocentric Imaging with & without optics”

Date: May 10, 2023
Description:
Imaging that is not constrained by human perception can be advantageous for enhanced privacy, for low-power persistent applications, and for improved inferencing that exploits properties of light unavailable to humans (e.g., spectrum, polarization). By co-optimizing the imager with the subsequent image processing, we showcase three examples: (1) snapshot hyper-spectral imaging and inferencing; (2) snapshot deep-brain fluorescence microscopy; and (3) optics-free imaging and inferencing. New modalities for signal recording, optics enhanced by nanomanufacturing, and advanced computational capabilities promise exciting new opportunities.
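For readers who want a feel for what co-optimizing an imager with its image processing involves on the reconstruction side, here is a generic toy sketch, not Prof. Menon's actual pipeline: an optics-free measurement is modeled as y = A x plus noise, and the scene is recovered by Tikhonov-regularized least squares.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy forward model: each sensor pixel sees a random weighted sum of scene pixels,
    # standing in for the multiplexing that a lensless, optics-free imager performs.
    n_scene, n_sensor = 256, 192
    A = rng.normal(size=(n_sensor, n_scene)) / np.sqrt(n_scene)
    x_true = rng.random(n_scene)
    y = A @ x_true + 0.01 * rng.normal(size=n_sensor)

    # Tikhonov-regularized reconstruction: x_hat = argmin ||A x - y||^2 + lam ||x||^2.
    lam = 1e-2
    x_hat = np.linalg.solve(A.T @ A + lam * np.eye(n_scene), A.T @ y)

    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))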
Further Information:
Rajesh Menon combines his expertise in nanofabrication, computation and optical engineering to impact several fields including inverse-designed photonics, flat lenses and unconventional imaging. Rajesh is a Fellow of the OSA, the SPIE, and Senior Member of the IEEE. Among his other honors are a NASA Early-Stage-Innovations Award, NSF CAREER Award and the International Commission for Optics (ICO) Prize. Rajesh currently directs the Laboratory for Optical Nanotechnologies at the University of Utah. He received S.M. and Ph.D. degrees from MIT.
Na Ji
(University of California at Berkeley)
“Imaging the brain at high spatiotemporal resolution”

Date: May 3, 2023
Description:
Neuroscience aims to understand the brain, an organ that distinguishes humans as a species, defines us as individuals, and provides the intellectual power with which we explore the universe. Composed of electrically excitable cells called neurons, the brain continuously receives and analyzes information, makes decisions and controls actions. Similar to systems studied in physics, where many properties emerge from the interactions of their components, the functions of the brain arise from the interactions of neurons. The fundamental computational units of the brain, neurons communicate with one another electrochemically via submicron structures called synapses. Synapsing onto one another, neurons form circuits and networks, sometimes spanning centimeters in dimension and specializing in different mental functions. To understand the brain mechanistically, we need methods that can monitor the physiological processes of single synapses as well as the activities of a large number of networked neurons. Using concepts developed in astronomy and optics, my laboratory develops optical microscopy methods for imaging the brain at higher resolution, greater depth, and faster time scales. In this talk, I will outline our past and ongoing research efforts.
https://www.jilab.net/research
Further Information:
Na Ji received her B.S. in Chemical Physics from the University of Science & Technology of China in 2000 and her Ph.D. in Chemistry from Berkeley in 2005. She began postdoctoral work at the Janelia Research Campus, Howard Hughes Medical Institute, in 2006 and became a Group Leader there in 2011. She returned to Berkeley and joined the Physics and Molecular & Cell Biology Departments in 2016, becoming full-time faculty in summer 2017.
Ioannis Gkioulekas
(Carnegie Mellon University)
“Interferometric computational imaging”

Date: April 26, 2023
Description:
Imaging systems typically accumulate photons that, as they travel from a light source to a camera, follow multiple different paths and interact with several scene objects. This multi-path accumulation process confounds the information that is available in captured images about the scene, such as geometry and material properties. Computational light transport techniques help overcome this multi-path confounding problem, by enabling imaging systems to selectively accumulate only photons that are informative for any given imaging task. Unfortunately, and despite a proliferation of such techniques in the last two decades, they are constrained to operate only under macroscopic settings. This places them out of reach for critical applications requiring microscopic resolutions.
In this talk, I will go over recent progress towards overcoming these limitations, by developing new interferometric imaging techniques for computational light transport. I will show developments on theory, algorithms, and hardware that enable new interferometric imaging systems with the full range of computational light transport capabilities. I will explain how these systems help overcome constraints traditionally associated with interferometric imaging, such as the need to operate in highly controlled lab environments, under very long acquisition times, and using specialized light sources. Lastly, I will show results related to applications such as medical imaging, industrial fabrication, and inspection of critical parts.
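As generic background on how interferometric systems recover phase from intensity measurements, the sketch below shows textbook four-step phase shifting on a synthetic wavefront; it illustrates the acquisition principle only and is not the specific systems presented in the talk.

    import numpy as np

    def four_step_phase(i0, i90, i180, i270):
        """Recover interferometric phase from four frames captured with
        reference phase shifts of 0, 90, 180 and 270 degrees."""
        return np.arctan2(i270 - i90, i0 - i180)

    # Synthetic example: a tilted wavefront interfered with a unit-amplitude reference.
    y, x = np.mgrid[0:128, 0:128]
    phi_true = 0.05 * x + 0.02 * y                          # unknown phase
    frames = [1.0 + np.cos(phi_true + s) for s in (0.0, np.pi / 2, np.pi, 1.5 * np.pi)]
    phi_est = four_step_phase(*frames)                      # wrapped to (-pi, pi]

    # Compare modulo 2*pi, since the recovered phase is wrapped.
    err = np.angle(np.exp(1j * (phi_est - phi_true)))
    print("max wrapped-phase error:", np.abs(err).max())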
Further Information:
Ioannis Gkioulekas is an Assistant Professor at the Robotics Institute, Carnegie Mellon University (CMU). He is a Sloan Research Fellow and a recipient of the NSF CAREER Award and the Best Paper Award at CVPR 2019. He has PhD and MS degrees from Harvard University, where he was advised by Todd Zickler, and a Diploma from the National Technical University of Athens, where he was advised by Petros Maragos. He works broadly on computer vision, computer graphics, and computational imaging, with a focus on problems including non-line-of-sight imaging, tissue imaging, interferometric imaging systems, physically based rendering, and differentiable rendering.
https://www.cs.cmu.edu/~igkioule/ (personal), https://imaging.cs.cmu.edu/ (lab).
Yuhao Zhu
(University of Rochester)
“Harnessing the Computer Science-Vision Science Symbiosis”

Date: April 19, 2023
Description:
Emerging platforms such as Augmented Reality (AR), Virtual Reality (VR), and autonomous machines all intimately interact with humans. They must be built, from the ground up, with principled considerations of human perception. This talk will discuss some of our recent work on exploiting the symbiosis between Computer Science (CS) and Vision Science (VS). On the VS for CS front, we will discuss how to jointly optimize imaging, computing, and human perception to obtain unprecedented efficiency on AR/VR and automotive systems. We show that a computing problem that seems challenging may become significantly easier when one considers how computing interacts with imaging and human perception in an end-to-end system. On the CS for VS front, we will discuss our preliminary work on using VR to help the color blind regain 3D color vision and a programming system that helps vision scientists write bug-free color manipulation programs.
Further Information:
Yuhao Zhu is an Assistant Professor of Computer Science at University of Rochester. He holds a Ph.D. from The University of Texas at Austin and was a visiting researcher at Harvard University and Arm Research. His research group focuses on applications, algorithms, and systems for visual computing. His work is recognized by the Honorable Mention of the 2018 ACM SIGARCH/IEEE-CS TCCA Outstanding Dissertation Award, multiple IEEE Micro Top Picks designations, and multiple best paper awards/nominations in computer architecture, Virtual Reality, and visualization.
More about his research can be found at: http://www.horizon-lab.org/.
Yuhao Zhu
(University of Rochester)
“Understanding, Modeling, and Improving Energy Efficiency of Computational Image Sensors”

Date: April 19, 2023
Description:
Imaging and computing, which acquire and interpret visual data, respectively, are traditionally designed in isolation and simply stitched together in a system, resulting in a sub-optimal whole. This talk discusses how we must rethink the imaging-computing interface and co-design the two to deliver significant efficiency gains.
In particular, I will focus on the paradigm of Computational CMOS Image Sensors (CCIS), where image sensors are equipped with compute capabilities to unlock new applications and reduce energy consumption. Unleashing the power of CCIS, however, requires making a myriad of interlocked design decisions, e.g., computing inside vs. off sensors, 2D vs. 3D stacking, analog vs. digital computing. I will describe a framework we recently developed, validated against real silicon, that models energy consumption of CCIS and, thus, empowers designers to explore architectural design decisions at an early stage. I will describe a few concrete use-cases of our framework in the context of AR/VR. If time permits, I will conclude by discussing some of our recent efforts to co-design optics, image sensors, and machine vision algorithms.
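To convey the flavor of the design-space exploration such a framework supports, here is a deliberately oversimplified, hypothetical energy model comparing off-sensor versus in-sensor processing of a frame. All per-operation energy numbers are placeholders invented for illustration and are not taken from the speaker's silicon-validated framework.

    # Back-of-the-envelope energy comparison for a computational image sensor.
    # All per-unit energies below are hypothetical placeholders, in picojoules.

    PIXELS = 1920 * 1080
    E_PIXEL_READ = 1.0      # pJ per pixel readout (sensing + ADC)
    E_LINK_PER_BIT = 10.0   # pJ per bit over the sensor-to-host interface
    E_MAC_SENSOR = 2.0      # pJ per multiply-accumulate inside the sensor die
    E_MAC_HOST = 0.5        # pJ per multiply-accumulate on the host SoC
    BITS_PER_PIXEL = 10
    MACS_PER_PIXEL = 100    # cost of an early-vision kernel applied to every pixel

    def energy_off_sensor():
        """Ship the raw frame out, then compute on the host."""
        readout = PIXELS * E_PIXEL_READ
        link = PIXELS * BITS_PER_PIXEL * E_LINK_PER_BIT
        compute = PIXELS * MACS_PER_PIXEL * E_MAC_HOST
        return readout + link + compute

    def energy_in_sensor(reduction=0.05):
        """Compute inside the sensor and ship only a reduced result (e.g. features)."""
        readout = PIXELS * E_PIXEL_READ
        compute = PIXELS * MACS_PER_PIXEL * E_MAC_SENSOR
        link = PIXELS * BITS_PER_PIXEL * E_LINK_PER_BIT * reduction
        return readout + compute + link

    print("off-sensor: %.2f mJ" % (energy_off_sensor() * 1e-9))
    print("in-sensor:  %.2f mJ" % (energy_in_sensor() * 1e-9))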
Further Information:
Yuhao Zhu is an Assistant Professor of Computer Science at University of Rochester. He holds a Ph.D. from The University of Texas at Austin and was a visiting researcher at Harvard University and Arm Research. His research group focuses on applications, algorithms, and systems for visual computing. His work is recognized by the Honorable Mention of the 2018 ACM SIGARCH/IEEE-CS TCCA Outstanding Dissertation Award, multiple IEEE Micro Top Picks designations, and multiple best paper awards/nominations in computer architecture, Virtual Reality, and visualization.
More about his research can be found at: http://www.horizon-lab.org/.
Ellen Zhong
(Princeton University)
Neural fields for structural biology

Date: April 12, 2023
Description:
Major technological advances in cryo-electron microscopy (cryo-EM) have produced new opportunities to study the structure and dynamics of proteins and other biomolecular complexes. However, the structural heterogeneity of these dynamic complexes complicates the algorithmic task of 3D reconstruction from the collected dataset of 2D cryo-EM images. In this seminar, I will overview cryoDRGN, an algorithm that leverages the representation power of deep neural networks to reconstruct continuous distributions of 3D density maps. Underpinning the cryoDRGN method is a deep generative model parameterized by a new neural representation of 3D volumes and a learning algorithm that optimizes this representation from unlabeled 2D cryo-EM images. Extended to real datasets and released as an open-source tool, cryoDRGN has been used to discover new protein structures and visualize continuous trajectories of their motion. I will discuss various extensions of the method for broadening the scope of cryo-EM to new classes of dynamic protein complexes and for analyzing the learned generative model. CryoDRGN is open-source software freely available at http://cryodrgn.cs.princeton.edu.
Further Information:
Ellen Zhong is an Assistant Professor of Computer Science at Princeton University. Her research interests lie at the intersection of AI and biology with a focus on structural biology and image analysis algorithms for cryo-electron microscopy (cryo-EM). She is the creator of cryoDRGN, a neural method for 3D reconstruction of dynamic protein structures from cryo-EM images. She has interned at DeepMind with the AlphaFold team and previously worked on molecular dynamics algorithms and infrastructure for drug discovery applications at D. E. Shaw Research. She obtained her Ph.D. from MIT in 2022 before joining the Princeton faculty, and her B.S. from the University of Virginia in 2014.
Changhuei Yang
(California Institute of Technology)
Computation in microscopy: How computers are changing the way we build and use microscopes

Date: April 5, 2023
Description:
The level of computational power we can currently access has significantly changed the way we think about, process, and interact with microscopy images. In this talk, I will discuss some of our recent computational microscopy and deep learning work that showcases these shifts in the context of pathology and life science research. I will talk about Fourier Ptychographic Microscopy (FPM), a novel way to collect and process microscopy data that is capable of zeroing out physical aberrations from microscopy images and that can also bring significant workflow advantages to pathology. I will also talk about the use of deep learning in image analysis and point out some of the novel and impactful ways it is changing how we deal with image data in pathology and life science research. These surprising findings strongly indicate the need to redesign physical microscope systems to enable the next level of AI-based image analysis. I will briefly touch on what shape these redesigns might take.
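For readers unfamiliar with FPM, the sketch below outlines the classic alternating-projection reconstruction, in which low-resolution images captured under different LED illumination angles are stitched together in the Fourier domain to recover a high-resolution complex field. This is a generic textbook variant with illustrative parameters (even image dimensions, binary pupil, 4x upsampling), not Prof. Yang's implementation.

    import numpy as np

    def fpm_reconstruct(low_res_images, kx_list, ky_list, pupil, n_iters=20):
        """Minimal Fourier ptychography reconstruction by alternating projections.

        low_res_images : list of measured intensity images, all the same (even) size
        kx_list, ky_list : per-LED pixel offsets of the illumination in the Fourier plane
        pupil : binary pupil mask, same size as each low-res image
        """
        m, n = low_res_images[0].shape
        M, N = 4 * m, 4 * n                                # high-resolution Fourier grid
        spectrum = np.zeros((M, N), dtype=complex)
        spectrum[M // 2 - m // 2:M // 2 + m // 2,
                 N // 2 - n // 2:N // 2 + n // 2] = np.fft.fftshift(
            np.fft.fft2(np.sqrt(low_res_images[0])))       # initialize from a brightfield frame

        for _ in range(n_iters):
            for img, kx, ky in zip(low_res_images, kx_list, ky_list):
                r0, c0 = M // 2 - m // 2 + ky, N // 2 - n // 2 + kx
                patch = spectrum[r0:r0 + m, c0:c0 + n] * pupil
                field = np.fft.ifft2(np.fft.ifftshift(patch))
                # Enforce the measured amplitude, keep the estimated phase.
                field = np.sqrt(img) * np.exp(1j * np.angle(field))
                updated = np.fft.fftshift(np.fft.fft2(field))
                spectrum[r0:r0 + m, c0:c0 + n] = (
                    spectrum[r0:r0 + m, c0:c0 + n] * (1 - pupil) + updated * pupil)
        return np.fft.ifft2(np.fft.ifftshift(spectrum))    # high-resolution complex field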
Further Information:
Changhuei Yang is the Thomas G. Myers Professor of Electrical Engineering, Bioengineering and Medical Engineering and a Heritage Medical Research Institute Principal Investigator at the California Institute of Technology. He works in the area of biophotonics and computational imaging. His research team has developed numerous novel biomedical imaging technologies over the past 2 decades – including technologies for focusing light deeply into animals using time-reversal optical methods, lensless microscopy, ePetri, Fourier Ptychography, and non-invasive brain activity monitoring methods. He has worked with major companies, including BioRad, Amgen and Micron-Aptina, to develop solutions for their technological challenges.
He has received the NSF Career Award, the Coulter Foundation Early Career Phase I and II Awards, and the NIH Director’s New Innovator Award. In 2008 he was named one of Discover Magazine’s ‘20 Best Brains Under 40’. He is a Coulter Fellow, an AIMBE Fellow and an OSA Fellow. He was elected as a Fellow in the National Academy of Inventors in 2020.
Tali Treibitz
(University of Haifa)
Towards visual autonomy underwater

Date: March 15, 2023
Description:
Visual input is considered unreliable underwater, and therefore its current use in automatic analysis and decision making is limited. This influences the design of autonomous underwater vehicles, which are usually equipped with multiple sensors that make them expensive and cumbersome. Human divers, on the other hand, often manage to complete complicated tasks using a very lean sensor suite that is heavily based on vision. We aim to bridge this gap and develop algorithms that can exploit the problematic visual underwater input, and systems that use this input for underwater tasks. In the talk I will present several methods we developed for enhancing visibility underwater (published in CVPR 2019 and 2022) and ongoing work on underwater autonomous vehicles using vision.
Further Information:
Prof. Tali Treibitz has headed the Viseaon Marine Imaging Lab in the School of Marine Sciences at the University of Haifa since 2014. She received her BA degree in computer science and her PhD degree in electrical engineering from the Technion-Israel Institute of Technology in 2001 and 2010, respectively. Between 2010 and 2013 she was a post-doctoral researcher in the Department of Computer Science and Engineering at the University of California, San Diego and in the Marine Physical Lab at Scripps Institution of Oceanography. Her lab focuses on cutting-edge research in underwater computer vision; scene, color and 3D reconstruction; automatic analysis of scenes; and autonomous decision making based on visual input. Based on the lab's developments, she established a spin-off company, Seaerra-Vision Ltd., that develops technologies for real-time underwater image enhancement.
Yichang Shih
(Google)
Mobile Computational Photography with Ultrawide and Telephoto Lens

Date: March 1, 2023
Description:
The rapid proliferation of hybrid zoom cameras on smartphones, i.e., a main camera equipped with auxiliary cameras such as an ultrawide camera, a telephoto camera, or both, has brought new challenges and opportunities to mobile computational photography. Existing algorithms in this domain focus on the single main camera and give little attention to the auxiliary cameras. In this talk, I will share how Google Pixel uses computational photography and ML to correct the lens distortion common to ultrawide lenses, and to achieve motion deblurring and super-resolution using dual-camera fusion from hybrid-optics systems. We will also discuss open questions, industry trends, and research ideas in mobile computational photography.
Further Information:
Yichang Shih is a Senior Staff Software Engineer at Google. He joined Google in 2017 and now leads a team developing computational photography and ML algorithms for Google Pixel camera features. Prior to joining Google, he was a research scientist at Light.Co from 2015, working on multi-camera fusion over 16 cameras on mobile devices. Yichang's research interests span computational photography, computer vision, and machine learning, with a focus on imaging and video enhancement. He received his PhD in Computer Science from MIT CSAIL in 2015, under the supervision of Professors Fredo Durand and Bill Freeman. Before his PhD, he received his Bachelor's in Electrical Engineering from National Taiwan University in 2009.
More info about Yichang: https://www.yichangshih.com/
Andrea Tagliasacchi
(Simon Fraser University and Google Brain (Toronto))
Towards mass-adoption of Neural Radiance Fields

Date: February 22, 2023
Description:
Neural 3D scene representations have had a significant impact on computer vision, seemingly freeing deep learning from the shackles of large and curated 3D datasets. However, many of these techniques still have strong assumptions that make them challenging to build and consume for the average user. During this talk, I will question some of these assumptions. Specifically, we will remove the requirement for multiple calibrated images of the same scene (LoLNeRF), eliminate the necessity for the scene to be entirely static during capture (RobustNeRF), and enable the inspection of these models using consumer-grade mobile devices, rather than relying on high-end GPUs (MobileNeRF).
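For context, all of the NeRF variants mentioned above rely on the same underlying volume-rendering step along each camera ray; a minimal numpy sketch of that generic step (not the specific code of LoLNeRF, RobustNeRF, or MobileNeRF) is shown below.

    import numpy as np

    def render_ray(sigmas, colors, deltas):
        """Composite per-sample densities and colors along one ray.

        sigmas : (S,)   volume densities at S samples along the ray
        colors : (S, 3) RGB colors at those samples
        deltas : (S,)   distances between consecutive samples
        """
        alphas = 1.0 - np.exp(-sigmas * deltas)                           # segment opacities
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]    # transmittance
        weights = trans * alphas
        return (weights[:, None] * colors).sum(axis=0)                    # rendered RGB

    # Tiny example: a ray passing through a dense red blob halfway along its length.
    S = 64
    sigmas = np.exp(-0.5 * ((np.linspace(0, 1, S) - 0.5) / 0.05) ** 2) * 50.0
    colors = np.tile([1.0, 0.0, 0.0], (S, 1))
    deltas = np.full(S, 1.0 / S)
    print(render_ray(sigmas, colors, deltas))   # close to pure red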
Further Information:
Andrea Tagliasacchi is an associate professor at Simon Fraser University (Vancouver, Canada), where he holds the appointment of “visual computing research chair” within the School of Computing Science. He is also a part-time (20%) staff research scientist at Google Brain (Toronto), as well as an associate professor (status only) in the computer science department at the University of Toronto. Before joining SFU, he spent four wonderful years as a full-time researcher at Google (mentored by Paul Lalonde, Geoffrey Hinton, and David Fleet). Before joining Google, he was an assistant professor at the University of Victoria (2015-2017), where he held the Industrial Research Chair in 3D Sensing (jointly sponsored by Google and Intel). His alma maters include EPFL (postdoc), SFU (PhD, NSERC Alexander Graham Bell fellow) and Politecnico di Milano (MSc, gold medalist). His research focuses on 3D visual perception, which lies at the intersection of computer vision, computer graphics and machine learning.
Tali Dekel
(Weizmann Institute and Google)
Strong Interpretable Priors are All We Need

Date: February 15, 2023
Description:
The field of computer vision is in the midst of a paradigm shift, moving from task-specific models to “foundation models” – huge neural networks trained on massive amounts of unlabeled data. These models are capable of learning powerful and rich representations of our visual world, as evidenced, for example, by the groundbreaking results in text-to-image generation. Nevertheless, current foundation models are still largely treated as black boxes – we do not fully understand the priors they encode or how they are internally represented, which limits their use for downstream tasks. In this talk, I'll dive into the internal representations learned by prominent models and unveil striking new properties of the information they encode, using simple empirical analysis. I'll demonstrate how to harness their power and unique properties through novel visual descriptors and perceptual losses for a variety of visual tasks, including co-part segmentation, image correspondences, semantic appearance transfer, and text-guided image and video editing. In all cases, the developed methodologies are lightweight and require no additional training data other than the test example itself.
Further Information:
Tali Dekel is an Assistant Professor in the Mathematics and Computer Science Department at the Weizmann Institute, Israel. She is also a Staff Research Scientist at Google, developing algorithms at the intersection of computer vision, computer graphics, and machine learning. Before Google, she was a Postdoctoral Associate at the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT. Tali completed her Ph.D. studies at the School of Electrical Engineering, Tel-Aviv University, Israel. Her research interests include computational photography, image/video synthesis, geometry, and 3D reconstruction. Her awards and honors include the National Postdoctoral Award for Advancing Women in Science (2014), the Rothschild Postdoctoral Fellowship (2015), the SAMSON – Prime Minister's Researcher Recruitment Prize (2019), a Best Paper Honorable Mention at CVPR 2019, and the Best Paper Award (Marr Prize) at ICCV 2019. She often serves as a program committee member and area chair of major vision and graphics conferences. More information at: https://www.weizmann.ac.il/math/dekel/home
Ben Poole
(Google Brain)
2D priors for 3D generation

Date: February 8, 2023
Description:
Large scale datasets of images with text descriptions have enabled powerful models that represent and generate pixels. But progress in 3D generation has been slow due to the lack of 3D data and efficient architectures. In this talk, I’ll present DreamFields and DreamFusion: two approaches that enable 3D generation from 2D priors using no 3D data. By turning 2D priors into loss functions, we can optimize 3D models (NeRFs) from scratch via gradient descent. These methods enable high-quality generation of 3D objects from diverse text prompts. Finally, I’ll discuss a fundamental problem with our approach and how new pixel-space priors like Imagen Video can unlock new 3D capabilities.
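The key idea of turning a 2D prior into a loss function is score distillation: noise a rendering of the current 3D model, ask the diffusion prior to predict that noise, and use the prediction error as a gradient on the rendered pixels. The sketch below is schematic; the denoiser argument is a stand-in for a real text-conditioned diffusion model, and the weighting only loosely follows the published formulation.

    import numpy as np

    def sds_gradient(rendered_rgb, denoiser, text_embedding, alphas_cumprod, rng):
        """Schematic score-distillation gradient with respect to a rendered image.

        rendered_rgb : (H, W, 3) image rendered from the current 3D model
        denoiser     : stand-in callable (noisy_image, t, text) -> predicted noise
        """
        t = rng.integers(1, len(alphas_cumprod))           # random diffusion timestep
        a_bar = alphas_cumprod[t]
        eps = rng.normal(size=rendered_rgb.shape)          # noise added to the rendering
        noisy = np.sqrt(a_bar) * rendered_rgb + np.sqrt(1.0 - a_bar) * eps
        eps_hat = denoiser(noisy, t, text_embedding)       # prior's guess at the noise
        w = 1.0 - a_bar                                    # timestep-dependent weighting
        # This gradient is pushed back through the differentiable renderer into the
        # 3D model's parameters; no gradient flows through the denoiser itself.
        return w * (eps_hat - eps)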
Further Information:
Ben Poole is a research scientist at Google Brain in San Francisco working on deep generative models for images, video, and 3D. He completed his PhD at Stanford University advised by Surya Ganguli in the Neural Dynamics and Computation lab. His thesis was on computational tools to develop a better understanding of both biological and artificial neural networks. He’s worked at DeepMind, Google Research, Intel Research Pittsburgh, and the NYU Center for Neural Science.
Zeev Zalevsky
(Bar-Ilan University)
Remote Photonic Medicine

Date: February 1, 2023
Description:
I will present a photonic sensor that can be used for remote sensing of many biomedical parameters simultaneously and continuously. The technology is based on illuminating a surface with a laser and using an imaging camera to perform temporal and spatial tracking of the secondary speckle patterns, yielding nanometric-accuracy estimation of the movement of the back-reflecting surface. Sensing these movements with nanometric precision allows connecting them to remote bio-sensing and medical diagnostic capabilities.
The technology has already been applied to remote and continuous estimation of vital bio-signs (such as heartbeat, respiration, blood pulse pressure and intraocular pressure), to molecular sensing of chemicals in the bloodstream (such as estimation of alcohol, glucose and lactate concentrations, blood coagulation and oximetry), as well as to sensing hemodynamic characteristics such as blood flow related to brain activity.
The sensor can be used for early diagnosis of diseases such as otitis, melanoma and breast cancer, and it was recently tested in large-scale clinical trials, where it provided highly efficient medical diagnosis capabilities for cardiopulmonary diseases. The sensor was also tested and verified in providing remote, high-quality characterization of brain activity.
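The core signal-processing step behind this kind of speckle-based sensing is tracking the lateral shift of the speckle pattern between consecutive camera frames; a time series of such shifts is what gets translated into vibration and hence into heartbeat, respiration, and so on. A generic phase-correlation version of that step is sketched below; it is illustrative only and not the group's proprietary pipeline.

    import numpy as np

    def speckle_shift(frame_a, frame_b):
        """Estimate the integer-pixel shift between two speckle frames by phase correlation."""
        fa, fb = np.fft.fft2(frame_a), np.fft.fft2(frame_b)
        cross_power = fa * np.conj(fb)
        cross_power /= np.abs(cross_power) + 1e-12
        corr = np.abs(np.fft.ifft2(cross_power))
        peak = np.unravel_index(np.argmax(corr), corr.shape)
        # Wrap peak indices to signed shifts; sub-pixel refinement would fit the peak's
        # neighborhood with a paraboloid.
        return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))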
Further Information:
Zeev Zalevsky received his B.Sc. and direct Ph.D. degrees in electrical engineering from Tel-Aviv University in 1993 and 1996, respectively. Zeev is currently a full Professor and the Dean of the Faculty of Engineering at Bar-Ilan University, Israel. His major fields of research are optical super resolution, biomedical optics, nano-photonics, and fiber-based processing and sensing architectures. Zeev has published more than 570 peer-reviewed papers, 340 proceedings papers, 9 books (6 authored and 3 edited), 32 book chapters, and about 100 patents. Zeev has given about 620 conference presentations, with more than 220 invited, keynote, or plenary talks.
Zeev is a fellow of many large scientific societies such as SPIE, OSA, IEEE, EOS, IOP, IET, IS&T, ASLMS, AIMBE and more. He is also a fellow of the American National Academy of Inventors (NAI). For his work he received many national and international prizes such as the Krill prize, ICO prize and Abbe medal, SAOT prize, Juludan prize, Taubelnblatt prize, young investigator prize in nanotechnology, the International Wearable Technologies (WT) Innovation World Cup 2012 Prize, Image Engineering Innovation Award, NANOSMAT prize, SPIE startup challenge prize, SPIE prism award, IAAM Scientist Medal Award, International Photonic Award, Dr. Horace Furumoto Innovations Professional award, The Asian Advanced Materials Award, Edison Award, IEEE distinguished lecturer award, VEBLEO Scientist Award, Joseph Fraunhofer Award/Robert M. Burley Prize, Lotfi Zadeh Memorial Award, E&T Innovation Award, CES (Consumer Electronics Show) 2022 Innovation Awards, German Innovation Awards 2022, the Humboldt research prize, SPIE 2023 Chandra S. Vikram Award for Metrology and more.
Besides his academic research activity, Zeev is also very active in commercializing his inventions through start-up companies. He has been involved in the technological leadership of more than 10 startup companies.
Dan McGrath
(Senior Consultant)
Insider’s View on Pixel Design

Date: January 25, 2023
Description:
The success of solid-state image sensors has come from cost-effectively integrating mega-arrays of transducers into the design flow and manufacturing process that underpins the success of integrated circuits in our industry. This talk will present, from a front-line designer's perspective, key challenges, both those already overcome and those that remain, in enabling this: device physics, integration, manufacturing, and meeting customer expectations.
Further Information:
Dan McGrath has worked for over 40 years specializing in the device physics of pixels, both CCD and CIS, and in the integration of image-sensor process enhancements into the manufacturing flow. He received his doctorate in physics from Johns Hopkins University. He chose his first job because it offered that designing image sensors “means doing physics,” and he has kept this passion front-and-center in his work. He has worked at Texas Instruments, Polaroid, Atmel, Eastman Kodak, Aptina, BAE Systems and GOODiX Technology, and with manufacturing facilities in France, Italy, Taiwan, China and the USA. He has worked with astronomers on the Galileo mission to Jupiter and on Halley's Comet, with commercial companies on cell phone imagers and biometrics, with the scientific community on microscopy and lab-on-a-chip devices, with robotics teams on 3D mapping sensors, and with defense contractors on night vision. His publications include the first megapixel CCD and the basis for dark current spectroscopy (DCS).
Lei Li
(Rice University)
New Generation Photoacoustic Imaging: From benchtop whole-body imagers to wearable sensors

Date: January 18, 2023
Description:
Whole-body imaging has played an indispensable role in preclinical research by providing high-dimensional physiological, pathological, and phenotypic insights with clinical relevance. Yet pure optical imaging suffers from either shallow penetration or a poor depth-to-resolution ratio, and non-optical techniques for whole-body imaging of small animals lack either spatiotemporal resolution or functional contrast. We have developed a dream machine, demonstrating that stand-alone single-impulse panoramic photoacoustic computed tomography (SIP-PACT) mitigates these limitations by combining high spatiotemporal resolution, deep penetration, anatomical, dynamical and functional contrasts, and full-view fidelity. SIP-PACT has imaged in vivo whole-body dynamics of small animals in real time, mapped whole-brain functional connectivity, and tracked circulating tumor cells without labeling. It has also been scaled up for human breast cancer diagnosis. SIP-PACT opens a new window for medical researchers to test drugs and monitor longitudinal therapy without the harm from ionizing radiation associated with X-ray CT, PET, or SPECT. Genetically encoded photochromic proteins benefit photoacoustic computed tomography (PACT) in detection sensitivity and specificity, allowing monitoring of tumor growth and metastasis, multiplexed imaging of multiple tumor types at depth, and real-time visualization of protein-protein interactions in deep-seated tumors. Integrating a newly developed microrobotic system with PACT permits deep imaging and precise control of micromotors in vivo and promises practical biomedical applications, such as drug delivery. In addition, to move benchtop PACT systems toward low-cost portable and wearable devices without compromising imaging performance, we recently developed photoacoustic topography through an ergodic relay, a high-throughput imaging system with significantly reduced size, complexity, and cost, enabling wearable applications. As a rapidly evolving imaging technique, photoacoustic imaging promises both preclinical applications and clinical translation.
Further Information:
Dr. Lei Li is an assistant professor of Electrical and Computer Engineering at Rice University. He obtained his Ph.D. from the Department of Electrical Engineering at California Institute of Technology in 2019. He received his MS at Washington University in St. Louis in 2016. His research focuses on developing next-generation medical imaging technology for understanding the brain better, diagnosing early-stage cancer, and wearable monitoring of human vital signs. He was selected as a TED fellow in 2021 and a rising star in Engineering in Health by Columbia University and Johns Hopkins University (2021). He received the Charles and Ellen Wilts Prize from Caltech in 2020 and was selected as one of the Innovators Under 35 by MIT Technology Review in 2019. He is also a two-time winner of the Seno Medical Best Paper Award granted by SPIE (2017 and 2020, San Francisco).
Jeong Joon Park
(Stanford)
Learning to Re-create Reality in 3D

Date: January 11, 2023
Description:
Despite its tremendous success, 2D media, e.g., photos and videos, remain ‘static’ snapshots of the world. That is, we cannot walk around freely within a video or interact with the people inside. To unlock richer interactive experiences, I build artificial intelligence (AI) to democratize 3D media that have the capabilities to digitally ‘re-render’ and interact with the 3D world. Towards this goal, I aim to enable people to casually ‘capture’ 3D content, as they would take photos and videos. On the other hand, I seek to build generative AI systems that help non-experts in authoring realistic 3D content, such as objects, scenes, and their dynamics. Such faithful reconstruction and synthesis of 3D scenes are challenging due to the complex physics behind the world and the innate ambiguities coming from sparse observations. In this talk, I discuss my recent progress on the fundamental problems towards my vision: 1) representing 3D data with neural networks, 2) training 3D generative models, and 3) combining physics and machine learning to build robust AI systems that can extrapolate beyond what’s observed.
Further Information:
Jeong Joon (JJ) Park is a postdoctoral researcher at Stanford University, working with Professors Leonidas Guibas and Gordon Wetzstein. His main research interests lie in the intersection of computer vision, graphics, and machine learning, where he studies realistic reconstruction and synthesis of 3D scenes using neural and physical representations. He did his PhD in computer science at the University of Washington, Seattle, under the supervision of Professor Steve Seitz, during which he was supported by Apple AI/ML Fellowship. He is fortunate to have worked with great collaborators from his academic institutions and internships with Adobe, Meta, and Apple. Prior to PhD, he received his Bachelor of Science from the California Institute of Technology.
Emily Cooper
(University of California at Berkeley)
Taking a binocular view of augmented reality system design

Date: December 7, 2022
Description:
Augmented reality (AR) systems aim to enhance our view of the world and make us see things that are not actually there. But building an AR system that integrates effectively with our natural visual experience is hard. AR systems often suffer from technical and visual limitations, such as small eyeboxes and narrow visual-field coverage. An integral part of AR system development, therefore, is perceptual research that improves our understanding of when and why these limitations matter. I will describe the results of perceptual studies designed to provide guidance on how to optimize the limited visual-field coverage supported by many AR systems. Our analysis highlights the idiosyncrasies of how our natural binocular visual field is formed, the complexities of quantifying visual-field coverage for binocular AR systems, and the trade-offs that are necessary when an AR system can only augment a subarea of the visual field.
Further Information:
Emily Cooper is an Assistant Professor of Optometry and Vision Science at the University of California, Berkeley. Her lab’s research examines the mechanisms and phenomenology of human visual perception, with a particular emphasis on perception of three-dimensional (3D) space. In addition to developing insights into basic 3D vision, her lab works to apply these scientific insights to make perceptually meaningful improvements to augmented reality systems.
Thomas Müller
(NVIDIA)
Neural Networks in High Performance Graphics

Date: November 30, 2022
Description:
Neural networks have a reputation for being expensive to run and even more expensive to train, which makes them seem like a bad fit for high-performance tasks. But this is a misconception. Neural networks can run, and even be trained, in the inner loops of real-time renderers or SLAM systems when powered by the right data structures and algorithms. This talk is about those algorithms: why do they work? When does it make sense to use them? And how important is such low-level engineering in a research project?
Further Information:
Thomas is a principal research scientist at NVIDIA working on the intersection of machine learning and light transport simulation. His research won multiple best paper awards and is used in movie production (such as in Disney’s Hyperion and RenderMan), 3D reconstruction and gaming. As part of his research, Thomas created several widely used open source frameworks, including instant-ngp, tiny-cuda-nn, and tev. Thomas holds a PhD from ETH Zürich & Disney Research and, in a past life, also developed large components of the online rhythm game “osu!”.
Tara Akhavan
(IRYStec)
Perceptual Displays Today & Tomorrow – Productization and Evaluation

Date: November 16, 2022
Description:
The evolution of displays in various industries has been a key factor in satisfying consumers. In the automotive space specifically, Perceptual and Immersive displays are inspiring more manufacturers to build products matching the end user’s desires while also accounting for vehicle safety factors such as visibility in poor weather, eye fatigue, seamless interaction, and compensating for age and gender biases. In this talk, Dr. Akhavan will share her experience introducing perceptual displays to the automotive industry, combining physiological and mechanical testing methods to assess her solutions, and most importantly, measuring the performance of perceptual displays compared to traditional displays.
Further Information:
Tara Akhavan, PhD, is an award-winning technology entrepreneur and a pioneer in the field of perceptual display processing for the consumer and automotive industries. Prior to founding IRYStec, a Series-A Montreal-based start-up, she was recognized for scaling an Operations and Maintenance Center (OMC) product in the telecommunications industry from analysis and design to deployment in a 3GPP mobile network with 20 million subscribers. She earned her bachelor's degree in computer engineering, a master's degree in artificial intelligence, and a PhD in image processing and computer vision from Vienna University of Technology.
Luca Fascione
(NVIDIA)
Flavors of color

Date: November 9, 2022
Description:
Colors are the key constituents of images, whether these are digital or physical media. However, while most people have an understanding of what color is insofar as it overlaps their own field of work, it seems difficult to bring these many separate understandings together under a unified conceptual framework. We'll discuss how color happens and its physical nature, explore how (or whether) it can be measured, and look at what people in different fields do to turn it into quantities usable for digital image synthesis or processing.
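As one small, concrete example of turning color into quantities usable for digital image synthesis: converting CIE XYZ tristimulus values into sRGB, the encoding most digital images use. The matrix and transfer function below are the standard published sRGB (D65) ones; the surrounding usage is just a toy example.

    import numpy as np

    # Standard linear transform from CIE XYZ (D65 white) to linear sRGB.
    XYZ_TO_SRGB = np.array([
        [ 3.2406, -1.5372, -0.4986],
        [-0.9689,  1.8758,  0.0415],
        [ 0.0557, -0.2040,  1.0570],
    ])

    def srgb_encode(linear):
        """Apply the sRGB transfer function ("gamma") to linear RGB in [0, 1]."""
        linear = np.clip(linear, 0.0, 1.0)
        return np.where(linear <= 0.0031308,
                        12.92 * linear,
                        1.055 * np.power(linear, 1.0 / 2.4) - 0.055)

    def xyz_to_srgb(xyz):
        return srgb_encode(XYZ_TO_SRGB @ np.asarray(xyz))

    # D65 white (Y normalized to 1) should map to approximately (1, 1, 1).
    print(xyz_to_srgb([0.95047, 1.0, 1.08883]))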
Further Information:
Luca Fascione is a distinguished engineer with Nvidia, working in ray tracing and digital image synthesis. He has previously worked in the movie industry for Weta Digital and Pixar Animation Studios. While at Weta he was the architect of the studio’s production renderer Manuka and several other systems, among them the Facial Motion Capture real-time solver first used on Avatar (2009). This last work was recognized with an Academy Award in 2017.
Hakan Urey
(Koç University, Turkey)
Computational Holographic Displays and AR Applications

Date: November 2, 2022
Description:
Augmented reality and metaverse applications hold great promise for the future of human-computer interaction. Computer-generated holography (CGH) displays may be the ultimate display technology, providing true 3D with all the required depth cues for AR and other applications. Currently, there are challenges related to spatial light modulators, coherent light sources, and computational requirements. ML-based algorithms have recently improved the image quality and defocus properties of holograms quite substantially. In this talk, I'll describe the challenges and recent progress in CGH displays. I'll also discuss applications of holography in head-worn displays, head-up displays for cars, and vision simulators used before cataract surgery.
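To make the computational side concrete, here is the classic Gerchberg-Saxton loop for computing a phase-only Fourier hologram of a target image. It is the textbook baseline that ML-based CGH algorithms like those mentioned above improve upon, not the speaker's method.

    import numpy as np

    def gerchberg_saxton(target_amplitude, n_iters=50, seed=0):
        """Phase-only Fourier hologram of a target amplitude image (Gerchberg-Saxton)."""
        rng = np.random.default_rng(seed)
        phase = rng.uniform(0, 2 * np.pi, target_amplitude.shape)
        for _ in range(n_iters):
            # SLM plane: unit amplitude (phase-only modulator), current phase estimate.
            slm_field = np.exp(1j * phase)
            # Far field (lens Fourier transform): impose the target amplitude.
            far_field = np.fft.fft2(slm_field)
            far_field = target_amplitude * np.exp(1j * np.angle(far_field))
            # Back to the SLM plane: keep only the phase.
            phase = np.angle(np.fft.ifft2(far_field))
        return phase   # phase pattern to display on the spatial light modulator

    # Usage sketch: phase = gerchberg_saxton(np.sqrt(image / image.max()))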
Further Information:
Hakan Urey is a Professor of Electrical Engineering at Koç University, Istanbul, Turkey. He received his BS degree from Middle East Technical University and his MS and Ph.D. degrees from the Georgia Institute of Technology, all in Electrical Engineering. He worked for Microvision and played a key role in the development of MEMS scanners and laser beam scanning systems. His research interests are MEMS, optical sensors, holography, and augmented reality displays. He is the inventor of more than 60 patents, which have been licensed by more than 10 companies and led to 5 spinoff companies from his lab. He has published about 200 papers, given more than 50 invited talks at conferences, and received numerous awards and grants, including a European Research Council Advanced Grant. He is a Fellow of OPTICA and a member of the Science Academy in Turkey.
Thomas Goossens
(Stanford)
Camera simulation in a world of trade secrets

Date: October 26, 2022
Description:
Imaging systems are found in many devices; these devices serve many different functions from consumer photography to automotive applications to biometric recognition. Designing these image systems can be quite complex, and the process can benefit from simulation tools that model all of the system components. Simulation can be a tool for optimizing the design or evaluating performance even before building a physical prototype. Furthermore, an accurate simulator can be used to generate a large number of synthetic images which can be used to train or test neural networks.
A significant challenge in simulating an image system is obtaining accurate models of all the components. Understandably, manufacturers may want to protect their intellectual property, making them reluctant to share details that would be important to an accurate simulation. In this talk, I will discuss how we used black-box (phenomenological) models to simulate a consumer camera with a proprietary lens design and proprietary pixel optics (microlens and dual pixel). I will describe the ideas and evaluate the performance of the simulation. The use of phenomenological models can greatly expand the scope and value of image systems simulations.
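As a small example of the phenomenological approach, relative illumination (vignetting) can be modeled directly as a smooth function of image height fitted to a handful of flat-field measurements, with no knowledge of the proprietary lens prescription. The sketch below uses made-up measurement values purely for illustration.

    import numpy as np

    # Measured relative illumination versus normalized image height (made-up values
    # standing in for flat-field measurements of a real camera).
    field_height = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
    rel_illum = np.array([1.00, 0.98, 0.93, 0.84, 0.72, 0.58])

    # Even-order polynomial in image height: a common phenomenological vignetting model.
    coeffs = np.polyfit(field_height ** 2, rel_illum, deg=3)
    vignetting = np.poly1d(coeffs)

    def apply_vignetting(ideal_image):
        """Attenuate a simulated (vignetting-free) image with the fitted black-box model."""
        h, w = ideal_image.shape[:2]
        y, x = np.mgrid[0:h, 0:w]
        r = np.hypot(x - w / 2, y - h / 2) / np.hypot(w / 2, h / 2)   # normalized height
        gain = vignetting(r ** 2)
        if ideal_image.ndim == 3:
            gain = gain[..., None]
        return ideal_image * gain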
Further Information:
Thomas Goossens is a postdoctoral fellow at Stanford University working on camera simulation. He obtained his Ph.D. degree at KU Leuven (Belgium) in collaboration with IMEC, working on hyperspectral imaging.
Yi Xue
(UC Davis)
3D fluorescence and phase microscopy with scattering samples

Date: October 19, 2022
Description:
Optical imaging is often hindered by light scattering. Scattered photons contribute to background noise and degrade the signal-to-noise ratio (SNR) of fluorescence images. To tackle this challenge, I developed several strategies, for both multiphoton and one-photon microscopy, for imaging through scattering media. Multiphoton microscopy has been widely used for deep tissue imaging because of its long excitation wavelengths and inherent optical sectioning, but imaging speed is relatively slow because of scanning. Multiphoton microscopy with parallelized excitation and detection improves imaging speed, but scattered fluorescent photons degrade the SNR of the images. To achieve both high speed and high SNR, I developed a two-photon imaging technique that combines structured illumination with a digital spatial-frequency filter to discard scattered photons and keep only ballistic photons. On the other hand, scattered photons carry information about the heterogeneity of the scattering medium, quantified by its refractive index. Instead of discarding scattered photons, I developed a one-photon technique that decodes the refractive index of the medium from scattered fluorescence images. This technique models a scattering medium as a series of thin layers and describes the light path through the medium. By measuring the fluorescence images and solving the inverse problem, it enables reconstruction of the 3D refractive index of scattering media and digital correction of scattering in fluorescence images.
Further Information:
Yi Xue is an assistant professor at the University of California, Davis. She received her PhD and MS degrees in Mechanical Engineering from the Massachusetts Institute of Technology in 2019 and 2015, respectively, and her BEng degree in Optical Engineering from Zhejiang University, China, in 2013. She received the JenLab Young Investigator Award and a Weill Neurohub Fellowship. Her current research interests include computational optics, multiphoton microscopy, brain imaging, and optogenetics.
Qi Sun
(NYU)
Co-Optimizing Human-System Performance in XR

Date: October 12, 2022
Description:
Extended Reality (XR) enables unprecedented possibilities for displaying virtual content, sensing physical surroundings, and tracking human behaviors with high fidelity. However, we still haven’t created “superhumans” who can outperform what we are in physical reality, nor a “perfect” XR system that delivers infinite battery life or realistic sensation. In this talk, I will discuss some of our recent research on leveraging eye/muscular sensing and learning to model our perception, reaction, and sensation in virtual environments. Based on the knowledge, we create just-in-time visual content that jointly optimizes human (such as reaction speed to events) and system performance in XR.
Further Information:
Qi Sun is an assistant professor at New York University. Before joining NYU, he was a research scientist at Adobe Research. He received his PhD at Stony Brook University. His research interests lie in computer graphics, VR/AR, computational cognition, and human-computer interaction. He is a recipient of the IEEE Virtual Reality Best Dissertation Award, as well as ACM SIGGRAPH Best Paper Award.
Orly Liba
(Google)
Creative Camera: Advancing Computational Photography at Google

Date: October 5, 2022
Description:
Our team, Creative Camera, aims to democratise image capture and editing by bridging the hardware and software gaps of mobile devices. In Portrait Mode and Night Sight we created DSLR-quality photos on Pixel phones. In Magic Eraser and Sky Palette Transfer we provided users with advanced editing features that previously required expensive software and skill. In this talk I’ll describe some of our team’s projects as well as new directions employing generative models such as diffusion models.
Further Information:
Dr. Orly Liba joined Google as a Research Scientist in 2018. During this time she worked on projects such as Night Sight and Magic Eraser that launched on the Pixel phone. She completed her PhD in Electrical Engineering at Stanford, where her research focused on developing optical and computational tools for Molecular Imaging with Optical Coherence Tomography.
Federico Capasso
(Harvard School of Engineering and Applied Science)
Flat Optics Unifies Semiconductor and Optical Technology: From Metalenses to Cameras and Smart Sensors

Date: June 8, 2022
Description:
Sub-wavelength-spaced arrays of nanostructures, known as metasurfaces, provide a new basis for recasting optical components as thin planar elements that are easy to optically align and that control aberrations, leading to a major reduction in system complexity and footprint as well as the introduction of new optical functions. The planarity of flat optics unifies semiconductor manufacturing and lens making: the planar technology used to manufacture chips can produce CMOS-compatible, metasurface-based optical components for high-volume markets such as cell phones, AR/VR, and advanced depth- and polarization-sensing modalities.
Further Information:
Federico Capasso is the Robert Wallace Professor of Applied Physics at Harvard University, which he joined in 2003 after 27 years at Bell Labs, where his career advanced from postdoctoral fellow to Vice President for Physical Research. He has made contributions to optics and photonics, nanoscience, and materials science, including the bandgap engineering technique that led to his invention of the quantum cascade laser, MEMS based on the Casimir effect and the first measurement of the repulsive Casimir force, and research on metasurfaces including the generalized laws of refraction and reflection and “flat optics” such as high-performance metalenses and the Matrix Fourier optics used to demonstrate single-shot ultracompact polarization-sensitive cameras. He is a board member of Metalenz (https://www.metalenz.com/), which he cofounded in 2016 and which is focused on bringing metalenses and cameras to high-volume markets.
His awards include the Ives Medal of Optica (formerly the Optical Society), the Balzan Prize in Applied Photonics, the King Faisal Prize, the IEEE Edison Medal, the American Physical Society Arthur Schawlow Prize in Laser Science, the AAAS Rumford Prize, the Enrico Fermi Prize, the European Physical Society Quantum Electronics Prize, the Wetherill Medal of the Franklin Institute, the Materials Research Society Medal, the Jan Czochralski Award for lifetime achievements in Materials Science, and the R. W. Wood Prize of Optica. He is a member of the National Academy of Sciences, the National Academy of Engineering, and the Academia Europaea, a fellow of the National Academy of Inventors, and a fellow of the American Academy of Arts and Sciences (AAAS). He holds honorary doctorates from Lund University and Diderot University.
Susana Marcos
(University of Rochester)
Ophthalmic imaging: from optical bench to the eye care clinic

Date: May 25, 2022
Description:
The eye is an exquisite optical system, but the optical image projected on the retina becomes blurred as a result of several very prevalent ocular conditions (myopia, presbyopia, keratoconus, and others). I will present custom-developed optical technologies (3D fully quantitative OCT, aberrometry, second-harmonic-generation microscopy, optical coherence elastography, and adaptive optics visual simulators) that allow characterizing the optical, mechanical, and geometrical properties of the ocular components and connecting morphology with retinal and perceived image quality. These relations have served as inspiration for developing and optimizing correction alternatives (intraocular lenses, surgical procedures). Several of these technologies have made their way into clinical practice and are used for diagnostics and treatment.
Further Information:
Susana Marcos is the Nicholas George Professor of Optics and the David Williams Director of the Center for Visual Science (CVS) at the University of Rochester in NY. She is a pioneer in the development of new techniques for evaluating the eye, including retinal imaging instruments, aberrometers, adaptive optics, anterior segment imaging, and intraocular lens designs.
Dr. Marcos earned her Bachelor and PhD degrees in Physics at the University of Salamanca, Spain. Before coming to Rochester, she was Director of the Institute of Optics, CSIC (2008-2012), Spain, and Professor of Research at CSIC, where she founded the Visual Optics and Biophotonics Lab in 2000. Prior to her tenure at CSIC, she was a postdoctoral Fellow (funded by Fulbright and Human Frontier Fellowships) at the Schepens Eye Research Institute (Harvard Medical School).
In July 2021 she was appointed Director of CVS, with dual affiliation in Optics and in Ophthalmology at the University of Rochester. She holds a "Vinculated Doctorship" at the Institute of Optics, where she supervises a multidisciplinary, international team of more than 25 members. Her research programs at the University of Rochester address emerging technologies for myopia, presbyopia and cataract corrections.
Recognitions of her work include the Adolph Lomb Medal (Optical Society), the ICO Prize (International Commission for Optics), Doctor Honoris Causa from the Ukraine Academy of Science and Technology, OSA Fellow, EOS Fellow, ARVO Fellow, the Alcon Research Institute Award, the Borish Scholar Award (Indiana University), the Physics, Innovation and Technology Award (Royal Spanish Society of Physics-BBVA Foundation), the Honor Plate of the Spanish Association of Scientists, the Julio Pelaez Award to Women Engineers (Tatiana Perez de Guzman el Bueno Foundation), the Ramón y Cajal Medal (Royal Academy of Sciences), the King Jaime I Award, and the National Research Award in Engineering (Government of Spain), the latter two presented by the King of Spain.
Chrysanthe Preza
(University of Memphis)
Engineering microscopes using computational imaging

Date: May 18, 2022
Description:
Improving the performance of three-dimensional (3D) fluorescence microscopes is a topic that has received a lot of attention over the years. In this talk, I will discuss computational imaging techniques that we developed in the past based on point-spread function engineering to address depth-induced spherical aberration, as well as new developments to improve 3D spatial resolution based on optical-transfer function engineering using novel structured illumination approaches.
Further Information:
Chrysanthe Preza is the Kanuri Professor and Chair in the Department of Electrical and Computer Engineering at the University of Memphis, which she joined in 2006. She received her D.Sc. degree in Electrical Engineering from Washington University in St. Louis in 1998. She leads the Computational Imaging Research Laboratory at the University of Memphis. Her research interests are imaging science, estimation theory, computational imaging enabled by deep learning, and computational optical sensing and imaging applied to multidimensional multimodal light microscopy. She received a CAREER award from the National Science Foundation in 2009 and the Herff Outstanding Faculty Research Award in 2010 and 2015, and she held the Ralph Faudree Professorship at the University of Memphis from 2015 to 2018. She was named a Fellow of SPIE in 2019 and a Fellow of Optica (formerly OSA) in 2020. She serves as an Associate Editor for IEEE Transactions on Computational Imaging, a Topical Editor for Optica's Applied Optics, and an Executive Editor for Biological Imaging (Cambridge University Press).
Guillermo Sapiro
(Duke University)
Computational Behavioral Coding for Developmental Disorders

Date: May 11, 2022
Description:
In this talk I will describe some of the work at Duke University exploiting computer vision and machine learning for autism screening. We will illustrate the challenges, including development in real pediatric clinics, as well as the results and the discovery of biomarkers.
Further Information:
Guillermo Sapiro received his B.Sc. (summa cum laude), M.Sc., and Ph.D. from the Technion, Israel Institute of Technology. After post-doctoral research at MIT, Guillermo became a Member of Technical Staff at HP Labs. He was with the Department of Electrical and Computer Engineering at the University of Minnesota. Currently he is a James B. Duke School Professor with Duke University. He is also with Apple, Inc., where he leads a team on Health AI.
Guillermo works on theory and applications in computer vision, computer graphics, medical imaging, image analysis, and machine learning. He has authored over 450 papers in these areas and has written a book published by Cambridge University Press. Guillermo was awarded the ONR Young Investigator Award in 1998, the Presidential Early Career Award for Scientists and Engineers (PECASE) in 1998, the NSF CAREER Award in 1999, and the National Security Science and Engineering Faculty Fellowship in 2010. He received Test-of-Time awards at ICCV 2011 and ICML 2019. He was elected to the American Academy of Arts and Sciences in 2018 and to the National Academy of Engineering in 2022, and he is a Fellow of IEEE and SIAM. Guillermo was the founding Editor-in-Chief of the SIAM Journal on Imaging Sciences.
Liang Gao
(UCLA)
Ultrafast light field tomography

Date: May 4, 2022
Description:
Cameras with extreme speeds are enabling technologies in both fundamental and applied sciences. However, existing ultrafast cameras are incapable of coping with extended three-dimensional (3D) scenes. To address this unmet need, we developed a new computational ultrafast imaging technique, light field tomography (LIFT), which can perform 3D snapshot transient (time-resolved) imaging at an unprecedented frame rate with full-fledged light field imaging capabilities, including depth retrieval, post-capture refocusing, and extended depth of field. As a niche application, we demonstrated real-time non-line-of-sight imaging of fast-moving hidden objects, which was previously impossible without the presented technique. Moreover, we showcased 3D imaging of fiber-guided light propagation along a twisted path and the capability of resolving extended 3D objects. The advantage of such recordings is that even visually simple systems can be scientifically interesting when they are captured at such high speed and in 3D. The ability to film the propagation of light through a curved optical path, for example, could inform the design of invisibility cloaks and other optical metamaterials.
Further Information:
Dr. Liang Gao is currently an Assistant Professor of Bioengineering at UCLA. His primary research interests encompass multidimensional optical imaging, including hyperspectral imaging and ultrafast imaging. Dr. Liang Gao is the author of more than 70 peer-reviewed publications in top-tier journals, such as Nature, Nature Communications, Science Advances, and PNAS. He received his BS degree in Physics from Tsinghua University in 2005 and Ph.D. degree in Applied Physics and Bioengineering from Rice University in 2011.
Hao Li
(Pinscreen and UC Berkeley)
AI Synthesis for the Metaverse: From avatars to 3D scenes

Date: April 20, 2022
Description:
As the world gets ready for the metaverse, the need for 3D content is growing rapidly, AR/VR will become mainstream, and the next era of the web will be spatial. A digital and immersive future is unthinkable without telepresence, lifelike digital humans, and photorealistic virtual worlds. Existing computer graphics pipelines and technologies rely on production studios and a content creation process that is time consuming and expensive. My research is about developing novel 3D deep learning-based techniques for generating photorealistic digital humans, objects, and scenes, and democratizing the process by making such capabilities automatic and accessible to anyone. In this talk, I will present state-of-the-art technology for digitizing an entire virtual 3D avatar from a single photo, developed at Pinscreen, and give a live demo. I will also showcase a high-end neural rendering technology used in next-generation virtual assistant solutions and real-time virtual production pipelines. I will also present a real-time teleportation system that uses only a single webcam as input for digitizing entire bodies using 3D deep learning. Furthermore, I will present our latest efforts at UC Berkeley on real-time AI synthesis of entire scenes using NeRF representations and PlenOctrees. Finally, I will highlight some recent work on deepfake detection and speech-driven human motion synthesis, where we combine approaches from NLP, vision, and graphics. My goal is to enable new capabilities and applications at the intersection of AI, vision, and graphics and to impact the future of communication, human-machine interaction, and content creation. At the same time, we must also prioritize the safety and wellbeing of everyone while architecting this future.
Further Information:
Hao Li is CEO and Co-Founder of Pinscreen, a startup that builds cutting-edge AI-driven virtual avatar technologies. He is also a Distinguished Fellow of the Computer Vision Group at UC Berkeley. Before that, he was an Associate Professor of Computer Science at the University of Southern California, as well as the director of the Vision and Graphics Lab at the USC Institute for Creative Technologies. Hao's work in computer graphics and computer vision focuses on digitizing humans and capturing their performances for immersive communication, telepresence in virtual worlds, and entertainment. His research involves the development of novel deep learning, data-driven, and geometry processing algorithms. He is known for his seminal work in avatar creation, facial animation, hair digitization, and dynamic shape processing, as well as his recent efforts in preventing the spread of malicious deepfakes. He was previously a visiting professor at Weta Digital, a research lead at Industrial Light & Magic / Lucasfilm, and a postdoctoral fellow at Columbia and Princeton Universities. He was named one of MIT Technology Review's top 35 innovators under 35 in 2013 and has been awarded the Google Faculty Award, the Okawa Foundation Research Grant, and the Andrew and Erna Viterbi Early Career Chair. He won the Office of Naval Research (ONR) Young Investigator Award in 2018 and was named to the DARPA ISAT Study Group in 2019. In 2020, he won the ACM SIGGRAPH Real-Time Live! "Best in Show" award. Hao obtained his PhD at ETH Zurich and his MSc at the University of Karlsruhe (TH).
Matthew Tancik
(UC Berkeley)
3D Asset Generation using Neural Radiance Fields

Date: April 13, 2022
Description:
Neural Radiance Fields (NeRFs) enable novel view synthesis of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. In the past two years these representations have received interest from the community due to their simplicity to implement and their high-quality results. In this talk I will discuss the core concepts behind NeRF and dive into the details of one specific technique that enables the networks to represent high-frequency signals. Finally, I will discuss a recent project where we scale up NeRFs to represent large-scale scenes. Specifically, we utilize data captured from autonomous vehicles to reconstruct a neighborhood in San Francisco.
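The "high-frequency signals" technique referred to above relies on mapping input coordinates through sinusoidal positional encodings before they reach the MLP. The sketch below is a minimal, illustrative version of that Fourier-feature mapping, not the speaker's code; the number of frequency bands is an assumed hyperparameter.

```python
import numpy as np

def positional_encoding(x, num_bands=10):
    """Map coordinates x (shape [N, D]) to sinusoids of exponentially
    spaced frequencies, as used in NeRF-style coordinate networks."""
    freqs = 2.0 ** np.arange(num_bands) * np.pi       # [L] frequency bands
    angles = x[..., None] * freqs                      # [N, D, L]
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)              # [N, D * 2L]

# Example: encode 3D sample points along camera rays before feeding an MLP.
pts = np.random.uniform(-1.0, 1.0, size=(4, 3))
print(positional_encoding(pts, num_bands=10).shape)    # (4, 60)
```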
Further Information:
Matt Tancik is a PhD student at UC Berkeley advised by Ren Ng and Angjoo Kanazawa and is supported by the NSF graduate research fellowship program. He received his bachelor’s degree in CS and physics at MIT. He received a master’s degree in CS working on non-line-of-sight imaging while advised by Ramesh Raskar at MIT. His current research lies at the intersection of machine learning and graphics.
Michael Brown
(York University, Canada)
Rethinking the Camera Pipeline to Improve Photographic and Scientific Applications

Date: April 6, 2022
Description:
The in-camera processing pipeline used in modern consumer cameras (e.g., DSLR and smartphone cameras) is essentially the same design used in early digital consumer cameras from the 1990s. While this original design has stood the test of time, there are many areas in which the current pipeline can be improved. Moreover, with the integration of cameras into our smartphones, there are increasing demands for cameras to operate as scientific imaging devices instead of photographic devices. This talk will describe several recent works addressing limitations in the current camera pipeline and ideas for designing a dual-purpose pipeline suitable for both photographic and scientific applications.
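As background for this discussion, the sketch below strings together a few canonical stages of such a pipeline (black-level subtraction, white balance, color correction, gamma encoding). It is a simplified illustration of the general design, not the pipeline of any particular camera; the gains, matrix, and black level are placeholder assumptions.

```python
import numpy as np

def simple_isp(raw, black_level=64, wb_gains=(2.0, 1.0, 1.6),
               ccm=np.eye(3), gamma=2.2):
    """Toy rendition of an in-camera processing pipeline.
    `raw` is an HxWx3 array of sensor values (already demosaiced for simplicity)."""
    img = np.clip(raw.astype(np.float32) - black_level, 0, None)
    img /= img.max() + 1e-8                      # normalize to [0, 1]
    img *= np.asarray(wb_gains)                  # per-channel white balance
    img = np.clip(img @ ccm.T, 0.0, 1.0)         # color correction matrix
    return img ** (1.0 / gamma)                  # display gamma encoding

# Example with random values standing in for a real capture.
raw = np.random.randint(64, 1023, size=(4, 4, 3))
srgb_like = simple_isp(raw)
```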
Further Information:
Michael S. Brown is a professor and Canada Research Chair in Computer Vision at York University in Toronto. He is also a senior research director at the Samsung AI Center in Toronto. His small team at Samsung has contributed to technology that has shipped in millions of devices, including Galaxy phones and tablets. Dr. Brown is a senior member of the computer vision community. He has served as general and program chair for several of his field's conferences, including ACCV'14, 3DV'15, WACV'11/17/19, and CVPR'18/21/23.
Marc Levoy
(Stanford and Adobe)
Computational photography at the point of capture on mobile cameras

Date: March 30, 2022
Description:
The ability to capture bursts of images in rapid succession with varying camera settings, and to process them quickly on device, has revolutionized photography on cell phones, as well as disrupting the camera industry. I will first summarize computational photography technologies on Google’s Pixel smartphones: high dynamic range photography (“HDR+”), simulated shallow depth-of-field (“Portrait mode”), multi-frame super-resolution (“Super Res Zoom”), and photography in very low light (“Night Sight”). My goal at Adobe is to build tools, based on computational photography, for serious photographers and creative professionals. In this spirit, I will enumerate some technologies that have been explored by the research community, and which with some more effort could be made robust and efficient enough to deploy on mobile devices, and controllable enough to be useful as a creative tool. Our ultimate goal is to turn picture-taking into an interactive collaboration between camera and photographer, mediated by computational photography and machine learning.
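To make the burst-processing idea concrete, here is a deliberately simplified sketch of merging an already-aligned burst to reduce noise. Production systems such as HDR+ use robust, tile-based alignment and merging; this toy average only illustrates why capturing many frames helps.

```python
import numpy as np

def merge_aligned_burst(frames):
    """Average a burst of pre-aligned frames to reduce read/shot noise.
    Noise standard deviation drops roughly as 1/sqrt(len(frames))."""
    stack = np.stack([f.astype(np.float32) for f in frames], axis=0)
    return stack.mean(axis=0)

# Example: eight noisy captures of the same static, pre-aligned scene.
scene = np.full((8, 8), 0.5, dtype=np.float32)
burst = [scene + np.random.normal(0, 0.05, scene.shape) for _ in range(8)]
merged = merge_aligned_burst(burst)   # noise reduced ~2.8x versus one frame
```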
Further Information:
Marc Levoy is the VMware Founders Professor of Computer Science (Emeritus) at Stanford University and a Vice President and Fellow at Adobe. In previous lives he worked on computer-assisted cartoon animation (1970s), volume rendering (1980s), 3D scanning (1990s), light field imaging (2000s), and computational photography (2010s). At Stanford he taught computer graphics, digital photography, and the science of art. At Google he launched Street View, co-designed the library book scanner, and led the team that created HDR+, Portrait Mode, and Night Sight for Pixel smartphones. These phones won DPReview's Innovation of the Year (2017 and 2018) and Smartphone Camera of the Year (2019), and Mobile World Congress's Disruptive Innovation Award (2019). Levoy's awards include Cornell University's Charles Goodwin Sands Medal for best undergraduate thesis (1976), the ACM SIGGRAPH Computer Graphics Achievement Award (1996), ACM Fellow (2007), and election to the National Academy of Engineering (2022).
Yifan Wang
(Stanford)
“Neuralize” geometry processing pipeline

Date: March 9, 2022
Description:
Fueled by the proliferation of consumer-level 3D acquisition devices and the growing accessibility of shape modeling applications for ordinary users, there is a tremendous need for automatic geometry processing algorithms that perform robustly even under incomplete and distorted data. This talk demonstrates how each step of the geometry processing pipeline can be automated and, more importantly, strengthened by utilizing neural networks to leverage consistencies and high-level semantic priors from data.
Further Information:
Yifan is an incoming postdoctoral researcher at the Stanford Computational Imaging Lab. She obtained her Ph.D. from the Interactive Geometry Lab at ETH Zurich, where she revamped many well-known geometry processing tasks using neural networks. Her work lies at the intersection of computer vision, graphics, and machine learning, spanning a wide range of topics.
Boyd Fowler
(Omnivision)
Highlights of the 2021 International Image Sensor Workshop

Date: March 16, 2022
Description:
The International Image Sensor Workshop is held every two years, alternating among the USA, Europe, and Asia. In 2021 it was held as a virtual workshop. The main topic of the workshop is solid-state image sensors, their design and applications. More than 500 people attended the workshop last year. There were 86 papers: 52 regular papers, 32 posters, and 2 invited talks. In this presentation we will highlight a few key papers from the workshop covering the most important image sensor developments of the last two years. These developments include pixel scaling, pixel-level optics, global shutter image sensors, SPAD array sensors, high speed image sensors, and low light image sensors.
Further Information:
Boyd Fowler joined OmniVision in December 2015 and is its CTO. Prior to joining OmniVision he was a founder and VP of Engineering at Pixel Devices, where he focused on developing high performance CMOS image sensors. After Pixel Devices was acquired by Agilent Technologies, Dr. Fowler was responsible for advanced development of their commercial CMOS image sensor products. In 2005 Dr. Fowler joined Fairchild Imaging as CTO and VP of Technology, where he developed sCMOS image sensors for high performance scientific applications. After Fairchild Imaging was acquired by BAE Systems, Dr. Fowler was appointed technology director of the CCD/CMOS image sensor business. He has authored numerous technical papers, book chapters, and patents. Dr. Fowler received his M.S. and Ph.D. degrees in Electrical Engineering from Stanford University in 1990 and 1995, respectively.
Bernard Kress
(Google)
Novel XR hardware requirements providing Metaverse level consumer and enterprise experiences

Date: March 2, 2022
Description:
To provide experiences and services compatible with Metaverse requirements, AR systems and smart glasses will need to evolve from their original concepts (Google Glass or Microsoft HoloLens) to incorporate more specific display, imaging, and sensing functionality. These new hardware developments will help address the specific needs of effective Metaverse experiences in both enterprise and consumer fields. They will also pose new challenges in optical engineering, pushing the development of novel hardware concepts that integrate multiple hybrid functionalities, such as display and sensing, in a single monolithic element that can be mass-produced by low-cost fabrication techniques.
Further Information:
Bernard has been involved in optics and photonics for the past 25 years as an author, instructor, associate professor, engineer, and hardware development manager in academia, start-ups, and large corporations, with a focus on micro-optics and diffractive and holographic optics. He has successively worked on product development in the fields of optical computing, optical telecom, optical data storage, optical anti-counterfeiting, industrial optical sensors, and, more recently, augmented and mixed reality systems. Since 2010, he has held engineering management positions at Google [X] (Google Glass) and Microsoft (HoloLens 1 and 2). He is currently the Director of XR Hardware at Google and President-Elect of the International Society for Optics and Photonics (SPIE). Bernard chairs various SPIE conferences, including the SPIE AR/VR/MR conference held each year at the Moscone Center in San Francisco.
Dr. Sara Nagelberg and Dr. João Fayad
(OWL Labs)
Cameras for Hybrid Work

Date: February 23, 2022
Description:
Owl Labs is a Boston-based technology company dedicated to creating a better collaboration experience for all. Its flagship product, the Meeting Owl, launched in 2017, is a smart 360-degree camera, microphone, and speaker device that creates a deeply immersive video experience, making every participant seen and heard, wherever they may be. The Whiteboard Owl pairs with the Meeting Owl to capture and enhance the content of a whiteboard so that remote participants can see everything. This talk will delve into how the Owl cameras work, the challenges and tradeoffs of 360-degree video, and the computer vision that makes it possible.
Further Information:
Sara joined Owl Labs as an optical systems engineer in the summer of 2020. She works on next-generation camera systems for hybrid work. Prior to joining Owl Labs, she was a graduate student researcher in the Laboratory for Bio-Inspired Photonic Engineering at Massachusetts Institute of Technology (MIT), where she worked on microscale dynamic liquid structures for structural color, microlenses, and bio-sensing. She received her Bachelor’s degree in Physics from McGill University, and PhD in Mechanical Engineering from MIT. She is interested in cameras, displays, computer vision and other technologies that enhance our interactions with the world around us, the digital world, and with each other.
João joined Owl Labs as a senior computer vision engineer in the fall of 2019. His main project is the computer vision algorithm for the recently released Whiteboard Owl device. He has also developed commercial computer vision applications at both Catapult Sports and Naked Labs. Prior to joining industry, João received a PhD in Computer Science from Queen Mary College, University of London (UK) for his research on 3D reconstruction of non-rigid scenes from a single video source. He also held post-doctoral research positions at the National Institute of Informatics in Tokyo (Japan) and the Champalimaud Research Programme in Lisbon (Portugal), where he continued working on the problems of motion tracking and 3D reconstruction. He holds BSc and MSc degrees in Biomedical Engineering from the University of Lisbon (Portugal). He is interested in computer vision and machine learning solutions for understanding human motion and interaction from video.
Mark Brongersma
(Stanford)
Flat Optics for Dynamic Wavefront Manipulation and Mixed Reality Eyewear

Date: February 16, 2022
Description:
Since the development of diffractive optical elements in the 1970s, major research efforts have focused on replacing bulky optical components by thinner, planar counterparts. The more recent advent of metasurfaces, i.e. nanostructured optical coatings, has further accelerated the development of flat optics through the realization that nanoscale antenna elements can be utilized to facilitate local and nonlocal control over the light scattering amplitude and phase.
In this presentation, I will start by showing how passive and active metasurfaces can start to impact augmented and virtual reality applications. I will discuss the creation of high-efficiency metasurfaces for optical combiners for near-eye displays, OLED displays, and eye-tracking systems. I will also highlight recent efforts in our group to realize electrically tunable metasurfaces employing nanomechanics, tunable transparent oxides, microfluidics, phase-change materials, and atomically thin semiconductors. Such elements are capable of dynamic wavefront manipulation for optical beam steering and holography. The proposed optical elements can be fabricated by scalable fabrication technologies, opening the door to a wide range of commercial applications.
Further Information:
Mark Brongersma is the Stephen Harris Professor in the Departments of Materials Science and Applied Physics at Stanford University. He leads a research team of ten students and five postdocs. Their research is directed towards the development and physical analysis of new materials and structures that find use in nanoscale electronic and photonic devices. He is on the list of Global Highly Cited Researchers (Clarivate Analytics). He received a National Science Foundation Career Award, the Walter J. Gores Award for Excellence in Teaching, the International Raymond and Beverly Sackler Prize in the Physical Sciences (Physics) for his work on plasmonics, and is a Fellow of the OSA, the SPIE, and the APS. Dr. Brongersma received his PhD from the FOM Institute AMOLF in Amsterdam, The Netherlands, in 1998. From 1998-2001 he was a postdoctoral research fellow at the California Institute of Technology.
Jelena Notaros
(MIT)
Silicon Photonics for LiDAR, Augmented Reality, and Beyond

Date: February 9, 2022
Description:
By enabling the integration of millions of micro-scale optical components on compact millimeter-scale computer chips, silicon photonics is positioned to enable next-generation optical technologies that facilitate revolutionary advances for numerous fields spanning science and engineering. In this talk, I will highlight our work on developing novel silicon-photonics-based platforms, devices, and systems that enable innovative solutions to high-impact problems in areas including augmented-reality displays, LiDAR sensing for autonomous vehicles, free-space optical communications, quantum engineering, and biophotonics.
Further Information:
Jelena Notaros is the Robert J. Shillman (1974) Career Development Assistant Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology. She received her Ph.D. and M.S. degrees from the Massachusetts Institute of Technology in 2020 and 2017, respectively, and B.S. degree from the University of Colorado Boulder in 2015. Her research interests are in integrated silicon photonics devices, systems, and applications. Jelena was one of three Top DARPA Risers, a 2018 DARPA D60 Plenary Speaker, a 2021 Forbes 30 Under 30 Listee, a 2021 MIT Robert J. Shillman (1974) Career Development Chair recipient, a 2020 MIT RLE Early Career Development Award recipient, a 2015 MIT Herbert E. (1933) and Dorothy J. Grier Presidential Fellow, a 2015-2020 National Science Foundation Graduate Research Fellow, a 2019 OSA CLEO Chair’s Pick Award recipient, a 2014 IEEE Region 5 Student Paper Competition First Place Award recipient, a 2019 MIT MARC Best Overall Paper Award and Best Pitch Award recipient, a 2018 MIT EECS Rising Star, a 2014 Sigma Xi Undergraduate Research Award recipient, and a 2015 CU Boulder Chancellor’s Recognition Award recipient, among other honors.
Kwabena Boahen
(Stanford University)
Event-based Vision Sensors: Challenges and Opportunities

Date: February 2, 2022
Description:
Event-based vision (EBV) sensors preprocess their photodetectors' signals to produce spatiotemporally sparse "events," which are read out not frame-by-frame but event-by-event. For example, a Dynamic Vision Sensor (DVS) reports temporal contrast (changes in log-luminance). Event-based readout leverages the sparsity of events to achieve a higher effective sampling rate and shorter latency than frame-based readout. I will discuss two key challenges EBV cameras present and propose solutions. First, coherent optical flow triggers incoherent, temporally dispersed events that disappear and reappear at different speeds, depending on local spatial contrast. This makes it excruciatingly difficult to interpret a cluttered scene filmed from a DVS camera mounted on a moving platform (e.g., a drone). Second, when more than ~6 Meps (events per second) occur, latency and its jitter (standard deviation) shoot up 400-fold (from 0.2 to 40 μs). That severely limits throughput, the usable fraction of the maximum readout rate (~1 Geps). These challenges can be tackled by bonding a back-side-illuminated (BI) CMOS image sensor (CIS) wafer directly to a deep-submicron, mixed-signal CMOS wafer that receives photodetector signals via pixel-wise Cu-Cu bonds. This stacked-wafer process accommodates dense mixed-signal preprocessing and performant network-on-chip (NoC) routing without sacrificing fill factor or image resolution.
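To illustrate the "temporal contrast" definition above, the sketch below emulates how a DVS-style pixel emits ON/OFF events whenever log-intensity changes by more than a contrast threshold. The threshold value and the frame-based input are simplifying assumptions; a real sensor operates asynchronously and per pixel.

```python
import numpy as np

def dvs_events_from_frames(frames, threshold=0.15):
    """Emit (t, y, x, polarity) events wherever log-luminance changes
    by more than `threshold` relative to each pixel's stored reference."""
    events = []
    log_ref = np.log(frames[0] + 1e-6)              # per-pixel reference level
    for t, frame in enumerate(frames[1:], start=1):
        log_now = np.log(frame + 1e-6)
        diff = log_now - log_ref
        for pol, mask in ((+1, diff >= threshold), (-1, diff <= -threshold)):
            ys, xs = np.nonzero(mask)
            events.extend((t, y, x, pol) for y, x in zip(ys, xs))
            log_ref[mask] = log_now[mask]           # reset only pixels that fired
    return events

# Example: a bright spot moving across an otherwise static scene.
frames = [np.full((16, 16), 0.2) for _ in range(5)]
for t in range(5):
    frames[t][8, 3 + 2 * t] = 1.0
print(len(dvs_events_from_frames(frames)))
```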
Further Information:
Kwabena Boahen (M’89, SM’13, F’16) received the B.S. and M.S.E. degrees in electrical and computer engineering from the Johns Hopkins University, Baltimore, MD, both in 1989, and the Ph.D. degree in computation and neural systems from the California Institute of Technology, Pasadena, in 1997. He was on the bioengineering faculty of the University of Pennsylvania from 1997 to 2005, where he held the first Skirkanich Term Junior Chair. He is presently Professor of Bioengineering and Electrical Engineering at Stanford University, with a courtesy appointment in Computer Science. He is also an investigator in Stanford’s Bio-X Institute and Wu Tsai Neurosciences Institute. He founded and directs Stanford’s Brains in Silicon lab, which develops silicon integrated circuits that emulate the way neurons compute and computational models that link neuronal biophysics to cognitive behavior. This interdisciplinary research bridges neurobiology and medicine with electronics and computer science, bringing together these seemingly disparate fields. His scholarship is widely recognized, with over a hundred publications, including a cover story in Scientific American featuring his lab’s work on a silicon retina and a silicon tectum that “wire together” automatically (May 2005). He has been invited to give over a hundred seminar, plenary, and keynote talks, including a 2007 TED talk, “A computer that works like the brain”, with over seven hundred thousand views. He has received several distinguished honors, including a Packard Fellowship for Science and Engineering (1999) and a National Institutes of Health Director’s Pioneer Award (2006). He was elected a fellow of the American Institute for Medical and Biological Engineering (2016) and of the Institute of Electrical and Electronic Engineers (2016) in recognition of his lab’s work on Neurogrid, an iPad-size platform that emulates the cerebral cortex in biophysical detail and at functional scale, a combination that hitherto required a supercomputer. In his lab’s most recent research effort, the Brainstorm Project, he led a multi-university, multi-investigator team to co-design hardware and software that makes neuromorphic computing easier to apply. A spin-out from his Stanford lab, Femtosense Inc (2018), is commercializing this breakthrough.
Jason Lawrence
(Google)
Project Starline: A high-fidelity telepresence system

Date: January 12, 2022
Description:
We present a real-time bidirectional communication system that lets two people, separated by distance, experience a face-to-face conversation as if they were copresent. It is the first telepresence system that is demonstrably better than 2D videoconferencing, as measured using participant ratings (e.g., presence, attentiveness, reaction-gauging, engagement), meeting recall, and observed nonverbal behaviors (e.g., head nods, eyebrow movements). This milestone is reached by maximizing audiovisual fidelity and the sense of copresence in all design elements, including physical layout, lighting, face tracking, multi-view capture, microphone array, multi-stream compression, loudspeaker output, and lenticular display. Our system achieves key 3D audiovisual cues (stereopsis, motion parallax, and spatialized audio) and enables the full range of communication cues (eye contact, hand gestures, and body language), yet does not require special glasses or body-worn microphones/headphones. The system consists of a head-tracked autostereoscopic display, high-resolution 3D capture and rendering subsystems, and network transmission using compressed color and depth video streams. Other contributions include a novel image-based geometry fusion algorithm, free-space dereverberation, and talker localization.
Further Information:
Jason’s research interests span computer graphics, computer vision, and machine learning. He has worked on a wide range of topics including physically-based rendering, real-time rendering, material appearance modeling and representation, computational fabrication, and systems for acquiring dense accurate measurements of 3D geometry and material appearance. His recent work includes high-fidelity real-time 3D capture, 3D display technologies, digital relighting, and real-time communications.
- Jung-Hoon Park » Shaping Light for Bio-Imaging
- Francois Chaubard » Productizing Deep Learning Computer Vision
- Christoph Leuze » HMDs for medical and industry applications
- Rajesh Menon » Non-anthropocentric Imaging
- Na Ji » Imaging the brain using optical microscopy
- Ioannis Gkioulekas » Interferometric computational imaging
- Yuhao Zhu » Computer Science-Vision Science Symbiosis
- Yuhao Zhu » Energy Efficiency of Computational Image Sensors
- Ellen Zhong » Neural Fields
- Changhuei Yang » Computational Microscopy
- Tali Treibitz » Underwater imaging
- Yichang Shih » Mobile Computational Photography
- Andrea Tagliasacchi » Neural Radiance Fields
- Tali Dekel » Strong Interpretable Priors
- Ben Poole » 2D priors for 3D generation
- Zeev Zalevsky » Remote Photonic Medicine
- Dan McGrath » Pixel Design
- Lei Li » New Generation Photoacoustic Imaging
- Jeong Joon Park » Learning to Re-create Reality in 3D
- Emily Cooper » Binocular AR display design
- Thomas Müller » Neural Networks in High Performance Graphics
- Tara Akhavan » Perceptual Displays
- Luca Fascione » Color Imaging
- Hakan Urey » Computational Holographic Displays
- Thomas Goossens » Camera simulation
- Yi Xue » 3D fluorescence and phase microscopy
- Qi Sun » Human-System Performance in XR
- Orly Liba » Computational Photography at Google
- Federico Capasso » Flat Optics
- Susana Marcos » Ophthalmic imaging
- Chrysanthe Preza » Engineering microscopes using computational imaging
- Guillermo Sapiro » Computational Behavioral Coding for Developmental Disorders
- Liang Gao » Ultrafast light field tomography
- Hao Li » AI Synthesis for the Metaverse: From avatars to 3D scenes
- Matthew Tancik » 3D Asset Generation using Neural Radiance Fields
- Michael Brown » Camera pipeline for photography
- Marc Levoy » Computational photography mobile cameras
- Yifan Wang » “Neuralize” geometry processing pipeline
- Boyd Fowler » 2021 International Image Sensor Workshop
- Bernard Kress » Novel XR hardware requirements
- Dr. Sara Nagelberg and Dr. João Fayad » OWL Camera
- Mark Brongersma » Flat Optics
- Jelena Notaros » Silicon Photonics
- Kwabena Boahen » Event-based Vision Sensors
- Jason Lawrence » Project Starline
SCIEN Colloquia 2021
Christian Theobalt
(Max-Planck-Institute for Informatics)
Neural Methods for Reconstruction and Rendering of Real World Scenes

Date: November 17, 2021
Description:
In this presentation, I will talk about some of our recent work on new methods for reconstructing computer graphics models of real-world scenes from sparse or even monocular video data. These methods are based on bringing together neural network-based and explicit model-based approaches. I will also talk about new neural rendering approaches that combine explicit model-based and neural network-based concepts for image formation in new ways. They enable new means of synthesizing highly realistic imagery and videos of real-world scenes under user control.
Further Information:
In his research, Christian Theobalt looks at algorithmic problems that lie at the intersection of computer graphics, computer vision, and machine learning, such as: static and dynamic 3D scene reconstruction, neural rendering and neural scene representations, marker-less motion and performance capture, virtual humans, virtual and augmented reality, computer animation, intrinsic video and inverse rendering, computational videography, machine learning for graphics and vision, new sensors for 3D acquisition, and image- and physically-based rendering. He is also interested in using reconstruction techniques for human-computer interaction. For his work, he has received several awards, including the Otto Hahn Medal of the Max Planck Society in 2007, the EUROGRAPHICS Young Researcher Award in 2009, the German Pattern Recognition Award in 2012, the Karl Heinz Beckurts Award in 2017, and the EUROGRAPHICS Outstanding Technical Contributions Award in 2020. He has received two ERC grants, an ERC Starting Grant in 2013 and an ERC Consolidator Grant in 2017.
Wenzel Jakob
(EPFL)
Differentiable Simulation of Light

Date: November 10, 2021
Description:
Inverse problems involving light abound throughout many scientific disciplines. Typically, a set of images captured by an instrument must be mathematically processed to reveal some property of our physical reality. This talk will provide an introduction to and overview of the emerging field of differentiable physically based rendering, which has the potential to substantially improve the accuracy of such calculations.
Methods in this area propagate derivative information through physical light simulations to solve optimization problems. While still very much a work in progress, advances in recent years have led to increasingly efficient and numerically robust methods that can begin to tackle interesting real-world problems. I will give an overview of recent progress and open problems.
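As a toy illustration of propagating derivatives through an image-formation model (not the speaker's system), the sketch below recovers an unknown albedo by gradient descent through a trivially simple differentiable shading model written with PyTorch autograd.

```python
import torch

def render(albedo, light_dir, normals):
    """Trivial Lambertian 'renderer': per-pixel albedo * max(n.l, 0)."""
    shading = torch.clamp((normals * light_dir).sum(-1), min=0.0)
    return albedo * shading

# Synthetic ground truth we pretend was measured by a camera.
normals = torch.nn.functional.normalize(torch.rand(32, 32, 3), dim=-1)
light_dir = torch.tensor([0.0, 0.0, 1.0])
target = render(torch.tensor(0.7), light_dir, normals)

albedo = torch.tensor(0.1, requires_grad=True)       # unknown scene parameter
opt = torch.optim.Adam([albedo], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = ((render(albedo, light_dir, normals) - target) ** 2).mean()
    loss.backward()                                   # derivatives flow through the renderer
    opt.step()
print(float(albedo))                                  # approaches the true value 0.7
```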
Further Information:
Jannick Rolland
(University of Rochester)
Freeform Optics for Imaging and Metaform Optics in Near-Eye Displays

Date: November 3, 2021
Description:
Freeform optics has set a new path for optical system design across a wide range of applications spanning from microscopy to space optics, including the billion-dollar consumer market of near-eye displays for augmented reality that set us on this technology path in the first place. Today, freeform optics have been demonstrated to yield compact, achromatic, and high-performance imaging systems that are poised to enable the science of tomorrow. This talk will introduce freeform optics and highlight emerging design methods. We will then present success stories in digital-viewfinder, imager, and spectrometer designs, which we anticipate will ignite discussion and stimulate cooperation in advancing knowledge of freeform optics. Building on this foundation, we will introduce the concept of a metaform to address a need in near-eye displays.
Further Information:
Holly Rushmeier
(Yale)
Design Tools for Material Appearance

Date: Oct 27, 2021
Description:
The design of material appearance for both virtual and physical objects remains a challenging problem. There aren't straightforward, intuitive techniques as there are in geometric design, where shapes can be sketched or assembled from geometric primitives. In this talk I will present a series of contributions to developing intuitive appearance design tools. This includes studies of material appearance perception, which form the basis for the development of perceptual axes for reflectance distribution design. I will also present novel interfaces for design, including hybrid slider/image navigation and augmented reality interfaces. I will discuss the unique problems involved in designing appearance for objects to be physically manufactured rather than simply displayed in virtual environments. Finally, I will show how exemplars of spatially varying materials can be inverted to produce procedural models.
Further Information:
Michelle Digman
(UC Irvine)
Metabolic imaging using the phasor approach to FLIM and tracking phenotypic change of mitochondria in cancer cells with Mitometer

Date: Oct 20, 2021
Description:
In this talk I will discuss the phasor approach to fluorescence lifetime imaging microscopy (FLIM) as a novel method to measure metabolic alteration as a function of extracellular matrix (ECM) mechanics. To measure the contribution of mitochondria to metabolic switching, we developed a new algorithm called Mitometer that is unbiased and allows automated segmentation and tracking of mitochondria in live-cell 2D and 3D time-lapse images. I will show how Mitometer measures mitochondria of triple-negative breast cancer cells. Results show they are faster, more directional, and more elongated than those in their receptor-positive counterparts. Furthermore, Mitometer shows that mitochondrial motility and morphology in breast cancer, but not in normal breast epithelia, correlate with fractions of the reduced form of NADH. Together, the automated segmentation and tracking algorithms and the intuitive user interface make Mitometer a broadly accessible tool.
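For readers unfamiliar with the phasor approach mentioned above, each pixel's fluorescence decay I(t) is summarized by two Fourier coefficients at the laser repetition (angular) frequency, so that lifetimes map to points on a semicircle. The sketch below computes these phasor coordinates from a sampled decay; the single-exponential decay model, lifetime, and repetition rate are illustrative assumptions.

```python
import numpy as np

def phasor_coordinates(decay, t, omega):
    """Return the (g, s) phasor coordinates of a fluorescence decay I(t):
    g = integral of I(t)cos(wt) / integral of I(t),
    s = integral of I(t)sin(wt) / integral of I(t)."""
    norm = np.trapz(decay, t)
    g = np.trapz(decay * np.cos(omega * t), t) / norm
    s = np.trapz(decay * np.sin(omega * t), t) / norm
    return g, s

# Example: a single-exponential decay with lifetime tau lands (approximately)
# on the universal semicircle at g = 1/(1+(w*tau)^2), s = w*tau/(1+(w*tau)^2).
tau = 2.5e-9                                  # 2.5 ns lifetime (assumed)
omega = 2 * np.pi * 80e6                      # 80 MHz repetition rate (assumed)
t = np.linspace(0, 12.5e-9, 2000)
decay = np.exp(-t / tau)
print(phasor_coordinates(decay, t, omega))
```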
Further Information:
Agastya Kalra
(Akasha)
Polarized Computational Imaging and Beyond

Date: Oct 13, 2021
Description:
Further Information:
Ruth Rosenholtz
(MIT)
“Human vision at a glance”

Date: Oct 6, 2021
Description:
Further Information:
Lee Redden
(Blue River Technology, John Deere)
“Entrepreneurship, AI, and Agriculture: How ML is Changing Industries”

Date: September 29, 2021
Description:
1. His entrepreneurship journey and the formation of Blue River Technology, from a class at Stanford, to an idea worth working on, to a company operating on 10% of the lettuce grown in the US.
2. The computer vision technology and journey behind the See & Spray system, up until the acquisition.
3. Advances in control systems.
Further Information:
Ofer David
(BrightWay Vision)
“Recent developments in GatedVision Imaging — seeing the unseen”

Date: September 22, 2021
Description:
Imaging is the basic building block of automotive autonomous driving. Any computer vision system requires a good image as input in all driving conditions. GatedVision provides an extra layer on top of the regular RGB/RCCB sensor to augment it at night and in harsh weather conditions. GatedVision images captured in darkness and in different weather conditions will be shared. Imagine detecting a small target lying on the road with the same reflectivity as the background, meaning no contrast: GatedVision can manipulate the way an image is captured so that contrast can be extracted. Additional imaging capabilities of GatedVision will be presented.
Further Information:
Dr. Ofer David received his B.Sc. and M.Sc. from the Technion in Haifa, Israel, and his PhD in electro-optics from Ben-Gurion University. He was the head of the electro-optics group at Elbit Systems, the second-largest defense company in Israel. Ofer has more than 25 years of experience in the field of electro-optics and has various publications and patents on active imaging systems and laser detection. Other areas in which Ofer has been involved include long-range active imaging systems, imaging through fog at day and night, visibility measurement systems, automotive night vision systems, laser warning systems, and optical detection systems. He is the co-founder and CEO of BrightWay Vision, an automotive nighttime camera system provider.
Liang Shi
(MIT)
“Learning-based 3D Computer-Generated Holography”

Date: June 2, 2021
Description:
Computer-generated holography (CGH) is fundamental to applications such as biosensing, volumetric displays, optical/acoustic tweezers, security, and many others that require spatial control of intricate optical or acoustic fields. For near-eye displays, CGH provides the opportunity to support true 3D projection in a sunglasses-like form factor. Yet the conventional approach of computing a true 3D hologram via physical simulation of diffraction and interference is slow and unaware of occlusion. Moreover, experimental results are often inferior to simulations due to non-idealized optical systems, non-linear and non-uniform SLM responses, and image degradation caused by complex-to-phase-only conversion. These computational and hardware-imposed challenges together limit the interactivity and realism of the ultimate immersive experience. In this talk, I will describe techniques to mitigate these challenges, including physical simulation algorithms that handle occlusion for RGB-D and more advanced 3D inputs, methods to create large-scale 3D hologram datasets, training of CNNs to speed up complex and phase-only hologram synthesis, and approaches to compensate for hardware limitations. Together, the resulting system can synthesize and display photorealistic 3D holograms in real time using a single consumer-grade GPU and run interactively on an iPhone leveraging the Neural Engine. I will further discuss possible extensions that could be built on top of the proposed system to support foveated rendering, static pupil expansion, view-dependent effects, and other features.
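As background for the "physical simulation of diffraction" step mentioned above, the sketch below implements the standard angular spectrum method for propagating a complex field over a distance z. It is a generic textbook routine, not the speaker's occlusion-aware pipeline, and the wavelength and pixel pitch are illustrative values.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pitch, z):
    """Propagate a complex wavefield by distance z (all lengths in meters)
    using the angular spectrum method (FFT-based free-space diffraction)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * z) * (arg > 0)        # evanescent waves dropped
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Example: propagate a point-like source by 5 cm on an 8 um-pitch SLM grid.
field = np.zeros((512, 512), dtype=np.complex64)
field[256, 256] = 1.0
out = angular_spectrum_propagate(field, wavelength=532e-9, pitch=8e-6, z=0.05)
```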
Further Information:
Liang Shi is a PhD student in Prof. Wojciech Matusik's Computational Fabrication and Design Group at MIT CSAIL. He received his B.E. from Beihang University and his M.Sc. from Stanford University, where he was a member of Prof. Gordon Wetzstein's Computational Imaging Lab. He has been a research intern at NVIDIA, Adobe, and Facebook Reality Labs. His current research interests include computational displays, fabrication, and appearance modeling.
Matt Pharr
(NVIDIA)
“Real-Time Ray Tracing and the Reinvention of the Graphics Pipeline”

Date: May 26, 2021
Description:
For many years, real-time ray tracing was the technology of the future; in 2008, David Kirk famously quipped that it always would be. There were plenty of reasons to doubt that the approach would be suitable for real-time rendering, many of them firmly believed by the speaker. Yet dedicated hardware for ray tracing has now arrived in recent GPUs. Its greatest successes so far have come not from the direct application of existing offline ray-tracing algorithms to real-time rendering, but from the reinvention of fundamental rendering algorithms to account for the constraints of real-time rendering. In this talk, I will survey the history of real-time ray tracing and some of the near misses along the way. I'll then discuss how real-time rendering is changing with the high-performance, arbitrary visibility queries that ray tracing offers.
Further Information:
Matt Pharr is a Distinguished Research Scientist at NVIDIA where he works on ray-tracing and real-time rendering. He is the author of the book “Physically Based Rendering: From Theory To Implementation” for which he and the co-authors were awarded a Scientific and Technical Academy Award in 2014 for the book’s impact on the film industry. He has a Ph.D. in computer science from the Stanford graphics lab and a B.S. in computer science from Yale.
Michael Kudenov
(North Carolina State University)
“Mantis shrimp–inspired organic photodetector for simultaneous hyperspectral and polarimetric imaging-enabling advanced single-pixel architectures”

Date: May 19, 2021
Description:
Many spectral and polarimetric cameras implement complex spatial, temporal, and spectral re-mapping strategies to measure a signal within a given use-case’s specifications and error tolerances. This re-mapping results in a complex tradespace that is challenging to navigate; a tradespace driven, in part, by the limited degrees of freedom available in inorganic detector technology. This presentation overviews a new kind of organic detector and pixel architecture that enables single-pixel tandem detection of both spectrum and polarization. By using organic detectors’ semitransparency and intrinsic anisotropy, the detector minimizes spatial and temporal resolution tradeoffs while showcasing thin-film polarization control strategies.
Further Information:
Dr. Kudenov obtained his BS in Electrical Engineering from the University of Alaska Fairbanks in 2005 and his PhD in Optical Sciences from the University of Arizona in 2009. He is currently an Associate Professor in Electrical and Computer Engineering at North Carolina State University in Raleigh, NC, where he runs the Optical Sensing Lab (https://research.ece.ncsu.edu/osl/). His research interests focus on use-inspired imaging spectrometer and polarimeter sensors (UV to IR) for applications spanning agriculture, plant phenotyping (http://www.sweetpotatoanalytics.com), biomining and remediation, and quality control. He also serves as the academic advisor for the NC State SPIE student chapter.
Eero Simoncelli
(NYU)
“Photographic Image Priors in the Era of Machine Learning”

Date: May 12, 2021
Description:
Prior probability models are a central component of the statistical formulation of inverse problems, but density estimation is a notoriously difficult problem for high dimensional signals such as photographic images. Machine learning methods have produced impressive solutions for many inverse problems, greatly surpassing those achievable with simple prior models, but these are often not well understood and don’t generalize well beyond their training context. About a decade ago, a new approach known as “plug-and-play” was proposed, in which a denoiser is used as an algorithmic component for imposing prior information. I’ll describe our progress in understanding and using this implicit prior. We derive a surprisingly simple algorithm for drawing high-probability samples from the implicit prior embedded within a CNN trained to perform blind (i.e., unknown noise level) least-squares Gaussian denoising. A generalization of this algorithm to constrained sampling provides a method for solving *any* linear inverse problem, with no additional training, and no further distributional assumptions. We demonstrate this general form of transfer learning in multiple applications, using the same algorithm to produce state-of-the-art solutions for deblurring, super-resolution, and compressive sensing. I’ll also discuss extensions to visualizing information capture in foveated visual systems. This is joint work with Zahra Kadkhodaie, Sreyas Mohan, and Carlos Fernandez-Granda.
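A core identity behind the denoiser-as-prior idea is that, for a least-squares Gaussian denoiser f, the residual f(y) - y is proportional to the gradient of the log of the noisy-image density (Miyasawa/Tweedie), so repeatedly adding a scaled residual plus a little noise moves toward high-probability images. The sketch below is a heavily simplified, illustrative loop under that identity, with a placeholder `denoise` function standing in for a trained blind CNN denoiser and an ad hoc step schedule; it is not the authors' exact algorithm.

```python
import numpy as np

def denoise(y):
    """Placeholder for a trained blind Gaussian denoiser f(y).
    Here: a crude 3x3 local-mean smoother, just so the loop runs."""
    pad = np.pad(y, 1, mode="edge")
    return sum(pad[i:i + y.shape[0], j:j + y.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def sample_from_implicit_prior(shape, steps=200, step=0.5, noise=0.3, rng=None):
    """Ascend the denoiser residual (an estimate proportional to the gradient
    of the log density) while injecting shrinking amounts of fresh noise."""
    rng = np.random.default_rng(rng)
    y = rng.normal(0.5, 1.0, size=shape)              # start from pure noise
    for k in range(steps):
        residual = denoise(y) - y                     # prior "direction"
        y = y + step * residual + (noise * (1 - k / steps)) * rng.normal(size=shape)
    return y

sample = sample_from_implicit_prior((64, 64))
```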
Further Information:
Eero Simoncelli is Silver Professor at New York University and the Director of the Center for Computational Neuroscience at the Flatiron Institute of the Simons Foundation. He is a Fellow of the Institute of Electrical and Electronics Engineers and was a Howard Hughes Medical Institute Investigator from 2000 to 2020. Simoncelli received his B.S. in physics (summa cum laude) in 1984 from Harvard University, studied applied mathematics at Cambridge University for a year and a half, and then received his M.S. in 1988 and his Ph.D. in 1993, both in electrical engineering, from the Massachusetts Institute of Technology. He was an assistant professor of computer and information science at the University of Pennsylvania from 1993 until 1996. He moved to New York University in September 1996, where he is currently a professor of neural science, mathematics, data science, and psychology. His research interests span a wide range of topics in the representation and analysis of visual images and sounds in both machine and biological vision systems. In addition to his role as scientific director, Simoncelli remains an Investigator with the Simons Collaboration on the Global Brain.
Angjoo Kanazawa
(UC Berkeley)
“Pushing the Boundaries of Novel View Synthesis”

Date: May 05, 2021
Description:
2020 was a turbulent year, but for 3D learning it was a fruitful one, with lots of exciting new tools and ideas. In particular, there have been many exciting developments in the area of coordinate-based neural networks and novel view synthesis. In this talk I will discuss our recent work on single-image view synthesis with pixelNeRF, which aims to predict a Neural Radiance Field (NeRF) from a single image. I will discuss how the NeRF representation allows models like pixel-aligned implicit functions (PIFu) to be trained without explicit 3D supervision, and the importance of other key design factors such as predicting in the view coordinate frame and handling multi-view inputs. I will also touch upon our recent work that allows real-time rendering of NeRFs. Then, I will discuss Infinite Nature, a project in collaboration with teams at Google NYC, where we explore how to push the boundaries of novel view synthesis and generate views well beyond the edges of the initial input image, resulting in controllable video generation of a natural scene.
Further Information:
Angjoo Kanazawa is an Assistant Professor in the Department of Electrical Engineering and Computer Science at the University of California, Berkeley. Previously, she was a BAIR postdoc at UC Berkeley advised by Jitendra Malik, Alexei A. Efros, and Trevor Darrell. She completed her PhD in CS at the University of Maryland, College Park with her advisor David Jacobs. Prior to UMD, she obtained her BA in Mathematics and Computer Science at New York University. She has also spent time at the Max Planck Institute for Intelligent Systems with Michael Black and at Google NYC with Noah Snavely. Her research is at the intersection of computer vision, graphics, and machine learning, focusing on 4D reconstruction of the dynamic world behind everyday photographs and video. She has been named a Rising Star in EECS and is a recipient of the Anita Borg Memorial Scholarship, a best paper award at Eurographics 2016, and the Google Research Scholar Award in 2021.
Julien Martel
(Stanford)
“Neural Representations: Coordinate Based Networks for Fitting Signals, Derivatives, and Integrals”

Date: April 28, 2021
Description:
Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail and fail to represent a signal's spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. In this talk, we describe how sinusoidal representation networks, or SIRENs, are ideally suited for representing complex natural signals and their derivatives. Using SIREN, we demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how SIRENs can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. While SIREN can be used to fit signals and their derivatives, we also introduce a new framework for solving integral equations using implicit neural representation networks. Our automatic integration framework, AutoInt, enables the calculation of any definite integral with two evaluations of a neural network. We apply our approach for efficient integration to the problem of neural volume rendering. Finally, we present a novel architecture and training procedure able to fit data such as gigapixel images or finely detailed 3D geometry, demonstrating that these neural representations are now ready to be used in large-scale scenarios.
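For concreteness, here is a minimal sketch of a SIREN-style network in PyTorch: linear layers followed by sine activations, with a frequency-scaled first layer and the uniform weight initialization proposed for sinusoidal networks. It is an illustrative reduction, not the authors' released implementation; the layer sizes and frequency scale are assumed defaults.

```python
import math
import torch
from torch import nn

class SineLayer(nn.Module):
    def __init__(self, in_f, out_f, w0=30.0, first=False):
        super().__init__()
        self.w0, self.linear = w0, nn.Linear(in_f, out_f)
        bound = 1.0 / in_f if first else math.sqrt(6.0 / in_f) / w0
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

class Siren(nn.Module):
    """Coordinate -> signal network, e.g. (x, y) -> RGB for an image."""
    def __init__(self, in_f=2, hidden=256, out_f=3, depth=3):
        super().__init__()
        layers = [SineLayer(in_f, hidden, first=True)]
        layers += [SineLayer(hidden, hidden) for _ in range(depth - 1)]
        self.net = nn.Sequential(*layers, nn.Linear(hidden, out_f))

    def forward(self, coords):
        return self.net(coords)

# Example: query an (untrained) representation at a batch of 2D coordinates.
coords = torch.rand(1024, 2) * 2 - 1           # coordinates in [-1, 1]^2
rgb = Siren()(coords)                          # shape [1024, 3]
```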
Further Information:
Julien Martel (http://www.jmartel.net/) is a postdoctoral scholar in the Stanford Computational Imaging Lab. His research interests are in unconventional visual sensing and computing. More specifically, his current topics of research include the co-design of hardware and algorithms for visual sensing, the design of methods for vision sensors with in-pixel computing capabilities, and the use of novel neural representations to store and compute on visual data.
Andrew Watson
(Apple)
“The Chromatic Pyramid of Visibility”

Date: April 21, 2021
Description:
A fundamental limit to human vision is our ability to sense variations in light intensity over space and time. These limits have been formalized in the spatio-temporal contrast sensitivity function, which is now a foundation of vision science. This function has also proven to be the foundation of much applied vision science, providing guidance on spatial and temporal resolution for modern imaging technology. The Pyramid of Visibility is a simplified model of the human spatio-temporal luminance contrast sensitivity function (Watson & Ahumada, 2016). It posits that log sensitivity is a linear function of spatial frequency, temporal frequency, and log mean luminance. It is valid only away from the spatiotemporal frequency origin. It has recently been extended to peripheral vision to define the Field of Contrast Sensitivity (Watson, 2018). Though very useful in a range of applications, the pyramid would benefit from an extension to the chromatic domain. In this talk I will describe our efforts to develop this extension. Among the issues we address are the choice of color space, the definition of color contrast, and how to combine sensitivities among luminance and chromatic pyramids.
Watson, A. B. (2018). “The Field of View, the Field of Resolution, and the Field of Contrast Sensitivity.” Journal of Perceptual Imaging 1(1): 10505-1–10505-11.
Watson, A. B. and A. J. Ahumada (2016). “The pyramid of visibility.” Electronic Imaging 2016(16): 1-6.
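To make the linear model in the abstract concrete, here is a minimal sketch of the pyramid's functional form. The coefficients below are illustrative placeholders, not fitted values from the cited papers.

```python
import numpy as np

def log_contrast_sensitivity(sf, tf, luminance, c0=1.0, c_sf=-0.05, c_tf=-0.05, c_lum=0.3):
    """Pyramid of Visibility: log sensitivity is linear in spatial frequency (cycles/deg),
    temporal frequency (Hz), and log mean luminance (cd/m^2), valid away from the
    spatiotemporal frequency origin. Coefficients here are hypothetical placeholders."""
    return c0 + c_sf * sf + c_tf * tf + c_lum * np.log10(luminance)

# Example: sensitivity falls linearly (in log units) as spatial frequency rises.
for sf in (2, 8, 16):
    print(sf, 10 ** log_contrast_sensitivity(sf, tf=8, luminance=100))
```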
Further Information:
Andrew B. Watson is the Chief Vision Scientist at Apple, Inc., where he leads the application of vision science to a broad range of Apple technologies, applications, devices, and displays. Dr. Watson attended Columbia University and received a PhD in Psychology from the University of Pennsylvania. From 1982 to 2016 he was the Senior Scientist for Vision Research at NASA Ames Research Center in California. His research focuses on computational models of early vision and the application of vision science to imaging technology. In 2001, Watson founded the Journal of Vision and served as Editor-in-Chief from 2001 to 2013 and from 2018 to 2022. He is a Fellow of the Optical Society of America, of the Association for Research in Vision and Ophthalmology, and of the Society for Information Display. He is the recipient of several awards, including the H. Julian Allen Award from NASA, the Otto Schade Award from the Society for Information Display, the Special Recognition Award from the Association for Research in Vision and Ophthalmology, and the Holst Award from Philips Research and the Technical University of Eindhoven. In 2011, he received the Presidential Rank Award from the President of the United States.
Andreas Geiger
(University of Tübingen)
“Neural Implicit Representations for 3D Vision”

Date: April 14, 2021
Description:
In this talk, I will show several recent results of my group on learning neural implicit 3D representations, departing from the traditional paradigm of representing 3D shapes explicitly using voxels, point clouds, or meshes. Implicit representations have a small memory footprint and allow for modeling arbitrary 3D topologies at (theoretically) arbitrary resolution in continuous function space. I will show the capabilities and limitations of these approaches in the context of reconstructing 3D geometry, texture, and motion. I will further demonstrate a technique for learning implicit 3D models using only 2D supervision through implicit differentiation of the level set constraint. Finally, I will demonstrate how implicit models can tackle large-scale reconstructions and introduce GRAF and GIRAFFE, generative 3D models for neural radiance fields that are able to generate 3D-consistent photo-realistic renderings from unstructured and unposed image collections.
Further Information:
Andreas Geiger is professor at the University of Tübingen and group leader at the Max Planck Institute for Intelligent Systems. Prior to this, he was a visiting professor at ETH Zürich and a research scientist at MPI-IS. He studied at KIT, EPFL and MIT and received his PhD degree in 2013 from the KIT. His research interests are at the intersection of 3D reconstruction, motion estimation, scene understanding and sensory-motor control. He maintains the KITTI vision benchmark and coordinates the ELLIS PhD and PostDoc program.
Steve Seitz
(University of Washington and Google)
“Slow Glass”

Date: April 7, 2021
Description:
Wouldn’t it be fascinating to be in the same room as Abraham Lincoln, visit Thomas Edison in his laboratory, or step onto the streets of New York a hundred years ago? We explore this thought experiment by tracing ideas from science fiction through antique stereographs to the latest work in generative adversarial networks (GANs), stepping back in time to experience these historical people and places not in black and white, but much closer to how they really appeared. In the process, I’ll present our latest work on Keystone Depth and Time Travel Rephotography.
Further Information:
Steve Seitz is the Robert E. Dinning Professor in the Allen School at the University of Washington. He is also a Director on Google’s Daydream team, where he leads teleportation efforts including Google Jump and Cardboard Camera. Prof. Seitz also co-directs the UW Reality Lab. He received his Ph.D. in computer sciences at the University of Wisconsin in 1997. Following his doctoral work, he did a postdoc at Microsoft Research and then spent a couple of years as an Assistant Professor in the Robotics Institute at Carnegie Mellon University. He joined the faculty at the University of Washington in July 2000. His co-authored papers have won the David Marr Prize (twice) at ICCV and the CVPR 2015 best paper award. He received an NSF CAREER Award, an ONR Young Investigator Award, and an Alfred P. Sloan Fellowship, and is an IEEE Fellow and an ACM Fellow. His work on Photo Tourism (joint with Noah Snavely and Rick Szeliski) formed the basis of Microsoft’s Photosynth technology. Professor Seitz is interested in problems in 3D computer vision and computer graphics, and their application to virtual and augmented reality.
Hany Farid
(UC Berkeley)
“Photographic Forensic Identification”

Date: March 31, 2021
Description:
Forensic DNA analysis has been critical in prosecuting crimes and overturning wrongful convictions. At the same time, other physical and digital forensic identification techniques, used to link a suspect to a crime scene, are plagued with problems of accuracy, reliability, and reproducibility. Flawed forensic science can have devastating consequences: the National Registry of Exonerations identified that flawed forensic techniques contribute to almost a quarter of wrongful convictions in the United States. I will describe our recent efforts to examine the reliability of two such photographic forensic identification techniques: (1) identification based on purported distinct patterns in clothing; and (2) identification based on measurements of height and weight.
Further Information:
Hany Farid is a Professor at the University of California, Berkeley with a joint appointment in Electrical Engineering & Computer Sciences and the School of Information. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception. He received his undergraduate degree in Computer Science and Applied Mathematics from the University of Rochester in 1989, and his Ph.D. in Computer Science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in Brain and Cognitive Sciences at MIT, he joined the faculty at Dartmouth College in 1999, where he remained until 2019. He is the recipient of an Alfred P. Sloan Fellowship and a John Simon Guggenheim Fellowship, and is a Fellow of the National Academy of Inventors.
Lihong Wang
(Caltech)
“World’s Deepest-Penetration and Fastest Optical Cameras: Photoacoustic Tomography and Compressed Ultrafast Photography”

Date: March 17, 2021
Description:
We developed photoacoustic tomography to peer deep into biological tissue. Photoacoustic tomography (PAT) provides in vivo omniscale functional, metabolic, molecular, and histologic imaging across the scales of organelles through organisms. We also developed compressed ultrafast photography (CUP) to record 10 trillion frames per second in real time, orders of magnitude faster than commercially available camera technologies. CUP can capture the fastest phenomenon in the universe, namely, light propagation, at light speed and can be slowed down for slower phenomena such as combustion.
Further Information:
Lihong Wang is the Bren Professor of Medical and Electrical Engineering at Caltech. He has published 550 journal articles (h-index = 138, 80,000 citations) and delivered 540 keynote, plenary, or invited talks. His lab published the first functional photoacoustic CT, the first 3D photoacoustic microscopy, and compressed ultrafast photography (the world’s fastest camera). He has served as Editor-in-Chief of the Journal of Biomedical Optics. His honors include the Goodman Book Award, the NIH Director’s Pioneer Award, the OSA Mees Medal, the IEEE Technical Achievement and Biomedical Engineering Awards, the SPIE Chance Biomedical Optics Award, the IPPA Senior Prize, the OSA Feld Biophotonics Award, and an honorary doctorate from Lund University, Sweden. He has been inducted into the National Academy of Engineering.
Ashok Veeraraghavan
(Rice University)
“Computational Imaging: Beyond the Limits Imposed by Lenses”

Date: March 10, 2021
Description:
The lens has long been a central element of cameras, since its early use in the mid-nineteenth century by Niepce, Talbot, and Daguerre. The role of the lens, from the Daguerreotype to modern digital cameras, is to refract light to achieve a one-to-one mapping between a point in the scene and a point on the sensor. This effect enables the sensor to compute a particular two-dimensional (2D) integral of the incident 4D light field. We propose a radical departure from this practice and the many limitations it imposes. In this talk we focus on two inter-related research projects that attempt to go beyond lens-based imaging.
First, we discuss our lab’s recent efforts to build flat, extremely thin imaging devices by replacing the lens in a conventional camera with an amplitude mask and computational reconstruction algorithms. These lensless cameras, called FlatCams, can be less than a millimeter in thickness and enable applications where size, weight, thickness, or cost are the driving factors. Second, we discuss high-resolution, long-distance imaging using Fourier Ptychography, where the need for a large-aperture, aberration-corrected lens is replaced by a camera array and associated phase retrieval algorithms, again resulting in order-of-magnitude reductions in size, weight, and cost. Finally, I will spend a few minutes discussing how this holistic computational imaging approach can be used to create ultra-high-resolution wavefront sensors.
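For orientation, the FlatCam measurement model is commonly written with a separable mask, Y = PhiL X PhiR^T. The toy sketch below simulates that model with random matrices standing in for the calibrated mask operators and inverts it with a Tikhonov-regularized pseudo-inverse; the real pipeline uses calibrated operators and more careful regularization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scene, n_sensor = 32, 48                         # toy sizes
phi_l = rng.standard_normal((n_sensor, n_scene))   # stand-ins for the calibrated
phi_r = rng.standard_normal((n_sensor, n_scene))   # left/right mask transfer matrices

scene = np.zeros((n_scene, n_scene))
scene[10:20, 12:22] = 1.0
meas = phi_l @ scene @ phi_r.T + 0.01 * rng.standard_normal((n_sensor, n_sensor))

def tikhonov_pinv(a, lam):
    """Regularized pseudo-inverse: (a^T a + lam I)^-1 a^T."""
    return np.linalg.solve(a.T @ a + lam * np.eye(a.shape[1]), a.T)

recon = tikhonov_pinv(phi_l, 1e-2) @ meas @ tikhonov_pinv(phi_r, 1e-2).T
print(np.abs(recon - scene).mean())                # reconstruction error on this toy example
```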
Further Information:
Ashok Veeraraghavan is currently a Professor of Electrical and Computer Engineering at Rice University, TX, USA. Before joining Rice University, he spent three wonderful and fun-filled years as a Research Scientist at Mitsubishi Electric Research Labs in Cambridge, MA. He received his Bachelors in Electrical Engineering from the Indian Institute of Technology, Madras in 2002, and M.S. and Ph.D. degrees from the Department of Electrical and Computer Engineering at the University of Maryland, College Park in 2004 and 2008, respectively. His thesis received the Doctoral Dissertation Award from the Department of Electrical and Computer Engineering at the University of Maryland. His work has won numerous awards, including the Hershel M. Rich Invention Award in 2016 and 2017 and an NSF CAREER Award in 2017. He loves playing, talking, and pretty much anything to do with the slow and boring but enthralling game of cricket.
Rafael Piestun
(University of Colorado, Boulder)
“A look towards the future of computational optical microscopy”

Date: March 3, 2021
Description:
Optical computational imaging seeks enhanced performance and new functionality by the joint design of illumination, unconventional optics, detectors, and reconstruction algorithms. Among the emergent approaches in this field, two remarkable examples enable overcoming the diffraction limit and imaging through complex media.
Abbe’s resolution limit has been overcome, enabling unprecedented opportunities for optical imaging at the nanoscale. Fluorescence imaging using photoactivatable or photoswitchable molecules within computational optical systems offers single-molecule sensitivity within a wide field of view. The advent of three-dimensional point spread function engineering associated with optimal reconstruction algorithms provides a unique approach to further increase resolution in three dimensions.
Focusing and imaging through strongly scattering media has also been accomplished recently in the optical regime. By using a feedback system and optical modulation, the resulting wavefronts overcome the effects of multiple scattering upon propagation through the medium. Phase-control holographic techniques help characterize scattering media at high-speed using micro-electro-mechanical technology, allowing focusing through a temporally dynamic, strongly scattering sample, or a multimode fiber. In this talk we will further discuss implications for ultrathin optical endoscopy and adaptive nonlinear wavefront shaping.
Further Information:
Prof. Rafael Piestun received M.Sc. and Ph.D. degrees in Electrical Engineering from the Technion – Israel Institute of Technology. From 1998 to 2000 he was a researcher at Stanford University. Since 2001 he has been with the Department of Electrical and Computer Engineering and the Department of Physics at the University of Colorado – Boulder. Professor Piestun is a fellow of the Optical Society of America, was a Fulbright scholar and an Eshkol fellow, and received a Honda Initiation Grant award, a Minerva award, a Provost Achievement Award, and the El-Op and Gutwirth prizes. He served on the editorial committee of Optics and Photonics News and was an associate editor for Applied Optics. He was the Director and Principal Investigator of the NSF-IGERT program in Computational Optical Sensing and Imaging and is co-Principal Investigator of the NSF Science and Technology Center STROBE. He is also a founder of the startup Double Helix Optics, which received the SPIE Prism Award and first place in the Luminate Competition in Optics and Photonics. His areas of interest include computational optical imaging, superresolution microscopy, volumetric photonic devices, scattering optics, and ultrafast optics.
Hayk Martiros
(Skydio)
“Skydio Autonomy: Research in Robust Visual Navigation and Real-Time 3D Reconstruction”

Date: February 24, 2021
Description:
Skydio is the leading US drone company and the world leader in autonomous flight. Our drones are used for everything from capturing amazing video, to inspecting bridges, to tracking progress on construction sites.
At the core of our products is a vision-based autonomy system with seven years of development at Skydio, drawing on decades of academic research. This system pushes the state of the art in deep learning, geometric computer vision, motion planning, and control with a particular focus on real-world robustness.
Drones encounter extreme visual scenarios not typically considered by academia nor encountered by cars, ground robots, or AR applications. They are commonly flown in scenes with few or no semantic priors and must deftly navigate thin objects, extreme lighting, camera artifacts, motion blur, textureless surfaces, and water. These challenges are daunting for classical vision because photometric signals are simply not consistent, and for learning-based methods because there is no ground truth for direct supervision of deep networks. In this talk we’ll take a detailed look at our approaches to these problems.
We will also discuss new capabilities on top of our core navigation engine to autonomously map complex scenes and build high quality digital twins, by performing real-time 3D reconstruction across multiple flights. Our vision-based 3D Scan approach allows anyone to build millimeter-scale maps of the world.
Further Information:
Hayk was the first engineering hire at Skydio and he leads the autonomy team. He is an experienced roboticist who develops robust approaches to computer vision, deep learning, nonlinear optimization, and motion planning to bring intelligent robots into the mainstream. His team’s state-of-the-art work in UAV visual navigation of complex scenarios is at the core of every Skydio drone. He also has a deep interest in systems architecture and symbolic computation. His previous works include novel hexapedal robots, collaboration between robot arms, micro-robot factories, solar panel farms, and self-balancing motorcycles. Hayk is a graduate of Stanford University and Princeton University.
Noah Snavely
(Cornell University)
“The Plenoptic Camera”

Date: February 17, 2021
Description:
Imagine a futuristic version of Google Street View that could dial up any possible place in the world, at any possible time. Effectively, such a service would be a recording of the plenoptic function—the hypothetical function described by Adelson and Bergen that captures all light rays passing through space at all times. While the plenoptic function is completely impractical to capture in its totality, every photo ever taken represents a sample of this function. I will present recent methods we’ve developed to reconstruct the plenoptic function from sparse space-time samples of photos—including Street View itself, as well as tourist photos of famous landmarks. The results of this work include the ability to take a single photo and synthesize a full dawn-to-dusk timelapse video, as well as compelling 4D view synthesis capabilities where a scene can simultaneously be explored in space and time.
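As a reminder of what is being reconstructed, here is a toy sketch of the plenoptic function's seven-dimensional parameterization. The body of the function is a made-up placeholder; only the signature and the comment about photos as slices are the point.

```python
import numpy as np

def plenoptic_sample(x, y, z, theta, phi, wavelength, t):
    """The plenoptic function of Adelson and Bergen: radiance arriving at position (x, y, z)
    from direction (theta, phi), at a given wavelength and time. Any real scene model could
    go here; this toy returns a time-varying, elevation-dependent value just to be callable."""
    elevation = np.cos(theta)
    return max(0.0, elevation) * (0.5 + 0.5 * np.sin(2 * np.pi * t / 86400.0))

# A photograph is a sparse slice of this 7D function: fixed viewpoint (x, y, z) and time t,
# integrated over wavelength, sampled over a small bundle of directions per pixel.
print(plenoptic_sample(0.0, 0.0, 1.6, theta=0.3, phi=1.0, wavelength=550e-9, t=43200.0))
```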
Further Information:
Noah Snavely is an associate professor of Computer Science at Cornell University and Cornell Tech, and also a researcher at Google Research. Noah’s research interests are in computer vision and graphics, in particular 3D understanding and depiction of scenes from images. Noah is the recipient of a PECASE, a Microsoft New Faculty Fellowship, an Alfred P. Sloan Fellowship, and a SIGGRAPH Significant New Researcher Award.
Yifan (Evan) Peng
(Stanford University)
“Neural Holography: Incorporating Optics and Artificial Intelligence for Next-generation Computer-generated Holographic Displays”

Date: February 10, 2021
Description:
Holographic displays promise unprecedented capabilities for direct-view displays as well as virtual and augmented reality applications. However, one of the biggest challenges for computer-generated holography (CGH) is the fundamental tradeoff between algorithm runtime and achieved image quality. Moreover, the image quality achieved by most holographic displays is low, due to the mismatch between the optical wave propagation of the display and its simulated model. We develop an algorithmic CGH framework that achieves unprecedented image fidelity and real-time framerates. Our framework comprises several parts, including a novel camera-in-the-loop optimization strategy that allows us to either optimize a hologram directly or train an interpretable model of the optical wave propagation, and a neural network architecture that represents the first CGH algorithm capable of generating full-color, high-quality holographic images at full-HD resolution in real time. Based on this framework, we further propose a holographic display architecture using two SLMs, where the camera-in-the-loop optimization with an automated calibration procedure is applied. As such, both diffracted and undiffracted light on the target plane are acquired to update hologram patterns on the SLMs simultaneously. The experimental demonstration delivers higher-contrast and less noisy holographic images without the need for extra filtering, compared to conventional single-SLM systems. In summary, we envision that bringing artificial intelligence advances into conventional optics/photonics research opens many opportunities to both communities and promises to enable high-fidelity imaging and display solutions.
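The framework above replaces classical iterative phase retrieval with camera-in-the-loop optimization and a learned propagation model. As a point of reference only, the sketch below shows the classical Gerchberg-Saxton baseline for a phase-only hologram under idealized far-field (Fourier) propagation; it is not the method described in the talk.

```python
import numpy as np

def gerchberg_saxton(target_amp, iters=50, rng=np.random.default_rng(0)):
    """Classical iterative phase retrieval for a phase-only SLM and far-field (Fourier)
    propagation: returns an SLM phase whose Fourier transform approximates target_amp."""
    slm_phase = rng.uniform(0, 2 * np.pi, target_amp.shape)
    for _ in range(iters):
        field = np.fft.fft2(np.exp(1j * slm_phase))        # propagate SLM -> image plane
        field = target_amp * np.exp(1j * np.angle(field))  # enforce target amplitude
        back = np.fft.ifft2(field)                         # propagate back to SLM plane
        slm_phase = np.angle(back)                         # enforce phase-only constraint
    return slm_phase

target = np.zeros((128, 128))
target[48:80, 48:80] = 1.0                                 # toy target image: a bright square
phase = gerchberg_saxton(target)
recon = np.abs(np.fft.fft2(np.exp(1j * phase)))
print(recon.max() / (recon.mean() + 1e-9))                 # energy concentrates in the square
```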
Further Information:
Yifan (Evan) Peng is a Postdoctoral Research Fellow in the Computational Imaging Lab at Stanford University. His research interests ride across the interdisciplinary fields of optics/photonics, computer graphics, and computer vision. Much of his recent work concerns developing computational imaging modalities that combine optics and algorithms, for both cameras and displays. He completed his Ph.D. in Computer Science at the University of British Columbia, and his M.Sc. and B.E. in Optical Science and Engineering at Zhejiang University. During his Ph.D., he was also a Visiting Research Student at the Stanford Computational Imaging Lab and at the Visual Computing Center, King Abdullah University of Science and Technology. He has recently served as a program committee member and reviewer for several venues of IEEE, OSA, SPIE, and SID.
Ulugbek Kamilov
(Washington University in St. Louis)
“Computational Imaging: Reconciling Models and Learning”

Date: February 3, 2021
Description:
There is a growing need in biological, medical, and materials imaging research to recover information lost during data acquisition. There are currently two distinct viewpoints on addressing such information loss: model-based and learning-based. Model-based methods leverage analytical signal properties (such as sparsity) and often come with theoretical guarantees and insights. Learning-based methods leverage flexible representations (such as convolutional neural nets) for best empirical performance through training on big datasets. The goal of this talk is to introduce a Regularization by Artifact Removal (RARE) framework that reconciles both viewpoints by providing the “deep learning prior” counterpart of classical regularized inversion. This is achieved by specifying “artifact-removing deep neural nets” as a mechanism to infuse learned priors into recovery problems, while maintaining a clear separation between the prior and physics-based acquisition models. Our methodology can fully leverage the flexibility offered by deep learning by designing learned priors to be used within our new family of fast iterative algorithms. Our results indicate that such algorithms can achieve state-of-the-art performance in different computational imaging tasks, while also being amenable to rigorous theoretical analysis. We will focus on applying the methodology to various biomedical imaging modalities, such as magnetic resonance imaging and intensity diffraction tomography.
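The general plug-in-prior pattern behind this family of methods alternates a data-consistency step with a learned artifact-removal step. The sketch below illustrates that pattern with a simple local average standing in for the trained network; it should not be read as the exact RARE algorithm, whose step sizes, acceleration, and network differ.

```python
import numpy as np

def denoiser(x):
    """Stand-in for the learned artifact-removing network: a simple local average.
    In the RARE-style framework this would be a CNN trained to remove reconstruction artifacts."""
    pad = np.pad(x, 1, mode="edge")
    return (pad[:-2, 1:-1] + pad[2:, 1:-1] + pad[1:-1, :-2] + pad[1:-1, 2:] + x) / 5.0

def reconstruct(A, y, shape, steps=100, gamma=1e-3):
    """Alternate a gradient step on the data term with the (learned) prior step:
    x <- D(x - gamma * A^T (A x - y))."""
    x = np.zeros(shape)
    for _ in range(steps):
        grad = (A.T @ (A @ x.ravel() - y)).reshape(shape)
        x = denoiser(x - gamma * grad)
    return x

rng = np.random.default_rng(0)
truth = np.zeros((16, 16))
truth[4:12, 6:10] = 1.0
A = rng.standard_normal((200, 256))     # toy compressive measurement operator
y = A @ truth.ravel()
print(np.abs(reconstruct(A, y, truth.shape) - truth).mean())
```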
Further Information:
Ulugbek S. Kamilov is an Assistant Professor and Director of Computational Imaging Group (CIG) at Washington University in St. Louis. His research area is computational imaging with an emphasis on theory and algorithms for applications in biomedical imaging. His research interests include signal and image processing, machine learning, and optimization. He obtained the BSc and MSc degrees in Communication Systems, and the PhD degree in Electrical Engineering from EPFL, Switzerland, in 2008, 2011, and 2015, respectively. From 2015 to 2017, he was a Research Scientist at Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA.
He is a recipient of the IEEE Signal Processing Society’s 2017 Best Paper Award (with V. K. Goyal and S. Rangan). His Ph.D. thesis was selected as a finalist for the EPFL Doctorate Award in 2016. He is serving as an Associate Editor for the IEEE Transactions on Computational Imaging (2019-present) and Biological Imaging (2020-present). He is also a member of IEEE Technical Committee on Computational Imaging (2016-2019, 2019-present).
Anton Kaplanyan
(Facebook)
“Recent advances and current challenges of graphics for fully immersive augmented and virtual reality”

Date: January 27, 2021
Description:
Head-mounted displays for augmented and virtual reality are gaining popularity as new and more immersive media. They promise seamless user immersion into virtual worlds and smart augmentation of the real world around us. In order to enable this we need to develop intelligent and power-efficient systems for capturing, representing and rendering virtual content. Key questions include: Can we capture a real object and re-render it as a virtual object in a different environment? How can we leverage human perception to render photorealistic objects efficiently on wearable augmented and virtual reality (AR/VR) headsets? How intelligent can capturing and reconstruction be? In this talk, I will discuss recent efforts in neural and inverse graphics and how they improve the efficiency, immersion, and photorealism of captured and virtual environments. I will also talk about Facebook LiveMaps and Project Aria, as well as the major open challenges on our way to fully immersive and intelligent wearable AR/VR headsets.
Further Information:
Anton Kaplanyan is a research scientist at Facebook Reality Labs. His current research goal is to advance neural and differentiable rendering towards improved immersion and efficiency in future augmented and virtual reality. Anton worked on Nvidia’s RTX ray tracing hardware research and real-time denoising methods at Nvidia Research, as well as on real-time graphics research for the CryENGINE 3 game engine and the Crysis franchise. Anton holds a PhD in physically based light transport simulation from the Karlsruhe Institute of Technology, Germany.
Vincent Sitzmann
(MIT)
“Self-supervised Scene Representation Learning”

Date: January 20, 2021
Description:
Unsupervised learning with generative models has the potential of discovering rich representations of 3D scenes. Such Neural Scene Representations may subsequently support a wide variety of downstream tasks, ranging from robotics to computer graphics to medical imaging. However, existing methods ignore one of the most fundamental properties of scenes: their three-dimensional structure. In this talk, I will make the case for equipping Neural Scene Representations with an inductive bias for 3D structure, enabling self-supervised discovery of shape and appearance from few observations. By embedding an implicit scene representation in a neural rendering framework and learning a prior over these representations, I will show how we can enable 3D reconstruction from only a single posed 2D image. I will show how the features we learn in this process are already useful to the downstream task of semantic segmentation. I will then show how gradient-based meta-learning can enable fast inference of implicit representations.
Further Information:
Vincent Sitzmann is a postdoc in Joshua Tenenbaum’s group at MIT CSAIL. He previously finished his PhD at Stanford University with a thesis on “Self-Supervised Scene Representation Learning”. His research interest lies in neural scene representations – the way neural networks learn to represent information about our world. His goal is to allow independent agents to reason about our world given visual observations, such as inferring a complete model of a scene with information on geometry, material, lighting, etc. from only a few observations – a task that is simple for humans, but currently impossible for AI.
Andrew Maimone
(Facebook)
“Holographic optics for AR/VR”

Date: January 13, 2021
Description:
Holographic optics are an exciting tool to increase the performance and reduce the size and weight of augmented and virtual reality displays. In this talk, I will describe two types of holographic optics that can be applied to AR/VR, as recently outlined in two ACM SIGGRAPH publications. In the first part, I will describe how static holographic optics can be used to replace conventional optical elements, such as refractive lenses, to enable highly compact, sunglasses-like virtual reality displays while retaining high performance. In the second part, I’ll describe the potential of dynamic holography to replace the conventional image formation process and enable compact and high performance augmented reality displays. In particular, I will focus on a key challenge of dynamic holographic displays, limited etendue, and present a candidate solution to increase etendue through the co-design of a simple scattering mask and hologram optimization.
Further Information:
Andrew Maimone is a research scientist on the Display Systems Research team at Facebook Reality Labs Research. His research focuses on the use of novel optics and computing to enhance virtual and augmented reality displays. He has published novel display designs using holography and light fields, with an emphasis on wearable displays with wide fields of view, compact form factors, high resolution, and support for the accommodation depth cue. Previously, Andrew was a researcher at Microsoft Research NExT and completed a PhD in computer science at the University of North Carolina at Chapel Hill.
- Christian Theobalt » Neural Rendering
- Wenzel Jakob » Differentiable Simulation of Light
- Jannick Rolland » Metaform Optics
- Holly Rushmeier » Material Appearance
- Michelle Digman » Phasor Approach to FLIM and Mitometer
- Agastya Kalra » Polarized Computational Imaging
- Ruth Rosenholtz » Human Vision at a Glance
- Lee Redden » Entrepreneurship, AI, and Agriculture
- Ofer David » Gated CMOS Imaging
- Liang Shi » Computer-Generated Holography
- Matt Pharr » Real-Time Ray Tracing
- Michael Kudenov » Simultaneous Hyperspectral and Polarimetric Imaging
- Eero Simoncelli » Photographic Image Priors
- Angjoo Kanazawa » Neural Representations
- Julien Martel » Neural Representations
- Andrew Watson » Visibility
- Andreas Geiger » Neural Implicit Representations
- Steve Seitz » Slow Glass
- Hany Farid » Photographic Forensic Identification
- Lihong Wang » Photoacoustic Tomography
- Ashok Veeraraghavan » Computational Imaging
- Rafael Piestun » Computational Optical Microscopy
- Hayk Martiros » Skydio
- Noah Snavely » Plenoptic Camera
- Yifan (Evan) Peng » Neural Holography
- Ulugbek Kamilov » Computational Imaging: Reconciling Models and Learning
- Anton Kaplanyan » Graphics for Fully Immersive Augmented and Virtual Reality
- Vincent Sitzmann » Self-Supervised Learning
- Andrew Maimone » Holographic Optics for AR/VR
SCIEN Colloquia 2020
Michael Broxton
(Google)
“Immersive Light Field Video with a Layered Mesh Representation”

Date: November 18, 2020
Description:
In this talk I will describe our system for capturing, reconstructing, compressing, and rendering high quality immersive light field video. We record immersive light fields using a custom array of 46 time-synchronized cameras distributed on the surface of a hemispherical, 92cm diameter dome. From this data we produce 6DOF volumetric videos with a wide 80-cm viewing baseline, 10 pixels per degree angular resolution, and a wide field of view (>220 degrees), at 30fps video frame rates. Even though the cameras are placed 18cm apart on average, our system can reconstruct objects as close as 20cm to the camera rig. We accomplish this by leveraging the recently introduced DeepView view interpolation algorithm, replacing its underlying multi-plane image (MPI) scene representation with a collection of spherical shells which are better suited for representing panoramic light field content. We further process this data to reduce the large number of shell layers to a small, fixed number of RGBA+depth layers without significant loss in visual quality. The resulting RGB, alpha, and depth channels in these layers are then compressed using conventional texture atlasing and video compression techniques. The final, compressed representation is lightweight and can be rendered on mobile VR/AR platforms or in a web browser.
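The rendering step shared by multi-plane and layered-mesh representations is standard back-to-front "over" compositing of RGBA layers. Here is a minimal sketch of that operation on toy data; the DeepView solve, the spherical-shell geometry, and the mesh conversion are not shown.

```python
import numpy as np

def composite_layers(rgba_layers):
    """Composite a back-to-front ordered list of RGBA layers with standard 'over' blending,
    the same operation used to render multi-plane or layered RGBA+depth representations."""
    h, w = rgba_layers[0].shape[:2]
    out = np.zeros((h, w, 3))
    for layer in rgba_layers:                       # back to front
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = alpha * rgb + (1.0 - alpha) * out
    return out

rng = np.random.default_rng(0)
layers = [np.clip(rng.random((4, 4, 4)), 0, 1) for _ in range(8)]  # toy stand-ins for layers
print(composite_layers(layers).shape)  # (4, 4, 3)
```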
Further Information:
From satellites orbiting Mars and the Moon to microscopes peering into the brains of mice and zebrafish, Michael has worked on imaging and computer vision problems spanning the macrocosmos to the microcosmos. After working early in his career at Los Alamos National Lab, MIT, and NASA Ames Research Center, Michael returned to get his PhD at Stanford under Marc Levoy. There he discovered a deep interest in light fields, and has been researching them ever since. Michael joined Google in 2018 and has been working to develop new deep learning methods to solve light field imaging and view synthesis problems.
Sixian You
(University of California at Berkeley)
“Label-free optical imaging of living biological systems”

Date: November 11, 2020
Description:
Label-free optical imaging of living biological systems offers rich information that can be of immense value for a variety of biomedical tasks. Despite the exceptional theoretical potential, current label-free microscopy platforms are challenging for real-world clinical and biological applications. The major obstacles include the lack of flexible laser sources, limited contrast, and the challenge of acquiring and interpreting the high-dimensional dataset.
In this talk, I will present new optical imaging platforms and methodologies that will address these challenges. By generating and tailoring coherent supercontinuum from photonic crystal fibers, simultaneous metabolic and structural imaging can be achieved without the aid of stains, enabling perturbation-free exploration of living systems. These capabilities further motivate the development of analytical tools for image-based segmentation and diagnosis, showing the broad potential of this label-free imaging technology for discovering new metabolic biomarkers and enabling real-time, point-of-procedure applications.
Further Information:
Sixian You is a postdoctoral research fellow in the Computational Imaging Lab at UC Berkeley and will join the MIT EECS department in March 2021. Previously, Sixian received her Ph.D. in 2019 from the University of Illinois, Urbana-Champaign (UIUC), where she studied biomedical optical imaging with Dr. Stephen Boppart and Dr. Saurabh Sinha. Between her Ph.D. and postdoc, she worked on optical sensing technologies at Apple. Her primary research interest is in developing innovative optical imaging solutions for biomedicine.
David Williams
(University of Rochester)
“Functional imaging and control of retinal ganglion cells in the living primate eye”

Date: October 28, 2020
Description:
The encapsulation of the retina inside the eye has always challenged our ability to study the anatomy and physiology of retinal neurons in their native state. Our group is developing new tools using adaptive optics that allow not only structural imaging but also functional recording and control of retinal neurons at a cellular spatial scale. By combining adaptive optics with calcium imaging, we can optically record from hundreds of ganglion cells in the nonhuman primate eye over periods as long as years. This approach is especially well-suited for recording from cells serving the central fovea, which has been difficult to access with microelectrodes. Using optogenetics, we can also directly excite these same ganglion cells with light in the living animal. These capabilities together establish a two-way communication link with retinal ganglion cells. I will discuss the advantages and current limitations of these approaches, as well as speculate about possible future applications for vision restoration and for understanding the role of the retina in perception.
Further Information:
David Williams received his Ph.D. from the University of California, San Diego in 1979. He was a postdoctoral fellow at Bell Laboratories, Murray Hill in 1980 and joined the University of Rochester in 1981, where he is currently William G. Allyn Professor of Medical Optics in the Institute of Optics and Director of the Center for Visual Science. Williams’ interests include the optical and neural limits on spatial and color vision, functional imaging of the retina, optical instrumentation for the eye, and vision restoration.
Professor Eric Fossum (Dartmouth), Prof. Stanley Chan (Purdue) and Dr. Jiaju Ma (Gigajot Technology)
“Quanta Image Sensors: Concept, Progress, and Commercialization”

Date: October 21, 2020
Description:
The Quanta Image Sensor (QIS), a photon-counting image sensor, counts each electron generated in the sensor chip and then applies computational imaging to create a gray scale image or extract other information. First proposed in 2005, the QIS has been implemented starting around 2015 by using a CMOS image sensor (CIS) based approach, CIS-QIS, and by using a single-photon avalanche detector (SPAD) approach, SPAD-QIS. Both are visible-light devices based on silicon. This talk will focus on the CIS-QIS developed at Dartmouth and being commercialized by Gigajot with computational imaging development at Purdue and Gigajot. The CIS-QIS device has been demonstrated with up to 20Mpixels per chip and does not use avalanche multiplication which allows for small pixels with low power dissipation.
The talk will start with the QIS concept including strategies for high dynamic range and photon-number resolution, as well as a review of the work at Dartmouth. A brief comparison with SPAD-QIS that permits fast timing resolution but with larger pixels and lower resolution will be made. Next, computational imaging approaches and results including high dynamic range and low-light neural-net image classification will be presented. Work underway at Gigajot including color imaging and potential commercial applications will then be discussed.
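A toy model of the photon-counting idea described above: each jot sees Poisson photon arrivals, and a gray-scale image is formed computationally by pooling many jot readings over space and time. Real QIS reconstruction additionally handles read noise, nonuniformity, and more sophisticated estimators; the numbers below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
mean_photons = 0.8           # average photoelectrons per jot per field (toy value)
jots_per_pixel = 16          # spatial/temporal oversampling folded into one number

# Each tiny "jot" reports a small photoelectron count (ideal Poisson, no read noise here).
counts = rng.poisson(mean_photons, size=(256, 256, jots_per_pixel))

# A gray-scale image is formed computationally by pooling jot counts over space/time.
image = counts.sum(axis=-1) / jots_per_pixel
print(image.mean(), image.std())   # mean near 0.8, shot-noise-limited spread
```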
Further Information:
Eric R. Fossum is best known for the invention of the CMOS image sensor “camera-on-a-chip” while at the Caltech Jet Propulsion Laboratory now used in billions of cameras each year. He co-founded and led Photobit which was acquired by Micron. He was later CEO of Siimpel and then a consultant with Samsung Electronics. He joined the Dartmouth faculty in 2010 and also serves as the Director of PhD Innovation Programs and Associate Provost for Entrepreneurship and Technology Transfer. In 2017 Dr. Fossum received the Queen Elizabeth Prize, considered by many as the Nobel Prize of Engineering “for the creation of digital imaging sensors,” along with three others. He was inducted into the National Inventors Hall of Fame, elected to the National Academy of Engineering, and is a Fellow of IEEE and OSA, among other honors.
Stanley Chan is an Associate Professor in the School of Electrical and Computer Engineering at Purdue University, West Lafayette, IN. He received the B.Eng. degree in Electrical Engineering from the University of Hong Kong in 2007, and the Ph.D. degree in Electrical Engineering from UC San Diego in 2011. His research interests include computational photography, machine learning, and signal processing. His work is supported by NSF, AFRL, ARO, Intel, and other sponsors. At Purdue, he teaches undergraduate-level probability and graduate-level machine learning. He is currently writing an eBook Introduction to Probability for Data Science. He would appreciate feedback from readers. URL: https://engineering.purdue.edu/ChanGroup/eBook.html
Dr. Jiaju Ma received his Ph.D. in Engineering Sciences in 2017 from Thayer School of Engineering at Dartmouth College as the awardee of Charles F. and Ruth D. Goodrich Prize in recognition of his outstanding academic achievements. He co-founded Gigajot Technology, Inc. in 2017 to commercialize Quanta Image Sensor technologies. From 2017, he has been leading the technological advancement at Gigajot as Chief Technology Officer. He has authored and coauthored over 20 technical publications and holds more than 20 patents and patent applications.
Bill Freeman
(MIT)
“The Moon Camera”

Date: October 14, 2020
Description:
My attempts to photograph the Earth from space using the moon as a camera, and several computational imaging projects resulting from those attempts.
Further Information:
William T. Freeman is the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science (EECS) at MIT, and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) there. He was the Associate Department Head of EECS from 2011 – 2014. Since 2015, he has been a research manager at Google Research in Cambridge, Massachusetts.
Mohit Gupta
(University of Wisconsin)
“Computational Imaging, One Photon at a Time”

Date: October 7, 2020
Description:
Single-photon avalanche diodes (SPADs) are an emerging sensor technology capable of detecting and time-tagging individual photons with picosecond precision. Despite (or perhaps, due to) these capabilities, SPADs are considered specialized devices suitable only for photon-starved scenarios, and restricted to a limited set of niche applications. This raises the following questions: Can SPADs operate not just in low light, but in bright scenes as well? Can SPADs be used not just with precisely controlled active light sources such as pulsed lasers, but under passive, uncontrolled illumination like cellphone or machine vision cameras?
I will describe our recent work on designing computational imaging techniques that (a) enable single-photon sensors to operate across the entire gamut of imaging conditions, including high-flux scenes, and (b) leverage SPADs as passive imaging devices for ultra-low-light photography. The overall goal is to transform SPADs into all-weather, general-purpose sensors capable of both active and passive imaging, across photon-starved and photon-flooded environments.
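For the passive, uncontrolled-illumination case, an idealized single-photon pixel that records at most one detection per frame saturates according to p = 1 - exp(-flux), so the flux can be recovered by inverting that curve. The sketch below shows this textbook estimator on simulated data; it is only a stand-in for the techniques described in the talk, which also handle dead time, high flux, and motion.

```python
import numpy as np

rng = np.random.default_rng(0)
true_flux = 3.0                                            # mean photons per pixel per frame (toy value)
frames = rng.poisson(true_flux, size=(1000, 64, 64)) > 0   # binary frames: photon detected or not

# A binary pixel saturates at one detection per frame, so the fraction of "on" frames
# follows p = 1 - exp(-flux); inverting it recovers the flux even in fairly bright scenes.
p = frames.mean(axis=0)
flux_estimate = -np.log(1.0 - np.clip(p, 0, 1 - 1e-6))
print(flux_estimate.mean())                                # close to 3.0
```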
Further Information:
Mohit Gupta is an Assistant Professor of Computer Sciences at the University of Wisconsin-Madison. He received his B.Tech. in Computer Science from IIT Delhi and his Ph.D. from the Robotics Institute, Carnegie Mellon University, and was a postdoctoral research scientist at Columbia University. He directs the WISION Lab, with research interests broadly in computer vision and computational imaging. He has received best paper honorable mention awards at computer vision and photography conferences in 2014 and 2019. His research is supported by NSF, ONR, DARPA, Sony, Intel, and the Wisconsin Alumni Research Foundation.
Achin Bhowmik
(Starkey Hearing Technologies)
“Transforming hearing aids into multisensory perceptual augmentation and health monitoring devices”

Date: September 30, 2020
Description:
With over 466 million people suffering from disabling hearing loss globally according to the World Health Organization, and the number expected to rise to 900 million people by 2050, hearing aids are crucially important medical wearable devices. Untreated hearing loss has been linked to increased risks of social isolation, depression, dementia, fall injuries, and other health issues. In this talk, we will present a new class of in-ear devices with embedded sensors and artificial intelligence, which are shaped to fit an individual with 3D-imaging of the ear geometry. In addition to providing frequency-dependent amplification of sound to compensate for hearing loss, these devices serve as a continuous monitor for important physiological parameters, an automatic fall detection and alert system, as well as a personal assistant with connectivity to the cloud. Furthermore, Bluetooth-paired with a vision aid, these devices present exciting possibilities for multisensory perceptual augmentation of hearing, balance, vision, and memory, helping people live better and more productive lives.
Further Information:
Dr. Achin Bhowmik is the chief technology officer and executive vice president of engineering at Starkey Hearing Technologies, a privately-held medical devices business with over 5,000 employees. Previously, he was vice president and general manager of the Perceptual Computing Group at Intel, responsible for products and businesses in the areas of 3D sensing and interactive devices. He is a board member for OpenCV, the Society for Information Display, the National Captioning Institute, and a number of technology startup companies. He holds an adjunct professor position at Stanford University School of Medicine.
Lars Omlor
(Zeiss)
“Retinal topography using stripe illumination in a fundus camera”

Date: September 23, 2020
Description:
Retinal topography is affected by pathology such as drusen and tumors, and it may be useful to determine topography with fundus imaging when three-dimensional imaging is not available. In this talk, I will present a novel method of retinal topography scanning using the stripe projection technology of the CLARUS™ 700 (ZEISS, Dublin, CA) wide-field fundus camera. The camera projects stripes onto the retina and records images of the returned light while maintaining a small angle between illumination and imaging. We make use of this structured illumination, analyzing neighboring stripes to determine depth, i.e., the retinal topography, from both relative defocus and stripe displacements. The resulting topography maps are finally compared to three-dimensional data from optical coherence tomography imaging.
Further Information:
Lars Omlor is a staff research scientist in the central research department of Zeiss. He majored in Mathematics at the University of Heidelberg and received his PhD (2009) in computer science from Ulm University. He has been working at Zeiss since 2010 and moved to Pleasanton (CA) in 2018. Most of his research is in the fields of computational imaging, image processing, and machine learning.
Daniele Faccio
(University of Glasgow)
Time of flight imaging with single photon detectors

Date: Aug 26, 2020
Description:
Single Photon Avalanche Detectors (SPADs) can detect single photon arrival events and, in doing so, record a click that can also be used to determine the photon arrival time on the detector with picosecond temporal resolution. This timing capability is finding many uses, such as LIDAR and fluorescence lifetime imaging, and provides an opportunity for revisiting fundamental imaging concepts by combining SPAD data with computational image retrieval techniques. The computational techniques can in general resort to inverse retrieval approaches or machine learning, with the choice depending on the specific nature of the data and imaging problem at hand. I will overview some of our work, starting from the first attempts to capture light-in-flight using SPAD cameras and covering the topics of non-line-of-sight imaging, imaging through diffusion, extraction of 3D images from time-of-flight data only, fluorescence lifetime imaging, and coincidence counting for quantum imaging applications.
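The basic direct time-of-flight calculation behind the LIDAR use case: histogram the photon time stamps, find the peak, and convert round-trip time to distance via d = c·t/2. A minimal sketch with made-up timing numbers:

```python
import numpy as np

C = 3e8                                    # speed of light, m/s
BIN = 50e-12                               # histogram bin width: 50 ps (toy value)
rng = np.random.default_rng(0)

# Simulated photon time stamps from a surface 4.5 m away: true round-trip time plus timing jitter.
round_trip = 2 * 4.5 / C
stamps = rng.normal(round_trip, 100e-12, size=2000)

hist, edges = np.histogram(stamps, bins=np.arange(0, 40e-9, BIN))
peak_time = edges[np.argmax(hist)] + BIN / 2
print(C * peak_time / 2)                   # estimated depth in metres, close to 4.5
```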
Further Information:
Daniele Faccio is a Royal Academy Chair in Emerging Technologies and Fellow of the Royal Society of Edinburgh and Optical Society of America. He joined the University of Glasgow in 2017 as Professor in Quantum Technologies where he leads the Extreme-Light group and is Director of Research for the School of Physics and Astronomy. He is also adjunct professor at the University of Arizona, and previously was at Heriot-Watt University and University of Insubria (Italy). He has been visiting scientist at MIT (USA), Marie-Curie fellow at ICFO, Barcelona (Spain) and EU-ERC fellow 2012-2017. He worked in the optical telecommunications industry for four years before obtaining his PhD in Physics in 2007 at the University of Nice-Sophia Antipolis (France). His research, funded by the UK research council EPSRC, DSTL, The Leverhulme Trust, the EU Quantum Flagship program and the Royal Academy of Engineering focuses on the physics of light, on how we harness light to answer fundamental questions and on how we harness light to improve society.
Jennifer Barton
(University of Arizona)
Miniature Optical Endoscopes for Early-Stage Cancer Detection

Date: Aug 5, 2020
Description:
With multiple mechanisms of contrast, high sensitivity, high resolution, and the possibility to create miniature, inexpensive devices, light-based techniques have tremendous potential to positively impact cancer detection and survival. Many organs of the body can be reached in a minimally-invasive fashion with small flexible endoscopes. Some organs, such as the fallopian tubes and ovaries, require extremely miniature (sub-mm) and flexible endoscopes to avoid tissue cutting. Additionally, some modalities, such as side-viewing optical coherence tomography, are naturally suited to miniature endoscopes, whereas others like forward-viewing reflectance or fluorescence imaging, may require performance tradeoffs. The development of small, robust and fiber-delivered advanced light sources, miniature fiber bundles, and sensitive detectors has aided the development of novel miniature endoscopes. In this talk, I will discuss our recent advancements in endoscope design for multimodality optical early detection of ovarian cancer.
Further Information:
Jennifer Barton is currently Professor of Biomedical Engineering and Optical Sciences at the University of Arizona. She also serves as Director of the BIO5 Institute, a collaborative research institute dedicated to solving complex biology-based problems. Barton develops miniature endoscopes that combine multiple optical imaging techniques, particularly optical coherence tomography and fluorescence spectroscopy. She evaluates the suitability of these endoscopic techniques for detecting early cancer development in patients and pre-clinical models. She has a particular interest in the early detection of ovarian cancer, the most deadly gynecological malignancy. She is a fellow of SPIE, the international society for optics and photonics, and of the American Institute for Medical and Biological Engineering.
Quyen Nguyen
(UCSD)
Fluorescence Guided Precision Surgery™ – Illuminating Tumors and Nerves

Date: July 8, 2020
Description:
Although treatment algorithms vary, surgery is the primary treatment modality for most solid cancers. In oncologic tumor resection, the preferred outcome is complete cancer removal, as residual tumor left behind is considered treatment failure. However, complete tumor removal needs to be balanced with functional preservation and minimizing patient morbidity, including prevention of inadvertent nerve injury. The inability of surgeons to visually distinguish between tumor and normal tissue, including nerves, leads to residual cancer cells being left behind at the edges of resection, i.e. positive surgical margins (PSM). PSM can be as high as 20-40% in breast cancer lumpectomy, 21% for radical prostatectomy, and 13% for HNSCC. Similarly, using white light reflectance alone, which is the current standard of care in operating rooms, nerve dysfunction following surgery has been reported to be as high as ~2-40%, ranging from immediate post-op to long-term dysfunction.
Molecular imaging with fluorescence provides enhanced visual definition between diseased and normal tissue and has been shown to decrease PSM in both animal models and patients. Molecular imaging with fluorescence can also provide enhanced visualization of important structures such as nerves to improve preservation and minimize inadvertent injury. Our laboratory has extensive experience in the development of both nerve and tumor injectable markers for surgical visualization. In this presentation we will discuss the development of nerve and tumor marker combinations to improve intraoperative visualization – a.k.a. Precision Surgery™.
Further Information:
Dr. Nguyen is a Professor in the Department of Surgery at the University of California San Diego (UCSD). She received her combined MD/PhD degree from Washington University, School of Medicine in St. Louis, MO. She completed her General Surgery Internship at Barnes Jewish Hospital in St. Louis and residency in Head and Neck Surgery and subspecialty fellowship training in Neurotology/Skull Base Surgery at UCSD. She is board certified in both Head and Neck Surgery and Neurotology/Skull Base Surgery and is the fellowship director for the ACGME accredited fellowship in Neurotology/Skull Base Surgery at UCSD.
Her clinical practice is at UCSD Health Systems where she cares for patients with diseases of the facial nerve, ear, and skull base. She has subspecialty interest in facial nerve reanimation and surgical procedures for patients with facial paralysis. She also specializes in hearing restoration surgeries including stapedectomy and cochlear implantation.
Dr. Nguyen’s interest in molecular imaging for fluorescence guided Precision Surgery™ began during her fellowship at UCSD where she collaborated with Dr. Roger Tsien (1952-2016), Nobel Laureate, Chemistry 2008. She has been awarded the Presidential Early Career Award for Scientists and Engineers (PECASE, April 2014). The Presidential Award is the highest honor bestowed by the U.S. government on outstanding scientists and engineers beginning their independent careers.
Together with Dr. Tsien, Dr. Nguyen co-invented a tumor marker for fluorescence-enabled real-time detection of tumor margins and a nerve marker for fluorescence-enabled real-time illumination of nerves. The tumor marker was licensed by Avelas Biosciences, Inc. and is currently in late-stage clinical testing for patients with breast cancer undergoing surgery (NCT03113825). Dr. Nguyen founded Alume Biosciences, Inc. (Alume) in 2017 to enable the clinical translation of the nerve marker. Alume has received an allowance from the United States Food and Drug Administration (US FDA) to proceed with clinical trial testing in patients undergoing head and neck surgery (NCT04420689) at the University of California, San Diego, Stanford, and Harvard. Studies are expected to initiate in late Q2 2020.
Hod Finkelstein
(Sense Photonics)
Next-generation technologies to enable high-performance, low-cost lidar

Date: July 2, 2020
Description:
Lidar systems are used in diverse applications, including autonomous driving and industrial automation. Despite the challenging requirements for these systems, most lidars on the market utilize legacy technologies such as scanning mechanisms, long-coherence-length or edge-emitting lasers, and avalanche photodiodes, and consequently offer limited performance and robustness at a relatively high price point. Other systems propose to utilize esoteric technologies which face an uphill struggle towards manufacturability. This talk will describe two complementary technologies, leveraging years of progress in adjacent industries. When combined with novel signal processing algorithms, these deliver a high-resolution, long-range, low-cost solid-state flash lidar system, breaking the performance envelope and resulting in a camera-like system. We will discuss design trade-offs, the performance roadmap into the future, and remaining challenges.
Further Information:
Hod Finkelstein is CTO at Sense Photonics. He started his career at Intel, designing the early Pentium chips and was in charge of semiconductor technologies at Mellanox Technologies. During his PhD, he invented the first generic-CMOS Single-Photon-Avalanche-Diodes (SPADs) and first demonstrated arrays of these devices. Hod was Director of Technology Development at Illumina, the $45B DNA sequencing company, where he managed the development of on-chip CMOS DNA sequencers. He was later CTO at TruTag Technologies, where he conceptualized and developed an award-winning handheld and battery-operated hyperspectral camera. Hod defines the technology direction and leads Sense Photonics’ next-generation lidar development team.
Mike Wiemer and Brian Lemoff
(Mojo Vision)
Mojo Lens, the First True Smart Contact Lens

Date: May 27, 2020
Description:
After working in stealth for over 4 years, Mojo Vision recently unveiled the first working prototype of Mojo Lens, a smart contact lens designed to deliver augmented reality content wherever you look. This talk will provide an overview of the company, its vision of “Invisible Computing”, the science behind the world’s first contact lens display, and a first-hand account of what it’s like to actually wear Mojo Lens.
Further Information:
Mike Wiemer is a serial entrepreneur and proven science and technology leader in complex systems development and integration. Before co-founding Mojo Vision as CTO, Wiemer co-founded and served as president of Solar Junction, a high-efficiency solar cell company (acquired), where he and his team set two world records for the highest-efficiency solar cells ever made. After Solar Junction, Wiemer joined New Enterprise Associates (NEA) as an Entrepreneur in Residence, where he sourced new investments and helped portfolio companies develop their business and funding strategies. He is a board director at Stratio Corporation and an advisor at Stanford’s StartX Accelerator. He holds a B.S., M.S., and Ph.D. in Electrical Engineering from Stanford University.
Brian Lemoff is a Fellow at Mojo Vision where he leads the optics team and lends his expertise to efforts across the company. One of Mojo’s first employees, Lemoff invented several key technologies that make Mojo Lens possible. Before Mojo, Lemoff held leadership positions at HP/Agilent Labs and at the West Virginia High Tech Foundation. He received his BS/MS in physics from Caltech in 1989 and his PhD in physics from Stanford in 1994. Lemoff received the 1995 American Physical Society award for Outstanding Doctoral Research in Atomic, Molecular, and Optical Physics. He has 40 issued U.S. patents relating to optical communications and smart contact lenses, with many more patents pending.
Lei Xiao
(Facebook)
Learned Image Synthesis for Computational Displays

Please LOG IN to view the video.
Date: May 20, 2020
Description:
Addressing vergence-accommodation conflict in head-mounted displays (HMDs) requires resolving two interrelated problems. First, the hardware must support viewing sharp imagery over the full accommodation range of the user. Second, HMDs should accurately reproduce retinal defocus blur to correctly drive accommodation. A multitude of accommodation-supporting HMDs have been proposed, with three architectures receiving particular attention: varifocal, multifocal, and light field displays. These designs all extend depth of focus, but rely on computationally expensive rendering and optimization algorithms to reproduce accurate defocus blur (often limiting content complexity and interactive applications). To date, no unified framework has been proposed to support driving these emerging HMDs using commodity content. In this talk, we will present DeepFocus, a generic, end-to-end convolutional neural network designed to efficiently solve the full range of computational tasks for accommodation-supporting HMDs. This network is demonstrated to accurately synthesize defocus blur, focal stacks, multilayer decompositions, and multiview imagery using only commonly available RGB-D images, enabling real-time, near-correct depictions of retinal blur with a broad set of accommodation-supporting HMDs.
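To make the interface concrete, here is a toy sketch (assuming PyTorch) of a small convolutional model that maps an RGB-D image plus a target focus distance to a defocused image. It is not the DeepFocus architecture, only an illustration of the kind of input/output mapping the talk describes:

    import torch
    import torch.nn as nn

    class ToyDefocusNet(nn.Module):
        def __init__(self, width=32):
            super().__init__()
            # Input channels: R, G, B, depth, and a constant plane encoding focus distance.
            self.net = nn.Sequential(
                nn.Conv2d(5, width, 3, padding=1), nn.ReLU(),
                nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
                nn.Conv2d(width, 3, 3, padding=1),
            )

        def forward(self, rgb, depth, focus_dist):
            b, _, h, w = rgb.shape
            focus_plane = torch.full((b, 1, h, w), focus_dist, device=rgb.device)
            x = torch.cat([rgb, depth, focus_plane], dim=1)
            return self.net(x)  # synthesized retinal-blur image

    model = ToyDefocusNet()
    rgb = torch.rand(1, 3, 128, 128)
    depth = torch.rand(1, 1, 128, 128)
    out = model(rgb, depth, focus_dist=0.5)
    print(out.shape)  # torch.Size([1, 3, 128, 128])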
Further Information:
Lei Xiao is a Research Scientist at Facebook Reality Labs. He obtained his PhD from the University of British Columbia under the supervision of Wolfgang Heidrich. His research interests include computational photography and image synthesis for virtual and mixed reality.
Adam Rowell
(Lucid)
Practical 2D to 3D Image Conversion

Please LOG IN to view the video.
Date: May 13, 2020
Description:
We will discuss techniques for converting 2D images to 3D meshes on mobile devices. This includes methods to efficiently compute both dense and sparse depth maps, converting depth maps into 3D meshes, mesh inpainting, and post-processing. We focus on different CNN designs to solve each step in the processing pipeline and examine common failure modes. Finally, we will look at practical deployment of image processing algorithms and CNNs on mobile apps, and how Lucid uses cloud processing to balance processing power with latency.
Further Information:
Adam is the co-founder and CTO of Lucid. His research in computer vision and signal processing as a PhD student at Stanford powers Lucid’s 3D Fusion Technology, the world’s first AI-based 3D and depth capture technology mimicking human vision in dual- and multi-camera devices; it is deployed in Lucid’s own product, the LucidCam, as well as in mobile phones and robots, with autonomous cars as a target application. Earlier, he worked for many years at Exponent as a consultant focused on machine learning and computer vision development across consumer, business, and military applications, coding and leading engineering teams to build advanced GPU/NPU-based systems. He later joined Maxim Integrated’s Advanced Analytics team, working to optimize the organization of the 10,000-employee public company from the ground up. Adam defines the technology direction and leads Lucid’s engineering team.
Qi Guo
(Harvard)
Bio-inspired depth sensing using computational optics

Please LOG IN to view the video.
Date: May 6, 2020
Description:
Jumping spiders rely on accurate depth perception for predation and navigation. They accomplish depth perception, despite their tiny brains, by using specialized optics. Each principal eye includes a multitiered retina that simultaneously receives multiple images with different amounts of defocus, and distance is decoded from these images with seemingly little computation. In this talk, I will introduce two depth sensors that are inspired by jumping spiders. They use computational optics and build upon previous depth-from-defocus algorithms in computer vision. Both sensors operate without active illumination, and they are both monocular and computationally efficient.
The first sensor synchronizes an oscillating deformable lens with a photosensor. It produces depth and confidence maps at more than 100 frames per second and has the advantage of being able to extend its working range through optical accommodation. The second sensor uses a custom-designed metalens, which is an ultra-thin device with 2D nano-structures that modulate traversing light. The metalens splits incoming light and simultaneously forms two differently-defocused images on a planar photosensor, allowing the efficient computation of depth and confidence from a single snapshot in time.
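A hedged sketch of a generic depth-from-differential-defocus estimator, in the spirit of these sensors but not their exact algorithm: depth is read off the per-pixel ratio of the difference of the two defocused images to the Laplacian of their mean, and confidence follows the local texture. The affine constants A and B are hypothetical placeholders that a real sensor would obtain from calibration.

    import numpy as np

    def laplacian(img):
        """Discrete 5-point Laplacian with replicated borders."""
        p = np.pad(img, 1, mode="edge")
        return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

    def depth_and_confidence(i_plus, i_minus, A=1.0, B=0.0, eps=1e-6):
        """Depth map from two differently defocused images of the same scene.

        The ratio (I+ - I-) / laplacian((I+ + I-)/2) varies monotonically with
        depth; A and B map that ratio to metric depth after calibration.
        Confidence is high where the Laplacian (local texture) is strong.
        """
        mean_img = 0.5 * (i_plus + i_minus)
        lap = laplacian(mean_img)
        lap_safe = np.where(np.abs(lap) < eps, eps, lap)
        depth = A * (i_plus - i_minus) / lap_safe + B
        confidence = np.abs(lap)
        return depth, confidence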
Further Information:
Qi Guo is a PhD student at Harvard University advised by Todd Zickler. He combines optics and computer vision algorithms to create computational sensors. He received his bachelor’s degree in automation from Tsinghua University and has interned at Facebook, Nvidia, and Baidu. He received a Best Student Paper award as a co-author at ECCV 2016 and the Best Demo Award at ICCP 2018.
Anders Grunnet-Jepsen
(Intel)
Insight into the inner workings of Intel’s Stereo and Lidar Depth Cameras

Please LOG IN to view the video.
Date: April 29, 2020
Description:
This talk will provide an overview of the technology and capabilities of Intel’s RealSense Stereo and Lidar Depth Cameras, and will then progress to describe new features, such as high-speed capture, multi-camera enhancements, optical filtering, and near-range high-resolution depth imaging. Finally, we will introduce a new fast on-chip calibration method that can be used to improve the performance of a stereo camera and help mitigate some common stereo artifacts.
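A minimal triangulation sketch showing why calibration matters for stereo depth: depth follows Z = f·B/d, so a small disparity error produces a large depth error at long range. The numbers are illustrative assumptions, not RealSense specifications.

    def stereo_depth_m(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
        return focal_length_px * baseline_m / disparity_px

    f_px, baseline = 640.0, 0.05                  # hypothetical focal length (pixels) and 5 cm baseline
    print(stereo_depth_m(f_px, baseline, 16.0))   # 2.0 m
    print(stereo_depth_m(f_px, baseline, 15.0))   # ~2.13 m: a 1 px error shifts depth by ~13 cm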
Further Information:
Anders Grunnet-Jepsen is the CTO of the Intel RealSense Group, where he works on stereo- and lidar-based depth cameras. He was founder of ThinkOptics, which licensed its optical tracking technology to Nintendo for the Wii. He was acting CTO of an optical communications start-up, Templex Technology, when it was acquired by Intel in 2001. Anders received his Ph.D. from Oxford University and a master’s degree in Electrical Engineering from the Technical University of Denmark, and has worked at the University of California San Diego; Thomson CSF, France; and NKT Research Center, Denmark. He has a passion for all things optical.
Awais Ahmed
(Pixxel)
The Extreme Science of Building High-Performing Compact Cameras for Space Applications

Please LOG IN to view the video.
Date: April 22, 2020
Description:
A thickening flock of earth-observing satellites blankets the planet. Over 700 were launched during the past 10 years, and more than 2,200 additional ones are scheduled to go up within the next 10 years. At the same time, year on year, satellite platforms and instruments are being miniaturized to improve cost-efficiency while maintaining the same expectations of high spatial and spectral resolution. But what does it take to build imaging systems that are high-performing in the harsh environment of space yet cost-efficient and compact at the same time? This talk will touch upon the technical issues associated with the design, fabrication, and characterization of such extremely high-performing yet compact and cost-efficient space cameras, using the example of the imager that Pixxel has built as part of its planned earth-imaging satellite constellation.
Further Information:
Awais Ahmed is the founder and CEO at Pixxel, an Indian space-technology startup that builds and operates a constellation of small-satellites to collect, monitor and analyze data through satellite imagery. Awais founded Pixxel in his final year at his university BITS Pilani, where he pursued his Master of Science degree in Mathematics, with the vision of establishing a global space company working on bringing the benefits of space down to earth. Pixxel’s first satellite is manufactured and booked to launch in August 2020 and will be the first private commercial satellite launched from India. Awais was also one of the founding members of Hyperloop India, the only Indian and one of the 24 global finalist teams in the SpaceX Hyperloop Pod Competition where they built India’s first ever hyperloop pod and presented it to Elon Musk.
Florian Willomitzer
(Computational Photography Lab at Northwestern University, IL)
The Role of Fundamental Limits in 3D Imaging Systems: From Looking around Corners to Fast 3D Cameras

Please LOG IN to view the video.
Date: April 15, 2020
Description:
The knowledge about limits is a precious commodity in computational imaging: By knowing that our imaging device already operates at the physical limit (e.g. of resolution), we can avoid unnecessary investments in better hardware, such as faster detectors, better optics or cameras with higher pixel resolution. Moreover, limits often appear as uncertainty products, making it possible to bargain with nature for a better measurement by sacrificing less important information.
In this talk, the role of physical and information limits in computational imaging will be discussed using examples from two of my recent projects: ‘Synthetic Wavelength Holography’ and the ‘Single-Shot 3D Movie Camera’.
Synthetic Wavelength Holography is a novel method to image hidden objects around corners and through scattering media. While other approaches rely on time-of-flight detectors, which suffer from technical limitations in spatial and temporal resolution, Synthetic Wavelength Holography works at the physical limit of the space-bandwidth product. Full field measurements of hidden objects around corners or through scatterers reaching sub-mm resolution will be presented.
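The synthetic wavelength formed by two closely spaced optical wavelengths is Λ = λ₁λ₂/|λ₁ − λ₂|, which sets the scale at which the hologram is effectively recorded. A minimal sketch with illustrative wavelength values (not necessarily those used in the speaker’s experiments):

    def synthetic_wavelength(lam1_m: float, lam2_m: float) -> float:
        return lam1_m * lam2_m / abs(lam1_m - lam2_m)

    lam1, lam2 = 854.0e-9, 854.4e-9   # two closely spaced near-infrared wavelengths
    print(synthetic_wavelength(lam1, lam2))  # ~1.8e-3 m: interferometry at the millimeter scale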
The single-shot 3D movie camera is a highly precise 3D sensor for the measurement of fast macroscopic live scenes. From each 1 Mpix camera frame, the sensor delivers 300,000 independent 3D points with high resolution. The single-shot ability allows for a continuous 3D measurement of fast moving or deforming objects, resulting in a continuous 3D movie. Like a hologram, each movie-frame encompasses the full 3D information about the object surface, and the observation perspective can be varied while watching the 3D movie.
Further Information:
Florian Willomitzer works as a Research Assistant Professor in the Computational Photography Lab at Northwestern University, IL. Florian graduated from the University of Erlangen-Nuremberg, Germany, where he received his Ph.D. degree with honors (‘summa cum laude’) in 2017. During his doctoral studies he investigated physical and information-theoretical limits of optical 3D sensing and implemented sensors that operate close to these limits. Concurrent with his activity at the Erlangen University, Florian was a freelancer in the research group’s spin-off company ‘3DShape GmbH’ and worked as a part-time high school physics teacher.
At Northwestern University, Florian develops novel methods to image hidden objects through scattering media or around corners. Moreover, his research is focused on high-resolution holographic displays, the implementation of high-precision metrology methods in low-cost mobile handheld devices, and novel techniques to overcome traditional resolution limitations and dynamic range restrictions in 3D and 2D imaging.
Reinhard Heckel
(Technical University of Munich)
Image recovery with untrained convolutional neural networks

Please LOG IN to view the video.
Date: April 8, 2020
Description:
Convolutional Neural Networks are highly successful tools for image recovery and restoration. A major contributing factor to this success is that convolutional networks impose strong prior assumptions about natural images—so strong that they enable image recovery without any training data. A surprising observation that highlights those prior assumptions is that one can remove noise from a corrupted natural image by simply fitting (via gradient descent) a randomly initialized, over-parameterized convolutional generator to the noisy image.
In this talk, we discuss a simple un-trained convolutional network, called the deep decoder, that provably enables image denoising and regularization of inverse problems such as compressive sensing with excellent performance. We formally characterize the dynamics of fitting this convolutional network to a noisy signal and to an under-sampled signal, and show that in both cases early-stopped gradient descent provably recovers the clean signal. Finally, we discuss our own numerical results and numerical results from another group demonstrating that un-trained convolutional networks enable magnetic resonance imaging from highly under-sampled measurements.
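A minimal sketch (assuming PyTorch) of image recovery with an untrained convolutional generator, in the spirit of a simplified deep-decoder-style network rather than the authors’ exact architecture: a randomly initialized network is fit to the noisy image, and early stopping acts as the regularizer.

    import torch
    import torch.nn as nn

    def deep_decoder(channels=64, out_channels=3, layers=4):
        blocks = []
        for _ in range(layers):
            blocks += [nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                       nn.Conv2d(channels, channels, 1), nn.ReLU(),
                       nn.BatchNorm2d(channels)]
        blocks += [nn.Conv2d(channels, out_channels, 1), nn.Sigmoid()]
        return nn.Sequential(*blocks)

    def denoise(noisy, steps=1500):
        """noisy: (1, 3, H, W) tensor with H and W divisible by 16."""
        h, w = noisy.shape[-2] // 16, noisy.shape[-1] // 16
        net = deep_decoder()
        z = torch.randn(1, 64, h, w)              # fixed random input code
        opt = torch.optim.Adam(net.parameters(), lr=1e-2)
        for _ in range(steps):                    # stopping early regularizes the fit
            opt.zero_grad()
            loss = ((net(z) - noisy) ** 2).mean()
            loss.backward()
            opt.step()
        return net(z).detach()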
Further Information:
Reinhard Heckel is a Rudolf Moessbauer assistant professor in the Department of Electrical and Computer Engineering (ECE) at the Technical University of Munich, and an adjunct assistant professor at Rice University, where he was an assistant professor in the ECE department from 2017-2019. Before that, he spent one and a half years as a postdoctoral researcher in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley, and a year in the Cognitive Computing & Computational Sciences Department at IBM Research Zurich. He completed his PhD in electrical engineering in 2014 at ETH Zurich and was a visiting PhD student at the Statistics Department at Stanford University. Reinhard is working in the intersection of machine learning and signal/information processing with a current focus on deep networks for solving inverse problems, learning from few and noisy samples, and DNA data storage.
Albert Parra Pozo
(Facebook)
An Integrated 6DoF Video Camera and System Design

Please LOG IN to view the video.
Date: March 4, 2020
Description:
Designing a fully integrated 360° video camera supporting 6DoF head motion parallax requires overcoming many technical hurdles, including camera placement, optical design, sensor resolution, system calibration, real-time video capture, depth reconstruction, and real-time novel view synthesis. While there is a large body of work describing various system components, such as multi-view depth estimation, our paper is the first to describe a complete, reproducible system that considers the challenges arising when designing, building, and deploying a full end-to-end 6DoF video camera and playback environment. Our system includes a computational imaging software pipeline supporting online markerless calibration, high-quality reconstruction, and real-time streaming and rendering. Most of our exposition is based on a professional 16-camera configuration, which will be commercially available to film producers. However, our software pipeline is generic and can handle a variety of camera geometries and configurations. The entire calibration and reconstruction software pipeline along with example datasets is open sourced to encourage follow-up research in high-quality 6DoF video reconstruction and rendering.
Further Information:
Albert Parra Pozo is the Tech Lead of the AR/VR Capture team that is responsible for developing next-generation virtual and augmented reality technologies. Before joining Facebook in 2014, Albert received a PhD in Electrical and Computer Engineering from Purdue University, and a BS in Telecommunications Engineering from the Polytechnic University of Catalonia. His research interests include virtual reality, augmented reality, computational imaging and image processing.
Ben Mildenhall
(UC Berkeley)
Deep Learning for Practical and Robust View Synthesis

Please LOG IN to view the video.
Date: February 26, 2020
Description:
I will present recent work (“Local Light Field Fusion”) on a practical and robust deep learning solution for capturing and rendering novel views of complex real world scenes for virtual exploration. Our view synthesis algorithm operates on an irregular grid of sampled views, first expanding each sampled view into a local light field via a multiplane image (MPI) scene representation, then rendering novel views by blending adjacent local light fields. We extend traditional plenoptic sampling theory to derive a bound that specifies precisely how densely users should sample views of a given scene when using our algorithm. In practice, we can apply this bound to capture and render views of real world scenes that achieve the perceptual quality of Nyquist rate view sampling while using up to 4000x fewer views.
Further Information:
Ben Mildenhall is a PhD student at UC Berkeley. He is advised by Professor Ren Ng and supported by a Hertz Foundation Fellowship. He received his bachelor’s degree in CS and math from Stanford University and has worked at Pixar, Google, and Fyusion in the past. His current research focuses on applying deep learning to 3D reconstruction, view synthesis, and other inverse graphics problems.
Alberto Stochino
(Perceptive Inc.)
A fundamentally new sensing approach for high-level autonomous driving

Please LOG IN to view the video.
Date: February 19, 2020
Description:
It has been fifteen years since Stanford won the DARPA Grand Challenge. Since then, a new race is underway in the automotive industry to fulfill the ultimate mobility dream: driverless vehicles for all. Trying to elevate test track vehicles to automotive grade products has been a humbling experience for everyone. The mainstream approach has relied primarily on brute force for both hardware and software. Sensors in particular have become overly complex and unscalable in response to escalating hardware requirements. To reverse this trend, Perceptive has been developing a fundamentally new sensing platform for fully autonomous vehicles. Based on Digital Remote Imaging and Edge AI, the platform shifts the complexity to the software and scales with the compute. This talk will discuss the system architecture and underlying physics.
Further Information:
Alberto Stochino is the founder and CEO of Perceptive, Inc. He received his PhD from the University of Pisa, Italy, for his work at Caltech and MIT on the LIGO gravitational wave detectors. After graduate school, his research focused on high precision metrology for space applications at the Australian National University (NASA’s GRACE Follow-on gravimetry mission) and at Stanford’s HEPL laboratory (molecular clocks for NASA) with Prof. Robert Byer and Prof. John Lipa. He later joined Apple to work on autonomous technology until he founded Perceptive in 2018.
Jesse Adams
(FlatCam)
Beyond lenses: Computational imaging with a light-modulating mask

Please LOG IN to view the video.
Date: February 13, 2020
Description:
The lens has long been a central element of cameras, its role to refract light to achieve a one-to-one mapping between a point in the scene and a point on the sensor. We propose a radical departure from this practice and the limitations it imposes. In this talk I will discuss our recent efforts to build extremely thin imaging devices by replacing the lens in a conventional camera with a light-modulating mask and computational reconstruction algorithms. These lensless cameras can be less than a millimeter in thickness and enable applications where size, weight, thickness, or cost are the driving factors.
Further Information:
Jesse Adams is the cofounder and CEO of FlatCam. He received his Ph.D. in Applied Physics from Rice University and B.S. from the University of North Florida. He moved to the Bay Area after he was awarded the Cyclotron Road Fellowship, hosted at Lawrence Berkeley National Laboratory, to explore the commercial potential of lens-free imaging.
Tamay Aykut
(Stanford University)
Towards Immersive Telepresence: Stereoscopic 360-degree Vision in Realtime

Please LOG IN to view the video.
Date: February 5, 2020
Description:
Progress in immersive telepresence is greatly impeded by the challenge of mediating a realistic feeling of presence in a remote environment to a local human user. Providing a stereoscopic 360° visual representation of the distant scene further fosters the level of realism and greatly improves task performance. State-of-the-art approaches primarily rely on catadioptric or multi-camera systems to address this issue. Current solutions are bulky, not real-time capable, and tend to produce erroneous image content due to the stitching processes involved, which perform poorly for texture-less scenes. In this talk, I will introduce a vision-on-demand approach that creates stereoscopic scene information upon request. A real-time capable camera system along with a novel deep-learning-based delay-compensation paradigm will be presented that provides instant visual feedback for highly immersive telepresence.
Nick Antipa
(UC Berkeley)
DiffuserCam: Lensless single-exposure 3D imaging

Please LOG IN to view the video.
Date: January 22, 2019
Description:
Traditional lenses are optimized for 2D imaging, which prevents them from capturing extra dimensions of the incident light field (e.g. depth or high-speed dynamics) without multiple exposures or moving parts. Leveraging ideas from compressed sensing, I replace the lens of a traditional camera with a single pseudorandom free-form optic called a diffuser. The diffuser creates a pseudorandom point spread function which multiplexes these extra dimensions into a single 2D exposure taken with a standard sensor. The image is then recovered by solving a sparsity-constrained inverse problem. This lensless camera, dubbed DiffuserCam, is capable of snapshot 3D imaging at video rates, encoding a high-speed video (>4,500 fps) into a single rolling-shutter exposure, and video-rate 3D imaging of fluorescence signals, such as neurons, in a device weighing under 3 grams.
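A hedged 2D sketch of the sparsity-constrained recovery behind lensless imaging, using ISTA with an FFT-based convolution forward model; this is an illustration of the reconstruction idea, not the full 3D DiffuserCam pipeline. The PSF and measurement are assumed to be same-sized 2D float arrays.

    import numpy as np

    def ista_deconvolve(measurement, psf, lam=0.01, step=None, iters=200):
        H = np.fft.rfft2(np.fft.ifftshift(psf))
        Hconj = np.conj(H)
        if step is None:
            step = 1.0 / np.max(np.abs(H)) ** 2      # <= 1 / Lipschitz constant of the gradient
        A = lambda x: np.fft.irfft2(H * np.fft.rfft2(x), s=measurement.shape)
        AT = lambda y: np.fft.irfft2(Hconj * np.fft.rfft2(y), s=measurement.shape)
        x = np.zeros_like(measurement)
        for _ in range(iters):
            grad = AT(A(x) - measurement)            # gradient of the data-fidelity term
            x = x - step * grad
            x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)  # soft threshold (L1 prox)
        return x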
Jonghyun Kim
(NVIDIA)
Matching Visual Acuity and Prescription: Towards AR for Humans

Please LOG IN to view the video.
Date: January 15, 2019
Description:
In this talk, Dr. Jonghyun Kim will present two recent AR display prototypes inspired by the human visual system. The first, Foveated AR, dynamically provides a high-resolution virtual image to the user’s foveal region based on the tracked gaze. The second, Prescription AR, is a prescription-embedded, fully customized AR display system that works as the user’s eyeglasses and AR display at the same time. Finally, he will discuss important issues for socially acceptable AR display systems, including customization, privacy, fashion, and eye-contact interaction, and how they relate to the underlying display technologies.
Further Information:
Dr. Jonghyun Kim received his B.S. degree from the School of Electrical Engineering at Seoul National University in 2011 and his Ph.D. degree from the Department of Electrical Engineering and Computer Science at Seoul National University in 2017. He is currently a Senior Research Scientist at NVIDIA and also a Visiting Researcher at Stanford. His expertise includes AR/VR displays, light field displays, and display optics. His Foveated and Prescription AR demo received the ‘best in show’ award in Emerging Technology at SIGGRAPH 2019.
Oliver Woodford
(Snap)
Towards immersive AR experiences in monocular video

Please LOG IN to view the video.
Date: January 8, 2019
Description:
AR on handheld, monocular, “through-the-camera” platforms such as mobile phones is a challenging task. While traditional, geometry-based approaches provide useful data in certain scenarios, truly immersive experiences require leveraging the prior knowledge encapsulated in learned CNNs. In this talk I will discuss the capabilities and limitations of such traditional methods, the need for CNN-based solutions, and the challenges of training accurate and efficient CNNs for this task. I will describe our recent work on implicit 3D representations for AR, with applications in novel view synthesis, scene reconstruction and arbitrary object manipulation. Finally, I will present a project opportunity: learning such representations from a dataset of single images.
Further Information:
Oliver Woodford is a Lead Research Scientist in the Creative Vision Group at Snap Research in Los Angeles. His research focuses on statistical models and optimization, with applications in geometry and low-level vision. He obtained a DPhil from Oxford University in 2009, supervised by Andrew Fitzgibbon, Phil Torr and Ian Reid, for work on novel view synthesis and stereo, receiving a Best Paper Award at CVPR 2008 for the latter. Prior to joining Snap, he worked at Toshiba Research, where he received a prestigious Toshiba Research & Development Achievement Award in 2012, and mobile app startup Seene, winner of several tech awards including The Tech Expo Champion 2015.
- Michael Broxton » Light Field Video
- Sixian You » Label-free optical imaging
- David Williams » Imaging retinal ganglion cells
- Professor Eric Fossum (Dartmouth), Prof. Stanley Chan (Purdue) and Dr. Jiaju Ma (Gigajot Technology) » Quanta Image Sensors
- Bill Freeman » Moon Camera
- Mohit Gupta » Computational Imaging
- Achin Bhowmik » Hearing aids as multisensory devices
- Lars Omlor » Fundus imaging
- Daniele Faccio » Time of flight imaging
- Jennifer Barton » Miniature Optical Endoscopes
- Quyen Nguyen » Fluorescence Imaging Tumors and Nerves
- Hod Finkelstein » Low-cost Lidar
- Mike Wiemer and Brian Lemoff » Mojo Lens
- Lei Xiao » DeepFocus
- Adam Rowell » 2D to 3D Image Conversion on Mobile Devices
- Qi Guo » depth sensing using computational optics
- Anders Grunnet-Jepsen » Intel’s Stereo and Lidar Depth Cameras
- Awais Ahmed » Compact Cameras for Space Applications
- Florian Willomitzer » The Role of Fundamental Limits in 3D Imaging Systems
- Reinhard Heckel » Image recovery with untrained CNNs
- Albert Parra Pozo » An Integrated 6DoF Video Camera and System Design
- Ben Mildenhall » Deep Learning for Practical and Robust View Synthesis
- Alberto Stochino » A fundamentally new sensing approach for high-level autonomous driving
- Jesse Adams » Computational imaging with a light-modulating mask
- Tamay Aykut » Towards Immersive Telepresence
- Nick Antipa » DiffuserCam: Lensless single-exposure 3D imaging
- Jonghyun Kim » Matching Visual Acuity and Prescription
- Oliver Woodford » Towards immersive AR experiences in monocular video
SCIEN Colloquia 2019
Jon Barron
(Google)
How to Learn a Camera

Please LOG IN to view the video.
Date: December 4, 2019
Description:
Traditionally, the image processing pipelines of consumer cameras have been carefully designed, hand-engineered systems. But treating an imaging pipeline as something to be learned instead of something to be engineered has the potential benefits of being faster, more accurate, and easier to tune. Relying on learning in this fashion presents a number of challenges, such as fidelity, fairness, and data collection, which can be addressed through careful consideration of neural network architectures as they relate to the physics of image formation. In this talk I’ll be presenting recent work from Google’s computational photography research team on using machine learning to replace traditional building blocks of a camera pipeline. I will present learning based solutions for the classic tasks of denoising, white balance, and tone mapping, each of which uses a bespoke ML architecture that is designed around the specific constraints and demands of each task. By designing learning-based solutions around the structure provided by optics and camera hardware, we are able to produce state-of-the-art solutions to these three tasks in terms of both accuracy and speed.
Brian Wandell
(Stanford)
Simulation Technologies for Image Systems Engineering

Date: November 20, 2019
Description:
The use of imaging systems has grown enormously over the last several decades; these systems are an essential component in mobile communication, medicine, and automotive applications. As imaging applications have expanded, the complexity of imaging systems hardware – from optics to electronics – has increased dramatically. The increased complexity makes software prototyping an essential tool for the design of novel systems and the evaluation of components. I will describe several simulations we created for image systems engineering applications: (a) designing cameras for autonomous vehicles [1], (b) simulating image encoding by the human eye and retina for image quality assessments [2], and (c) assessing the spatial sensitivity of CNNs for multiple applications [3]. This is a good moment to consider how academia and industry might cooperate to create an image systems simulation infrastructure that speeds the development of new systems for the many opportunities that will arise over the next few decades.
Further Information:
Brian A. Wandell is the first Isaac and Madeline Stein Family Professor at Stanford University. He joined the Psychology faculty in 1979 and is a member, by courtesy, of Electrical Engineering, and Ophthalmology. He is Director of the Center for Cognitive and Neurobiological Imaging, and Deputy Director of the Wu Tsai Neuroscience Institute. Wandell’s research uses magnetic resonance imaging and software simulations for basic and applied research spanning human visual perception, brain development, and image systems simulations.
Ravi Ramamoorthi
(UCSD)
Light Fields: From Shape Recovery to Sparse Reconstruction

Please LOG IN to view the video.
Date: November 6, 2019
Description:
The availability of academic and commercial light field camera systems has spurred significant research into the use of light fields and multi-view imagery in computer vision and computer graphics. In this talk, we discuss our results over the past few years, focusing on a few themes. First, we describe our work on a unified formulation of shape from light field cameras, combining cues such as defocus, correspondence, and shading. Then, we go beyond photoconsistency, addressing non-Lambertian objects, occlusions, and an SVBRDF-invariant shape recovery algorithm. Finally, we show that advances in machine learning can be used to interpolate light fields from very sparse angular samples, in the limit a single 2D image, and create light field videos from sparse temporal samples. We also discuss recent work on combining machine learning with plenoptic sampling theory to create virtual explorations of real scenes from a very sparse set of input images captured on a handheld mobile phone.
Jason Mudge
(Golden Gate Light Optimization)
A proposed range compensating lens for non-imaging active optical systems

Please LOG IN to view the video.
Date: October 30, 2019
Description:
In many active optical systems where the light is supplied, such as a range finder or optical radar (LiDAR), the detector (pixel) can be destroyed, or at the very least blinded for a period of time, if a strong signal is returned. This is because the designer struggles with the “one over range squared” loss, which can amount to significant attenuation in the return signal given the range requirements. When pushing the range limit requirement, the sensor needs a large-dynamic-range detector and/or some form of detector protection when the target is quite close. This work proposes a lens that compensates for range signal loss passively and instantaneously by combining lens elements in parallel rather than in series, as is typically done [Mudge, Appl. Opt. 58, (2019)]. The proposed lens is relatively simple and compensates for range, albeit not perfectly. Additionally, a discussion is provided on implementing this approach, along with a variety of examples of a range compensating lens [Phenis et al., Proc. of SPIE, 11125, (2019)]. These examples cover design techniques and some of the penalties incurred.
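A quick illustration of the “one over range squared” problem the lens addresses: with a 100 m range requirement, the return from a 0.5 m target is about 40,000x stronger, which is the dynamic range an uncompensated detector must absorb. The ranges below are illustrative only.

    def relative_return(range_m: float, reference_range_m: float = 100.0) -> float:
        return (reference_range_m / range_m) ** 2

    for r in (0.5, 1.0, 10.0, 100.0):
        print(f"{r:6.1f} m -> {relative_return(r):10.1f}x the 100 m return")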
Laura Marcu
(UC Davis)
Fluorescence Lifetime Techniques in Clinical Interventions

Please LOG IN to view the video.
Date: October 23, 2019
Description:
This presentation overviews fluorescence lifetime spectroscopy and imaging techniques for label-free in vivo characterization of biological tissues. Emphasis is placed on recently developed devices and methods enabling real-time characterization and diagnosis of diseased tissues during clinical interventions. I will present studies conducted in animal models and human patients demonstrating the ability of these techniques to provide rapid in-situ evaluation of tissue biochemistry and their potential to guide surgical and intravascular procedures. Current results demonstrate that intrinsic fluorescence can provide useful contrast for the diagnosis of vulnerable atherosclerotic plaques, intraoperative delineation of brain tumors and head and neck tumors. Finally, I will present results from the first-in-human study that shows the potential of a multispectral fluorescence lifetime method for image-guided augmented reality in trans-oral robotic surgery (TORS).
Further Information:
Laura Marcu is a Professor of Biomedical Engineering and Neurological Surgery at University of California at Davis. She received her doctorate degree in biomedical engineering from the University of Southern California in 1998. Her research interest is in the area of biomedical optics, with a particular focus on research for the development of optical techniques for tissue diagnostics including applications in oncology and cardiology.
Lars Omlor
(Zeiss)
Optical 3D scanning in an X-ray microscope

Please LOG IN to view the video.
Date: October 16, 2019
Description:
The Zeiss Group develops, produces and distributes measuring technology, microscopes, medical technology, eyeglass lenses, camera and cinema lenses, binoculars and semiconductor manufacturing equipment. In this talk, I will present a novel webcam-based optical 3D scanning method that allows independent surface mesh generation inside X-ray microscopes. These surface models can be used for collision avoidance and improved ease of use.
Further Information:
Lars Omlor is a staff research scientist for the central research department of Zeiss. He majored in Mathematics at the University of Heidelberg and received his PhD (2009) in computer science from Ulm University. He has been working at Zeiss since 2010 and moved to Pleasanton (CA) in 2018. Most of his research is in the fields of computational imaging, image processing and machine learning.
Chang Yuan
(Foresight AI)
Training for autonomous vehicles and mobile robots

Please LOG IN to view the video.
Date: October 9, 2019
Description:
Autonomous mobile robots (e.g., self-driving cars, delivery trucks) are emerging and reshaping the world. However, there is one critical problem blocking the entire industry: how can mobile robots drive safely and naturally in the complex 3D world, and how do we know? In this talk, I will discuss our solution, called “training academy”, to the aforementioned problem. We apply 3D vision, deep learning, and reinforcement learning techniques to generate real-world, high-fidelity driving scenarios and train the autonomous systems to develop human-like intelligence in a simulated environment.
Further Information:
Chang Yuan is the CEO & co-founder of Foresight AI (https://foresight.ai), an AI and robotics technology company in Silicon Valley. Chang received his Ph.D. and M.S. from the University of Southern California, and B.Eng. from Tsinghua University, all in Computer Science. His expertise areas include computer vision, machine learning, and robotics. He has been building cutting-edge technologies and products, including Apple autonomous systems and FaceID, and Amazon Go. He has authored more than 15 publications and 30 patents.
Giljoo Nam
Inverse Rendering for Realistic Computer Graphics

Please LOG IN to view the video.
Date: October 2, 2019
Description:
Rendering refers to a process of creating digital images of an object or a scene from 3D data using computers and algorithms. Inverse rendering is the inverse process of rendering, i.e., reconstructing 3D data from 2D images. The 3D data to be recovered can be 3D geometry, reflectance of a surface, camera viewpoints, or lighting conditions. In this talk, we will discuss three inverse rendering problems. First, inverse rendering using flash photography captures 3D geometry and reflectance of a static object using a single camera and a flashlight attached to the camera. An alternating and iterative optimization framework is proposed to jointly solve for several unknown properties. Second, inverse rendering at microscale reconstructs 3D normals and reflectance of a surface at microscale. A specially designed acquisition system, as well as an inverse rendering algorithm for microscale material appearance, are proposed. Lastly, inverse rendering for human hair describes a novel 3D reconstruction algorithm for modeling high-quality human hair geometry. We hope that our work on these advanced inverse rendering problems boosts hyper-realism in computer graphics.
Further Information:
Giljoo Nam received his Ph.D. from KAIST in August 2019. His doctoral research focuses on inverse rendering for realistic computer graphics. In particular, he has been working on high-quality 3D reconstruction and material appearance modeling. His research on image-based appearance modeling was selected as a representative achievement at ACM SIGGRAPH Asia 2018, receiving press attention from various notable media (e.g., EurekAlert, ScienceDaily, SciTech). He is also a recipient of KCGS (Korea Computer Graphics Society) Young Researcher Award and SIGGRAPH Doctoral Consortium Award. He is currently working as a technical consultant for research institutes in Korea.
Jiamin Wu
(Tsinghua University)
High-speed 3D fluorescence microscopy with digital adaptive optics

Please LOG IN to view the video.
Date: September 25, 2019
Description:
Observing large-scale three-dimensional subcellular dynamics in vivo at high spatiotemporal resolution has long been a pursuit for biology. However, both the signal-to-noise ratio and resolution degradation in multicellular organisms pose great challenges. In this talk, I will discuss our recent work in in vivo aberration-free 3D fluorescence imaging at millisecond scale by scanning light-field microscopy with digital adaptive optics. Specifically, we propose scanning light-field microscopy to achieve diffraction-limited 3D synthetic aperture for incoherent conditions, which facilitates real-time digital adaptive optics for every pixel in post-processing. Various fast subcellular processes are observed, including mitochondrial dynamics in cultured neurons, membrane dynamics in zebrafish embryos, and calcium propagations in cardiac cells, human cerebral organoids, and Drosophila larval neurons, enabling simultaneous in vivo studies of morphological and functional dynamics in 3D.
Further Information:
Jiamin Wu is a Postdoctoral Fellow within the Institute for Brain and Cognitive Sciences at Tsinghua University. His current research interests focus on computational microscopy and high-speed 3D imaging, with a particular emphasis on developing computation-based optical setups for observing large-scale biological dynamics in vivo. He received his PhD degree (2019) and bachelor’s degree (2014) in the Department of Automation from Tsinghua University under the supervision of Professor Qionghai Dai.
Thomas Goossens
(KU Leuven and IMEC, Belgium)
Snapshot multispectral imaging from a different angle

Please LOG IN to view the video.
Date: June 12, 2019
Description:
Combining photography and spectroscopy, spectral imaging enables us to see what no traditional color camera has seen before. The current trend is to miniaturize the technology and bring it towards industry. In this talk, I will first give a general introduction to the most common pitfalls of spectral imaging and the challenges that come with miniaturization. Major pitfalls include balancing cross-talk, quantum efficiency, illumination and the optics. Miniaturization has become possible thanks to the monolithic per-pixel integration of thin-film Fabry-Pérot filters on CMOS imaging sensors. I will explain the difficulty of using these cameras with non-telecentric lenses. This is a major concern because of the angular dependency of the thin-film filters. I will demonstrate how this important issue can be solved using a model-based approach.
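The angular dependence at the heart of the problem: a thin-film Fabry-Pérot filter’s center wavelength blue-shifts with the angle of incidence, approximately λ(θ) = λ₀·sqrt(1 − (sin θ / n_eff)²). A minimal sketch; the effective index and chief-ray angles below are illustrative assumptions, not values from the speaker’s cameras.

    import math

    def fp_center_wavelength_nm(lambda0_nm: float, theta_deg: float, n_eff: float = 1.7) -> float:
        s = math.sin(math.radians(theta_deg)) / n_eff
        return lambda0_nm * math.sqrt(1.0 - s * s)

    for theta in (0, 10, 20, 30):   # chief-ray angles typical of a non-telecentric lens
        print(theta, round(fp_center_wavelength_nm(700.0, theta), 1))  # shift grows with angle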
Further Information:
Thomas Goossens is a doctoral student in Electrical Engineering at KU Leuven in collaboration with the Sensors and Optics group at IMEC in Belgium. He holds a Master’s degree in mathematical engineering and a Bachelor’s degree in computer science from KU Leuven. His current research focuses on modeling the effect of imaging optics on the performance of thin-film Fabry-Pérot based spectral cameras.
Denis Kalkofen
(Institute of Computer Graphics and Vision at Graz University of Technology, Austria)
Augmented Reality Handbooks

Please LOG IN to view the video.
Date: June 5, 2019
Description:
Handbooks are an essential requirement for understanding and using many artifacts found in our daily life. We use handbooks to understand how things work and how to maintain them. Most handbooks still exist on paper, relying on graphical illustrations and accompanying textual explanations to convey the relevant information to the reader. With the success of video-sharing platforms, a large body of video tutorials has become available for nearly every aspect of life. Video tutorials can often expand printed handbooks with demonstrations of the actions required to solve certain tasks. However, interpreting printed manuals and video tutorials often requires a certain mental effort, since users have to match printed images or video frames with the physical object in their environment.
Further Information:
Dr. Denis Kalkofen is an Assistant Professor at the Institute of Computer Graphics and Vision at Graz University of Technology, Austria. His research is focused on developing visualization, interaction and authoring techniques for Mixed Reality environments.
Radek Grzeszczuk
(Light)
Computational Imaging at Light

Please LOG IN to view the video.
Date: May 29, 2019
Description:
Light develops computational imaging technologies that utilize heterogeneous constellations of small cameras to create sophisticated imaging effects. This enables the company to provide hardware solutions that are compact – they can easily fit into a cell phone, or a similar small form factor. In this talk, I will review the recent progress of computational imaging research done at the company.
Further Information:
Radek is the Senior Director of Computational Imaging at Light. He received his PhD (’98) in Computer Science from the University of Toronto. He moved to Silicon Valley in 1997, where he has been working as an individual contributor and managing teams of scientists and engineers in the areas of computer graphics, 3D modeling, augmented reality and visual search. Before joining Light, he worked at Intel Research Labs (’97-’06), Nokia Research Center (’06-’12), Microsoft (’12-’15), Uber (’15-’16), and Amazon’s A9 (’16-’18).
Roarke Horstmeyer
(Duke University)
Towards intelligent computational microscopes

Please LOG IN to view the video.
Date: May 22, 2019
Description:
Deep learning algorithms offer a powerful means to automatically analyze the content of biomedical images. However, many biological samples of interest are difficult to resolve with a standard optical microscope. Either they are too large to fit within the microscope’s field-of-view, or too thick, or are quickly moving around. In this talk, I will discuss our recent work in addressing these challenges by using deep learning algorithms to design new experimental strategies for microscopic imaging. Specifically, we use deep neural networks to jointly optimize the physical parameters of our computational microscopes – their illumination settings, lens layouts and data transfer pipelines, for example – for specific tasks. Examples include learning specific illumination patterns that can improve classification of the malaria parasite by up to 15%, and establishing fast methods to automatically track moving specimens across gigapixel-sized images.
Harish Bhaskaran
(Oxford)
Phase change materials as functional photonic elements in future computing and displays

Please LOG IN to view the video.
Date: May 15, 2019
Description:
Photonics has always been the technology of the future. “Light is faster,” “light can multiplex,” and similar arguments have been made for several decades, yet the ushering in of optical computing has perpetually been just a few years away. However, over the last decade, with the advent of micro- and nanofabrication techniques and phenomenal advances in photonics, that era seems to have finally arrived. The ability to create integrated optical circuits on a chip is near. But (and yes, there’s always a but) you need “functional” materials that can be used to control and manipulate this flow of information. In electronics, doping silicon yields one of the most versatile functional materials ever employed by humanity, one that can be used to efficiently route electrical signals. How do you do that optically? I hope to convince you that whatever route photonics takes, a class of materials known as phase change materials will play a key role in its commercialization. These materials can be addressed electrically, and while this can be used to control optical signals on photonic circuits, it can also be used to create displays and smart windows. In this talk, I hope to give a whistle-stop tour of the applications of these materials, with a view towards their near-term use in displays and their longer-term potential, ranging from integrated photonic memories to machine-learning hardware components.
David Lindell
(Stanford University)
Computational Imaging with Single-Photon Detectors

Please LOG IN to view the video.
Date: May 8, 2019
Description:
Active 3D imaging systems, such as LIDAR, are becoming increasingly prevalent for applications in autonomous vehicle navigation, remote sensing, human-computer interaction, and more. These imaging systems capture distance by directly measuring the time it takes for short pulses of light to travel to a point and return. With emerging sensor technology we can detect down to single arriving photons and identify their arrival at picosecond timescales, enabling new and exciting imaging modalities. In this talk, I discuss trillion-frame-per-second imaging, efficient depth imaging with sparse photon detections, and imaging objects hidden from direct line of sight.
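A minimal sketch of pulsed single-photon depth estimation: histogram the photon arrival times accumulated over many laser pulses, locate the peak, and convert the peak time to distance. The bin width, noise levels, and target distance are illustrative assumptions, not parameters from the speaker’s systems.

    import numpy as np

    C = 299_792_458.0        # speed of light, m/s
    BIN_S = 100e-12          # 100 ps timing bins

    def depth_from_timestamps(timestamps_s, n_bins=1000):
        counts, edges = np.histogram(timestamps_s, bins=n_bins, range=(0.0, n_bins * BIN_S))
        peak = np.argmax(counts)
        t_peak = 0.5 * (edges[peak] + edges[peak + 1])   # center of the strongest bin
        return C * t_peak / 2.0

    # Simulate sparse detections: a few signal photons around 33.4 ns plus uniform background
    rng = np.random.default_rng(0)
    signal = rng.normal(33.4e-9, 50e-12, size=20)
    background = rng.uniform(0.0, 1000 * BIN_S, size=200)
    print(depth_from_timestamps(np.concatenate([signal, background])))  # ~5.0 m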
Further Information:
David is a PhD student in the Stanford Computational Imaging Lab. He received his bachelor’s and master’s degrees in EE from Brigham Young University (BYU), where he worked on satellite remote sensing of sea ice and soil moisture. His current research involves developing new computational algorithms for non-line-of-sight imaging, single-photon imaging, and 3D imaging with sensor fusion.
Kihwan Kim
(NVIDIA)
3D Computer Vision: Challenges and Beyond

Please LOG IN to view the video.
Date: May 1, 2019
Description:
3D Computer Vision (3D Vision) techniques have been the key solutions to various scene perception problems such as depth from image(s), camera/object pose estimation, localization, and 3D reconstruction of a scene. These solutions are a major part of many AI applications, including AR/VR, autonomous driving, and robotics. In this talk, I will first review several categories of 3D Vision problems and their challenges. For static scene perception, I will introduce several learning-based depth estimation methods such as PlaneRCNN and Neural RGBD, camera pose estimation methods including MapNet, and a few registration algorithms deployed in NVIDIA’s products. I will then introduce more challenging real-world scenarios where scenes contain non-stationary rigid changes, non-rigid motions, or varying appearance due to reflectance and lighting changes, which can cause scene reconstruction to fail because of view-dependent properties. I will discuss several solutions to these problems and conclude by summarizing the future directions for 3D Vision research being pursued by NVIDIA’s learning and perception research (LPR) team.
Further Information:
Kihwan Kim is a senior research scientist in learning and perception research group at NVIDIA Research. He received his Ph.D degree in Computer Science from Georgia Institute of Technology in 2011, and BS from Yonsei University in 2001.
Eben Rosenthal
(Stanford University)
Challenges in surgical imaging: Surgical and pathological devices

Please LOG IN to view the video.
Date: April 24, 2019
Description:
Cancer is a surgically treated disease; almost 80% of early-stage solid tumors undergo surgery at some point in their treatment course. The biggest gap in quality remains the high rate of tumor-positive margins in surgical resections. The biggest barrier is that only a limited amount of the tissue can be sampled for frozen section analysis (< 5%). The biggest challenge is to develop equipment that directs frozen section analysis to the area of the specimen most likely to contain a positive margin. To this end, we developed intraoperative devices to leverage molecular imaging during and immediately after cancer resections.
Further Information:
Eben Rosenthal is a surgeon-scientist and academic leader. He is currently serving as the John and Ann Doerr Medical Director of the Stanford Cancer Center, a position he has held since July 2015. He works collaboratively with the Stanford Cancer Institute and Stanford Health Care leaders to set the strategy for the clinical delivery of cancer care across Stanford Medicine and growing cancer networks. He has published over 160 peer-reviewed scientific manuscripts, authored many book chapters and published a book on optical imaging in cancer. Dr. Rosenthal has performed preclinical and clinical research on the role of targeted therapies for use to treat cancer alone and in combination with conventional therapy and has served as principal investigator on several early phase investigator-initiated and industry sponsored clinical trials in molecular oncology. He has conducted bench to bedside development of optical contrast agents to identify cancer in the operating room and led a multidisciplinary team of scientists through successful IND application to allow testing of fluorescent labeled antibodies in the clinic and operating room. These early phase clinical trials have demonstrated that this technique can visualize microscopic cancer in the operating room and may significantly improve clinical outcomes.
Katie Bouman
(California Institute of Technology)
Imaging a Black Hole with the Event Horizon Telescope

Date: April 18, 2019
Description:
This talk will present the methods and procedures used to produce the first results from the Event Horizon Telescope. It is theorized that a black hole will leave a “shadow” on a background of hot gas. Taking a picture of this black hole shadow could help to address a number of important scientific questions, both on the nature of black holes and the validity of general relativity. Unfortunately, due to its small size, traditional imaging approaches require an Earth-sized radio telescope. In this talk, I discuss techniques we have developed to photograph a black hole using the Event Horizon Telescope, a network of telescopes scattered across the globe. Imaging a black hole’s structure with this computational telescope requires us to reconstruct images from sparse measurements, heavily corrupted by atmospheric error.
Further Information:
Katie Bouman is an assistant professor in the Computing and Mathematical Sciences Department at the California Institute of Technology. Before joining Caltech, she was a postdoctoral fellow in the Harvard-Smithsonian Center for Astrophysics. She received her Ph.D. in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT in EECS. Before coming to MIT, she received her bachelor’s degree in Electrical Engineering from the University of Michigan. The focus of her research is on using emerging computational methods to push the boundaries of interdisciplinary imaging.
Michael Morehead
(CEO of syGlass.io)
syGlass: Visualization, Annotation, and Communication of Very Large Image Volumes in Virtual Reality

Please LOG IN to view the video.
Date: April 3, 2019
Description:
Scientific researchers now utilize advanced microscopes to collect very large volumes of image data. These volumes often contain morphologically complex structures that can be difficult to comprehend on a 2D monitor, even with 3D projection. syGlass is a software stack designed specifically for the visualization, exploration and annotation of very large image volumes in virtual reality. This technology provides crucial advantages to exploring 3D volumetric data by correctly leveraging neurological processes and pipelines in the visual cortex, reducing cognitive load and search times, while increasing insight and annotation accuracy. The talk will provide a brief overview of new microscope technology, a description of the syGlass stack and product, some real use-cases from various labs around the world, and conclude with predictions and plans for the future of scientific communication.
Further Information:
Michael Morehead is the cofounder and CEO of syGlass.
Chris Dainty
(Xperi FotoNation)
Fundamental Limits of Cell Phone Cameras

Please LOG IN to view the video.
Date: March 6, 2019
Description:
Further Information:
Chris Dainty is a consultant with FotoNation in Galway, Ireland, and holds Emeritus Professor appointments at universities in the UK and Ireland. Throughout his career, he has investigated problems in optical imaging, scattering and propagation. In these areas, he has co-authored or edited six books (including “Image Science” co-authored with Rodney Shaw in 1974), >180 peer-reviewed papers and >300 conference presentations. He has graduated 65 PhD students and mentored >75 post-docs. He is a recipient of the International Commission for Optics Prize, IoP’s Thomas Young Medal and Prize, OSA’s C.E.K. Mees Medal and OSA’s Leadership Award. He is a fellow of The Optical Society, SPIE, The Institute of Physics, and the European Optical Society and a member of the Royal Irish Academy. He was President of The Optical Society (OSA) in 2011.
Sam Hasinoff
(Google)
Burst photography in practice

Please LOG IN to view the video.
Date: February 27, 2019
Description:
Mobile photography has been transformed by software. While sensors and lens design have improved over time, the mobile phone industry relies increasingly on software to mitigate physical limits and the constraints imposed by industrial design. In this talk, I’ll describe the HDR+ system for burst photography, comprising robust and efficient algorithms for capturing, fusing, and processing multiple images into a single higher-quality result. HDR+ is core imaging technology for Google’s Pixel phones – it’s used in all camera modes and powers millions of photos per day. I’ll give a brief history of HDR+ starting from Google Glass (2013), present key algorithms from the HDR+ system, then describe the new features that enable the recently released Night Sight mode.
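A simplified sketch of the idea behind robust burst merging (not the actual HDR+ pipeline): average the aligned frames toward a reference, down-weighting pixels that disagree with the reference so residual misalignment and motion do not cause ghosting. The noise parameter sigma is an illustrative assumption.

    import numpy as np

    def robust_merge(frames, ref_index=0, sigma=0.05):
        """frames: (N, H, W) array of aligned grayscale frames in [0, 1]."""
        ref = frames[ref_index]
        merged = np.zeros_like(ref)
        weight_sum = np.zeros_like(ref)
        for frame in frames:
            diff2 = (frame - ref) ** 2
            w = sigma ** 2 / (diff2 + sigma ** 2)   # ~1 where consistent with the reference, ->0 where not
            merged += w * frame
            weight_sum += w
        return merged / weight_sum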
Further Information:
Sam Hasinoff is a software engineer at Google. Before joining Google in 2011, he was a Research Assistant Professor at the Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic institute on the campus of the University of Chicago. From 2008-2010, he was a postdoctoral fellow at the Massachusetts Institute of Technology, supported in part by the Natural Sciences and Engineering Research Council of Canada. He received the BSc degree in computer science from the University of British Columbia in 2000, and the MSc and PhD degrees in computer science from the University of Toronto in 2002 and 2008, respectively. In 2006, he received an honorable mention for the Longuet-Higgins Best Paper Award at the European Conference on Computer Vision. He is the recipient of the Alain Fournier Award for the top Canadian dissertation in computer graphics in 2008.
Orazio Gallo
(NVIDIA)
Deep Learning Meets Computational Imaging: Combining Data-Driven Priors and Domain Knowledge

Please LOG IN to view the video.
Date: February 20, 2019
Description:
Neural networks have surpassed the performance of virtually any traditional computer vision algorithm thanks to their ability to learn priors directly from the data. The common encoder/decoder with skip connections architecture, for instance, has been successfully employed in a number of tasks, from optical flow estimation, to image deblurring, image denoising, and even higher level tasks, such as image-to-image translation. To improve the results further, one must leverage the constraints of the specific problem at hand, in particular when the domain is fairly well understood, such as the case of computational imaging. In this talk I will describe recent projects that build on this observation, ranging from reflection removal, to novel view synthesis, and video stitching.
Further Information:
Orazio Gallo is a Senior Research Scientist at NVIDIA Research. He is interested in computational imaging, computer vision, deep learning and, in particular, in the intersection of the three. Alongside topics such as view synthesis and 3D vision, his recent interests also include integrating traditional computer vision and computational imaging knowledge into deep learning architectures. Previously, Orazio’s research focus revolved around tinkering with the way pictures are captured, processed, and consumed by the photographer or the viewer. Orazio is an associate editor of the IEEE Transactions on Computational Imaging and was an associate editor of Signal Processing: Image Communication from 2015 to 2017. Since 2015 he has also been a member of the IEEE Computational Imaging Technical Committee.
Yoav Shechtman
(Israel Institute of Technology)
Microscopic particle localization in 3D and in multicolor

Date: February 6, 2019
Description:
Xu Liu
(Zhejiang University)
Optical Super-resolution Microscopy with Spatial Frequency Shift

Date: February 5, 2019
Description:
Seeing beyond the diffraction limit of optical microscopy is of great significance. State-of-the-art super-resolution approaches, such as STED and STORM, rely on fluorescent labeling of the sample. Achieving super-resolution for unlabeled samples without a fluorescent effect remains challenging. To this end, we have developed a novel super-resolution method, called Spatial Frequency Shift (SFS), to realize deep super-resolution in wide-field imaging with or without a fluorescent effect. The principle and the applications of this SFS technique will be presented.
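The down-shifting principle can be seen in a one-dimensional toy calculation. This is only a hedged illustration with made-up frequencies, not the SFS instrument itself: detail beyond a band-limited system’s cutoff is mixed down into the passband by sinusoidal illumination.

```python
import numpy as np

# Toy 1D illustration (illustrative values): sample detail at 0.30 cycles/px
# lies beyond a passband of 0.10 cycles/px, but structured illumination at
# 0.28 cycles/px creates a beat at |0.30 - 0.28| = 0.02 cycles/px that the
# band-limited system can pass, so the detail becomes measurable.
N = 2048
x = np.arange(N)
sample = np.cos(2 * np.pi * 0.30 * x)        # fine detail, outside the passband
illum = 1.0 + np.cos(2 * np.pi * 0.28 * x)   # sinusoidal illumination pattern
product = sample * illum                     # what the band-limited system images

spec = np.abs(np.fft.rfft(product)) / N
freqs = np.fft.rfftfreq(N)
passband = freqs <= 0.10                     # diffraction-limited cutoff
print(freqs[passband][np.argmax(spec[passband])])  # ~0.02: shifted-in detail
```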
Further Information:
Professor Xu Liu is a Yangtze River Scholarship Chair Professor in the College of Optical Science and Engineering of Zhejiang University, China. He obtained his B.S. and M.S. degrees from Zhejiang University, and his Ph.D. from the Ecole Nationale Superieure de Physique de Marseille in France in 1990. He then joined the optics department as a faculty member. His research fields include thin-film optics and optical coating techniques, projection and AR displays, and optical imaging and instrumentation. He has authored more than 200 journal papers and 60 patents. Currently, he serves as the Director of the State Key Laboratory of Modern Optical Instrumentation, Zhejiang University, as well as the Secretary General of the Chinese Optical Society. He is a fellow of OSA and SPIE.
Austin Russell
(Luminar)
Bottlenecks in Autonomy: The Last 1%

Date: January 30, 2019
Description:
Accurate and reliable 3D perception is the key remaining bottleneck to making self-driving vehicles safe and ubiquitous. Today, it’s relatively easy to get an autonomous car to work 99% of the time, but it’s the incredibly long tail of edge cases that’s preventing them from reaching real-world deployment without a backup driver constantly watching over. All of this comes down to how well the autonomous car can see and understand the world around it. The key to achieving accurate, safer-than-human level 3D perception all starts with the LiDAR. That said, both legacy LiDAR solutions and newer upstarts, which largely leverage off-the-shelf components, have still struggled to meet the stringent performance requirements needed to solve key edge cases encountered in everyday driving scenarios.
Luminar, founded in 2012 by Austin Russell, has taken an entirely new approach to LiDAR, building its system from the ground up at the component level for over 5 years. The result was the first and only solution that meets and exceeds all of the key performance requirements demanded by car and truck OEMs and technology leaders to achieve safe autonomy, in addition to unit economics that can enable widespread adoption across even mainstream consumer vehicle platforms. This culminated with last year’s release of their first scalable product for autonomous test and development fleets, which has subsequently led to rapidly accelerating adoption in the market. During this talk, raw Luminar LiDAR data from autonomous test vehicles will be presented to the audience, demonstrating real-world examples of life-threatening edge cases and how they can now be avoided.
Rafal Mantiuk
(University of Cambridge)
How many pixels are too many?

Date: January 18, 2019
Description:
We are starting to lack the processing power and bandwidth to drive 8K and high-resolution head-mounted displays. However, as the human eye and visual system have their own limitations, the relevant question is what spatial and temporal resolution is the ultimate limit for any technology. In this talk, I will review visual models of spatio-temporal and chromatic contrast sensitivity that can explain such limitations. Then I will show how they can be used to reduce rendering cost in VR applications, find more efficient encodings of high dynamic range images, and compress images in a visually lossless manner.
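As a back-of-the-envelope companion to this question (not one of the talk’s models, which account for eccentricity, temporal frequency, and color), one can bound the useful pixel count from a nominal foveal acuity limit; the numbers below are illustrative assumptions.

```python
def pixels_needed(fov_deg, cycles_per_deg=60.0):
    """Rough upper bound on useful pixels across one display dimension.

    Assumes a nominal foveal acuity limit of ~60 cycles/degree, so Nyquist
    sampling needs ~120 pixels/degree everywhere. Real limits vary with
    eccentricity, luminance, and temporal frequency.
    """
    return 2 * cycles_per_deg * fov_deg

# e.g. a hypothetical 100-degree-FOV head-mounted display
print(pixels_needed(100))  # ~12000 pixels across one axis
```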
Further Information:
Rafał K. Mantiuk is Reader (Associate Professor) at the Department of Computer Science and Technology, University of Cambridge (UK). He received his PhD from the Max-Planck-Institute for Computer Science (Germany). His recent interests focus on computational displays, novel display technologies, rendering and imaging algorithms that adapt to human visual performance and viewing conditions in order to deliver the best images given limited resources, such as computation time, bandwidth or dynamic range. He contributed to early work on high dynamic range imaging, including quality metrics (HDR-VDP), video compression and tone-mapping. Further details: http://www.cl.cam.ac.uk/~rkm38/.
Jan Kautz
(NVIDIA)
Image Domain Transfer

Date: January 9, 2019
Description:
Image domain transfer includes methods that transform an image based on an example, commonly used in photorealistic and artistic style transfer, as well as learning-based methods that learn a transfer function based on a training set. These are usually based on generative adversarial networks (GANs), and can be supervised or unsupervised as well as unimodal or multimodal. I will present a number of our recent methods in this space that can be used to translate, for instance, a label map to a realistic street image, a day time street image to a night time street image, a dog to different cat breeds, and many more.
Further Information:
Jan is VP of Learning and Perception Research at NVIDIA. He leads the Learning & Perception Research team, working predominantly on computer vision problems (from low-level vision through geometric vision to high-level vision), as well as machine learning problems (including deep reinforcement learning, generative models, and efficient deep learning). Before joining NVIDIA in 2013, Jan was a tenured faculty member at University College London. He holds a BSc in Computer Science from the University of Erlangen-Nürnberg (1999), an MMath from the University of Waterloo (1999), received his PhD from the Max-Planck-Institut für Informatik (2003), and worked as a post-doctoral researcher at the Massachusetts Institute of Technology (2003-2006).
- Jon Barron » How to Learn a Camera
- Brian Wandell » Simulation Technologies for Image Systems Engineering
- Ravi Ramamoorthi » Light Fields: From Shape Recovery to Sparse Reconstruction
- Jason Mudge » Range compensating lens for non-imaging active optical systems
- Laura Marcu » Fluorescence Lifetime Techniques in Clinical Interventions
- Lars Omlor » Optical 3D scanning in an X-ray microscope
- Chang Yuan » Training for autonomous vehicles and mobile robots
- Giljoo Nam » Inverse Rendering for Realistic Computer Graphics
- Jiamin Wu » High-speed 3D fluorescence microscopy
- Thomas Goossens » Snapshot multispectral imaging from a different angle
- Denis Kalkofen » Augmented Reality Handbooks
- Radek Grzeszczuk » Computational Imaging at Light
- Roarke Horstmeyer » Towards intelligent computational microscopes
- Harish Bhaskaran » Phase change materials as functional photonic elements
- David Lindell » Computational Imaging with Single-Photon Detectors
- Kihwan Kim » 3D Computer Vision: Challenges and Beyond
- Eben Rosenthal » Challenges in surgical imaging: Surgical and pathological devices
- Katie Bouman » Imaging a Black Hole with the Event Horizon Telescope
- Michael Morehead » syGlass
- Chris Dainty » Fundamental Limits of Cell Phone Cameras
- Sam Hasinoff » Burst photography in practice
- Orazio Gallo » Deep Learning Meets Computational Imaging
- Yoav Shechtman » Microscopic particle localization in 3D and in multicolor
- Xu Liu » Optical Super-resolution Microscopy
- Austin Russell » Bottlenecks in Autonomy: The Last 1%
- Rafal Mantiuk » How many pixels are too many?
- Jan Kautz » Image Domain Transfer
SCIEN Colloquia 2018
Liang Gao
(University of Illinois Urbana-Champaign)
Plenoptic Medical Cameras

Date: December 5, 2018
Description:
Optical imaging probes like otoscopes and laryngoscopes are essential tools used by doctors to see deep into the human body. Until now, they have been limited to two-dimensional (2D) views of tissue lesions in vivo, which frequently jeopardizes their diagnostic usefulness. Depth imaging is critically needed in medical diagnostics because most tissue lesions manifest themselves as abnormal 3D structural changes. In this talk, I will discuss our recent effort to develop a three-dimensional (3D) plenoptic imaging tool that revolutionizes diagnosis with unprecedented sensitivity and specificity in the images produced. In particular, I will discuss two plenoptic medical cameras, a plenoptic otoscope and a plenoptic laryngoscope, and their applications for in-vivo imaging.
Further Information:
Dr. Liang Gao is currently an Assistant Professor in the Electrical and Computer Engineering Department at the University of Illinois Urbana-Champaign. He is also affiliated with the Beckman Institute for Advanced Science and Technology. His primary research interests encompass multidimensional optical imaging, including hyperspectral imaging and ultrafast imaging, photoacoustic tomography and microscopy, and cost-effective high-performance optics for diagnostics. Dr. Gao is the author of more than 40 peer-reviewed publications in top-tier journals, such as Nature, Science Advances, Physics Reports, and Annual Review of Biomedical Engineering. He received his BS degree in Physics from Tsinghua University in 2005 and his Ph.D. degree in Applied Physics and Bioengineering from Rice University in 2011. He is a recipient of the NSF CAREER award in 2017 and the NIH MIRA award for Early-Stage Investigators in 2018.
Petr Kellnhofer
(MIT)
Perceptual modeling with multi-modal sensing

Date: November 28, 2018
Description:
Research on human perception has enabled many visual applications in computer graphics that efficiently utilize computational resources to deliver a high-quality experience within the limitations of the hardware. Beyond vision, humans perceive their surroundings using a variety of senses to build a mental model of the world and act upon it. This mental image is often incomplete or incorrect, which may have safety implications. As we cannot directly see inside the head, we need to read indirect signals projected outside. In the first part of the talk I will show how perceptual modeling can be used to overcome and exploit the limitations of one specific human sense: vision. Then, I will describe how we can build sensors to observe other human interactions, connected first with physical touch and then with eye gaze patterns. Finally, I will outline how such readings can be used to teach computers to understand human behavior, to predict, and to provide assistance or safety.
Further Information:
Dr. Petr Kellnhofer completed his PhD at the Max Planck Institute for Informatics in Germany under the supervision of Prof. Hans-Peter Seidel and Prof. Karol Myszkowski. His thesis on perceptual modeling of human vision for stereoscopy was awarded the Eurographics PhD award. After graduating, he became a postdoc in the group of Prof. Wojciech Matusik at MIT CSAIL, where he has been working on topics related to human sensing such as eye tracking. Dr. Kellnhofer’s current research interest is a combination of perceptual modeling and machine learning in order to utilize data gathered from various types of sensors and to learn about human perception and higher-level behavior.
Hany Farid
(Berkeley)
Photo Forensics from JPEG Coding Artifacts

Date: November 14, 2018
Description:
The past few years have seen a startling and troubling rise in the fake-news phenomenon, in which everyone from individuals to state-sponsored entities produces and distributes misinformation, which is then widely promoted and disseminated on social media. The implications of fake news range from a misinformed public to an existential threat to democracy, and horrific violence. At the same time, recent and rapid advances in machine learning are making it easier than ever to create sophisticated and compelling fake images and videos, making the fake-news phenomenon even more powerful and dangerous. I will start by providing a broad overview of the field of image and video forensics, and then I will describe in detail a suite of image forensic techniques that explicitly detect inconsistencies in JPEG coding artifacts.
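One simple cue of the general kind such techniques build on is the strength and alignment of the 8x8 JPEG block grid, which a spliced or re-saved region can break. The sketch below is a generic illustration under that assumption, not the specific suite described in the talk.

```python
import numpy as np

def block_grid_strength(gray):
    """Measure 8x8 block-grid strength in a grayscale image region.

    For each horizontal offset 0..7, average the absolute pixel difference
    across column boundaries that fall on that offset. A JPEG-compressed
    region shows a clear peak at the true grid offset; a region pasted in
    with a shifted or missing grid will not. Illustrative only.
    """
    diffs = np.abs(np.diff(gray.astype(np.float64), axis=1))  # (H, W-1)
    strength = np.zeros(8)
    for offset in range(8):
        # boundaries between columns offset+7 and offset+8, offset+15/16, ...
        cols = np.arange(offset + 7, diffs.shape[1], 8)
        strength[offset] = diffs[:, cols].mean() if len(cols) else 0.0
    return strength
```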
Gordon Wetzstein
(Stanford)
Computational Single-Photon Imaging

Date: November 7, 2018
Description:
Time-of-flight imaging and LIDAR systems enable 3D scene acquisition at long range using active illumination. This is useful for autonomous driving, robotic vision, human-computer interaction and many other applications. The technological requirements on these imaging systems are extreme: individual photon events need to be recorded and time-stamped at a picosecond timescale, which is facilitated by emerging single-photon detectors. In this talk, we discuss a new class of computational cameras based on single-photon detectors. These enable efficient ways for non-line-of-sight imaging (i.e., looking around corners) and efficient depth sensing as well as other unprecedented imaging modalities.
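To make the basic measurement concrete, here is a toy per-pixel sketch (not one of the cameras described in the talk): a single-photon detector repeatedly time-stamps photon arrivals, and depth follows from the histogram peak via d = c·t/2. Bin width and peak-picking are illustrative assumptions; real systems contend with ambient photons, pile-up, and very few signal photons.

```python
import numpy as np

C = 3e8  # speed of light, m/s

def depth_from_timestamps(timestamps_s, bin_width_s=100e-12):
    """Estimate depth for one pixel from single-photon arrival times (seconds)."""
    t = np.asarray(timestamps_s)
    bins = np.arange(0.0, t.max() + bin_width_s, bin_width_s)
    hist, edges = np.histogram(t, bins=bins)
    # Take the center of the most populated bin as the round-trip time.
    k = np.argmax(hist)
    t_peak = 0.5 * (edges[k] + edges[k + 1])
    return C * t_peak / 2.0
```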
Michael Broxton
(Google)
Wavefront coding techniques and resolution limits for light field microscopy

Date: October 31, 2018
Description:
Light field microscopy is a rapid, scan-less volume imaging technique that requires only a standard wide field fluorescence microscope and a microlens array. Unlike scanning microscopes, which collect volumetric information over time, the light field microscope captures volumes synchronously in a single photographic exposure, and at speeds limited only by the frame rate of the image sensor. This is made possible by the microlens array, which focuses light onto the camera sensor so that each position in the volume is mapped onto the sensor as a unique light intensity pattern. These intensity patterns are the position-dependent point response functions of the light field microscope. With prior knowledge of these point response functions, it is possible to “decode” 3-D information from a raw light field image and computationally reconstruct a full volume.
In this talk I present an optical model for light field microscopy based on wave optics that accurately models light field point response functions. I describe a GPU-accelerated iterative algorithm that solves for volumes, and discuss priors that are useful for reconstructing biological specimens. I then explore the diffraction limit that applies to light field microscopy, and how it gives rise to position-dependent resolution limits for this microscope. I’ll explain how these limits differ from more familiar resolution metrics commonly used in 3-D scanning microscopy, like the Rayleigh limit and the optical transfer function (OTF). Using this theory of resolution limits for the light field microscope, I explore new wavefront coding techniques that can modify the light field resolution limits and can address certain common reconstruction artifacts, at least to a degree. Certain resolution trade-offs exist that suggest that light field microscopy is just one of potentially many useful forms of computational microscopy. Finally, I describe our application of light field microscopy in neuroscience, where we have used it to record calcium activity in populations of neurons within the brains of awake, behaving animals.
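To give a flavor of the reconstruction step, here is a generic Richardson-Lucy-style update on a dense toy forward model, assuming the point response functions have been flattened into a matrix; this is a sketch, not the wave-optics, GPU-accelerated solver from the talk.

```python
import numpy as np

def richardson_lucy(A, y, n_iters=50):
    """Minimal iterative volume reconstruction sketch.

    A : (n_pixels, n_voxels) nonnegative forward model, each column a
        flattened point response function (toy stand-in).
    y : (n_pixels,) raw light field measurement.
    Returns a nonnegative volume estimate x with A @ x approximately y.
    """
    x = np.ones(A.shape[1])
    norm = A.T @ np.ones(A.shape[0]) + 1e-12   # per-voxel normalization
    for _ in range(n_iters):
        pred = A @ x + 1e-12                   # predicted sensor image
        x *= (A.T @ (y / pred)) / norm         # multiplicative update
    return x
```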
Michael Zollhöfer
(Stanford University)
Is it real? Deep Neural Face Reconstruction and Rendering

Date: October 24, 2018
Description:
A broad range of applications in visual effects, computer animation, autonomous driving, and man-machine interaction heavily depends on robust and fast algorithms to obtain high-quality reconstructions of our physical world in terms of geometry, motion, reflectance, and illumination. In particular, with the increasing popularity of virtual, augmented, and mixed reality devices, there is a rising demand for real-time and low-latency solutions.
This talk covers data-parallel optimization and state-of-the-art machine learning techniques to tackle the underlying 3D and 4D reconstruction problems based on novel mathematical models and fast algorithms. The particular focus of this talk is on self-supervised face reconstruction from a collection of unlabeled in-the-wild images. The proposed approach can be trained end-to-end without dense annotations by fusing a convolutional encoder with a differentiable expert-designed renderer and a self-supervised training loss.
The resulting reconstructions are the foundation for advanced video editing effects, such as photo-realistic re-animation of portrait videos. The core of the proposed approach is a generative rendering-to-video translation network that takes computer graphics renderings as input and generates photo-realistic modified target videos that mimic the source content. With the ability to freely control the underlying parametric face model, we are able to demonstrate a large variety of video rewrite applications. For instance, we can reenact the full head using interactive user-controlled editing and realize high-fidelity visual dubbing.
Further Information:
Michael Zollhöfer is a Visiting Assistant Professor at Stanford University. His stay at Stanford is funded by a postdoctoral fellowship of the Max Planck Center for Visual Computing and Communication (MPC-VCC), which he received for his work in the fields of computer vision, computer graphics, and machine learning. Before joining Stanford University, Michael was a Postdoctoral Researcher at the Max Planck Institute for Informatics working with Christian Theobalt. He received his PhD from the University of Erlangen-Nuremberg for his work on real-time reconstruction of static and dynamic scenes. During his PhD, he was an intern at Microsoft Research Cambridge working with Shahram Izadi on data-parallel optimization for real-time template-based surface reconstruction. The primary goal of his research is to teach computers to reconstruct and analyze our world at frame rate based on visual input. To this end, he develops key technology to invert the image formation models of computer graphics based on data-parallel optimization and state-of-the-art deep learning techniques. The reconstructed intrinsic scene properties, such as geometry, motion, reflectance, and illumination are the foundation for a broad range of applications not only in virtual and augmented reality, visual effects, computer animation, autonomous driving, and man-machine interaction, but also in other fields such as medicine and biomechanics.
Shalin Mehta
(Chan Zuckerberg Biohub)
Computational microscopy of dynamic order across biological scales

Date: October 17, 2018
Description:
Living systems are characterized by emergent behavior of ordered components. Imaging technologies that reveal dynamic arrangement of organelles in a cell and of cells in a tissue are needed to understand the emergent behavior of living systems. I will present an overview of challenges in imaging dynamic order at the scales of cells and tissue, and discuss advances in computational label-free microscopy to overcome these challenges.
Further Information:
Shalin Mehta received his Ph.D. at the National University of Singapore, focusing on optics and biological microscopy. His Ph.D. research led to better mathematical models and novel approaches for label-free imaging of cellular morphology. He then joined the Marine Biological Laboratory in Woods Hole, where he developed novel imaging and computational methods for detecting molecular order across a range of scales in living systems. He built an instantaneous fluorescence polarization microscope that revealed the dynamics of molecular assemblies by tracking the orientation and position of molecules in live cells. At CZ Biohub, his lab seeks to measure physical properties of biological systems with increasing precision, resolution, and throughput by exploiting diverse light-matter interactions and algorithms.
Mohammad Musa
(Deepen AI)
How to train neural networks on LiDAR point cloud data

Date: October 10, 2018
Description:
Accurate LiDAR classification and segmentation is required for developing critical ADAS and autonomous vehicle components. Mainly, it is required for high-definition mapping and for developing perception and path/motion planning algorithms. This talk will cover best practices for how to accurately annotate and benchmark your AV/ADAS models against LiDAR point cloud ground truth training data.
Further Information:
Mohammad Musa started Deepen AI in January 2017, focusing on AI tools and infrastructure for the autonomous development industry. Mohammad previously led product efforts for Google-wide initiatives to enable teams to build excellent products. He worked specifically on infrastructure products for tracking user-centered metrics, bug management, and user feedback loops. Prior to that, he was the head of Launch & Readiness at Google Apps for Work, where he led a cross-functional team managing product launches, the product roadmap, trusted testers, and launch communications. Before Google, Mohammad worked in software engineering and technical sales positions in the video games and semiconductor industries at multiple startups.
Jerome Mertz
(Boston University)
The challenge of large-scale brain imaging

Date: October 3, 2018
Description:
Advanced optical microscopy techniques have enabled the recording and stimulation of large populations of neurons deep within living, intact animal brains. I will present a broad overview of these techniques, and discuss challenges that still remain in performing large-scale imaging with high spatio-temporal resolution, along with various strategies that are being adopted to address these challenges.
Further Information:
Jerome Mertz received an AB in physics from Princeton University in 1984, and a PhD in quantum optics from UC Santa Barbara and the University of Paris VI in 1991. Following postdoctoral studies at the University of Konstanz and at Cornell University, he became a CNRS research director at the Ecole Supérieure de Physique et de Chimie Industrielle in Paris. He is currently a professor of Biomedical Engineering at Boston University. His interests are in the development and applications of novel optical microscopy techniques for biological imaging. He is also author of a textbook titled “Introduction to Optical Microscopy”.
Ben Backus
(Vivid Vision)
Mobile VR for vision testing and treatment

Date: June 6, 2018
Description:
Consumer-level HMDs are adequate for many medical applications. Vivid Vision (VV) takes advantage of their low cost, light weight, and large VR gaming code base to make vision tests and treatments. The company’s software is built using the Unity engine, allowing it to run on many hardware platforms. New headsets become available every six months or less, which creates interesting challenges within the medical device space. VV’s flagship product is the commercially available Vivid Vision System, used by more than 120 clinics to test and treat binocular dysfunctions such as convergence difficulties, amblyopia, strabismus, and stereo blindness. VV has recently developed a new, VR-based visual field analyzer.
Jake Li
(Hamamatsu Photonics)
Emerging LIDAR concepts and sensor technologies for autonomous vehicles

Date: May 30, 2018
Description:
Sensor technologies such as radar, camera, and LIDAR have become the key enablers for achieving higher levels of autonomous control in vehicles, from fleets to commercial. There are, however, still questions remaining: to what extent will radar and camera technologies continue to improve, and which LIDAR concepts will be the most successful? This presentation will provide an overview of the tradeoffs for LIDAR vs. competing sensor technologies (camera and radar); this discussion will reinforce the need for sensor fusion. We will also discuss the types of improvements that are necessary for each sensor technology. The presentation will summarize and compare various LIDAR designs — mechanical, flash, MEMS-mirror based, optical phased array, and FMCW (frequency modulated continuous wave) — and then discuss each LIDAR concept’s future outlook. Finally, there will be a quick review of guidelines for selecting photonic components such as photodetectors, light sources, and MEMS mirrors.
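As a concrete example of one of the design points compared in the talk, an FMCW lidar converts a measured beat frequency into range. The sketch below uses illustrative numbers that are not taken from the presentation.

```python
C = 3e8  # speed of light, m/s

def fmcw_range(beat_hz, chirp_bandwidth_hz, chirp_duration_s):
    """Toy range calculation for an FMCW lidar.

    A linear chirp of slope S = bandwidth / duration, delayed by the round
    trip 2R/c, beats against the outgoing chirp at f_beat = 2*R*S/c, so
    R = c * f_beat / (2 * S). Real systems also recover velocity from the
    Doppler component.
    """
    slope = chirp_bandwidth_hz / chirp_duration_s
    return C * beat_hz / (2.0 * slope)

# e.g. a 1 GHz chirp over 10 us with a 2 MHz beat corresponds to a 3 m target
print(fmcw_range(2e6, 1e9, 10e-6))
```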
Further Information:
Jake Q. Li is in charge of research and analysis of various market segments, with a concentration in the automotive LiDAR market. He is knowledgeable about various optical components such as photodetectors — including MPPC (a type of silicon photomultiplier or SiPM), avalanche photodiodes, and PIN photodiodes — and light emitters that are important parts of LIDAR system designs. He has expert understanding of the upcoming solid-state technology needs for the autonomous vehicle market. Together with his experience and understanding of the specific requirements needed for LIDAR systems, he will guide you through the selection process of the best photodetectors and light sources that will fit your individual needs.
Mark McCord
(Cepton Technologies)
LiDAR Technology for Autonomous Vehicles

Date: May 23, 2018
Description:
LiDAR is a key sensor for autonomous vehicles that enables them to understand their surroundings in 3 dimensions. I will discuss the evolution of LiDAR, and describe various LiDAR technologies currently being developed. These include rotating sensors, MEMS and optical phased array scanning devices, flash detector arrays, and single photon avalanche detectors. Requirements for autonomous vehicles are very challenging, and the different technologies each have advantages and disadvantages that will be discussed. The architecture of LiDAR also affects how it fits into the overall vehicle architecture. Fusion with other sensors, including radar, cameras, and ultrasound, will be part of the overall solution. Other LiDAR applications, including non-automotive transportation, mining, precision agriculture, UAVs, mapping, surveying, and security, will be described.
Further Information:
As Co-Founder and Vice President of Engineering at Cepton Technologies, Dr. Mark McCord leads the development of high performance, low-cost imaging LiDAR systems. Prior to Cepton, Dr. McCord was Director of System Engineering, Advanced Development at KLA-Tencor, where he developed electron beam technologies for etching and imaging silicon chips. Earlier in his career, Dr. McCord served as an Associate Professor of Electrical Engineering at Stanford University, where he and his group researched various methods of nanometer-scale silicon processing, and as a Research Staff Member at IBM Research, where he worked on development of X-ray and electron beam chip lithography. Dr. McCord earned a B.S. in Electrical Engineering from Princeton University and a PhD in Electrical Engineering from Stanford University.
Loic Royer
(Chan Zuckerberg Biohub)
Pushing the Limits of Fluorescence Microscopy with adaptive imaging and machine learning

Date: May 16, 2018
Description:
Fluorescence microscopy lets biologists see and understand the intricate machinery at the heart of living systems and has led to numerous discoveries. Any technological progress towards improving image quality would extend the range of possible observations and consequently open up the path to new findings. I will show how modern machine learning and smart robotic microscopes can push the boundaries of observability. One fundamental obstacle in microscopy takes the form of a trade-off between imaging speed, spatial resolution, light exposure, and imaging depth. We have shown that deep learning can circumvent these physical limitations: microscopy images can be restored even if 60-fold fewer photons are used during acquisition, isotropic resolution can be achieved even with 10-fold under-sampling along the axial direction, and diffraction-limited structures can be resolved at 20-times higher frame rates compared to state-of-the-art methods. Moreover, I will demonstrate how smart microscopy techniques can achieve the full optical resolution of light-sheet microscopes — instruments capable of capturing the entire developmental arc of an embryo from a single cell to a fully formed motile organism. Our instrument improves spatial resolution and signal strength two- to five-fold, recovers cellular and sub-cellular structures in many regions otherwise not resolved, adapts to the spatiotemporal dynamics of genetically encoded fluorescent markers, and robustly optimizes imaging performance during large-scale morphogenetic changes in living organisms.
Boyd Fowler
(Omnivision)
Advances in automotive image sensors

Date: May 9, 2018
Description:
In this talk I present recent advances in 2D and 3D image sensors for automotive applications such as rear view cameras, surround view cameras, ADAS cameras and in cabin driver monitoring cameras. This includes developments in high dynamic range image capture, LED flicker mitigation, high frame rate capture, global shutter, near infrared sensitivity and range imaging. I will also describe sensor developments for short range and long range LIDAR systems.
Further Information:
Boyd Fowler joined OmniVision in December 2015 and is the CTO. Prior to joining OmniVision he was a founder and VP of Engineering at Pixel Devices, where he focused on developing high performance CMOS image sensors. After Pixel Devices was acquired by Agilent Technologies, Dr. Fowler was responsible for advanced development of their commercial CMOS image sensor products. In 2005 Dr. Fowler joined Fairchild Imaging as the CTO and VP of Technology, where he developed sCMOS image sensors for high performance scientific applications. After Fairchild Imaging was acquired by BAE Systems, Dr. Fowler was appointed technology director of the CCD/CMOS image sensor business. He has authored numerous technical papers, book chapters and patents. Dr. Fowler received his M.S. and Ph.D. degrees in Electrical Engineering from Stanford University in 1990 and 1995, respectively.
Anna-Karin Gustavsson
(Stanford University)
3D single-molecule super-resolution microscopy using a tilted light sheet

Date: May 2, 2018
Description:
To obtain a complete picture of subcellular structures, cells must be imaged with high resolution in all three dimensions (3D). In this talk, I will present tilted light sheet microscopy with 3D point spread functions (TILT3D), an imaging platform that combines a novel, tilted light sheet illumination strategy with engineered long axial range point spread functions (PSFs) for low-background, 3D super localization of single molecules as well as 3D super-resolution imaging in thick cells. Here the axial positions of the single molecules are encoded in the shape of the PSF rather than in the position or thickness of the light sheet. TILT3D is built upon a standard inverted microscope and has minimal custom parts. The result is simple and flexible 3D super-resolution imaging with tens of nm localization precision throughout thick mammalian cells. We validated TILT3D for 3D super-resolution imaging in mammalian cells by imaging mitochondria and the full nuclear lamina using the double-helix PSF for single-molecule detection and the recently developed Tetrapod PSFs for fiducial bead tracking and live axial drift correction. We think that TILT3D in the future will become an important tool not only for 3D super-resolution imaging, but also for live whole-cell single-particle and single-molecule tracking.
Further Information:
Dr. Anna-Karin Gustavsson is a postdoctoral fellow in the Moerner Lab at the Department of Chemistry at Stanford University, and she also holds a postdoctoral fellowship from the Karolinska Institute in Stockholm, Sweden. Her research is focused on the development and application of 3D single-molecule super-resolution microscopy for cell imaging, and includes the implementation of light sheet illumination for optical sectioning. She has a background in physics and received her PhD in Physics in 2015 from the University of Gothenburg, Sweden. Her PhD project was focused on studying dynamic responses in single cells by combining and optimizing techniques such as fluorescence microscopy, optical tweezers, and microfluidics. Dr. Gustavsson has received several awards, most notably the FEBS Journal Richard Perham Prize for Young Scientists in 2012 and the PicoQuant Young Investigator Award in 2018.
Seishi Takamura
(NTT)
Video Coding before and beyond HEVC

Date: April 25, 2018
Description:
We enjoy video content in various situations. Though it is already compressed down to 1/10 to 1/1000 of its original size, video traffic over the internet has been reported to be increasing by 31% per year and is expected to account for 82% of all traffic by 2020. This is why the development of better compression technology is eagerly demanded. ITU-T/ISO/IEC jointly developed the latest video coding standard, High Efficiency Video Coding (HEVC), in 2013, and they are about to start work on the next-generation standard. Corresponding proposals will be evaluated at the April 2018 meeting in San Diego, just a week before this talk.
In this talk, we will first give an overview of the advances in video coding technology over the last several decades. We will then present the latest topics, including a report from the San Diego meeting and some new approaches such as deep learning techniques.
Kyros Kutulakos
(Univ. of Toronto)
Transport-Aware Cameras

Date: April 18, 2018
Description:
Conventional cameras record all light falling onto their sensor regardless of its source or its 3D path to the camera. In this talk I will present an emerging family of coded-exposure video cameras that can be programmed to record just a fraction of the light coming from an artificial source—be it a common street lamp or a programmable projector—based on the light path’s geometry or timing. Live video from these cameras offers a very unconventional view of our everyday world, in which refraction and scattering that cannot be noticed with the naked eye become apparent, and the flicker of electric lights can be turned into a powerful cue for analyzing the electrical grid from room to city.
I will discuss the unique optical properties and power efficiency of these “transport-aware cameras” through three case studies: the ACam for analyzing the electrical grid, EpiScan3D for robust 3D scanning, and progress toward designing a computational CMOS sensor for coded two-bucket imaging—a novel capability that promises much more flexible and powerful transport-aware cameras compared to existing off-the-shelf solutions.
Thomas Burnett
(FoVI3D)
Light-field Display Architecture and the Heterogeneous Display Ecosystem FoVI3D

Date: April 11, 2018
Description:
Human binocular vision and acuity, and the accompanying 3D retinal processing of the human eye and brain, are specifically designed to promote situational awareness and understanding in the natural 3D world. The ability to resolve depth within a scene, whether natural or artificial, improves our spatial understanding of the scene and as a result reduces the cognitive load accompanying the analysis of and collaboration on complex tasks.
A light-field display projects 3D imagery that is visible to the unaided eye (without glasses or head tracking) and allows for perspective correct visualization within the display’s projection volume. Binocular disparity, occlusion, specular highlights and gradient shading, and other expected depth cues are correct from the viewer’s perspective as in the natural real-world light-field.
Light-field displays are no longer a science fiction concept and a few companies are producing impressive light-field display prototypes. This presentation will review:
· The application agnostic light-field display architecture being developed at FoVI3D.
· General light-field display properties and characteristics such as field of view, directional resolution, and their effect on the 3D aerial image.
· The computational challenge of generating high-fidelity light-fields.
· A display agnostic ecosystem.
Christian Theobalt
(Max-Planck-Institute (MPI) for Informatics)
Video-based Reconstruction of the Real World in Motion

Date: March 21, 2018
Description:
New methods for capturing highly detailed models of moving real-world scenes with cameras, i.e., models of detailed deforming geometry, appearance, or even material properties, are becoming more and more important in many application areas. They are needed in visual content creation, for instance in visual effects, to build highly realistic models of virtual human actors. Furthermore, efficient, reliable, and highly accurate dynamic scene reconstruction is nowadays an important prerequisite for many other application domains, such as human-computer and human-robot interaction, autonomous robotics and autonomous driving, virtual and augmented reality, 3D and free-viewpoint TV, immersive telepresence, and even video editing.
The development of dynamic scene reconstruction methods has been a long standing challenge in computer graphics and computer vision. Recently, the field has seen important progress. New methods were developed that capture – without markers or scene instrumentation – rather detailed models of individual moving humans or general deforming surfaces from video recordings, and capture even simple models of appearance and lighting. However, despite this recent progress, the field is still at an early stage, and current technology is still starkly constrained in many ways. Many of today’s state-of-the-art methods are still niche solutions that are designed to work under very constrained conditions, for instance: only in controlled studios, with many cameras, for very specific object types, for very simple types of motion and deformation, or at processing speeds far from real-time.
In this talk, I will present some of our recent work on detailed marker-less dynamic scene reconstruction and performance capture, in which we advanced the state of the art in several ways. For instance, I will briefly show new methods for marker-less capture of the full body (like our VNect approach) and hands that work in more general environments, and even in real time and with one camera. I will then show some of our work on high-quality face performance capture and face reenactment. Here, I will also illustrate the benefits of both model-based and learning-based approaches and show how different ways of joining the forces of the two open up new possibilities. Live demos included!
More Information: https://www.youtube.com/channel/UCNdXGCWZ6oZqbt5Y12L9inw
Jacob Chakareski
(University of Alabama)
Drone IoT Networks for Virtual Human Teleportation

Date: March 14, 2018
Description:
Further Information:
Jacob Chakareski is an Assistant Professor of Electrical and Computer Engineering at The University of Alabama, where he leads the Laboratory for VR/AR Immersive Communication (LION). His interests span virtual and augmented reality, UAV-IoT sensing and communication, and rigorous machine learning for stochastic control. Dr. Chakareski received the Swiss NSF Ambizione Career Award and the best paper award at ICC 2017. He trained as a PhD student at Rice and Stanford, held research appointments with Microsoft, HP Labs, and EPFL, and sits on the advisory board of Frame, Inc. His research is supported by the NSF, AFOSR, Adobe, NVIDIA, and Microsoft. For further info, please visit www.jakov.org.
Patrick Llull
(Google)
Temporal coding of volumetric imagery

Date: March 7, 2018
Description:
‘Image volumes’ refer to realizations of images in other dimensions such as time, spectrum, and focus. Recent advances in scientific, medical, and consumer applications demand improvements in image volume capture. Though image volume acquisition continues to advance, it maintains the same sampling mechanisms that have been used for decades; every voxel must be scanned or captured in parallel and is presumed independent of its neighbors. Under these conditions, improving performance comes at the cost of increased system complexity, data rates, and power consumption.
This talk describes systems and methods with which to efficiently detect and visualize image volumes by temporally encoding the extra dimensions’ information into 2D measurements or displays. Some highlights of my research include video and 3D recovery from photographs, and true-3D augmented reality image display by time multiplexing. In the talk, I show how temporal optical coding can improve system performance, battery life, and hardware simplicity for a variety of platforms and applications.
Further Information:
Currently with Google’s Daydream virtual reality team, Patrick Llull completed his Ph.D. under Prof. David Brady at the Duke University Imaging and Spectroscopy Program (DISP) in May 2016. His doctoral research focused on compressive video and multidimensional sensing, with research internship experience with Ricoh Innovations in near-eye multifocal displays. During his Ph.D. Patrick won two best paper awards and was an NSF graduate fellowship honorable mention. Patrick graduated with his BS from the University of Arizona’s College of Optical Sciences in May 2012.
Marty Banks & Steve Cholewiak
(University of California, Berkeley)
ChromaBlur: Rendering Chromatic Eye Aberration Improves Accommodation and Realism

Date: February 28, 2018
Description:
Computer-graphics engineers and vision scientists want to generate images that reproduce realistic depth-dependent blur. Current rendering algorithms take into account scene geometry, aperture size, and focal distance, and they produce photorealistic imagery as with a high-quality camera. But to create immersive experiences, rendering algorithms should aim instead for perceptual realism. In so doing, they should take into account the significant optical aberrations of the human eye. We developed a method that, by incorporating some of those aberrations, yields displayed images that produce retinal images much closer to the ones that occur in natural viewing. In particular, we create displayed images taking the eye’s chromatic aberration into account. This produces different chromatic effects in the retinal image for objects farther or nearer than current focus. We call the method ChromaBlur. We conducted two experiments that illustrate the benefits of ChromaBlur. One showed that accommodation (eye focusing) is driven quite effectively when ChromaBlur is used and that accommodation is not driven at all when conventional methods are used. The second showed that perceived depth and realism are greater with imagery created by ChromaBlur than in imagery created conventionally. ChromaBlur can be coupled with focus-adjustable lenses and gaze tracking to reproduce the natural relationship between accommodation and blur in HMDs and other immersive devices. It can thereby minimize the adverse effects of vergence-accommodation conflicts.
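A minimal sketch of the rendering idea for a single fronto-parallel surface follows. The chromatic offsets, Gaussian blur model, and scaling constant are illustrative assumptions, not the authors’ renderer, which works from full scene geometry and the eye’s measured aberrations.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Approximate longitudinal chromatic aberration of the eye, in diopters,
# relative to the in-focus green channel. Values are illustrative only.
LCA_DIOPTERS = {"r": +0.4, "g": 0.0, "b": -0.8}

def chromablur(img_rgb, defocus_diopters, blur_per_diopter=3.0):
    """Blur each channel of an HxWx3 float image by its total defocus.

    Scene defocus plus the channel's chromatic offset sets the blur, so
    objects nearer or farther than fixation receive different color
    fringing, which is the accommodation cue the talk exploits.
    """
    out = np.empty_like(img_rgb)
    for i, ch in enumerate("rgb"):
        total = abs(defocus_diopters + LCA_DIOPTERS[ch])
        out[..., i] = gaussian_filter(img_rgb[..., i], sigma=blur_per_diopter * total)
    return out
```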
Chris Metzler
(Rice University)
Data-driven Computational Imaging

Date: February 21, 2018
Description:
Between ever increasing pixel counts, ever cheaper sensors, and the ever expanding world-wide-web, natural image data has become plentiful. These vast quantities of data, be they high frame rate videos or huge curated datasets like Imagenet, stand to substantially improve the performance and capabilities of computational imaging systems. However, using this data efficiently presents its own unique set of challenges. In this talk I will use data to develop better priors, improve reconstructions, and enable new capabilities for computational imaging systems.
Further Information:
Chris Metzler is a PhD candidate in the Machine Learning, Digital Signal Processing, and Computational Imaging labs at Rice University. His research focuses on developing and applying new algorithms, including neural networks, to problems in computational imaging. Much of his work concerns imaging through scattering media, like fog and water, and last summer he interned in the U.S. Naval Research Laboratory’s Applied Optics branch. He is an NSF graduate research fellow and was formerly an NDSEG graduate research fellow.
Fu-Chung Huang
(NVIDIA Research)
Accelerated Computing for Light Field and Holographic Displays

Date: February 14, 2018
Description:
In this talk, I will present two recently published papers from SIGGRAPH Asia 2017. In the first paper, we present a 4D light field sampling and rendering system for light field displays that supports both foveation and accommodation to reduce rendering cost while maintaining perceptual quality and comfort. In the second paper, we present a light field based computer-generated holography (CGH) rendering pipeline that allows reproduction of high-definition 3D scenes with continuous depth and supports intra-pupil view-dependent occlusion using computer-generated holograms. Our rendering and Fresnel integral accurately account for diffraction and support various types of reference illumination for holograms.
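For context on the kind of diffraction step such a CGH pipeline relies on, here is a generic scalar Fresnel (angular-spectrum-style) propagation sketch. It is not the paper’s rendering pipeline; all parameters are illustrative.

```python
import numpy as np

def fresnel_propagate(field, wavelength, dx, z):
    """Propagate a complex 2D wavefront a distance z (paraxial Fresnel model).

    field      : complex 2D array sampled at pitch dx (meters).
    wavelength : wavelength in meters.
    z          : propagation distance in meters.
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    # Fresnel transfer function in the spatial-frequency domain.
    H = np.exp(1j * 2 * np.pi * z / wavelength) * \
        np.exp(-1j * np.pi * wavelength * z * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * H)
```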
Further Information:
Fu-Chung Huang is a research scientist at Nvidia Research. He works on computational displays, where high-performance computation is applied to solve problems related to optics and perception. Recently, his research has focused specifically on near-eye displays for virtual reality and augmented reality. He received his Ph.D. in Computer Science from UC Berkeley in 2013, and his dissertation on vision-correcting light field displays won Scientific American’s World Changing Ideas 2014. He was a visiting scientist at the MIT Media Lab with Prof. Ramesh Raskar from 2011 to 2013 and at Stanford University with Prof. Gordon Wetzstein from 2014 to 2015.
Steve Silverman
(Google)
Street View 2018 - The Newest Generation Of Mapping Hardware

Date: February 7, 2018
Description:
A brief overview of Street View from its inception 10 years ago until now will be presented. Street-level imagery has been the prime objective for Google’s Street View in the past, and Street View has now migrated into a state-of-the-art mapping platform. Challenges and solutions in the design and fabrication of the imaging system, and the optimization of hardware to align with specific software post-processing, will be discussed. Real-world challenges of fielding hardware in 80+ countries will also be addressed.
Further Information:
Steven Silverman is a Technical Program Manager at Google, Inc., developing and deploying camera/mapping systems for Google Street View. He has developed flash lidar systems which are part of the SpaceX Dragon vehicle berthing system. He was the Chief Engineer for the Thermal Emission Spectrometers (TES and Mini-TES) for Mars Global Surveyor and both Mars Exploration Rovers, as well as the Chief Engineer for the Thermal Emission Imaging System (THEMIS) for Mars Odyssey. He graduated from Cal Poly SLO in Engineering Science, and has an MS in ECE from UCSB.
Kristen Grauman
(University of Texas at Austin)
Learning where to look in 360 environments

Date: January 24, 2018
Description:
Further Information:
Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin. Her research in computer vision and machine learning focuses on visual recognition. Before joining UT-Austin in 2007, she received her Ph.D. at MIT. She is an Alfred P. Sloan Research Fellow and Microsoft Research New Faculty Fellow, a recipient of NSF CAREER and ONR Young Investigator awards, the PAMI Young Researcher Award in 2013, the 2013 IJCAI Computers and Thought Award, and a Presidential Early Career Award for Scientists and Engineers (PECASE) in 2013. Work with her collaborators has been recognized with paper awards at CVPR 2008, ICCV 2011, ACCV 2016, and CHI 2017. She currently serves as an Associate Editor in Chief for the Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and an Editorial Board Member for the International Journal of Computer Vision (IJCV), and she served as a Program Chair of CVPR 2015 in Boston.
Alex Lidow
(Efficient Power Conversion Corporation (EPC))
Driverless Anything and the Role of LIDAR

Date: January 17, 2018
Description:
LIDAR, or light detection and ranging, is a versatile light-based remote sensing technology that has been the subject of a great deal of attention in recent times. It has shown up in a number of media venues, and has even led to public debate about the engineering choices of a well-known electric car company, Tesla Motors. During this talk the speaker will provide some background on LiDAR and discuss why it is a key link to the future autonomous vehicle ecosystem, as well as its strong connection to power electronics technologies.
Further Information:
Alex Lidow is CEO and co-founder of Efficient Power Conversion Corporation (EPC). Since 1977, Dr. Lidow has been dedicated to making power conversion more efficient upon the belief that this will reduce the harm to our environment and increase the global standard of living.
Edward Chang
(President of Research and Healthcare (DeepQ) at HTC)
Advancing Healthcare with AI and VR

Date: January 10, 2018
Description:
Quality, cost, and accessibility form an iron triangle that has prevented healthcare from achieving accelerated advancement in the last few decades. Improving any one of the three metrics may lead to degradation of the other two. However, thanks to recent breakthroughs in artificial intelligence (AI) and virtual reality (VR), this iron triangle can finally be shattered. In this talk, I will share the experience of developing DeepQ, an AI platform for AI-assisted diagnosis and VR-facilitated surgery. I will present three healthcare initiatives we have undertaken since 2012: Healthbox, Tricorder, and VR surgery, and explain how AI and VR play pivotal roles in improving diagnosis accuracy and treatment effectiveness. And more specifically, how we have dealt with not only big data analytics, but also small data learning, which is typical in the medical domain. The talk concludes with roadmaps and a list of open research issues in signal processing and AI to achieve precision medicine and surgery.
Further Information:
Edward Chang currently serves as the President of Research and Healthcare (DeepQ) at HTC. Ed’s most notable work is co-leading the DeepQ project (with Prof. CK Peng at Harvard), working with a team of physicians, scientists, and engineers to design and develop mobile wireless diagnostic instruments. Such instruments can help consumers make their own reliable health diagnoses anywhere at any time. The project entered the Tricorder XPRIZE competition in 2013 with 310 other entrants and was awarded second place in April 2017 with 1M USD prize. The deep architecture that powers DeepQ is also applied to power Vivepaper, an AR product Ed’s team launched in 2016 to support immersive augmented reality experiences (for education, training, and entertainment).
- Liang Gao » Plenoptic Medical Cameras
- Petr Kellnhofer » Perceptual modeling with multi-modal sensing
- Hany Farid » Photo Forensics from JPEG Coding Artifacts
- Gordon Wetzstein » Computational Single-Photon Imaging
- Michael Broxton » Wavefront coding techniques and resolution limits for light field microscopy
- Michael Zollhöfer » Deep Neural Face Reconstruction and Rendering
- Shalin Mehta » Computational microscopy of dynamic order
- Mohammad Musa » How to train neural networks on LiDAR point cloud data
- Jerome Mertz » The challenge of large-scale brain imaging
- Ben Backus » Mobile VR for vision testing and treatment
- Jake Li » Emerging LIDAR concepts and sensor technologies
- Mark McCord » LiDAR Technology for Autonomous Vehicles
- Loic Royer » Pushing the Limits of Fluorescence Microscopy
- Boyd Fowler » Advances in automotive image sensors
- Anna-Karin Gustavsson » 3D single-molecule super-resolution microscopy
- Seishi Takamura » Video Coding before and beyond HEVC
- Kyros Kutulakos » Transport-Aware Cameras
- Thomas Burnett » Light-field Display Architecture and the Heterogeneous Display Ecosystem FoVI3D
- Christian Theobalt » Video-based Reconstruction of the Real World in Motion
- Jacob Chakareski » Drone IoT Networks for Virtual Human Teleportation
- Patrick Llull » Temporal coding of volumetric imagery
- Marty Banks & Steve Cholewiak » ChromaBlur
- Chris Metzler » Data-driven Computational Imaging
- Fu-Chung Huang » Accelerated Computing for Light Field and Holographic Displays
- Steve Silverman » Street View 2018 - The Newest Generation Of Mapping Hardware
- Kristen Grauman » Learning where to look in 360 environments
- Alex Lidow » Driverless Anything and the Role of LIDAR
- Edward Chang » Advancing Healthcare with AI and VR
SCIEN Colloquia 2017
Liang Gao
(University of Illinois at Urbana-Champaign)
Compressed Ultrafast Photography and Microscopy: Redefining the Limit of Passive Ultrafast Imaging

Date: December 6, 2017
Description:
High-speed imaging is an indispensable technology for blur-free observation of fast transient dynamics in virtually all areas, including science, industry, defense, energy, and medicine. Unfortunately, the frame rates of conventional cameras are significantly constrained by their data transfer bandwidth and onboard storage. We demonstrate a two-dimensional dynamic imaging technique, compressed ultrafast photography (CUP), which can capture non-repetitive time-evolving events at up to 100 billion fps. Compared with existing ultrafast imaging techniques, CUP has the prominent advantage of measuring an x, y, t (x, y, spatial coordinates; t, time) scene with a single camera snapshot, thereby allowing observation of transient events occurring on a time scale down to tens of picoseconds. Thanks to the CUP technology, for the first time, humans can see light pulses on the fly. Because this technology advances the imaging frame rate by orders of magnitude, we now enter a new regime and open up new avenues of observation.
In this talk, I will discuss our recent effort to develop a second-generation CUP system and demonstrate its applications at scales from macroscopic to microscopic. For the first time, we imaged photonic Mach cones and captured “Sonic Boom” of light in action. Moreover, by adapting CUP for microscopy, we enabled two-dimensional fluorescence lifetime imaging at an unprecedented speed. The advantage of CUP recording is that even visually simple systems can be scientifically interesting when they are captured at such a high speed. Given CUP’s capability, we expect it to find widespread applications in both fundamental and applied sciences including biomedical research.
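For readers who want a concrete picture of how a single snapshot can encode a full x, y, t datacube, here is a minimal sketch of the CUP-style forward model, using the operator notation common in the CUP literature; the regularizer below is an illustrative choice, not necessarily the one used in the systems discussed in the talk:

    y = \mathbf{T}\,\mathbf{S}\,\mathbf{C}\, E(x, y, t),

where E(x, y, t) is the dynamic scene, \mathbf{C} applies a pseudo-random binary spatial code (e.g., from a DMD), \mathbf{S} shears the coded frames in time (the streak camera's temporal deflection), and \mathbf{T} integrates the result onto the 2D sensor. The scene is then recovered by regularized inversion,

    \hat{E} = \arg\min_{E} \tfrac{1}{2}\,\| y - \mathbf{T}\,\mathbf{S}\,\mathbf{C}\,E \|_2^2 + \lambda\, \Phi(E),

with \Phi a sparsity-promoting prior such as total variation.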
Hakan Urey
(Koç University in Istanbul-Turkey and CY Vision in San Jose, CA)
Next Generation Wearable AR Display Technologies

Date: November 29, 2017
Description:
Wearable AR/VR displays have a long history, and earlier efforts failed due to various limitations. Advances in sensors, optical technologies, and computing have renewed interest in this area. Most people are convinced AR will be very big. A key question is whether AR glasses can become the new computing platform and replace smartphones. I'll discuss some of the challenges ahead. We have been working on various wearable display architectures, and I'll discuss our efforts related to MEMS scanned-beam displays, head-mounted projectors and smart telepresence screens, and holographic near-eye displays.
Further Information:
Kaan Aksit
(NVIDIA)
Near-Eye Varifocal Augmented Reality Displays

Date: November 15, 2017
Description:
With the goal of registering dynamic synthetic imagery onto the real world, Ivan Sutherland envisioned a fundamental idea to combine digital displays with conventional optical components in a wearable fashion. Since then, various advancements in the display engineering domain, and a broader understanding in the vision science domain, have led us to computational displays for virtual reality and augmented reality applications. Today, such displays promise a more realistic and comfortable experience through techniques such as lightfield displays, holographic displays, always-in-focus displays, multiplane displays, and varifocal displays. In this talk, as an Nvidian, I will present our new optical layouts for see-through computational near-eye displays that are simple, compact, and varifocal, and that provide a wide field of view with clear peripheral vision and a large eyebox. Key to our efforts so far are novel see-through rear-projection holographic screens and deformable mirror membranes. We establish fundamental trade-offs between the quantitative parameters of resolution, field of view, and the form factor of our designs, opening an intriguing avenue for future work on accommodation-supporting augmented reality displays.
Further Information:
Kaan Akşit received his B.S. degree in electrical engineering from Istanbul Technical University, Turkey in 2007, his M.Sc. degree in electrical power engineering from RWTH Aachen University, Germany in 2010, and his Ph.D. degree in electrical engineering from Koç University, Turkey in 2014. In 2009, he joined Philips Research in Eindhoven, the Netherlands as an intern. In 2013, he joined Disney Research, Zurich, Switzerland as an intern. His past research includes topics such as visible light communications, optical medical sensing, solar cars, and auto-stereoscopic displays. Since July 2014, he has been working as a research scientist at NVIDIA Corporation in Santa Clara, USA, tackling problems related to computational displays for virtual and augmented reality.
Andrew Jones
(USC)
Interactive 3D Digital Humans

Date: November 8, 2017
Description:
This talk will cover recent methods for recording and displaying interactive life-sized digital humans using the ICT Light Stage, natural language interfaces, and automultiscopic 3D displays. We will then discuss the first full application of this technology to preserve the experience of in-person interactions with Holocaust survivors.
Further Information:
Andrew Jones is a computer graphics programmer and inventor at the University of Southern California's Institute for Creative Technologies. In 2004, Jones began working in cultural heritage, using 3D scanning techniques to virtually reunite the Parthenon and its sculptures. The resulting depictions of the Parthenon were featured in the 2004 Olympics, PBS's NOVA, National Geographic, the IMAX film Greece: Secrets of the Past, and The Louvre. However, computer-generated worlds only truly come alive when combined with interactive human characters. Subsequently, Andrew developed new techniques to record dynamic human facial and full-body performances. These photoreal real-time characters have been used by companies such as ImageMetrics, Activision, Digital Domain and Weta for visual effects and games. As part of his PhD, Jones designed new display devices that can show 3D imagery to multiple viewers without the need for stereo glasses, winning "Best Emerging Technology" at SIGGRAPH 2007. His current work with the USC Shoah Foundation explores how to use digital humans and holographic technology to change how we communicate with each other and the past.
Stefan Williams
(University of Sydney)
The Australian Centre for Field Robotics

Date: November 1, 2017
Description:
Further Information:
Rudolf Oldenbourg and Talon Chandler
(Marine Biological Laboratory, Woods Hole MA)
Mapping molecular orientation using polarized light microscopy

Date: November 1, 2017
Description:
Polarization is a basic property of light, but the human eye is not sensitive to it. Therefore, we don’t have an intuitive understanding of polarization and of optical phenomena that are based on it. They either elude us, like the polarization of the blue sky or the rainbow, or they puzzle us, like the effect of Polaroid sunglasses. Meanwhile, polarized light plays an important role in nature and can be used to manipulate and analyze molecular order in materials, including living cells, tissues, and whole organisms, by observation with the polarized light microscope.
Further Information:
Basel Salahieh
(Intel)
Light field Retargeting for Integral and Multi-panel Displays

Date: October 25, 2017
Description:
Further Information:
Andrew Maimone
(Oculus Research)
Holographic Near-Eye Displays for Virtual and Augmented Reality

Date: October 18, 2017
Description:
Further Information:
Gordon Wetzstein
(Stanford University)
Computational Near-Eye Displays

Date: October 8, 2017
Description:
Virtual reality is a new medium that provides unprecedented user experiences. Eventually, VR/AR systems will redefine communication, entertainment, education, collaborative work, simulation, training, telesurgery, and basic vision research. In all of these applications, the primary interface between the user and the digital world is the near-eye display. While today’s VR systems struggle to provide natural and comfortable viewing experiences, next-generation computational near-eye displays have the potential to provide visual experiences that are better than the real world. In this talk, we explore the frontiers of VR/AR systems engineering and discuss next-generation near-eye display technology, including gaze-contingent focus, light field displays, monovision, holographic near-eye displays, and accommodation-invariant near-eye displays.
Further Information:
Gordon Wetzstein is an Assistant Professor of Electrical Engineering and, by courtesy, of Computer Science at Stanford University. He is the leader of the Stanford Computational Imaging Lab, an interdisciplinary research group focused on advancing imaging, microscopy, and display systems. Prior to joining Stanford in 2014, Prof. Wetzstein was a Research Scientist in the Camera Culture Group at the MIT Media Lab. He founded displayblocks.org as a forum for sharing computational display design instructions with the DIY community, and presented a number of courses on Computational Displays and Computational Photography at ACM SIGGRAPH.
Douglas Lanman
(Oculus Research)
Focal Surface Displays

Date: October 11, 2017
Description:
Conventional binocular head-mounted displays (HMDs) vary the stimulus to vergence with the information in the picture, while the stimulus to accommodation remains fixed at the apparent distance of the display, as created by the viewing optics. Sustained vergence-accommodation conflict (VAC) has been associated with visual discomfort, motivating numerous proposals for delivering near-correct accommodation cues. We introduce focal surface displays to meet this challenge, augmenting conventional HMDs with a phase-only spatial light modulator (SLM) placed between the display screen and viewing optics. This SLM acts as a dynamic freeform lens, shaping synthesized focal surfaces to conform to the virtual scene geometry. We introduce a framework to decompose target focal stacks and depth maps into one or more pairs of piecewise smooth focal surfaces and underlying display images. We build on recent developments in “optimized blending” to implement a multifocal display that allows the accurate depiction of occluding, semi-transparent, and reflective objects. Practical benefits over prior accommodation-supporting HMDs are demonstrated using a binocular focal surface display employing a liquid crystal on silicon (LCOS) phase SLM and an organic light-emitting diode (OLED) display.
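As a rough intuition for how a phase-only SLM can act as a dynamic freeform lens (a paraxial sketch for intuition only; the actual system optimizes the phase pattern rather than using this closed form): in the paraxial approximation, a thin lens of focal length f corresponds to the quadratic phase profile

    \phi(x, y) = -\frac{\pi}{\lambda f}\left(x^2 + y^2\right),

and letting the effective focal power vary across the aperture, f \rightarrow f(x, y), allows different regions of the displayed image to be relayed to different virtual distances. A focal surface display chooses this spatial variation so that the synthesized focal surfaces conform to the virtual scene's depth map, then decomposes the target focal stack into a small number of such piecewise-smooth surfaces plus matching display images.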
Further Information:
Douglas Lanman is the Director of Computational Imaging at Oculus Research, where he leads investigations into advanced display and imaging technologies. His prior research has focused on head-mounted displays, glasses-free 3D displays, light field cameras, and active illumination for 3D reconstruction and interaction. He received a B.S. in Applied Physics with Honors from Caltech in 2002 and M.S. and Ph.D. degrees in Electrical Engineering from Brown University in 2006 and 2010, respectively. He was a Senior Research Scientist at NVIDIA Research from 2012 to 2014, a Postdoctoral Associate at the MIT Media Lab from 2010 to 2012, and an Assistant Research Staff Member at MIT Lincoln Laboratory from 2002 to 2005.
More Information: https://research.
Donald Dansereau
(Stanford University)
Computational Imaging for Robotic Vision

Date: May 31, 2017
Description:
This talk argues for combining the fields of robotic vision and computational imaging. Both consider the joint design of hardware and algorithms, but with dramatically different approaches and results. Roboticists seldom design their own cameras, and computational imaging seldom considers performance in terms of autonomous decision-making. The union of these fields considers whole-system design from optics to decisions. This yields impactful sensors offering greater autonomy and robustness, especially in challenging imaging conditions. Motivating examples are drawn from autonomous ground and underwater robotics, and the talk concludes with recent advances in the design and evaluation of novel cameras for robotics applications.
Further Information:
Donald G. Dansereau joined the Stanford Computational Imaging Lab as a postdoctoral scholar in September 2016. His research is focused on computational imaging for robotic vision, and he is the author of the open-source Light Field Toolbox for Matlab. Dr. Dansereau completed B.Sc. and M.Sc. degrees in electrical and computer engineering at the University of Calgary in 2001 and 2004, receiving the Governor General’s Gold Medal for his work in light field processing. His industry experience includes physics engines for video games, computer vision for microchip packaging, and FPGA design for high-throughput automatic test equipment. In 2014 he completed a Ph.D. in plenoptic signal processing at the Australian Centre for Field Robotics, University of Sydney, and in 2015 joined on as a research fellow at the Australian Centre for Robotic Vision at the Queensland University of Technology, Brisbane. Donald’s field work includes marine archaeology on a Bronze Age city in Greece, seamount and hydrothermal vent mapping in the Sea of Crete and Aeolian Arc, habitat monitoring off the coast of Tasmania, and hydrochemistry and wreck exploration in Lake Geneva.
Reza Zadeh
(Matroid and Stanford University)
FusionNet: 3D Object Classification Using Multiple Data Representations

Date: May 31, 2017
Description:
High-quality 3D object recognition is an important component of many vision and robotics systems. We tackle the object recognition problem using two data representations: a volumetric representation, where the 3D object is discretized spatially as binary voxels (1 if the voxel is occupied and 0 otherwise), and a pixel representation, where the 3D object is represented as a set of projected 2D pixel images. At the time of submission, we obtained leading results on the Princeton ModelNet challenge. Some of the best deep learning architectures for classifying 3D CAD models use Convolutional Neural Networks (CNNs) on the pixel representation, as seen on the ModelNet leaderboard. Diverging from this trend, we combine both of the above representations and exploit them to learn new features. This approach yields a significantly better classifier than using either representation in isolation. To do this, we introduce new Volumetric CNN (V-CNN) architectures.
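For a concrete picture of the kind of two-branch network the abstract describes, here is a minimal sketch in PyTorch (layer sizes, the max-pooling over views, and the late-fusion head are illustrative assumptions, not the published FusionNet configuration):

    import torch
    import torch.nn as nn

    class VoxelBranch(nn.Module):
        """3D CNN over a binary voxel grid (e.g., 32x32x32 occupancy)."""
        def __init__(self, feat_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                nn.Linear(32, feat_dim), nn.ReLU())
        def forward(self, vox):            # vox: (B, 1, D, H, W)
            return self.net(vox)

    class ViewBranch(nn.Module):
        """2D CNN applied to each rendered view, then pooled across views."""
        def __init__(self, feat_dim=128):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim), nn.ReLU())
        def forward(self, views):          # views: (B, V, 1, H, W)
            b, v = views.shape[:2]
            f = self.cnn(views.flatten(0, 1)).view(b, v, -1)
            return f.max(dim=1).values     # max-pool over the V views

    class Fusion(nn.Module):
        """Late fusion of volumetric and multi-view features for classification."""
        def __init__(self, n_classes=40):
            super().__init__()
            self.vox, self.img = VoxelBranch(), ViewBranch()
            self.head = nn.Linear(256, n_classes)
        def forward(self, vox, views):
            return self.head(torch.cat([self.vox(vox), self.img(views)], dim=1))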
Further Information:
Reza Zadeh is CEO at Matroid and Adjunct Professor at Stanford University. His work focuses on machine learning, distributed computing, and discrete applied mathematics. He has served on the Technical Advisory Board of Microsoft and Databricks.
Alex Hegyi
(PARC)
Hyperspectral imaging using polarization interferometry

Date: May 17, 2017
Description:
Polarization interferometers are interferometers that utilize birefringent crystals to generate an optical path delay between two polarizations of light. In this talk I will describe how I have employed polarization interferometry to make two kinds of Fourier imaging spectrometers: in one case by temporally scanning the optical path delay with a liquid crystal cell, and in the other by utilizing relative motion between scene and detector to spatially scan the optical path delay through a position-dependent wave plate.
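The Fourier-transform spectroscopy principle underlying both instruments can be illustrated with a toy simulation (units and sampling are illustrative; this is not PARC's processing pipeline):

    import numpy as np

    # Wavenumber axis (cm^-1) and a toy two-line spectrum.
    sigma = np.linspace(12000, 20000, 512)          # roughly 500-830 nm
    spectrum = np.exp(-((sigma - 15000) / 200) ** 2) \
             + 0.5 * np.exp(-((sigma - 18000) / 300) ** 2)

    # Interferogram: intensity vs. optical path delay (OPD, in cm),
    # scanned here as the LC cell or scene motion would scan it.
    opd = np.linspace(0, 0.02, 2048)
    interferogram = (spectrum[None, :] *
                     (1 + np.cos(2 * np.pi * opd[:, None] * sigma[None, :]))
                    ).sum(axis=1)

    # Recover the spectrum (up to scale) by Fourier-transforming the
    # mean-subtracted interferogram; frequencies come out in cm^-1.
    recovered = np.abs(np.fft.rfft(interferogram - interferogram.mean()))
    freqs = np.fft.rfftfreq(opd.size, d=opd[1] - opd[0])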
Further Information:
Alex Hegyi is a Member of Research Staff at PARC, a Xerox company, where he works on novel concepts for optical sensing. He holds a PhD in Electrical Engineering from UC Berkeley and a BS with Honors and Distinction in Physics from Stanford. He is a former Hertz Foundation Fellow, is a winner of the Hertz Foundation Thesis Prize, and is one of the 2016 Technology Review "35 Innovators Under 35".
Kari Pulli
(Meta)
Heterogeneous Computational Imaging

Date: May 3, 2017
Description:
Modern systems-on-a-chip (SoCs) have many different types of processors that could be used in computational imaging. Unfortunately, they all have different programming models and are thus difficult to optimize as a system. In this talk we discuss various standards (OpenCL, OpenVX) and domain-specific programming languages (Halide, ProxImaL) that make it easier to accelerate processing for computational imaging.
Further Information:
Kari Pulli is CTO at Meta. Before joining Meta, Kari worked as CTO of the Imaging and Camera Technologies Group at Intel influencing the architecture of future IPUs. He was VP of Computational Imaging at Light and before that he led research teams at NVIDIA Research (Senior Director) and at Nokia Research (Nokia Fellow) on Computational Photography, Computer Vision, and Augmented Reality. He headed Nokia’s graphics technology, and contributed to many Khronos and JCP mobile graphics and media standards, and wrote a book on mobile 3D graphics. Kari holds CS degrees from Univ. Minnesota (BSc), Univ. Oulu (MSc, Lic. Tech.), Univ. Washington (PhD); and an MBA from Univ. Oulu. He has taught and worked as a researcher at Stanford, Univ. Oulu, and MIT.
Greg Corrado
(Google)
Deep Learning Imaging Applications

Date: April 26, 2017
Description:
Deep learning has driven huge progress in visual object recognition in the last five years, but this is only one aspect of its application to imaging. This talk will provide a brief overview of deep learning and artificial neural networks in computer vision before delving into the wide range of applications Google has pursued in this area. Topics will include image summarization, image augmentation, artistic style transfer, and medical diagnostics.
Further Information:
Greg Corrado is a Principal Scientist at Google, and the co-founder of the Google Brain Team. He works at the nexus of artificial intelligence, computational neuroscience, and scalable machine learning, and has published in fields ranging from behavioral economics, to particle physics, to deep learning. In his time at Google he has worked to put AI directly into the hands of users via products like RankBrain and SmartReply, and into the hands of developers via open-source software releases like TensorFlow and word2vec. He currently leads several research efforts in advanced applications of machine learning, ranging from natural human communication to expanded healthcare availability. Before coming to Google, he worked at IBM Research on neuromorphic silicon devices and large-scale neural simulations. He did his graduate studies in both Neuroscience and Computer Science at Stanford University, and his undergraduate work in Physics at Princeton University.
Steve Mann
(University of Toronto)
Monitorless Workspaces and Operating Rooms of the Future: Virtual/Augmented Reality through Multiharmonic Lock-In Amplifiers.

Date: April 19, 2017
Description:
In my childhood I invented a new kind of lock-in amplifier and used it as the basis for the world's first wearable augmented reality computer (http://wearcam.org/par). This allowed me to see radio waves, sound waves, and electrical signals inside the human body, all aligned perfectly with the physical space in which they were present. I built this equipment into special electric eyeglasses that automatically adjusted their convergence and focus to match their surroundings. By shearing the spacetime continuum, one sees a stroboscopic vision in coordinates in which the speed of light, sound, or wave propagation is exactly zero (http://wearcam.org/kineveillance.pdf), or slowed down, making these signals visible to radio engineers, sound engineers, neurosurgeons, and the like. See below the picture of a violin attached to the desk in my office at Meta, where we're creating the future of computing based on Human-in-the-Loop Intelligence (https://en.wikipedia.org/wiki/Humanistic_intelligence).
Further Information:
Felix Heide
(Stanford University)
Capturing the “Invisible”: Computational Imaging for Robust Sensing and Vision

Date: April 12, 2017
Description:
Imaging has become an essential part of how we communicate with each other, how autonomous agents sense the world and act independently, and how we research chemical reactions and biological processes. Today’s imaging and computer vision systems, however, often fail in critical scenarios, for example in low light or in fog. This is due to ambiguity in the captured images, introduced partly by imperfect capture systems, such as cellphone optics and sensors, and partly present in the signal before measuring, such as photon shot noise. This ambiguity makes imaging with conventional cameras challenging, e.g. low-light cellphone imaging, and it makes high-level computer vision tasks difficult, such as scene segmentation and understanding.
In this talk, I will present several examples of algorithms that computationally resolve this ambiguity and make sensing and vision systems robust. These methods rely on three key ingredients: accurate probabilistic forward models, learned priors, and efficient large-scale optimization methods. In particular, I will show how to achieve better low-light imaging using cell-phones (beating Google’s HDR+), and how to classify images at 3 lux (substantially outperforming very deep convolutional networks, such as the Inception-v4 architecture). Using a similar methodology, I will discuss ways to miniaturize existing camera systems by designing ultra-thin, focus-tunable diffractive optics. Finally, I will present new exotic imaging modalities which enable new applications at the forefront of vision and imaging, such as seeing through scattering media and imaging objects outside direct line of sight.
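To make the "forward model + learned prior + large-scale optimization" recipe concrete, here is a minimal plug-and-play-style reconstruction sketch (the Gaussian-blur forward model and the median-filter stand-in for a learned denoiser are assumptions for illustration, not the methods from the talk):

    import numpy as np
    from scipy.ndimage import gaussian_filter, median_filter

    def forward(x):
        """Stand-in probabilistic forward model: blur by the camera PSF."""
        return gaussian_filter(x, sigma=1.5)

    def denoise(x):
        """Stand-in for a learned prior (e.g., a trained CNN denoiser)."""
        return median_filter(x, size=3)

    def reconstruct(y, n_iters=50, step=1.0):
        """Alternate gradient steps on the data term with prior (denoising) steps."""
        x = y.copy()
        for _ in range(n_iters):
            grad = forward(forward(x) - y)   # A^T(Ax - y); Gaussian blur is ~self-adjoint
            x = denoise(x - step * grad)     # proximal/denoising step
        return x

    # Toy measurement: a blurred, noisy square.
    rng = np.random.default_rng(0)
    truth = np.zeros((64, 64)); truth[24:40, 24:40] = 1.0
    y = forward(truth) + 0.05 * rng.standard_normal(truth.shape)
    x_hat = reconstruct(y)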
Further Information:
Felix Heide is a postdoctoral researcher working with Professor Gordon Wetzstein in the Department of Electrical Engineering at Stanford University. He is interested in the theory and application of computational imaging and vision systems. Researching imaging systems end-to-end, Felix's work lies at the intersection of optics, machine learning, optimization, computer graphics and computer vision. Felix has co-authored over 25 publications and filed 3 patents. He co-founded the mobile vision start-up Algolux. Felix received his Ph.D. in December 2016 at the University of British Columbia under the advisement of Professor Wolfgang Heidrich. His doctoral dissertation focuses on optimization for computational imaging and won the Alain Fournier Ph.D. Dissertation Award.
Peter Gao
(Cruise Automation)
Practical Computer Vision for Self-Driving Cars

Date: April 5, 2017
Description:
Cruise is developing and testing a fleet of self-driving cars on the streets of San Francisco. Getting these cars to drive is a hard engineering and science problem; this talk explains roughly how self-driving cars work and how computer vision, from camera hardware to deep learning, helps make a self-driving car go.
Further Information:
https://www.getcruise.com/ see also http://www.theverge.com/2017/1/19/14327954/gm-self-driving-car-cruise-chevy-bolt-video
Hans Kiening
(Arnold & Richter Cine Technik)
ARRIScope - A new era in surgical microscopy

Date: March 14, 2017
Description:
The continuous increase in performance and the versatility of ARRI's digital motion picture camera systems led to our development of the first fully digital stereoscopic operating microscope, the ARRISCOPE. For the last 18 months, multiple units have been used in clinical trials at renowned clinics in the field of otology in Germany. During our presentation we will cover the obstacles, initial applications, and future potential of 3D camera-based surgical microscopes and give an insight into the technical preconditions and advantages of the digital imaging chain. More Information: http://www.arrimedical.com/
Further Information:
Dr. Hans Kiening is general manager and founder of the medical business unit at Arnold & Richter (ARRI), based in Munich, Germany. He has more than 20 years of experience with image science and sensor technologies. He began his career at ARRI Research & Development in 1996, where he developed an automated image analysis and calibration system for the ARRILASER (a solid-state laser film recorder). In 2012, he conceptualized and realized the development of a purely digital surgical microscope based on the Alexa camera system – the ARRISCOPE. He holds a lectureship at the University of Applied Sciences in Munich and is the author of many SMPTE papers (journal award 2006) and medical/media image science-related patents. He holds a PhD in image science from the University of Cottbus, Germany.
Christian Sandor
(Nara Institute of Science and Technology)
Breaking the Barriers to True Augmented Reality

Date: March 14, 2017
Description:
In 1950, Alan Turing introduced the Turing Test, an essential concept in the philosophy of Artificial Intelligence (AI). He proposed an “imitation game” to test the sophistication of an AI software. Similar tests have been suggested for fields including Computer Graphics and Visual Computing. In this talk, we will propose an Augmented Reality Turing Test (ARTT).
Augmented Reality (AR) embeds spatially-registered computer graphics in the user’s view in realtime. This capability can be used for a lot of purposes; for example, AR hands can demonstrate manual repair steps to a mechanic. To pass the ARTT, we must create AR objects that are indistinguishable from real objects. Ray Kurzweil bet USD 20,000 that the Turing Test will be passed by 2029. We think that the ARTT can be passed significantly earlier.
We will discuss the grand challenges for passing the ARTT, including: calibration, localization & tracking, modeling, rendering, display technology, and multimodal AR. We will also show examples from our previous and current work at Nara Institute of Science and Technology in Japan.
Further Information:
Dr. Christian Sandor is an Associate Professor at one of Japan’s most prestigious research universities, Nara Institute of Science and Technology (NAIST), where he is co-directing the Interactive Media Design Lab together with Professor Hirokazu Kato. Since the year 2000, his foremost research interest is Augmented Reality, as he believes that it will have a profound impact on the future of mankind.
In 2005, he obtained a doctorate in Computer Science from the Munich University of Technology, Germany under the supervision of Prof. Gudrun Klinker and Prof. Steven Feiner. He decided to explore the research world in the spirit of Alexander von Humboldt and has lived outside of Germany ever since to work with leading research groups at institutions including: Columbia University (New York, USA), Canon’s Leading-Edge Technology Research Headquarters (Tokyo, Japan), Graz University of Technology (Austria), University of Stuttgart (Germany), and Tohoku University (Japan).
Before joining NAIST, he directed the Magic Vision Lab (http://www.magicvisionlab.com ). Together with his students, he won awards at the premier Augmented Reality conference, IEEE International Symposium on Mixed and Augmented Reality (ISMAR): best demo (2011) and best poster honourable mention (2012, 2013). He presented several keynotes and acquired funding close to 1.5 million dollars; in 2012, Magic Vision Lab was the first, and still only, Australian lab to be awarded in Samsung’s Global Research Outreach Program. In 2014, he received a Google Faculty Award for creating an Augmented Reality X-Ray system for Google Glass.
Jon Shlens
(Google)
A Learned Representation for Artistic Style

Date: March 1, 2017
Description:
The diversity of painting styles represents a rich visual vocabulary for the construction of an image. The degree to which one may learn and parsimoniously capture this visual vocabulary measures our understanding of the higher level features of paintings, if not images in general. In this work we investigate the construction of a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings. We demonstrate that such a network generalizes across a diversity of artistic styles by reducing a painting to a point in an embedding space. Importantly, this model permits a user to explore new painting styles by arbitrarily combining the styles learned from individual paintings. We hope that this work provides a useful step towards building rich models of paintings and offers a window on to the structure of the learned representation of artistic style.
More information: https://research.google.com/pubs/pub45832.html
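One way to picture "a painting as a point in an embedding space" is conditional instance normalization, where each style contributes only a per-channel scale and shift applied to normalized features; the sketch below is illustrative (the surrounding style-transfer network and the dimensions are placeholders, not the paper's exact configuration):

    import torch
    import torch.nn as nn

    class ConditionalInstanceNorm(nn.Module):
        """Normalize features, then apply a per-style scale (gamma) and shift (beta).
        Each row of gamma/beta is the learned embedding of one painting;
        blending rows blends styles."""
        def __init__(self, n_channels, n_styles):
            super().__init__()
            self.norm = nn.InstanceNorm2d(n_channels, affine=False)
            self.gamma = nn.Parameter(torch.ones(n_styles, n_channels))
            self.beta = nn.Parameter(torch.zeros(n_styles, n_channels))
        def forward(self, x, style_weights):
            # style_weights: (n_styles,) convex weights; one-hot selects one style.
            g = style_weights @ self.gamma
            b = style_weights @ self.beta
            return self.norm(x) * g.view(1, -1, 1, 1) + b.view(1, -1, 1, 1)

    # Example: blend two learned styles 70/30 on a feature map.
    cin = ConditionalInstanceNorm(n_channels=64, n_styles=32)
    feats = torch.randn(1, 64, 128, 128)
    w = torch.zeros(32); w[3], w[7] = 0.7, 0.3
    out = cin(feats, w)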
Further Information:
Jonathon Shlens received his Ph.D in computational neuroscience from UC San Diego in 2007 where his research focused on applying machine learning towards understanding visual processing in real biological systems. He was previously a research fellow at the Howard Hughes Medical Institute, a research engineer at Pixar Animation Studios and a Miller Fellow at UC Berkeley. He has been at Google Research since 2010 and is currently a research scientist focused on building scalable vision systems. During his time at Google, he has been a core contributor to deep learning systems including the recently open-sourced TensorFlow. His research interests have spanned the development of state-of-the-art image recognition systems and training algorithms for deep networks.
Henry Fuchs
(University of North Carolina)
The AR/VR Renaissance: promises, disappointments, unsolved problems

Date: March 1, 2017
Description:
Augmented and Virtual Reality have been hailed as "the next big thing" several times in the past 25 years. Some are predicting that VR will be the next computing platform, or at least the next platform for social media. Others worry that today's VR systems are closer to the 1990s Apple Newton than the 2007 Apple iPhone. This talk will feature a short, personal history of AR and VR, a survey of some current work, sample applications, and remaining problems. Current work with encouraging results includes 3D acquisition of dynamic, populated spaces; compact and wide field-of-view AR displays; low-latency and high-dynamic-range AR display systems; and AR lightfield displays that may reduce the accommodation-vergence conflict.
More information: http://henryfuchs.web.unc.edu/
Further Information:
Henry Fuchs (PhD, Utah, 1975) is the Federico Gil Distinguished Professor of Computer Science and Adjunct Professor of Biomedical Engineering at UNC Chapel Hill, coauthor of over 200 papers, mostly on rendering algorithms (BSP Trees), graphics hardware (Pixel-Planes), head-mounted / near-eye and large-format displays, virtual and augmented reality, telepresence, medical and training applications. He is a member of the National Academy of Engineering, a fellow of the American Academy of Arts and Sciences, recipient of the 2013 IEEE VGTC Virtual Reality Career Award, and the 2015 ACM SIGGRAPH Steven Anson Coons Award.
Vivek Goyal
(Boston University)
First-Photon Imaging and Other Imaging with Few Photons

Date: February 22, 2017
Description:
LIDAR systems use single-photon detectors to enable long-range reflectivity and depth imaging. By exploiting an inhomogeneous Poisson process observation model and the typical structure of natural scenes, first-photon imaging demonstrates the possibility of accurate LIDAR with only 1 detected photon per pixel, where half of the detections are due to (uninformative) ambient light. I will explain the simple ideas behind first-photon imaging. Then I will touch upon related subsequent works that mitigate the limitations of detector arrays, withstand 25-times more ambient light, allow for unknown ambient light levels, and capture multiple depths per pixel.
More information: http://www.bu.edu/eng/profile/vivek-goyal/
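A hedged sketch of the kind of observation model behind first-photon imaging (the notation here is illustrative, not taken verbatim from the talk): with pulsed illumination, the photon count detected at a pixel during one pulse-repetition period can be modeled as Poisson with mean \eta\,\alpha\,S + B, where \alpha is the pixel reflectivity, S the back-reflected signal level, B the ambient/dark contribution, and \eta the detection efficiency. The probability that a given pulse yields at least one detection is

    p(\alpha) = 1 - e^{-(\eta \alpha S + B)},

so the index n of the pulse that produces the first photon is geometrically distributed, \Pr[N = n] = (1 - p)^{\,n-1} p. A single first-photon index already carries reflectivity information: maximizing the likelihood gives \hat{p} = 1/n, i.e.

    \hat{\alpha} = \frac{1}{\eta S}\left(\ln\frac{n}{n-1} - B\right),

which can then be combined with the photon's time stamp and spatial priors on natural scenes to recover reflectivity and depth maps.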
Further Information:
Vivek Goyal received the M.S. and Ph.D. degrees in electrical engineering from the University of California, Berkeley, where he received the Eliahu Jury Award for outstanding achievement in systems, communications, control, or signal processing. He was a Member of Technical Staff at Bell Laboratories, a Senior Research Engineer for Digital Fountain, and the Esther and Harold E. Edgerton Associate Professor of Electrical Engineering at MIT. He was an adviser to 3dim Tech, winner of the MIT $100K Entrepreneurship Competition Launch Contest Grand Prize, and consequently with Nest Labs. He is now an Associate Professor of Electrical and Computer Engineering at Boston University.
Abe Davis
(Stanford University)
Visual Vibration Analysis

Date: February 15, 2017
Description:
I will show how video can be a powerful way to measure physical vibrations. By relating the frequencies of subtle, often imperceptible changes in video to the vibrations of visible objects, we can reason about the physical properties of those objects and the forces that drive their motion. In my talk I'll show how this can be used to recover sound from silent video (Visual Microphone), estimate the material properties of visible objects (Visual Vibrometry), and learn enough about the physics of objects to create plausible image-space simulations (Dynamic Video).
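A toy version of the core idea, recovering a vibration spectrum from tiny frame-to-frame intensity changes (a simplification; the actual work uses local phase in complex steerable pyramids and far more careful processing):

    import numpy as np

    def vibration_spectrum(frames, fps):
        """frames: (T, H, W) grayscale video as floats.
        Returns (freqs_hz, power) of a global micro-motion signal."""
        frames = frames - frames.mean(axis=0, keepdims=True)       # remove static scene
        signal = frames.reshape(frames.shape[0], -1).mean(axis=1)  # one value per frame
        signal = signal - signal.mean()
        power = np.abs(np.fft.rfft(signal * np.hanning(signal.size))) ** 2
        freqs_hz = np.fft.rfftfreq(signal.size, d=1.0 / fps)
        return freqs_hz, power

    # Synthetic example: a 12 Hz vibration buried in noise, 240 fps video.
    rng = np.random.default_rng(0)
    t = np.arange(480) / 240.0
    frames = 0.01 * np.sin(2 * np.pi * 12 * t)[:, None, None] \
             + 0.05 * rng.standard_normal((480, 32, 32))
    f, p = vibration_spectrum(frames, fps=240)
    print(f[np.argmax(p[1:]) + 1])   # peaks near 12 Hz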
Further Information:
Abe Davis is a new postdoc at Stanford working with Doug James. He recently completed his PhD at MIT, where he was advised by Fredo Durand. His thesis focused on analyzing subtle variations in video to reason about physical vibrations. Abe has explored applications of his work in graphics, vision, and civil engineering, with publications in SIGGRAPH, SIGGRAPH Asia, and CVPR, as well as top venues in structural health monitoring and nondestructive testing. His dissertation won the 2016 Sprowls award for outstanding thesis in computer science. Abe’s research has been featured in most major news outlets that cover science and technology. Business Insider named him one of the “8 most innovative scientists in tech and engineering” in 2015, and Forbes named him one of their “30 under 30” in 2016.
Trevor Darrell
(UC Berkeley)
Adversarial perceptual representation learning across diverse modalities and domains

Date: February 8, 2017
Description:
Learning of layered or “deep” representations has provided significant advances in computer vision in recent years, but has traditionally been limited to fully supervised settings with very large amounts of training data. New results in adversarial adaptive representation learning show how such methods can also excel when learning in sparse/weakly labeled settings across modalities and domains. I’ll review state-of-the-art models for fully convolutional pixel-dense segmentation from weakly labeled input, and will discuss new methods for adapting models to new domains with few or no target labels for categories of interest. As time permits, I’ll present recent long-term recurrent network models that learn cross-modal description and explanation, visuomotor robotic policies that adapt to new domains, and deep autonomous driving policies that can be learned from heterogeneous large-scale dashcam video datasets.
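One common instantiation of adversarial adaptive representation learning is domain-adversarial training with a gradient-reversal layer; the following is a generic sketch (the feature extractor, heads, and loss weighting are placeholders, not the specific models in the talk):

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass; multiplies the gradient by -lambda on the
        way back, so the feature extractor learns domain-confusing features."""
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.clone()
        @staticmethod
        def backward(ctx, grad_out):
            return -ctx.lam * grad_out, None

    features = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
    classifier = nn.Linear(128, 10)     # task head (labels only in the source domain)
    domain_disc = nn.Linear(128, 2)     # source-vs-target discriminator
    ce = nn.CrossEntropyLoss()

    def step(x_src, y_src, x_tgt, lam=0.1):
        f_src, f_tgt = features(x_src), features(x_tgt)
        task_loss = ce(classifier(f_src), y_src)
        dom_feats = torch.cat([GradReverse.apply(f, lam) for f in (f_src, f_tgt)])
        dom_labels = torch.cat([torch.zeros(len(x_src)), torch.ones(len(x_tgt))]).long()
        dom_loss = ce(domain_disc(dom_feats), dom_labels)
        return task_loss + dom_loss

    loss = step(torch.randn(8, 256), torch.randint(0, 10, (8,)), torch.randn(8, 256))
    loss.backward()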
Further Information:
Prof. Darrell is on the faculty of the CS Division of the EECS Department at UC Berkeley and he is also appointed at the UC-affiliated International Computer Science Institute (ICSI). Darrell's group develops algorithms for large-scale perceptual learning, including object and activity recognition and detection, for a variety of applications including multimodal interaction with robots and mobile devices. His interests include computer vision, machine learning, computer graphics, and perception-based human computer interfaces. Prof. Darrell was previously on the faculty of the MIT EECS department from 1999-2008, where he directed the Vision Interface Group. He was a member of the research staff at Interval Research Corporation from 1996-1999, and received the S.M. and Ph.D. degrees from MIT in 1992 and 1996, respectively. He obtained the B.S.E. degree from the University of Pennsylvania in 1988, having started his career in computer vision as an undergraduate researcher in Ruzena Bajcsy's GRASP lab.
Keisuke Goda
(University of Tokyo)
High-speed imaging meets single-cell analysis

Date: January 27, 2017
Description:
High-speed imaging is an indispensable tool for blur-free observation and monitoring of fast transient dynamics in today’s scientific research, industry, defense, and energy. The field of high-speed imaging has steadily grown since Eadweard Muybridge demonstrated motion-picture photography in 1878. High-speed cameras are commonly used for sports, manufacturing, collision testing, robotic vision, missile tracking, and fusion science and are even available to professional photographers. Over the last few years, high-speed imaging has been shown highly effective for single-cell analysis – the study of individual biological cells among populations for identifying cell-to-cell differences and elucidating cellular heterogeneity invisible to population-averaged measurements. The marriage of these seemingly unrelated disciplines has been made possible by exploiting high-speed imaging’s capability of acquiring information-rich images at high frame rates to obtain a snapshot library of numerous cells in a short duration of time (with one cell per frame), which is useful for accurate statistical analysis of the cells. This is a paradigm shift in the field of high-speed imaging since the approach is radically different from its traditional use in slow-motion analysis. In this talk, I introduce a few different methods for high-speed imaging and their application to single-cell analysis for precision medicine and green energy.
Further Information:
Keisuke Goda is Department Chair and a Professor of Physical Chemistry in the Department of Chemistry at the University of Tokyo and holds an adjunct faculty position at UCLA. He obtained his BA degree summa cum laude from UC Berkeley in 2001 and his PhD from MIT in 2007, both in physics. At MIT, he worked on the development of quantum-enhancement techniques in LIGO for gravitational-wave detection. His research currently focuses on the development of innovative laser-based molecular imaging and spectroscopy methods for data-driven science. He has been awarded the Gravitational Wave International Committee Thesis Award (2008), Burroughs Wellcome Fund Career Award (2011), Konica Minolta Imaging Science Award (2013), IEEE Photonics Society Distinguished Lecturers Award (2014), and WIRED Audi Innovation Award (2016). He serves as an Associate Editor for APL Photonics (AIP Publishing) and a Young Global Leader for World Economic Forum.
Alfredo Dubra
(Stanford University)
Adaptive optics retinal imaging: more than just high-resolution

Date: January 18, 2017
Description:
The majority of the cells in the retina do not reproduce, making early diagnosis of eye disease paramount. Through the improved resolution provided by correction of the ocular monochromatic aberrations, adaptive optics combined with conventional and novel imaging techniques reveal pathology at the cellular scale. When compared with existing clinical tools, the ability to visualize retinal cells and microscopic structures non-invasively represents a quantum leap in the potential for diagnosing and managing ocular, systemic and neurological diseases. The presentation will first cover the adaptive optics technology itself and some of its unique technical challenges. This will be followed by a review of AO-enhanced imaging modalities applied to the study of the healthy and diseased eye, with particular focus on multiple-scattering imaging to reveal transparent retinal structures.
Further Information:
Alfredo (Alf) Dubra is an Associate Professor of Ophthalmology at Stanford (Byers Eye Institute). He trained in Physics at the Universidad de la República in Uruguay (BSc and MSc) and at Imperial College London in the United Kingdom (PhD). Before joining Stanford, he was with the University of Rochester and the Medical College of Wisconsin. His research focuses on the translation of mathematical, engineering and optical tools for diagnosing, monitoring the progression of, and treating ocular disease.
Emily Cooper
(Dartmouth College)
Designing and assessing near-eye displays to increase user inclusivity

Date: January 11, 2017
Description:
Recent years have seen impressive growth in near-eye display systems, which are the basis of most virtual and augmented reality experiences. There is, however, a unique set of challenges to designing a display system that is literally strapped to the user's face. With an estimated half of all adults in the United States requiring some level of visual correction, maximizing inclusivity for near-eye displays is essential. I will describe work that combines principles from optics, optometry, and visual perception to identify and address major limitations of near-eye displays, both for users with normal vision and for those who require common corrective lenses. I will also describe ongoing work assessing the potential for near-eye displays to assist people with less common visual impairments in performing day-to-day tasks.
Further Information:
Emily Cooper is an assistant research professor in the Psychological and Brain Sciences Department at Dartmouth College. Emily’s research focuses on basic and applied visual perception, including 3D vision and perceptual graphics. She received her B.A. in Psychology and English Literature from the University of Chicago in 2007. She received her Ph.D. in Neuroscience from the University of California, Berkeley in 2012. Following a postdoctoral fellowship at Stanford University, she joined the faculty at Dartmouth College in 2015.
- Liang Gao » Compressed Ultrafast Photography and Microscopy
- Hakan Urey » Next Gen Wearable AR Display
- Kaan Aksit » Near-Eye Varifocal Augmented Reality Displays
- Andrew Jones » Interactive 3D Digital Humans
- Stefan Williams » Australian Centre for Field Robotics
- Rudolf Oldenbourg and Talon Chandler » Mapping molecular orientation
- Basel Salahieh » Light field Retargeting
- Andrew Maimone » Holographic Near-Eye Displays
- Gordon Wetzstein » Computational Near-Eye Displays
- Douglas Lanman » Focal Surface Displays
- Donald Dansereau » Computational Imaging for Robotic Vision
- Reza Zadeh » FusionNet
- Alex Hegyi » Hyperspectral imaging
- Kari Pulli » Heterogeneous Computational Imaging
- Greg Corrado » Deep Learning Imaging Applications
- Steve Mann » Monitorless Workspaces and Operating Rooms of the Future
- Felix Heide » Computational Imaging for Robust Sensing and Vision
- Peter Gao » Practical Computer Vision for Self-Driving Cars
- Hans Kiening » ARRIScope - A new era in surgical microscopy
- Christian Sandor » Breaking the Barriers to True Augmented Reality
- Jon Shlens » A Learned Representation for Artistic Style
- Henry Fuchs » The AR/VR Renaissance
- Vivek Goyal » First-Photon Imaging and Other Imaging with Few Photons
- Abe Davis » Visual Vibration Analysis
- Trevor Darrell » Adversarial perceptual representation
- Keisuke Goda » High-speed imaging meets single-cell analysis
- Alfredo Dubra » Adaptive optics retinal imaging: more than just high-resolution
- Emily Cooper » Designing and assessing near-eye displays
SCIEN Colloquia 2016
Daniel Palanker
(Stanford University)
Electronic augmentation of body functions: progress in electro-neural interfaces

Date: December 6, 2016
Description:
The electrical nature of neural signaling allows efficient bi-directional electrical communication with the nervous system. Currently, electro-neural interfaces are utilized for partial restoration of sensory functions, such as hearing and sight, actuation of prosthetic limbs and restoration of tactile sensitivity, enhancement of tear secretion, and many others. Deep brain stimulation helps control tremor in patients with Parkinson's disease and improves muscle control in dystonia and other neurological disorders. With technological advances and progress in understanding of neural systems, these interfaces may allow not only restoration or augmentation of lost functions, but also expansion of our natural capabilities – sensory, cognitive and others. I will review the state of the field and future directions of technological development.
Further Information:
Daniel Palanker is a Professor in the Department of Ophthalmology and Director of the Hansen Experimental Physics Laboratory at Stanford University. He received an MSc in Physics in 1984 from Yerevan State University in Armenia, and a PhD in Applied Physics in 1994 from the Hebrew University of Jerusalem, Israel.
Dr. Palanker studies interactions of electric fields with biological cells and tissues, and develops optical and electronic technologies for diagnostic, therapeutic, surgical and prosthetic applications, primarily in ophthalmology. These studies include laser-tissue interactions with applications to ocular therapy and surgery, and interferometric imaging of neural signals. In the field of electro-neural interfaces, Dr. Palanker is developing a retinal prosthesis for restoration of sight to the blind and implants for electronic control of secretory glands and blood vessels.
Several of his developments are in clinical practice world-wide: Pulsed Electron Avalanche Knife (PEAK PlasmaBlade), Patterned Scanning Laser Photocoagulator (PASCAL), and OCT-guided Laser System for Cataract Surgery (Catalys). Several others are in clinical trials: Gene therapy of the retinal pigment epithelium (Ocular BioFactory, Avalanche Biotechnologies Inc); Neural stimulation for enhanced tear secretion (TearBud, Allergan Inc.); Smartphone-based ophthalmic diagnostics and monitoring (Paxos, DigiSight Inc.).
Alexandre Alahi
(Stanford University)
Towards Socially-aware AI

Date: November 15, 2016
Description:
In this talk, I will present my work towards socially-aware machines that can understand human social dynamics and learn to forecast them. First, I will highlight the machine vision techniques behind understanding the behavior of more than 100 million individuals captured by multi-modal cameras in urban spaces. I will show how to use sparsity-promoting priors to extract meaningful information about human behavior. Second, I will introduce a new deep learning method to forecast human social behavior. The causality behind human behavior is an interplay between both observable and non-observable cues (e.g., intentions). For instance, when humans walk into crowded urban environments such as a busy train terminal, they obey a large number of (unwritten) common sense rules and comply with social conventions. They typically avoid crossing groups and keep a personal distance from their surroundings. I will present detailed insights on how to learn these interactions from millions of trajectories. I will describe a new recurrent neural network that can jointly reason on correlated sequences and forecast human trajectories in crowded scenes. It opens new avenues of research in learning the causalities behind the world we observe. I will conclude my talk by mentioning some ongoing work in applying these techniques to social robots and the future generations of smart hospitals.
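A stripped-down sketch of recurrent trajectory forecasting (this toy model omits the social pooling between neighboring pedestrians that is central to the actual work; dimensions and the displacement parameterization are illustrative):

    import torch
    import torch.nn as nn

    class TrajectoryLSTM(nn.Module):
        """Encode an observed (x, y) track and roll out future positions."""
        def __init__(self, hidden=64):
            super().__init__()
            self.embed = nn.Linear(2, hidden)
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 2)    # predicted displacement per step
        def forward(self, obs, n_future=12):
            # obs: (B, T_obs, 2) observed positions
            _, state = self.lstm(torch.relu(self.embed(obs)))
            pos, preds = obs[:, -1], []
            for _ in range(n_future):
                pos = pos + self.head(state[0][-1])   # integrate predicted displacement
                preds.append(pos)
                _, state = self.lstm(torch.relu(self.embed(pos)).unsqueeze(1), state)
            return torch.stack(preds, dim=1)          # (B, n_future, 2)

    model = TrajectoryLSTM()
    future = model(torch.randn(4, 8, 2))   # 8 observed steps -> 12 predicted steps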
Further Information:
Alexandre Alahi is currently a research scientist at Stanford University; he received his PhD from EPFL in Switzerland (nominated for the EPFL PhD prize). His research enables machines to perceive the world and make decisions in the context of transportation problems and built environments at all scales. His work is centered on understanding and predicting human social behavior with multi-modal data. He has worked on the theoretical and practical applications of socially-aware Artificial Intelligence. He was awarded the Swiss NSF early and advanced researcher grants for his work on predicting human social behavior. He won the CVPR 2012 Open Source Award for his work on a retina-inspired image descriptor, and the ICDSC 2009 Challenge Prize for his sparsity-driven algorithm that has tracked more than 100 million pedestrians in train terminals. His research has been covered internationally by the BBC, Euronews, and the Wall Street Journal, as well as by national news outlets in the US and Switzerland. Finally, he co-founded the startup Visiosafe, won several startup competitions, and was selected as one of the Top 20 Swiss Venture Leaders in 2010.
Saverio Murgia
(Horus Technology)
Designing a smart wearable camera for blind and visually impaired people

Date: November 15, 2016
Description:
Horus Technology was founded in July 2014 with the goal of creating a smart wearable camera for blind and visually impaired people, featuring intelligent algorithms that can understand the environment around the user and describe it out loud. Two years later, Horus has a working prototype being tested by a number of blind people in Europe and North America. Harnessing the power of portable GPUs, stereo vision, and deep learning algorithms, Horus can read text in different languages, learn and recognize faces and objects, and identify obstacles. In designing a wearable device, we had to face a number of challenges and difficult choices. We will describe our system and our design choices for both software and hardware, and end with a short demo of Horus's capabilities.
Further Information:
Founder and CEO of Horus Technology, Saverio Murgia is passionate about machine learning, computer vision and robotics. Both an engineer and an entrepreneur, in 2015 he obtained a double MSc/MEng in Advanced Robotics from the Ecole Centrale de Nantes and the University of Genoa. He also holds a degree in management from ISICT and a BSc in Biomedical Engineering from the University of Genoa. Before founding Horus Technology, Saverio was a visiting researcher at EPFL and the Italian Institute of Technology.
Emanuele Mandelli
(InVisage Technologies)
Quantum dot-based image sensors for cutting-edge commercial multispectral cameras

Date: November 9, 2016
Description:
This work presents the development of a quantum dot-based photosensitive film engineered to be integrated on standard CMOS process wafers. It enables the design of exceptionally high performance, reliable image sensors. Quantum dot solids absorb light much more rapidly than typical silicon-based photodiodes do, and with the ability to tune the effective material bandgap, quantum dot-based imagers enable higher quantum efficiency over extended spectral bands, in both the visible and IR regions of the spectrum. Moreover, a quantum dot-based image sensor enables desirable functions such as ultra-small pixels with low crosstalk, high full-well capacity, global shutter and wide dynamic range at a relatively low manufacturing cost. At InVisage, we have optimized the manufacturing process flow and are now able to produce high-end image sensors for both visible and NIR in quantity.
Further Information:
Emanuele is Vice President of Engineering at InVisage Technologies, an advanced materials and camera platform company based in Menlo Park, CA. He has more than 20 years of experience with image sensors, X-ray, and particle physics detectors. He began his career at the Lawrence Berkeley National Laboratory, where he designed integrated circuits for high energy physics and helped deliver the pixel readout modules for the ATLAS inner detector at CERN's Large Hadron Collider that confirmed the Higgs boson theory. He then joined AltaSens, an early stage startup company spun off from Rockwell Scientific and later acquired by JVC Kenwood, where he designed high-end CMOS image sensors for cinematographers, television broadcasters and filmmakers. He has been a reviewer for the NSS-MIC conference and is the author of numerous papers and image sensor-related patents. He holds a PhD, MS and BS in Electrical Engineering and Computer Science from the University of Pavia, Italy.
Christy Fernandez-Cull
(MIT Lincoln Laboratory)
Smart pixel imaging with computational arrays

Date: November 1, 2016
Description:
This talk will review architectures for computational imaging arrays where algorithms and cameras are co-designed. The talk will focus on novel digital readout integrated circuits (DROICs) that achieve snapshot on-chip high dynamic range and object tracking where most commercial systems require a multiple exposure acquisition.
Further Information:
Christy Fernandez-Cull received her M.S. and Ph.D. in Electrical and Computer Engineering from Duke University. She has worked at MIT Lincoln Laboratory as a member of the technical staff in the Sensor Technology and System Applications group and is a research affiliate with the Camera Culture Group at MIT Media Laboratory. She is an active member in the OSA, SPIE, IEEE, and SHPE. Christy has worked on and published papers pertaining to computational imager design, coded aperture systems, photonics, compressive holography, weather satellites, periscope systems and millimeter-wave holography systems. Her interests include research and development efforts in science and technology, volunteering to stimulate science, technology, engineering, and math disciplines in K-12 grades, and keeping up-to-date with advances in science policy.
Matthew O'Toole
(Stanford University)
Optical Probing for Analyzing Light Transport

Date: October 18, 2016
Description:
Active illumination techniques enable self-driving cars to detect and avoid hazards, optical microscopes to see deep into volumetric specimens, and light stages to digitally capture the shape and appearance of subjects. These active techniques work by using controllable lights to emit structured illumination patterns into an environment, and sensors to detect and process the light reflected back in response. Although such techniques confer many unique imaging capabilities, they often require long acquisition and processing times, rely on predictive models for the way light interacts with a scene, and cease to function when exposed to bright ambient sunlight.
In this talk, we introduce a generalized form of active illumination—known as optical probing—that provides a user with unprecedented control over which light paths contribute to a photo. The key idea is to project a sequence of illumination patterns onto a scene, while simultaneously using a second sequence of mask patterns to physically block the light received at select sensor pixels. This all-optical technique enables RAW photos to be captured in which specific light paths are blocked, attenuated, or enhanced. We demonstrate experimental probing prototypes with the ability to (1) record live direct-only or indirect-only video streams of a scene, (2) capture the 3D shape of objects in the presence of complex transport properties and strong ambient illumination, and (3) overcome the multi-path interference problem associated with time-of-flight sensors.
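A compact way to see what probing does (a sketch following the general transport-probing formulation in the literature; the specific symbols here are illustrative): let \mathbf{T} be the scene's light transport matrix, mapping projector pixels to sensor pixels. Projecting patterns \mathbf{p}_k while masking the sensor with patterns \mathbf{m}_k and summing optically over k yields

    \mathbf{i} = \sum_k \mathbf{m}_k \odot (\mathbf{T}\,\mathbf{p}_k) = (\boldsymbol{\Pi} \odot \mathbf{T})\,\mathbf{1}, \qquad \boldsymbol{\Pi} = \sum_k \mathbf{m}_k \mathbf{p}_k^{\top},

so the pair of pattern sequences defines a probing matrix \boldsymbol{\Pi} that re-weights individual light paths before they are ever recorded. Choosing \boldsymbol{\Pi} close to the identity passes predominantly direct (epipolar-aligned) paths, while a complementary choice passes predominantly indirect light, all within a single stream of RAW exposures.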
Further Information:
Matthew O’Toole is a Banting Postdoctoral Fellow at Stanford’s Computational Imaging group. He received his Ph.D. degree from the University of Toronto in 2016 on research related to active illumination and light transport. He organized the IEEE 2016 International Workshop on Computational Cameras and Displays, and was a visiting student at the MIT Media Lab’s Camera Culture group in 2011. His work was the recipient of two “Best Paper Honorable Mention” awards at CVPR 2014 and ICCV 2007, and two “Best Demo” awards at CVPR 2015 and ICCP 2015.
Brian Cabral
(Facebook)
The Soul of a New Camera: The design of Facebook's Surround Open Source 3D-360 video camera

Date: October 12, 2016
Description:
Around a year ago we set out to create an open-source reference design for a 3D-360 camera. In nine months, we had designed and built the camera and published the specs and code. Our team leveraged a series of maturing technologies in this effort: advances in and availability of sensor technology, 20+ years of computer vision algorithm development, 3D printing, rapid design prototyping, and computational photography allowed our team to move extremely fast. We will delve into the roles each of these technologies played in the design of the camera, giving an overview of the system components and discussing the tradeoffs made during the design process. The engineering complexities and technical elements of 360 stereoscopic video capture will be discussed as well. We will end with some demos of the system and its output.
Further Information:
Brian Cabral is Director of Engineering at Facebook specializing in computational photography, computer vision, and computer graphics. He is the holder of numerous patents (filed and issued) and leads the Surround 360 VR camera team. He has published a number of diverse papers in the area of computer graphics and imaging including the pioneering Line Integral Convolution algorithm.
Bernard Kress
(Microsoft Research)
Human-centric optical design: a key for next generation AR and VR optics

Date: October 4, 2016
Description:
The ultimate wearable display is an information device that people can use all day. It should be as forgettable as a pair of glasses or a watch, but more useful than a smart phone. It should be small, light, low-power, high-resolution and have a large field of view (FOV). Oh, and one more thing, it should be able to switch from VR to AR.
These requirements pose challenges for hardware and, most importantly, optical design. In this talk, I will review existing AR and VR optical architectures and explain why it is difficult to create a small, light and high-resolution display that has a wide FOV. Because comfort is king, new optical designs for the next-generation AR and VR system should be guided by an understanding of the capabilities and limitations of the human visual system.
Further Information:
Over the past two decades, Bernard has made significant scientific contributions as an engineer, researcher, associate professor, consultant, instructor, and author.
He has been instrumental in developing numerous optical sub-systems for consumer electronics and industrial products, generating IP, teaching and transferring technological solutions to industry. Application sectors include laser materials processing, optical anti-counterfeiting, biotech sensors, optical telecom devices, optical data storage, optical computing, optical motion sensors, digital image projection, displays, depth map sensors, and more recently head-up and head-mounted displays (smart glasses, AR and VR).
He is specifically involved in the fields of micro-optics, wafer-scale optics, holography and nanophotonics.
Bernard has 32 patents granted worldwide and has published numerous books and book chapters on micro-optics. He is a short-course instructor for SPIE and has been involved in numerous SPIE conferences as a technical committee member and conference co-chair.
He has been an SPIE Fellow since 2013 and was recently elected to the SPIE Board of Directors.
Bernard joined Google [X] Labs in 2011 as the Principal Optical Architect, and is now Partner Optical Architect at Microsoft Corp. on the HoloLens project.
Gal Chechik
(Google & Bar-Ilan University)
Machine learning for large-scale image understanding

Date: May 11, 2016
Description:
The recent progress in recognizing visual objects and annotating images has been driven by super-rich models and massive datasets. However, machine vision models still have a very limited ‘understanding’ of images, rendering them brittle when attempting to generalize to unseen examples. I will describe recent efforts to improve the robustness and accuracy of systems for annotating and retrieving images, first, by using structure in the space of images and fusing various types of information about image labels, and second, by matching structures in visual scenes to structures in their corresponding language descriptions or queries. We apply these approaches to billions of queries and images, to improve search and annotation of public images and personal photos.
Further Information:
Gal Chechik is a professor at the Gonda brain research center, Bar-Ilan University, Israel, and a senior research scientist at Google. His work focuses on learning in brains and in machines. Specifically, he studies the principles governing representation and adaptivity at multiple timescales in the brain, and algorithms for training computers to represent signals and learn from examples. Gal earned his PhD in 2004 from the Hebrew University of Jerusalem developing machine learning and probabilistic methods to understand the auditory neural code. He then studied computational principles regulating molecular cellular pathways as a postdoctoral researcher at the CS dept in Stanford. In 2007, he joined Google research as a senior research scientist, developing large-scale machine learning algorithms for machine perception. Since 2009, he heads the computational neurobiology lab at BIU and was appointed an associate professor in 2013. He was awarded a Fulbright fellowship, a complexity scholarship and the Israeli national Alon fellowship.
Matthias Niessner
(Stanford University)
Interactive 3D: Static and Dynamic Environment Capture in Real Time

Date: May 4, 2016
Description:
In recent years, commodity 3D sensors have become easily and widely available. These advances in sensing technology have inspired significant interest in using captured 3D data for mapping and understanding 3D environments. In this talk, I will show how we can now easily obtain 3D reconstructions of static and dynamic environments in an interactive manner, and how we can process and utilize the data efficiently in real time on modern graphics hardware. In a concrete example application for 3D reconstruction, I will talk about facial reenactment where we use an intermediate 3D reconstruction to interactively edit videos in real time.
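Real-time systems of this kind typically fuse each incoming depth frame into a volumetric truncated signed distance function (TSDF). The sketch below is a minimal NumPy version of that per-voxel running-average update, under the assumptions that the camera sits at the origin and that `K` is a standard pinhole intrinsic matrix; it illustrates the general technique, not the speaker's GPU implementation.

```python
import numpy as np

def integrate_tsdf(tsdf, weight, voxel_xyz, depth, K, trunc=0.05):
    """Fuse one depth frame (meters) into a TSDF volume with a running weighted average.
    voxel_xyz: (N, 3) voxel centers in camera coordinates; K: 3x3 pinhole intrinsics
    with last row [0, 0, 1]; tsdf, weight: flat arrays of length N."""
    z = voxel_xyz[:, 2]
    zs = np.where(z > 1e-6, z, 1.0)                      # avoid dividing by zero
    proj = (K @ voxel_xyz.T).T
    u = np.round(proj[:, 0] / zs).astype(int)
    v = np.round(proj[:, 1] / zs).astype(int)
    h, w = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    sdf = d - z                                          # signed distance along the viewing ray
    valid &= (d > 0) & (sdf > -trunc)
    new_tsdf = np.clip(sdf / trunc, -1.0, 1.0)
    new_weight = weight + valid                          # each observation adds weight 1
    tsdf[valid] = (tsdf[valid] * weight[valid] + new_tsdf[valid]) / new_weight[valid]
    return tsdf, new_weight
```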
Further Information:
Matthias Niessner is a visiting assistant professor at Stanford University. Prior to his appointment at Stanford, he earned his PhD from the University of Erlangen-Nuremberg, Germany, under the supervision of Günther Greiner. His research focuses on different fields of computer graphics and computer vision, including the reconstruction and semantic understanding of 3D scene environments.
More Information: http://www.
Brian Wandell
(Stanford University)
Learning the image processing pipeline

Date: April 27, 2016
Description:
Many creative ideas are being proposed for image sensor designs, and these may be useful in applications ranging from consumer photography to computer vision. To understand and evaluate each new design, we must create a corresponding image-processing pipeline that transforms the sensor data into a form that is appropriate for the application. Designing and optimizing these pipelines is time-consuming and costly. I explain a method that combines machine learning and image systems simulation to automate the pipeline design. The approach is based on a new way of thinking of the image-processing pipeline as a large collection of local linear filters. Finally, I illustrate how the method has been used to design pipelines for consumer photography and mobile imaging.
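To make the "large collection of local linear filters" idea concrete, here is a minimal sketch (the classification of patches by mean response level and the plain least-squares fit are simplifying assumptions, not the exact published method): each raw sensor patch is assigned to a class, and one linear transform from patch to rendered RGB is learned per class.

```python
import numpy as np

def fit_local_linear_pipeline(patches, targets, levels):
    """Learn one linear transform per response-level class by least squares.
    patches: (N, p) raw sensor patches; targets: (N, 3) desired RGB outputs;
    levels: thresholds on the patch mean used to define the classes."""
    classes = np.digitize(patches.mean(axis=1), levels)
    filters = {}
    for c in np.unique(classes):
        X, Y = patches[classes == c], targets[classes == c]
        filters[c] = np.linalg.lstsq(X, Y, rcond=None)[0]   # (p, 3) filter for this class
    return filters

def apply_pipeline(patches, filters, levels):
    """Render new patches: classify, then apply the corresponding learned filter."""
    classes = np.digitize(patches.mean(axis=1), levels)
    out = np.zeros((len(patches), 3))
    for c, W in filters.items():
        idx = classes == c
        out[idx] = patches[idx] @ W
    return out

# e.g. with patch values normalized to [0, 1]: levels = [0.25, 0.5, 0.75]
```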
Further Information:
Brian A. Wandell is the first Isaac and Madeline Stein Family Professor. He joined the Stanford Psychology faculty in 1979 and is a member, by courtesy, of Electrical Engineering and Ophthalmology. Wandell is the founding director of Stanford’s Center for Cognitive and Neurobiological Imaging, and a Deputy Director of the Stanford Neuroscience Institute. He is the author of the vision science textbook Foundations of Vision. His research centers on vision science, spanning topics from visual disorders and reading development in children to digital imaging devices and algorithms for both magnetic resonance imaging and digital imaging. In 1996, together with Prof. J. Goodman, Wandell founded Stanford’s Center for Image Systems Engineering, which evolved into SCIEN in 2003.
V. Michael Bove, Jr.
(MIT Media Lab)
Reconstructing and Augmenting Reality with Holographic Displays

Date: April 26, 2016
Description:
From the popular press to possibly-questionable crowdfunding proposals, “holographic” displays seem to be everywhere this year. But are any of these actually holographic? And if not, what is a real holographic display? In this talk I explain why true holographic displays are not as far from deployment as one might think, despite their massive electro-optical and computational requirements, and describe how they will provide the ultimate in interactive visual user experience.
Further Information:
V. Michael Bove, Jr. holds an S.B.E.E., an S.M. in Visual Studies, and a Ph.D. in Media Technology, all from the Massachusetts Institute of Technology, where he is currently head of the Object-Based Media Group at the Media Lab. He is the author or co-author of over 90 journal or conference papers on digital television systems, video processing hardware/software design, multimedia, scene modeling, visual display technologies, and optics. He holds patents on inventions relating to video recording, hardcopy, interactive television, medical imaging, and holographic displays, and has been a member of several professional and government committees. He is co-author with the late Stephen A. Benton of the book Holographic Imaging (Wiley, 2008). He is on the Board of Editors of the SMPTE Motion Imaging Journal, as well as an Education Director for SMPTE. He served as general chair of the 2006 IEEE Consumer Communications and Networking Conference (CCNC’06) and co-chair of the 2012 International Symposium on Display Holography. Bove is a fellow of the SPIE and of the Institute for Innovation, Creativity, and Capital. He was a founder of and technical advisor to WatchPoint Media, Inc. and served as technical advisor to One Laptop per Child (creators of the XO laptop for children in developing countries).
Ramesh Raskar
(MIT Media Lab)
Extreme Computational Photography

Date: April 20, 2016
Description:
The Camera Culture Group at the MIT Media Lab aims to create a new class of imaging platforms. This talk will discuss three tracks of research: femto photography, retinal imaging, and 3D displays.
Femto Photography consists of femtosecond laser illumination, picosecond-accurate detectors and mathematical reconstruction techniques allowing researchers to visualize propagation of light. Direct recording of reflected or scattered light at such a frame rate with sufficient brightness is nearly impossible. Using an indirect ‘stroboscopic’ method that records millions of repeated measurements by careful scanning in time and viewpoints we can rearrange the data to create a ‘movie’ of a nanosecond long event. Femto photography and a new generation of nano-photography (using ToF cameras) allow powerful inference with computer vision in presence of scattering.
EyeNetra is a mobile phone attachment that allows users to test their own eyesight. The device reveals corrective measures, thus bringing vision care to billions of people who would not otherwise have had access. Another project, eyeMITRA, is a mobile retinal imaging solution that brings retinal exams into the realm of routine care, by lowering the cost of the imaging device to a tenth of its current cost and integrating the device with image analysis software and predictive analytics. This enables early detection of diabetic retinopathy, which can change the arc of growth of the world’s largest cause of blindness.
Finally the talk will describe novel lightfield cameras and lightfield displays that require a compressive optical architecture to deal with high bandwidth requirements of 4D signals.
Richard Baraniuk
(Rice University)
A Probabilistic Theory of Deep Learning

Date: April 15, 2016
Description:
A grand challenge in machine learning is the development of computational algorithms that match or outperform humans in perceptual inference tasks that are complicated by nuisance variation. For instance, visual object recognition involves unknown object position, orientation, and scale, while speech recognition involves unknown voice pronunciation, pitch, and speed. Recently, a new breed of deep learning algorithms has emerged for high-nuisance inference tasks that routinely yield pattern recognition systems with near- or super-human capabilities. But a fundamental question remains: Why do they work? Intuitions abound, but a coherent framework for understanding, analyzing, and synthesizing deep learning architectures has remained elusive. We answer this question by developing a new probabilistic framework for deep learning based on the Deep Rendering Model: a generative probabilistic model that explicitly captures latent nuisance variation. By relaxing the generative model to a discriminative one, we can recover two of the current leading deep learning systems, deep convolutional neural networks and random decision forests, providing insights into their successes and shortcomings, a principled route to their improvement, and new avenues for exploration.
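One way to read the "relaxation" step is as inference in a generative model over the class c and the nuisance variables g. In a sketch of that idea (my notation, not the paper's), classification maximizes over the latent nuisances, which plays the role that max-pooling plays in a convolutional network:

```latex
% Inference in a rendering-style generative model with class c and nuisance g
% (a sketch of the idea, not the paper's exact formulation)
\hat{c}(I) \;=\; \arg\max_{c}\; \max_{g \in \mathcal{G}} \; p(I \mid c, g)\, p(c, g)
```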
Further Information:
Richard G. Baraniuk is the Victor E. Cameron Professor of Electrical and Computer Engineering at Rice University. His research interests lie in new theory, algorithms, and hardware for sensing, signal processing, and machine learning. He is a Fellow of the IEEE and AAAS and has received national young investigator awards from the US NSF and ONR, the Rosenbaum Fellowship from the Isaac Newton Institute of Cambridge University, the ECE Young Alumni Achievement Award from the University of Illinois, the Wavelet Pioneer and Compressive Sampling Pioneer Awards from SPIE, the IEEE Signal Processing Society Best Paper Award, and the IEEE Signal Processing Society Technical Achievement Award. His work on the Rice single-pixel compressive camera has been widely reported in the popular press and was selected by MIT Technology Review as a TR10 Top 10 Emerging Technology. For his teaching and education projects, including Connexions (cnx.org) and OpenStax (openstaxcollege.org), he has received the C. Holmes MacDonald National Outstanding Teaching Award from Eta Kappa Nu, the Tech Museum of Innovation Laureate Award, the Internet Pioneer Award from the Berkman Center for Internet and Society at Harvard Law School, the World Technology Award for Education, the IEEE-SPS Education Award, the WISE Education Award, and the IEEE James H. Mulligan, Jr. Medal for Education.
Nicholas Frushour and Michael Carney
(Canon Mixed Reality)
Practical uses of mixed reality in a manufacturing workflow

Date: April 13, 2016
Description:
There are strong use-cases for augmented and mixed reality outside of entertainment. One particularly practical use is in the manufacturing industry. It’s an industry with well-established workflows and product cycles, but also known problems and pinch-points. In this talk, we will walk through each step of the Product Lifecycle Management workflow and discuss how mixed reality is helping manufacturers each step of the way. We will also cover the background of mixed reality and the key differences between AR, MR, and VR.
Further Information:
Nicholas Frushour is a software engineer at Canon and has been working with mixed reality for over 3 years.
Michael Carney is a Visualization Consultant specializing in Mixed Reality at Canon USA. He is proficient not only in the technology of New Media but also in the culture in which it is immersed, how it is used, and the direction in which trends are heading. He is now applying Mixed Reality conventions to enterprise use cases, such as Design and Manufacturing, High Risk Training and Education.
Nicolas Pégard
(UC Berkeley)
Compressive light-field microscopy for 3D functional imaging of the living brain

Date: April 7, 2016
Description:
We present a new microscopy technique for 3D functional neuroimaging in live brain tissue. The device is a simple light field fluorescence microscope allowing full volume acquisition in a single shot and can be miniaturized into a portable implant. Our computational methods first rely on spatial and temporal sparsity of fluorescence signals to identify and precisely localize neurons. We compute for each neuron a unique pattern, the light-field signature, that accounts for the effects of optical scattering and aberrations. The technique then yields a precise localization of active neurons and enables quantitative measurement of fluorescence with individual neuron spatial resolution and at high speeds, all without ever reconstructing a volume image. Experimental results are shown on live Zebrafish.
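As a simplified sketch of the demixing step (in practice the signatures are measured experimentally and already account for scattering and aberrations; the function name and the choice of nonnegative least squares are assumptions made for illustration), per-neuron activity can be recovered from a single raw frame without reconstructing a volume:

```python
import numpy as np
from scipy.optimize import nnls

def neuron_activity(signatures, frame):
    """Demix one raw light-field frame into per-neuron fluorescence levels.
    signatures: (pixels, neurons) array whose columns are the previously measured
    light-field signatures; frame: the flattened raw camera frame."""
    activity, _residual = nnls(signatures, frame)   # enforce nonnegative fluorescence
    return activity
```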
Further Information:
Nicolas Pégard received his B.S. in Physics from Ecole Polytechnique (France) in 2009, and his Ph.D. in Electrical Engineering at Princeton University under Prof. Fleischer in 2014. He is now a postdoctoral researcher at U.C. Berkeley under the supervision of Prof. H.Adesnik (Molecular and Cell Biology dpt.) and Prof. L.Waller. (Electrical Engineering and Computer Science dpt.) His main research interests are in optical system design and computational microscopy. He is currently developing all-optical methods to observe and control the activity of individual neurons in deep, live brain tissue with high spatial and temporal resolution.
Haricharan Lakshman
(Stanford University)
Data Representations for Cinematic Virtual Reality

Date: March 30, 2016
Description:
Historically, virtual reality (VR) with head-mounted displays (HMDs) is associated with computer-generated content and gaming applications. However, recent advances in 360 degree cameras facilitate omnidirectional capture of real-world environments to create content to be viewed on HMDs – a technology referred to as cinematic VR. This can be used to immerse the user, for instance, in a concert or sports event. The main focus of this talk will be on data representations for creating such immersive experiences.
In cinematic VR, videos are usually represented in a spherical format to account for all viewing directions. To achieve high-quality streaming of such videos to millions of users, it is crucial to consider efficient representations for this type of data, in order to maximize compression efficiency under resource constraints, such as the number of pixels and bitrate. We formulate the choice of representation as a multi-dimensional, multiple choice knapsack problem and show that the resulting representations adapt well to varying content.
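For intuition, a stripped-down version of the selection problem (a single bitrate constraint, exhaustive dynamic programming, and illustrative numbers; the formulation in the talk is multi-dimensional and solved differently) looks like this:

```python
def select_representations(groups, budget):
    """Multiple-choice knapsack: pick exactly one (quality, bitrate) option per
    group (e.g. per tile or viewport) to maximize total quality within a bitrate budget."""
    states = {0: (0.0, [])}     # bitrate used -> (total quality, chosen option per group)
    for options in groups:
        new_states = {}
        for used, (quality, picks) in states.items():
            for i, (q, b) in enumerate(options):
                nb = used + b
                if nb <= budget and (nb not in new_states or new_states[nb][0] < quality + q):
                    new_states[nb] = (quality + q, picks + [i])
        states = new_states
    return max(states.values(), key=lambda s: s[0])[1] if states else None

# two viewports, each with (perceived quality, bitrate in kbps) options
print(select_representations([[(1.0, 800), (2.5, 2000), (3.0, 4000)],
                              [(0.8, 500), (2.0, 1500)]], budget=3000))   # -> [1, 0]
```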
Existing cinematic VR systems update the viewports according to head rotation, but do not support head translation or focus cues. We propose a new 3D video representation, referred to as depth augmented stereo panorama, to address this issue. We show that this representation can successfully induce head-motion parallax in a predefined operating range, as well as generate light fields across the observer’s pupils, suitable for using with emerging light field HMDs.
Further Information:
Haricharan Lakshman has been a Visiting Assistant Professor in the Electrical Engineering Department at Stanford University since Fall 2014. His research interests are broadly in Image Processing, Visual Computing and Communications. He received his PhD in Electrical Engineering from the Technical University of Berlin, Germany, in Jan 2014, while working as a Researcher in the Image Processing Group of Fraunhofer HHI. Between 2011 and 2012, he was a Visiting Researcher at Stanford. He was awarded the IEEE Communications Society MMTC Best Journal Paper Award in 2013, and was a finalist for the IEEE ICIP Best Student Paper Award in 2010 and 2012.
More Information: http://web.stanford.edu/~harilaks/
Bas Rokers
(University of Wisconsin- Madison)
Fundamental and individual limitations in the perception of 3D motion: Implications for Virtual Reality

Date: March 23, 2016
Description:
Neuroscientists have extensively studied motion and depth perception, and have provided a good understanding of the underlying neural mechanisms. However, since these mechanisms are frequently studied in isolation, their interplay remains poorly understood. In fact, I will introduce a number of puzzling deficits in the perception of 3D motion in this talk. Given the advent of virtual reality (VR) and the need to provide a compelling user experience, it is imperative that we understand the factors that determine the sensitivity and limitations of 3D motion perception.
I will present recent work from our lab which shows that fundamental as well as individual limitations in the processing of retinal information cause specific deficits in the perception of 3D motion. Subsequently, I will discuss the potential of extra-retinal (head motion) information to overcome some of these limitations. Finally, I will discuss how individual variability in the sensitivity to 3D motion predicts the propensity for simulator sickness.
Our research sheds light on the interplay of the neural mechanisms that underlie perception and accounts for the visual system’s sensitivity to 3D motion. Our results provide specific suggestions to improve VR technology and bring virtual reality into the mainstream.
Further Information:
Bas Rokers is Associate Professor in the Department of Psychology and a member of the McPherson Eye Research Institute at the University of Wisconsin – Madison. His work in visual perception aims to uncover the neural basis of binocular perception, visual disorders and brain development. In 2015 he was a Visiting Professor in the Department of Brain and Cognitive Sciences at MIT, and he can currently be seen in the National Geographic television series Brain Games on Netflix.
More Information: http://vision.psych.wisc.edu/
Colin Sheppard
(Italian Institute of Technology)
Confocal microscopy: past, present and future

Date: March 16, 2016
Description:
Confocal microscopy has made a dramatic impact on biomedical imaging, in particular, but also in other areas such as industrial inspection. Confocal microscopy can image in 3D, with good resolution, into living biological cells and tissue. I have had the good fortune to be involved with the development of confocal microscopy over the last 40 years. Other techniques have been introduced that overcome some of its limitations, but still it is the preferred choice in many cases. And new developments in confocal microscopy, such as focal modulation microscopy, and image-scanning microscopy, can improve its performance in terms of penetration depth, resolution and signal level.
Further Information:
Colin Sheppard is Senior Scientist in the Nanophysics Department at the Italian Institute of Technology, Genoa. He is a Visiting Miller Professor at UC-Berkeley. He obtained his PhD degree from University of Cambridge. Previously he has been Professor in the Departments of Bioengineering, Biological Sciences and Diagnostic Radiology at the National University of Singapore, Professor of Physics at the University of Sydney, and University Lecturer in Engineering Science at the University of Oxford. He developed an early confocal microscope, the first with computer control and storage (1983), launched the first commercial confocal microscope (1982), published the first scanning multiphoton microscopy images (1977), proposed two-photon fluorescence and CARS microscopy (1978), and patented scanning microscopy using Bessel beams (1977). In 1988, he proposed scanning microscopy using a detector array with pixel reassignment, now known as image scanning microscopy.
Doug James
(Stanford University)
Physics-based Animation Sound: Progress and Challenges

Date: February 24, 2016
Description:
Decades of advances in computer graphics have made it possible to convincingly animate a wide range of physical phenomena, such as fracturing solids and splashing water. Unfortunately, our visual simulations are essentially “silent movies” with sound added as an afterthought. In this talk, I will describe recent progress on physics-based sound synthesis algorithms that can help simulate rich multi-sensory experiences where graphics, motion, and sound are synchronized and highly engaging. I will describe work on specific sound phenomena, and highlight the important roles played by precomputation techniques, and reduced-order models for vibration, radiation, and collision processing.
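As a small illustration of the reduced-order idea (standard modal sound synthesis; the mode frequencies, dampings, and gains would come from a precomputed vibration model, and all names here are invented for the sketch), an impact sound is a sum of exponentially damped sinusoids:

```python
import numpy as np

def modal_impact_sound(freqs_hz, dampings, gains, duration=1.0, sr=44100):
    """Reduced-order modal synthesis: an impact excites damped sinusoidal modes."""
    t = np.arange(int(duration * sr)) / sr
    s = sum(g * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
            for f, d, g in zip(freqs_hz, dampings, gains))
    return s / np.max(np.abs(s))    # normalize to [-1, 1] for playback

# three modes of a small struck object (illustrative values)
clack = modal_impact_sound([523.0, 1370.0, 2660.0], [8.0, 14.0, 25.0], [1.0, 0.5, 0.3])
```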
More information: https://profiles.stanford.edu/doug-james
Further Information:
Doug L. James is a Full Professor of Computer Science at Stanford University since June 2015, and was previously an Associate Professor of Computer Science at Cornell University from 2006-2015. He holds three degrees in applied mathematics, including a Ph.D. in 2001 from the University of British Columbia. In 2002 he joined the School of Computer Science at Carnegie Mellon University as an Assistant Professor, before joining Cornell in 2006. His research interests include computer graphics, computer sound, physically based animation, and reduced-order physics models. Doug is a recipient of a National Science Foundation CAREER award, and a fellow of both the Alfred P. Sloan Foundation and the Guggenheim Foundation. He recently received a Technical Achievement Award from The Academy of Motion Picture Arts and Sciences for “Wavelet Turbulence,” and the Katayanagi Emerging Leadership Prize from Carnegie Mellon University and Tokyo University of Technology. He was the Technical Papers Program Chair of ACM SIGGRAPH 2015.
Tokuyuki Honda
(Canon Healthcare Optics Research Laboratory)
Medical innovation with minimally-invasive optical imaging and image-guided robotics

Date: February 10, 2016
Description:
A fundamental challenge of healthcare is to meet the increasing demand for quality of care while minimizing the cost. We aim to make meaningful contributions to solve the challenge by creating innovative, minimally-invasive imaging and image-guided robotics technologies collaborating with leading research hospitals and other stake holders. In this lecture, we present technologies under development such as ultra-miniature endoscope, image-guided needle robot, and scanning-laser ophthalmoscope, and discuss how we can potentially address unmet clinical needs.
Further Information:
Dr. Tokuyuki (Toku) Honda is a Senior Fellow in Canon U.S.A., Inc., and has been the head of Healthcare Optics Research Laboratory in Cambridge, MA, since its creation in 2013. The mission of the laboratory is to grow the seeds of new medical business collaborating with research hospitals in Boston. Dr. Honda received his Ph.D. in Applied Physics from University of Tokyo, and worked in 1996-1998 as a Postdoctoral Fellow in the group of Professor Lambertus Hesselink at the Department of Electrical Engineering, Stanford University.
Edoardo Charbon
(Delft University of Technology, Netherlands)
The Photon Counting Camera: a Versatile Tool for Quantum Imaging and Quantitative Photography

Date: February 3, 2016
Description:
The recent availability of miniaturized photon counting pixels in standard CMOS processes has paved the way to the introduction of photon counting in low-cost image sensors. The uses of these devices are multifold, ranging from LIDARs to Raman spectroscopy, from fluorescence lifetime to molecular imaging, from super-resolution microscopy to data security and encryption.
In this talk we describe the technology at the core of this revolution: single-photon avalanche diodes (SPADs) and the architectures enabling SPAD based image sensors. We discuss tradeoffs and design trends, often referring to specific sensor chips, new materials for extended sensitivity, and 3D integration for ultra-high speed operation. We also discuss the recent impact of SPAD cameras in metrology, robotics, mobile phones, and consumer electronics.
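For a flavor of how a photon-counting pixel turns individual photon detections into range, here is a minimal direct time-of-flight sketch (illustrative names and bin width; real SPAD pixels do this with on-chip time-to-digital converters and histogramming):

```python
import numpy as np

C = 299_792_458.0   # speed of light, m/s

def depth_from_timestamps(timestamps_s, bin_width_s=100e-12):
    """Histogram photon arrival times (relative to the laser pulse) and convert
    the peak bin, which marks the return pulse, to a depth in meters."""
    t = np.asarray(timestamps_s, dtype=float)
    edges = np.arange(0.0, t.max() + 2 * bin_width_s, bin_width_s)
    counts, edges = np.histogram(t, bins=edges)
    t_peak = edges[np.argmax(counts)] + bin_width_s / 2
    return C * t_peak / 2           # divide by two for the round trip
```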
Further Information:
Edoardo Charbon (SM’10) received the Diploma from ETH Zürich in 1988, the M.S. degree from UCSD in 1991, and the Ph.D. degree from UC-Berkeley in 1995, all in Electrical Engineering and EECS. From 1995 to 2000, he was with Cadence Design Systems, where he was the architect of the company’s intellectual property protection and on-chip information hiding tools; from 2000 to 2002, he was Canesta Inc.’s Chief Architect, leading the design of consumer time-of-flight 3D cameras; Canesta was sold to Microsoft Corp. in 2010. Since November 2002, he has been a member of the Faculty of EPFL in Lausanne, Switzerland and in Fall 2008 he joined the Faculty of TU Delft, Chair of VLSI Design, succeeding Patrick Dewilde. Dr. Charbon is the initiator and coordinator of MEGAFRAME and SPADnet, two major European projects for the creation of CMOS photon counting image sensors in biomedical diagnostics. He has published over 250 articles in peer-reviewed technical journals and conference proceedings and two books, and he holds 18 patents.
Ronnier Luo
(Zhejiang University (China) and Leeds University (UK))
The Impact of New Developments of Colour Science on Imaging Technology

Date: January 27, 2016
Description:
Colour science has been widely used in the imaging industry. This talk will introduce some new development areas of colour science. Among them, three areas related to imaging technology will be the focus: LED lighting quality, CIE 2006 colorimetry, and comprehensive colour appearance modelling. LED lighting has recently made great advances in the illumination industry. It has a unique adjustability feature: its spectrum can be tuned for different applications. A tuneable LED system for image sensor applications such as white balance, calibration and characterisation will be introduced and demonstrated, and its performance will be reported.
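A minimal sketch of how such a tuneable system can be driven (assuming the spectra of the individual LED channels have been measured; the names and the nonnegative least-squares fit are illustrative, not the speaker's method): choose channel weights so the mixed output approximates a target spectral power distribution.

```python
import numpy as np
from scipy.optimize import nnls

def tune_leds(channel_spectra, target_spectrum):
    """channel_spectra: (wavelengths, channels) matrix of measured LED spectra;
    target_spectrum: desired spectral power distribution on the same wavelength grid.
    Returns nonnegative drive weights and the residual spectral error."""
    weights, residual = nnls(channel_spectra, target_spectrum)
    return weights, residual
```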
Further Information:
Ronnier is a Global Expert Professor at Zhejiang University (China), and Professor of Colour and Imaging Science at Leeds University (UK). He is also the Vice-President of the International Commission on Illumination (CIE). He received his PhD in 1986 at the University of Bradford in the field of colour science. He has published over 500 scientific articles in the fields of colour science, imaging technology and LED illumination. He is a Fellow of the Society for Imaging Science and Technology, and the Society of Dyers and Colourists. He is also the Chief Editor of the Encyclopaedia of Colour Science and Technology published by Springer in December 2015.
Dominik Michels
(Stanford University)
Complex Real-Time High-Fidelity Simulations in Visual Computing

Date: January 20, 2016
Description:
Whereas in the beginning of visual computing, mostly rudimentary physical models or rough approximations were employed to allow for real-time simulations, several modern applications of visual computing and related disciplines require fast and highly accurate numerical simulations at once; for example interactive computer-aided design and manufacturing processes, digital modeling and fabrication, training simulators, and intelligent robotic devices. This talk covers a selection of high fidelity algorithms and techniques for the fast simulation of complex scenarios; among others symbolic-numeric coupling, structure preserving integration, and timescale segmentation. Their powerful behavior is presented on a broad spectrum of concrete applications in science and industry from the simulation of biological microswimmers and molecular structures to the optimization of consumer goods like toothbrushes and shavers.
Further Information:
Dominik L. Michels is a visiting professor in the Computer Science Department at Stanford University and has run the High Fidelity Algorithmics Group at the Max Planck Center for Visual Computing and Communication since fall 2014. Previously, he was a Postdoc in Computing and Mathematical Sciences at Caltech. He studied Computer Science and Physics at the University of Bonn and B-IT, from which he received a B.Sc. in Computer Science and Physics in 2011, an M.Sc. in Computer Science in 2013, and a Ph.D. in Mathematics and Natural Sciences on Stiff Cauchy Problems in Scientific Computing in early 2014. He was a visiting scholar at several international institutions, among others at JINR in Moscow, and at MIT and Harvard University in Cambridge, MA. His research comprises both fundamental and applied aspects of computational mathematics and physics, addressing open research questions in algorithmics, computer algebra, symbolic-numeric methods, and mathematical modeling to solve practically relevant problems in scientific and visual computing. Outside academia, he works as a research partner in the high-technology and consumer goods sectors.
More Information: http://cs.stanford.edu/~michels/
Laura Waller
(Berkeley)
3D Computational Microscopy

Date: January 14, 2016
Description:
This talk will describe new computational microscopy methods for high pixel-count 3D images. We describe two setups employing illumination-side and detection-side aperture coding of angle (Fourier) space for capturing 4D phase-space (e.g. light field) datasets with fast acquisition times. Using a multi-slice forward model, we develop efficient 3D reconstruction algorithms for both incoherent and coherent imaging models, with robustness to scattering. Experimentally, we achieve real-time 3D intensity and phase capture with high resolution across a large volume. Such computational approaches to optical microscopy add significant new capabilities to commercial microscopes without significant hardware modification.
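The multi-slice forward model mentioned above can be sketched in a few lines (angular spectrum propagation between thin slices of complex transmittance; the grid, sampling, and names are assumptions, and evanescent components are simply dropped):

```python
import numpy as np

def propagate(field, dz, wavelength, dx):
    """Free-space propagation of a square 2D complex field by dz (angular spectrum method)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    propagating = arg > 0
    kz = 2 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    H = np.where(propagating, np.exp(1j * kz * dz), 0.0)   # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

def multislice_forward(incident, slices, dz, wavelength, dx):
    """Multiply by each slice's complex transmittance, then propagate to the next slice."""
    field = incident
    for transmittance in slices:
        field = propagate(field * transmittance, dz, wavelength, dx)
    return field
```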
Further Information:
Laura Waller is an Assistant Professor at UC Berkeley in the Department of Electrical Engineering and Computer Sciences (EECS) and a Senior Fellow at the Berkeley Institute of Data Science (BIDS), with affiliations in Bioengineering, QB3 and Applied Sciences & Technology. She was a Postdoctoral Researcher and Lecturer of Physics at Princeton University from 2010-2012 and received B.S., M.Eng., and Ph.D. degrees from the Massachusetts Institute of Technology (MIT) in 2004, 2005, and 2010, respectively. She is a Moore Foundation Data-Driven Investigator, Bakar fellow, NSF CAREER awardee and Packard Fellow.
More Information: http://www.laurawaller.com/
Audrey (Ellerbee) Bowden
(Stanford University)
Lighting the Path to Better Healthcare

Date: January 7, 2016
Description:
Cancer. Infertility. Hearing loss. Each of these phrases can bring a ray of darkness into an otherwise happy life. The Stanford Biomedical Optics group, led by Professor Audrey Bowden, aims to develop and deploy novel optical technologies to solve interdisciplinary challenges in the clinical and basic sciences. In short, we use light to image life — and in so doing, illuminate new paths to better disease diagnosis, management and treatment. In this talk, I will discuss our recent efforts to design, fabricate and/or construct new hardware, software and systems-level biomedical optics tools to attack problems in skin cancer, bladder cancer, hearing loss and infertility. Our efforts span development of new fabrication techniques for 3D tissue-mimicking phantoms, new strategies for creating large mosaics and 3D models of biomedical data, machine-learning classifiers for automated detection of disease, novel system advances for multiplexed optical coherence tomography and low-cost technologies for point-of-care diagnostics.
More Information: http://sbo.
Further Information:
Audrey K (Ellerbee) Bowden is an Assistant Professor of Electrical Engineering at Stanford University. She received her BSE in EE from Princeton University, her PhD in BME from Duke University and completed her postdoctoral training in Chemistry and Chemical Biology at Harvard University. During her career, Dr. Bowden served as an International Fellow at Ngee Ann Polytechnic in Singapore and as a Legislative Assistant in the United States Senate through the AAAS Science and Technology Policy Fellows Program sponsored by the OSA and SPIE. She is a member of the OSA, a Senior Member of SPIE and is the recipient of numerous awards, including the Air Force Young Investigator Award, the NSF Career Award and the Hellman Faculty Scholars Award. She currently serves as Associate Editor of IEEE Photonics Journal. Her research interests include biomedical optics, microfluidics, and point of care diagnostics.
- Daniel Palanker » Electronic augmentation of body functions: progress in electro-neural interfaces
- Alexandre Alahi » Towards Socially-aware AI
- Saverio Murgia » Designing a smart wearable camera for blind and visually impaired people
- Emanuele Mandelli » Quantum dot-based image sensors for cutting-edge commercial multispectral cameras
- Christy Fernandez-Cull » Smart pixel imaging with computational arrays
- Matthew O'Toole » Optical Probing for Analyzing Light Transport
- Brian Cabral » The Soul of a New Camera
- Bernard Kress » Human-centric optical design
- Gal Chechik » Machine learning for large-scale image understanding
- Matthias Niessner » Interactive 3D: Static and Dynamic Environment Capture in Real Time
- Brian Wandell » Learning the image processing pipeline
- V. Michael Bove, Jr. » Reconstructing and Augmenting Reality with Holographic Displays
- Ramesh Raskar » Extreme Computational Photography
- Richard Baraniuk » A Probabilistic Theory of Deep Learning
- Nicholas Frushour and Michael Carney » Practical uses of mixed reality in a manufacturing workflow
- Nicolas Pégard » Compressive light-field microscopy for 3D functional imaging of the living brain
- Haricharan Lakshman » Data Representations for Cinematic Virtual Reality
- Bas Rokers » Fundamental and individual limitations in the perception of 3D motion
- Colin Sheppard » Confocal microscopy: past, present and future
- Doug James » Physics-based Animation Sound
- Tokuyuki Honda » Medical innovation with minimally-invasive optical imaging and image-guided robotics
- Edoardo Charbon » The Photon Counting Camera
- Ronnier Luo » Impact of New Developments of Colour Science on Imaging Technology
- Dominik Michels » Real-Time High-Fidelity Simulations in Visual Computing
- Laura Waller » 3D Computational Microscopy
- Audrey (Ellerbee) Bowden » Lighting the Path to Better Healthcare
SCIEN Colloquia 2015
Ofer Levi
(University of Toronto)
Portable optical brain imaging

Date: November 18, 2015
Description:
Optical techniques are widely used in clinical settings and in biomedical research to interrogate bio-molecular interactions and to evaluate tissue dynamics. Miniature integrated optical systems for sensing and imaging can be portable, enabling long-term imaging studies in living tissues. We present the development of a compact multi-modality optical neural imaging system to image tissue blood flow velocity and oxygenation, using a fast CCD camera and miniature VCSEL illumination. We combined two techniques, laser speckle contrast imaging (LSCI) and intrinsic optical signal imaging (IOSI), simultaneously using these compact laser sources to monitor induced cortical ischemia in a full-field format with high temporal acquisition rates. We have demonstrated tracking seizure activity, evaluating blood-brain barrier breaching, and integrating fast spatial light modulators for extended imaging depth and auto-focusing during brain imaging of flow dynamics. Our current studies include prototype designs and system optimization and evaluation for a low-cost portable imaging system as a minimally invasive method for long-term neurological studies in un-anesthetized animals. This system will provide a better understanding of the progression and treatment efficacy of various neurological disorders in freely behaving animals.
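The LSCI part of such a system reduces, at its core, to computing a local speckle contrast map; a minimal sketch (window size and names are illustrative) is:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def speckle_contrast(raw_frame, window=7):
    """Spatial speckle contrast K = sigma/mu in a sliding window. Faster flow blurs
    the speckle pattern during the exposure, lowering K."""
    img = raw_frame.astype(float)
    mean = uniform_filter(img, window)
    mean_sq = uniform_filter(img**2, window)
    variance = np.maximum(mean_sq - mean**2, 0.0)
    return np.sqrt(variance) / np.maximum(mean, 1e-9)
```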
Further Information:
Dr. Ofer Levi is an Associate Professor in the Institute of Biomaterials and Biomedical Engineering and the Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto, currently on a Sabbatical leave at Stanford University. Dr. Levi received his Ph.D. in Physics from the Hebrew University of Jerusalem, Israel in 2000, and worked in 2000-2007 as a Postdoctoral Fellow and as a Research Associate at the Departments of Applied Physics and Electrical Engineering, Stanford University, CA. He serves as an Associate Editor in Biomedical Optics Express (OSA) and is a member of OSA, IEEE-Photonics, and SPIE. His recent research areas include biomedical imaging systems and optical bio-sensors based on semiconductor devices and nano-structures, and their application to bio-medical diagnostics, in vivo imaging, and study of bio-molecular interactions.
More Information: http://biophotonics.utoronto.ca/
Boyd Fowler
(Google)
Highlights from the International Workshop on Imaging Sensors

Date: November 11, 2015
Description:
Image sensor innovation continues after more than 50 years of development. New image sensor markets are being developed while old markets continue to grow. Higher performance and lower cost image sensors are enabling these new applications. Although CMOS image sensors dominate the market, CCDs and other novel image sensors continue to be developed. In this talk we discuss trends in image sensor technology and present results from selected workshop papers. Moreover, we will discuss developments in phase pixel technology, stacked die image sensors, time of flight image sensors, SPAD image sensors, Quanta image sensors, low light level sensors, wide dynamic range sensors and global shutter sensors.
Further Information:
Boyd Fowler was born in California in 1965. He received his M.S.E.E. and Ph.D. degrees from Stanford University in 1990 and 1995, respectively. After finishing his Ph.D., he stayed at Stanford University as a research associate in the Electrical Engineering Information Systems Laboratory until 1998. In 1998 he founded Pixel Devices International in Sunnyvale, California. Between 2005 and 2013 he was the CTO and VP of Technology at Fairchild Imaging. He is currently at Google, researching future directions for image sensors and imaging systems. He has authored numerous technical papers, book chapters and patents. His current research interests include CMOS image sensors, low noise image sensors, noise analysis, data compression, machine learning and vision.
Anat Levin
(The Weizmann Institute of Science)
Inverse Volume Rendering with Material Dictionaries

Date: October 28, 2015
Description:
Translucent materials are ubiquitous, and simulating their appearance requires accurate physical parameters. However, physically-accurate parameters for scattering materials are difficult to acquire. We introduce an optimization framework for measuring bulk scattering properties of homogeneous materials (phase function, scattering coefficient, and absorption coefficient) that is more accurate, and more applicable to a broad range of materials. The optimization combines stochastic gradient descent with Monte Carlo rendering and a material dictionary to invert the radiative transfer equation. It offers several advantages: (1) it does not require isolating single-scattering events; (2) it allows measuring solids and liquids that are hard to dilute; (3) it returns parameters in physically-meaningful units; and (4) it does not restrict the shape of the phase function using Henyey-Greenstein or any other low-parameter model. We evaluate our approach by creating an acquisition setup that collects images of a material slab under narrow-beam RGB illumination. We validate results by measuring prescribed nano-dispersions and showing that recovered parameters match those predicted by Lorenz-Mie theory. We also provide a table of RGB scattering parameters for some common liquids and solids, which are validated by simulating color images in novel geometric configurations that match the corresponding photographs with less than 5% error.
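As a very loose sketch of the optimization loop (the actual method differentiates through the Monte Carlo render and parameterizes the phase function with a dictionary; the black-box finite-difference gradient below is only a stand-in to show the structure, and all names are invented):

```python
import numpy as np

def fit_scattering_params(render, observed, theta0, lr=1e-2, iters=200, eps=1e-3, seed=0):
    """Fit bulk scattering parameters theta (e.g. scattering/absorption coefficients and
    phase-function dictionary weights). `render(theta, rng)` is a noisy Monte Carlo
    forward model returning a simulated image of the material slab."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        base = np.mean((render(theta, rng) - observed) ** 2)
        grad = np.zeros_like(theta)
        for i in range(theta.size):             # crude finite-difference gradient estimate
            bumped = theta.copy()
            bumped[i] += eps
            grad[i] = (np.mean((render(bumped, rng) - observed) ** 2) - base) / eps
        theta -= lr * grad
    return theta
```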
Further Information:
http://www.wisdom.weizmann.ac.il/~levina/
Aydogan Ozcan
(UCLA)
Democratization of Next-Generation Microscopy, Sensing and Diagnostics Tools through Computational Photonics

Date: October 20, 2015
Description:
My research focuses on the use of computation/algorithms to create new optical microscopy, sensing, and diagnostic techniques, significantly improving existing tools for probing micro- and nano-objects while also simplifying the designs of these analysis tools. In this presentation, I will introduce a new set of computational microscopes which use lens-free on-chip imaging to replace traditional lenses with holographic reconstruction algorithms. Basically, 3D images of specimens are reconstructed from their “shadows” providing considerably improved field-of-view (FOV) and depth-of-field, thus enabling large sample volumes to be rapidly imaged, even at nanoscale. These new computational microscopes routinely generate >1–2 billion pixels (giga-pixels), where even single viruses can be detected with a FOV that is >100 fold wider than other techniques. At the heart of this leapfrog performance lie self-assembled liquid nano-lenses that are computationally imaged on a chip. These self-assembled nano-lenses are stable for >1 hour at room temperature, and are composed of a biocompatible buffer that prevents nano-particle aggregation while also acting as a spatial “phase mask.” The field-of-view of these computational microscopes is equal to the active-area of the sensor-array, easily reaching, for example, >20 mm2 or >10 cm2 by employing state-of-the-art CMOS or CCD imaging chips, respectively.
In addition to this remarkable increase in throughput, another major benefit of this technology is that it lends itself to field-portable and cost-effective designs which easily integrate with smartphones to conduct giga-pixel tele-pathology and microscopy even in resource-poor and remote settings where traditional techniques are difficult to implement and sustain, thus opening the door to various telemedicine applications in global health. Some other examples of these smartphone-based biomedical tools that I will describe include imaging flow cytometers, immunochromatographic diagnostic test readers, bacteria/pathogen sensors, blood analyzers for complete blood count, and allergen detectors. Through the development of similar computational imagers, I will also report the discovery of new 3D swimming patterns observed in human and animal sperm. One of these newly discovered and extremely rare motions takes the form of “chiral ribbons,” where the planar swings of the sperm head occur on an osculating plane, creating in some cases a helical ribbon and in others a twisted ribbon. Shedding light onto the statistics and biophysics of various micro-swimmers’ 3D motion, these results provide an important example of how biomedical imaging significantly benefits from emerging computational algorithms/theories, revolutionizing existing tools for observing various micro- and nano-scale phenomena in innovative, high-throughput, and yet cost-effective ways.
Rajiv Laroia
(The Light Company)
Gathering Light

Date: October 16, 2015
Description:
With digital cameras in every cell phone, everyone is a photographer. But people still aspire to the better zoom, the lower noise, and the artistic bokeh effects provided by digital SLR cameras, if only these features were available in as convenient and light-weight a package as a cell phone or a thin compact camera. Traditional high-end cameras have a big lens system that enables those features, but the drawbacks are weight, bulk, and the inconvenience of carrying and switching lenses. In this talk, we discuss an alternative approach of using a heterogeneous array of small cameras to provide those features, and more. Light’s camera technology combines prime lenses that provide an optical zoom equivalent of 35mm, 70mm, and 150mm lenses. Small mirrors allow reconfiguring the cameras to select the right level of zoom and field of view. This talk describes the architecture of this flexible computational camera.
Further Information:
Rajiv is the cofounder and CTO of The Light Company, a company dedicated to re-imagining photography. He previously founded and served as CTO of Flarion Technologies, which developed the base technology for LTE. Flarion was acquired by Qualcomm in 2006. Prior to Flarion, Rajiv held R&D leadership roles in Lucent Technologies Bell Labs. Rajiv holds a Ph.D. and Master’s degree from the University of Maryland, College Park and a Bachelor’s degree from the Indian Institute of Technology, Delhi, all in electrical engineering. He is recipient of the 2013 IEEE Industrial Innovation Award.
Robert LiKamWa
(Rice University)
Designing a Mixed-Signal ConvNet Vision Sensor for Continuous Mobile Vision

Date: October 7, 2015
Description:
Continuously providing our computers with a view of what we see will enable novel services to assist our limited memory and attention. In this talk, we show that today’s system software and imaging hardware, highly optimized for photography, are ill-suited for this task. We present our early ideas towards a fundamental rethinking of the vision pipeline, centered around a novel vision sensor architecture, which we call RedEye. Targeting object recognition, we shift early convolutional processing into RedEye’s analog domain, reducing the workload of the analog readout and of the computational system. To ease analog design complexity, we design a modular column-parallel design to promote physical circuitry reuse and algorithmic cyclic reuse. RedEye also includes programmable mechanisms to admit noise for energy reduction, further increasing the sensor’s energy efficiency. Compared to conventional systems, RedEye reports an 85% reduction in sensor energy and a 45% reduction in computational energy.
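To see why admitting noise is a meaningful knob, here is a toy model (NumPy/SciPy; the kernels, the Gaussian noise model, and the names are all assumptions, and the real design operates in the charge domain with column-parallel circuits): the early convolutions are computed, perturbed with "analog" readout noise, and handed to the rest of the network.

```python
import numpy as np
from scipy.signal import convolve2d

def noisy_analog_conv(image, kernels, noise_sigma=0.02, seed=0):
    """First-layer convolutions with additive noise standing in for an analog readout;
    raising noise_sigma models trading recognition accuracy for sensor energy."""
    rng = np.random.default_rng(seed)
    features = []
    for k in kernels:
        f = convolve2d(image, k, mode="same")
        f = f + rng.normal(0.0, noise_sigma * np.abs(f).max(), f.shape)
        features.append(np.maximum(f, 0.0))     # ReLU after the noisy analog stage
    return np.stack(features)
```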
Further Information:
Robert LiKamWa is a final year Ph.D. Student at Rice University. His research focus is on efficient support for continuous mobile vision. To supplement his research, he has interned and collaborated with Microsoft Research and Samsung Mobile Processor Innovation Lab on various projects related to vision systems. Robert received best paper awards from ACM MobiSys 2013 and PhoneSense 2011.
Liang Gao
(Ricoh Innovations)
Developing Next Generation Multidimensional Optical Imaging Devices

Date: September 30, 2015
Description:
When performing optical measurement with a limited photon budget, it is important to ensure that each detected photon is as rich in information as possible. Conventional optical imaging systems generally tag light with just two characteristics (x, y), measuring its intensity in a 2D (x, y) lattice. However, this throws away much of the information content actually carried by a photon. This information can be written as (x, y, z, θ, φ, λ, t, ψ, χ): the spatial coordinates (x, y, z) span 3D, the propagation polar angles (θ, φ) span 2D, and the wavelength (λ), emission time (t), and polarization orientation and ellipticity angles (ψ, χ) account for the remaining four dimensions. Neglecting coherence effects, a photon thus carries with it nine tags. In order to explore this wealth of information, an imaging system should be able to characterize measured photons in 9D, rather than in 2D.
This presentation will provide an overview of the next generation of multidimensional optical imaging devices which leverage advances in computational optics, micro-fabrication, and detector technology. The resultant systems can simultaneously capture multiple photon tags in parallel, thereby maximizing the information content we can acquire from a single camera exposure. In particular, I will discuss our recent development of two game-changing technologies, a snapshot hyperspectral imager (the image mapping spectrometer, IMS) and an ultrafast imager (compressed ultrafast photography, CUP), and how these techniques can potentially revolutionize our perception of the surrounding world.
Further Information:
Dr. Liang Gao is currently an advisory research scientist in the computational optical imaging group at Ricoh Innovations. His primary research interests include microscopy (super-resolution microscopy and photoacoustic microscopy), cost-effective high-performance optics for diagnostics, computational optical imaging, ultrafast imaging, and multidimensional optical imaging. Dr. Liang Gao is the author of more than 30 peer-reviewed publications in top-tier journals, such as Nature, Physics Report, and Annual Review of Biomedical Engineering. He received his BS degree in Physics from Tsinghua University in 2005 and his PhD degree in Applied Physics and Bioengineering from Rice University in 2011.
Stephen Hicks
(Oxford University )
From Electrodes to Smart Glasses: Augmented vision for the sight-impaired

Date: August 14, 2015
Description:
The majority of the world’s 40 million “blind” people have some areas of remaining sight. This is known as residual vision and while it is generally insufficient for sighted tasks such as reading, navigating and detecting faces, it can often be augmented and enhanced through the use of near-eye displays.
Low vision is primarily a problem of contrast: patients are often unable to differentiate target objects (often foreground objects) from busy backgrounds. Depth imaging, and more recently semantic object segmentation, provide the tools to easily isolate foreground objects, allowing them to be enhanced in ways that exaggerate object boundaries, surface features and contrast. Computer vision and 3D mapping are also advancing a new form of enabling device, one that is more aware of its spatial surroundings and able to direct the user to specific objects on demand.
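A bare-bones version of depth-based foreground enhancement (the thresholds, gain, and names are illustrative; real systems use calibrated depth and per-user tuning) might look like this:

```python
import numpy as np

def enhance_foreground(rgb, depth_m, near=0.3, far=1.5, gain=2.0):
    """Dim the background and stretch the contrast of objects within a depth band,
    so nearby objects stand out against busy backgrounds."""
    fg_mask = (depth_m > near) & (depth_m < far)
    out = rgb.astype(float) * 0.3                   # suppress the background
    boosted = rgb.astype(float)
    boosted = np.clip((boosted - boosted.mean()) * gain + boosted.mean(), 0, 255)
    out[fg_mask] = boosted[fg_mask]                 # keep the enhanced foreground
    return out.astype(np.uint8)
```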
The emergence of small depth-RGB cameras, powerful portable computers and higher quality wearable displays means that for the first time we are able to consider building a vision augmenting system for daily long-term use. The requirements depend somewhat on the eye condition of the user such as the residual visual field, and colour and contrast sensitivity, but also on the needs and context of the user. Advances in all these areas, from low profile displays, deep learning, and context sensitive task prioritisation mean that advanced wearable assistants are now within reach.
In my talk I will discuss our efforts to develop and validate a generally useful Smart Specs platform, which is now part of the first wide-scale test of augmented vision in the UK funded by Google. This work will be put in the context of ongoing Oxford projects such as implanted retinal prosthetics, gene therapies and sensory substitution devices. Much of this work has applications beyond visual impairment.
Further Information:
Stephen Hicks is a Lecturer in Neuroscience at the University of Oxford, and Royal Academy of Engineering Enterprise Fellow. He is the lead investigator of the Smart Specs research group who are building and validating novel forms of sight enhancement for blind and partially sighted individuals. Stephen completed a PhD in neuroscience at the University of Sydney in Australia, studying vision and spatial memory. On moving to the UK he began a post doctoral position at Imperial College London developing portable eye trackers for neurological diagnoses and computer vision techniques for electronic retinal implants. Stephen joined Oxford in 2009 where he developed concepts for image optimization in prosthetic vision, leading to the formation of the Smart Specs research group. He works closely with Professor Phil Torr in the Department of Engineering to develop semantic imaging systems for 3D object recognition and mapping.
Stephen won the Royal Society’s Brian Mercer Award for Innovation in 2013 and led the team who won the UK’s Google Global Impact Challenge in 2014 to build portable smart glasses for sight enhancement. He is the co-founder of Visual Alchemy Ltd (www.VA-ST.com) which is beginning to commercialize the Smart Specs platform.
Kristina Irsch
(Wilmer Eye Institute, Johns Hopkins School of Medicine)
Remote detection of binocular fixation and focus using polarization optics and retinal birefringence properties of the eye

Date: July 14, 2015
Description:
Amblyopia (“lazy eye”) is a major public health problem, caused by misalignment of the eyes (strabismus) or defocus. If detected early in childhood, there is an excellent response to therapy, yet most children are detected too late to be treated effectively. Commercially available vision screening devices that test for amblyopia’s primary causes can detect strabismus only indirectly and inaccurately via assessment of the positions of external light reflections from the cornea, but they cannot detect the anatomical feature of the eyes where fixation actually occurs (the fovea). This talk presents an accurate and calibration-free technique for remote localization of the true fixation point of an eye by employing the characteristic birefringence signature of the radially arrayed Henle fibers delineating the fovea. Progress on the development of a medical diagnostic screening device for eye misalignment and defocus will be presented, and other potential applications will be discussed.
Further Information:
Kristina Irsch is a German physicist specializing in biomedical and ophthalmic optics. She received her Ph.D. from the University of Heidelberg in Germany where she trained under Josef F. Bille, Ph.D. She went to the Johns Hopkins University School of Medicine in Baltimore, Maryland in 2005, first as a visiting graduate student, and later completed a post-doctoral research fellowship in ophthalmic optics and instrumentation under David L. Guyton, M.D. before joining the faculty as Assistant Professor of Ophthalmology in 2010. Much of her research has focused on remote eye fixation and focus detection, using polarization optics and retinal birefringence properties of the eye, and its use in clinical settings. The main goal is to identify children with strabismus (misalignment of the eyes) and focusing abnormalities, at an early and still easily curable stage, before irreversible amblyopia (functional monocular visual impairment, or “lazy eye”) develops.
Sara Abrahamsson
(Rockefeller University)
Instantaneous, high-resolution 3D imaging using multifocus microscopy (MFM) and applications in functional neuronal imaging

Date: June 18, 2015
Description:
Multifocus microscopy (MFM) is an optical imaging technique that delivers instant 3D data at high spatial and temporal resolution. Diffractive Fourier optics are used to multiplex and refocus light, forming a 3D image consisting of an instantaneously formed focal stack of 2D images. MFM optics can be appended to a commercial microscope in a “black box” at the camera port, to capture 3D movies of quickly moving systems at the native resolution of the imaging system to which it is attached. MFM can also be combined with super-resolution methods to image beyond 200nm resolution. Systems can be tailored to fit different imaging volumes, covering for example seven or nine image planes to study mRNA diffusion inside a cell nucleus, or 25 focal planes or more to study cell division or neuronal activity in a developing embryo.
In the Bargmann lab at the Rockefeller University, MFM is applied in functional neuronal imaging in the nematode C. elegans – a millimeter-sized worm that with its compact but versatile nervous system of 302 neurons is a common model organism in neurobiology. The genetically expressed calcium-indicator dye GCaMP is used to visualize neuronal activity in the worm. Using MFM it is possible to image entire 3D clusters of neurons in living animals with single neuron resolution, and to perform unbiased imaging screenings of entire circuits to identify neuronal function during for example olfactory stimulation.
Further Information:
Sara came to UCSF in 2004 from the Royal Institute of Technology, Stockholm, Sweden to spend six months on an optical design project with Professor Mats Gustafsson. Mats tricked her into staying in the lab to do a Ph.D. and later to follow him to HHMI Janelia Research Campus. During this time, Sara developed two diffractive Fourier optics systems for live bio-microscopy in extended sample volumes: Extended Focus (EF)1 and multifocus microscopy (MFM)2,3. These techniques have since been applied to a wide variety of fast live-imaging projects in biological research.
Currently, Sara is a Leon Levy postdoctoral fellow in the laboratory of Cori Bargmann at the Rockefeller University, where she applies MFM to functional neuronal imaging in extended sample volumes. Sara also spends a lot of her time in the nanofabrication clean rooms at CNF, Cornell University, Ithaca, NY and NIST, MD, using multilevel, deep UV-lithography to fab the various diffractive Fourier optics devices she designs and builds in the labs of collaborators in the US and Europe. Sara spends her summers at the Marine Biological Laboratory in Woods Hole on Cape Cod, developing 3D polarization microscopy methods with the Oldenbourg lab.
Bernd Richter
(Fraunhofer FEP)
Bidirectional OLED Microdisplays: Technology and Applications

Date: June 5, 2015
Description:
Microdisplays based on OLED-on-Silicon technology are becoming increasingly widespread in data glasses and electronic viewfinders. In these applications the OLED microdisplay benefits from its low power consumption and from the self-emissive nature of the OLED, which enables simpler optics because no additional backlight illumination is needed. Fraunhofer's approach extends these advantages by embedding an additional image sensor inside the active display area, making it possible to create a bidirectional microdisplay with light emission and detection in the same plane. This special feature can be used to realize eye tracking by capturing the eye of the glasses wearer, thus providing a hands-free way of interacting with the system. This talk will introduce the OLED-on-Silicon technology, the bidirectional OLED microdisplay approach and its applications in interactive data glasses. In addition, the latest generation of bidirectional microdisplay with increased SVGA resolution will be demonstrated.
Further Information:
Bernd Richter (Fraunhofer FEP) received his diploma in electrical engineering from TU Dresden in 2003. Afterwards he joined the analog & mixed-signal IC design group at Fraunhofer in Dresden. The focus of his work is CMOS design for OLED microdisplays and sensor applications, as well as More-than-Moore technologies with a focus on Organic-on-Silicon. He has managed several projects and generations of OLED microdisplays for public and industrial customers. Since 2012 Bernd Richter has headed the department “IC & System Design” at Fraunhofer FEP.
Francesco Aieta
(HP Labs)
Achromatic Metasurfaces: towards broadband flat optics

Date: May 27, 2015
Description:
Conventional optical components rely on the propagation through thick materials to control the amplitude, phase and polarization of light. Metasurfaces provide a new path for designing planar optical devices with new functionalities. In this approach, the control of the wavefront is achieved by tailoring the geometry of subwavelength-spaced nano antennas. By designing an array of low-loss dielectric resonators we create metasurfaces with an engineered wavelength-dependent phase shift that compensates for the dispersion of the phase accumulated by light during propagation. In this way the large chromatic effects typical of all flat optical components can be corrected. A flat lens without chromatic aberrations and a beam deflector are demonstrated. The suppression of chromatic aberrations in metasurface-based planar photonics will find applications in lightweight collimators for displays, and chromatically-corrected imaging systems.
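To make the dispersion being compensated concrete, the short sketch below (illustrative only, not material from the talk; the radius, focal length and wavelengths are assumed values) evaluates the ideal hyperbolic phase profile of a flat lens at a few visible wavelengths. An achromatic metasurface must impose approximately this wavelength-dependent phase at every radial position, which is what the engineered dielectric resonators approximate.

    import numpy as np

    def flat_lens_phase(r, wavelength, focal_length):
        # Ideal phase (in radians) a flat lens must impart at radius r so that
        # all rays arrive at the focus in phase; for achromatic operation the
        # profile must be realized at every wavelength of interest.
        return -(2 * np.pi / wavelength) * (np.sqrt(r**2 + focal_length**2) - focal_length)

    # Phase required at r = 50 um for an f = 200 um flat lens, at three
    # visible wavelengths (values chosen only for illustration).
    r, f = 50e-6, 200e-6
    for lam in (450e-9, 550e-9, 650e-9):
        print(f"{lam * 1e9:.0f} nm: {flat_lens_phase(r, lam, f):.1f} rad")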
Further Information:
Francesco Aieta is a researcher at HP Labs specializing in novel photonic devices for imaging and sensing applications. He worked as a postdoctoral fellow at Harvard University and received a PhD in Applied Physics from the Politecnica delle Marche (Italy) in 2013. His present and past research focuses on the study of novel flat optical materials and the design of devices from the mid-infrared to the visible spectrum for biomedical applications as well as consumer electronics products. Other areas of interest include plasmonics, light-matter interaction at the nanoscale, optical trapping in anisotropic environments and the properties of liquid crystals. He is a member of the Optical Society of America.
Jules Urbach
(OTOY)
Light Field Rendering and Streaming for VR and AR

Date: May 13, 2015
Description:
Light field rendering produces realistic imagery that can be viewed from any vantage point. It is an ideal format for deploying cinematic experiences targeting consumer virtual and augmented reality devices with position tracking, as well as emerging light field display hardware. Because of their computational complexity and data requirements, light field rendering and content delivery have not been practical in the past. Jules Urbach, CEO of OTOY, will delve into the company’s content creation and delivery pipelines which are designed to make light field production and content publishing practical today.
Further Information:
Jules Urbach is a pioneer in computer graphics, streaming and 3D rendering with over 25 years of industry experience. He attended Harvard-Westlake high school in LA before being accepted to Harvard University. He decided to defer his acceptance to Harvard (indefinitely as it turned out) to make revolutionary video games. He made his first game, Hell Cab (Time Warner Interactive) at age 18, which was one of the first CD-ROM games ever created. Six years after Hell Cab, Jules founded Groove Alliance. Groove created the first 3D game ever available on Shockwave.com (Real Pool). Currently, Jules is busy working on his two latest ventures, OTOY and LightStage which aim to revolutionize 3D content capture, creation and delivery.
More Information: http://home.otoy.
Thrasyvoulos Pappas
(Northwestern University)
Visual Signal Analysis: Focus on Texture Similarity

Date: May 6, 2015
Description:
Texture is an important visual attribute both for human perception and image analysis systems. We present structural texture similarity metrics (STSIM) and applications that critically depend on such metrics, with emphasis on image compression and content-based retrieval. The STSIM metrics account for human visual perception and the stochastic nature of textures. They rely entirely on local image statistics and allow substantial point-by-point deviations between textures that according to human judgment are similar or essentially identical.
We also present new testing procedures for objective texture similarity metrics. We identify three operating domains for evaluating the performance of such similarity metrics: the top of the similarity scale, where a monotonic relationship between metric values and subjective scores is desired; the ability to distinguish between perceptually similar and dissimilar textures; and the ability to retrieve “identical” textures. Each domain has different performance goals and requires different testing procedures. Experimental results demonstrate both the performance of the proposed metrics and the effectiveness of the proposed subjective testing procedures. The focus of our current work at Lawrence Livermore is on texture space characterization for surveillance applications.
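As a rough illustration of scoring textures by their statistics rather than point-by-point, the toy sketch below (a simplification for illustration, not the published STSIM formulation, which operates on multiscale subband statistics) compares local means and variances over sliding windows; two independent realizations of the same noise texture score near 1 even though they differ at every pixel.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def local_stats_similarity(x, y, win=7, c=1e-6):
        # Compare local means and variances over a sliding window instead of
        # point-by-point differences (a toy stand-in for structural texture
        # similarity; STSIM itself uses statistics of subband coefficients).
        mu_x, mu_y = uniform_filter(x, win), uniform_filter(y, win)
        var_x = np.maximum(uniform_filter(x * x, win) - mu_x**2, 0)
        var_y = np.maximum(uniform_filter(y * y, win) - mu_y**2, 0)
        luminance = (2 * mu_x * mu_y + c) / (mu_x**2 + mu_y**2 + c)
        contrast = (2 * np.sqrt(var_x * var_y) + c) / (var_x + var_y + c)
        return float(np.mean(luminance * contrast))

    # Two independent realizations of the same texture statistics score high.
    rng = np.random.default_rng(0)
    a = 2.0 + rng.standard_normal((128, 128))
    b = 2.0 + rng.standard_normal((128, 128))
    print(local_stats_similarity(a, b))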
Further Information:
Thrasos Pappas received the Ph.D. degree in electrical engineering and computer science from MIT in 1987. From 1987 until 1999, he was a Member of the Technical Staff at Bell Laboratories, Murray Hill, NJ. He joined the EECS Department at Northwestern in 1999. He is currently on sabbatical leave at Lawrence Livermore National Laboratory (January to May 2015). His research interests are in human perception and electronic media, and in particular, image and video quality and compression, image and video analysis, content-based retrieval, model-based halftoning, and tactile and multimodal interfaces. Prof. Pappas is a Fellow of the IEEE and SPIE. He has served as editor-in-chief of the IEEE Transactions on Image Processing (2010-12), and technical program co-chair of ICIP-01 and ICIP-09. Prof. Pappas is currently serving as VP-Publications for the Signal Processing Society of IEEE. Since 1997 he has been co-chair of the SPIE/IS&T Conference on Human Vision and Electronic Imaging.
Peter Milford
(Eyefluence)
Wearable Eye Tracking

Date: April 29, 2015
Description:
Eye interaction technology can be applied to wearable computing systems, ranging from augmented reality systems to virtual reality to information display devices. Applications of eye interaction include eye tracking, user interface control, iris recognition, foveated rendering, biometric data capture and many others. Researchers have been working on eye tracking since the 1800s, progressing from direct 'observations' to photography, to contact and electrical methods, to today's camera-based approaches. I will outline eye tracking in general, with a focus on wearable eye tracking and its applications.
Further Information:
Peter received his Ph.D. in astrophysics at the University of Queensland, Brisbane, Australia. He worked for 5 years at Stanford on a satellite-based solar observing experiment, observing a 'large spherical object'. He left Stanford to join a startup developing a three-degree-of-freedom magnetic tracker, with applications in virtual reality head tracking. He went on to start his consulting company, working with a variety of Silicon Valley firms, mainly in the consumer electronics industry, bringing a practical physics approach to embedded sensors, imaging, calibration, factory test, algorithms, etc. Peter has been working with Eyefluence Inc. since its founding and is CTO/VP Engineering, overseeing a strong multi-disciplinary team developing wearable eye interaction technology. He now looks at 'small spherical' objects. Eyefluence's goal is to transform intent into action through your eyes. Eyefluence is developing a variety of eye interaction technologies for upcoming wearable display systems, including eye tracking, iris recognition and user interfaces for control of HMDs.
Paul Debevec
(USC Institute for Creative Technologies)
Achieving Photoreal Digital Actors

Date: April 22, 2015
Description:
We have entered an age where even the human actors in a movie can now be created as computer generated imagery. Somewhere between “Final Fantasy: the Spirits Within” in 2001 and “The Curious Case of Benjamin Button” in 2008, digital actors crossed the “Uncanny Valley” from looking strangely synthetic to believably real. This talk describes how the Light Stage scanning systems and HDRI lighting techniques developed at the USC Institute for Creative Technologies have helped create digital actors in a wide range of recent films. For in‐depth examples, the talk describes how high‐resolution face scanning, advanced character rigging, and performance‐driven facial animation were combined to create “Digital Emily”, a collaboration with Image Metrics (now Faceware) yielding one of the first photoreal digital actors, and 2013’s “Digital Ira”, a collaboration with Activision Inc., yielding the most realistic real‐time digital actor to date. A recent project with USC’s Shoah Foundation is recording light field video of interviews with survivors of the Holocaust to allow interactive conversations with life-size automultiscopic projections.
Further Information:
Paul Debevec (http://www.ict.usc.edu/~debevec) is a Research Professor at the University of Southern California and the Chief Visual Officer at USC's Institute for Creative Technologies. Since his 1996 Ph.D. at UC Berkeley, Debevec's publications and animations have focused on techniques for photogrammetry, image-based rendering, high dynamic range imaging, image-based lighting, appearance measurement, facial animation, and 3D displays. Debevec is an IEEE Senior Member and Co-Chair of the Academy of Motion Picture Arts and Sciences' (AMPAS) Science and Technology Council. He received a Scientific and Engineering Academy Award® in 2010 for his work on the Light Stage facial capture systems, used in movies including Spider-Man 2, Superman Returns, The Curious Case of Benjamin Button, Avatar, Tron: Legacy, The Avengers, Oblivion, Gravity, Maleficent, and Furious 7. In 2014, Debevec was profiled in The New Yorker magazine's "Pixel Perfect: the scientist behind the digital cloning of actors" article by Margaret Talbot. He also recently worked with the Smithsonian Institution to digitize a 3D model of President Barack Obama.
Peyman Milanfar
(Google)
Computational Imaging: From Photons to Photos

Date: April 15, 2015
Description:
Fancy cameras used to be the exclusive domain of professional photographers and experimental scientists. Times have changed, but even as recently as a decade ago, consumer cameras were solitary pieces of hardware and glass; disconnected gadgets with little brains, and no software. But now, everyone owns a smartphone with a powerful processor, and every smartphone has a camera. These mobile cameras are simple, costing only a few dollars per unit. And on their own, they are no competition for their more expensive cousins. But coupled with the processing power native to the devices in which they sit, they are so effective that much of the low-end point-and-shoot camera market has already been decimated by mobile photography. Computational imaging is the enabler for this new paradigm in consumer photography. It is the art, science, and engineering of producing a great shot (moving or still) from small form factor, mobile cameras. It does so by changing the rules of image capture — recording information in space, time, and across other degrees of freedom — while relying heavily on post-processing to produce a final result. Ironically, in this respect, mobile imaging devices are now more like scientific instruments than conventional cameras. This has deep implications for the future of consumer photography. In this technological landscape, the ubiquity of devices and open platforms for imaging will inevitably lead to an explosion of technical and economic activity, as was the case with other types of mobile applications. Meanwhile, clever algorithms, along with dedicated hardware architectures, will take center stage and enable unprecedented imaging capabilities in the user’s hands.
Further Information:
Peyman received his undergraduate education in electrical engineering and mathematics from the University of California, Berkeley, and the MS and PhD degrees in electrical engineering from the Massachusetts Institute of Technology. He was a Professor of EE at UC Santa Cruz from 1999-2014, having served as Associate Dean of the School of Engineering from 2010-12. From 2012-2014 he was at Google-x, where he helped develop the imaging pipeline for Google Glass. He currently leads the Computational Imaging team in Google Research. He holds 8 US patents; has been keynote speaker at numerous conferences including PCS, SPIE, and ICME; and along with his students, won several best paper awards from the IEEE Signal Processing Society. He is a Fellow of the IEEE.
Yangyan Li and Matthias Niessner
(Stanford University)
From Acquisition to Understanding of 3D Shapes

Date: April 8, 2015
Description:
Understanding 3D shapes is an essential but very challenging task for many scenarios ranging from robotics to computer graphics and vision applications. In particular, range sensing technology has made the shape understanding problem even more relevant, as we can now easily capture the geometry of the real world. In this talk, we will demonstrate how we can obtain a 3D reconstruction of an environment, and how we can exploit these results to infer semantic attributes in a scene. More specifically, we introduce a SLAM technique for large-scale 3D reconstruction using a Microsoft Kinect sensor. We will then present a method to jointly segment and classify the underlying 3D geometry. Furthermore, we propose to locate and recognize individual 3D shapes during scanning, where large shape collections are used as the source of prior knowledge. In the future, we plan to extend our work from reconstructing geometry and labelling objects to inferring objects' physical properties such as weight.
Further Information:
Yangyan Li is a post-doctoral scholar in Prof. Leonidas J. Guibas' Geometric Computation Group at Stanford University, affiliated with the Max Planck Center for Visual Computing and Communication. Yangyan received his PhD degree from the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, under the supervision of Prof. Baoquan Chen in 2013. His primary research interests fall in the field of Computer Graphics with an emphasis on 3D reconstruction.
Matthias Niessner is a visiting assistant professor at Stanford University affiliated with the Max Planck Center for Visual Computing and Communication. Prior to his appointment at Stanford, he earned his PhD from the University of Erlangen-Nuremberg, Germany, under the supervision of Günther Greiner. His research focuses on different fields of computer graphics and computer vision, including real-time rendering, reconstruction of 3D scene environments, and semantic scene understanding.
Roland Angst
(Stanford University)
Challenges in Image-Based 3D Reconstructions

Date: March 11, 2015
Description:
Driven by the needs of various applications such as robotics, immersive augmented and virtual reality, digitization of archeological sites and landmarks, medical imaging, etc., the extraction of 3D geometry from images has become increasingly important in the last couple of years. The theory of multiple view geometry, which relates images from different viewpoints, dates back more than 100 years. However, in practice, e.g. due to imperfections of cameras or measurement noise, the required assumptions for this theory are often not met exactly, which makes 3D computer vision inherently difficult.
In my talk, I will first outline some of the challenges we are faced with and in the second part, I will focus on two of those challenges. Specifically, we will look into radial distortion estimation without calibration targets and dense 3D reconstructions for scenes where the rigidity assumption is violated. We will see how simple and very intuitive reasoning in geometric terms can provide the foundation for algorithms to tackle those challenges.
Further Information:
Roland Angst is currently affiliated with the Max Planck Center for Visual Computing and Communication. As such, he is currently a visiting assistant professor at Stanford University, where he is a member of Prof. Bernd Girod's Image, Video, and Multimedia Systems Group as well as of Prof. Leonidas J. Guibas' Geometric Computation Group. He will join the Max Planck Institute in Saarbrücken in April 2015.
Roland received his PhD degree from the Swiss Federal Institute of Technology (ETH) Zürich in 2012 under the supervision of Prof. Marc Pollefeys. His research focused on geometric computer vision, and on subspace models and algorithms in particular. In 2010, he received a prestigious Google European Doctoral Fellowship in Computer Vision. Roland received his Master's degree in computer science with distinction from ETH Zürich in October 2007. His current primary research interests span computer vision and geometry, and augmented and virtual reality.
Kathrin Berkner
(Ricoh Innovations)
Measuring material and surface properties from light fields

Date: March 4, 2015
Description:
Light field imaging has been emerging over the past few years. The capability of capturing the light field of a scene in a single snapshot has enabled new applications not just in consumer photography, but also in industrial applications where 3D reconstruction of scenes is desired. A less researched application area is the characterization of materials and surfaces from light fields. In this talk we discuss some of those applications and show how the imaging task impacts the end-to-end design of the resulting task-specific light field imaging system.
Further Information:
Kathrin Berkner is Deputy Director of Research at Ricoh Innovations, where she is leading research on sensing technologies and collaborations with Ricoh R&D teams in Japan and India. Her research team of computational imaging experts has been developing technology that has made its way into several new imaging products for Ricoh. Before working with optical elements, Dr. Berkner worked on a variety of image and document processing technologies that were implemented in Ricoh's core multifunction printing products and received several internal awards. Prior to joining Ricoh, she was a Postdoctoral Researcher at Rice University, Houston, TX, performing research on wavelets with the Rice DSP group. Dr. Berkner holds a Ph.D. in mathematics from the University of Bremen, Germany.
Martin Banks
(UC Berkeley)
Vergence-Accommodation Conflicts in Stereoscopic Displays

Date: February 25, 2015
Description:
Stereoscopic displays present different images to the two eyes and thereby create a compelling three-dimensional (3D) sensation. They are being developed for numerous applications. However, stereoscopic displays cause perceptual distortions, performance decrements, and visual discomfort. These problems occur because some of the presented depth cues (i.e., perspective and binocular disparity) specify the intended 3D scene while focus cues (blur and accommodation) specify the fixed distance of the display itself. We have developed a stereoscopic display that circumvents these problems. It consists of a fast switchable lens (>1 kHz) synchronized to the display such that focus cues are nearly correct. Using this display, we have investigated how the conflict between vergence and accommodation affects 3D shape perception, visual performance, and, most importantly, visual comfort. We offer guidelines to minimize these adverse effects.
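For readers unfamiliar with how the conflict is quantified, the minimal worked example below (an illustration added here, not material from the talk) expresses it as the difference, in diopters, between the accommodation distance set by the screen and the vergence distance set by the simulated depth.

    def va_conflict_diopters(display_distance_m, simulated_distance_m):
        # In a conventional stereoscopic display the eyes focus on the screen
        # but converge to the simulated depth; the conflict is the difference
        # of the two distances expressed in diopters (1/m).
        return abs(1.0 / display_distance_m - 1.0 / simulated_distance_m)

    # A screen 0.5 m away (2 D of accommodation) showing content rendered at
    # 2 m (0.5 D of vergence) produces a 1.5 D conflict, generally large
    # enough to cause the discomfort discussed in the talk.
    print(va_conflict_diopters(0.5, 2.0))   # -> 1.5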
Further Information:
Martin S. Banks is a Professor of Optometry and Vision Science at the University of California at Berkeley. He has received numerous awards for his work on basic and applied research on human visual development, on visual space perception, and on the development and evaluation of stereoscopic displays. He was appointed Fellow of the Center for Advanced Study of the Behavioral Sciences (1988), Honorary Research Fellow of Cardiff University (2007), Fellow of the American Association for the Advancement of Science (2008), Fellow of the American Psychological Society (2009), Holgate Fellow of Durham University (2011), and WICN Fellow of University of Wales (2011).
Professor Banks received his Bachelor’s degree at Occidental College in 1970 where he majored in Psychology and minored in Physics. He received a Master’s degree in Experimental Psychology from UC San Diego in 1973 and a doctorate in Developmental Psychology from University of Minnesota in 1976. He was Assistant and Associate Professor of Psychology at the University of Texas at Austin from 1976-1985. He moved to UC Berkeley School of Optometry in 1985, and was Chairman of the Vision Science Program from 1995-2002, and again in 2012.
Michael Zordan
(Sony Biotechnology)
The Spectral Flow Cytometer

Date: February 18, 2015
Description:
Spectral flow cytometry is an exciting technology for cytomics and systems biology. Spectral flow cytometry differs from conventional flow cytometry in that the measured parameters for events are fluorescence spectra taken across all detectors, as opposed to being primarily the fluorescence signal measured from one detector. This gives spectral flow cytometry capabilities and flexibility that far exceed those of conventional flow cytometry.
There are several different hardware schemes that can be used to measure spectral data from cells. The core functions that a spectral detection scheme must have are:
1. A means to spatially separate collected light based on wavelength.
2. A multichannel detection system that will simultaneously measure the signals at different wavelengths independently.
3. The data processing power to perform spectral unmixing for real time display.
These fundamental differences enable spectral flow cytometers to perform applications that are not readily possible on conventional flow cytometers. Cellular autofluorescence can be used as a parameter in spectral flow cytometry, opening up new options for analysis that are not present in conventional flow cytometry. Additionally, because a spectral flow cytometer measures the whole fluorescence spectrum for each fluorophore, overlapping fluorophores can be resolved based on spectral shape, allowing the use of markers that would not be resolvable by conventional flow cytometry. Sony Biotechnology Inc. has recently released the SP6800, the world's first commercial spectral flow cytometer.
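To make the unmixing step concrete, here is a minimal sketch (illustrative only, with made-up reference spectra; not Sony's implementation) that recovers per-fluorophore abundances for a single event by non-negative least squares, assuming reference spectra measured from single-stained controls.

    import numpy as np
    from scipy.optimize import nnls

    def unmix(event_spectrum, reference_spectra):
        # Model the measured spectrum of one event as a non-negative linear
        # mixture of single-fluorophore reference spectra and solve for the
        # abundances with non-negative least squares.
        abundances, _residual = nnls(reference_spectra, event_spectrum)
        return abundances

    # Illustrative 4-detector system with two heavily overlapping dyes.
    refs = np.array([[0.70, 0.20],
                     [0.20, 0.50],
                     [0.08, 0.25],
                     [0.02, 0.05]])
    truth = np.array([1000.0, 400.0])
    measured = refs @ truth + np.array([3.0, -2.0, 1.0, 0.5])  # small noise
    print(unmix(measured, refs))   # approximately [1000, 400]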
Further Information:
Michael is a Staff Engineer at Sony Biotechnology Inc. specializing in the design and use of flow cytometry instrumentation, with particular emphasis on the optical design of the systems. He has been a lead engineer on the SY3200 cell sorter, the EC800 flow analyzer and has contributed to the SP6800 Spectral Analyzer. He received a Ph.D. from Purdue University in 2010 in Biomedical Engineering where he developed optical methods for the detection and isolation of single rare cells. He is an ISAC Scholar, and a member of the ISAC Data Standards Task Force. His current research interests include spectral cell analysis and next generation cellular analysis techniques.
Zeev Zalevsky
(Bar-Ilan University)
Remote Photonic Sensing and Super Resolved Imaging

Date: February 11, 2015
Description:
My talk will be divided into two parts. In the first I will present a technological platform that can be used for remote sensing of biomedical parameters as well as for establishing a directional communication channel. The technology is based upon illuminating a surface with a laser and then using an imaging camera to perform temporal and spatial tracking of the secondary speckle patterns, in order to obtain nanometric-accuracy estimates of the movement of the back-reflecting surface. If the back-reflecting surface is skin located close to major blood arteries, then biomedical monitoring can be realized. If the surface is close to our neck or head, then a directional communication channel can be established.
The proposed technology has already been applied to remote and continuous estimation of heartbeats, blood pulse pressure and intraocular pressure, to estimation of alcohol and glucose concentrations in the blood stream, and to early detection of malaria. It has also been used experimentally as an invisible photonic means for remote, directional and noise-isolated sensing of speech signals.
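As a rough sketch of the kind of tracking involved (a simplification added here; the actual system relies on sub-pixel registration of defocused speckle patterns to reach nanometric sensitivity), the code below estimates the frame-to-frame shift of a speckle image with FFT-based cross-correlation; the time series of such shifts carries the vibration signal of the back-reflecting surface.

    import numpy as np

    def speckle_shift(frame_a, frame_b):
        # Integer-pixel (dy, dx) shift of frame_a relative to frame_b,
        # estimated from the peak of the FFT-based cross-correlation.
        a = frame_a - frame_a.mean()
        b = frame_b - frame_b.mean()
        corr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
        peak = np.unravel_index(np.argmax(corr), corr.shape)
        # Convert wrapped peak coordinates to signed shifts.
        return tuple(p - s if p > s // 2 else p for p, s in zip(peak, corr.shape))

    # Synthetic check: a speckle-like pattern shifted by (3, -5) pixels.
    rng = np.random.default_rng(1)
    a = rng.random((256, 256))
    b = np.roll(a, shift=(3, -5), axis=(0, 1))
    print(speckle_shift(b, a))   # -> (3, -5)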
The second part of my talk will deal with optical super resolution. Digital imaging systems, as well as the human visual system, have limited capability for separating spatial features; the imaging resolution is therefore limited. The reasons for this limitation are related to diffraction (i.e. the finite dimensions of the imaging optics), to the geometry of the sensing array and its sensitivity, and to the axial position of the object itself, which may be out of focus.
In my talk I will present novel photonic approaches and means to exceed the above-mentioned limitations in vision science and eventually to allow super-resolved imaging with improved lateral and axial capabilities for separating spatial features.
Further Information:
Zeev Zalevsky is a full Professor in the faculty of engineering in Bar-Ilan University, Israel. His major fields of research are optical super resolution, biomedical optics, nano-photonics and electro-optical devices, RF photonics and beam shaping. Zeev received his B.Sc. and Ph.D. degrees in electrical engineering from Tel-Aviv University in 1993 and 1996 respectively. He has many publications, patents and awards recognizing his significant contribution to the field of super resolved imaging and biomedical sensing. Zeev is an OSA, SPIE and EOS fellow and IEEE senior member. He is currently serving as the vice Dean of engineering, the head of the electro-optics track and a director of the Nanophotonics Center at the Bar-Ilan Institute of Nanotechnology. Zeev is also the founder of several startup companies.
Joseph Ford
(UC San Diego)
Miniaturized panoramic cameras using fiber-coupled spherical optics

Date: February 4, 2015
Description:
Conventional digital cameras require lenses that form images directly onto flat focal planes, a natural consequence of the difficulty of fabricating non-planar image sensors. But using a curved image surface can dramatically increase the aperture, resolution and field of view achievable within a compact volume. This presentation will highlight imager research by UCSD, and collaborators at Distant Focus, done within the DARPA “SCENICC” program. I’ll show an acorn-sized F/1.7 lens with a 12 mm focal length, a 120˚ field of view, a spectral range that extends from the visible to the near infrared, and a measured resolution of over 300 lp/mm on its spherical image surface. The spherical image surface is coupled to one or more focal planes by high-resolution optical fiber bundles, resulting in raw images that compare well to those of conventional cameras an order of magnitude larger. These images are further improved by computational photography techniques specific to the fiber-coupled “cascade” image, in which a continuous image is sampled by a quasi-periodic fiber bundle before transfer and re-sampling by the rectangular pixel array. I’ll show the results of such image processing, and how this technology can fit an F/1 omnidirectional 150 Mpixel/frame (or larger) movie camera into a 4″ diameter sphere.
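As a rough illustration of the re-sampling step in the cascade processing (a sketch under assumed positions and values, not the UCSD/Distant Focus pipeline), the code below interpolates intensities recorded at irregular fiber-core locations onto a regular pixel grid.

    import numpy as np
    from scipy.interpolate import griddata

    def resample_fiber_image(core_xy, core_values, out_shape):
        # Interpolate intensities sampled at quasi-periodic fiber-core
        # positions (core_xy, in output pixel units) onto a regular grid.
        gy, gx = np.mgrid[0:out_shape[0], 0:out_shape[1]]
        return griddata(core_xy, core_values, (gx, gy), method="linear", fill_value=0.0)

    # Toy example: randomly placed stand-in core centers sampling a smooth ramp.
    rng = np.random.default_rng(0)
    cores = rng.uniform(0, 63, size=(4000, 2))      # (x, y) core positions
    values = cores[:, 0] + 0.5 * cores[:, 1]        # "scene" seen by each core
    img = resample_fiber_image(cores, values, (64, 64))
    print(img.shape, round(float(img[32, 32]), 1))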
Further Information:
Joseph E. Ford is a Professor of ECE at the University of California San Diego working in free-space optics for communications, energy, and sensing. At AT&T Bell Labs from 1994 to 2000, Dr. Ford led research demonstrating the first MEMS attenuator, spectral equalizer and wavelength add/drop switch, technologies now in widespread use. Dr. Ford was General Chair of the first IEEE Conference on Optical MEMS in 2000, and General Chair for the 2008 OSA Optical Fiber Communications Conference. Dr. Ford is co-author on 47 United States patents and over 200 journal articles and conference proceedings, and a Fellow of the Optical Society of America. He leads UCSD’s Photonics Systems Integration Lab (psilab.ucsd.edu), a research group doing advanced free-space optical system design, prototyping and characterization for a wide range of applications.
Tom Malzbender
(Cultural Heritage Imaging)
Capturing and Transforming Surface Reflectance: Imaging the Antikythera Mechanism

Date: January 28, 2015
Description:
In 1900, a party of sponge divers chanced on the wreck of a Roman merchant vessel between Crete and mainland Greece. It was found to contain numerous ancient Greek treasures, among them a mysterious lump of clay that split open to reveal ‘mathematical gears’ as it dried out. This object is now known as the Antikythera Mechanism, one of the most enlightening artifacts in terms of revealing the advanced nature of ancient Greek science and technology. In 2005 we traveled to the National Archeological Museum in Athens to apply our reflectance imaging methods to the mechanism for the purpose of revealing ancient writing on the device. These methods capture surface appearance and transform reflectance properties to allow subtle surface shape to be seen that is otherwise difficult to perceive. We were successful, and along with the results of Microfocus CT imaging, epigraphers were able to decipher 3000 characters compared with the original 800 known. This led to an understanding that the device was a mechanical, astronomical computer, built around 150 B.C.E. and capable of predicting solar and lunar eclipses. This talk will overview the reflectance imaging methods as well as what they reveal about the Antikythera Mechanism.
Further Information:
Tom Malzbender is a researcher who recently completed a 31-year career at Hewlett-Packard Laboratories, working at the interface of computer graphics, vision, imaging and signal processing. At HPL he developed the methods of Fourier Volume Rendering, Polynomial Texture Mapping (PTM) and Reflectance Transformation, as well as directing the Visual Computing Department. Tom also developed the capacitive sensing technology that allowed HP to penetrate the consumer graphics tablet market. His PTM/RTI methods are used by most major museums in North America and Europe and in the fields of criminal forensics, paleontology and archaeology. He has co-chaired or served on the program committee of over 30 conferences in computer graphics and vision. Tom now serves on the board of Cultural Heritage Imaging. More information can be found at https://sites.google.com/
Giacomo Chiari
(Turin University, Getty Conservation Institute)
Imaging Cultural Heritage

Date: January 21, 2015
Description:
An image is worth a thousand words. The recent progress in imaging techniques applied to Cultural Heritage has been immense, and to cover it all would take a full university course. This lecture presents in a succinct way the applications of imaging done at the Getty Conservation Institute in the last 10 years. The fundamental problem of registering and superimposing images obtained using different techniques has been solved, and several examples will show how powerful this is. Chemical mapping, coupled with spot noninvasive analyses on selected points, enormously reduces the need to take samples. Several techniques to image the invisible are described, some old and revitalized thanks to new tools, like Electron Emission or defocused radiography; others totally new and made possible by the advent of modern detectors and excitation means. A 3D visualization of a medium-large bronze statue via CT scan has opened new insights into the defects in the statue’s manufacture and the subsequent deterioration. The possibility of detecting at a distance, using laser speckle interferometry, loose pieces of plaster that are dangerous to the public can save large amounts of money and conservators’ time. Multispectral analysis, giving different information for each wavelength, makes it possible to select the most informative images and combine them. Visible Induced Luminescence can uniquely map Egyptian blue, and the examples shown demonstrate how powerful this technique is.
Further Information:
Giacomo Chiari, a full professor of crystallography from Turin University, worked extensively on cultural heritage (Michelangelo’s Last Judgement, Maya Blue, adobe conservation in many countries). When he retired from Turin University in 2003, he became the Chief Scientist at the Getty Conservation Institute in Los Angeles. He retired from the GCI in 2013 and has been consulting and lecturing since then. At the GCI he helped to develop new equipment (a CT scanner for bronzes, a portable noninvasive XRD/XRF device called DUETTO, a laser speckle interferometer to detect detached plaster, VIL (visible-induced luminescence) for mapping the Egyptian Blue pigment, and X-ray electron emission radiography). In the field he has worked on mural paintings in Peru, in Tutankhamen’s tomb and in Herculaneum.
Achintya Bhowmik
(Intel®)
Intel® RealSense Technology: Adding Immersive Sensing and Interactions to Computing Devices

Date: January 14, 2015
Description:
How we interface and interact with computing and entertainment devices is on the verge of a revolutionary transformation, with natural user inputs based on touch, gesture, and voice replacing or augmenting the use of traditional interfaces based on the mouse, remote controls, and joysticks. With the rapid advances in natural sensing technologies, we are endowing the devices with the abilities to see, hear, feel, and understand us and the physical world. In this talk, we will present and demonstrate Intel® RealSense Technology, which is enabling a new class of interactive and immersive applications based on embedded real-time 3D visual sensing. We will also take a peek at the future of multimodal sensing and interactions.
Further Information:
Dr. Achin Bhowmik leads the research, development, and productization of advanced computing solutions based on natural interactions, intuitive interfaces, and immersive experiences, recently branded as Intel® RealSense Technology. Previously, he served as the chief of staff of the personal computing group, Intel’s largest business unit. Prior to that, he led the advanced video and display technology group, responsible for developing multimedia processing architecture for Intel’s computing products. His prior work includes liquid-crystal-on-silicon microdisplay technology and integrated optoelectronic devices.
As an adjunct and guest professor, he has taught graduate-level courses on advanced sensing and human-computer interactions, computer vision, and display technologies at the University of California, Berkeley, Kyung Hee University, Seoul, and University of California, Santa Cruz Extension. He has >150 publications, including two books, titled “Interactive Displays: Natural Human-Interface Technologies” and “Mobile Displays: Technology & Applications”, and 27 issued patents. He is an associate editor for the Journal of the Society for Information Display. He is the vice president of the Society for Information Display (SID), Americas, and a senior member of the IEEE. He is on the board of directors for OpenCV, the organization behind the open source computer vision library.
Andrew Gallagher
(Google)
Understanding Images of People with Social Context

Date: January 7, 2015
Description:
When we see other humans, we can quickly make judgments such as their demographic description and identity if they are familiar to us. We can answer questions related to the activities and relationships between people in an image. We draw conclusions based not just on what we see, but also from a lifetime of experience of living and interacting with other people. Even simple, common sense knowledge such as the fact that children are smaller than adults allows us to better understand the roles of the people we see. In this work, we propose contextual features, for modelling social context, drawn from a variety of public sources, and models for understanding images of people with the objective of providing computers with access to the same contextual information that humans use.
Computer vision and data-driven image analysis can play a role in helping us learn about people. We now are able to see millions of candid and posed images of people on the Internet. We can describe people with a vector of possible first names, and automatically produce descriptions of particular people in an image. From a broad perspective, this work presents a loop in that our knowledge about people can help computer vision algorithms, and computer vision can help us learn more about people.
Further Information:
Andy is a Senior Software Engineer with Google, working with geo-referenced imagery. Previously, he was a Visiting Research Scientist at Cornell University’s School of Electrical and Computer Engineering, and part of a computer vision start-up, TaggPic, that identified landmarks in images. He earned the Ph.D. degree in electrical and computer engineering from Carnegie Mellon University in 2009, advised by Prof. Tsuhan Chen. Andy worked for the Eastman Kodak Company from 1996 to 2012, initially developing computational photography and computer vision algorithms for digital photofinishing, such as dynamic range compression, red-eye correction and face recognition.
- Ofer Levi » Portable optical brain imaging
- Boyd Fowler » Highlights from the International Workshop on Imaging Sensors
- Anat Levin » Inverse Volume Rendering with Material Dictionaries
- Aydogan Ozcan » Microscopy, Sensing and Diagnostics Tools
- Rajiv Laroia » Gathering Light
- Robert LiKamWa » Mixed-Signal ConvNet Vision Sensor
- Liang Gao » Multidimensional Optical Imaging Devices
- Stephen Hicks » From Electrodes to Smart Glasses
- Kristina Irsch » Remote detection of binocular fixation and focus
- Sara Abrahamsson » 3D imaging using multifocus microscopy
- Bernd Richter » Bidirectional OLED Microdisplays
- Francesco Aieta » Achromatic Metasurfaces: towards broadband flat optics
- Jules Urbach » Light Field Rendering and Streaming for VR and AR
- Thrasyvoulos Pappas » Visual Signal Analysis: Focus on Texture Similarity
- Peter Milford » Wearable Eye Tracking
- Paul Debevec » Achieving Photoreal Digital Actors
- Peyman Milanfar » Computational Imaging: From Photons to Photos
- Yangyan Li and Matthias Niessner » From Acquisition to Understanding of 3D Shapes
- Roland Angst » Challenges in Image-Based 3D Reconstructions
- Kathrin Berkner » Measuring material and surface properties from light fields
- Martin Banks » Vergence-Accommodation Conflicts
- Michael Zordan » The Spectral Flow Cytometer
- Zeev Zalevsky » Remote Photonic Sensing
- Joseph Ford » Miniaturized panoramic cameras
- Tom Malzbender » Capturing and Transforming Surface Reflectance
- Giacomo Chiari » Imaging Cultural Heritage
- Achintya Bhowmik » Intel® RealSense Technology
- Andrew Gallagher » Understanding Images of People with Social Context
SCIEN Colloquia 2014
Aldo Badano
(US Food and Drug Administration)
A stereoscopic computational observer model for image quality assessment

Date: December 10, 2014
Description:
As stereoscopic display devices become more commonplace, their image quality evaluation becomes increasingly important. Most studies on 3D displays rely on physical measurements or on human preference. Currently, there is no link correlating bench testing with detection performance for medical imaging applications. We describe a computational stereoscopic observer approach inspired by the mechanisms of stereopsis in human vision for task-based image quality assessment. The stereo-observer uses a left and a right image generated through a visualization operator to render 3D datasets for white and lumpy backgrounds. Our simulation framework generalizes different types of model observers including existing 2D and 3D observers as well as providing flexibility for the stereoscopic model approach. We show results quantifying the changes in performance when varying stereo angle as measured by the ideal linear stereoscopic observer. We apply the framework to the study of performance trade-offs for three stereoscopic display technologies. Our results show that the crosstalk signature for 3D content varies considerably when using different models of 3D glasses for active stereoscopic displays. Our methodology can be extended to model other aspects of the stereoscopic imaging chain in medical, entertainment, and other demanding applications.
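For readers unfamiliar with model observers, here is a minimal sketch of the ideal linear (Hotelling) observer's detectability computed from image ensembles (an illustration with white-noise backgrounds and an assumed 1D signal, not the stereoscopic framework of the talk; a stereoscopic observer would stack the rendered left and right views into a single data vector).

    import numpy as np

    def hotelling_detectability(signal_present, signal_absent):
        # Ideal linear (Hotelling) observer detectability d' from ensembles of
        # vectorized images with shape (n_images, n_pixels):
        #   w = K^(-1) (mean_present - mean_absent),  d'^2 = delta^T K^(-1) delta
        delta = signal_present.mean(axis=0) - signal_absent.mean(axis=0)
        cov = 0.5 * (np.cov(signal_present, rowvar=False)
                     + np.cov(signal_absent, rowvar=False))
        return float(np.sqrt(delta @ np.linalg.solve(cov, delta)))

    # Toy example: a faint Gaussian "signal" added to white-noise backgrounds.
    rng = np.random.default_rng(0)
    n, npix = 500, 64
    absent = rng.standard_normal((n, npix))
    signal = 0.3 * np.exp(-0.5 * ((np.arange(npix) - 32) / 4.0) ** 2)
    present = rng.standard_normal((n, npix)) + signal
    print(hotelling_detectability(present, absent))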
Further Information:
Aldo Badano is a member of the Senior Biomedical Research Service and the Laboratory Leader for Imaging Physics in the Division of Imaging, Diagnostics, and Software Reliability, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, US Food and Drug Administration. Dr. Badano leads a program on the characterization, modeling and assessment of medical image acquisition and display devices using experimental and computational methods. Dr. Badano is an affiliate faculty at the Fischell Bioengineering Department at the University of Maryland College Park, and at the Computer Science and Electrical Engineering Department of University of Maryland, Baltimore County. He received a PhD degree in Nuclear Engineering and a MEng in Radiological Health Engineering from the University of Michigan in 1999 and 1995, and a ChemEng degree from the Universidad de la República, Montevideo, Uruguay in 1992. He serves as Associate Editor for several scientific journals and as reviewer of technical proposals for DOD and NIH. Dr. Badano has authored more than 250 publications and a tutorial textbook on medical displays.
Johnny Lee
(Google)
Project Tango: Giving Mobile Devices a Human-Scale Understanding of Space and Motion

Date: December 3, 2014
Description:
Project Tango is a focused effort to harvest research from the last decade of work in computer vision and robotics and concentrate that technology into a mobile device. It uses computer vision and advanced sensor fusion to estimate the position and orientation of the device in real time, while simultaneously generating a 3D map of the environment. We will discuss some of the underlying technologies that make this possible, such as the hardware sensors and some of the software algorithms. We will also show demonstrations of how the technology could be used in both gaming and non-gaming applications. This is just the beginning and we hope you will join us on this journey. We believe it will be one worth taking.
Further Information:
Johnny Lee is a Technical Program Lead in the Advanced Technology and Projects (ATAP) group at Google. He leads Project Tango, a focused effort to bring computer vision and advanced sensor fusion to mobile platforms. Previously, he helped Google X explore new projects as a Rapid Evaluator and was a core algorithms contributor to the original Xbox Kinect. His YouTube videos demonstrating Wii remote hacks have surpassed 15 million views, and his TED talk became one of the most popular TED talk videos. He received his PhD in Human-Computer Interaction from Carnegie Mellon University in 2008 and has been recognized in MIT Technology Review’s TR35.
Arthur Zhang
(Innovega)
iOptik: Contact Lens–Enabled Wearable Display Platform

Date: November 12, 2014
Description:
A revolution in technology is currently underway with wearable electronics that will dramatically change how we interact with technology. Among all the ways that technology can provide us with sensory input and feedback, no method is more important than through our visual system. With smartphones, tablets, and TVs becoming ever larger to enable a more enjoyable and natural way to interact with digital content, consumers, industry, and military alike are turning to wearable displays. We are all seeking the ideal display that fits unobtrusively into our everyday lives, while providing very high visual performance. Innovega’s iOptik wearable display system accomplishes this by breaking away from any conventional optical method and merging the optics of the wearable display into high-tech contact lenses. The contact lenses provide the wearer with the ability to see their surroundings with perfectly corrected vision, while simultaneously allowing them the ability to view an immersive display, in a tiny form factor, embedded within fashionable eyewear. This talk will present an overview of how our technology works and provide some examples of the elements within our system.
Further Information:
Arthur Zhang is an engineer, scientist, and entrepreneur. He has been working in the startup environment since graduating with a PhD in applied physics in 2010 and is now a Senior Member of Technical Staff at Innovega. He has been responsible for developing many of the key components of Innovega’s technology, including the world’s first polarized contact lens. He has also been a key contributor to Innovega’s many eyewear platforms. Arthur has a great interest in the merging of technology with the human body and is an expert in the integration of nano/micro-scale devices into medical devices.
Eli Peli
(Harvard Medical School)
Visual Issues with Head-Mounted Displays

Date: October 29, 2014
Description:
After 25 years of commercial development of head-mounted displays (HMDs), we seem to be approaching a point of maturation at which the technology will finally penetrate the marketplace. The presentation of images in near-eye displays, whether monocular, binocular, stereoscopic, or see-through for augmented vision, has important consequences for the visual experience, and of particular importance for the technology’s success is the comfort and safety of the users. I will discuss the ophthalmic consequences of HMDs that have been suggested, and the evidence collected so far. A major concern has been the decoupling of accommodation and convergence in (stereo and non-stereo) HMDs, which is presumed to cause eye strain and has led to numerous technological approaches to overcome it. Motion-sickness-like symptoms are common with HMDs and with non-HMD stereo displays, but have been addressed to a much lesser extent. Other visual phenomena and visual challenges presented by HMDs will be discussed as well.
Further Information:
Eli Peli is trained as an Electrical Engineer and an Optometrist. He is the Moakley Scholar in Aging Eye Research at Schepens, Massachusetts Eye and Ear, and Professor of Ophthalmology at Harvard Medical School. Dr. Peli is a Fellow of the American Academy of Optometry, the Optical Society of America, the Society for Information Display, and The International Society of Optical Engineering. He was presented the 2010 Otto Schade Prize from the SID (Society for Information Display) and the 2010 Edwin H Land Medal awarded jointly by the Optical Society of America and the Society for Imaging Science and Technology. His principal research interests are image processing in relation to visual function and clinical psychophysics in low vision rehabilitation, image understanding and evaluation of display-vision interaction. He also maintains an interest in oculomotor control and binocular vision. Dr. Peli is a consultant to many companies in the ophthalmic instrumentation area and to manufacturers of head mounted displays (HMD).
Bernard Kress
(Google [X] Labs)
From Virtual Reality Headsets to Smart Glasses and Beyond

Description:
Helmet Mounted Displays (HMDs) and Head Up Displays (HUDs) have been used extensively over the past decades, especially within the defense sector. The complexity of designing and fabricating high-quality see-through combiner optics that achieve high resolution over a large FOV has hindered their use in consumer electronic devices. Occlusion head-mounted displays have also been used in the defense sector for simulation and training purposes, over similarly large FOVs, packed with custom head-tracking and eye-gesture sensors. Recently, a paradigm shift toward consumer electronics has occurred as part of the wider wearable computing effort. Technologies developed for the smartphone industry have been used to build smaller, lower-power, cheaper electronics. Similarly, novel integrated sensors and micro-displays have enabled the development of consumer electronic smart glasses and smart eyewear, professional AR (Augmented Reality) HMDs, as well as VR (Virtual Reality) headsets. Reducing the FOV while addressing the need for an increased exit pupil (thus allowing their use by most people), alongside stringent industrial design constraints, has been pushing the limits of the design techniques and technologies available to the optical engineer (refractive, catadioptric, micro-optic, segmented Fresnel, waveguide, diffractive, holographic, …).
The integration of the optical combiner within conventional meniscus prescription lenses is a challenge that has yet to be solved. We will review how a broad range of optical design techniques have been applied to fulfill such requirements, as well as the various head-worn devices developed to date. Finally, we will review additional optical technologies applied as input mechanisms (eye and head gesture sensing, gaze tracking and hand gesture sensing).
Further Information:
For over 20 years, Bernard has made significant scientific contributions as a researcher, professor, consultant, advisor, instructor, and author in the field of micro-optics, diffractive optics and holography for research, industry and consumer electronics. He has been involved in half a dozen start-ups in Silicon Valley working on optical data storage, optical telecom, optical position sensors and displays (picos, HUDs and HMDs). Bernard holds 28 granted international patents and 30 patent applications. He has published more than 100 proceedings papers and 18 refereed journal papers. He is a short course instructor for SPIE on micro-optics, diffractive optics and wafer-scale optics. He has published three books, “Digital Diffractive Optics” (John Wiley and Sons, 1999), “Applied Digital Optics” (John Wiley and Sons, 2007) and “Optical System Design: Diffractive Optics” (McGraw-Hill, 2005), as well as the SPIE field guide “Digital Micro-Optics” (2014). He has been chairman of the SPIE conference “Photonics for Harsh Environments” for the past three years. He is currently with Google [X] working on the Google Glass project as Principal Optical Architect.
Gordon Wetzstein
(Stanford University)
Compressive Light Field Display and Imaging Systems

Description:
With rapid advances in optical fabrication, digital processing power, and computational perception, a new generation of display technology is emerging: compressive displays, which explore the co-design of optical elements and computational processing while taking particular characteristics of the human visual system into account. We will review advances in this field and give an outlook on next-generation compressive display and imaging technology. In contrast to conventional technology, compressive displays aim for a joint design of optics, electronics, and computational processing that together exploit the compressibility of the presented data. For instance, light fields show the same 3D scene from different perspectives; all these images are very similar and therefore compressible. By combining multilayer architectures or directional backlighting with optimal light field factorizations, limitations of existing devices, for instance resolution, depth of field, and field of view, can be overcome. In addition to light field display and projection, we will discuss a variety of technologies for compressive super-resolution and high dynamic range image display as well as compressive light field imaging and microscopy.
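To give a flavor of the factorization machinery (a simplified sketch under assumed sizes, not the specific weighted tensor factorizations used for real displays), the code below runs a plain non-negative matrix factorization with multiplicative updates; in compressive light field displays the non-negative factors, suitably reshaped, play the role of the time-multiplexed layer and backlight patterns.

    import numpy as np

    def nmf(L, rank, iters=200, eps=1e-9):
        # Non-negative factorization L ~ A @ B via Lee-Seung multiplicative
        # updates; only the factorization step is shown, not display geometry.
        rng = np.random.default_rng(0)
        A = rng.random((L.shape[0], rank)) + eps
        B = rng.random((rank, L.shape[1])) + eps
        for _ in range(iters):
            B *= (A.T @ L) / (A.T @ A @ B + eps)
            A *= (L @ B.T) / (A @ B @ B.T + eps)
        return A, B

    # Toy "light field" matrix (views x pixels) that is genuinely low rank.
    rng = np.random.default_rng(1)
    L = rng.random((9, 3)) @ rng.random((3, 256))
    A, B = nmf(L, rank=3)
    print(np.linalg.norm(L - A @ B) / np.linalg.norm(L))   # small residual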
Further Information:
Prior to joining Stanford University’s Electrical Engineering Department as an Assistant Professor in 2014, Gordon Wetzstein was a Research Scientist in the Camera Culture Group at the MIT Media Lab. His research focuses on computational imaging, microscopy, and display systems as well as computational light transport. At the intersection of computer graphics, machine vision, optics, scientific computing, and perception, this research has a wide range of applications in next-generation consumer electronics, scientific imaging, human-computer interaction, remote sensing, and many other areas. Gordon’s cross-disciplinary approach to research has been funded by DARPA, NSF, Intel, Samsung, and other grants from industry sponsors and research councils. In 2006, Gordon graduated with Honors from the Bauhaus in Weimar, Germany, and he received a Ph.D. in Computer Science from the University of British Columbia in 2011. His doctoral dissertation focuses on computational light modulation for image acquisition and display and won the Alain Fournier Ph.D. Dissertation Annual Award. He organized the IEEE 2012 and 2013 International Workshops on Computational Cameras and Displays, founded displayblocks.org as a forum for sharing computational display design instructions with the DIY community, and presented a number of courses on Computational Displays and Computational Photography at ACM SIGGRAPH. Gordon won the best paper awards at the International Conference on Computational Photography in 2011 and 2014 as well as a Laval Virtual Award in 2005.
Eric Fossum
(Dartmouth)
Quanta Image Sensor (QIS) Concept and Progress

Description:
The Quanta Image Sensor (QIS) was conceived when contemplating shrinking pixel sizes and storage capacities, and the steady increase in digital processing power. In the single-bit QIS, the output of each field is a binary bit plane, where each bit represents the presence or absence of at least one photoelectron in a photodetector. A series of bit planes is generated through high-speed readout, and a kernel or “cubicle” of bits (X, Y, t) is used to create a single output image pixel. The size of the cubicle can be adjusted post-acquisition to optimize image quality. The specialized sub-diffraction-limit photodetectors in the QIS are referred to as “jots” and a QIS may have a gigajot or more, read out at 1000 fps, for a data rate exceeding 1Tb/s. Basically, we are trying to count photons as they arrive at the sensor. Recent progress towards realizing the QIS for commercial and scientific purposes will be discussed. This includes investigation of a pump-gate jot device implemented in a 65nm process, power-efficient readout electronics, currently less than 20pJ/b in 0.18 um CMOS, creating images from jot data with high dynamic range, and understanding the imaging characteristics of single-bit and multi-bit QIS devices, such as the inherent and interesting film-like D-log(H) characteristic. If successful, the QIS will represent a major paradigm shift in image capture.
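A minimal sketch of the cubicle aggregation described above (a toy model, not the speaker's actual readout or reconstruction chain): sum a stack of binary jot bit planes over (X, Y, t) cubicles to form output image pixels.

```python
import numpy as np

def cubicle_sum(bit_planes, kt, ky, kx):
    """Sum single-bit jot data over (t, y, x) 'cubicles' to form output image pixels.

    bit_planes: binary array of shape (T, Y, X), with T, Y, X multiples of kt, ky, kx.
    Returns an array of shape (T//kt, Y//ky, X//kx) with values in [0, kt*ky*kx].
    """
    T, Y, X = bit_planes.shape
    b = bit_planes.reshape(T // kt, kt, Y // ky, ky, X // kx, kx)
    return b.sum(axis=(1, 3, 5))

# Toy example: 64 bit planes of 256x256 jots; each bit marks >= 1 photoelectron.
rng = np.random.default_rng(0)
jots = (rng.poisson(lam=0.2, size=(64, 256, 256)) >= 1).astype(np.uint8)
frames = cubicle_sum(jots, kt=16, ky=4, kx=4)   # 16x4x4 cubicles -> 4 frames of 64x64 pixels
print(frames.shape, frames.max())
```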
Further Information:
Eric R. Fossum is a Professor at the Thayer School of Engineering at Dartmouth. His work on miniaturizing NASA interplanetary spacecraft cameras at Caltech’s Jet Propulsion Laboratory in the early 1990’s led to his invention of the CMOS image sensor “camera-on-a-chip” that has touched many here on Earth, from every smartphone to automobiles and medicine, from security and safety to art, social media and political change. Used in billions of cameras each year, his technology has launched a world-wide explosion in digital imaging and visual communications. Honors include induction into the National Inventors Hall of Fame and election to the National Academy of Engineering and the National Academy of Inventors. He received the NASA Exceptional Achievement Medal and is a Fellow of the IEEE. He co-founded the International Image Sensor Society and served as its first President. A graduate of Trinity College and Yale University, Dr. Fossum taught at Columbia and then worked at JPL. He co-founded and led Photobit Corporation and later led MEMS-maker Siimpel. He joined Dartmouth in 2010, where he teaches and continues research on image sensors, and is Director of the school’s Ph.D. Innovation Program. He has published over 260 technical papers and holds over 150 U.S. patents. He and his wife have a small hobby farm in New Hampshire and he enjoys his time on his tractor.
Chris Bregler
(New York University)
Next Gen Motion Capture: From the Silver Screen to the Stadium and the Streets

Date: May 28, 2014
Description:
Mermaids and pirates, the Hulk and Iron Man! This talk will describe the behind-the-scenes technology of our match-moving and 3D capture system used in recent movies, including The Avengers, Pirates of the Caribbean, Avatar, Star Trek, and The Lone Ranger, to create the latest 3D visual effects. It will also show how we have used similar technology for New York Times infographics to demonstrate the body language of presidential debates, the motions of a New York Philharmonic conductor, New York Yankee Mariano Rivera’s pitch style, and Olympic swimmer Dana Vollmer’s famous butterfly stroke that won her four gold medals.
While Motion Capture is the predominant technology used for these domains, we have moved beyond such studio-based technology to do special effects, movement visualization, and recognition without markers and without multiple high-speed IR cameras. Instead, many projects are shot on-site, outdoors, and in challenging environments with the benefit of new interactive computer vision techniques as well as new crowd-sourced and deep learning techniques.
Further Information:
Chris Bregler is a Professor of Computer Science at NYU’s Courant Institute, director of the NYU Movement Lab, and C.E.O. of ManhattanMocap, LLC. He received his M.S. and Ph.D. in Computer Science from U.C. Berkeley and his Diplom from Karlsruhe University. Prior to NYU he was on the faculty at Stanford University and worked for several companies including Hewlett Packard, Interval, Disney Feature Animation, and LucasFilm’s ILM. His motion capture research and commercial projects in science and entertainment have resulted in numerous publications, patents, and awards from the National Science Foundation, Sloan Foundation, Packard Foundation, Electronic Arts, Microsoft, Google, U.S. Navy, U.S. Air Force, and other sources. He has been named Stanford Joyce Faculty Fellow, Terman Fellow, and Sloan Research Fellow. He received the Olympus Prize for achievements in computer vision and pattern recognition and was awarded the IEEE Longuet-Higgins Prize for “Fundamental Contributions in Computer Vision that have withstood the test of time”. His non-academic achievements include serving as the executive producer of Squidball.net, a massive multi-player motion game that required building the world’s largest real-time motion capture volume and holds several Motion Capture Society world records. He was the chair for the SIGGRAPH Electronic Theater and Animation Festival. He has been active in the Visual Effects industry, for example, as the lead developer of ILM’s Multitrack system that has been used in many feature film productions. His work has also been featured in mainstream media such as the New York Times, Los Angeles Times, Scientific American, National Geographic, WIRED, Business Week, Variety, Hollywood Reporter, ABC, CBS, NBC, CNN, Discovery/Science Channel, and many other outlets.
Jon Shlens
(Google)
Engineering a Large Scale Vision System by Leveraging Semantic Knowledge

Date: May 21, 2014
Description:
Computer-based vision systems are increasingly indispensable in our modern world. However, modern visual recognition systems have been limited in their ability to identify large numbers of object categories. This limitation is due in part to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows unbounded. One remedy is to leverage data from other sources – such as text data – both to train visual models and constrain their predictions. In this talk I will present our recent efforts at Google to build a novel architecture that employs a deep neural network to identify visual objects using both labeled image data and semantic information gleaned from unannotated text. I will demonstrate that this model matches state-of-the-art performance on academic benchmarks while making semantically more reasonable errors. Most importantly, I will discuss how semantic information can be exploited to make predictions about image labels not observed during training. Semantic knowledge substantially improves “zero-shot” predictions, achieving state-of-the-art performance on predicting tens of thousands of object categories never previously seen by the visual model.
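The zero-shot idea can be sketched generically: learn a map from image features into a word-embedding space built from unannotated text, then label an image by the nearest class embedding, which may belong to a class with no training images at all. The sketch below uses a simple ridge-regression map rather than the deep architecture described in the talk, and assumes hypothetical image-feature and label-embedding inputs.

```python
import numpy as np

def fit_visual_semantic_map(img_feats, label_embeddings, reg=1e-2):
    """Ridge-regression map W from image features to a word-embedding space."""
    X, Y = img_feats, label_embeddings            # shapes (n, d_img) and (n, d_emb)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)

def predict_label(img_feat, W, class_embeddings, class_names):
    """Pick the class whose embedding is closest (cosine) to the mapped image feature.

    class_embeddings may include classes never seen as training images (zero-shot).
    """
    z = img_feat @ W
    sims = class_embeddings @ z / (
        np.linalg.norm(class_embeddings, axis=1) * np.linalg.norm(z) + 1e-12)
    return class_names[int(np.argmax(sims))]
```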
Further Information:
Jon Shlens has been a senior research scientist at Google since 2010. Prior to joining Google Research he was a research fellow at the Howard Hughes Medical Institute and a Miller Fellow at UC Berkeley. His research interests include machine perception, statistical signal processing, machine learning and biological neuroscience.
Ramesh Jain
(University of California, Irvine)
Situation Recognition

Date: May 7, 2014
Description:
With the growth in social media, the Internet of Things, wearable devices, mobile phones, and planetary-scale sensing, there is an unprecedented need and opportunity to assimilate spatio-temporally distributed heterogeneous data streams into actionable information. Consequently, concepts like objects, scenes, and events need to be extended to recognize situations (e.g. epidemics, traffic jams, seasons, flash mobs). This presentation motivates and computationally grounds the problem of situation recognition. It presents a systematic approach for combining multimodal real-time heterogeneous big data into actionable situations. Specifically, an approach for modeling and recognizing situations using available data streams is implemented using EventShop to model and detect situations of interest. A similar framework is applied at the personal level to determine evolving personal situations. By combining personal and environmental situations, it is possible to connect the needs of people to appropriate resources efficiently, effectively, and promptly. We will discuss this framework using some early examples.
Further Information:
Ramesh Jain is an entrepreneur, researcher, and educator.
He is a Donald Bren Professor in Information & Computer Sciences at the University of California, Irvine, where he is doing research in Event Web and experiential computing. Earlier he served on the faculty of Georgia Tech, the University of California at San Diego, The University of Michigan, Ann Arbor, Wayne State University, and the Indian Institute of Technology, Kharagpur. He is a Fellow of ACM, IEEE, AAAI, IAPR, and SPIE. His current research interests are in processing massive numbers of geo-spatial heterogeneous data streams for building smart social systems. He is the recipient of several awards including the ACM SIGMM Technical Achievement Award 2010.
Ramesh co-founded several companies, managed them in initial stages, and then turned them over to professional management. These companies include PRAJA, Virage, and ImageWare. Currently he is involved in Stikco and SnapViz. He has also been advisor to several other companies including some of the largest companies in media and search space.
Leo Guibas
(Stanford University)
The Space Between the Images

Date: April 30, 2014
Description:
Multimedia content has become a ubiquitous presence on all our computing devices, spanning the gamut from live content captured by personal device sensors such as smartphone cameras to immense databases of images, audio and video stored in the cloud. As we try to maximize the utility and value of all these petabytes of content, we often do so by analyzing each piece of data individually and foregoing a deeper analysis of the relationships between the media. Yet with more and more data, there will be more and more connections and correlations, because the data captured comes from the same or similar objects, or because of particular repetitions, symmetries or other relations and self-relations that the data sources satisfy.
In this talk we focus on the “space between the images”, that is, on expressing the relationships between different multimedia data. We aim to make such relationships explicit, tangible, first-class objects that themselves can be analyzed, stored, and queried — irrespective of the media they originate from. We discuss mathematical and algorithmic issues on how to represent and compute relationships or mappings between media data sets at multiple levels of detail. We also show how to analyze and leverage networks of maps and relationships, small and large, between inter-related data. The network can act as a regularizer, allowing us to benefit from the “wisdom of the collection” in performing operations on individual data sets or in map inference between them.
Further Information:
Leonidas Guibas obtained his Ph.D. from Stanford under the supervision of Donald Knuth. His main subsequent employers were Xerox PARC, DEC/SRC, MIT, and Stanford. He is currently the Paul Pigott Professor of Computer Science (and by courtesy, Electrical Engineering) at Stanford University. He heads the Geometric Computation group and is part of the Graphics Laboratory, the AI Laboratory, the Bio-X Program, and the Institute for Computational and Mathematical Engineering. Professor Guibas’ interests span geometric data analysis, computational geometry, geometric modeling, computer graphics, computer vision, robotics, ad hoc communication and sensor networks, and discrete algorithms. Some well-known past accomplishments include the analysis of double hashing, red-black trees, the quad-edge data structure, Voronoi-Delaunay algorithms, the Earth Mover’s distance, Kinetic Data Structures (KDS), Metropolis light transport, and the Heat-Kernel Signature. Professor Guibas is an ACM Fellow, an IEEE Fellow and winner of the ACM Allen Newell award.
David Fattal
(LEIA Inc.)
Mobile Holography

Date: April 16, 2014
Description:
The mobile computing industry is experiencing a booming development fueled by a growing worldwide demand for smartphones, tablets (and soon wearables), the availability of low-power systems-on-a-chip (SoCs), and a well-established supply chain in Asia. Remarkably, there is enough graphical processing power on the latest smartphones to manipulate and render multiview 3D “holographic” content on the fly. If only we had the technology to project this holographic content from a portable screen…
LEIA Inc. is a spin-off from HP Labs which aims at commercializing a disruptive display technology, a diffractive backlit LCD system allowing the rendering of holographic 3D content at video rate on a mobile platform. LEIA’s core technology resides in the surface treatment of the backlight, and otherwise utilizes a commercially available LCD panel to create animated content. It uses standard LED illumination, comes in an ultra-thin form factor, is capable of high pixel densities and large field of view and does not consume more optical power than a regular LCD screen.
In this talk, I will present some fundamental aspects of the technology and will discuss various consumer applications.
Further Information:
David is the founder and CEO of LEIA Inc, a spin-off from HP Labs aiming at commercializing a novel holographic display technology for mobile devices. He previously spent 9 years as a senior researcher in the Intelligent Infrastructure Laboratory at HP Labs, working on various aspects of quantum computing and photonics, and specializing in the manipulation of light at the nanoscale. He holds a PhD in Physics from Stanford University and a BS in theoretical physics from Ecole Polytechnique, France. David received the 2010 Pierre Faurre award for young French industrial career achievement, and was named French Innovator of the Year 2013 by the MIT Technology Review before being featured on its global innovator list that same year. David has 60 granted patents and co-authored the textbook “Single Photon Devices and Applications”.
For more please visit: https://www.leiainc.com/
Austin Roorda
(UC Berkeley)
Studying Human Vision One Cone at a Time

Date: April 9, 2014
Description:
Vision scientists employ a diversity of approaches in their quest to understand human vision – from studying the behavior of cells in a dish to studying the responses of humans to visual stimuli. A new generation of tools is helping to bridge these two approaches. First, adaptive optics removes the blur caused by optical imperfections, offering optical access to single cells in the human retina. Second, advanced eye tracking allows us to repeatedly probe targeted retinal locations. The combined system allows us to perform psychophysics with an unprecedented level of stimulus control and localization. In this talk I will review the technology and present our latest results on human color and motion perception.
Further Information:
Austin Roorda received his Ph.D. in Vision Science & Physics from the University of Waterloo, Canada in 1996. For over 15 years, Dr. Roorda has been pioneering applications of adaptive optics, including mapping of the trichromatic cone mosaic while a postdoc at the University of Rochester, designing and building the first adaptive optics scanning laser ophthalmoscope at the University of Houston, tracking and targeting light delivery to individual cones in the human eye at UC Berkeley, and being part of the first team to use AO imaging to monitor efficacy of a treatment to slow retinal degeneration. Since January 2005, he’s been at the UC Berkeley School of Optometry where he is the current chair of the Vision Science Graduate Program. He is a Fellow of the Optical Society of America and of the Association for Research in Vision and Ophthalmology and is a recipient of the Glenn A. Fry award, the highest research honor from the American Academy of Optometry.
Bernd Girod
(Stanford University)
Mobile Visual Search - Linking the Virtual and the Physical World

Date: April 2, 2014
Description:
Mobile devices are expected to become ubiquitous platforms for visual search and mobile augmented reality applications. For object recognition on mobile devices, a visual database is typically stored in the cloud. Hence, for a visual comparison, information must be either uploaded from, or downloaded to, the mobile over a wireless link. The response time of the system critically depends on how much information must be transferred in both directions, and efficient compression is the key to a good user experience. We review recent advances in mobile visual search, using compact feature descriptors, and show that dramatic speed-ups and power savings are possible by considering recognition, compression, and retrieval jointly. For augmented reality applications, where image matching is performed continually at video frame rates, interframe coding of SIFT descriptors achieves bit-rate reductions of 1-2 orders of magnitude relative to advanced video coding techniques. We will use real-time implementations for different example applications, such as recognition of landmarks, media covers or printed documents, to show the benefits of implementing computer vision algorithms on the mobile device, in the cloud, or both.
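The bit-rate reduction from interframe descriptor coding can be illustrated with a toy sketch (my own simplification, not the schemes discussed in the talk, which are considerably more sophisticated): coarsely quantize each descriptor and, for features tracked across frames, transmit only the components whose quantized values changed.

```python
import numpy as np

def interframe_encode(curr, prev, step=8):
    """Toy differential coding of tracked local descriptors between video frames.

    curr, prev: (n, 128) descriptors for the same tracked features in consecutive frames.
    Returns the sparse set of quantized components that changed, which is what would
    be entropy-coded and transmitted instead of the full descriptors.
    """
    q_curr = curr.astype(np.int16) // step
    q_prev = prev.astype(np.int16) // step
    diff = q_curr - q_prev
    idx = np.nonzero(diff)
    return idx, diff[idx]

rng = np.random.default_rng(0)
prev = rng.integers(0, 248, size=(200, 128)).astype(np.uint8)
curr = np.clip(prev.astype(np.int16) + rng.integers(-2, 3, size=prev.shape), 0, 255).astype(np.uint8)
idx, vals = interframe_encode(curr, prev)
print("changed components:", vals.size, "of", curr.size)
```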
Further Information:
Bernd Girod has been Professor of Electrical Engineering in the Information Systems Laboratory of Stanford University, California, since 1999. Previously, he was a Professor in the Electrical Engineering Department of the University of Erlangen-Nuremberg. His current research interests are in the area of networked media systems. He has published over 500 conference and journal papers and 6 books, receiving the EURASIP Signal Processing Best Paper Award in 2002, the IEEE Multimedia Communication Best Paper Award in 2007, the EURASIP Image Communication Best Paper Award in 2008, the EURASIP Signal Processing Most Cited Paper Award in 2008, as well as the EURASIP Technical Achievement Award in 2004 and the Technical Achievement Award of the IEEE Signal Processing Society in 2011. As an entrepreneur, Professor Girod has been involved in several startup ventures, among them Polycom, Vivo Software, 8×8, and RealNetworks. He received an Engineering Doctorate from the University of Hannover, Germany, and an M.S. degree from the Georgia Institute of Technology. Prof. Girod is a Fellow of the IEEE, a EURASIP Fellow, and a member of the German National Academy of Sciences (Leopoldina). He currently serves Stanford’s School of Engineering as Senior Associate Dean for Online Learning and Professional Development.
Michael Kriss
(MAK Consultants)
ISO Speed for Digital Cameras: Real or Imaginary

Date: March 12, 2014
Description:
The concept of an ISO speed for digital cameras is somewhat of a conundrum. There is an accepted ISO 12232:2006 standard on how to measure the ISO speed for a digital camera. In fact there are three accepted measures: the Recommended Exposure Index (REI), the Standard Output Sensitivity (SOS), and the saturation-based technique. These measures are all based on the final output of the digital imaging system and are often confined to a specific file format (TIFF, for example) and color encoding (sRGB, for example). The “traditional” negative film ISO (ASA) speed was empirically defined as the exposure that gave an excellent image when printed on either color or black-and-white paper. The “rule of thumb” was that on a bright sunny day the camera should be set to f/16 (or f/11 for reversal slide film) with a shutter speed of 1/ISO. This would ensure two stops of under- and over-exposure protection. This criterion made it possible for simple cameras to always get a good picture on a bright day. The speaker will present a way to calculate the “ISO speed” of the sensor rather than that of the camera. The calculation will take into consideration all the physics involved in creating and storing photoelectrons, and the degrading sources of noise present in image sensors. The calculation will draw from film terminology, but instead of the threshold speed calculation used in film studies, a signal-to-noise (S/N) calculation will be used for sensors. The impact on sensor speed of f-stop and shutter time manipulation will be discussed. The concept allows one to simply replace the film by a sensor and use the same metering systems. Higher ISO speeds are possible by just increasing the overall gain of the imaging system (after white balance) and using better image processing to hold sharpness and lower noise.
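A small numeric sketch in the spirit of the S/N-based view described above, using invented sensor parameters and the ISO 12232 noise-based relation (speed = 10 / H at S/N = 10); it is not the speaker's actual model, and the lux-to-photon conversion is a rough monochromatic approximation.

```python
import numpy as np

# Illustrative, made-up sensor parameters (not from the talk).
pixel_area_um2 = 1.4 ** 2           # 1.4 um pixel pitch
qe = 0.5                            # quantum efficiency
read_noise_e = 2.0                  # read noise, electrons rms
photons_per_um2_per_lux_s = 4000.0  # rough conversion for 555 nm light

def snr(h_lux_s):
    """Signal-to-noise ratio at focal-plane exposure h (lux*s) for this toy model."""
    signal = h_lux_s * photons_per_um2_per_lux_s * pixel_area_um2 * qe  # electrons
    return signal / np.sqrt(signal + read_noise_e ** 2)                 # shot + read noise

# Noise-based speed in the spirit of ISO 12232: speed = 10 / H_{S/N=10}.
h = np.logspace(-5, 0, 20000)
h_snr10 = h[np.argmax(snr(h) >= 10)]
print("H(S/N=10) = %.2e lux*s  ->  noise-based speed ~ %.0f" % (h_snr10, 10.0 / h_snr10))
```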
Further Information:
Dr. Kriss received his BA (1962), MS (1964) and PhD (1969) in Physics from the University of California at Los Angeles. He joined the Eastman Kodak Research Laboratories, Color Photography Division, in 1969 and later the Physics Division, until his retirement in 1993. In his early years at Kodak, Dr. Kriss focused on color film image structure and modeled and simulated the impact of chemical development on image structure and color reproduction. When he joined the Physics Division he focused on image processing of scanned and captured digital images. Dr. Kriss spent three years in Japan, where he helped build an advanced research facility. At Kodak he headed up the Imaging Processing Laboratory and Algorithm Developing Laboratory. He joined the University of Rochester in 1993, where he was the executive director of the Center for Electronic Imaging Systems and taught through the Computer and Electrical Engineering Department. He joined Sharp Laboratories of America in 2000, where he headed the Color Imaging Group. Dr. Kriss retired in 2004 but is still active as a consultant, Adjunct Professor at Portland State University, in IS&T activities, and as the Editor in Chief of the Wiley-IS&T Series on Imaging Science and Technology and the forthcoming Handbook of Digital Imaging Technologies.
Daniel Palanker
(Stanford University)
Restoration of Sight with Photovoltaic Subretinal Prosthesis

Date: March 5, 2014
Description:
Retinal degeneration leads to blindness due to the gradual loss of photoreceptors. Information can be reintroduced into the visual system by patterned electrical stimulation of the remaining retinal neurons. A photovoltaic subretinal prosthesis directly converts light into pulsed electric current in each pixel, stimulating the nearby inner retinal neurons. Visual information is projected onto the retina by video goggles using pulsed near-infrared (~900nm) light.
Subretinal arrays with 70μm photovoltaic pixels provide highly localized stimulation: retinal ganglion cells respond to alternating gratings with a stripe width of a single pixel, which is half of the native resolution in healthy controls (~30μm). Similarly to normal vision, the retinal response to prosthetic stimulation exhibits flicker fusion at high frequencies (20-40 Hz), adaptation to static images, and non-linear summation of subunits in the receptive fields. In rats with retinal degeneration, the photovoltaic subretinal arrays also provide visual acuity up to half of its normal level (~1 cpd), as measured by the cortical response to alternating gratings. If these results translate to the human retina, such implants could restore visual acuity up to 20/250. With eye scanning and perceptual learning, human patients might even cross the 20/200 threshold of legal blindness. Ease of implantation and tiling of these wireless modules to cover a large visual field, combined with high resolution, open the door to highly functional restoration of sight.
Further Information:
Daniel Palanker is an Associate Professor in the Department of Ophthalmology and in the Hansen Experimental Physics Laboratory at Stanford University. He received his PhD in Applied Physics in 1994 from the Hebrew University of Jerusalem, Israel.
Dr. Palanker studies interactions of electric fields with biological cells and tissues in a broad range of frequencies, from quasi-static to optical, and develops their diagnostic, therapeutic and prosthetic applications, primarily in ophthalmology. Several of his developments are in clinical practice world-wide: the Pulsed Electron Avalanche Knife (PEAK PlasmaBladeTM), the Patterned Scanning Laser Photocoagulator (PASCALTM), and an OCT-guided Laser System for Cataract Surgery (CatalysTM). In addition to laser-tissue interactions, retinal phototherapy and associated neural plasticity, Dr. Palanker is working on electro-neural interfaces, including retinal prostheses and electronic control of the vasculature and glands.
EJ Chichilnisky
(Stanford University)
Artificial retina: Design principles for a high-fidelity brain-machine interface

Description:
The retina communicates visual information to the brain in spatio-temporal patterns of electrical activity, and these signals mediate all of our visual experience. Retinal prostheses are designed to artificially elicit activity in retinas that have been damaged by disease, with the hope of conveying useful visual information to the brain. Current devices, however, produce limited visual function. The reasons for this can be understood based on the organization of visual signals in the retina, and I will show experimental data suggesting that it is possible in principle to produce a device with exquisite spatial and temporal resolution, approaching the fidelity of the natural visual signal. These advances in interfacing to the neural circuitry of the retina may have broad implications for future brain-machine interfaces in general. I will also discuss how novel technologies may be used to optimize the use of such devices for the purpose of helping blind people see.
Further Information:
E.J. Chichilnisky received a BA in Mathematics from Princeton University and an MS in Mathematics and PhD in Neuroscience from Stanford University. He is now a Professor in the Neurosurgery Department and Hansen Experimental Physics Laboratory at Stanford. Prior to joining Stanford, EJ was a professor in the Systems Neurobiology Laboratories at the Salk Institute for Biological Studies in San Diego where he held the Ralph S and Becky O’Connor Chair.
Hendrik Lensch
(Tübingen University)
Emphasizing Depth and Motion

Description:
Monocular displays are rather poor at conveying relative distances and velocities between objects to the observer, since some of the binocular cues are missing. In our framework we use a stereo camera to first observe depth, relative distances and velocity, and then modify the captured images in different ways to convey the lost information. Depth, for example, can be emphasized even on a monocular display using depth-of-field rendering, local intensity or color contrast enhancement, or unsharp masking of the depth buffer. Linear motion, on the other hand, can be emphasized by motion blur, streaks, rendered bursts, or simply color coding the remaining distances between vehicles. These are a few ways of modifying pictures of the real world to actively control the user’s attention while introducing only rather subtle modifications. We will present a real-time framework based on edge-optimized wavelets that optimizes depth estimation and emphasizes depth or motion.
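One of the cues mentioned above, unsharp masking of the depth buffer, can be sketched as a single-scale toy version (not the authors' edge-optimized wavelet framework): high-pass filter the depth map and darken pixels that lie behind their local surroundings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def emphasize_depth(image, depth, sigma=5.0, strength=0.4):
    """Unsharp-mask the depth buffer and use it to darken locally farther pixels.

    image: (H, W) or (H, W, 3) floats in [0, 1]; depth: (H, W) floats, larger = farther.
    """
    d = (depth - depth.min()) / (np.ptp(depth) + 1e-12)        # normalize depth to [0, 1]
    high_pass = d - gaussian_filter(d, sigma)                  # positive where locally farther
    shading = np.clip(1.0 - strength * np.maximum(high_pass, 0.0), 0.0, 1.0)
    if image.ndim == 3:
        shading = shading[..., None]
    return np.clip(image * shading, 0.0, 1.0)
```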
Further Information:
Hendrik P. A. Lensch holds the chair for computer graphics at Tübingen University. He received his diploma in computer science from the University of Erlangen in 1999. He worked as a research associate in the computer graphics group at the Max-Planck-Institut für Informatik in Saarbrücken, Germany, and received his PhD from Saarland University in 2003. Hendrik Lensch spent two years (2004-2006) as a visiting assistant professor at Stanford University, USA, followed by a stay at the MPI Informatik as the head of an independent research group. From 2009 to 2011 he was a full professor at the Institute for Media Informatics at Ulm University, Germany. In his career, he received the Eurographics Young Researcher Award 2005, was awarded an Emmy-Noether-Fellowship by the German Research Foundation (DFG) in 2007 and received an NVIDIA Professor Partnership Award in 2010. His research interests include 3D appearance acquisition, computational photography, global illumination and image-based rendering, and massively parallel programming.
Jon Hardeberg
(Gjøvik University College, Norway)
Next Generation Colour Printing: Beyond Flat Colors

Description:
Colour Printing 7.0: Next Generation Multi-Channel Printing (CP7.0) is an Initial Training Network funded by EU’s Seventh Framework programme. The project addresses a significant need for research, training and innovation in the printing industry. The main objectives of this project are to train a new generation of printing scientists who will be able to assume science and technology leadership in this traditional technological sector, and to do research in the colour printing field by fully exploring the possibilities of using more than the conventional four colorants (CMYK) in printing; focusing particularly on the spectral properties. We primarily focus on four key areas of research; spectral modeling of the printer/ink/paper combination, spectral gamut prediction and gamut mapping, the effect of paper optics and surface properties on the colour reproduction of multi-channel devices, and optimal halftoning algorithms and tonal reproduction characteristics of multi-channel printing devices. Several application areas are considered, including textile and fine art.
In one part of the project, an extra dimension is added to the print in the form of relief, with the goal of controlling the angle-dependent reflection properties. For such prints, called 2.5D prints, a surface texture is created by printing multiple layers of ink on desired locations. The ongoing study will lead us to create prints with relief that closely resemble the original, with a focus on aspects such as improving the print quality, reducing the print costs and discovering new market opportunities.
In this presentation we will give an overview of the CP7.0 project, its goals, accomplishments and challenges. Furthermore, as most, if not all, of the involved scientists-in-charge, postdoctoral researchers and PhD students will be present, it will be possible to go into more depth in the discussions following the talk. More information about the project can also be found here: http://cp70.org/
Simone Bianco
(University of Milano-Bicocca, Italy)
Adaptive illuminant estimation using faces

Date: February 7, 2014
Description:
This talk will show that it is possible to use skin tones to estimate the illuminant color. In our approach, we use a face detector to find faces in the scene, and the corresponding skin colors to estimate the chromaticity of the illuminant. The method is based on two observations: first, skin colors tend to form a cluster in the color space, making them a cue to estimate the illuminant in the scene; second, many photographic images are portraits or contain people. If no faces are detected, the input image is processed with an automatically selected low-level illuminant estimation algorithm. The algorithm automatically switches from global to spatially varying color correction on the basis of the illuminant estimates on the different faces detected in the image. An extensive comparison with both global and local color constancy algorithms is carried out to validate the effectiveness of the proposed algorithm in terms of both statistical and perceptual significance on a large heterogeneous dataset of RAW images containing faces. Project page: http://www.ivl.disco.
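A much-simplified sketch of the skin-tone cue (not the paper's actual algorithm, which also handles fallback estimators and spatially varying correction): compare the average skin chromaticity inside detected faces to a canonical skin chromaticity and derive diagonal white-balance gains. The canonical value below is a placeholder, not a measured statistic.

```python
import numpy as np

# Hypothetical canonical skin chromaticity (r, g) under neutral light; a real system
# would learn this statistic from training data.
CANONICAL_SKIN_RG = np.array([0.45, 0.32])

def illuminant_from_faces(skin_rgb):
    """Estimate per-channel correction gains from skin pixels inside detected faces.

    skin_rgb: (n, 3) linear RGB values sampled from face regions.
    """
    mean = skin_rgb.mean(axis=0)
    observed = mean / mean.sum()                               # observed skin chromaticity
    canonical = np.append(CANONICAL_SKIN_RG, 1.0 - CANONICAL_SKIN_RG.sum())
    gains = canonical / observed                               # diagonal (von Kries) correction
    return gains / gains[1]                                    # normalize green gain to 1

def white_balance(image, gains):
    """Apply the diagonal correction to a linear RGB image in [0, 1]."""
    return np.clip(image * gains, 0.0, 1.0)
```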
Ram Narayanswamy
(Intel)
Computational Imaging Platforms for Next Generation User Experience

Date: January 28, 2014
Description:
Cameras in mobile devices have made photography ubiquitous. It is estimated that approximately 900 billion photos will be captured in 2014. Innovation and low cost are making cameras increasingly accessible to a wide range of consumer products, and are also enabling imaging platforms with multiple cameras. Users would like newer experiences with their photographs and videos. Furthermore, they want to express themselves in novel ways using these visual media forms. The next revolution in imaging is happening at the nexus of computational imaging, industrial design, and user experience. In this talk, we develop this theme and discuss the opportunities for innovative work and the requirements it places on the various components and the system.
Further Information:
Ram Narayanswamy currently leads an Intel effort in Computational Imaging as part of Intel Labs’ User Experience Group. He cut his teeth in imaging at NASA Langley Research Center in the Visual Image Processing lab under Fred Huck, working on imaging system design and optimization. Back then he co-authored a paper titled “Characterizing digital image acquisition devices”, better known today as the “slanted-edge test”, a de facto standard to measure camera MTF. Subsequently, he did his PhD at the Optoelectronics Systems Center at the University of Colorado under Prof. Kristina Johnson and joined CDM Optics, a start-up which pioneered Wavefront Coding and the field of computational imaging. Upon CDM’s acquisition by OmniVision Technologies, Ram led the effort to productize Wavefront Coding for the mobile phone segment. He then joined Aptina Imaging, where he helped bring the world’s first performance 720p reflowable camera modules to market – complete cameras that ship in tape and reel! While at Aptina, Ram also led their effort in array cameras. Ram has a PhD from the University of Colorado-Boulder, an MS from the University of Virginia – Charlottesville, and a BS from the National Institute of Technology – Trichy. He loves this golden age of imaging and is looking forward to ushering in the platinum age.
Silvio Savarese
(Stanford University)
Perceiving the 3D World from Images

Date: January 14, 2014
Description:
When we look at an environment such as a coffee shop, we don’t just recognize the objects in isolation, but rather perceive a rich scenery of the 3D space, its objects and all the relations among them. This allows us to effortlessly navigate through the environment, or to interact with and manipulate objects in the scene with amazing precision. The past several decades of computer vision research have, on the other hand, addressed the problems of 2D object recognition and 3D space reconstruction as two independent ones. Tremendous progress has been made in both areas. However, while methods for object recognition attempt to describe the scene as a list of class labels, they often make mistakes due to the lack of a coherent understanding of the 3D spatial structure. Similarly, methods for 3D scene modeling can produce accurate metric reconstructions but cannot put the reconstructed scene into a semantically useful form.
A major line of work from my group in recent years has been to design intelligent visual models that understand the 3D world by integrating 2D and 3D cues, inspired by what humans do. In this talk I will introduce a novel paradigm whereby objects and 3D space are modeled in a joint fashion to achieve a coherent and rich interpretation of the environment. I will start by giving an overview of our research for detecting objects and determining their geometric properties such as 3D location, pose or shape. Then, I will demonstrate that these detection methods play a critical role for modeling the interplay between objects and space which, in turn, enable simultaneous semantic reasoning and 3D scene reconstruction. I will conclude this talk by demonstrating that our novel paradigm for scene understanding is potentially transformative in application areas such as autonomous or assisted navigation, robotics, automatic 3D modeling of urban environments and surveillance.
Further Information:
Silvio Savarese is an Assistant Professor of Computer Science at Stanford University. He earned his Ph.D. in Electrical Engineering from the California Institute of Technology in 2005 and was a Beckman Institute Fellow at the University of Illinois at Urbana-Champaign from 2005–2008. He joined Stanford in 2013 after being Assistant and then Associate Professor (with tenure) of Electrical and Computer Engineering at the University of Michigan, Ann Arbor, from 2008 to 2013. His research interests include computer vision, object recognition and scene understanding, shape representation and reconstruction, human activity recognition and visual psychophysics.
- Aldo Badano » A stereoscopic computational observer model
- Johnny Lee » Project Tango
- Arthur Zhang » iOptik: Contact Lens–Enabled Wearable Display
- Eli Peli » Visual Issues with Head-Mounted Displays
- Bernard Kress » Virtual Reality Headsets to Smart Glasses and Beyond
- Gordon Wetzstein » Compressive Light Field Display and Imaging Systems
- Eric Fossum » Quanta Image Sensor Concept and Progress
- Chris Bregler » Next Gen Motion Capture
- Jon Shlens » Large Scale Vision System
- Ramesh Jain » Situation Recognition
- Leo Guibas » The Space Between the Images
- David Fattal » Mobile Holography
- Austin Roorda » Human vision one cone at a time
- Bernd Girod » Mobile Visual Search
- Michael Kriss » ISO Speed for Digital Cameras
- Daniel Palanker » Photovoltaic Subretinal Prosthesis
- EJ Chichilnisky » Artificial retina
- Hendrik Lensch » Emphasizing Depth and Motion
- Jon Hardeberg » Next Generation Colour Printing
- Simone Bianco » Adaptive illuminant estimation
- Ram Narayanswamy » Computational Imaging Platforms
- Silvio Savarese » Perceiving the 3D World
SCIEN Colloquia 2013
Kirk Martinez
(University of Southampton, England)
Reflectance Transformation Imaging of Cultural Heritage Objects

Date: December 13, 2013
Description:
The imaging of cultural heritage objects has helped to drive many imaging developments. This talk will briefly round up personal experiences of building high resolution, colorimetric and 3D object imaging systems during five large European projects. Recently there has been growing interest in reflectance transformation imaging, where systems with many light positions are used to create images which can be viewed with varying light angles. This has proven useful in the study of archaeological objects with subtle surface textures which are not rendered well in a single image. Several “dome”-based systems using high-power white LEDs and off-the-shelf digital SLR cameras have been produced for campaigns to image clay tablets. The designs and results will be discussed with examples from the Ashmolean Museum in Oxford.
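Reflectance transformation imaging data of this kind is commonly fit with a per-pixel Polynomial Texture Map, where each pixel's intensity is modeled as a biquadratic function of the light direction and relighting just evaluates the polynomial at a new light angle. The sketch below shows that generic fitting step, not necessarily the specific systems built in the talk.

```python
import numpy as np

def fit_ptm(images, light_dirs):
    """Fit per-pixel Polynomial Texture Map (PTM) coefficients from a dome capture.

    images: (N, H, W) grayscale images, one per light position.
    light_dirs: (N, 3) unit light-direction vectors (lu, lv, lw).
    Returns coefficients of shape (6, H, W) for the biquadratic PTM model.
    """
    lu, lv = light_dirs[:, 0], light_dirs[:, 1]
    basis = np.stack([lu * lu, lv * lv, lu * lv, lu, lv, np.ones_like(lu)], axis=1)  # (N, 6)
    N, H, W = images.shape
    coeffs, *_ = np.linalg.lstsq(basis, images.reshape(N, H * W), rcond=None)
    return coeffs.reshape(6, H, W)

def relight(coeffs, lu, lv):
    """Render the surface under a new light direction (lu, lv)."""
    basis = np.array([lu * lu, lv * lv, lu * lv, lu, lv, 1.0])
    return np.tensordot(basis, coeffs, axes=1)                 # (H, W) image
```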
Further Information:
Kirk Martinez is a Reader in Electronics and Computer Science at the University of Southampton. He has a PhD in Image Processing from the University of Essex. He previously ran the MA in Computer Applications for History of Art at Birkbeck College London while working on a variety of European imaging projects. These included the VASARI (high-resolution colorimetric imaging of art), MARC (image and print), ACOHIR (3D objects) and Viseum (IIPImage viewer) projects. He went on to content-based retrieval and semantic web applications for museums (Artiste, SCULPTEUR, eCHASE). He now mainly works on sensor networks for the environment (Glacsweb) and the Internet of Things. He founded the VIPS image processing library and co-designed RTI imaging systems as part of an AHRC project.
Kartik Venkataraman
(Pelican Imaging)
High Performance Camera Arrays for Light Field Imaging

Date: December 10, 2013
Description:
Light field imaging with camera arrays has been explored extensively in academia and has been used to showcase applications such as viewpoint synthesis, synthetic refocus, computing range images, and capturing high-speed video, among others. However, none of the prior approaches have addressed the modifications needed to achieve the small form factor and image quality required to make them viable for mobile devices and consumer imaging. In our approach, we customize many aspects of the camera array, including lenses, pixels, and software algorithms, to achieve the imaging performance required for consumer imaging. We analyze the performance of camera arrays and establish scaling laws that allow one to predict the performance of such systems with varying system parameters. A key advantage of this architecture is that it captures depth. The technology is passive, supports both stills and video, and is low-light capable. In addition, we explore extending the capabilities of the depth map through regularization to support applications such as fast user-guided matting.
Further Information:
Kartik Venkataraman has over 20 years of experience working with technology companies in Silicon Valley. Prior to founding Pelican Imaging, Kartik headed Computational Cameras at Micron Imaging (Aptina), where he spearheaded the design of Extended Depth of Field (EDOF) imaging systems for the mobile camera market. As Manager of the Camera & Scene Modeling group, he developed an end-to-end simulation environment for camera system architecture and module simulations that has been adopted in parts of the mobile imaging ecosystem. Previously at Intel, Kartik was principally associated with investigating medical imaging and visualization between Johns Hopkins Medical School and the Institute of Systems Science in Singapore. His interests include image processing, computer graphics and visualization, computer architectures, and medical imaging. Venkataraman founded Pelican Imaging in 2008. He received his Ph.D. in Computer Science from the University of California, Santa Cruz, an MS in Computer Engineering from the University of Massachusetts, Amherst, and a B.Tech (Honors) in Electrical Engineering from the Indian Institute of Technology, Kharagpur.
Peter Clark & Tigran Galstian
(LensVector Inc)
How the collective behavior of molecules improves mobile imaging

Date: November 19, 2013
Description:
LensVector Inc. is a Sunnyvale company producing a new generation of tunable optical elements based on the spatial alignment of molecules, enabling the fabrication of electrically variable “molecular” lenses and prisms. These components are suitable for a number of applications, including providing autofocus capability for miniature digital cameras using no moving parts. We will describe the basic technology behind the LensVector products and some design considerations for camera applications.
Further Information:
Tigran V. Galstian, CTO, LensVector Inc., is involved in the area of academic research on optical materials and components as well as their applications in imaging, telecommunication and energy.
Peter P. Clark, VP Optics, LensVector Inc., does optical system design and engineering. Before joining LensVector, he was with Flextronics, Polaroid, Honeywell, and American Optical. He is a Fellow of the Optical Society of America.
Ramakrishna Kakarala
(Nanyang Technological University)
What parts of a shape are discriminative?

Date: November 5, 2013
Description:
What is distinctive about the shape of an apple? The answer likely depends on comparison to similar shapes. The reason to study this question is that shape is a distinguishing feature of objects, and is therefore useful for object recognition in computer vision. Though shape is useful, when objects have similar overall shapes, discriminating among them becomes difficult; successful object recognition calls for the identification of important parts of the shapes. In this talk we introduce the concept of discriminative parts and propose a method to identify them. We show how we can assign levels of importance to different regions of contours, based on their discriminative potential. Our experiments show that the method is promising and can identify semantically meaningful segments as being important ones. We place our work in context by reviewing the related work on saliency.
Further Information:
Ramakrishna Kakarala is an Associate Professor in the School of Computer Engineering at the Nanyang Technological University (NTU) in Singapore. He has worked in both academia and industry; prior to joining NTU, he spent 8 years at Agilent Laboratories in Palo Alto, and at Avago Technologies in San Jose. He received the Ph.D. in Mathematics at UC Irvine, after completing a B.Sc. in Computer Engineering at the University of Michigan. Two of his students have recently won awards: the BAE Systems award at EI 2012, and the Best Student Paper award at ICIP 2013. The latter award went to Vittal Premachandran, whose Ph.D. thesis work at NTU is the basis of this talk.
Marc Levoy
(Stanford University)
What Google Glass means for the future of photography

Date: October 15, 2013
Description:
Although head-mounted cameras (and displays) are not new, Google Glass has the potential to make these devices commonplace. This has implications for the practice, art, and uses of photography. So what’s different about doing photography with Glass? First, Glass doesn’t work like a conventional camera; it’s hands-free, point-of-view, always available, and instantly triggerable. Second, Glass facilitates different uses than a conventional camera: recording documents, making visual todo lists, logging your life, and swapping eyes with other Glass users. Third, Glass will be an open platform, unlike most cameras. This is not easy, because Glass is a heterogeneous computing platform, with multiple processors having different performance, efficiency, and programmability. The challenge is to invent software abstractions that allow control over the camera as well as access to these specialized processors. Finally, devices like Glass that are head-mounted and perform computational photography in real time have the potential to give wearers “superhero vision”, like seeing in the dark, or magnifying subtle motion or changes. If such devices can also perform computer vision in real time and are connected to the cloud, then they can do face recognition, live language translation, and information recall. The hard part is not imagining these capabilities, but deciding which ones are feasible, useful, and socially acceptable.
Further Information:
Marc Levoy is the VMware Founders Professor of Computer Science at Stanford University, with a joint appointment in the Department of Electrical Engineering. He received a Bachelor’s and Master’s in Architecture from Cornell University in 1976 and 1978, and a PhD in Computer Science from the University of North Carolina at Chapel Hill in 1989.
Douglas Lanman
(NVIDIA Research)
Near-Eye Light Field Displays

Date: October 9, 2013
Description:
Near-eye displays project images directly into a viewer’s eye, encompassing both head-mounted displays (HMDs) and electronic viewfinders. Such displays confront a fundamental problem: the unaided human eye cannot accommodate (focus) on objects placed in close proximity. This talk introduces a light-field-based approach to near-eye display that allows for dramatically thinner and lighter HMDs capable of depicting accurate accommodation, convergence, and binocular-disparity depth cues. Such near-eye light field displays depict sharp images from out-of-focus display elements by synthesizing light fields that correspond to virtual scenes located within the viewer’s natural accommodation range. Building on related integral imaging displays and microlens-based light-field cameras, we optimize performance in the context of near-eye viewing. Near-eye light field displays support continuous accommodation of the eye throughout a finite depth of field; as a result, binocular configurations provide a means to address the accommodation convergence conflict that occurs with existing stereoscopic displays. This talk will conclude with a demonstration featuring a binocular OLED-based prototype and a GPU-accelerated stereoscopic light field renderer.
Further Information:
Douglas Lanman works in the Computer Graphics and New User Experiences groups within NVIDIA Research. His research is focused on computational imaging and display systems, including head-mounted displays (HMDs), automultiscopic (glasses-free) 3D displays, light field cameras, and active illumination for 3D reconstruction. He received a B.S. in Applied Physics with Honors from Caltech in 2002 and M.S. and Ph.D. degrees in Electrical Engineering from Brown University in 2006 and 2010, respectively.
Boyd Fowler
(Google)
Highlights of the International Image Sensor Workshop

Date: October 1, 2013
Description:
In this talk we present the latest trends in image sensors, including developments in CMOS image sensors, CCDs, and single photon avalanche photodiode (SPAD) image sensors.
Further Information:
Boyd Fowler was born in California in 1965. He received his M.S.E.E. and Ph.D. degrees from Stanford University in 1990 and 1995 respectively. After finishing his Ph.D. he stayed at Stanford University as a research associate in the Electrical Engineering Information Systems Laboratory until 1998. In 1998 he founded Pixel Devices International in Sunnyvale, California. After selling Pixel Devices to Agilent Technologies, he served as the advanced development manager in the Sensor Solutions Division (SSD) between 2003 and 2005. Between 2005 and 2013 he was the CTO and VP of Technology at Fairchild Imaging/BAE Systems Imaging Solutions. Currently he is a technical program manager at Google in Mountain View, California. He has authored numerous technical papers and patents. His current research interests include CMOS image sensors, low noise image sensors, noise analysis, low power image sensors, data compression, and image processing.
Patrick Gill
(Rambus Labs)
Ultra-miniature lensless diffractive computational imagers

Date: September 24, 2013
Description:
Rambus Labs is developing a new class of computational optical sensors and imagers that do not require traditional focusing. We have recently built our first proof-of-concept lensless imagers that exploit spiral phase anti-symmetric diffraction gratings. These gratings produce diffraction patterns on a photodiode array below, and the diffraction patterns contain information about the faraway scene sufficient to reconstruct the scene without ever having to focus incident light. Image resolution, sensor size, low-light performance and wavelength robustness are all improved over previous diffractive computational imagers.
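Generically, lensless imagers of this kind recover the scene by inverting a calibrated linear model that maps scene radiance to the diffraction pattern on the photodiode array. The sketch below uses a random matrix standing in for the calibrated grating response and a simple Tikhonov-regularized solve; the actual Rambus gratings and reconstruction algorithm are not reproduced here.

```python
import numpy as np

def reconstruct(y, A, lam=1e-2):
    """Tikhonov-regularized least-squares estimate of the scene x from measurements y ~= A @ x."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Toy example: a random matrix stands in for the calibrated grating/photodiode response.
rng = np.random.default_rng(0)
n_scene, n_sensor = 256, 1024
A = rng.standard_normal((n_sensor, n_scene)) / np.sqrt(n_sensor)
x_true = np.zeros(n_scene)
x_true[[10, 70, 200]] = 1.0                    # a sparse far-field scene
y = A @ x_true + 0.01 * rng.standard_normal(n_sensor)
x_hat = reconstruct(y, A)
print("brightest recovered elements:", sorted(np.argsort(x_hat)[-3:]))
```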
Further Information:
Patrick R. Gill was a national champion of both mathematics and physics contests in Canada prior to conducting his doctoral work in sensory neuroscience at the University of California at Berkeley (Ph.D. awarded in 2007). He conducted postdoctoral research at Cornell University and the University of Toronto before joining the Computational Sensing and Imaging (CSI) group at Rambus Labs in 2012. He is best known in the optics community for his lead role in inventing the planar Fourier capture array at Cornell University, and he was awarded the Best Early Career Research Paper Award at the 2013 Optical Society of America meeting on Computational Sensing and Imaging.
David Cardinal
(Cardinal Photo)
Photography: The Big Picture - Current innovations in cameras, sensors and optics

Date: June 11, 2013
Description:
An update on the state of camera technologies and how they are playing out in the marketplace. When you are heads-down in one area of technology, it can be easy to lose track of others, even those that might be relevant. A big part of the role of technology journalists like David Cardinal is to look at the big picture and provide this type of context. No area of technology has seen more rapid progress than digital photography. A broad array of new technologies, companies, and products is attacking imaging problems from every direction. David’s unique perspective as both an award-winning pro photographer and a veteran tech journalist writing about imaging technologies allows him to speak about many of these developments in the context of how they are faring in the market and where the industry is likely to go from here. In particular, some of the developments in smartphone imagers, mirrorless imaging, and new imaging form factors like Google Glass will be discussed. The goal is also to have a highly interactive presentation, as many of you in the audience are top researchers in these areas and will want to share your own perspectives.
Further Information:
Website: http://www.cardinalphoto.com
Facebook: http://facebook.com/CardinalPhoto
Twitter: http://twitter.com/DavidCardinal
Tibor Balogh
(CEO - Holografika)
Light Field Display Architectures

Date: May 30, 2013
Description:
Holografika multi-projector display systems provide high quality hologram-like horizontal-parallax automultiscopic viewing for multiple observers without the need for tracking or restrictive viewing positions. They range in size from TV-box-set to wall-filling life size. Unlike other glasses-free 3D displays offering few perspectives, HoloVizio presents a near-continuous view of the 3D light field free of inversions, discontinuities, repeated views, or their accompanying discomfort – and all this with a near 180 degree region of viewing. The presentation will highlight design considerations and performance issues, and cover aspects for interactive use.
Further Information:
The latest Holografika display – the HoloVizio 80WLT with a 30″ diagonal – was on site and available for viewing. A 2D video showing this 3D system is available here: http://www.youtube.com/
Roland Angst
(Stanford University)
Geometry and Semantics in Computer Vision

Date: May 28, 2013
Description:
Recent developments in computer vision have led to well-established pipelines for fully automated image- or video-based scene reconstruction. While we have seen progress in 2D scene understanding, 3D reconstruction and scene understanding have evolved largely independently. Hence, a major current trend in computer vision is the development of more holistic views which combine scene understanding and 3D reconstruction in a joint, more robust and accurate framework (3D scene understanding).
In my talk, I will present two recent but entirely different approaches to building on such geometric concepts in order to extract scene semantics. Using a structure-from-motion pipeline and a low-rank factorization technique, the first approach analyzes the motion of rigid parts in order to extract motion constraints between those parts. Such motion constraints reveal valuable information about the functionality of an object, e.g., a rolling motion parallel to a plane is likely due to a wheel. The second approach combines appearance-based superpixel classifiers with class-specific 3D shape priors for joint 3D scene reconstruction and class segmentation. The appearance-based classifiers guide the selection of an appropriate 3D shape regularization term, whereas the 3D reconstruction in turn helps in the class segmentation task. We will see how this can be formulated as a convex optimization problem over labels of a volumetric voxel grid.
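To make the low-rank idea concrete, here is a minimal Python sketch (not the speaker's implementation) of the classical affine-camera rank constraint that such factorizations build on: the 2D point tracks of a single rigid part stack into a measurement matrix of rank at most four, so the singular-value decay of a combined matrix indicates whether two tracked parts move rigidly together. The tolerance and toy data below are illustrative assumptions.

```python
import numpy as np

def effective_rank(tracks, tol=1e-6):
    """tracks: (2*F, P) matrix of x/y image coordinates of P points over F frames."""
    s = np.linalg.svd(tracks, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def move_rigidly_together(tracks_a, tracks_b, tol=1e-6):
    """Under an affine camera, tracks of one rigid body span a rank-<=4 subspace;
    if concatenating two parts' tracks does not raise the rank beyond 4,
    the parts are consistent with a single rigid motion."""
    return effective_rank(np.hstack([tracks_a, tracks_b]), tol) <= 4

# toy example: two point sets undergoing the same rigid 2D motion
rng = np.random.default_rng(0)
pts_a, pts_b = rng.random((2, 5)), rng.random((2, 7))
frames = []
for t in np.linspace(0, 1, 12):                      # 12 frames
    c, s = np.cos(t), np.sin(t)
    R, d = np.array([[c, -s], [s, c]]), np.array([[t], [2 * t]])
    frames.append(np.hstack([R @ pts_a + d, R @ pts_b + d]))
W = np.vstack(frames)                                # shape (2*F, P_a + P_b)
print(move_rigidly_together(W[:, :5], W[:, 5:]))     # -> True
```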
Audrey Ellerbee
(Stanford University)
New Designs for the Next Generation Optical Coherence Tomography Systems

Date: 5/14/2013
Description:
In its nearly 25-year history, optical coherence tomography (OCT) has gained wide recognition as a technology capable of non-invasively imaging the microstructure of biological tissue. The rapid penetration of OCT into the clinical market has both been driven by and itself stimulated new technological advances in the field, with the end result being a highly informative, patient-friendly diagnostic modality. Recently, our own group has taken a systematic approach to re-engineering the traditional OCT system, both to understand and to overcome some of the known limitations of current designs. In this talk, I provide a perspective on historical developments in the field, our recent contributions, and new application areas.
Jitendra Malik
(Berkeley)
The Three R's of Computer Vision: Recognition, Reconstruction and Reorganization

Date: 5/8/2013
Description:
Over the last two decades, we have seen remarkable progress in computer vision with demonstration of capabilities such as face detection, handwritten digit recognition, reconstructing three-dimensional models of cities, automated monitoring of activities, segmenting out organs or tissues in biological images, and sensing for control of robots and cars. Yet there are many problems where computers still perform significantly below human perception. For example, in the recent PASCAL benchmark challenge on visual object detection, the average precision for most 3D object categories was under 50%.
I will argue that further progress on the classic problems of computational vision (recognition, reconstruction and reorganization) requires us to study the interaction among these processes. For example, recognition of 3D objects benefits from a preliminary reconstruction of 3D structure, instead of treating it as just a 2D pattern classification problem. Recognition is also reciprocally linked to reorganization, with bottom-up grouping processes generating candidates that can then be verified with top-down activations of object and part detectors. In this talk, I will show some of the progress we have made towards the goal of a unified framework for the 3 R’s of computer vision. I will also point towards some of the exciting applications we may expect over the next decade as computer vision starts to deliver on even more of its grand promise.
Mark Schnitzer
(Stanford University)
Visualizing the neuronal orchestra: Imaging the dynamics of large-scale neural ensembles in freely behaving mice

Date: 4/23/2013
Description:
A longstanding challenge in neuroscience is to understand how populations of individual neurons and glia contribute to animal behavior and brain disease. Addressing this challenge has been difficult partly due to a lack of appropriate brain imaging technology for visualizing cellular properties in awake behaving animals. I will describe a miniaturized, integrated fluorescence microscope for imaging cellular dynamics in the brains of freely behaving mice. The microscope also allows time-lapse imaging, for watching how individual cells’ coding properties evolve over weeks. By using the integrated microscope to perform calcium imaging in behaving mice as they repeatedly explored a familiar environment, we tracked the place fields of thousands of CA1 hippocampal neurons over weeks. Spatial coding was highly dynamic: on each day the neural representation of this environment involved a unique subset of neurons. Yet the cells within the ~15–25% overlap between any two of these subsets retained the same place fields, and though this overlap was also dynamic it sufficed to preserve a stable and accurate ensemble representation of space across weeks. This study in CA1 illustrates the types of time-lapse studies on memory, ensemble neural dynamics, and coding that will now be possible in multiple brain regions of behaving rodents…
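As a rough illustration of the overlap statistic quoted above, the toy model below (purely illustrative numbers and an independent-participation assumption, not the study's data or analysis) shows how a modest per-day participation rate produces pairwise ensemble overlaps in the reported ~15–25% range.

```python
import numpy as np

rng = np.random.default_rng(3)
n_cells, p_active, n_days = 1000, 0.2, 10      # toy numbers, not the study's

# Each day an independent ~20% subset of cells participates in the spatial map.
active = rng.random((n_days, n_cells)) < p_active

# Overlap between any two days' subsets, measured relative to one day's subset.
overlaps = [(active[i] & active[j]).sum() / active[i].sum()
            for i in range(n_days) for j in range(n_days) if i != j]
print(f"mean pairwise overlap: {np.mean(overlaps):.0%}")   # about 20% for p_active = 0.2
```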
Thomas Vogelsang
(Rambus)
Binary Pixel and High Dynamic Range Imaging

Date: 4/16/2013
Description:
Modern image sensors, especially in devices like mobile phones, are using smaller and smaller pixels. The resulting reduced full-well capacity significantly limits the possible dynamic range. Spatial and temporal oversampling based on the concept of binary pixels is a way to overcome this limitation; it is an approach to achieving high dynamic range imaging in a single exposure using small pixels. The concept of binary pixels was initially proposed by Eric Fossum, now at Dartmouth College. Feng Yang et al. of EPFL developed the first theory of binary oversampling based on photon statistics. We have developed these initial proposals further to enable feasibility in today’s image sensor technology while maintaining the theoretical understanding based on photon statistics, which has been key to the development of this technology. In this talk I will describe the theory behind binary pixels and show how to achieve good low-light sensitivity together with high dynamic range. Then I will give examples of how this technology can be implemented using today’s technology and show a roadmap to take advantage of further developments in image sensor technology.
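The short Monte Carlo sketch below (an illustrative toy, not Rambus's implementation; the cell count and flux values are arbitrary assumptions) shows the basic photon-statistics idea: each one-bit cell fires if it catches at least one Poisson-distributed photon, and inverting P(fire) = 1 − e^(−λ) over many oversampled cells recovers the light level over a much wider range than a single small pixel could.

```python
import numpy as np

rng = np.random.default_rng(1)

def binary_pixel_response(mean_photons_per_cell, n_cells):
    """Simulate one macro-pixel built from n_cells one-bit cells
    (spatial/temporal oversampling). Each cell fires if it catches
    at least one photon during the exposure."""
    photons = rng.poisson(mean_photons_per_cell, size=n_cells)
    fired = np.count_nonzero(photons >= 1)
    p = fired / n_cells
    # Invert P(fire) = 1 - exp(-lambda) to estimate the light level.
    p = min(p, 1 - 1e-6)              # avoid log(0) when every cell fires
    return -np.log1p(-p) * n_cells    # estimated photon count over the macro-pixel

for flux in [0.01, 0.1, 1.0, 10.0]:   # photons per cell, spanning a wide range
    est = binary_pixel_response(flux, n_cells=4096)
    print(f"true {flux * 4096:8.1f}  estimated {est:8.1f}")
```

At the highest flux essentially every cell fires and the estimate saturates; spreading the one-bit samples over time as well as space (temporal oversampling) pushes that limit further out.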
Ricardo Motta
(NVIDIA)
NVIDIA's Mobile Computational Photography Architecture

Date: 3/9/2013
Description:
Chimera, NVIDIA’s computational photography architecture, was created to increase the flexibility of camera image processing pipelines and support new capture methods. This flexibility has become essential for the support of new sensors, including multi-camera and plenoptic methods, for many new Bayer-space algorithms, such as HDR and non-local means noise reduction, as well as for post-demosaic algorithms such as image merging and local tone mapping. Chimera achieves this flexibility by creating a simplified framework to mix and match the ISP, the GPU and the CPU, allowing, for example, the GPU to be used in Bayer space before the ISP for noise reduction, and the CPU after the ISP for scaling. In this talk we will discuss how the evolution of sensors and capture methods is shaping mobile imaging, and provide an overview of Chimera and the Always-on-HDR method.
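A hypothetical sketch of the "mix and match" idea (not NVIDIA's actual Chimera API; every name here is made up for illustration): each pipeline stage is tagged with the engine it should run on, so a Bayer-space denoiser can be scheduled on the GPU ahead of the ISP and a scaler on the CPU after it.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str        # e.g. "bayer_noise_reduction"
    unit: str        # which engine runs it: "ISP", "GPU", or "CPU"
    fn: Callable     # the processing kernel (identity placeholder here)

identity = lambda frame: frame
pipeline = [
    Stage("bayer_noise_reduction", "GPU", identity),   # runs before the ISP, in Bayer space
    Stage("demosaic_and_isp",      "ISP", identity),
    Stage("local_tone_mapping",    "GPU", identity),
    Stage("scaling",               "CPU", identity),   # runs after the ISP
]

def run(frame, stages):
    for s in stages:
        frame = s.fn(frame)   # a real framework would dispatch to s.unit
    return frame
```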
Shailendra Mathur
(AVID)
A Post-Production Framework for Light-field Based Story-telling

Date: 3/22/2013
Description:
A strong trend in the media industry is the acquisition of higher data resolutions and broader ranges in the color, spatial and temporal domains. Another associated trend is the acquisition of multiple views over the same domains. Media may be captured from multiple devices, a single device with high resolution/range capabilities, or a synthetic source such as a CGI scene. Video productions are evolving from working with a single view of the scene to capturing as much information as possible. The editor and story-teller is the winner in this world, since they can “super-sample the world and edit it later,” extracting the views required for the story they are telling after acquisition has taken place. Supporting these multi-view, multi-resolution data sets is challenging for traditional editing and data management tools. Well-known editorial and data management techniques have to be applied to multiple media samples of the same scene at different ranges and resolutions in the temporal, color and spatial domains. This talk presents how concepts from the area of light fields have been used to build an editorial and data management system that works with these multi-view sources. A proposed data model and run-time system shows how media from various views can be grouped so that they appear as a single light-field source to the editing and data management functions. The goal of the talk is to encourage a discussion on how this post-production framework can be used to make productions with light-field acquisition a reality.
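A minimal, hypothetical sketch of such a data model (illustrative names and fields only; not Avid's actual design): clips captured from different views of the same scene are keyed by their view parameters and presented to editorial code as one logical light-field source, from which a particular view can be extracted at edit time.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass(frozen=True)
class ViewKey:
    camera_id: str                 # which physical or virtual camera
    exposure_stop: float           # HDR bracket offset, 0.0 if none
    resolution: Tuple[int, int]    # stored pixel dimensions

@dataclass
class LightFieldSource:
    """Groups all media sampled from one scene so that editing and data
    management functions can treat them as a single logical source."""
    scene_id: str
    views: Dict[ViewKey, str] = field(default_factory=dict)   # view -> media path

    def add_view(self, key: ViewKey, media_path: str) -> None:
        self.views[key] = media_path

    def extract(self, camera_id: str) -> str:
        """Pull out the highest-resolution clip for a requested camera view,
        deferring the 'which view tells the story' decision to edit time."""
        candidates = {k: v for k, v in self.views.items() if k.camera_id == camera_id}
        best = max(candidates, key=lambda k: k.resolution[0] * k.resolution[1])
        return candidates[best]
```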
Andrew Watson
(NASA Ames Research Center)
High Frame Rate Movies and Human Vision

Date: 3/19/2013
Description:
“High Frame Rate” is the new big thing in Hollywood. Directors James Cameron and Douglas Trumbull extoll its virtues. Peter Jackson’s movie “The Hobbit,” filmed at 48 Hz, twice the industry standard, has recently been released. In this talk I will provide a scientific look at the role of frame rate in the visual quality of movies. My approach is to represent the moving image, and the various steps involved in its capture, processing and display, in the spatio-temporal frequency domain. This transformation exposes the various artifacts generated in the process, and makes it possible to predict their visibility. This prediction is accomplished by means of a tool we call the Window of Visibility, which is a simplified representation of the human spatio-temporal contrast sensitivity function. With the aid of the Window of Visibility, the movie-making process, including frame rate, can be optimized for the eye of the beholder.
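As a worked example of the reasoning (using illustrative window limits of 30 cycles/degree and 60 Hz, which are my assumptions, not the speaker's numbers): the spectrum of a smoothly moving pattern lies on the line w = -v*u, temporal sampling at the frame rate fs replicates it at w = -v*u + k*fs, and the k = ±1 replicas stay outside the window when fs exceeds the temporal limit plus the motion speed times the spatial limit.

```python
def min_artifact_free_frame_rate(speed_deg_per_s,
                                 spatial_limit_cpd=30.0,   # assumed window bounds
                                 temporal_limit_hz=60.0):
    """Smallest frame rate (Hz) that keeps the first sampling replica of a
    pattern moving at the given speed outside the window of visibility."""
    return temporal_limit_hz + speed_deg_per_s * spatial_limit_cpd

print(min_artifact_free_frame_rate(1.0))   # slow pan   -> 90.0 Hz
print(min_artifact_free_frame_rate(2.0))   # faster pan -> 120.0 Hz
```

Under these assumed limits, one can see why raising the frame rate from 24 to 48 Hz helps, while fast camera or object motion can still push sampling replicas back inside the window.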
David Tobie
(datacolor)
Why Color Management is Hard: Lessons from the Trenches

Date: 02/19/2013
Description:
Color Management is often considered a science, but it has many applied aspects, and even a few artistic functions. This lecture will compare and contrast affordable color management’s practical commercial side with the science behind it and the art it is used to create. Examples from the product line David Tobie has worked to create will provide opportunities to describe the relationship of these products to color science, optics, and industry standards, as well as to mass production, fast-changing commercial markets, and demanding but non-scientific end users.
Further Information:
David is currently Global Product Technology Manager at Datacolor, where he develops new products and features for their Spyder <spyder.datacolor.com> line of calibration tools. His work has received a long line of digital imaging product awards, including the coveted TIPA award and a nomination for the DesignPreis. He was recognized by Microsoft as an MVP in Print and Imaging, 2007-2010. Much of David’s recent writing can be found at his photography blog, cdtobie.wordpress.com, and samples of his photography can be seen at cdtobie.com.
Harlyn Baker
Capture Considerations for Multi-view Panoramic Cameras

Date: 2/12/2013
Description:
This presentation will discuss methods and insights from a lab/business-unit collaborative project in multi-view imaging and display. Our goal was to better understand the potential of new immersive technologies by developing demonstrations and experiments that we performed within the context of corporate customer interests. The demonstrations centered on using multi-viewpoint capture and binocular and multiscopic display for immersive, very large scale, 3D entertainment experiences. Image acquisition used our wide-VGA and high-definition Herodion multi-camera capture system. 3D display at a very large scale used tiled projection systems with polarization to permit binocular stereo viewing and, at a smaller scale, multiple pico-type projectors presenting autostereoscopic viewing.
Lingfei Ming
(RICOH)
System Model and Performance Evaluation of Spectrally Coded Plenoptic Cameras

Date: 1/15/2013
Description:
Plenoptic camera architectures are designed to capture a 4D light field of the scene and have been used for different applications, such as digital refocusing and depth estimation. This plenoptic architecture can also be modified to collect multispectral images in a single snapshot by inserting a filter array in the pupil plane of the main lens. In this talk I will first introduce an end-to-end imaging system model for a spectrally coded plenoptic camera. I will then present our prototype, which was developed based on a modified DSLR camera containing a microlens array on the sensor and a filter array in the main lens. Finally, I will show results based on both simulations and measurements obtained from the prototype.
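For intuition about how the spectral coding works, here is a toy extraction step under idealized assumptions (perfect microlens alignment, no vignetting or crosstalk; not the prototype's actual calibration pipeline): pixels at the same offset under every microlens see the scene through the same region of the pupil, and hence through the same filter of the pupil-plane array, so regrouping the raw image by that offset yields one low-resolution image per spectral channel.

```python
import numpy as np

def extract_spectral_channels(raw, lenslet_px, filter_map):
    """raw        : sensor image, shape (Ny*lenslet_px, Nx*lenslet_px)
    lenslet_px : pixels per microlens along each axis
    filter_map : (lenslet_px, lenslet_px) array of spectral-channel indices
    Returns a dict mapping channel index -> low-resolution image."""
    H, W = raw.shape
    ny, nx = H // lenslet_px, W // lenslet_px
    blocks = raw[:ny * lenslet_px, :nx * lenslet_px].reshape(ny, lenslet_px, nx, lenslet_px)
    blocks = blocks.transpose(0, 2, 1, 3)                 # (ny, nx, u, v) sub-aperture layout
    channels = {}
    for c in np.unique(filter_map):
        mask = (filter_map == c)
        channels[c] = blocks[..., mask].mean(axis=-1)     # average pixels behind this filter
    return channels
```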
Further Information:
Coming soon…
- Kirk Martinez » Reflectance Transformation Imaging
- Kartik Venkataraman » High Performance Camera Arrays
- Peter Clark & Tigran Galstian » How molecules improve mobile imaging
- Ramakrishna Kakarala » What parts of a shape are discriminative?
- Marc Levoy » What Google Glass means ...
- Douglas Lanman » Near-Eye Light Field Displays
- Boyd Fowler » International Image Sensor Workshop
- Patrick Gill » Ultra-miniature computational imagers
- David Cardinal » Photography: The Big Picture
- Tibor Balogh » Light Field Display Architectures
- Roland Angst » Geometry and Semantics in Computer Vision
- Audrey Ellerbee » Optical Coherence Tomography Systems
- Jitendra Malik » The Three R's of Computer Vision
- Mark Schnitzer » Visualizing the neuronal orchestra
- Thomas Vogelsang » Binary Pixel and High Dynamic Range Imaging
- Ricardo Motta » NVIDIA's Mobile Computational Photography Architecture
- Shailendra Mathur » Light-field Based Story-telling
- Andrew Watson » High Frame Rate Movies and Human Vision
- David Tobie » Why Color Management is Hard
- Harlyn Baker » Multi-view Panoramic Cameras
- Lingfei Ming » Spectrally Coded Plenoptic Cameras
SCIEN Colloquia 2012
Laura Waller
(University of California, Berkeley)
Phase imaging with partially coherent light

Date: 12/11/2012
Description:
This talk will describe computational phase imaging methods based on intensity transport, with a focus on imaging systems using partially coherent illumination (e.g. optical and X-ray microscopes). Knowledge of propagation dynamics allows quantitative recovery of wavefront shape with very little hardware modification. The effect of spatial coherence in typical and ‘coherence engineered’ systems will be explored. All of these techniques use partially coherent light, whose wave-fields are inherently richer than coherent (laser) light, having many more degrees-of-freedom. Measurement and control of such high dimensional beams will allow new applications in bioimaging and metrology, as well as bring new challenges for efficient algorithm design.
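One common concrete instance of intensity-transport phase recovery is the Transport of Intensity Equation solved with a Fourier-domain Poisson solver. The sketch below is a minimal illustration under a nearly-uniform-intensity assumption with simple Tikhonov regularization, not the speaker's algorithms; it recovers phase from two slightly defocused images.

```python
import numpy as np

def tie_phase(I_minus, I_plus, dz, wavelength, pixel_size, reg=1e-3):
    """Recover phase from two defocused intensity images via the Transport of
    Intensity Equation, assuming nearly uniform intensity I0 so that
    -k * dI/dz ~= I0 * laplacian(phi)."""
    k = 2 * np.pi / wavelength
    dIdz = (I_plus - I_minus) / (2 * dz)            # axial intensity derivative
    I0 = 0.5 * (I_plus + I_minus).mean()
    ny, nx = dIdz.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    lap = -4 * np.pi**2 * (FX**2 + FY**2)           # Fourier symbol of the Laplacian
    rhs = -k * dIdz / I0                            # laplacian(phi) = rhs
    phi_hat = np.fft.fft2(rhs) / (lap - reg)        # regularized inverse Laplacian
    phi_hat[0, 0] = 0.0                             # fix the arbitrary phase offset
    return np.real(np.fft.ifft2(phi_hat))
```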
Shih-Fu Chang
(Columbia University)
Recent Advances of Compact Hashing for Large-Scale Visual Search

Date: 12/05/2012
Description:
Finding nearest neighbor data in high-dimensional spaces is a common yet challenging task in many applications, such as stereo vision, image retrieval, and large graph construction. [ … ] Recent advances in locality sensitive hashing show promise by hashing high-dimensional features into a small number of bits while preserving proximity in the original feature space. [ … ] In this talk, I will first survey a few recent methods that extend basic hashing to incorporate labeled information through supervised and semi-supervised hashing, employ hyperplane hashing for finding nearest points to subspaces (e.g., planes), and demonstrate the practical utility of compact hashing methods in solving several challenging problems of large-scale mobile visual search: low bandwidth, limited processing power on mobile devices, and the need to search large databases on servers. Finally, we study the fundamental questions of high-dimensional search: how is nearest neighbor search performance affected by data size, dimension, and sparsity, and can we predict the performance of hashing methods over a data set before its implementation? (joint work with Junfeng He, Sanjiv Kumar, Wei Liu, and Jun Wang)
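To ground the idea of compact hashing, here is a minimal unsupervised random-hyperplane (sign LSH) sketch; it illustrates only the basic bits-plus-Hamming-search machinery, not the supervised, semi-supervised, or subspace-hashing methods surveyed in the talk, and all sizes are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_hash(dim, n_bits):
    """Random hyperplanes for sign-based (LSH-style) binary hashing."""
    return rng.standard_normal((dim, n_bits))

def encode(X, planes):
    """Map real-valued vectors to compact binary codes (one bit per hyperplane)."""
    return (X @ planes > 0).astype(np.uint8)

def search(query_code, db_codes, k=5):
    """Return indices of the k database items closest in Hamming distance."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:k]

# toy usage: 10k random 128-D descriptors hashed to 32 bits
X = rng.standard_normal((10_000, 128))
planes = train_hash(128, 32)
codes = encode(X, planes)
q = X[42] + 0.05 * rng.standard_normal(128)     # a noisy copy of item 42
print(search(encode(q[None, :], planes)[0], codes))
```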
Scott Daly & Timo Kunkel
(Dolby Laboratories)
Viewer preference statistics for shadow, diffuse, specular, and emissive luminance of high dynamic range displays

Description:
A subjective study was performed to find minimum and maximum display luminances based on viewer preferences. The motivation was to find values based on real-world structured image content, as opposed to the geometric test patterns commonly used in the literature, and to find values relevant to the display of video content. The test images were specifically designed, both in scene set-up and in capture techniques (i.e., HDR multiple-exposure merging), to test these limits without the usual perceptual conflicts of contrast induction, the Stevens effect, the Hunt effect, and contrast/sharpness interactions. The display was designed to render imagery at extreme ranges of luminance and high contrast, from 0.004 to 20,000 cd/m², while avoiding the usual elevation of black level with increasing brightness. The image signals were captured, represented, and processed to avoid the common unintended signal distortions of clipping, contrast reduction, and tonescale shape changes. The image range was broken into diffuse reflective and highlight regions. Preferences were studied, as opposed to detection thresholds, to provide results more directly relevant to viewers of media. Statistics of the preferences will be described, as opposed to solely reporting mean and standard deviation values. As a result, we believe these results are robust to future hardware capabilities in displays.
Peter Vajda
(Stanford University)
Personalized TeleVision News

Date: 02/11/2012
Description:
In this presentation we demonstrate a platform for personalized television news to replace the traditional one-broadcast-fits-all model. We forecast that next-generation video news consumption will be more personalized, device agnostic, and pooled from many different information sources. The technology for our research represents a major step in this direction, providing each viewer with a personalized newscast with stories that matter most to them. We believe that such a model can provide a vastly superior user experience and provide fine-grained analytics to content providers. While personalized viewing is increasingly popular for text-based news, personalized real-time video news streams are a critically missing technology.
Hiroshi Shimamoto
(NHK)
120 Hz-frame-rate SUPER Hi-VISION Capture and Display Devices

Date: 10/30/2012
Description:
In this talk I will describe the world’s first 120-Hz SUPER Hi-VISION devices, which we have developed. One is a 120-Hz SUPER Hi-VISION image-capture device that uses three 120-Hz, 33-megapixel CMOS image sensors. The sensor uses 12-bit ADCs and operates at a data rate of 51.2 Gbit/s. Our unique ADC technology achieves high-speed operation and low power consumption at the same time. The second is a 120-Hz SUPER Hi-VISION display system that uses three 8-megapixel LCOS chips and e-shift technology. These 120-Hz SUPER Hi-VISION devices were exhibited at our open house in May and at IBC2012 in September 2012, and they showed superb picture quality with reduced motion blur.
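As a rough consistency check on the quoted figures (assuming the nominal 7680 × 4320 SUPER Hi-VISION raster and counting only active pixels, which is an assumption on my part), the per-sensor payload rate works out to

$$
7680 \times 4320 \times 12~\text{bit} \times 120~\text{Hz} \approx 4.78 \times 10^{10}~\text{bit/s} \approx 47.8~\text{Gbit/s},
$$

just under the quoted 51.2 Gbit/s; the difference would plausibly be readout and blanking overhead, though the abstract does not say.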
Hamid Aghajan
(Stanford University)
Camera Networks for Ambient Intelligence: Personal Recommendations via Behavior Modeling

Date: 10/16/2012
Description:
Vision offers rich information about events involving human activities in applications from gesture recognition to occupancy reasoning. Multi-camera vision allows for applications based on 3D perception and reconstruction, offers improved accuracy through feature and decision fusion, and provides access to different attributes of the observed events through camera task assignment. However, the inherent complexities of vision processing stemming from perspective views and occlusions, as well as setup and calibration requirements, have challenged the creation of meaningful applications that can operate in uncontrolled environments. Moreover, the task of studying user acceptance criteria such as privacy management and the implications of visual ambient communication has for the most part stayed outside the realm of technology design, further hindering the roll-out of vision-based applications in spite of the available sensing, processing, and networking technologies.
Further Information:
- Laura Waller » Phase imaging with partially coherent light
- Shih-Fu Chang » Compact Hashing for Large-Scale Visual Search
- Scott Daly & Timo Kunkel » High dynamic range displays
- Peter Vajda » Personalized TeleVision News
- Hiroshi Shimamoto » SUPER Hi-VISION Capture and Display Devices
- Hamid Aghajan » Camera Networks for Ambient Intelligence