Silicon Valley May 8-11, 2017

Schedule Planner

PANEL


S7526 - Scaling Deep Learning on High-Performance Computers for Use in Scientific Workloads

Fernanda Foertter HPC User Assistance and Outreach Group, Oak Ridge National Laboratory
Highly-Rated Speaker
Fernanda Foertter is a member of the User Assistance Team at the National Center for Computational Sciences (NCCS) located at Oak Ridge National Laboratory (ORNL). This team is responsible for assisting all users at the Oak Ridge Leadership Computing Facility (OLCF). Fernanda is responsible for the training program at the center and represents OLCF at both the OpenACC and OpenMP organizations.
Jack Wells Director of Science, Oak Ridge Leadership Computing Facility
Jack Wells is the Director of Science for the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science national user facility, and the Titan supercomputer, located at Oak Ridge National Laboratory (ORNL). Wells is responsible for the scientific outcomes of the OLCF's user programs. Jack has previously led both ORNL's Computational Materials Sciences group in the Computer Science and Mathematics Division and the Nanomaterials Theory Institute in the Center for Nanophase Materials Sciences. Prior to joining ORNL as a Wigner Fellow in 1997, Wells was a postdoctoral fellow within the Institute for Theoretical Atomic and Molecular Physics at the Harvard-Smithsonian Center for Astrophysics. Jack has a Ph.D. in physics from Vanderbilt University, and has authored or co-authored over 80 scientific papers and edited one book, spanning nanoscience, materials science and engineering, nuclear and atomic physics, computational science, applied mathematics, and text-based data analytics.
Steven Young Research Scientist in Deep Learning, Oak Ridge National Laboratory
Steven Young is a researcher at Oak Ridge National Laboratory working in the Computational Data Analytics Group. His research focuses on applying deep learning to challenging datasets using HPC to enable faster training and quicker discovery. He has a Ph.D. in computer engineering from the University of Tennessee, where he studied machine learning in the Machine Intelligence Lab.
William Tang Principal Research Physicist, Princeton University
William Tang of Princeton University is principal research physicist at the Princeton Plasma Physics Laboratory for which he served as chief scientist (1997-2009) and is currently lecturer with rank and title of professor in astrophysical sciences, and member of the executive board for the Princeton Institute for Computational Science and Engineering, which he helped establish and served as associate director (2003-2009). William is internationally recognized for expertise in the mathematical formalism and associated computational applications dealing with electromagnetic kinetic plasma behavior in complex geometries -- with over 200 publications with more than 150 peer-reviewed papers and an "h-index" or "impact factor" of 44 on the Web of Science, including well over 7,000 total citations. William has taught for over 30 years and has supervised numerous Ph.D. students, including recipients of the Presidential Early Career Award for Scientists and Engineers in 2000 and 2005. He is also head of the Intel Parallel Computing Center at the Princeton Institute for Computational Science & Engineering at Princeton University.
Daniel George Scientist, University of Illinois at Urbana-Champaign, National Center for Supercomputing Applications
Daniel George is a Ph.D. student in astronomy, pursuing the computational science and engineering concentration, at the University of Illinois at Urbana-Champaign. He obtained his bachelor's degree in engineering physics from IIT Bombay. He is currently a research assistant in the Gravity Group at the National Center for Supercomputing Applications and a member of the LIGO collaboration working at the interface of deep learning, high performance computing, and gravitational wave and multimessenger astrophysics. His long-term interests lie in applying cutting-edge computer science and technology, especially machine learning and artificial intelligence, to accelerate discoveries in the fundamental sciences.

Deep learning has become a popular tool for gaining insight into problems where deterministic models don't yet exist. Recent development of deep learning frameworks using GPUs has allowed the application of deep learning to problems where fast solutions are required. The scientific community has traditionally sought to develop deterministic models to describe physical phenomena, using highly scalable systems to simulate problems with ever-increasing fidelity. While many science domains have developed robust predictive methods, there are still problems lacking models that can describe observed phenomena. In many of these cases, the problem may contain unknown variables, or be fundamentally hard to solve, where the simulation cannot fully predict observations. These areas include biological systems, chaotic systems, and medical research. There are also fields where a priori models do exist, but surveying the parameter space through simulation of large datasets would have very long times-to-solution. These areas include instrument data analysis and materials by design. We'll explore how the scientific community is using deep learning to conduct leading-edge research outside of traditional modeling techniques. We'll also explore opportunities and obstacles to scaling deep learning workloads on high performance computing systems.

Level:
Type: Panel
Tags: HPC and Supercomputing; Deep Learning and AI
Industry Segments: Higher Education / Research

Day: TBD
Time: TBD
Location: TBD


TALK


S7105 - ADAS Challenges: GPU Scheduling and Synchronization

Venugopala Madumbu Software Architect, NVIDIA
Venugopala Madumbu is a software architect in the Automotive business at NVIDIA. He manages the requirements, software architecture, use case, and performance modeling for ADAS and AutoPilot programs.

Learn how the GPU schedules different work and how this poses challenges for ADAS systems, where some functionalities are expected to execute in a deterministic manner and even be prioritized and synchronized with other GPU workloads. We'll discuss the preemption feature in different GPU architectures and introduce two approaches for achieving deterministic, prioritized execution of different GPU functionalities.

Level: Intermediate
Type: Talk
Tags: Self-Driving Cars

Day: TBD
Time: TBD
Location: TBD

S7107 - Improving Patient Care Using EchoPixel's Interactive Virtual Reality Technology

Janet Goldenstein Lead Engineer, EchoPixel
Janet Goldenstein is lead developer at EchoPixel. Previously, Janet worked at Elekta and General Electric Healthcare. She is an expert in medical imaging and image processing, and received her Ph.D. in biomedical engineering from UC Berkeley.

Get the latest information on how virtual reality is being used to change healthcare outcomes. EchoPixel, a company focused on VR in healthcare, has developed the True 3D Viewer, a real-time, interactive VR platform. It offers physicians an unprecedented opportunity to view and interact with patient tissues and organs in an open 3D space as if they were real, physical objects. The resulting improvement in clinical efficacy and workflow has had a significant positive impact on patient care.

Level: All
Type: Talk
Tags: Virtual Reality and Augmented Reality; Healthcare and Life Sciences; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7108 - Time-Lapse Visualization of Building Construction

Innfarn Yoo Sr. Software Engineer, NVIDIA
Innfarn Yoo is a software engineer on the OpenGL core and chips team at NVIDIA. He received his doctoral and master's degrees from Purdue University.

We'll present a novel visualization method that creates time-lapse ray-traced video using NVIDIA's Iray renderer. The CAD model of NVIDIA's new building is converted into a time-lapse model using time-lapse point clouds, and then efficiently rendered with NVIDIA's Iray rendering technique.

Level: Intermediate
Type: Talk
Tags: Rendering and Ray Tracing; Real-Time Graphics; Large Scale and Multi-Display Visualization

Day: TBD
Time: TBD
Location: TBD

S7112 - Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLINK

Rajesh Bordawekar Research Staff Member, IBM T. J. Watson Research Center
Rajesh Bordawekar is a research staff member at the IBM T. J. Watson Research Center. He is working on GPU acceleration of cognitive and analytics workloads, for example, Spark ML.

We'll discuss approaches for accelerating out-of-core nearest neighbor computation on multi-GPU systems using various system features such as NVLink. Nearest neighbor calculations operate over a set of high-dimensional vectors and compute pair-wise distances using certain similarity metrics such as cosine or maxNorm distances. In practice, the number of vectors can be very large and the vectors can have very high dimension (for example, 5 million 1,000-dimensional vectors for the Wikipedia corpus). In such cases, the data cannot fit in GPU device memory and needs to be fetched from host memory. We'll present GPU implementations of key nearest neighbor algorithms (for example, locality sensitive hashing) for these scenarios and demonstrate how one can use NVLink for optimizing these algorithms.
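
As an illustration of the underlying computation only (not the presenters' implementation), below is a minimal sketch of chunked pairwise cosine-similarity search, assuming the full vector set lives in host memory and only one chunk at a time fits in device memory; NumPy stands in for the GPU kernels, and all sizes are hypothetical.

```python
import numpy as np

def cosine_topk_chunked(queries, corpus, k=5, chunk=100_000):
    """Find the k most cosine-similar corpus vectors for each query,
    processing the corpus in chunks that would fit in GPU memory."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    best_sim = np.full((len(q), k), -np.inf)
    best_idx = np.zeros((len(q), k), dtype=np.int64)
    for start in range(0, len(corpus), chunk):
        c = corpus[start:start + chunk]                  # "copy chunk to device"
        c = c / np.linalg.norm(c, axis=1, keepdims=True)
        sims = q @ c.T                                   # pair-wise cosine similarities
        # merge this chunk's candidates with the running top-k
        merged_sim = np.concatenate([best_sim, sims], axis=1)
        merged_idx = np.concatenate(
            [best_idx, np.tile(np.arange(start, start + len(c)), (len(q), 1))], axis=1)
        order = np.argsort(-merged_sim, axis=1)[:, :k]
        best_sim = np.take_along_axis(merged_sim, order, axis=1)
        best_idx = np.take_along_axis(merged_idx, order, axis=1)
    return best_idx, best_sim

# toy usage: 10 queries against 1,000 corpus vectors of dimension 64
rng = np.random.default_rng(0)
idx, sim = cosine_topk_chunked(rng.normal(size=(10, 64)),
                               rng.normal(size=(1000, 64)), chunk=250)
```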

Level: All
Type: Talk
Tags: Deep Learning and AI; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7113 - GPU-Accelerated Graph Analytics

Howie Huang Associate Professor, The George Washington University
Howie Huang is an associate professor in the Department of Electrical and Computer Engineering at George Washington University.

Future high-performance computing systems will enable fast processing of large datasets, as highlighted by President Obama's executive order on the National Strategic Computing Initiative. Of significant interest is the need for analyzing big graphs arising from a variety of areas -- from social networks and biology, to national security. We'll present our ongoing efforts at George Washington University in accelerating big graph analytics on GPUs. We've developed a GPU-based graph analytics system that delivers exceptional performance through efficient scheduling of a large number of GPU threads and effective utilization of GPU memory hierarchy.

Level: All
Type: Talk
Tags: Accelerated Analytics; HPC and Supercomputing
Industry Segments: Higher Education / Research

Day: TBD
Time: TBD
Location: TBD

S7114 - How GPUs and Deep Learning Help to Make Dental Care More Affordable

Sergei Azernikov Machine Learning Team Lead, Glidewell Dental
Sergei Azernikov is an accomplished professional with broad academic and industrial experience in the areas of biomedical CAD/CAM, computational geometry, computer vision, and machine learning. He has made important contributions to a wide variety of projects across different industries, both as an individual contributor and as a team lead. He has filed a number of patents and published extensively in leading peer-reviewed professional journals and conference proceedings.

Learn about the unique challenges being solved using deep learning on GPUs in large-scale mass customization of medical devices. Deep neural networks have been successfully applied to some of the most difficult problems in computer vision, natural language processing, and robotics. But we still haven't seen the full potential of this technology used in manufacturing. Glidewell Labs produces thousands of patient-specific items daily, such as dental restorations, implants, and appliances. Our goal is to make high-quality restorative dentistry affordable to more patients. This goal can only be achieved with flexible, highly autonomous CAD/CAM systems, which rely on AI for real-time decision making.

Level: All
Type: Talk
Tags: Manufacturing Industries; Deep Learning and AI; Healthcare and Life Sciences

Day: TBD
Time: TBD
Location: TBD

S7117 - Accelerating Cross-Validation in Spark Using GPU

Minsik Cho Research Staff Member, IBM Research
Minsik Cho has been a research staff member with IBM Research since 2008. He received a B.S. in electrical engineering from Seoul National University, Korea, in 1999 and a Ph.D. in electrical and computer engineering from the University of Texas, Austin, in 2008.

Learn how to better utilize GPUs to accelerate cross-validation in Spark, which is widely used in many big data analytics/machine learning applications.
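
For orientation, a minimal k-fold cross-validation pipeline in Spark's standard Python API looks roughly like the sketch below (plain Spark MLlib, not the GPU-accelerated work presented in this talk); the data path and hyperparameters are placeholders, and the many model fits this triggers are exactly the costly part a GPU can speed up.

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.appName("cv-example").getOrCreate()

# expects a DataFrame with "features" (vector) and "label" columns; path is a placeholder
train = spark.read.format("libsvm").load("data/sample_libsvm_data.txt")

lr = LogisticRegression(maxIter=20)
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1, 1.0])
        .addGrid(lr.elasticNetParam, [0.0, 0.5])
        .build())

# 3-fold cross-validation trains len(grid) * 3 models -- the expensive loop to accelerate
cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(),
                    numFolds=3)
model = cv.fit(train)
print(model.avgMetrics)
```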

Level: All
Type: Talk
Tags: Accelerated Analytics; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7118 - Using DRIVE PX 2 to Drive a Vehicle Autonomously

Shri Sundaram Senior Product Manager - DRIVE PX 2, NVIDIA
Shri Sundaram is the senior product manager for DRIVE PX 2 at NVIDIA.

We'll discuss the process of installing NVIDIA DRIVE PX 2 in a car, including data acquisition, data annotation, neural network training, and in-vehicle inference. We'll focus on the types of sensors required to perceive, how to log and annotate data, how to train a neural network with that data, and how to use that neural network for inference on DRIVE PX 2 to create an occupancy grid and drive the car.

Level: All
Type: Talk
Tags: Self-Driving Cars; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7121 - Jacobi-Based Eigenvalue Solver on GPU

Lung-Sheng Chien Software Engineer, NVIDIA
Highly-Rated Speaker
Lung-Sheng Chien is a software engineer at NVIDIA, working on the cuSOLVER and cuSPARSE libraries.

Learn how to use a Jacobi-based eigenvalue solver in your applications. We'll describe the basic idea of the Jacobi method and its application to exact, approximate, and batched eigenvalue solvers for small matrices.
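
To make the idea concrete, here is a tiny, unoptimized sketch of the classical Jacobi rotation method for a symmetric matrix (plain NumPy, not the cuSOLVER implementation): it repeatedly zeroes the largest off-diagonal entry with a rotation until the matrix is nearly diagonal.

```python
import numpy as np

def jacobi_eigs(A, tol=1e-10, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via classical Jacobi rotations."""
    A = A.copy().astype(float)
    n = A.shape[0]
    for _ in range(max_sweeps * n * n):
        # pick the largest off-diagonal element
        off = np.abs(A - np.diag(np.diag(A)))
        p, q = np.unravel_index(np.argmax(off), off.shape)
        if off[p, q] < tol:
            break
        # rotation angle that zeroes A[p, q]
        theta = 0.5 * np.arctan2(2 * A[p, q], A[q, q] - A[p, p])
        c, s = np.cos(theta), np.sin(theta)
        J = np.eye(n)
        J[p, p], J[q, q], J[p, q], J[q, p] = c, c, s, -s
        A = J.T @ A @ J          # similarity transform preserves eigenvalues
    return np.sort(np.diag(A))

M = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 1.0]])
print(jacobi_eigs(M))            # should match np.linalg.eigvalsh(M)
```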

Level: Beginner
Type: Talk
Tags: Tools and Libraries; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7122 - CUDA Optimization Tips, Tricks and Techniques

Stephen Jones Principal Software Engineer, NVIDIA
Stephen Jones is a principal software engineer in the CUDA group at NVIDIA, working on making the CUDA language and programming model span the needs of parallel programming from high performance computing to artificial intelligence. Prior to NVIDIA, he led the Simulation & Analytics group at SpaceX, where he worked on various projects, including large-scale simulation of combustion processes in rocket engines. His background is in computational fluid mechanics and plasma physics, but he has worked in diverse industries, including networking, CAD/CAM, and scientific computing.

Optimizing your code can be one of the most challenging tasks in GPU programming, but also one of the most rewarding: the performance difference between an initial version and well-tuned code can be a factor of 10 or more. Some optimizations can be quite straightforward while others require care and deep understanding of how the code is executing. A particular focus will be on optimization of the CPU part of your code, which is frequently overlooked even though it is often easier to tune and just as effective. Sometimes the biggest obstacle is just knowing what to look for, so we'll cover a range of techniques that everyone from beginners to CUDA ninjas might not have thought of before.

Level: Intermediate
Type: Talk
Tags: Performance Optimization; Accelerated Analytics; Algorithms; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7124 - Drone Net: Using Tegra for Multi-Spectral Detection and Tracking in Shared Air Space

Sam Siewert Assistant Professor, Embry-Riddle Aeronautical University
Sam Siewert holds faculty appointments at Embry-Riddle Aeronautical University and the University of Colorado. He's worked as a computer engineer since 1989, serving as a developer, CTO, and architect.

The challenge and opportunity presented by use of UAS "drones" in the national airspace has historic significance. The FAA estimates that by 2020 the drone market will be $98 billion with 7 million drones added annually. How drones ranging from professional service to hobby will safely share airspace is unclear. Preliminary research at Embry-Riddle to develop a drone detector, which can be placed on rooftops and networked with other detectors and information services, has shown that multi-spectral electro-optical/infrared detection is quite effective. Our team is using NVIDIA Jetson systems in an EO/IR detector system. The NVIDIA Kepler architecture-based NVIDIA Tegra co-processor provides real-time object detection for aircraft and drones using salient object detection algorithms accelerated by GPUs. We'll present the power efficiency and real-time processing advantages GP-GPU provides compared to FPGA and multi-core, which we've also tested for this application.

Level: Intermediate
Type: Talk
Tags: Computer Vision and Machine Vision; Video and Image Processing; Federal; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7125 - Efficient Imaging in Radio Astronomy Using GPUs

Bram Veenboer PhD Researcher, Astron
Bram Veenboer is a Ph.D. researcher at ASTRON, the Netherlands Institute for Radio Astronomy. His work focuses on accelerator platforms for the biggest radio telescope in the world: the SKA.

Realizing the next generation of radio telescopes such as the Square Kilometre Array requires both more efficient hardware and algorithms than today's technology provides. We'll present our work on the recently introduced Image-Domain Gridding (IDG) algorithm that tries to avoid the performance bottlenecks of traditional AW-projection gridding. We'll demonstrate how we implemented this algorithm on various architectures. By applying a modified roofline analysis, we show that our parallelization approaches and optimizations lead to nearly optimal performance on all architectures. The analysis also indicates that, by leveraging dedicated hardware to evaluate trigonometric functions, NVIDIA GPUs are much faster and more energy-efficient than regular CPUs. This makes IDG on GPUs a candidate for meeting the computational and energy-efficiency constraints of future telescopes.
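
For readers unfamiliar with roofline analysis, the basic bound it applies is simple; a back-of-the-envelope version (with made-up hardware numbers, not ASTRON's measurements) looks like this:

```python
# Roofline model: attainable GFLOP/s is capped either by peak compute or by
# memory bandwidth times arithmetic intensity (FLOPs per byte moved).
def roofline(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)

# hypothetical GPU: 10,000 GFLOP/s peak, 700 GB/s memory bandwidth
for ai in (0.5, 2.0, 8.0, 32.0):
    print(f"AI = {ai:5.1f} FLOP/byte -> {roofline(10_000, 700, ai):8.1f} GFLOP/s attainable")
```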

Level: Intermediate
Type: Talk
Tags: Astronomy and Astrophysics; Performance Optimization
Industry Segments: Higher Education / Research; Government / National Labs

Day: TBD
Time: TBD
Location: TBD

S7126 - Red Blood Cells Simulations with Chemical Transport Properties

Ansel Blumers Graduate Student, Brown University
Ansel Blumers is a graduate student in the Department of Physics at Brown University.

We'll explore new techniques in GPU-accelerated red blood cell simulations. The desire to study the underlying chemical influences on red blood cell functionalities motivates the use of a method that can capture the diffusion and reaction processes. To take advantage of the GPU's parallelism, the new technique involves a stream-diversion tactic and non-blocking MPI communication to streamline the computation. The speed is then tested against the CPU counterpart. Strong scaling and weak scaling are performed to characterize scalability.

Level: All
Type: Talk
Tags: Computational Biology; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7127 - cuMF_sgd: Fast and Scalable Matrix Factorization on GPUs

Wei Tan Research Staff Member, IBM T. J. Watson Research Center
Wei Tan is a research staff member at IBM's T.J. Watson Research Center. Wei's work and code have been incorporated into IBM's patent portfolio and software such as Spark, BigInsights, and Cognos. Wei is a visiting professor at Tsinghua University and Tianjin University, China, and associate editor of IEEE Transactions on Automation Science and Engineering.

Matrix factorization (MF) has been widely used in recommender systems, topic modeling, word embedding, and more. Stochastic gradient descent (SGD) for MF is memory bound. Meanwhile, single-node CPU systems with caching perform well only for small datasets. Distributed systems have higher aggregated memory bandwidth but suffer from relatively slow network connections. This observation inspires us to accelerate MF by utilizing the GPU's high memory bandwidth and fast intra-node connection. We present cuMF_SGD, a CUDA-based SGD solution for large-scale MF problems. On a single GPU, we design two workload schedule schemes, i.e., batch-Hogwild! and wavefront-update, that fully exploit the massive number of cores. Batch-Hogwild!, a vectorized version of Hogwild!, especially overcomes the issue of memory discontinuity. On three datasets with only one Maxwell or Pascal GPU, cuMF_SGD runs 3.1x to 28.2x as fast compared with state-of-the-art CPU solutions on 1 to 64 CPU nodes.
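
To ground the discussion, the per-rating SGD update that such a solver parallelizes across thousands of GPU threads is small; a scalar NumPy sketch (illustrative only, not the cuMF_SGD kernel, with toy data) is shown below.

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, k=16, lr=0.05, reg=0.02, epochs=20):
    """Factorize a sparse rating list [(user, item, rating), ...] as P @ Q.T."""
    rng = np.random.default_rng(0)
    P = 0.1 * rng.normal(size=(n_users, k))   # user factors
    Q = 0.1 * rng.normal(size=(n_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            p, q = P[u].copy(), Q[i].copy()
            err = r - p @ q                    # prediction error for this rating
            P[u] += lr * (err * q - reg * p)   # gradient step on both factors
            Q[i] += lr * (err * p - reg * q)
    return P, Q

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0), (2, 2, 5.0)]
P, Q = sgd_mf(ratings, n_users=3, n_items=3)
print(P @ Q.T)   # reconstructed rating matrix
```

Schemes like batch-Hogwild! run many such updates concurrently without locking, which is where the scheduling and memory-layout questions discussed in the talk arise.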

Level: All
Type: Talk
Tags: Accelerated Analytics

Day: TBD
Time: TBD
Location: TBD

S7128 - How to Enable NVIDIA CUDA Stream Synchronous Communications Using GPUDirect

Davide Rossetti Senior Software Engineer, NVIDIA
Davide Rossetti is lead engineer for GPUDirect at NVIDIA. Previously, he spent more than 15 years at the Italian National Institute for Nuclear Physics as a researcher and member of the APE experiment.
Elena Agostini Ph.D. and Intern at NVIDIA, University of Rome
Elena Agostini received her Ph.D. in computer science from the University of Rome “La Sapienza” in collaboration with the National Research Council of Italy. The main topics of her research are GPUs used for cryptanalysis or communications, parallel computing, HPC, and network protocols. Her first internship at NVIDIA, at the Santa Clara (CA) headquarters, consisted of a collaboration with the CUDA team on the GPUDirect Async technology, recently released by NVIDIA. She is currently doing her second internship at NVIDIA, helping to improve the technology.

Learn how to enable CUDA stream synchronous communications in your applications by employing novel GPUDirect features.

Level: Advanced
Type: Talk
Tags: HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7129 - Scalable Deep Learning with Microsoft Cognitive Toolkit

Sayan Pathak Principal Engineer and ML Scientist, Microsoft
Sayan Pathak is a principal engineer and machine learning scientist at Microsoft. He is on the faculty at the University of Washington and IIT Kharagpur, India. His interests are in deep learning, vision, informatics, and online ads.

We'll introduce the Microsoft open source, production-grade deep learning Cognitive Toolkit (formerly CNTK) in a talk that will be a prelude to a detailed hands-on tutorial. The Cognitive Toolkit was used recently to achieve a major breakthrough in speech recognition by reaching human parity in conversational speech. The toolkit has been powering use cases leveraging highly performant GPU platforms. It is being used by several customers both on-premises and on the Azure cloud. We'll introduce different use cases leveraging fully connected networks, CNNs, RNNs/LSTMs, autoencoders, and reinforcement learning. We'll dive deep into topics that enable superior performance of the toolkit in comparison with similar open source toolkits. We'll showcase scalability across multiple GPUs and multiple servers. We'll provide a teaser hands-on experience with Jupyter notebooks running on Azure, with simple introductory to very advanced end-to-end use cases.

Level: Beginner
Type: Talk
Tags: Deep Learning and AI; Accelerated Analytics
Industry Segments: Software

Day: TBD
Time: TBD
Location: TBD

S7130 - Efficient Deep Model Selection

Jose Alvarez Researcher, Commonwealth Scientific and Industrial Research Organisation (CSIRO)
Jose M. Alvarez is a computer vision researcher at Data61 at CSIRO (formerly NICTA) in Australia, working on large-scale dynamic scene understanding and efficient deep learning architectures.

Convolutional neural networks have achieved impressive success in many tasks in computer vision. However, they come at a high memory and computational cost, thus making it difficult for deep learning to be commercially viable. In addition, selecting the architecture is still an engineering process. We'll introduce DecomposeMe, an efficient architecture based on filter compositions. This architecture can be trained quickly and is capable of achieving real-time operation on embedded platforms (250+ fps on an NVIDIA Jetson TX1). We'll also introduce our approach to automatically determining the number of neurons of the architecture during the training process. Finally, we'll introduce a novel approach to quantizing the network parameters.

Level: All
Type: Talk
Tags: Computer Vision and Machine Vision; Performance Optimization; Algorithms; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7131 - The Modern CAD/CAM Workflow: Scan, Design, Edit, Analyze, and Fabricate Without Triangles

Duane Storti Professor, University of Washington - Seattle
Duane Storti is a professor of mechanical engineering at the University of Washington and author of "CUDA for Engineers." Duane has extensive experience applying CUDA in CAD/CAM and medical imaging.

Learn about a new solid modeling approach created to provide support for customer- and patient-specific design and additive manufacturing (3D printing) with graded materials and properties. The new modeling approach involves a hybrid of function-based (implicit) modeling and voxel modeling; models consist of function values on a regular grid (along with a simple interpolant), so meshing/triangulation of objects' surfaces and/or volumes is avoided. Learn the basic ideas behind the modeling approach and see demonstrations of: (1) CUDA-accelerated, real-time interactions between digital models imported from CAD systems and digitized/scanned models, (2) design and fabrication of objects with graded materials/properties, and (3) initial results of CUDA-accelerated methods for mesh-free property evaluation and analysis.
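
As a toy illustration of the function-on-a-grid representation the talk describes (a sketch only, not the presenter's system), the snippet below samples an implicit sphere on a regular grid and queries the model through a simple interpolant, with no triangulation anywhere.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# implicit model: signed distance to a sphere of radius 0.5 centered at the origin
def sdf_sphere(p, radius=0.5):
    return np.linalg.norm(p, axis=-1) - radius

# sample the function on a regular grid (the hybrid implicit/voxel representation)
axis = np.linspace(-1.0, 1.0, 64)
X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
values = sdf_sphere(np.stack([X, Y, Z], axis=-1))

# a simple interpolant over the grid replaces any surface mesh
interp = RegularGridInterpolator((axis, axis, axis), values)

query = np.array([[0.5, 0.0, 0.0],    # on the surface -> ~0
                  [0.0, 0.0, 0.0],    # inside         -> negative
                  [0.9, 0.9, 0.9]])   # outside        -> positive
print(interp(query))
```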

Level: Intermediate
Type: Talk
Tags: Manufacturing Industries; Healthcare and Life Sciences

Day: TBD
Time: TBD
Location: TBD

S7132 - New CUDA Features and Beyond

Mark Harris Chief Technologist, GPU Computing Software, NVIDIA
Highly-Rated Speaker
Mark Harris is Chief Technologist for GPU Computing Software at NVIDIA. Mark has 15 years of experience developing software for GPUs, ranging from graphics and games to physically based simulation, parallel algorithms, and high performance computing. Mark has been using GPUs for general-purpose computing since before they even supported floating point arithmetic. While a Ph.D. student at UNC, he recognized this nascent trend and coined a name for it: GPGPU (general-purpose computing on graphics processing units), and started GPGPU.org to provide a forum for those working in the field to share and discuss their work.

CUDA is NVIDIA's parallel computing platform and programming model. In this talk you'll learn about new programming model enhancements and performance improvements in the latest release of CUDA; preview upcoming GPU programming technology; and gain insight into the philosophy driving the development of CUDA and how it will take advantage of current and future GPUs. You will learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.

Level: All
Type: Talk
Tags: Programming Languages; Tools and Libraries

Day: TBD
Time: TBD
Location: TBD

S7133 - Multi-GPU Programming with MPI

Jiri Kraus Senior Devtech Compute, NVIDIA
Highly-Rated Speaker
Jiri Kraus is a senior developer on the European DevTech team at NVIDIA, where he focuses on multi-GPU programming models and NVIDIA's collaboration with the Juelich Supercomputing Centre.

Learn how to program multi-GPU systems or GPU clusters using the message passing interface (MPI) and OpenACC or NVIDIA CUDA. We'll start with a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Then we'll cover advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. We'll also cover the latest improvements with CUDA-aware MPI, interaction with Unified Memory, the multi-process service (MPS, aka Hyper-Q for MPI), and MPI support in NVIDIA performance analysis tools.
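
A minimal sketch of the overlap pattern discussed here (mpi4py with NumPy host buffers and hypothetical sizes, not the talk's code): post non-blocking halo exchanges, update the interior while messages are in flight, then finish the boundary. With a CUDA-aware MPI build and a GPU array library, device buffers can typically be passed to the same calls directly.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

n = 1_000_000
u = np.random.rand(n)              # local slab of a 1D domain decomposition
halo_l, halo_r = np.empty(1), np.empty(1)

# 1. start non-blocking halo exchange with both neighbors
reqs = [comm.Isend(u[0:1],  dest=left,  tag=0),
        comm.Isend(u[-1:],  dest=right, tag=1),
        comm.Irecv(halo_l, source=left,  tag=1),
        comm.Irecv(halo_r, source=right, tag=0)]

# 2. update the interior while communication proceeds in the background
interior = 0.5 * (u[:-2] + u[2:])  # stand-in for the real (GPU) stencil kernel

# 3. wait for the halos, then update the two boundary points that needed them
MPI.Request.Waitall(reqs)
new_left = 0.5 * (halo_l[0] + u[1])
new_right = 0.5 * (u[-2] + halo_r[0])
u = np.concatenate(([new_left], interior, [new_right]))
```

Run with, for example, `mpirun -n 4 python overlap.py`; step 2 is where hiding the communication time pays off.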

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing; Programming Languages

Day: TBD
Time: TBD
Location: TBD

S7135 - NVIDIA VRWorks Audio - Improving VR Immersion with Acoustic Fidelity

Tony Scudiero Audio Technology Lead, NVIDIA
Highly-Rated Speaker
Tony Scudiero is the lead engineer on audio for virtual reality applications at NVIDIA, including the NVIDIA VRWorks Audio SDK. Tony did his graduate research on audio for virtual environments.

The demand for realism increases dramatically the instant a player puts on a head-mounted display (HMD) - images, sounds, and interactions make or break the immersiveness of the experience. We'll provide an overview and examples of the NVIDIA VRWorks Audio SDK, a geometric acoustics rendering toolkit that helps developers improve realism and immersion through realistic acoustic simulation and audio rendering.

Level: All
Type: Talk
Tags: Virtual Reality and Augmented Reality; Rendering and Ray Tracing; Game Development

Day: TBD
Time: TBD
Location: TBD

S7136 - DNA Sequences Alignment in Multi-GPUs: Energy Payoff on Speculative Executions

Manuel Ujaldon Full Professor and NVIDIA CUDA Fellow, University of Malaga (Spain), Computer Architecture Department
Manuel Ujaldon is a full professor in computer architecture at the University of Malaga. He earned a B.S. in computer science from the University of Granada (Spain, 1991) and an M.S. and Ph.D. in computer science from the University of Malaga (Spain, 1993 and 1996).

Find out the energy cost of launching speculative executions when handling data dependencies to enhance parallelism on multi-GPU platforms. We present CUDAlign 4.0 as a case study, a multi-GPU execution for an optimal alignment of huge DNA sequences using the exact Smith-Waterman algorithm. Our speculative approach easily attains 10-20x speed-up versus the baseline pipelined version, where GPUs are idle waiting for dependencies to be solved. But when working on mispredictions, GPUs waste energy. In the green computing era, where GFLOPS/w is the trending metric, we need to know which is worse: wasting time or power. Our experimental study analyzes speculation hit ratios to evaluate the extra performance and measures the energy spent on mispredictions, to conclude to what extent the speculative approach jeopardizes the GFLOPS/w ratio.
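
For context, the exact Smith-Waterman recurrence being parallelized is compact; a pure-Python reference with a linear gap penalty and illustrative scoring values (nothing like the multi-GPU pipeline itself) is sketched below. Each cell depends on its upper, left, and upper-left neighbors, which is exactly the data-dependency pattern that motivates the speculative execution the talk evaluates.

```python
import numpy as np

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Score matrix of the optimal local alignment; the answer is its maximum."""
    H = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1, j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i, j] = max(0, diag, H[i - 1, j] + gap, H[i, j - 1] + gap)
    return H.max()

print(smith_waterman("GATTACA", "GCATGCU"))
```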

Level: All
Type: Talk
Tags: Computational Biology; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7138 - Enhancing Pricing Performance and Quants Productivity in a Cloud Based Development Environment

Bram Leenhouwers Development manager, Misys
Bram Leenhouwers is a senior architect at Misys, where he leads the Fusion Parallel Platform team responsible for the Misys GPU pricing platform. He holds an M.S. in cryptography from Polytech'Nice in France.

Misys quants use a Groovy-based DSL to write efficient GPU-enabled pricing models without any OpenCL or NVIDIA CUDA knowledge. Allowing progressive migration from legacy code to GPU-enabled models, this framework leverages GPGPU strengths to achieve high-performance pricing with a really short learning curve. We'll start with an overview of the framework, and then focus on the online ecosystem Misys provides to allow third parties to develop and run their custom code on GPUs in the cloud through a PaaS-like interface.

Level: Intermediate
Type: Talk
Tags: Finance

Day: TBD
Time: TBD
Location: TBD

S7139 - Prices Drop as You Shop – How Walmart Is Using Jet's GPU-Based Smart Merchant Selection to Gain a Competitive Advantage

Daniel Egloff Partner, QuantAlea and InCube
Highly-Rated Speaker
Daniel Egloff, partner of InCube Group and managing director of QuantAlea, is an expert in large-scale numerical computing and GPU software development with more than 20 years of experience.

Last year, Walmart acquired the New Jersey-based startup Jet to improve its e-commerce platform with new innovations, to compete more successfully in the e-commerce market, and to optimize its order fulfillment costs. A core value of the Jet platform is smart merchant selection. When a customer orders several items at once, they usually can be fulfilled by multiple merchants from different warehouses. The goal is to find the merchant and warehouse combination so that the total order cost, including shipment costs and commissions, is as low as possible. Jet developed an innovative solution to find the most attractive combination of merchants. The bigger the shopping cart, the larger the savings that can be generated. We'll explain how only a clever combination of machine learning, new algorithms, and GPUs at scale in the cloud can address the problem. This unlocks new use cases and business applications that would not be possible with traditional computing resources.

Level: All
Type: Talk
Tags: Deep Learning and AI; Performance Optimization; Finance
Industry Segments: Retail / Etail

Day: TBD
Time: TBD
Location: TBD

S7142 - Multi-GPU Programming Models

Jiri Kraus Senior Devtech Compute, NVIDIA
Highly-Rated Speaker
Jiri Kraus is a senior developer on the European DevTech team at NVIDIA. He focuses on multi-GPU programming models and NVIDIA's collaboration with the Juelich Supercomputing Centre.
Sreeram Potluri Senior Software Engineer, NVIDIA CORP
Sreeram Potluri is a senior software engineer at NVIDIA. His work focuses on parallel programming models and communication runtimes for GPU clusters.

Do you need to compute on larger problems, or faster, than a single GPU allows? Learn how to scale your application to multiple GPUs, how to use the different available multi-GPU programming models, and what their individual advantages are. All programming models will be introduced using the same example, applying a domain decomposition strategy.

Level: Intermediate
Type: Talk
Tags: Programming Languages; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7143 - Anomaly Detection for Network Intrusions Using Deep Learning

Adam Gibson CTO, Skymind
Adam Gibson is the cofounder and CTO of Skymind as well as the creator of Deeplearning4j, the first commercial-grade deep learning library for the JVM. He helps companies deploy deep learning to production.

We'll describe how deep learning can be applied to detect anomalies, such as network intrusions, in a production environment. In part one of the talk, we'll build an end-to-end data pipeline using Hadoop for storage, Streamsets for data flow, Spark for distributed GPUs, and Deeplearning4j for anomaly detection. In part two, we'll showcase a demo environment that demonstrates how a deep net uncovers anomalies. This visualization will illustrate how system administrators can view malicious behavior and prioritize efforts to stop attacks. It's assumed that registrants are familiar with popular big data frameworks on the JVM.

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Deep Learning and AI; AI Startup; Federal
Industry Segments: Cloud Services; Software

Day: TBD
Time: TBD
Location: TBD

S7148 - Using Kokkos for Performant Cross-Platform Acceleration of Liquid Rocket Simulations

Michael Carilli Computational Scientist, ERC Incorporated
Dr. Michael Carilli works at the Air Force Research Laboratory. His role is code optimization for modern parallel architectures including GPUs, multicore vector CPUs, and Intel Xeon Phis.

We'll demonstrate acceleration of a large, preexisting Fortran fluid dynamics solver using Kokkos, a C++ library that enables a single codebase to achieve high performance on multiple parallel architectures, including NVIDIA GPUs. We'll describe the complete process: identifying performance-critical physics subroutines, porting and optimizing these routines, integrating Kokkos C++ with the main Fortran code in a minimally invasive way, and tuning cluster-level performance. We'll compare the performance achieved when Kokkos uses NVIDIA Tesla K40 GPUs, Knights Corner Xeon Phis, and Xeon CPUs. We'll also present some GPU-specific optimizations. For "trivially parallel" physics calculations, assigning one NVIDIA CUDA thread to each grid point may not be ideal. If a small team works cooperatively on each grid point, performance can improve due to the larger amount of effective cache available to each team.

Level: Intermediate
Type: Talk
Tags: Computational Fluid Dynamics; Tools and Libraries; Computer Aided Engineering

Day: TBD
Time: TBD
Location: TBD

S7149 - 3D DeepObject for Precision 3D Mapping

Bingcai Zhang Tech Fellow, BAE Systems
Bingcai Zhang is a tech fellow with expertise in computer vision and artificial intelligence. He invented DeepObject with Simplicity Learning and Singular Classification, achieving near human-level accuracy.

3D DeepObject achieves mapping-level positional accuracy. In the geospatial intelligence space, positional accuracy is as important as precision and recall. Unfortunately, convolutional networks in deep learning are invariant to translation. In other words, the positional accuracy from deep learning object detection is inherently poor. Combining deep learning and 3D model fitting, our 3D DeepObject has the best of both worlds. Deep learning can detect an object (a bounding box) with close to human-level accuracy, while 3D model fitting can achieve pixel-level positional accuracy. The output (bounding boxes) from deep learning is the input for 3D model fitting. A bounding box from deep learning can significantly reduce the search space for 3D model fitting. Our latest test indicates that 3D DeepObject can achieve much higher positional accuracy than deep learning or 3D model fitting alone can achieve.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Computer Vision and Machine Vision; Federal

Day: TBD
Time: TBD
Location: TBD

S7150 - Accelerating cuBLAS/cuDNN Using Input-Aware Auto-Tuning: The ISAAC Library

Philippe Tillet Ph.D. Candidate, Harvard University
Philippe Tillet is a Ph.D. candidate at Harvard University. His research focuses on the interaction between machine learning and compiler construction.

This session describes the design and implementation of the ISAAC library, an open-source supplement to cuBLAS that provides improved performance for rectangular matrices. Attendees will learn about input-aware auto-tuning, a technique that relies on stochastic optimization and machine learning to automatically derive efficient input- and hardware-portable kernels. Benchmarks will be presented for GEMM in the context of DeepSpeech, PCA, and SVD, showing up to 2x improvements over cuBLAS on a Pascal Titan X. Transparent compatibility with OpenCL will be shown, leading to similar improvements over clBLAS on both the AMD Fiji and Intel GEN8 architectures.
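
In its simplest form, the idea behind input-aware tuning is easy to state independent of ISAAC's machinery: try a set of candidate kernels on the concrete input shape at hand and remember the winner (ISAAC goes much further, using a learned performance model rather than exhaustive benchmarking). A toy sketch, with NumPy variants standing in for differently tuned GPU kernels, is given below.

```python
import time
import numpy as np

CANDIDATES = {                       # stand-ins for differently tuned GEMM kernels
    "dot":    lambda A, B: A.dot(B),
    "matmul": lambda A, B: np.matmul(A, B),
    "einsum": lambda A, B: np.einsum("ik,kj->ij", A, B),
}
_best = {}                           # cache: input shape -> fastest candidate

def tuned_gemm(A, B, repeats=3):
    shape = (A.shape[0], A.shape[1], B.shape[1])
    if shape not in _best:           # first time we see this shape: benchmark
        timings = {}
        for name, fn in CANDIDATES.items():
            t0 = time.perf_counter()
            for _ in range(repeats):
                fn(A, B)
            timings[name] = time.perf_counter() - t0
        _best[shape] = min(timings, key=timings.get)
    return CANDIDATES[_best[shape]](A, B)

A = np.random.rand(4096, 64)         # tall-skinny, the kind of shape ISAAC targets
B = np.random.rand(64, 4096)
C = tuned_gemm(A, B)
print(_best)
```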

Level: Intermediate
Type: Talk
Tags: Performance Optimization; Deep Learning and AI; Tools and Libraries

Day: TBD
Time: TBD
Location: TBD

S7153 - Efficient Observations Forecast for the World's Biggest Eye Using DGX-1

Damien Gratadour Associate Professor, Université Paris Diderot & Observatoire de Paris
Damien Gratadour has been an associate professor at Universite Paris Diderot and research scientist at LESIA, Observatoire de Paris since 2008. Damien holds an M.S. in theoretical physics and a Ph.D. in observational astronomy from Universite Paris Diderot. In the past, Damien has been responsible for the last stages of commissioning of the LGS upgrade to the Altair AO system on the Gemini North Telescope in Hawaii (2006). He spent two years as an AO scientist, with the responsibility of instrument scientist for GeMS, the Gemini MCAO System, a $15 million facility, participating in the various acceptance tests and integration of its sub-systems and first stages of technical tests of the full instrument and most notably the DSP-based RTC. At Observatoire de Paris, Damien is concentrating on high-performance numerical techniques for astronomy for modeling, signal processing, and instrumentation and on the development of observational programs.
Hatem Ltaief Senior Research Scientist, KAUST
Highly-Rated Speaker
Hatem Ltaief is a senior research scientist in the Extreme Computing Research Center at KAUST, where he also advises several students in their M.S. and Ph.D. research. Hatem's research interests include parallel numerical algorithms, fault tolerant algorithms, parallel programming models, and performance optimizations for multicore architectures and hardware accelerators. His current research collaborators include Aramco, Total, Observatoire de Paris, Cray, NVIDIA, and Intel. Hatem received his engineering degree from Polytech Lyon at the University of Claude Bernard Lyon I, France, an M.S. in applied mathematics at the University of Houston, and a Ph.D. in computer science from the University of Houston. From 2008 to 2010, he was a research scientist in the Innovative Computing Laboratory in the Department of Electrical Engineering and Computer Science at the University of Tennessee, Knoxville.

Have you heard about the largest ground-based telescope ever built? Are you interested in the newest NVIDIA DGX-1 hardware accelerator? Come and learn how the DGX-1 architecture gives the computational astronomy community a dramatic leap forward in designing major, multimillion-dollar optical instruments for the European Extremely Large Telescope. Starting from the mathematical model up to the high-performance implementation on distributed-memory systems with hardware accelerators, we'll explain how the resulting matrix computations, associated with an efficient task-based programming model, help design the next generation of telescope instruments.

Level: Intermediate
Type: Talk
Tags: Astronomy and Astrophysics; Tools and Libraries; Federal
Industry Segments: Higher Education / Research; Government / National Labs

Day: TBD
Time: TBD
Location: TBD

S7155 - Optimized Inter-GPU Collective Operations with NCCL

Sylvain Jeaugey Senior Communication and Computing Engineer, NVIDIA
Sylvain Jeaugey has been optimizing HPC communication libraries for 10+ years with a strong focus on collective operations. Before joining NVIDIA, Sylvain worked for Bull, optimizing MPI libraries to work on 100,000+ cores, then designing the BXI high-speed network for HPC. He is the main developer of the NCCL library.

We'll present the functionalities of NCCL (pronounced "Nickel"), a standalone library of standard collective communication routines, such as all-gather, reduce, broadcast, etc., that have been optimized to achieve high bandwidth over GPU topologies. NCCL can be used in either single- or multi-process (for example, MPI) applications.
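
To illustrate what a collective such as all-reduce computes, and why a ring schedule keeps every link busy, here is a tiny NumPy simulation of a ring all-reduce across four "GPUs"; it is purely didactic and has nothing to do with NCCL's actual implementation.

```python
import numpy as np

def ring_allreduce(per_rank_arrays):
    """Simulate ring all-reduce: each 'GPU' holds one array; all end with the sum."""
    n = len(per_rank_arrays)                            # number of simulated GPUs
    data = [np.array_split(x.astype(float), n) for x in per_rank_arrays]

    # phase 1: reduce-scatter -- after n-1 steps, rank r owns the full sum of chunk (r+1) % n
    for step in range(n - 1):
        for r in range(n):
            idx = (r - step) % n                        # chunk rank r forwards this step
            dst = (r + 1) % n
            data[dst][idx] = data[dst][idx] + data[r][idx]

    # phase 2: all-gather -- circulate each fully reduced chunk to every rank
    for step in range(n - 1):
        for r in range(n):
            idx = (r + 1 - step) % n
            data[(r + 1) % n][idx] = data[r][idx]

    return [np.concatenate(d) for d in data]

arrays = [np.full(8, rank, dtype=float) for rank in range(4)]   # rank r holds all-r vector
out = ring_allreduce(arrays)
print(out[0])     # every rank ends with the elementwise sum: all entries 0+1+2+3 = 6
```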

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7156 - GPU-Enabled Comparative Genomics Calculations on Leadership-Class HPC Systems

Wayne Joubert Computational Scientist, Oak Ridge National Laboratory
Wayne Joubert is a computational scientist in the Scientific Computing Group at Oak Ridge National Laboratory. He specializes in the development, deployment, and optimization of mathematical algorithms and software for leadership-class high performance computing systems. He participated in the ORNL Titan system CAAR application readiness effort, in which he developed the GPU-enabled Sn sweep component of the Denovo radiation transport code. He currently participates in application and library readiness efforts for the upcoming 200-petaflop ORNL Summit system. He has previously participated in multiple computational science efforts, including the R&D 100 award-winning FALCON Reservoir Simulation Project and the Gordon Bell award finalist project, "The In-Silico Lab-On-A-Chip: Petascale And High-Throughput Simulations Of Microfluidics At Cell Resolution."

We'll describe recent work to map comparative genomics algorithms to GPU-accelerated leadership-class systems. The explosion in availability of genomic data holds promise for enabling determination of the genetic causes of phenotypic characteristics, with applications to problems such as the discovery of the genetic roots of diseases. The growing sizes of these datasets and the quadratic and cubic scaling properties of the algorithms necessitate use of leadership-scale accelerated computing. We'll discuss the mapping of two-way and three-way algorithms for comparative genomics calculations to large-scale GPU-accelerated systems. Focusing primarily on the Proportional Similarity metric and the Custom Correlation Coefficient, we'll discuss issues of optimal mapping of the algorithms to GPUs, eliminating redundant calculations due to symmetries, and efficient mapping to many-node parallel systems. We'll also present results scaled to thousands of GPUs on the ORNL Titan system.

Level: Intermediate
Type: Talk
Tags: Computational Biology; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7157 - Deep Racing: Golden Age of New Performance and Autonomous Driving

John Waraniak Vice President of Vehicle Technology, Specialty Equipment Market Association, SEMA
John Waraniak has been vice president of Vehicle Technology at the Specialty Equipment Market Association (SEMA) since 2006. In this role, John helps leading automotive aftermarket companies understand emerging vehicle technology challenges, develop solutions, and capitalize on new revenue and business opportunities – whether it's vehicle dynamics, vehicle emissions, green performance, electronics, or connected and autonomous driving. John earned a bachelor's degree in mechanical engineering from the University of Michigan. He has a master's degree in mechanical and industrial engineering from the University of Illinois and a master's degree in engineering management from West Coast University. He also graduated from the California Institute of Technology's Executive Engineering Management Program.
Bruce Falls Director of Engineering, AVL
Bruce Falls is director of Engineering at AVL, which he first joined in 2007 as director of the AVL California Technology Center. Bruce has 30 years of experience in automotive engineering, mostly in powertrain development and vehicle systems integration. He has concentrated on the areas of base engine development, electronic controls, emissions development, alternative fuels applications, and vehicle electrification.

Learn directly from a grid of frontline leaders about the latest self-driving, self-racing, deep learning, and machine learning technologies being used in automotive racing and production. Next-gen racing technologies, vehicle electrification, and autonomous systems are critical to the continued relevance of and innovation within the automotive industry for traditional players, as well as new players focused on artificial intelligence and machine learning. Performance vehicles account for 10 percent of sales yet generate 90 percent of the media coverage and are critical to brand and technology leadership. AI and ML technologies and systems are smarter, faster, and more connected as software takes a lead role in defining the future of the auto industry and motor racing. Disruptive self-driving technologies are changing how cars are designed, developed, customized, sold, serviced, shared, and owned.

Level: All
Type: Talk
Tags: Self-Driving Cars; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7160 - NVIDIA GPU Support for Apache Mesos and DC/OS

Kevin Klues Senior Software Engineer, Mesosphere
Kevin Klues is a senior software engineer at Mesosphere working on the Mesos core team. Prior to joining Mesosphere, he worked at Google on an experimental operating system for data centers called Akaros. He and a few others founded the Akaros project while working on their Ph.D.s at UC Berkeley. In a past life, Kevin was a lead developer of the TinyOS project, working at Stanford, the Technical University of Berlin, and the CSIRO in Australia. When not working, you can usually find Kevin on a snowboard or up in the mountains in some capacity or another.

DC/OS is a distributed operating system based on the Apache Mesos distributed systems kernel. DC/OS can be used to automate resource management, schedule process placement, facilitate inter-process communication, and simplify the installation and management of distributed services. In the past, Mesos was famous for helping Twitter to eradicate the "Fail Whale." More recently, DC/OS has been adopted in production by high-profile companies such as Autodesk, Esri, Time Warner Cable, Verizon, and Wellframe. Until recently, however, Mesos did not support GPUs as an allocatable resource. Users that wished to use GPUs had to have separate GPU clusters outside of their primary Mesos or DC/OS installation. By adding first-class support for GPUs, companies can now leverage the power of the GPU in the same shared cluster as their other workloads. We'll introduce how we added support for GPUs to Mesos, including a demo of Tensorflow jobs running on a standard DC/OS installation.

Level: Intermediate
Type: Talk
Tags: Data Center and Cloud Computing; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7161 - OpenStack + AWS, HPC (aaS), and GPUs - a Pragmatic Guide

Martijn de Vries Chief Technology Officer, Bright Computing
Martijn de Vries serves as Chief Technology Officer of Bright Computing, where he is responsible for Bright's software development. Prior to Bright Computing, Martijn was head of software development at ClusterVision. Martijn taught distributed programming in the Computer Science department at Hanze Polytechnic in the Netherlands and programmed for the New York-based internet startup Shop.com.

Why do HPC in a cloud? How do you do HPC (aaS), with GPU passthrough, in OpenStack? How do you create a full GPU HPC cluster, from scratch, on demand, in under five minutes, equipped with NVIDIA's DCGM, the CUDA environment, and deep learning libraries/frameworks? Hybrid clouds with GPUs spanning OpenStack and AWS? How do you easily and automatically move HPC user data and workloads between the private and public cloud? How do you dynamically scale a virtualized HPC cluster, both horizontally (within the private cloud) and vertically (to the public cloud)? We'll answer these questions during a deep dive into the world of HPC on top of OpenStack and AWS. We'll discuss the many ways OpenStack private clouds can be used for bursting HPC workloads, HPC-as-a-service, XaaS (anything-as-a-service), and creating hybrid clouds composed of an on-prem private/community OpenStack cloud deployment that dynamically scales to public clouds like AWS. The session includes a demo.

Level: Beginner
Type: Talk
Tags: Data Center and Cloud Computing; HPC and Supercomputing; Federal

Day: TBD
Time: TBD
Location: TBD

S7165 - SurvivalNet: Predicting Patient Survival with Fully Convolutional and Residual Neural Networks

Florian Ettlinger Student, Technical University Munich
Florian Ettlinger is a student in the group of Professor Bjoern Menze at the Institute for Medical Engineering at the Technical University of Munich. In his research, he investigates the use of deep neural networks for medical computer vision problems, in particular for automatic tumor segmentation in medical images and for survival prediction. He studied physics with a focus on deep learning and machine learning. He also studied technology management at the Center for Digital Technology and Management of the Technical University of Munich. There he met the co-founders of his startup MealoMi, the world's first smart food app that helps diabetes patients track bread units based on automatic analysis of a photo food journal. Florian is an alumnus of the German National Academic Foundation, which awards scholarships to exceptionally talented students.

SurvivalNet predicts patient survival from diffusion-weighted magnetic resonance images using cascaded fully convolutional and 3D convolutional neural networks. We fully automatically predict the survival of HCC cancer patients from MRI using a two-step deep learning approach. First, we automatically detect and segment HCC from MRI images using fully convolutional neural networks. Second, we take the automatic tumor segmentation and train a 3D residual neural network to classify the tumor into long or short survival. We'll show experimentally that this approach outperforms prior work and methods. End to end, we achieve an accuracy of 68% for tumor malignancy classification based on expert annotations. All in all, we present an automatic survival prediction framework, which could also be applied to other tumor diseases and help in tumor treatment planning.

Level: All
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Computer Vision and Machine Vision; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7166 - Implementing High-Resolution Fluid Dynamics Solver in a Performance Portable Way

Pierre Kestener Research Engineer, CEA
Pierre Kestener is a research engineer at CEA (France's National Nuclear Energy Research Center) within the Maison de la Simulation, a research division for high-performance computing. His main interest is in helping domain scientists from astrophysics or computational fluid dynamics to design, develop, and optimize production-level code for large computing platforms. He is the lead developer of code RamsesGPU dedicated to magneto-hydrodynamics turbulent flow studies. He also recently started the design of CanoP, a new application platform for CFD with adaptive mesh refinement. As part of the CUDA research and teaching center program, he is involved in teaching GPU programming for students at master level as well as for researchers and engineers during trainings at France's PRACE advanced training center.

We'll report on the use of the Kokkos C++ library for designing new performance-portable implementations of the algorithms used in astrophysics computational fluid dynamics applications. Among other libraries with similar features, Kokkos, which is developed at Sandia National Laboratories, provides a very promising way of designing high-performance computing parallel applications with performance portability across multiple hardware architectures, code readability, and high productivity in mind. Many scientific domains use community codes developed by tens of developers, and such a high-level approach will help them use today's GPUs and the next generations productively. We'll illustrate several advantages of our new Kokkos-based implementation of the computationally intensive compressible magneto-hydrodynamics kernels involved in code RamsesGPU, and demonstrate its efficiency on a multi-GPU platform (NVIDIA Pascal P100).

Level: All
Type: Talk
Tags: Computational Fluid Dynamics; HPC and Supercomputing; Computer Aided Engineering

Day: TBD
Time: TBD
Location: TBD

S7168 - Leverage GPU Acceleration for your Program on Apache Spark

Kazuaki Ishizaki Research Staff Member, IBM Research - Tokyo
Kazuaki Ishizaki is a research staff member at IBM Research - Tokyo. Kazuaki has over 20 years of experience conducting research and development of dynamic compilers for Java and other languages. He has been working for IBM SDK, Java Technology Edition. His latest work is to enable GPU programming in Java language on IBM SDK, Java Version 8. He is an expert in compiler optimizations, runtime systems, and parallel processing. Recently, his research has focused on how system software can enable programmers to easily exploit GPUs to accelerate their workloads without increasing their burden in high-level languages such as Java and frameworks such as Apache Spark. He is a contributor to Apache Spark and is an ACM senior member.

Learn how to transparently and effectively leverage NVIDIA GPUs from your program on Apache Spark. We'll provide an overview of how common programmers can leverage GPUs on Apache Spark using our two approaches. In one, a ninja programmer provides an optimized GPU kernel to develop Spark libraries, implemented as a drop-in module for Spark; this allows common programmers to transparently use GPUs by calling these libraries. In the other, an enhanced Spark runtime transparently generates GPU code from a Spark program. Both approaches use the following two components for ease of leveraging GPUs and for achieving high performance. One component is a GPU driver for managing GPU devices, performing data copies, and launching GPU kernels. The other is a column-oriented data structure for Spark's data, which is well suited to GPUs. We'll also present experimental results on accelerating Spark applications with the two approaches using NVIDIA GPUs.
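
A rough sketch of the shape of the first approach in PySpark terms (illustrative only; the speakers' work targets the JVM/IBM SDK stack): a library routine processes each partition as one batch, which is the natural granularity at which a GPU kernel, rather than the NumPy stand-in used here, would be invoked.

```python
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gpu-offload-sketch").getOrCreate()
sc = spark.sparkContext

def offloaded_scale(partition):
    """Process a whole partition as one batch -- the natural granularity for a
    single host-to-device copy + kernel launch + device-to-host copy."""
    batch = np.fromiter(partition, dtype=np.float64)   # column-oriented, contiguous
    result = batch * 2.0 + 1.0                         # the "kernel" (NumPy stand-in)
    return result.tolist()

rdd = sc.parallelize(range(1_000_000), numSlices=8)
out = rdd.mapPartitions(offloaded_scale)
print(out.take(5))
```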

Level: All
Type: Talk
Tags: Accelerated Analytics

Day: TBD
Time: TBD
Location: TBD

S7169 - GA3C: A Hybrid CPU/GPU Implementation of A3C for Deep Reinforcement Learning

Iuri Frosio Senior Research Scientist, NVIDIA
Iuri Frosio is a senior research scientist at NVIDIA, which he joined in 2014. Iuri was a research fellow in the Computer Science Department at the University of Milan starting in 2003 and an assistant professor in the same department from 2006 to 2013. In the same period, he worked as a consultant for various companies in Italy and in the U.S. He received his Ph.D. in biomedical engineering from the Politecnico di Milano in 2006. Iuri is the author of 12 international patents, one Italian patent, 16 journal papers, two book chapters, and 40 papers in international conferences.

We'll introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We'll analyze its computational traits and concentrate on the critical aspects to leverage the GPU's computational power. We'll introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speed-up compared to a CPU implementation and is publicly available to other researchers.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Performance Optimization; Game Development

Day: TBD
Time: TBD
Location: TBD

S7170 - Bicycle Green Waves Powered by Deep Learning

Edward Zimmermann Principal Consultant, Nonmonotonic Networks / joint R&D with GESIG (Gesellschaft für Signalanlagen)
Edward Zimmermann has worn many hats throughout his career, including those of a mathematician, national economist, market/social researcher, computer scientist, and entrepreneur. A dominant focus of his R&D over the past 20+ years has been text retrieval, metadata, data mining, knowledge discovery, pattern recognition, natural language processing, and machine learning. Edward has been a part of many publicly funded projects, working with German, EU, and UN organizations, and has collaborated with a number of research institutes and national scientific agencies. In parallel with the rapid development of GPGPUs, he renewed his interest in computer vision, artificial neural networks, and reinforcement learning. Edward recently became involved as a consultant in a number of applications of deep learning. His involvement with traffic sprang from discussions with Michael Hartig, managing director of GESIG Germany, which is a partner on the project.

We'll explore using deep learning to improve urban traffic signaling. Bicycles (both self-powered and pedelecs) are the future of urban transport alongside (self-driving) electric cars, buses, and rail services. Green waves make cycling more efficient, attractive, and safer. Instead of fixed "green wave" timings or priorities, we'll present a work-in-progress system that learns to increase the flow of bicycle traffic while minimizing the impact on other traffic actors -- and in many use cases it also improves general traffic times. Using low-power, efficient SoCs -- the Tegra X1 -- the "smarts" are integrated into the traffic lights and provide V2I interfaces -- including to cyclists' mobile phones -- that announce signal changes and warn of pedestrians or cyclists. Dispensing with inductive loop, magnetometer, or radar-based sensors buried in the pavement makes the system inexpensive. We'll present initial results from pilot testing in a German city.

Level: All
Type: Talk
Tags: Computer Vision and Machine Vision; Deep Learning and AI; AI for In-Vehicle Applications

Day: TBD
Time: TBD
Location: TBD

S7172 - Autonomous Drone Navigation with Deep Learning

Nikolai Smolyanskiy Principal Software Engineer, NVIDIA
Nikolai Smolyanskiy is a principal software engineer at NVIDIA on the computer vision team. Prior to NVIDIA, he worked at Microsoft on various projects, including drones at Microsoft Research, SLAM systems for HoloLens, face tracking for Kinect, and machine learning for natural language processing and search. Nikolai received his M.S. in applied mathematics and pursued his Ph.D. in minimax optimal control.
Alexey Kamenev Senior Deep Learning and Computer Vision Engineer, NVIDIA
Alexey Kamenev is a senior deep learning and computer vision engineer at NVIDIA on the computer vision team. Prior to NVIDIA, Alexey worked at Microsoft on various projects, including Microsoft Research (CNTK and deep learning), Azure machine learning (Machine Learning Algorithms team), and Bing (Relevance team). He has an M.S. in applied mathematics.
Jeffrey Smith Senior Computer Vision Software Engineer, NVIDIA
Jeff Smith is a senior computer vision engineer at NVIDIA on the computer vision team. Prior to NVIDIA, Jeff worked at Microsoft on the HoloLens project and at Industrial Light & Magic as a simulation R&D engineer, where his work won an Academy Award. He has a Ph.D. in robotics.

We'll present an autonomous drone piloted by a deep neural network (DNN) that can navigate through a forest by following trails and can avoid obstacles. The DNN takes video frames from the onboard drone camera as its input and computes high-level control commands as its output. The control commands are sent to the drone's low-level autopilot for execution. Our DNN runs onboard an NVIDIA® Tegra® TX1 in real time. The drone uses the open source PX4 flight stack for low-level control and ROS for its runtime. We'll present the DNN's architecture, describe how we train it and run it as a ROS node, demo flight videos, and show some qualitative analysis of the autonomous flights.
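
For readers unfamiliar with the ROS side of such a setup, here is a minimal, generic sketch of a C++ ROS node that subscribes to camera frames and publishes velocity commands; the topic names, message types, and the trivial "inference" are placeholders and not the speakers' actual system.

    // Generic ROS node skeleton: subscribe to camera frames, publish
    // velocity commands. The callback body stands in for DNN inference.
    #include <ros/ros.h>
    #include <sensor_msgs/Image.h>
    #include <geometry_msgs/Twist.h>

    ros::Publisher cmd_pub;

    void imageCallback(const sensor_msgs::ImageConstPtr& msg) {
      // Placeholder: a real system would run DNN inference on the frame
      // here and map its output to steering/velocity commands.
      geometry_msgs::Twist cmd;
      cmd.linear.x = 1.0;   // forward velocity (placeholder)
      cmd.angular.z = 0.0;  // yaw rate from the (hypothetical) DNN output
      cmd_pub.publish(cmd);
    }

    int main(int argc, char** argv) {
      ros::init(argc, argv, "trail_dnn_node");
      ros::NodeHandle nh;
      cmd_pub = nh.advertise<geometry_msgs::Twist>("cmd_vel", 1);
      ros::Subscriber sub = nh.subscribe("camera/image", 1, imageCallback);
      ros::spin();  // process incoming frames until shutdown
      return 0;
    }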

Level: All
Type: Talk
Tags: Intelligent Machines and IoT; Computer Vision and Machine Vision; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7173 - Concept to Production: An Architectural Design Firm's Jump to Virtualization

Jimmy Rotella Design Application Specialist, CannonDesign
Jimmy Rotella is a design application specialist at CannonDesign. Jimmy's background in technology, architecture, and education uniquely positions him to help designers throughout the AEC industry build and realize their digital designs using cutting-edge technology.
Andrew Schilling Director of Information Technology, CannonDesign
Andrew Schilling is the director of information technology at CannonDesign, with the central charge of developing and executing CannonDesign's information technology strategies, advancing tools, workflows, and emerging technologies that enable design teams to deliver outstanding solutions for clients.

About a year ago, CannonDesign embarked on a journey to relocate and upgrade its entire data center, implementing NVIDIA GRID technology, to allow us to collaborate on architectural and engineering design projects throughout all of our offices worldwide. Now we're using our graphics-intensive applications on virtual desktops in our new data center. The design of the infrastructure and implementation of the migration was not without its hurdles, but we're here to share our journey. We'll give some insight into our designs for the virtual desktops, how the machines performed compared to our initial benchmarks, lessons learned, recommendations of tweaks we made, and a glimpse into some of our future plans. If you're planning a virtual desktop infrastructure, interested in creating a virtual environment designed around graphics-intensive applications, or are looking to upgrade and tweak your current environment, come learn from our journey.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; AEC Industries

Day: TBD
Time: TBD
Location: TBD

S7174 - An Architectural Design Firm's Journey Through Virtual GPU Technology for Global Collaboration

Jimmy Rotella Design Application Specialist, CannonDesign
Jimmy Rotella has a background in technology, architecture, and education, uniquely positioning him to help designers throughout the AEC industry build and realize their digital designs using cutting-edge technology.
Andrew Schilling Director of Information Technology, CannonDesign
Andrew Schilling's central charge is to develop and execute CannonDesign's information technology strategies, advancing tools, workflows, and emerging technologies that enable design teams to deliver outstanding solutions for clients.

Learn the benefits that virtualization provides for an architecture and engineering design firm, along with the journey through the advancements in virtualization technology it took to finally meet the graphics-intensive needs of our design software. We'll share our experiences in how virtualization allows a large company, with over 15 offices and 1,000 people worldwide, to collaborate and work as a single firm. We'll show some cost comparisons with virtualization, along with their management benefits and requirements. We'll also look at the methods we used to set and test metrics specific to our requirements, and follow the results of those metrics through the changes in graphics virtualization technology.

Level: All
Type: Talk
Tags: AEC Industries; Data Center and Cloud Computing; Graphics Virtualization
Industry Segments: Architecture / Engineering / Construction

Day: TBD
Time: TBD
Location: TBD

S7175 - Exploratory Visualization of Petascale Particle Data in NVIDIA DGX-1

Benjamin Hernandez Computer Scientist, Oak Ridge National Laboratory
Benjamin Hernandez is a computer scientist in the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory. His research interests are in the intersection of crowd simulations, scientific visualization, interactive computer graphics, and human computer interaction using HPC systems.

Learn to leverage the visualization capabilities of the NVIDIA® DGX-1™ system to visualize particle data. We'll cover techniques suitable for exploratory visualization such as parallel dataset reading and reduction on demand with ADIOS I/O library, GPU-based optimization techniques for particle rendering such as radar view frustum culling, occlusion culling, texture-less point sprites, and OpenGL near zero driver overhead methods. We'll also include implementation details to take advantage of the eight NVIDIA Pascal™ GPUs included in the NVIDIA DGX-1.

Level: All
Type: Talk
Tags: In-Situ and Scientific Visualization; Real-Time Graphics; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7176 - Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Networks

Jinwei Gu Senior Research Scientist, NVIDIA
Jinwei Gu is a senior research scientist in the Mobile Visual Computing Research group at NVIDIA. Previously, he was a senior researcher in the America Media Lab in Futurewei Technologies. From 2010 to 2013, he was an assistant professor at the Munsell Color Science Laboratory in the Center for Imaging Science at Rochester Institute of Technology. Jinwei received his Ph.D. from Columbia University in 2010, and his bachelor's and master's degrees from Tsinghua University, China in 2002 and 2005. His research interests are computer vision, machine learning, and computational imaging. His current research focuses on deep learning, 3D computer vision, visual SLAM, augmented reality, and multi-camera systems for immersive media.

We propose to use recurrent neural networks for analyzing facial properties from videos. Facial analysis from consecutive video frames, including head pose estimation and facial landmark localization, is key for many applications such as in-car driver monitoring, facial animation capture, and human-computer interaction. Compared with the traditional Bayesian filtering methods for facial tracking, we show RNNs are a more generic, end-to-end approach for joint estimation and tracking. With the proposed RNN method, we achieved state-of-the-art performance for head pose estimation and facial landmark localization on benchmark datasets.

Level: All
Type: Talk
Tags: Media and Entertainment; Deep Learning and AI; AI for In-Vehicle Applications; Computer Vision and Machine Vision

Day: TBD
Time: TBD
Location: TBD

S7177 - Using Containers for GPU-Accelerated Applications

Felix Abecassis Systems Software Engineer, NVIDIA
Felix Abecassis is a systems software engineer at NVIDIA working on making GPU applications easier to deploy and manage in data centers. He focuses on supporting GPU-accelerated machine learning frameworks. He holds an M.S. in computer science from the French engineering school EPITA.
Jonathan Calmels Systems Software Engineer, NVIDIA
Jonathan Calmels is a systems software engineer at NVIDIA working primarily on GPU data center software and hyperscale solutions for deep learning. Jonathan holds an M.S. in computer science and engineering.

We'll showcase how to leverage GPUs inside Linux containers using NVIDIA-docker. Containerizing GPU applications provides multiple benefits: 1) Developers can have reproducible builds and deploy their software seamlessly. 2) GPU applications can run across heterogeneous OS/driver/toolkit environments with no performance overhead. 3) GPU devices can be isolated and assigned to different users or different tasks. We'll go through the particularities of GPU containers and demonstrate how to use container images, from the most basic NVIDIA(R) CUDA(R) application to the most complex deep learning frameworks. We may also present other container technologies besides Docker/NVIDIA-docker, for instance the Singularity project from Lawrence Berkeley National Laboratory, if not already covered by other speakers.

Level: Intermediate
Type: Talk
Tags: Data Center and Cloud Computing; HPC and Supercomputing; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7178 - Bidirectional Recurrent Convolutional Networks and Their Applications to Video Super-Resolution

Yan Huang Research Assistant, Institute of Automation, Chinese Academy of Sciences
Yan Huang works as a research assistant in the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, in Beijing. Yan's research interests include machine learning and pattern recognition. He received his B.S. from the University of Electronic Science and Technology of China in 2012. He has published papers in the leading international journals and conferences such as IEEE Transactions on Multimedia, ICCV, and NIPS.

We'll discuss a fully convolutional version of recurrent neural networks, namely bidirectional recurrent convolutional networks, which can greatly reduce the number of learned parameters from millions to several hundred. We'll demonstrate their effectiveness by achieving significant performance and running-time improvements on the task of video super-resolution. Using GPUs further accelerates processing by 20 times.

Level: All
Type: Talk
Tags: Deep Learning and AI; Video and Image Processing

Day: TBD
Time: TBD
Location: TBD

S7181 - Evaluating Windows 10: Learn Why Your Users Need GPU Acceleration

Jason Kyungho Lee Sr. Performance Engineer, NVIDIA GRID, NVIDIA
Jason K Lee is part of the NVIDIA GRID Performance Engineering team, responsible for testing and evaluating the NVIDIA GRID platform, including performance investigation and benchmarking, and developing automation and example code. He is a former solution architect, software engineer, and developer.
Hari Sivaraman Staff Engineer, VMWare
Hari Sivaraman is a staff engineer at VMware, working on virtual desktop infrastructure performance. Hari has broad experience in computer graphics and performance analysis. He has worked at HP and NVIDIA. Hari holds an M.S. in computer science from the Indian Institute of Science and a Ph.D. in computer science from Washington State University.
Lan Vu Performance Engineer, VMWare
Lan Vu is a performance engineer at VMware, focusing on optimizing performance and scalability of virtual desktop infrastructure solutions, including 3D graphics, Horizon View, Horizon Air/DaaS, and more. Previously, Lan worked at Parallel Distributed System Labs, University of Colorado, Denver, with a research focus on high-performance methods in data mining. She holds a Ph.D. in computer science and information systems from the University of Colorado, Denver.
Uday Kurkure Staff Engineer, VMware
Uday Kurkure is a staff engineer at VMware, working on virtual desktop infrastructure performance. Uday has broad experience in computer graphics, ASIC design, and compilers. He has worked at Adobe Systems, Transmeta, Synopsys, and MIPS Computers. Uday holds an M.S. in computer science from Stanford and a B.S. in electronics and telecommunications from the Indian Institute of Technology.

Learn why EVERY remote user should have GPU resources available to them. We'll discuss the advantages end users experience once their virtual desktops/sessions have GPU capabilities. Recent data from the NVIDIA GRID Performance Engineering team shows the significant impact that GPUs like the Tesla M10 have on knowledge workers. The data includes real user testing and objective measurements such as latency, bandwidth, and CPU utilization, which all play a significant role in the overall user experience.

Level: All
Type: Talk
Tags: Graphics Virtualization; Data Center and Cloud Computing

Day: TBD
Time: TBD
Location: TBD

S7182 - Sparse Persistent RNN

Feiwen Zhu Senior GPU Architect, NVIDIA
Feiwen Zhu is a senior GPU architect on the compute team at NVIDIA, where he has worked for four years. Feiwen has experience in PhysX GRB, medical imaging, and speech. His current focus is on deep learning and speech recognition.

Persistent RNNs can achieve huge speedups compared to GEMM-based RNN implementations. If an RNN is sparse, we can exploit the sparsity to skip zero elements, so that we can: (1) accelerate the RNN further, (2) fit a large sparse network onto a small chip, or (3) process more networks on a big chip. The challenges of decoding with sparse persistent RNNs are the cost of shared memory bank conflicts and global synchronization overhead. We'll introduce a sparse persistent RNN implementation that includes three optimizations: LDS.128 (128-bit shared memory load) optimization and layout optimization to reduce bank conflicts, and a Lamport barrier to eliminate global synchronization overhead. The final results show that for the 10% sparsity case, the sparse persistent RNN achieves a >2X speedup over the (dense) persistent RNN on GM200 while using only 18 of its 24 SMs; for the 23% sparsity case, it achieves 2.2X performance on GP100 while using only 36 of its 56 SMs.

Level: Beginner
Type: Talk
Tags: Deep Learning and AI; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7184 - How to Bring Engineering Datasets on Head-Mounted Displays

Andreas Mank Team Leader Software Development, ESI Group
Andreas Mank leads the visualization team in the Immersive Experience business unit at ESI, where he's responsible for driving advances in visualization technologies and delivering state-of-the-art, high-performance immersive engineering visualization, as well as high-quality rendering with ESI software products. Andreas has studied media computer science at the University of Applied Sciences in Wedel, Germany. He has over 10 years of experience in virtual reality-related software development. In recent years, he worked as a team leader in research and development.
Ingo Esser Senior Developer Technology Engineer , NVIDIA
Ingo Esser is a senior developer technology engineer in NVIDIA's Professional Solutions Group, where he works to help ISVs improve their rendering algorithms. These ISVs mostly work in the automotive and the oil and gas domains, where either rendering complex surfaces or visualizing large datasets is an issue. He has a diploma in computer science from the Computer Graphics Group at RWTH Aachen University, Germany.

Hear visualization experts explain why people in professional visualization, in particular virtual engineering, are great candidates to unleash the full potential of HMDs, and how close today's technology brings application developers to the finish line of exploring massive datasets with HMDs. Learn about new hardware (NVIDIA Pascal™-powered NVIDIA Quadro® GPUs), extensions, APIs (NVIDIA VRWorks™: NVIDIA SLI® VR, Single Pass Stereo), techniques (GPU culling), and the next steps that enable ESI to create amazing VR experiences even with high node and triangle counts.

Level: Intermediate
Type: Talk
Tags: Virtual Reality and Augmented Reality; Manufacturing Industries; Real-Time Graphics

Day: TBD
Time: TBD
Location: TBD

S7187 - Artificial Reality? Deep Learning With Synthetic Data From Driving Simulations

Daniel Wiesenhütter Software Engineer, VIRES GmbH
Daniel Wiesenhutter is a software developer at VIRES, a Bavaria-based software company for driving simulations. At VIRES, he first worked on traffic simulation, later transferring to the rendering team and working on projects such as rendering road surfaces and headlight simulation. More recently, he moved to Austria and opened a new branch for VIRES, supervising and working on the improvement of the rendering engine and traffic simulation. Daniel achieved his M.S. in computer science at the University of Applied Sciences in Munich. Under the supervision of Alfred Nischwitz, he worked on clustering lights to accelerate shadow computation.
Bernhard Bieder Software Engineer, VIRES GmbH
Bernhard Bieder is a software developer at VIRES, a Bavaria-based software company for driving simulations. At VIRES, he works on traffic simulation as well as on Pegasus, a research project for autonomous driving, and OpenCRG, an open file format and API for managing road surfaces at the microscopic level. Before VIRES, Bernhard worked at Sproing, a Viennese games developer working mostly on online multiplayer games. Bernhard received his M.Sc. in engineering from the University of Applied Sciences FH Technikum Wien.

Learn how to boost your deep learning training process by utilizing features of a driving simulation. Besides a customizable source of video camera input, enhanced driving simulations can also provide information from non-visual sensors like lidar, radar, or ultrasound simultaneously. Train deep learning algorithms with visual, non-visual, or intermediate data like point clouds, bounding boxes, or object lists. Instead of labeling real videos by hand, use the information of the simulation to feedback and correct the results of your neural network. Run your simulation in faster than real time for distributed headless simulations or trigger every frame of the simulation to capture data for further processing. Embed your algorithms within the simulation (software in the loop) and test your AI in unusual situations, which are too risky in reality. Artificial reality? Not perfect, but a perfect complement in developing AI algorithms for autonomous driving.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7190 - Embedded Bayesian Perception and V2X Communications for Autonomous Driving

Christian Laugier First Class Research Director , Inria Grenoble
Christian Laugier is first-class research director at Inria. His research interests mainly lie in the areas of autonomous vehicles, embedded perception and decision-making, and Bayesian reasoning. Christian is a member of several IEEE International Scientific Committees and has co-organized numerous workshops and major IEEE conferences in the field of robotics such as IROS, IV, FSR, or ARSO. He also co-edited several books and special issues in high-impact robotics or ITS journals, such as IJRR, JFR, RAM, T-ITS, and ITSM. Christian recently brought recognized scientific contributions and patented innovations to the field of Bayesian perception and decision-making for autonomous robots and intelligent vehicles. He is an IROS Fellow and the recipient of several IEEE and conferences awards in the fields of robotics and intelligent vehicles, including the IEEE/RSJ Harashima award 2012. Christian has also co-founded four startups and is a scientific advisor for Probayes SA.

We'll present technologies developed by the Inria Chroma team to robustly perceive and interpret dynamic environments using Bayesian systems (such as BOF, HSBOF, and CMCDOT) relying on embedded sensor input and V2X communications (vehicle to vehicle and vehicle to infrastructure). These technologies were initially developed in collaboration with industrial partners such as Toyota, Renault, and Probayes SA, with the objective of extending the capabilities of current advanced driving assistance systems (including autonomous driving functionalities). The technology is also currently being transferred to industrial mobile robots. We'll show how heterogeneous sensors can be used efficiently, merged, and filtered in real time into probabilistic grids, and how collision risks can be computed in an optimized way on embedded GPUs such as the NVIDIA Jetson TX1. The perception of the environment can also be distributed between connected cars and perception units using V2X protocols.

Level: Intermediate
Type: Talk
Tags: Self-Driving Cars; Algorithms; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7191 - New Vulkan Rendering Techniques

Christoph Kubisch Senior Developer Technology Engineer, NVIDIA
Highly-Rated Speaker
Christoph Kubisch is a senior developer technology engineer at NVIDIA, where he focuses on advanced Vulkan and OpenGL real-time rendering techniques suitable for CAD/DCC and scientific applications. He collaborates with external partners and NVIDIA's internal teams to optimize current and future rendering algorithms. Previously, Christoph was a researcher on hardware-accelerated visualization techniques for medical datasets at the Otto-von-Guericke University of Magdeburg. He has also worked as a technical artist creating game art, technology, and tools.

We'll present VK_NVX_device_generated_commands and related Vulkan extensions, which allow the GPU to generate the most frequent rendering commands on its own. This means that, for the first time, an open graphics API provides functionality to create compact command buffer streams on the device, avoiding the worst-case state setup of previous indirect drawing methods. In addition to details about the extensions, we'll present techniques for using them in typical application scenarios.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7192 - OmpSs+OpenACC: Multi-Target Task-Based Programming Model Exploiting OpenACC GPU Kernels

Guray Ozen Research Assistant , Barcelona Supercomputing Center
Guray Ozen works on compiler and runtime-based accelerator programming systems as a researcher on the programming models team of the Barcelona Supercomputing Center. The aim of his research is to investigate how to improve parallelisation of existing sequential applications by using static or dynamic compilation pipelines. His research also explores programming languages in order to exploit accelerators. He is also the creator of current implementation of the MACC compiler, which supports OpenMP 4.5 directives for automatic GPU offloading. MACC, yet another research compiler to investigate directive-based OpenMP accelerator model for GPUs, is built on top of the Mercurium source-to-source compiler framework and supports OmpSs and almost all directives of the OpenMP accelerator model for GPU.

Discover how the OmpSs programming model enables you to combine different programming models such as OpenACC, multi-threaded programming, CUDA, and OpenCL, while providing a single address space and directionality compiler directives. OmpSs is a flagship project of the Barcelona Supercomputing Center, as well as a forerunner of OpenMP. We'll present the advantages in terms of coding productivity and performance brought by our recent work integrating OpenACC kernels within the OmpSs programming model, as a step forward from our previous OmpSs + CUDA support. We'll also show how our runtime system can use the GPU and CPU together without any code modification.

Level: Intermediate
Type: Talk
Tags: Programming Languages; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7193 - Achieving Portable Performance for GTC-P with OpenACC on GPU, Multi-Core CPU, and Sunway Many-Core Processor

Stephen Wang GPU Specialist, Shanghai Jiao Tong University
Stephen Wang is a GPU specialist at the Center for HPC at Shanghai Jiao Tong University. Stephen's research focuses on using OpenACC to achieve portable performance for HPC applications on present-day supercomputers.

The Gyrokinetic Toroidal Code developed at Princeton (GTC-P) delivers highly scalable plasma turbulence simulations at extreme scales on world-leading supercomputers such as Tianhe-2 and Titan. The aim of this work is to achieve portable performance from a single source code for GTC-P. We developed the first OpenACC implementation for the GPU, multi-core CPU, and Sunway processor. The results show that the OpenACC version achieved nearly 90% of the performance of the NVIDIA® CUDA® version on the GPU and of the OpenMP version on the CPU, while the Sunway OpenACC version achieved a 2.5X speedup for the entire code. Our work demonstrates that OpenACC can deliver portable performance to complex real-science codes like GTC-P. In addition, we propose adding thread-id support to the OpenACC standard to avoid expensive atomic operations for reductions.
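
To give a flavor of the directive-based style the talk refers to, here is a minimal OpenACC fragment (illustrative only, not GTC-P source) with an offloaded loop and a reduction clause, the kind of construct the proposed thread-id support would complement:

    // Minimal OpenACC sketch (illustrative, not GTC-P code): the same
    // directives can target an NVIDIA GPU, a multi-core CPU, or another
    // back end, depending on the compiler.
    #include <cstdio>
    #define N 1000000

    int main() {
      static double x[N];
      double sum = 0.0;
      for (int i = 0; i < N; ++i) x[i] = 1.0 / (i + 1);

      // Offloaded loop with a reduction; the reduction clause lets the
      // compiler avoid per-element atomic operations.
      #pragma acc parallel loop reduction(+:sum) copyin(x[0:N])
      for (int i = 0; i < N; ++i) {
        sum += x[i] * x[i];
      }

      std::printf("sum = %f\n", sum);
      return 0;
    }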

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7194 - Light Baking with IRAY

Martin-Karl Lefrançois DevTech Software Engineer Lead, NVIDIA
Martin-Karl Lefrancois is a senior software engineer and team lead in the Developer Technology organization at NVIDIA in Berlin. Martin-Karl works with various NVIDIA rendering and core development teams to bring clients the best rendering experience. Prior to NVIDIA, he worked at mental images to deliver automatic GPU support in mental ray. After graduating with a degree in computer science and mathematics from the University of Sherbrooke in Quebec, he worked as a graphics developer for nearly 10 years at Softimage in Montreal and Tokyo before leading the core game engine team at A2M.

Learn how to get global illumination in a real-time engine using the Iray renderer. With the Iray Photoreal renderer, your real-time engine can use the full global illumination computation of the most advanced path tracer. The technique uses the properties of the physically based material (MDL) assigned to the object and all the various sources of energy in the scene. Sources can be high-dynamic-range images, the built-in sun and sky, implicit lights such as point, spot, and area lights, and also emissive objects. The method also allows the use of light path expressions, which can create a light map for a specific group of lights or exclude objects from the calculation.

Level: All
Type: Talk
Tags: Real-Time Graphics; Rendering and Ray Tracing
Industry Segments: Media & Entertainment; Architecture / Engineering / Construction; Manufacturing; Retail / Etail

Day: TBD
Time: TBD
Location: TBD

S7196 - FMM with Periodic Boundaries Support on GPU

Bartosz Kohnke Software Developer, Max Planck Institute for Biophysical Chemistry
Bartosz Kohnke is a software developer at the Max Planck Institute for Biophysical Chemistry in Göttingen, in the Department of Theoretical and Computational Biophysics. His job is the CUDA parallelization and optimization of the fast multipole method, which will become a part of the GROMACS software. Before that, Bartosz worked on efficient implementations of super-resolution fluctuation imaging algorithms, researching different parallelization techniques in the Laboratory of Cellular Dynamics at MPI Göttingen. He holds an M.S. in applied computer science from Georg-August-Universität Göttingen, Germany, with a specialization in scientific computing.

The direct solution of the N-body problem is a simple, yet scientifically important and ubiquitous, showcase algorithm for modern GPUs. However, its computational complexity is O(N^2). The fast multipole method (FMM) is an algorithm that reduces the runtime and complexity to an optimal O(N) for any required precision. We'll present an optimized, fully NVIDIA(R) CUDA(R)-enabled, templated C++ implementation of the FMM that covers all stages of the method, from particle input to force extraction. We compare different parallelization approaches and show the performance improvement when going from dynamic parallelization to a presorted, list-based approach that fits particular system constraints such as periodic boundary conditions. We'll discuss how to exploit the FMM operators such that both memory access overhead and the number of complex multiplications are minimized. This moves the kernels into the compute-bound range and increases performance.
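
For reference, the direct O(N^2) evaluation that the FMM approximates in O(N) can be sketched as follows (plain C++ for illustration; not the presented CUDA implementation):

    // Direct O(N^2) N-body potential sum, the baseline the FMM replaces.
    // Plain C++ sketch for reference only, not the presented CUDA FMM code.
    #include <vector>
    #include <cmath>

    struct Particle { double x, y, z, q; };

    // Compute the potential at each particle due to all other particles.
    std::vector<double> direct_potential(const std::vector<Particle>& p) {
      const std::size_t n = p.size();
      std::vector<double> phi(n, 0.0);
      for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
          if (i == j) continue;
          const double dx = p[i].x - p[j].x;
          const double dy = p[i].y - p[j].y;
          const double dz = p[i].z - p[j].z;
          phi[i] += p[j].q / std::sqrt(dx * dx + dy * dy + dz * dz);
        }
      }
      return phi;  // N^2 pair interactions; the FMM reduces this to O(N)
    }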

Level: Intermediate
Type: Talk
Tags: Computational Physics; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7197 - 4K Video Processing and Streaming Platform on TX1

Tobias Kammacher Researcher, Zurich University of Applied Sciences
Tobias Kammacher has worked for the last three years in the High Performance Multimedia and Data Acquisition Research Group in the Institute of Embedded Systems at the Zurich University of Applied Sciences. He and his colleagues carry out research projects with industry partners, focused on implementing signal and video processing applications on SoC, FPGA, and mixed architectures. Tobias received his B.S. in electrical engineering and M.S. in engineering.

Learn how to build a platform for processing and streaming 4K video on the NVIDIA(R) Jetson(TM) TX1 processor. To achieve real-time video processing, the diverse processing resources of this high-performance embedded architecture need to be employed optimally. The heterogeneous system architecture of the Jetson TX1 allows capturing, processing, and streaming of video with a single chip. The main challenges lie in the optimal utilization of the different hardware resources of the Jetson TX1 (CPU, GPU, dedicated hardware blocks) and in the software frameworks. We'll discuss variants, identify bottlenecks, and show the interaction between hardware and software. Simple capturing and displaying 4K video can be achieved using existing out-of-the-box methods. However, GPU-based enhancements were developed and integrated for real-time video processing tasks (scaling and video mixing).

Level: Intermediate
Type: Talk
Tags: Media and Entertainment; Video and Image Processing; Intelligent Machines and IoT

Day: TBD
Time: TBD
Location: TBD

S7198 - Running Real-Time Processing for Revolutionary 3D Scanning

Artem Yukhin President & CEO, Artec 3D
Artem Yukhin is one of the founders of Artec 3D. He set up the company in 2007 and currently serves as its CEO. Artem has 17+ years of experience in international executive management, startup development, product management and launch, human resources and fundraising. He holds 19 patents and patent applications in electronics, optics, and algorithm engineering. Earlier in Artem's career, he founded A4Vision Inc. (2001) and served as its CTO and member of the board of directors. He invented 3D face recognition technology (1999), and turned it into a worldwide recognized biometric solution that became an industry standard (ANSI, 2006).

Experience first-hand how the NVIDIA Jetson platform is powering the future of 3D scanning. The combination of a computer-on-a-chip with a complete software solution has allowed for the creation of the first fully embedded device that can both capture and process all 3D data onboard. Having an all-in-one solution – scanner, processor, and platform – allows for users to instantly see their 3D model being processed and rendered, making scanning nearly as easy as taking a video. The Jetson platform is the only solution able to support onboard real-time processing, while at the same time allowing for the fastest capture speed of any 3D scanner in the world. The audience will receive a live demo of this revolutionary technology, which will have a major impact on engineering, media, art, manufacturing, healthcare, and more.

Level: Intermediate
Type: Talk
Tags: Virtual Reality and Augmented Reality; Manufacturing Industries; Real-Time Graphics; Media and Entertainment
Industry Segments: Manufacturing; Architecture / Engineering / Construction; Media & Entertainment; Healthcare & Life Sciences

Day: TBD
Time: TBD
Location: TBD

S7199 - Interactive HPC: Large Scale In-situ Visualization using NVIDIA Index in ALYA MultiPhysics

Vishal Mehta Senior Engineer, Barcelona Supercomputing Center
Vishal Mehta works as a senior engineer at the Barcelona Supercomputing Center. He is motivated by a co-design approach driven by ambitious applications and influencing the software stack for the development of next-generation, exascale-ready HPC ecosystems. Vishal's fields of interest include computational mechanics, linear algebra, and GPU algorithms for computational science. He has six years of experience in working with GPUs in HPC ecosystem.
Christopher Lux Senior Graphics Software Engineer, NVIDIA IndeX R&D, NVIDIA
Christopher Lux is a senior graphics software engineer at the NVIDIA Advanced Rendering Center. He received his Ph.D. in computer science in 2013 from the Bauhaus-Universität Weimar, Germany. Through his interest in real-time computer graphics and scientific visualization, he focused his work early on the interactive visualization of large-scale datasets from the geo-scientific and medical domains.
Marc Nienhaus Sr. Engineering Manager, Product Technology Lead, NVIDIA IndeX, NVIDIA
Marc Nienhaus is the product technology lead of the NVIDIA IndeX(TM) commercial software at NVIDIA. He manages the NVIDIA IndeX software engineering team and is responsible for the overall product architecture and applications in various domains. Before joining mental images' R&D rendering department and NVIDIA, Marc was a postdoc at Northwestern University and led research projects at the University of Potsdam. His research interests include parallel and distributed rendering and computing, scientific visualization, GPU-based rendering, and photorealistic and non-photorealistic expressive depictions. He holds a master's in mathematics with a minor in computer science from the University of Muenster and a Ph.D. in computer science from the Hasso Plattner Institute at the University of Potsdam. Marc has published various papers on GPU-based real-time rendering and non-photorealistic rendering.

We'll discuss how NVIDIA IndeX™ Advanced Rendering Tools are helping researchers get more insight through in-situ visualizations. HPC applications have always been centered around large computations, small input, and extremely large simulated output. HPC applications running on big supercomputers are executed using a queuing system, where researchers have to wait a couple of hours before analyzing the outputs. We've designed essential software components that allow in-situ visualizations of sparse volume data from ALYA multiphysics simulation code (Barcelona Supercomputing Center) using NVIDIA IndeX. ALYA multiphysics is one of the two European exascale benchmarks and is used in targeted medicine, cardiac modeling, renewable energy, etc. We'll guide you through techniques that have been used in enabling in-situ rendering and analysis of data.

Level: Intermediate
Type: Talk
Tags: In-Situ and Scientific Visualization; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7200 - Real-Time Anomaly Detection on Video and SCADA with Omni AI Platform

Ming-Jung Seow Chief Scientist, Omni AI
Ming-Jung Seow is chief scientist at Omni AI. Ming-Jung's research interests are machine learning, computer vision, and computational neuroscience. He received his Ph.D. in computer engineering in 2006. He has published over 50 scientific papers.
Rick Spitz CTO, Omni AI
Rick serves as CTO of Omni AI, which provides custom solutions related to AI and machine learning. He holds numerous patents related to transactional video. Rick is a former VP at Apple Computer; in the early 1990s he led Mac System Software development in Cupertino. In 2004 Rick was co-founder and CTO of ReachLocal, which provided online advertising for midsize businesses and went public in 2010. Earlier, Rick was CTO of an online game startup, Worldwinner. Prior to Apple, Rick served as group manager for VAX/VMS software at Digital Equipment Corp. in New England. He has a bachelor's degree in electrical and computer engineering from Clemson University and a master of science degree in computer engineering from the University of Massachusetts, and completed Boston University's Executive Leadership Institute. In the past he served on a National Science advisory council and on the Unicode Board of Directors. His interests include outdoor photography, kayaking, and worldwide travel with his wife Barbara.

The potential information buried in sensors is enormous. There is far too much data from sensors for it all to be actively monitored and managed by agents. Large-scale autonomous monitoring systems require a significant amount of computing resources, and manually configuring sensors to detect specific activities can be time consuming. Correlating and fusing different sensor modalities in real time to detect co-presence anomalies can be computationally intractable. We'll demonstrate how the Omni AI platform uses NVIDIA GPUs to enable high-performance, highly scalable real-time anomaly detection on thousands of sensors using an unsupervised online machine learning engine with a neuro-linguistic cognitive model.

Level: All
Type: Talk
Tags: Intelligent Video Analytics; Intelligent Machines and IoT; Accelerated Analytics; Deep Learning and AI; Federal

Day: TBD
Time: TBD
Location: TBD

S7201 - Vulkan VR Rendering

Ingo Esser Senior Developer Technology Engineer, NVIDIA
Ingo Esser is a senior developer technology engineer in NVIDIA's Professional Solutions Group, where he helps ISVs improve their rendering algorithms. These ISVs mostly work in the automotive and the oil and gas domains, where either rendering complex surfaces or visualizing large datasets is an issue. Ingo has a diploma in computer science from the Computer Graphics Group at RWTH Aachen University, Germany.

Following up the multi-GPU and VR-related talks from past GTCs, we'll present new Vulkan VR extensions that provide functionality already available in OpenGL and DirectX. We'll discuss the new extensions and their usage, and provide short, concise samples highlighting key components for common use cases. We'll also give a brief update on the new OpenGL multicast extension, which allows for more flexibility and can be applied to more complex rendering pipelines.

Level: All
Type: Talk
Tags: Virtual Reality and Augmented Reality

Day: TBD
Time: TBD
Location: TBD

S7203 - Delivering Immersive Experiences Through GPU Virtualization and Streaming

Jan Wurster Team Leader Software Development, ESI Group
Jan Wurster has multiple responsibilities as project manager for ESI's VRify cloud initiative and team leader for ESI's Immersive Experience and Simulation team. His development team is responsible for delivering state-of-the-art immersive usability as well as ergonomics and physics simulation capabilities in ESI Group's IC.IDO product. Jan studied computer science in media at the University of Applied Sciences in Furtwangen. He has 14 years of software development experience in numerous aspects of virtual reality software and has led various development teams.

We'll introduce the transition from the traditional workstation to the immersive experience workspace and present novel NVIDIA and ESI technologies that combine streaming and GPU virtualization to provide scalable, immersive virtual and augmented reality. We'll discuss the challenges in advancing to the immersive workspace for mobile, desk-side, or team-size immersive experiences through on-premise and cloud-based virtual engineering applications.

Level: Intermediate
Type: Talk
Tags: Manufacturing Industries; Virtual Reality and Augmented Reality; Graphics Virtualization

Day: TBD
Time: TBD
Location: TBD

S7204 - HPC and Deep Learning on Azure

Karan Batta Program Manager, Big Compute/HPC Team, Microsoft
Karan Batta is a program manager in the Big Compute/HPC team in Microsoft's Azure, where he leads the vision and deployment of the new Azure GPU N-Series as part of broader Azure Compute IaaS capabilities. Additionally, he leads the media and entertainment vertical solutions as part of the Azure Batch HPC service.

Learn how you can scale your traditional HPC-based applications or workloads in Azure using powerful NVIDIA(R) Tesla(R)-based GPUs and Azure's low-latency networking. Additionally, learn how our customers are running deep learning and AI workloads using these GPUs in Azure to create the best speech recognition models, natural language processing, and image/object detection for scenarios such as digital assistants or autonomous cars.

Level: All
Type: Talk
Tags: Data Center and Cloud Computing; Deep Learning and AI; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7205 - High-End Design & Visualizations on Azure

Karan Batta Program Manager, Big Compute, HPC Team, Microsoft
Karan Batta is a program manager in the Big Compute/HPC team in Microsoft's Azure, where he leads the vision and deployment of the new Azure GPU N-Series as part of broader Azure Compute IaaS capabilities. Additionally, he leads the media and entertainment vertical solutions as part of the Azure Batch HPC service.

Future high-performance computing systems will enable fast processing of large datasets, as highlighted by President Obama's Executive Order on National Strategic Computing Initiative. Of significant interest is the need for analyzing big graphs arising from a variety of areas from social networks and biology to national security. We'll present our ongoing efforts at the George Washington University in accelerating big graph analytics on GPUs. We've developed a GPU-based graph analytics system that delivers exceptional performance through efficient scheduling of a large number of GPU threads and effective utilization of GPU memory hierarchy. Our systems are one of the best GPU-based implementations, consistently ranking highly on Graph500 and Green Graph500.

Level: All
Type: Talk
Tags: Data Center and Cloud Computing; HPC and Supercomputing; Accelerated Analytics

Day: TBD
Time: TBD
Location: TBD

S7209 - Using NVIDIA FleX for Real-Time Fluid Simulation in Virtual Surgery

Bradley Hittle Senior Research Software Engineer, Ohio Supercomputer Center
Bradley Hittle is a senior research software engineer at the Ohio Supercomputer Center, specializing in software engineering for the development, support, and evaluation of virtual systems and virtual reality-based simulations for medical applications. Brad's primary areas of research include the integration and evaluation of computer interface technology for virtual simulation, developing tools that aid viewing and interactivity with large data, the development of software and hardware systems for real-time virtual simulations and visualizations, and using GPU technology to provide increased algorithm performance. Brad has contributed extensively to projects funded through ARDF, NIDCD, NIOSH, and the National Institutes of Health. His primary areas of expertise are real-time volume visualization, software engineering, computer interface technology for virtual systems, and GPU compute development.

Learn how to use NVIDIA FleX to simulate complex real-time fluid interaction. We'll use our virtual surgical environment to give a detailed overview of the techniques and algorithms needed to incorporate FleX into your application. Topics include collision handling with dynamic volumetric data through signed distance field approximation, as well as tricks for emulating diffusion, bleeding, and absorption. We'll demonstrate the need for optimizations in a compute-intensive application through the use of threading and multi-GPU support. A basic understanding of the FleX library is assumed.
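
As a generic illustration of the signed-distance-field idea mentioned above (not NVIDIA FleX API code; the sphere is a stand-in for an arbitrary distance field):

    // Generic signed-distance-field collision query, illustrating the idea
    // of SDF-based collision handling. Not FleX API code.
    #include <cmath>

    struct Vec3 { float x, y, z; };

    // Signed distance from point p to a sphere surface (negative = inside).
    float sphere_sdf(const Vec3& p, const Vec3& center, float radius) {
      const float dx = p.x - center.x;
      const float dy = p.y - center.y;
      const float dz = p.z - center.z;
      return std::sqrt(dx * dx + dy * dy + dz * dz) - radius;
    }

    // A particle collides when its distance to the surface is less than
    // its own radius (i.e., it penetrates the surface).
    bool collides(const Vec3& particle, float particle_radius,
                  const Vec3& center, float radius) {
      return sphere_sdf(particle, center, radius) < particle_radius;
    }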

Level: Intermediate
Type: Talk
Tags: Healthcare and Life Sciences; Real-Time Graphics
Industry Segments: Healthcare & Life Sciences; Energy / Oil & Gas; Higher Education / Research; Government / National Labs

Day: TBD
Time: TBD
Location: TBD

S7210 - Deep Learning Applications for Embedded Avionics on the Jetson Platform

Aaron Mosher Design and Analysis Engineer, The Boeing Company
Aaron Mosher is a design and analysis engineer at Boeing Research and Technology. Aaron has worked on autonomy and sensor processing technologies for both unmanned ground vehicles and unmanned air vehicles. He holds multiple patents on obstacle detection technology and computer vision. He has a B.S. in computer engineering from the University of Alabama in Huntsville, and an M.S. in systems engineering from Missouri Science and Technology. He has worked on a variety of projects, including ground vehicles, air vehicles, and communications systems.

We'll discuss the uses and tradeoffs of semantic segmentation and detection networks when deployed on the Jetson TX1. There is significant research into deep learning semantic segmentation and detection networks since these can both detect and localize numerous objects within the image. We use FCN (fcn.berkeleyvision.org) as an example of a semantic segmentation network, and the DIGITS DetectNet as an example of a detection network. These networks require significant computing resources for inferencing, and within embedded avionics applications we wish to provide the best tradeoff of performance-per-watt by leveraging these networks on the Jetson TX1. We'll explore characteristics of these deep learning networks, how these deep learning capabilities can be utilized on the Jetson TX1 platform, and characterize their runtime performance on the Jetson TX1 compared to larger GPU systems.

Level: Intermediate
Type: Talk
Tags: Intelligent Machines and IoT; Computer Vision and Machine Vision; Deep Learning and AI; Federal

Day: TBD
Time: TBD
Location: TBD

S7211 - Using Tegra for Real-Time Image Processing

Eric Kelmelis CEO, EM Photonics
Highly-Rated Speaker
Eric Kelmelis is the co-founder and CEO of EM Photonics, a company focused on the development and transition of innovative research and technology in the fields of advanced imaging, high performance computing, and embedded systems. Eric received bachelor's and master's degrees in electrical engineering from the University of Delaware, has more than 70 publications, and holds two patents. He has also served as conference chair at SPIE's Defense, Security, and Sensing symposium since 2010.

ATCOM is an image processing application for removing the scintillation and warping effects in long-range videos. For years, we have used NVIDIA GPUs for achieving real-time performance allowing for the processing of live video streams. With the release of the NVIDIA Tegra X1, we are now exploring the potential of achieving the necessary performance in a mobile platform. In this talk, we discuss our work in achieving real-time performance in a computationally intense streaming video application using NVIDIA GPUs and our initial experience in mapping this work to a Tegra.

Level: All
Type: Talk
Tags: Video and Image Processing

Day: TBD
Time: TBD
Location: TBD

S7215 - Automating VR and Photoreal Imagery From Siemens Teamcenter

Dave Coldron Product Director, Lightwork Design Ltd.
As Lightworks product director, Dave Coldron is responsible for the development of the Iray+ ecosystem, including Iray+ for 3ds Max and the new Iray+ Configurator. With over 20 years of experience in developing integrated systems for the computer graphics industry, Dave knows how to create applications that support the design workflow, focusing on the use of compelling digital content, interactive design, and the user experience.

Learn how manufacturers are automating and in-housing their digital photorealistic and VR/AR visualization pipelines out of Siemens Teamcenter and NX through JT. This is leading to improved efficiency and cost reduction and, crucially, enabling manufacturer control over digital assets that allows them to be repurposed across the business. We'll demonstrate how to set up an automated visual digital pipeline out of Siemens Teamcenter into NVIDIA Iray and Epic Unreal Engine, accounting for configuration rules and buildability.

Level: All
Type: Talk
Tags: Virtual Reality and Augmented Reality; Manufacturing Industries; Rendering and Ray Tracing
Industry Segments: Manufacturing

Day: TBD
Time: TBD
Location: TBD

S7216 - Machine Learning on VMware vSphere with NVIDIA GPUs

Lan Vu Senior Member of Technical Staff, VMware
Lan Vu is working on performance engineering at VMware, focusing on optimizing performance and scalability of virtual desktop infrastructure solutions, including 3D graphics, Horizon View, Horizon Air, and App Volumes. Previously, Lan worked at Parallel Distributed System Labs, University of Colorado, Denver, with a research focus on high-performance methods in data mining and machine learning. She holds a Ph.D. in computer science and information systems from the University of Colorado, Denver.
Uday Kurkure Staff Engineer, VMware
Uday Kurkure works on Virtual Desktop Infrastructure (VDI) Performance at VMware. Uday has broad experience in Computer Graphics, ASIC Design, and Compilers. He has worked at Adobe Systems, Transmeta, Synopsys and MIPS Computers. Uday holds a MS degree in Computer Science from Stanford and a B. Tech. in Electronics and Telecommunications from Indian Institute of Technology.
Hari Sivaraman Staff Engineer, VMware
Hari Sivaraman works on Virtual Desktop Infrastructure (VDI) Performance at VMware. Hari has broad experience in Computer Graphics, and performance analysis. He has worked at HP and NVIDIA. Hari holds a MS degree in Computer Science from the Indian Institute of Science and a Ph.D. in Computer Science from Washington State University.

Efficient deployment of GPU-based machine learning, especially deep learning, in cloud environments is an important focus of research and development. As the leader in cloud infrastructure software, VMware provides multiple solutions that optimize performance and enhance flexibility for machine learning workloads. We'll present the results of our research on machine learning with NVIDIA GPUs on VMware's vSphere platform. Learn different ways to deploy GPU-based workloads developed with popular machine learning frameworks like TensorFlow and Torch in a virtualized environment using VMware DirectPath I/O and NVIDIA GRID vGPU solutions. We'll discuss how to mix workloads to maximize resource utilization and deployment flexibility by running machine learning together with other workloads on the same server. Finally, we'll analyze the performance characteristics of machine learning with GPUs for multiple use cases and at different scales in virtualized cloud data centers.

Level: All
Type: Talk
Tags: Graphics Virtualization; Data Center and Cloud Computing; Deep Learning and AI
Industry Segments: Software; Cloud Services

Day: TBD
Time: TBD
Location: TBD

S7218 - Training of Deep Networks with Half-Precision Float

Boris Ginsburg Deep Learning Engineer, NVIDIA
Boris Ginsburg is a principal engineer working on deep learning algorithms at NVIDIA, which he joined in 2015. For the last five years, Boris has worked on distributed deep learning algorithms and hardware accelerators for deep learning. Before that, he worked on hardware accelerators for machine learning, computer vision, and speech recognition; CPU architecture; and wireless networking. He has 60 issued patents and 15 patent applications in the areas of CPUs, GPGPUs, and wireless networking. Boris earned his Ph.D. in applied math (non-smooth optimization) from the Technion in 1997.

We'll describe new algorithms used to train very deep networks with half-precision float. Float16 has two major potential benefits: better training speed and a reduced memory footprint. But Float16 has a very narrow numerical range (0.00006 to 65,504). This narrow range can result in both overflow (the "inf/nan" problem) and underflow ("vanishing gradients") during training of deep networks. We'll describe the new scaling algorithm, implemented in nvcaffe, which prevents these negative effects. With this algorithm, we successfully trained networks such as AlexNet, GoogLeNet, Inception_v3, and ResNets without any loss in accuracy. Other contributors to this work are S. Nikolaev, M. Houston, A. Kiswani, A. Gholaminejad, S. Migacz, H. Wu, A. Fit-Florea, and U. Kapasi.
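
To make the overflow/underflow problem concrete, here is a minimal, hypothetical NumPy sketch of the loss-scaling idea: gradients are produced from a scaled loss so that very small values stay representable in float16, then unscaled in float32 before the weight update. The model, gradient magnitudes, and scale factor are invented for illustration and do not reproduce the nvcaffe implementation described in the talk.

```python
import numpy as np

LOSS_SCALE = 1024.0                                   # assumed scale factor
master_weights = np.random.randn(4, 4).astype(np.float32)  # fp32 master copy

def backprop_fp16(weights_fp16, loss_scale):
    # Stand-in for backprop through a *scaled* loss: the true gradient
    # (~1e-6) would be flushed toward zero in fp16, but scaling the loss
    # keeps the resulting gradient well inside the fp16 range.
    true_grad = 1e-6 * weights_fp16.astype(np.float32)
    return (true_grad * loss_scale).astype(np.float16)

w16 = master_weights.astype(np.float16)               # fp16 copy for fwd/bwd
scaled_grads = backprop_fp16(w16, LOSS_SCALE)

grads_fp32 = scaled_grads.astype(np.float32) / LOSS_SCALE  # unscale in fp32
master_weights -= 0.1 * grads_fp32                    # fp32 weight update
print(np.abs(grads_fp32).max())
```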

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Algorithms; Performance Optimization
Industry Segments: Software

Day: TBD
Time: TBD
Location: TBD

S7220 - Streaming Graph Analytics on the GPU

Oded Green Research Scientist, Georgia Tech
Oded Green is a research scientist at Georgia Tech in the School of Computational Science and Engineering, where he also received his Ph.D. Oded received his M.S. in electrical engineering and his B.S. in computer engineering, both from the Technion. Prior to working at Georgia Tech, Oded was chief operating officer and research scientist at ArrayFire.

We'll present cuSTINGER - the first dynamic graph data structure for the GPU. We'll start by discussing the internals of the data structure. We'll compare cuSTINGER with CSR, a widely used static graph and matrix representation, and show that our dynamic graph data structure performs within a few percent of static graph structures. We'll show additional performance results: the time to initialize the data structure, the time required to modify the graph (due to updates), and the update rate (how many updates per second cuSTINGER can handle). Currently, cuSTINGER can sustain over 10 million updates per second. Lastly, we'll show a novel algorithm for counting triangles in a streaming environment that sustains millions of updates per second.
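
As a rough illustration of the update workload being measured, the toy Python structure below applies batched edge insertions and deletions to an adjacency-set graph and reports an updates-per-second rate. It is a CPU sketch only; cuSTINGER's actual GPU memory layout and update mechanism are very different.

```python
from collections import defaultdict
import time

class DynamicGraph:
    """Toy dynamic graph: adjacency sets with batched edge updates."""
    def __init__(self):
        self.adj = defaultdict(set)

    def apply_batch(self, insertions, deletions=()):
        for u, v in insertions:
            self.adj[u].add(v)
            self.adj[v].add(u)
        for u, v in deletions:
            self.adj[u].discard(v)
            self.adj[v].discard(u)

g = DynamicGraph()
batch = [(i % 1000, (i * 7) % 1000) for i in range(100_000)]  # synthetic edges
start = time.perf_counter()
g.apply_batch(batch)
rate = len(batch) / (time.perf_counter() - start)
print(f"~{rate:,.0f} edge updates/s (CPU toy, not cuSTINGER)")
```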

Level: All
Type: Talk
Tags: Accelerated Analytics; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7222 - Deep Learning in the Connected Kitchen

Hristo Bojinov CTO, Innit, Inc.
Hristo Bojinov is CTO of Innit, where he is responsible for applying state-of-the-art computer technology toward the company vision. Hristo works in the areas of computer vision, cloud and big data, as well as mobile and embedded software and hardware development. Previously, Hristo was CEO of Anfacto, a consulting company that specialized in Android OS customization for large wireless carriers. Before that, he was CSO at 3LM (acquired by MMI), the first MDM vendor to offer a full range of native enterprise service capabilities for Android devices. Prior to 3LM, Hristo held a variety of engineering and managerial roles at Facebook, Google, Decru (acquired by Network Appliance), and Oracle. He holds an M.S. and Ph.D. in computer science from Stanford University and an S.B. from MIT (Course 6-3).

We'll present Innit's work applying deep learning technology to build a platform that powers the connected kitchen of the near future. We've been carrying out pioneering work in the applications of modern computing technology to tackle problems in the food space, with a specific focus on empowering the very personal relationship between people and food. Throughout the food ritual (from planning and shopping to cooking and serving), Innit connects information about food with personal preferences and needs, and delivers actionable information via multiple channels such as mobile apps and embedded user interfaces at home and at the store. Deep learning makes multiple appearances in this process, from the latest in CNN-based object detection and classification, to using CNN features for image retrieval and matching, to advanced sensing in extreme environments such as an operating oven.

Level: All
Type: Talk
Tags: Intelligent Machines and IoT; Computer Vision and Machine Vision; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7223 - Bring the Power of CUDA to Small Devices

Daniel Lang CTO, Toradex Inc.
Daniel Lang has over 14 years of experience in the embedded computer market, ranging from virtual and industrial products to medical devices and super cars. For the last 10 years, Daniel has helped grow Toradex from a Swiss university startup to a global leader in ARM-based computers on modules. Located in Seattle, Wash., he holds the position of CTO of Toradex Inc. He was born in Switzerland, where he received his university degree in electrical engineering.

Learn how to bring the power of GPUs and CUDA to small machines and IoT edge devices. Experience the development process from proof of concept to a production-ready device. NVIDIA TK1 and Jetson TX1 SoCs allow, for the first time, the use of high-performance GPGPUs on small, power-constrained devices. The complexity and cost of getting from a maker board like the Jetson TK1 to a hardware design ready for customers prevent many from making progress. We'll explain how computer modules like the Jetson TX1 module can be used to simplify the process and get you to market faster and cheaper. We'll go step by step through a typical development process. You'll learn what skills and resources you need to create an industrial-grade device. We'll evaluate how this approach compares to other solutions like single-board computers and designs from scratch. If you know the power of GPUs but don't know how to bring it to machines or IoT devices, this talk is for you!

Level: Beginner
Type: Talk
Tags: Intelligent Machines and IoT

Day: TBD
Time: TBD
Location: TBD

S7225 - Tuning Performance on Pascal GPUs: An Introduction to Ali-GPU Assembler and Its Usage in CNN Optimization

Kai Chen Development Engineer, Alibaba Group
Kai Chen is a development engineer on the domain-specific computing team at Alibaba. Kai's work and interests include computer architecture and performance optimization. He is one of the developers of the Ali-GPU Assembler used at Alibaba. He graduated from the University of Chinese Academy of Sciences.

Learn some advanced skills for performance optimization on NVIDIA GPUs, especially the Pascal family. NVIDIA provides many powerful tools to analyze and improve the efficiency of CUDA kernels. However, in many specific cases, developers need to do more detailed tuning to get the expected performance. We'll introduce a native assembler for the Kepler, Maxwell, and Pascal architectures used at Alibaba. We'll also share tuning experiences with CNN and GEMM implementations using this assembler as examples. If you're interested in assembly-level optimization and don't yet have such a tool for these architectures, you shouldn't miss this session.

Level: Advanced
Type: Talk
Tags: Deep Learning and AI; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7226 - VisionBrain: Deep Learning Platform for Customized Visual Recognition in Cloud

Yonghua Lin Senior Manager of Cognitive Platform and Cloud, IBM Research
Yonghua Lin is the founder and leader of the SuperVessel cloud at IBM. She is also a senior technical staff member and senior manager of Cognitive Platform and Cloud at IBM Research. Yonghua has worked on system architecture, cloud, and cognitive platform research for more than 15 years. Her work over the past 10 years has covered all kinds of IBM multicore processors, including the IBM network processor, IBM Cell processor, PRISM, IBM POWER7, and POWER8. She initiated work on mobile infrastructure on the cloud in 2007, which has become what is today called Network Function Virtualization. She led the IBM team that built the first optimized cloud for 4G mobile infrastructure and successfully demonstrated it at the ITU, Mobile World Congress, and elsewhere. She founded the SuperVessel cloud to support OpenPOWER research and development in industry. Yonghua has more than 40 patents granted worldwide and publications in top conferences and journals.

We'll dive deep into VisionBrain, a deep learning platform for customized visual recognition in cloud. VisionBrain is developed by IBM Research, accessible on SuperVessel Cloud, and has been used in IBM commercial solutions. The platform aims to provide developer-customized model training and inference API services to support image/video object detection and classification. VisionBrain is based on container cloud and uses Marathon+Mesos for resource management. We'll focus on: (1) the architecture of VisionBrain, including user-defined data preprocessing, training, and inference with GPU-enabled container cloud, (2) novel deep learning technologies to enable customized model training with high accuracy and short training duration for visual recognition, and (3) how to do the intelligent GPU scheduling in container cloud for different workloads, and meet commercial SLA and high-availability requirements.

Level: All
Type: Talk
Tags: Data Center and Cloud Computing; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7227 - Optimizing LSTM Performance for Speech Recognition on NVIDIA GPUs

Liang You Leader of Alibaba Cloud HPC, Alibaba Cloud
Liang You is in charge of the High Performance Computing Team at Alibaba Cloud, a subsidiary of Alibaba Group. He focuses on platform construction and performance optimization for high performance computing domains, including machine learning and weather forecasting. Before joining Alibaba, Liang worked as a senior technical engineer in Intel's Software & Services Group.

We'll introduce advanced techniques for optimizing the performance of the LSTM algorithm for speech recognition. Speech recognition requires each LSTM forwarding channel to return results in a very short time. However, typical LSTM implementations have low efficiency on NVIDIA GPUs, especially when processing multiple forwarding channels, where different channels running on the same GPU can dramatically degrade each other's performance. We optimized the implementation and achieved significant performance improvements, reducing the running time of each forward channel and increasing the number of forward channels that can be processed.

Level: All
Type: Talk
Tags: Performance Optimization; Deep Learning and AI; Federal

Day: TBD
Time: TBD
Location: TBD

S7230 - Using Genetic Algorithms to Optimize Recurrent Neural Networks

Joseph Schneible Engineering Group Manager, Technica Corporation
Joseph Schneible is a software engineer leading the Independent Research and Development team at Technica Corporation in Dulles, Virginia. His research focuses on a systems-based approach to optimizing graph analytics and machine-learning algorithms for commodity hardware. Joseph holds a Ph.D. in physics from Syracuse University, where his research focused on parallel simulations of quantum field theory. Prior to joining Technica, he performed postdoctoral research as a member of the High Performance Computing Lab at George Washington University. His research interests include the use of GPUs to accelerate simulations and analytics.

One of the more challenging tasks in deep learning is the design and tuning of neural networks for specific tasks. This process is often more of an art form than a science, requiring expertise in deep learning and a significant amount of time for trial and error. We'll present the use of genetic algorithms to automate the process of tuning the hyperparameters of recurrent neural networks, including the size of the network, the number of time steps through which to backpropagate, and the learning rate. This approach allows us to take advantage of the model parallelism of GPU-based neural network training while also taking advantage of the data parallelism of genetic algorithms. We show that this approach reduces the barrier to entry for using neural networks and is faster than other automated network tuning approaches.
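
The skeleton below is a hypothetical Python sketch of this kind of search: genomes are hyperparameter choices, and the fitness function is a placeholder standing in for training an RNN and reporting validation accuracy (which in practice would run on a GPU). The search space, selection scheme, and fitness are assumptions, not the authors' implementation.

```python
import random

SEARCH_SPACE = {                     # assumed, illustrative hyper-parameters
    "hidden_size":   [64, 128, 256, 512],
    "bptt_steps":    [8, 16, 32, 64],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
}

def random_genome():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(genome):
    # Placeholder fitness; a real run would train the RNN and return accuracy.
    return -abs(genome["hidden_size"] - 256) / 256 - abs(genome["learning_rate"] - 1e-3)

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(g, rate=0.2):
    return {k: (random.choice(SEARCH_SPACE[k]) if random.random() < rate else v)
            for k, v in g.items()}

population = [random_genome() for _ in range(20)]
for generation in range(10):
    parents = sorted(population, key=evaluate, reverse=True)[:5]  # elitism
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(15)]
    population = parents + children

print("best genome:", max(population, key=evaluate))
```

Because every genome's fitness evaluation is independent, the population can be trained in parallel, one network per GPU or per set of GPUs.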

Level: Intermediate
Type: Talk
Tags: Algorithms; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7231 - Upscaling Beyond Super-Resolution Using a Novel Deep-Learning System

Pablo Navarrete Michelini Image Processing Researcher, BOE Technology Group Co., Ltd.
Pablo Navarrete Michelini has worked as an image processing researcher at BOE Technology Group Co., Ltd. since 2014. His major interest is in image and video processing algorithms for enhancement and compression, especially those based on multi-level techniques. Pablo worked as a senior software engineer on video transcoding products at Yuvad Technologies (Beijing, China) from 2011 to 2013. He was an assistant professor in the Department of Electrical Engineering at the University of Chile from 2008 to 2011. He has been a visiting student research collaborator at Princeton University and an intern with the International Center for Numerical Methods in Engineering at the Technical University of Catalonia. Pablo received a B.S. in physics in 2000, a B.S. in electrical engineering in 2001, and an electrical engineer degree in 2002, all from the University of Chile in Santiago. He received his Ph.D. in electrical engineering from Purdue University in 2008.

We'll introduce a new deep-learning system that can upscale low-resolution images (for example, SD input) and generate much larger images of high quality (for example, 8K output). Our design incorporates the ability to adaptively select the best upsampling approach and can be effectively trained with standard optimization algorithms. We'll introduce a new approach to visualize the structure of our trained networks.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Media and Entertainment; Video and Image Processing

Day: TBD
Time: TBD
Location: TBD

S7232 - Processing the Next Generation of Angstrom-Scale Microscopy

Lance Wilson Senior HPC Consultant, Monash University
Lance Wilson is a mechanical engineer who has been making tools to break things for the last 20 years. Now he supports the research community of Monash University in Melbourne, Australia.

Imaging datasets are becoming larger and larger as new generation equipment provides higher definition imaging and scanning modalities. Part of analysing these datasets involves choosing the optimal hardware and software. We'll look at the design choices and workflow made for processing cryo-electron microscopy data with results from an NVIDIA DGX-1 and cloud-provisioned HPC.

Level: All
Type: Talk
Tags: Computational Biology; Healthcare and Life Sciences; Video and Image Processing

Day: TBD
Time: TBD
Location: TBD

S7234 - Performance Optimization

Ken Jackson SVP, Real-Time and Linux, Concurrent
Kenrick R. Jackson joined Concurrent in 1977 and currently serves as senior vice president, Linux & Real-Time. Ken has previously held various senior executive positions in marketing, development, business development, professional services, and customer support at Concurrent. He holds a B.S. in mechanical engineering, a B.S. in electrical engineering, and an M.S. in computer engineering from Florida Atlantic University.

Mission-critical applications require hard real-time behavior and a high level of determinism to ensure their success. NVIDIA Jetson is an ideal platform for many applications that require hard real-time behavior and a high level of performance. Our real-time Linux operating system and development tools provide the environment end users need to guarantee their application will run in the allotted time, and they reduce the development time and cost of a project by letting the user debug both CPU and GPU simultaneously.

Level: All
Type: Talk
Tags: Intelligent Machines and IoT; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7235 - Cognition System that Combines AI, RGB, and 3D for Self-Driving Cars

Youval Nehmadi CTO, VayaVision
Youval Nehmadi is the CTO at VayaVision. Previously, he worked at Applied Materials, where he served in several managerial R&D and business roles for 20 years. Youval holds a Ph.D. and M.S. in electrical and computer engineering from the Ben Gurion University of the Negev (Beer Sheva, Israel), an EMBA from Northwestern and Tel Aviv Universities (Tel Aviv, Israel), and a B.S. in physics from the Technion Institute (Haifa, Israel).
Ido Goren SW Manager, VayaVision
Ido Goren is a veteran manager of software and multidisciplinary teams. He has served in R&D management roles at multiple startup companies and brings in-depth knowledge of complex real-time and embedded systems in the wireless and data communication fields. He holds a B.S. in computer science from the Hebrew University of Jerusalem.

Learn how to use GPUs for 3D and camera deep learning fusion applications for autonomous driving. We'll describe real-time GPU applications that use AI in combination with RGB and 3D information for self-driving cars' cognition systems. Imaging cameras provide high resolution only in 2D and they are inadequate for depth and distance measurement. While lidar has relatively low resolution, it can provide 3D information. Smart fusing of both RGB and 3D information in combination with AI software enables the building of ultra-high reliability classifiers. This facilitates the required cognition application for semi-autonomous and fully autonomous driving.

Level: All
Type: Talk
Tags: AI for In-Vehicle Applications; Self-Driving Cars; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7238 - GPU Acceleration of Airway Reconstruction Guided Deep Learning on Lung Cancer Detection

Yuwei Chang Student, National Taiwan University, ESOE, Scientific Computing & Cardiovascular Simulation Laboratory
Yuwei Chang is an undergraduate student at National Taiwan University. His research interests include deep learning, image segmentation with high performance computing, and software-defined networking. Yuwei has been working on image processing in medical domains for a year.

Recent research in deep learning has reached state-of-the-art accuracy in various domains, including image classification, voice recognition, natural language processing, music generation, drug discovery, and genomics. In the diagnosis of lung diseases, the structure of the airway is critical for doctors to recognize abnormal sites such as cancers or tumors. The process of 3D airway reconstruction can work as feature extraction to help recognize benign tumors. Both the reconstruction and the deep learning require large computational resources and memory, and these tasks are time-consuming. With the advent of ever-improving GPUs, parallel programming can greatly enhance the performance of lung cancer detection.

Level: Intermediate
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7240 - Efficient Correlation-Free Many-States Lattice Monte Carlo on GPUs

Jeffrey Kelling Scientist, Helmholtz-Zentrum Dresden-Rossendorf
Jeffrey Kelling is a scientist in the Computational Science group at Helmholtz-Zentrum Dresden-Rossendorf, Germany, concerned with high performance computing.

We'll present a method for highly efficient lattice Monte Carlo simulations with correlation-free updates. Achieving freedom from erroneous correlations requires random selection of lattice sites for updates, which must be restricted by suitable domain decomposition to create parallelism. While approaches based on caching limit the number of allowed states, the multisurface-type approach presented here allows arbitrarily complex states. The effectiveness of the method is illustrated by the fact that it allowed us to settle a long-standing dispute about surface growth under random kinetic deposition in the KPZ universality class. The method has also been applied to Potts models and is suitable for spin-glass simulations, such as those required to test quantum annealers like D-Wave.

Level: All
Type: Talk
Tags: Computational Physics; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7241 - Spectral Clustering of Large Networks

Alexandre Fender Software Engineer, NVIDIA
Alexandre Fender has worked at NVIDIA for three years as a software engineer, focusing on emerging applications in graph analytics and sparse iterative methods. Alex is involved in the development of CUDA libraries such as nvGRAPH. In parallel, he is finishing his Ph.D. on accelerated iterative eigenvalue solvers based on Krylov methods for network analysis.
Maxim Naumov Senior Research Scientist, NVIDIA
Highly-Rated Speaker
Maxim Naumov is a senior research scientist at NVIDIA. His interests include parallel algorithms, numerical linear algebra, optimization, and graphs. He also contributes to the Data Analytics nvGRAPH library. Maxim has led the development of the AmgX library, which provides distributed Algebraic Multigrid, Krylov and Relaxation-based schemes. He has worked on the cuBLAS, cuSPARSE, and cuSOLVER(RF) libraries that are part of the CUDA Toolkit. Previously, Maxim held different positions on NVIDIA's CUDA Platform and Emerging Applications teams and Intel's Microprocessor Technology Lab and Computational Software Lab. He received his Ph.D. in computer science, with specialization in computational science and engineering, in 2009 and his B.S. in computer science and mathematics in 2003, from Purdue University - West Lafayette.

We'll explore techniques for expressing graph clustering as an eigenvalue problem. Attendees will learn how to express different metrics, including minimum balanced cut, modularity, and Jaccard, through associated matrices and how to use their eigenvectors to find the clustering of the graph into multiple partitions. We'll also show how to take advantage of efficient implementation of Lanczos and LOBPCG eigenvalue solvers and k-means algorithm on the GPU to compute clustering using our general framework. Finally, we'll highlight the performance and quality of our approach versus existing state-of-the-art techniques.
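
To show the mathematical shape of the approach, here is a minimal CPU sketch of spectral clustering on a tiny graph: build the graph Laplacian, take the eigenvectors for the smallest eigenvalues, and run k-means on the embedded vertices. The example graph, the use of the unnormalized Laplacian, and the dense `eigh`/k-means calls are assumptions for illustration; the talk's framework uses Lanczos/LOBPCG solvers and a GPU k-means instead.

```python
import numpy as np

A = np.array([[0,1,1,0,0,0],
              [1,0,1,0,0,0],
              [1,1,0,1,0,0],
              [0,0,1,0,1,1],
              [0,0,0,1,0,1],
              [0,0,0,1,1,0]], dtype=float)   # two obvious communities
L = np.diag(A.sum(axis=1)) - A               # unnormalized graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)
embedding = eigvecs[:, 1:3]                  # skip the trivial constant eigenvector

def kmeans(points, k=2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((points[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([points[labels == c].mean(axis=0)
                            if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return labels

print(kmeans(embedding))   # expected: vertices {0,1,2} vs {3,4,5}
```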

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Algorithms; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7243 - Lightweight Compression Methods Achieving 120GBps and More

Piotr Przymus Postdoc, Aix-Marseille University, France
Piotr Przymus obtained his Ph.D. in 2014 from the University of Warsaw, Poland. Currently he is a postdoc at Aix-Marseille University, France. Piotr has been an active researcher and promoter of GPUs since 2011.

We'll investigate new approaches to parallel lossless lightweight compression methods based on fixed-length minimum bit encoding for GPU processors. We'll discuss various memory access patterns and develop a new optimal memory organization. By utilizing new inter-thread and inter-warp communication abilities, we propose algorithms that suit the GPU architecture better. As a result, we significantly improve compression ratio and bandwidth. This allows for many new applications in computational clusters as well as in computational algorithms. Our claims are supported by tests conducted using simulated data and TPC-H database benchmarking tools.
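
The toy Python codec below illustrates the fixed-length minimum-bit-encoding idea: every value in a block is stored with just enough bits for the block's maximum. The block size, packing order, and data are invented; real GPU codecs of this kind pack per warp or thread block with coalesced memory accesses, which this sketch does not model.

```python
import numpy as np

def pack(values):
    """Pack integers using the minimum fixed bit width for this block."""
    bits = max(1, int(np.max(values)).bit_length())
    buf, acc, filled = bytearray(), 0, 0
    for v in values:
        acc |= int(v) << filled
        filled += bits
        while filled >= 8:
            buf.append(acc & 0xFF)
            acc >>= 8
            filled -= 8
    if filled:
        buf.append(acc & 0xFF)
    return bits, bytes(buf)

def unpack(bits, buf, count):
    acc = int.from_bytes(buf, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]

data = np.random.randint(0, 100, size=64)        # values fit in 7 bits, not 64
bits, packed = pack(data)
assert unpack(bits, packed, len(data)) == list(data)
print(f"{bits} bits/value -> {len(packed)} bytes instead of {data.nbytes}")
```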

Level: Beginner
Type: Talk
Tags: In-Situ and Scientific Visualization; HPC and Supercomputing; Media and Entertainment; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7244 - From Desktop to Cloud to Embedded GPUs: Designing, Training and Compiling Vision Algorithms and Deep Learning Using MATLAB

Girish Venkataramani Development Manager, MathWorks
Girish Venkataramani is a product development manager at MathWorks with over 15 years of experience in designing optimizing compilers for embedded systems. At MathWorks, he leads a product development team that delivers key technical innovations in code-generation products, enabling deployment of MATLAB and Simulink applications onto embedded hardware platforms like ARM processors, FPGAs, and GPUs. He graduated from Carnegie Mellon University with a Ph.D. in electrical and computer engineering. His research interests are in the fields of optimizing compiler design, computer architecture (GPUs, FPGAs, embedded CPUs), computer vision, and deep learning deployment.
Avinash Nehemiah Product Marketing Manager, MathWorks
Avinash Nehemiah is a product marketing manager for computer vision and automated driving at MathWorks. He has 10 years of experience in computer vision. Previously, he led a team that created an embedded computer vision-based solution for patient safety in hospital rooms. Avinash has a master's degree in electrical and computer engineering from Carnegie Mellon University, where his research focused on object recognition in radar imagery.
Joss Knight Senior Developer, MathWorks Ltd
Joss Knight researched visual geometry and robot navigation at Oxford University’s Robotics Research Group, before moving to the video games industry as Head of Research for animation middleware developer NaturalMotion, investigating intelligent character simulation. He now works in the MathWorks UK office as senior developer for the GPU and distributed algorithms capabilities in MATLAB.

Learn how to adopt a MATLAB-centric workflow to design, develop, and deploy your computer vision and deep learning applications onto GPUs, whether on your desktop, a cluster, or embedded Tegra platforms, including Jetson TK1/TX1 and DRIVE PX boards. The workflow starts with algorithm design in MATLAB, which enjoys universal appeal among engineers and scientists because of its expressive power and ease of use. The algorithm may employ deep learning networks augmented with traditional computer vision techniques and can be tested and verified within MATLAB. Next, those networks are trained using MATLAB's GPU and parallel computing support, either on your desktop, a local compute cluster, or in the cloud. Finally, a compiler auto-generates portable and optimized CUDA code from the MATLAB algorithm, which is then cross-compiled and deployed to the Tegra board. We'll use examples of common computer vision algorithms and deep learning networks to describe this workflow, and we'll present their performance benchmarks, including training with multiple GPUs on an Amazon P2 cloud instance.

Level: Intermediate
Type: Talk
Tags: Tools and Libraries; Deep Learning and AI; Intelligent Machines and IoT

Day: TBD
Time: TBD
Location: TBD

S7247 - High-Bandwidth 3D Image Compression to Boost Predictive Life Sciences

Jeffrey Kelling Scientist, Helmholtz-Zentrum Dresden-Rossendorf
Jeffrey Kelling is a scientist in the Computational Science group at Helmholtz-Zentrum Dresden-Rossendorf, Germany, concerned with high performance computing.

Modern microscopes easily produce large data volumes (terabyte datasets) at high rates (1,000 megabytes/s is no exception), which makes using them almost impossible. Once an acquisition is started, it typically has to be stopped again as the hard drives fill up. We'll share how GPUs helped us bring this nightmare to an end. We'll introduce our open-source package, called sqeazy, which is capable of compressing microscopy data faster than a hard drive can spin. We'll show how GPUs provided a crucial boost in this endeavor, and we'll share the technical challenges we overcame interfacing with modern video encoding libraries, like libavcodec from ffmpeg. Finally, we'll discuss how NVENC provides portable performance that helps scientists observe living, developing specimens over long time spans. This may be the foundation for the predictive biology of the 21st century. Join us for a tour of how modern media technology straight from Hollywood can boost science!

Level: Intermediate
Type: Talk
Tags: Video and Image Processing; Healthcare and Life Sciences

Day: TBD
Time: TBD
Location: TBD

S7248 - GPU Computing for the Construction Industry: AR/VR for Learning, Planning, and Safety

Kyle Szostek Sr. Virtual Construction Engineer, Gilbane Building Company
Kyle Szostek is a senior virtual design and construction engineer who has been with Gilbane Building Company for the last four years, managing virtual design and construction services for over $2 billion of construction projects. He's focused his work on research and development of collaborative BIM workflows, visualization techniques, and AR/VR tools. With a background in 3D art and a bachelor of architecture degree from the University of Arizona, Kyle brings a unique 3D visualization skillset to Gilbane's VDC team.
Ken Grothman Sr. Virtual Construction Engineer, Gilbane Building Company
Ken Grothman is a senior virtual design and construction engineer who has been with Gilbane Building Company for two years, involved in over $1 billion of construction work, including high-end corporate, medical facilities, and mission-critical data infrastructure. Ken specializes in laser scanning and reality capture, and is an active member of the industry's laser scanning community. With a background in design-build architecture and a master of architecture degree from the University of Kansas, Ken brings a pragmatic, problem-solving skillset to Gilbane's VDC team.

We'll dive headfirst into some of the current challenges of the construction industry, how we're addressing them, and how we're planning to utilize virtual/augmented reality and real-time GPU computing to address them. To optimize the construction of a building, site logistics must be planned, and all systems analyzed and coordinated to confirm constructability. Along with the use of building information modeling (BIM) and the advent of inexpensive GPU and AR/VR hardware, we're building tools to redefine the planning and analysis process for construction management. No longer are virtual and augmented reality systems just for entertainment; they can help us plan faster, help confirm our client's design goals, and facilitate stronger communication among our team members before and during the construction process.

Level: Beginner
Type: Talk
Tags: Virtual Reality and Augmented Reality; AEC Industries
Industry Segments: Architecture / Engineering / Construction

Day: TBD
Time: TBD
Location: TBD

S7249 - Accelerated Event Selection at the CMS Experiment

Felice Pantaleo Senior Fellow, CERN
Felice Pantaleo is a particle physicist. He received his Ph.D. from the University of Hamburg, CERN, DESY. Felice has worked in the field of high-throughput computing for six years, and today he is working on real-time event reconstruction and selection for the CMS experiment at the Large Hadron Collider at CERN, Switzerland.

In 2019, the Large Hadron Collider will undergo upgrades to increase its luminosity by a factor of two compared to today's nominal luminosity. Currently, the CMS software parallelization strategy is oriented toward scheduling one event per thread. However, tracking time grows with the factorial of the pileup, leading the current approach to increased latency. When designing an HEP trigger stage, the average processing time is a main constraint, and the one-event-per-thread approach leads to a smaller-than-ideal fraction of events for which tracking is run. GPUs are becoming wider, with millions of threads running concurrently, and their width is expected to increase in the coming years. A many-threads-per-event approach would scale with the pileup, offloading the combinatorics onto the many threads available on the GPU. We'll discuss real-time event selection, how to avoid recurrent data movements between host and device, and how to increase throughput density.

Level: All
Type: Talk
Tags: Computational Physics; Astronomy and Astrophysics

Day: TBD
Time: TBD
Location: TBD

S7252 - An Efficient Connected Components Algorithm for Massively Parallel Devices

Jayadharini Jaiganesh Graduate Student, Texas State University
Jayadharini Jaiganesh is a graduate research assistant in the Efficient Computing Laboratory at Texas State University. She is pursuing a master's degree in computer science. She received her bachelor's degree with distinction from Madras Institute of Technology, Anna University, India. Her research interests include parallelization and performance optimization of irregular graph algorithms for GPUs and CPUs. Jayadharini is the recipient of a Graduate College scholarship and a Graduate Research Excellence award, both from Texas State. She was also a recipient of research funding from the Centre of Technology and Development Transfer, Government of Tamilnadu, India.

Learn how to efficiently parallelize connected components, an important irregular graph algorithm. Our CUDA implementation is asynchronous, lock free, converges rapidly, and employs load balancing. It is faster than other GPU codes on all 18 real-world and synthetic graphs we tested. We'll describe how to parallelize this graph algorithm by exploiting algorithmic properties, discuss important optimizations to improve the efficiency, and compare the performance with some of the fastest prior GPU implementations of connected components.
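
For readers unfamiliar with the problem, the sketch below shows the simple synchronous label-propagation style of connected components that GPU implementations parallelize: every vertex repeatedly adopts the smallest label among itself and its neighbors until nothing changes. The edge list is invented, and the talk's algorithm is asynchronous and lock-free rather than this naive variant.

```python
import numpy as np

edges = [(0, 1), (1, 2), (3, 4), (5, 5)]   # toy graph with three components
num_vertices = 6
labels = np.arange(num_vertices)           # every vertex starts in its own component

changed = True
while changed:
    changed = False
    for u, v in edges:                     # on a GPU, roughly one thread per edge
        low = min(labels[u], labels[v])
        if labels[u] != low or labels[v] != low:
            labels[u] = labels[v] = low
            changed = True

print(labels)                              # [0 0 0 3 3 5]: three components
```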

Level: Advanced
Type: Talk
Tags: Algorithms; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7253 - Kokkos Hierarchical Task-Data Parallelism for C++ HPC Applications

H. Carter Edwards Principal Member of Technical Staff, Sandia National Laboratories
Highly-Rated Speaker
H. Carter Edwards is the principal investigator and architect for the Kokkos project at Sandia National Laboratories. Carter has over three decades of experience in modeling and simulation software development and over two decades of experience in HPC, parallel processing, and C++ software development. For the last several years, his HPC focus has been on algorithms and programming models for thread-scalable and performance portable parallelism across next-generation platform node architectures. Carter has a B.S. and M.S. in aerospace engineering and a Ph.D. in computational mathematics. He represents Sandia on the ISO C++ language standard committee.

The Kokkos library provides C++ HPC applications with a performance portable programming model for disparate manycore architectures such as NVIDIA® Pascal™, AMD Fusion, and Intel Xeon Phi. Until last year Kokkos supported only composition of data parallel patterns (foreach, reduce, and scan) with range and hierarchical team parallel execution policies. Our latest parallel pattern is a dynamic, directed acyclic graph (DAG) of heterogeneous tasks where each task supports internal data parallelism. At GTC16 we presented preliminary results based upon just-in-time access to an early release of NVIDIA CUDA® 8. We've had a year to mature this highly challenging task-DAG capability and present results using the NVIDIA Pascal GPU.

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing; Tools and Libraries

Day: TBD
Time: TBD
Location: TBD

S7254 - GPU-Enabled Differential Dependency Network Analysis of Large Datasets

Gil Speyer Senior Postdoctoral Fellow, The Translational Genomics Research Institute
Gil Speyer researches computational biology in the Integrated Genomics Division at the Translational Genomics Research Institute, where he develops statistical and numerical applications for large biological datasets. He received his doctorate in electrical engineering at Arizona State University, and he has worked at the ASU Advanced Computing Center and the Mayo Clinic.

We present EDDY-GPU, a GPU-accelerated algorithm for identifying pathways enriched with differential dependencies between two conditions. One benefit of EDDY's statistical rigor has been high sensitivity, but it comes at considerable computational cost, which limits the size of data amenable to EDDY analysis. However, the ample and regular compute, coupled with a small memory footprint, positioned EDDY as an ideal candidate for GPU acceleration. Now complete, EDDY-GPU delivers a performance improvement of two orders of magnitude. This improvement opens up new opportunities for EDDY-GPU, such as (1) TCGA pan-cancer analysis to identify pathways perturbed by multiple mutations compared to wild-type, and (2) personalized target discovery for an individual tumor patient enabled by single-cell RNAseq profiles of the tumor sample.

Level: Intermediate
Type: Talk
Tags: Computational Biology

Day: TBD
Time: TBD
Location: TBD

S7255 - cuTT: A High-Performance Tensor Transpose Library for GPUs

Antti-Pekka Hynninen Developer Technology Engineer, NVIDIA
Antti-Pekka Hynninen is a developer technology engineer at NVIDIA, where he focuses on the GPU performance of molecular dynamics applications. Previously, Antti-Pekka worked at Oak Ridge National Laboratory and the National Renewable Energy Laboratory, where he was responsible for GPU porting and performance optimization of the NAMD and CHARMM applications. Antti-Pekka holds a Ph.D. in physics from Utrecht University and did postdoctoral research at Princeton University on Monte Carlo simulations of charged colloids.

We'll introduce cuTT, a tensor transpose library for GPUs that on average achieves over 70% of the attainable memory bandwidth, independent of tensor rank. Tensor transposing is important in many applications such as multi-dimensional Fast Fourier Transforms and deep learning, and in quantum chemistry calculations. Until now, no runtime library existed that fully utilized the remarkable memory bandwidth of GPUs and could perform well independent of tensor rank. We'll describe two transpose algorithms, "Tiled" and "Packed," which achieve high-memory bandwidth in most use cases, as well as their variations that take care of many important corner cases. We'll also discuss a heuristic method based on GPU performance modeling that helps cuTT choose the optimal algorithm for the particular use case. Finally, we'll present benchmarks for tensor ranks 2 to 12 and show that cuTT, a fully runtime library, performs as well as an approach based on code generation.
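
To convey the idea behind the "Tiled" algorithm in simple terms, the NumPy sketch below transposes a 2D matrix one TILE x TILE block at a time, so that both reads and writes stay contiguous within a block (on a GPU the block would be staged through shared memory). The tile size and the 2D restriction are simplifications; cuTT handles arbitrary-rank tensor permutations on the GPU.

```python
import numpy as np

TILE = 32   # assumed tile size; real kernels pick this to match shared memory

def tiled_transpose(a):
    rows, cols = a.shape
    out = np.empty((cols, rows), dtype=a.dtype)
    for i in range(0, rows, TILE):
        for j in range(0, cols, TILE):               # one thread block per tile on a GPU
            block = a[i:i + TILE, j:j + TILE]
            out[j:j + TILE, i:i + TILE] = block.T    # staged through shared memory on a GPU
    return out

a = np.arange(96 * 64).reshape(96, 64)
assert np.array_equal(tiled_transpose(a), a.T)
```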

Level: Intermediate
Type: Talk
Tags: Algorithms; Performance Optimization; Tools and Libraries; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7256 - Learning Light Transport the Reinforced Way

Ken Dahm Research Scientist, NVIDIA
Ken Dahm is a research scientist at the NVIDIA Advanced Rendering Center in Berlin. His areas of interest include medical imaging, deep learning, reinforcement learning and function approximation in computer graphics, combined density estimation and bidirectional path tracing using implicit multiple importance sampling, GPU-assisted ray front rendering, GPU-based ray marching with distance fields, and parallel algorithms for partition caches for divide-and-conquer ray tracing. Ken has an M.S. in visual computing from Saarland University.

We show that the equations for reinforcement learning and light transport simulation are related integral equations. After a brief introduction to reinforcement learning and light transport simulation, we visualize the correspondence between the equations by pattern matching. Based on this correspondence, we derive a scheme to learn importance while sampling path space. The new approach is demonstrated in a consistent light transport simulation algorithm that uses reinforcement learning to progressively learn probability density functions for importance sampling. Furthermore, we show that our method is easy to integrate into any existing path tracer and can greatly increase rendering efficiency.
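
A loose, hypothetical toy version of this correspondence: treat a few discrete scattering directions as "actions", keep a running value estimate Q of how much radiance each direction has returned so far, and sample directions proportionally to Q. All numbers and the discretization are invented; the actual method operates on path space inside a path tracer rather than on four named directions.

```python
import random

directions = ["up", "left", "right", "down"]
Q = {d: 1.0 for d in directions}          # optimistic start: roughly uniform sampling
ALPHA = 0.2                               # learning rate

def radiance(d):
    # Stand-in for tracing a ray: "up" happens to face the light source.
    return 5.0 if d == "up" else 0.1

for _ in range(2000):
    total = sum(Q.values())
    d = random.choices(directions, [Q[x] / total for x in directions])[0]
    Q[d] += ALPHA * (radiance(d) - Q[d])  # reinforcement-style value update

print({d: round(Q[d], 2) for d in directions})   # sampling mass concentrates on "up"
```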

Level: Intermediate
Type: Talk
Tags: Rendering and Ray Tracing; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7258 - Speed Up Deep Learning Service: When GPU meets Container Cloud

Yubo Li Research Staff Member, IBM
Yubo Li is a research staff member at IBM Research China, Beijing. He works on GPU enablement for Kubernetes, Mesos, and OpenStack clouds and on architecture design for deep learning. He is the chief architect of the GPU acceleration service on SuperVessel, an open-access cloud built on OpenStack and OpenPOWER machines. He is currently working on GPU enablement for container cloud infrastructures, including Kubernetes, Mesos, and Marathon.
Seetharami Seelam Research Staff Member, IBM
Seetharami Seelam is a Research Staff Member at the T. J. Watson Research Center. His current research interests include developing technology to deliver hardware, middleware, containers, and applications as-a-service on the cloud.

Infrastructure for cloud services is undergoing a revolution. Microservices built on container (Docker) infrastructure are widely used in production deployments, and open-source container cloud management frameworks, notably Mesos and Kubernetes, are widely adopted. We'll discuss our efforts and progress in enabling GPUs on container clouds with Mesos and Kubernetes. For Mesos, we brought GPU-enabled Mesos/Marathon infrastructure to Bluemix China Public as a commercial PaaS for deploying third-party cognitive services. In the community, we contributed our work back and completed preliminary GPU support in Mesos 1.0 and Marathon 1.3 with both the Mesos containerizer and the Docker containerizer, in collaboration with the Mesosphere and NVIDIA teams. For Kubernetes, we completed several major GPU feature extensions to support IBM's internal cognitive infrastructure and plan to contribute them back to the community soon.

Level: All
Type: Talk
Tags: Data Center and Cloud Computing; Deep Learning and AI; Federal

Day: TBD
Time: TBD
Location: TBD

S7260 - Microswimmers on Speed: Simulating Spheroidal Squirmers on GPUs

Elmar Westphal Scientific Programmer, Forschungszentrum Jülich GmbH
Highly-Rated Speaker
Elmar Westphal has been working as a programmer and cluster architect at Forschungszentrum Juelich for more than 15 years. He has most recently ported simulation programs from different fields of computational physics to single- and multi-GPU systems and developed CUDA-based building blocks, libraries, and applications, mostly for molecular dynamics and micromagnetism simulations.

Accurately simulating the movement of even small creatures like paramecia in a solvent requires an enormous amount of calculation. Most of these calculations are needed to simulate hydrodynamic interactions. Even small setups may contain millions of "water particles" surrounding the actual objects of our studies. These particles move and interact, with each other and with the microswimmers, and we have to keep track of properties like angular momentum as well as potential collisions between fluid particles and the actual swimmers. Using a GPU-based implementation of the multi-particle collision dynamics method allows us to compute the hydrodynamic interactions with very high performance, and the combination of C++11 and managed memory helped us implement additional GPU- and CPU-based processing with much less effort than it would have taken not long ago.

Level: All
Type: Talk
Tags: Computational Fluid Dynamics; Computational Physics; Computer Aided Engineering

Day: TBD
Time: TBD
Location: TBD

S7261 - A New Approach to Active Learning by Query Synthesis Using Deep Generative Networks

Jia-Jie Zhu Postdoctoral Fellow, Boston College
Jia-Jie Zhu is a postdoctoral research fellow at Boston College. His research interests include machine learning, deep learning and optimization. His recent work focuses on applying deep learning techniques to semi-supervised learning tasks. In 2010, Dr. Zhu graduated from Fudan University with a B.Sc. degree in mathematics. He received his Ph.D. degree in mathematics under the supervision of Prof. William Hager at the University of Florida in 2015. Outside his own research area, Zhu is also interested in reinforcement learning and game theory.

We'll introduce a new active learning algorithm that is made practical using GPUs. Active learning concerns carefully choosing training data to minimize human labeling effort. In a nutshell, we apply deep generative models to synthesize informative "queries" that, when answered by a human labeler, allow the learner to learn faster. The learning is "active" in the sense that these questions are synthesized in an online manner adaptive to the current knowledge, thus minimizing the number of queries needed. Unlike traditional supervised machine training, our training is performed mostly on machine-synthesized data. To our knowledge, this is the first work that shows promising results in active learning by query synthesis.

Level: Intermediate
Type: Talk
Tags: Computer Vision and Machine Vision; Deep Learning and AI; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7262 - A General Framework for Hybrid Stochastic Model Calibration on the GPU

Mark York Senior Quantitative Analyst, Renaissance Risk Management Labs
Mark York holds a Ph.D. in physics from McGill University, having specialized in mathematical physics (specifically numerical and approximate solutions in Yang-Mills theory) and possessing a strong background in C. Following this, he worked for over a year as a postdoctoral fellow at the Interdisciplinary Institute for Technological Innovation (3IT), shifting focus to applied research in photovoltaic technologies. His work at 3IT centered on numerical simulations of the characteristics (conversion efficiency, voltage, current, etc.) of photovoltaic devices, and he contributed to an ongoing string of papers reporting the highest monochromatic light-to-electricity conversion efficiency (69.5%) for a multi-junction PV device in the literature. Mark recently left academia to pursue a software startup focusing on the application of GPU technology to quantitative finance.

We'll present an overview of a GPU-based approach to calibrating hybrid models in finance, that is, fitting multi-factor correlated stochastic processes to market data (term structures and volatility surfaces). Examples of such models range from the relatively benign three-factor JY inflation model, to single-currency and forex equity baskets, up to a completely general basket of rate/inflation/equity/forex/credit processes described by a global correlation matrix. Due to the inherently multi-threaded nature of Monte Carlo path generation, and the availability of cuRAND, a GPU implementation vastly outperforms CPU or PDE solvers, which are plagued by high dimensionality. Details of the algorithm, as well as a demonstration and analysis of timings and memory limitations, will be covered.
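
As a minimal sketch of correlated multi-factor path generation, the NumPy example below draws independent normals, correlates them with the Cholesky factor of a global correlation matrix, and evolves each factor as a log-normal process. The correlation matrix, volatilities, and drifts are illustrative only; on a GPU, each path would typically be generated by its own thread using cuRAND.

```python
import numpy as np

corr = np.array([[1.0, 0.5, 0.2],        # assumed global correlation matrix
                 [0.5, 1.0, 0.3],
                 [0.2, 0.3, 1.0]])
chol = np.linalg.cholesky(corr)

n_paths, n_steps, dt = 10_000, 252, 1.0 / 252
sigma = np.array([0.20, 0.15, 0.10])     # per-factor volatilities (illustrative)
drift = np.array([0.02, 0.01, 0.03])

rng = np.random.default_rng(42)
z = rng.standard_normal((n_paths, n_steps, 3)) @ chol.T   # correlated shocks
log_paths = np.cumsum((drift - 0.5 * sigma**2) * dt
                      + sigma * np.sqrt(dt) * z, axis=1)
paths = np.exp(log_paths)                # shape: (paths, steps, factors)
print(paths[:, -1, :].mean(axis=0))      # terminal means per factor
```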

Level: Intermediate
Type: Talk
Tags: Finance

Day: TBD
Time: TBD
Location: TBD

S7263 - Bayesian Inference and Markov Chain Monte Carlo Algorithms on GPUs

Alexander Terenin PhD Student, UC Santa Cruz
Alexander Terenin is a Ph.D. student in statistics and applied mathematics at the University of California, Santa Cruz. His research focuses on Bayesian statistics at scale, especially Markov Chain Monte Carlo methods in novel hardware environments such as compute clusters and GPUs that are found in the big data setting. Prior to attending UCSC, he completed his bachelor's degree with a double major in statistics and psychology at the University of California, Santa Barbara, where he graduated with highest honors and was selected to be commencement speaker at graduation. His data science experience includes over a year at eBay, Inc., where he worked on natural language processing tasks for improving its search engine. He is the author of three papers available on arXiv and currently under review, and has given five presentations at leading international research conferences in statistics.
David Draper Professor, UC Santa Cruz
David Draper is a professor of statistics in the Department of Applied Mathematics and Statistics at the University of California, Santa Cruz. From 2012 to 2015, he was also a distinguished statistical scientist, visiting professor, and senior director of the Center of Excellence in Statistical Research (CESR) at eBay Research Labs in San Jose, Calif. David founded CESR in 2013. In 2015, he worked on a short-term data-science project as senior principal research scientist at Amazon Analytics in Seattle, Wash. He is a fellow of the American Association for the Advancement of Science, the American Statistical Association, the Institute of Mathematical Statistics, and the Royal Statistical Society. From 2001 to 2003, he served as president-elect, president, and past president of the International Society for Bayesian Analysis.

We'll discuss the Bayesian statistical paradigm and Markov Chain Monte Carlo (MCMC) algorithms - the cornerstone of modern Bayesian computation. Scalable MCMC for big datasets and complex models is currently an open research question. Using GPUs provides a promising and largely unexplored avenue for accelerating these algorithms, but is nontrivial, because MCMC is inherently sequential and has traditionally been considered difficult to parallelize. We'll show how Gibbs sampling, a widely used MCMC algorithm, can be effectively parallelized on GPUs for a large class of exchangeable hierarchical Bayesian models. Participants will learn the mathematical and hardware/software challenges in bringing GPUs to the Bayesian community. Background in Bayesian statistics or MCMC is not assumed.
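
For attendees new to MCMC, here is a plain serial Gibbs sampler for a bivariate normal with correlation rho, showing the alternating conditional draws that the talk parallelizes across GPU threads for hierarchical models. The target distribution and parameters are a textbook example, not the models from the talk.

```python
import numpy as np

rho, n_samples = 0.8, 5000
rng = np.random.default_rng(0)
samples = np.empty((n_samples, 2))
x, y = 0.0, 0.0

for i in range(n_samples):
    # Each full conditional is a univariate normal given the other coordinate.
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    samples[i] = (x, y)

print(np.corrcoef(samples[1000:].T)[0, 1])   # should be close to rho
```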

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Algorithms; Federal

Day: TBD
Time: TBD
Location: TBD

S7265 - Quasi-Recurrent Neural Networks - A Highly Optimized RNN Architecture for the GPU

Stephen Merity Senior Research Scientist, Salesforce Research
Stephen Merity is a senior research scientist at Salesforce Research, joining as part of the MetaMind acquisition. His recent publications have involved adding memory and attention mechanisms to neural networks. Prior to joining MetaMind, Stephen worked on big data at Common Crawl, data analytics at Freelancer.com, and online education at Grok Learning. Stephen holds a master's degree in computational science and engineering from Harvard University and a bachelor of information technology from the University of Sydney. Stephen can be found on Twitter at @smerity.
James Bradbury Research Scientist, Salesforce Research
James Bradbury is a research scientist at Salesforce Research, focusing on deep learning models for natural language processing. He is a graduate of Stanford University's linguistics program, and joined MetaMind—now Salesforce Research—in 2015. He is a coauthor of three deep learning research papers and a contributor to the Chainer neural network framework. James can be found on Twitter at @jekbradbury.

We introduce quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that provides predictive accuracy equal to or better than cuDNN LSTMs while being up to 16 times faster at train and test time than the highly optimized NVIDIA cuDNN LSTM implementation. This is possible by constructing an RNN architecture tailored to achieve high throughput on an NVIDIA GPU using convolutional layers, which apply in parallel across timesteps, and a minimalist recurrent pooling function written in CUDA, which applies in parallel across channels. We'll discuss in detail the design choices of the QRNN, including how to investigate GPU efficiency using the NVIDIA Visual Profiler, and finally our experiments on language modeling, sentiment classification, and character-level neural machine translation that show the advantages and viability of QRNNs as a basic building block for a variety of sequence tasks.
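
A rough NumPy sketch of the quasi-recurrent structure: gate values are computed for all timesteps at once (standing in for the convolutional layers, which parallelize across time), and only a cheap element-wise forget-gate pooling recurrence remains. The gate functions and shapes are simplified assumptions; the paper's masked convolutions and fused CUDA pooling kernel are not reproduced here.

```python
import numpy as np

T, D = 6, 4                                    # timesteps, channels
rng = np.random.default_rng(1)
x = rng.standard_normal((T, D))

# Stand-in for the convolutional layer: candidate values z and forget gates f
# for every timestep are produced at once (fully parallel across T).
z = np.tanh(x)
f = 1.0 / (1.0 + np.exp(-x))

# The only sequential part: element-wise recurrent pooling over time.
h = np.zeros((T, D))
prev = np.zeros(D)
for t in range(T):
    prev = f[t] * prev + (1.0 - f[t]) * z[t]   # parallel across channels on a GPU
    h[t] = prev

print(h.shape)
```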

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Tools and Libraries; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7266 - Higher Performance LBM Simulation on GPUs

Wei Ge Professor, Institute of Process Engineering
Wei Ge is a professor at the Institute of Process Engineering in Beijing. Wei is mainly engaged in multi-scale simulation of multi-phase systems, including fluidization, micro-/nano-flow and transport, granular and porous media flows. As project leader, he developed the Mole series multi-scale supercomputing systems to bridge simulations of molecular details to reactor performance. Wei is now working on virtual process engineering, trying to establish digital counterparts of real engineering processes through accurate realtime simulation and interactive realistic visualization on supercomputers. He has been the principal investigator of the GCOE at CAS-IPE. He is the author of over 140 journal papers and five monographs. He won the Outstanding Youth in Basic Science Award of Zhou Guangzhao Foundation, the P&G Outstanding Youth in Particuology, and the National Science Fund for Distinguished Young Scholars. He is associate editor of Chemical Engineering Science and director of the State Key Laboratory of Multi-Phase Complex Systems.

The Lattice Boltzmann method (LBM) has been used widely in simulations of turbulence, porous media flow, and multiphase flows. It is efficient thanks to its high parallelism and scalability; however, due to the low ratio of computational operations to memory accesses, LBM simulations are memory-bound and their actual performance is typically 10 to 15% of peak on both CPUs and GPUs. We'll introduce our efforts to boost its performance by reducing memory accesses and increasing computational operations, taking into account more complex physical processes and integrating statistical and visualization operations for interactive dynamic simulation of multiphase flows. Direct numerical simulation of gas-solid flow has been carried out using NVIDIA Tesla K80 and P100 GPUs with encouraging results.

Level: Advanced
Type: Talk
Tags: Computational Fluid Dynamics; Computer Aided Engineering

Day: TBD
Time: TBD
Location: TBD

S7267 - Automatic Compiler-Based Optimization of Graph Analytics for the GPU

Sreepathi Pai Postdoctoral Research Fellow, The University of Texas at Austin
Sreepathi Pai is a postdoctoral research fellow at the Institute for Computational Engineering and Sciences at the University of Texas, Austin. Sreepathi's research interests are in compilers and computer architecture for high-performance computer systems that feature accelerators like GPUs. Most recently, his work has revolved around optimizing compilers for high-performance irregular/graph algorithms on GPUs, including developing performance models for graph algorithms on GPUs. This work has led to the LonestarGPU 2.0 suite of GPU graph benchmarks and the IrGL compiler for irregular GPU algorithms. He obtained his Ph.D. in 2015 from the Indian Institute of Science, Bangalore, where he worked on schemes for minimally redundant automatic memory transfers between the CPU and GPU, and on improving concurrent execution capabilities in GPUs.

Learn how to use IrGL, our newly developed language and compiler, to obtain high-speed graph algorithm implementations without writing a lot of low-level NVIDIA(R) CUDA(R) code. IrGL can be used for parallel graph algorithm research, graph analytics, and graph database query processing. IrGL performance for graph algorithms meets or exceeds the performance of low-level handwritten CUDA code because our optimizing compiler automatically tackles three key challenges encountered in writing graph algorithms -- atomics, load imbalance due to serialization of loops, and kernel launch throughput -- freeing up the programmer to focus on higher-level optimizations. We'll introduce the IrGL language and its compiler, and show how attendees can use IrGL to target problems with irregular data parallelism.

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Programming Languages; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7268 - Fast GPU Monte Carlo Simulation for Radiotherapy, DNA Ionization and Beyond

Nick Henderson Research Associate, Stanford University
Highly-Rated Speaker
Nick Henderson is a research associate and instructor with the Institute for Computational and Mathematical Engineering at Stanford University. His research is focused on high performance computing, GPU-accelerated simulation, and mathematical optimization.
Shogo Okada Research Associate, Kobe University
Shogo Okada received his Ph.D. in science from Kobe University, Kobe, Japan, in 2011. He worked at KOBELCO Research Institute Inc., Kobe, Japan, as a researcher from 2011 to 2014. In 2014, he joined KEK-CRC, Tsukuba, Japan, as a postdoctoral research fellow, where he was dedicated to developing GPU Monte Carlo radiation simulators based on Geant4 for medical and biological applications, named MPEXS and MPEXS-DNA, in collaboration with SLAC, Stanford University, and CENBG. At GTC Japan 2016, he presented a poster on a CUDA implementation for radiation simulation at the DNA cellular level and received an NVIDIA award. In 2017, he started working at Kobe University as an assistant professor. He continues developing GPU applications running on the MPEXS framework.

Learn about techniques used to accelerate a Monte Carlo particle physics simulator. The strategies discussed include sorting to minimize thread divergence and data structures for efficient memory access. The software, named MPEXS, is primarily focused on X-ray radiotherapy and has recently been extended to the cellular and DNA levels. Simulation of DNA ionization is particularly challenging because large numbers of low-energy particles have to be managed. Implementation of these strategies has both improved run-time performance and reduced memory usage. The results from the performance analysis are likely to be of use in other domains that rely on discrete event simulation. Extension of the physics coverage to proton and carbon therapy and neutron radiation protection is envisioned.

Level: Intermediate
Type: Talk
Tags: Computational Physics; Algorithms; Healthcare and Life Sciences

Day: TBD
Time: TBD
Location: TBD

S7270 - Parallel Recursive Filtering for Image Processing

Diego Nehab Associate Professor, National Institute for Pure and Applied Mathematics (IMPA)
Diego Nehab is an associate professor at the National Institute for Pure and Applied Mathematics (IMPA) in Rio de Janeiro, Brazil. Before joining IMPA in 2010, Diego worked as a postdoctoral researcher at Microsoft Research in Redmond, Wash. He is interested in most topics related to computer graphics, but focuses on parallelism, real-time rendering, and image processing. He received B.Eng. and M.S. degrees in computer science from PUC-Rio in 2000 and 2002, respectively, and a Ph.D. in computer science from Princeton University in 2007.

We'll present a variety of recent advances in parallel recursive filtering on the GPU. Recursive filtering is one of the key operations in image processing. It can be used, for example, to invert the effect of convolutions, to enable the highest-quality image interpolation and antialiasing for rendering, and in the fastest implementations of image blurring. We'll cover the content of two SIGGRAPH Asia publications. One focuses on how to break the dependency chain to increase the amount of exposed parallelism while minimizing bandwidth requirements. The other describes the first method to enable exact filtering of infinite input extensions. The resulting algorithms offer a complete solution to recursive filtering on the GPU. Our implementations are freely available as open source.
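
For context, a causal first-order recursive filter computes y[i] = b*x[i] - a*y[i-1], so each output depends on the previous one; that serial dependency chain is exactly what the GPU algorithms above must break. A minimal serial reference (an illustrative sketch, not the parallel algorithms from the papers):

    // Causal first-order recursive (IIR) filter: y[i] = b*x[i] - a*y[i-1],
    // assuming a zero initial condition. The dependence on y[i-1] is the
    // chain that the parallel GPU formulations have to restructure.
    void recursive_filter_first_order(const float* x, float* y, int n, float b, float a)
    {
        float prev = 0.0f;
        for (int i = 0; i < n; ++i) {
            prev = b * x[i] - a * prev;
            y[i] = prev;
        }
    }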

Level: All
Type: Talk
Tags: Video and Image Processing; Algorithms; Media and Entertainment

Day: TBD
Time: TBD
Location: TBD

S7272 - Urban Scale Crowd Data Analysis, Simulation, and Visualization

Isaac Rudomin Senior Researcher, Barcelona Supercomputing Center
Isaac Rudomin is a senior researcher at the Barcelona Supercomputing Center, which he joined in 2012. His focus is on crowd rendering and simulation, including generating, simulating, animating, and rendering large and varied crowds using GPUs in consumer-level machines and in HPC heterogeneous clusters with GPUs. Previously, Isaac was on the faculty at Tecnologico de Monterrey, Campus Estado de Mexico. He finished his Ph.D. at the University of Pennsylvania under Norman Badler on the topic of cloth modeling; Dr. Dmitri Terzopoulos was a member of the committee.

We'll dive deep into how we use heterogeneous clusters with GPUs for accelerating urban-scale crowd data analysis, simulation, and visualization. Our main contributions are the development of new behavior models that conform to real data, the ability to scale the system by adding computing resources as needed without making programming modifications, and the combination of analysis, simulation, and visualization techniques that help us achieve large-scale crowd simulations with realistic behavior.

Level: All
Type: Talk
Tags: In-Situ and Scientific Visualization; HPC and Supercomputing; Computational Physics; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7274 - Similarity Mapping with Enhanced Siamese Network for Multi-Object Tracking

Minyoung Kim Senior Research Engineer, Panasonic Silicon Valley Laboratory
Minyoung Kim is a senior research engineer at Panasonic Silicon Valley Laboratory. She has been working on deep learning projects related to ADAS, such as real-time object detection and tracking, in which she trains deep neural networks, implements new algorithms, builds optimized systems, and researches new technologies. Minyoung received her master's in computer science, specializing in artificial intelligence, from Stanford University in 2009.

We'll describe how an enhanced Siamese neural network can be used for similarity mapping, ultimately for multi-object tracking. By fusing both appearance and geometric information into a single enhanced Siamese neural network, trainable end-to-end on a single GPU machine, the object tracking system achieves competitive performance in both speed and accuracy on several benchmarks. We'll show a video demonstration of the system on a laptop PC with a GeForce GTX 1070.
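
For reference, Siamese networks for similarity mapping are commonly trained with a contrastive loss of the following general form (a standard textbook formulation, not necessarily the exact loss used in this work), where d is the distance between the two embeddings, y indicates whether the pair shows the same object, and m is a margin:

    L = y \, d^2 + (1 - y) \, \max(0,\; m - d)^2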

Level: Intermediate
Type: Talk
Tags: Self-Driving Cars; Deep Learning and AI; Computer Vision and Machine Vision

Day: TBD
Time: TBD
Location: TBD

S7275 - Real-Time Monitoring of Financial Risk Management on GPU

Jie Zhang Software Engineer, Shanghai Clearing House
Jie Zhang is a software engineer working for Shanghai Clearing House, the exclusive central counterparty (CCP) for the OTC market in China, and the first joint-stock company and sub-institution of the People's Bank of China (PBC). Based on a strong sense of risk prevention and an effective risk management framework, Shanghai Clearing House offers centralized and standardized RMB and foreign currency clearing services for financial market spot and derivatives transactions as well as RMB cross-border transactions approved by PBC. Jie is focused on implementing financial computing engines for securities and derivatives. He has been involved in the development of risk management modules in the interest rate swap centralized clearing system and the credit default swap centralized clearing system. He now leads the evaluation system for fixed-income securities, which supports security evaluation and the China Credit Bond index at Shanghai Clearing House.

Option-embedded bond pricing and Value at Risk (VaR) computations have become hotspots in financial risk management since the 2008 financial crisis. The goal of this work is to implement real-time option-embedded bond pricing and VaR computations in the production system of Shanghai Clearing House, the exclusive central counterparty for the over-the-counter market and the major securities settlement system in China. We developed both CUDA and OpenACC implementations on the GPU. The results show that the CUDA versions achieved a 60x speedup for option-embedded bond pricing and a 10x speedup for VaR. In addition, the OpenACC versions deliver portable performance for both option-embedded bond pricing and VaR computations.

Level: All
Type: Talk
Tags: Finance

Day: TBD
Time: TBD
Location: TBD

S7277 - Computer Virtual Experiment on Fluidized Beds Using GPU Accelerated CFD-DEM Method

Ji Xu Associate Professor, Institute of Process Engineering, Chinese Academy of Sciences
Ji Xu is an associate professor at the Institute of Process Engineering, Chinese Academy of Sciences.

Learn how to use GPUs to accelerate CFD-DEM, the computational fluid dynamics - discrete element method, to achieve computer virtual experiments on fluidized beds in the chemical engineering field. We'll discuss how to organize the gas- and solid-phase equations so they are solved concurrently by CPUs and GPUs in a heterogeneous supercomputing system. With systematic optimization of the model, numerical method, software, and hardware, we can simulate lab- to pilot-scale fluidized beds at quasi-realtime speed, and we'll conduct demos of such systems. Our method enables real applications that require very long simulation times.

Level: All
Type: Talk
Tags: Computer Aided Engineering; Computational Fluid Dynamics; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7278 - Machine Vision for Large Scale Aerial Images

Zbigniew Wojna Co-Founder, Tensorflight
Zbigniew Wojna has worked for eight months at DeepMind on machine vision projects for the DeepMind Applied Health team, and four months across the Google Brain, Machine Perception, and StreetSmart teams at Google. Before DeepMind, he was an engineer at NVIDIA, Microsoft, and Google. Zbigniew built the best single model in the ImageNet 2015 competition (the biggest machine vision benchmark) and won the MS COCO 2016 detection challenge (the biggest detection benchmark). He has also built a text transcription system for street name signs from Google Street View imagery. He is responsible for providing highly accurate machine vision models. Zbigniew currently lives in London.

Tensorflight is building a digital brain capable of understanding the world from the sky. Its approach is based on deep convolutional neural networks, inspired by visual processing in the human brain. The company partners with DroneDeploy, the leading platform for collecting aerial imagery via drones. Our first machine vision solution allows us to count different types of objects such as trees, crops, cars, and livestock. It is based on state-of-the-art research with ideas not yet published. Models are easily scalable and deployed on the cloud with almost real-time analysis. We'll tell you about the models Tensorflight uses and how we have solved the problems of processing maps that are 30,000 x 30,000 pixels in size.

Level: All
Type: Talk
Tags: Intelligent Machines and IoT; Deep Learning and AI; Computer Vision and Machine Vision; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7280 - CLBlast: A Tuned BLAS Library for Faster Deep Learning

Cedric Nugteren GPU / deep learning specialist, TomTom
Cedric Nugteren lives and breathes GPU technology. Cedric ported image processing apps to CUDA during his M.S. studies as early as 2008 and received his Ph.D. in 2014 after publishing 15 peer-reviewed GPU-related articles. He interned at ARM's Mali GPU compiler group in 2012 and NVIDIA's cuFFT team in 2014. After his Ph.D., he worked as a GPU-expert at the Dutch supercomputer center and as a computer vision R&D performance engineer at Blippar. Currently he uses deep learning and GPUs for autonomous driving and HD-mapping at TomTom in Amsterdam. His free time is also well spent: he works on a tuned C++11 version of the OpenCL BLAS library with half-precision FP16 support.

We'll demonstrate how to accelerate dense linear algebra computations using CLBlast, an open-source OpenCL BLAS library providing optimized routines for a wide variety of devices. It is targeted at deep learning training and inference and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the convolutional layers: the computational heart of all deep-learning frameworks (TensorFlow, Caffe, etc.). CLBlast has three main advantages over other BLAS libraries: 1) it can be explicitly tuned for specific matrix-sizes and hardware platforms, 2) it runs on less common devices (and it is fast), such as embedded and low-power GPUs, and 3) it can perform operations in half-precision FP16 format, saving precious bandwidth, time, and power.
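
As background (a naive reference, not CLBlast's tuned OpenCL kernels), GEMM computes C = alpha*A*B + beta*C; a convolutional layer is typically lowered to such a multiplication with, for example, M = number of filters, K = channels x kernel height x kernel width, and N = output height x output width:

    // Naive reference GEMM: C (MxN) = alpha * A (MxK) * B (KxN) + beta * C.
    // CLBlast provides tuned (and half-precision) OpenCL implementations of
    // this routine; the loop below only documents its semantics.
    void gemm_reference(int M, int N, int K, float alpha, const float* A,
                        const float* B, float beta, float* C)
    {
        for (int m = 0; m < M; ++m)
            for (int n = 0; n < N; ++n) {
                float acc = 0.0f;
                for (int k = 0; k < K; ++k)
                    acc += A[m * K + k] * B[k * N + n];
                C[m * N + n] = alpha * acc + beta * C[m * N + n];
            }
    }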

Level: Intermediate
Type: Talk
Tags: Tools and Libraries; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7281 - Device Lending: Dynamic Sharing of GPUs in a PCIe Cluster

Jonas Markussen PhD student, Simula Research Laboratory
Jonas Markussen is a Ph.D. student at Simula Research Laboratory and the University of Oslo, Norway. Jonas's research interests are distributed processing, computer networks, and high-speed interconnects. At Simula, he is involved in the Unified PCIe IO project, a project in collaboration with Dolphin Interconnect Solutions.

Learn how GPUs can be time-shared between multiple hosts connected in a PCIe cluster using a method called device lending. Unlike approaches for sharing GPUs that typically require specific programming models, device lending makes a GPU appear to the operating system as if it is locally installed. This allows the GPU to be controlled and used by a remote host without any modifications to existing software. We'll present how device lending is implemented using standard PCIe and non-transparent bridging. As a proof-of-concept, we accelerate EIR, a computer-aided medical diagnosis system using machine learning and computer vision to do polyp detection, from being an offline tool to giving real-time feedback by dynamically borrowing remote GPU resources.

Level: Intermediate
Type: Talk
Tags: Data Center and Cloud Computing; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7282 - GPU-Accelerated Convolutional Neural Networks for Protein-Ligand Scoring

David Koes Assistant Professor, University of Pittsburgh
David Koes has a longstanding interest in low-level computer systems. He worked as a compiler developer for Green Hills Software prior to receiving his Ph.D. in computer science from Carnegie Mellon University in 2009. After completing his thesis, "Towards a More Principled Compiler," which explored novel approaches to optimal backend compiler optimization, he switched research directions to pursue computational drug discovery. Since launching this new research direction, he has developed a number of immediately useful, innovative applications that enable interactive drug discovery: Pharmit, smina, 3Dmol.js, PocketQuery, shapedb, Pharmer, ZINCPharmer, and AnchorQuery. These technologies work together to make the "big data" of chemical space accessible to any researcher with a web browser. Using these tools, he has participated in a number of applied drug discovery projects, including the winning Teach-Discovery-Treat entry, which achieved a 33% hit rate against the anti-malarial DHODH enzyme.

We'll describe a convolutional neural network that takes as input a comprehensive 3D representation of a protein-ligand interaction and predicts whether the ligand (a small molecule, like a drug) binds to the protein. We'll provide a brief orientation in structure-based drug design, describe how we effectively use the GPU to efficiently train, evaluate, and visualize our neural networks, and discuss preliminary results and current limitations. Our CNN scoring function outperforms the conventional AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.

Level: Intermediate
Type: Talk
Tags: Computational Biology; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7285 - Unified Memory on the Latest GPU Architectures

Nikolay Sakharnykh Senior Developer Technology Engineer, NVIDIA
Nikolay Sakharnykh is a senior developer technology engineer at NVIDIA, where he works on accelerating HPC and graph analytics applications on GPUs.

Learn about the new features of the Unified Memory programming model for heterogeneous architectures. We'll deep dive into the architecture and software changes related to Unified Memory, what they mean for developers, and how they enable easier data management and new capabilities for your applications. We'll cover in detail Unified Memory features such as on-demand paging, memory oversubscription, memory coherence, and system-wide atomics. Use cases in HPC, deep learning, and graph analytics will be provided along with initial performance results. We'll also discuss common pitfalls and optimization guidelines so you can take full advantage of Unified Memory to increase your productivity.
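
A minimal sketch of the basic programming model (standard CUDA Unified Memory usage; the architecture-specific features above build on top of this):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float* data, int n, float factor)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main()
    {
        const int n = 1 << 20;
        float* data = nullptr;
        cudaMallocManaged(&data, n * sizeof(float));    // one pointer, visible to CPU and GPU
        for (int i = 0; i < n; ++i) data[i] = 1.0f;     // touched first on the CPU
        scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f); // pages migrate to the GPU on demand
        cudaDeviceSynchronize();                        // required before the CPU reads again
        printf("data[0] = %f\n", data[0]);
        cudaFree(data);
        return 0;
    }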

Level: All
Type: Talk
Tags: Programming Languages; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7286 - A High-Quality and Fast Maximal Independent Set Algorithm for GPUs

Martin Burtscher Professor, Texas State University
Highly-Rated Speaker
Martin Burtscher is a professor in the Department of Computer Science at Texas State University. He received the B.S./M.S. degrees from ETH Zurich and Ph.D. from the University of Colorado at Boulder. His research focuses on the parallelization of complex algorithms for GPUs, high-speed data compression, and energy-efficiency optimization. Martin is a distinguished member of the ACM and a senior member of the IEEE. He has co-authored over 100 peer-reviewed publications, including a book chapter in NVIDIA's GPU Computing Gems, is the recipient of an NVIDIA Academic Partnership award, and is the principal investigator of a CUDA Teaching Center.

Learn how to efficiently parallelize maximal independent set computations for GPUs. Our CUDA implementation is at least three times faster than the leading GPU codes on every one of the 16 real-world and synthetic graphs we tested. Moreover, it produces a larger maximal independent set in all but one case. It is asynchronous, atomic-free, and requires fewer than 30 kernel statements. We'll present the included code optimizations to achieve heretofore unreached performance and describe how to exploit monotonicity to minimize the memory footprint of this important irregular graph algorithm.
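
For orientation (a baseline random-priority scheme in the spirit of Luby's algorithm, not the optimized implementation presented in this session), one round of a parallel MIS computation over a CSR graph might look like the following; the host relaunches both kernels until no node remains undecided:

    enum { UNDECIDED = 0, IN_MIS = 1, EXCLUDED = 2 };

    // A node joins the set if its random priority beats every undecided neighbor.
    __global__ void mis_select(const int* rowptr, const int* colidx,
                               const unsigned* prio, int* state, int n, int* changed)
    {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= n || state[v] != UNDECIDED) return;
        bool best = true;
        for (int e = rowptr[v]; e < rowptr[v + 1]; ++e) {
            int u = colidx[e];
            if (state[u] == IN_MIS ||
                (state[u] == UNDECIDED &&
                 (prio[u] > prio[v] || (prio[u] == prio[v] && u > v)))) { best = false; break; }
        }
        if (best) { state[v] = IN_MIS; *changed = 1; }
    }

    // Neighbors of newly selected nodes drop out of consideration.
    __global__ void mis_exclude(const int* rowptr, const int* colidx, int* state, int n)
    {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= n || state[v] != UNDECIDED) return;
        for (int e = rowptr[v]; e < rowptr[v + 1]; ++e)
            if (state[colidx[e]] == IN_MIS) { state[v] = EXCLUDED; return; }
    }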

Level: Advanced
Type: Talk
Tags: Algorithms; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7289 - 3D Human Motion Capture from 2D Video Using Cloud-Based CNNs

Paul Kruszewski Founder, wrnch
As a serial entrepreneur, Paul Kruszewski has been hustling and hacking in visual computing since he was 12, when he leveraged a $250 livestock sale into a $1,000 TRS-80 Color Computer. Soon after, he wrote his first video game. Paul went on to obtain a Ph.D. in probabilistic algorithmic analysis from McGill University. In 2000, he founded AI.implant and developed the world's first real-time navigation middleware for 3D humans. AI.implant was acquired in 2005 by Presagis, the world's leading developer of software tools for military simulation and training. In 2007, he founded GRIP and developed the world's first brain authoring system for video game characters. GRIP was acquired in 2011 by Autodesk, the world's leading developer of software tools for digital entertainment. In 2014, he founded wrnch to democratize computer vision technology.

This talk provides a brief overview of how to apply GPU-based deep learning techniques to extract 3D human motion capture from standard 2D RGB video. We describe in detail the stages of our CUDA-based pipeline from training to cloud-based deployment. Our training system is a novel mix of real-world data collected with Kinect cameras and synthetic data based on rendering thousands of virtual humans generated in the Unity game engine. Our execution pipeline is a series of connected models, including 2D video to 2D pose estimation and 2D pose to 3D pose estimation. We describe how this system can be integrated into a variety of mobile applications ranging from social media to sports training. A live demo using a mobile phone connected to an AWS GPU cluster will be presented.

Level: Beginner
Type: Talk
Tags: Media and Entertainment; Game Development; Deep Learning and AI; Computer Vision and Machine Vision; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7290 - GPU-Accelerated Natural Language Processing

Guillermo Moliní CTO, Wavecrafters
Guillermo Molini is a computer scientist at Wavecrafters Madrid, specializing in CUDA and machine learning, with a recent emphasis on natural language processing. Guillermo is a versatile programmer with strong experience in C, C++, and parallelization with CUDA, and has worked on developing applications for the cloud. He started his career as a software tester at BMW.

We'll give an introduction to natural language processing on GPUs. So far, GPUs are not used in big data as much as they should be. We'll show how GPUs can bring deep learning techniques into production for large big data systems. We'll discuss some of the possible use cases of NLP, and we'll see why the techniques used up until now haven't been enough. We'll talk about vector embeddings and see in a live demo why they do convey the semantic information we're looking for when processing language.
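
As a small, generic illustration of how embeddings convey semantics (not the production system described above): related words map to nearby vectors, and relatedness is usually measured with cosine similarity.

    #include <cmath>

    // Cosine similarity between two embedding vectors of dimension dim;
    // values close to 1 indicate semantically similar words.
    float cosine_similarity(const float* a, const float* b, int dim)
    {
        float dot = 0.0f, na = 0.0f, nb = 0.0f;
        for (int i = 0; i < dim; ++i) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12f);
    }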

Level: All
Type: Talk
Tags: Deep Learning and AI; Accelerated Analytics; Federal

Day: TBD
Time: TBD
Location: TBD

S7293 - Detecting Topological Changes in Dynamic Delaunay Triangulations Using CUDA

Massimo Bernaschi Prof., National Research Council of Italy
Highly-Rated Speaker
Massimo Bernaschi is chief technology officer of the Institute for Applied Computing at the National Research Council (CNR) of Italy. He is also an adjunct professor of systems programming at Sapienza University in Rome, and a trainer in digital forensics at Sapienza and Modena universities. Before joining CNR in 1998, Massimo worked for 10 years at the IBM European Center for Scientific and Engineering Computing, where he developed the IBM PVMe product and received two Outstanding Technical Achievement Awards. His main scientific interests are parallel computing; modeling of complex systems (finance and biology); systems and network security; and high performance computing. He is the author of about 150 papers in peer-reviewed journals and international conferences. Massimo started working with CUDA in 2008. In 2012 he was named a CUDA Fellow. He has been a finalist in the Gordon Bell challenge in 2010, 2011, 2013, and 2015.

Learn how to detect topological changes that occur in dynamic 2D Delaunay triangulations using CUDA. We'll present a novel, unified approach that can be applied in all those cases (pedestrian tracking, flocking, moving bubbles, etc.) where objects are triangulated starting from a density map. Topological changes are detected by comparing two subsequent triangulations, and they show up as "flipped edges." We'll show new physics results, due to the unprecedented statistics of detection of irreversible topological changes occurring in the triangulation of the droplets of a Lattice Boltzmann emulsion, allowed by our implementation. Such changes are associated with the so-called plastic events that are responsible for the complex behavior of emulsions possessing both liquid and solid features at the same time. In our implementation, we used a suitable mix of in-house developed CUDA kernels and primitives from existing CUDA libraries.

Level: Intermediate
Type: Talk
Tags: Algorithms; Computational Physics; Computational Fluid Dynamics; Computer Aided Engineering

Day: TBD
Time: TBD
Location: TBD

S7294 - Controlling Hundreds of GPU-Powered Plasma-Physics Simulations with Machine Learning Algorithms

Remi Lehe Postdoctoral Fellow, Lawrence Berkeley National Laboratory
Remi Lehe is a postdoctoral researcher at Lawrence Berkeley Laboratory, where he works on large-scale plasma simulations and advanced spectral algorithms. He is also involved in several open-source scientific projects, including the particle-in-cell (PIC) codes Warp and FBPIC. Remi graduated in physics from Ecole normale superieure, Paris, and obtained a Ph.D. from Ecole Polytechnique, France, where he studied plasma-based particle accelerators. His work on these accelerators is largely based on PIC simulations, and in particular he developed an alternative finite-difference Maxwell solver, which is now implemented in several PIC codes used by research teams throughout the world (Osiris, PIConGPU, Warp, Calder).

Better hardware and algorithms have made plasma-physics particle-in-cell codes much faster. Instead of running individual simulations, it's now common to explore the space of physical parameters with large sets of simulations. However, predefined regularly spaced parameter scans can be inefficient and expensive. Instead, we use an adaptive algorithm that learns from previous simulations and determines the most promising parameters to try next. We illustrate this method on the problem of electron injection in laser-wakefield acceleration. Using hundreds of GPU-powered simulations with the code FBPIC on the Titan cluster at ORNL, the algorithm quickly focuses on the most relevant regions of the explored parameter space.

Level: Intermediate
Type: Talk
Tags: Computational Physics

Day: TBD
Time: TBD
Location: TBD

S7295 - Are We Done with Object Recognition? The R1-Robot Perspective.

Giulia Pasquale Ph.D. Candidate, Istituto Italiano di Tecnologia
Giulia Pasquale is a Ph.D. candidate in deep learning for robotic vision at the Istituto Italiano di Tecnologia, iCub Facility, and at the University of Genoa, Italy, Department of Informatics, Bioengineering, Robotics, and Systems Engineering (DIBRIS), under the supervision of Professor Lorenzo Natale and Professor Lorenzo Rosasco. In 2010, Giulia graduated with honors in biomedical engineering at the University of Genoa, and in 2013 she received, with honors, a master's degree in bioengineering (neuroengineering curriculum) at the same university. Her research activity involves machine learning, computer vision, robotics, and GPU computing, focusing on visual recognition for robotic systems. She is particularly interested in representation learning, incremental and lifelong learning, and perceptual and implicit learning, with the ultimate goal of developing reliable, adaptive, and efficient visual recognition systems for autonomous machines.

Today, deep learning has achieved such stunning results in visual recognition as to raise the question of whether this problem is actually solved. Should this be the case, the advantages for robotics could be dramatic. Indeed, the lack of reliable visual skills is a major bottleneck for robot deployment in everyday life. With this in mind, we started an effort to quantify the benefits and limits, if any, of DL in the context of robot vision. By exploiting R1, our latest humanoid equipped with an NVIDIA(R) Jetson(TM) TX1, we investigated key differences between robot vision and other applications where DL typically excels, such as image retrieval. Our study identified critical issues to be tackled via computer vision and machine learning while taking advantage of a robot platform. Our results confirm the huge impact of DL, testified by the great real-time recognition capabilities of R1, while pointing at specific open challenges that need to be addressed for seamless deployment in robotics.

Level: Intermediate
Type: Talk
Tags: Intelligent Machines and IoT; Computer Vision and Machine Vision; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7296 - CloudLighting: Merging GPU-based HPC with Cloud Services

Anne C Elster Professor of High Performance Computing, Norwegian University of Science & Technology / University of Texas at Austin
Anne Elster is a professor at the Norwegian University of Science and Technology, a GPU Research Center, where she runs the Heterogeneous and Parallel Computing Lab (HPC-Lab). She is also a visiting scientist at the University of Texas at Austin, a CUDA Teaching Center.

Learn how you can integrate GPU-enabled HPC and cloud computing by building on recent container technologies and integration. This presentation will highlight the efforts we are making as part of the EU Horizon 2020 project CloudLighting, where we look at how to integrate heterogeneous computing with cloud technologies.

Level: Intermediate
Type: Talk
Tags: Data Center and Cloud Computing; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7297 - VKHL - A High Level Vulkan Framework to Simplify Your Life

Markus Tavenrath Senior Developer Technology Engineer, NVIDIA
Markus Tavenrath is a senior developer technology engineer at NVIDIA, focused on rendering technologies that bring high interactivity to complex scenes. This work includes both GPU and CPU strategies to solve typical scene-graph operations related to rendering. In addition, Markus works on various Vulkan projects, including Vulkan-Hpp, the C++ bindings for Vulkan. Previously, he implemented parts of OptiX, improved SceniX, NVIDIA's scene-graph technology, and developed several ray-tracing demos. He also worked in close cooperation with external partners to improve rendering performance and scene-graph usability. Upon first joining NVIDIA, Markus worked primarily on GPU ray-tracing for SceniX. He finished his studies in computer science with a focus on computer graphics in 2008, and was one of the first using ray-tracing on CUDA for his diploma thesis.

We'll provide an overview of VKHL, a framework that can simplify your Vulkan development by using features like RAII and resource tracking, and by providing interfaces for functionality like sub-allocation.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics

Day: TBD
Time: TBD
Location: TBD

S7298 - Blasting Sand with NVIDIA CUDA: MPM Sand Simulation for VFX

Gergely Klar Software Engineer, DreamWorks Animation
Gergely Klar received his Ph.D. from the University of California, Los Angeles. During his graduate studies he worked on a range of physically based animation projects, including MPM, SPH, and FEM simulations. Gergely joined DreamWorks Animation's FX Research and Development team, where he is helping artists create more magnificent effects. He is a Fulbright Science and Technology alumnus, avid sailor, and father of two.
Ken Museth Senior Principal Engineer, DreamWorks Animation
Ken Museth is the manager and senior principal engineer of research and development in visual effects at DreamWorks Animation. He invented VDB, the enabling technology of OpenVDB, an open-source library for efficient storage and simulation of VFX that is setting a new standard in the movie industry. For this, he received a technical Academy Award in 2015. Prior to joining DreamWorks Animation in 2009, he worked on VFX in live action at Digital Domain for three years, was a full professor in computer graphics at Linkoping University for five years, and a senior research scientist at Caltech for five years. During the last period he also worked at NASA's Jet Propulsion Laboratory on trajectory design for the "Genesis" space mission. Since early 2014, Ken has also worked part-time for SpaceX on CFD simulations of the new Raptor engine powering the main stage of the Interplanetary Transport System. Ken holds a Ph.D. in theoretical physical chemistry from Copenhagen University.

We'll present our challenges and solutions for creating a material point method (MPM)-based simulation system that meets the production demands of fast turnaround for artistic look development. Our method fully utilizes the GPU and performs an order of magnitude faster than the latest published results. With this improvement, the technique's main limiting factor - its speed - has been eliminated, making MPM appealing for a wider range of VFX applications. Practitioners in computational physics and related fields are likely to benefit from attending the session as our techniques are applicable to other hybrid Eulerian-Lagrangian simulations.

Level: Intermediate
Type: Talk
Tags: Media and Entertainment; Computational Physics
Industry Segments: Media & Entertainment; Games

Day: TBD
Time: TBD
Location: TBD

S7303 - Finding Parallelism in General-Purpose Linear Programming

Daniel Thuerck Ph.D. Student, Technical University Darmstadt, Graphics, Capture and Massively Parallel Computing
Daniel Thuerck is a first-year Ph.D. student at GCC, TU Darmstadt. He earned his B.S. in computer science and M.S. in computational engineering with a strong focus on optimization, both at TU Darmstadt. Daniel's research focuses mainly on parallel optimization algorithms, especially with applications in visual computing. He has interned with NVIDIA Research twice in the past year and currently works on parallel linear programming.
Maxim Naumov Senior Research Scientist, NVIDIA
Highly-Rated Speaker
Maxim Naumov is a senior research scientist at NVIDIA. His interests include parallel algorithms, numerical linear algebra, optimization, and graphs. Maxim contributes to the nvGRAPH data analytics library and has led the development of the AmgX library, which provides distributed algebraic multigrid, Krylov, and relaxation-based schemes. He has also worked on the cuBLAS, cuSPARSE, and cuSOLVER(RF) libraries that are part of the CUDA toolkit. In the past, Maxim held different positions at NVIDIA, including on the CUDA Platform and Emerging Applications teams, and at Intel in the Microprocessor Technology Lab and Computational Software Lab. Maxim received his Ph.D. in computer science, with a specialization in computational science and engineering, in 2009 and his B.S. in computer science and mathematics in 2003, both from Purdue University - West Lafayette.

Get to know two different techniques for retrieving parallelism hidden in general-purpose linear programs (LPs), which are broadly used in operations research, computer vision, and machine learning. With conventional solvers often being restricted to serial computation, we'll show two ways of retrieving inherent parallelism, using: (1) parallel sparse linear algebra techniques with an interior-point method, and (2) a higher-level automatic LP decomposition. After a quick introduction to the topic, we'll present details and results for a diverse range of applications on the GPU.
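
For reference, such general-purpose LPs can be written in the standard form below; each iteration of an interior-point method then amounts to solving a large sparse Newton/KKT linear system, which is where the parallel sparse linear algebra of approach (1) applies:

    \min_{x} \; c^{T} x \quad \text{subject to} \quad A x = b, \;\; x \ge 0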

Level: Intermediate
Type: Talk
Tags: Algorithms; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7304 - Augmented Real-Time Mapping at Traffic Intersections

Menna El-Shaer Doctoral Student/Researcher, The Ohio State University
Menna El-Shaer is an electrical and computer engineering Ph.D. student at The Ohio State University. Menna got her B.S. in computer engineering from Ain Shams University, Cairo, Egypt, in 2009. She has extensive computer programming experience and has worked on multiple computer vision and image processing projects sponsored by the Office of Naval Research, the National Institutes of Health, and the National Science Foundation. Menna has also taught multiple computer programming and hardware design courses at Wright State and Ohio State universities.

Detecting objects whether they are pedestrians, bicyclists, or other vehicles at a traffic intersection is essential to ensure efficient traffic flow through the intersection and the safety of all traffic participants. We'll present methods to reconstruct the traffic scene from the vehicle's point of view using multiple cameras placed on the vehicle in real time. We'll use a mixture of deep semi-supervised learning models to infer the objects from the scene. We'll also present how we optimized our models to run on the Tegra SoC used in NVIDIA's Jetson TX1 and the DRIVE PX platforms. Participants are expected to be familiar with basic probability concepts and GPU programming with CUDA.

Level: Advanced
Type: Talk
Tags: Self-Driving Cars; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7305 - Super GPU: Make Programming of Multi-GPU Systems Easy

Michael Frumkin Sr. Compute Architect, NVIDIA
Michael Alex Frumkin is a senior compute architect at NVIDIA, working on performance optimization and analysis of large-scale applications. Previously, Michael worked at Google, Intel, and NASA on traffic management, performance optimization of multicore systems, and benchmarking of large-scale systems. He holds an M.S. in mathematics from Moscow State University and a Ph.D. in computer science from the Graduate School of the Soviet Academy of Sciences. He is an author of more than 70 scientific papers and holds two patents.

Learn a natural way to program multi-GPU systems. The super-GPU programming concept is a natural extension of the NVIDIA® CUDA® tiling hierarchy to multi-GPU systems. It allows you to write super-kernels that run on super-GPUs. Tiling simplifies the challenging problem of programming a multi-GPU system, which requires coordination of multiple kernels running on nodes connected via a heterogeneous network. We'll illustrate the super-GPU programming model on several applications and benchmarks, including SpMV, integer sort, transpose, FFT, GEMM, and RTM. Use of a super GPU provides super-linear speedup of SpMV due to better utilization of the L2 caches of several GPUs. For sort, FFT, and GEMM, the speedup is close to linear. Multi-GPU transpose attains the limit imposed by the interconnecting network.
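
For contrast, conventional multi-GPU programming makes the per-device coordination explicit on the host (a generic sketch for comparison, not the super-GPU model itself; the kernel below is a hypothetical stand-in):

    #include <cuda_runtime.h>

    __global__ void work(float* chunk, int n)           // hypothetical per-device kernel
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) chunk[i] += 1.0f;
    }

    // The host partitions the data and launches one kernel per device,
    // managing device selection and synchronization by hand.
    void launch_on_all_gpus(float** device_chunks, int chunk_size)
    {
        int num_gpus = 0;
        cudaGetDeviceCount(&num_gpus);
        for (int d = 0; d < num_gpus; ++d) {
            cudaSetDevice(d);
            work<<<(chunk_size + 255) / 256, 256>>>(device_chunks[d], chunk_size);
        }
        for (int d = 0; d < num_gpus; ++d) {
            cudaSetDevice(d);
            cudaDeviceSynchronize();
        }
    }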

Level: Intermediate
Type: Talk
Tags: Tools and Libraries; HPC and Supercomputing; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7307 - Real-Time Volume Rendering of Medical Data for a Virtual Environment

Philippe Cattin Prof. Dr., Head of the Department of Biomedical Engineering, University of Basel, Allschwil, Switzerland
Philippe Cattin is the head of the recently founded Department of Biomedical Engineering at the University of Basel. He is also the founder of the Medical Image Analysis Center at the Medical Faculty of the University of Basel. Philippe's research interests include medical image analysis, image-guided therapy, and robotics-guided laser osteotomy. In 2015, he was promoted to associate professor at the University of Basel, after having served as an assistant professor since 2007. From 2003 to 2007, he was a postdoctoral fellow with the Computer Vision Laboratory at ETH Zurich. He received his Ph.D. in robotics from ETH Zurich, Switzerland, in 2003, and his M.S. in computer science in 1995. He received his B.S. from the University of Applied Science in Brugg/Windisch in 1991. As a principal investigator, Philippe has finished many projects in these areas and published over 100 papers, patents, and book chapters.

Complex surgical interventions often demand careful pre-operative planning. As of today, surgeons still have to manually segment the anatomical structures of interest to plan the intervention. This is a tedious and error-prone task. The real-time volume rendering technique presented in this talk has the potential to completely replace the segmentation and thus simplify surgical planning substantially. We'll present the HTC Vive-based virtual reality room that displays medical data using our fast proprietary volume renderer running on a single GPU. Besides implementation aspects such as real-time shadow casting, we'll show possible applications of such a VR room in the medical field and beyond. We'll give an outlook on how this technology could change the future of medical data visualization. Possible future applications of this technology also include collaborative data visualization with, for example, experts sitting hundreds of miles away or even in space.

Level: All
Type: Talk
Tags: Virtual Reality and Augmented Reality; Healthcare and Life Sciences

Day: TBD
Time: TBD
Location: TBD

S7310 - 8-Bit Inference with TensorRT

Szymon Migacz CUDA Library Software Engineer, NVIDIA
Szymon Migacz has worked on the CUDA libraries team at NVIDIA since 2015. His main focus areas include the CUDA math library, cuRAND, and cuFFT. Recently he has been working on accelerating deep learning algorithms, including inference in reduced numerical precision.

Traditionally, convolutional neural networks are trained using 32-bit floating-point arithmetic (FP32). By default, inference on these models employs FP32 as well. We'll describe a method for converting FP32 models to 8-bit integer (INT8) models. Our method doesn't require re-training or fine-tuning of the original FP32 network. A number of standard networks (AlexNet, VGG, GoogLeNet, ResNet) have been converted from FP32 to INT8. Converted models achieve comparable Top-1 and Top-5 inference accuracy. The methods are implemented in TensorRT and can be executed on GPUs that support the new INT8 inference instructions.
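
As a rough illustration of the underlying idea (a simplified max-based symmetric quantization, not the calibration procedure implemented in TensorRT), each FP32 tensor is mapped to INT8 with a per-tensor scale:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>

    // Symmetric linear quantization of an FP32 tensor to INT8 with a single
    // per-tensor scale. Real calibration chooses the dynamic range more
    // carefully than a plain max, but the mapping has this general form.
    void quantize_int8(const float* src, int8_t* dst, int n, float* scale_out)
    {
        float max_abs = 0.0f;
        for (int i = 0; i < n; ++i) max_abs = std::max(max_abs, std::fabs(src[i]));
        float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;  // FP32 units per INT8 step
        for (int i = 0; i < n; ++i) {
            int q = static_cast<int>(std::round(src[i] / scale));
            dst[i] = static_cast<int8_t>(std::min(127, std::max(-127, q)));
        }
        *scale_out = scale;  // kept so values can be dequantized later
    }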

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Tools and Libraries; Self-Driving Cars; Federal
Industry Segments: Automotive; Cloud Services

Day: TBD
Time: TBD
Location: TBD

S7312 - ADAS Computer Vision and Augmented Reality Solution

Sergii Bykov Technical Lead, Luxoft
Sergii Bykov is a software engineer at Luxoft. He specializes in automotive computer vision and augmented reality projects, and research and development in parallel computing, computer vision, and machine learning (for example, neural networks for object recognition). Sergii received his M.S. at the National Technical University of Ukraine in computer system analytics.

Learn how combining machine learning, computer vision, and real-time signal processing with GPU computing helps to create a next-generation informational ADAS experience. Computer vision and augmented reality (CVNAR) is a real-time software solution that encompasses a set of advanced algorithms to help create mixed augmented reality for the driver, utilizing vehicle sensors, map data, telematics, and navigation guidance. The broad range of features includes augmented navigation, visualization, additional information in case of advanced parking assistance and adaptive cruise control, driver infographics and lane keeping, driver health monitoring, support of low visibility mode and autonomous driving. Our approach augments drivers' visual reality with supplementary objects in real time, and works with various output devices such as head unit displays, digital clusters, and head-up displays.

Level: Intermediate
Type: Talk
Tags: Computer Vision and Machine Vision; AI for In-Vehicle Applications; Self-Driving Cars

Day: TBD
Time: TBD
Location: TBD

S7313 - AirVision: AI Based, Real-Time Computer Vision System for Drones

Mindaugas Eglinskas CEO, Magma Solutions, UAB
Mindaugas Eglinskas is the CEO of Magma Solutions, with over 18 years of experience in computer vision, neural networks, and the development of mission-critical systems. He leads a project for the Lithuanian Ministry of Defence to create a next-generation unmanned aerial vehicle prototype system. Magma Solutions is responsible for the development of a real-time, on-board object recognition system and visual position estimation in a GPS-denied environment.

Modern computing hardware and NVIDIA(R) Jetson(TM) TX1 performance create new possibilities for drones and enable autonomous AI systems, where image processing can be done on-board during flight. We'll present how Magma Solutions developed the AirVision system to cover advanced vision processing tasks for drones, e.g., image stabilization, moving object detection, tracking, and classification using deep neural networks, and visual position estimation using preloaded maps. We'll describe how Magma Solutions used software frameworks Caffe with cuDNN, OpenVX /NVIDIA VisionWorks(TM), and NVIDIA CUDA(R) to achieve real-time vision processing and object recognition. The AirVision system is in part developed with Lithuanian Ministry of Defence funding and is being used as a tactical UAV system prototype.

Level: Intermediate
Type: Talk
Tags: Intelligent Machines and IoT; Computer Vision and Machine Vision; AI for In-Vehicle Applications

Day: TBD
Time: TBD
Location: TBD

S7314 - Fast Flow-Based Distance Quantification and Interpolation for High-Resolution Density Distributions

Steffen Frey Postdoc, University of Stuttgart, Visualization Research Center
Steffen Frey is a postdoc in Thomas Ertl's lab at the Visualization Research Center of the University of Stuttgart. His research primarily focuses on performance-related aspects in scientific visualization. Steffen has made research contributions in in-situ visualization, (dynamic) parameter tuning and performance prediction, the analysis of time-dependent data, image-based visualization, as well as scheduling for interactive visualization. He also serves on numerous committees in the field. He has a diploma in computer science and a Ph.D. in visualization, both from the University of Stuttgart.

We'll discuss our GPU-targeted algorithm design for the efficient computation of distances and interpolates between high-resolution density distributions (based on the Earth Mover's Distance / the Wasserstein metric). We particularly focus on the changes - and their rationale - to transition from our previous multicore approach to a manycore design (utilizing NVIDIA® CUDA®, CUB, and Thrust) that yields a massive improvement in performance. Expressive distances and interpolates are a crucial building block for numerous applications in computer vision, computer graphics, and visualization, and we'll give examples from different areas to demonstrate both utility and performance of our improved approach.
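
For reference, for two discretized densities p and q with ground distances d_{ij} between bins, the distance in question is the optimal-transport cost (the standard Earth Mover's Distance formulation; the session covers how to compute and interpolate it efficiently at high resolution):

    W(p, q) = \min_{f_{ij} \ge 0} \sum_{i,j} f_{ij} \, d_{ij}
    \quad \text{subject to} \quad \sum_{j} f_{ij} = p_i, \;\; \sum_{i} f_{ij} = q_j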

Level: Intermediate
Type: Talk
Tags: In-Situ and Scientific Visualization; Rendering and Ray Tracing

Day: TBD
Time: TBD
Location: TBD

S7316 - Real-Time Robotics Control and Simulation for Deformable Terrain Applications Using the GPU

Daniel Melanz Robotics Research Engineer, Energid
Highly-Rated Speaker
Daniel Melanz works as a robotics research engineer at Energid. His technical area of focus is modeling and simulation using high performance computing with an emphasis on terramechanics and multiphysics. Daniel has a Ph.D. in mechanical engineering from the University of Wisconsin - Madison.

" When we pick up a coffee cup, our brains run a simulation first. Ceramic mug or paper cup? Is it piping hot? Is the lid on tight? Robots need to simulate their movements, too. And with our Actin Simulation software, we make easy the hard problem of integrating vision systems, coordinating between multiple robots, performing kinematics and rigid-body dynamics calculations: All in the service of ensuring the system is running in its trimmest and most efficient configuration. The GPU-based particle system simulator allows for complex simulations of granular material in Actin. Simulations like these have the potential to predict the transient soil behavior exhibited under severe vehicle maneuvering in robotics operations, some of which include tire spinning in sand and sinkage caused by large slips. This simulator is tightly integrated with Actin's dynamic simulation system for robotic manipulators and is implemented using NVIDIA's CUDA platform for processing on compatible GPUs. "

Level: Intermediate
Type: Talk
Tags: Intelligent Machines and IoT; Computational Physics; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7317 - Improving Network Accuracy With Augmented Imagery Training Data

Theodore Hromadka Senior Software Engineer, Integrity Applications Incorporated
Theodore V. Hromadka III is a senior software engineer at Integrity Applications Incorporated, working on the HPCMP Portal project. His research areas include machine learning, high performance computing, and mobile app development. He is a graduate student in computer science at the University of California, San Diego.
Niels Olson Pathology Resident, Naval Medical Center San Diego
Dr. Niels Olson is a pathology resident at Naval Medical Center San Diego. His research includes prostate cancer biomarkers and image analysis. Niels earned his M.D. at Tulane. He majored in physics at the Naval Academy and served as a surface warfare officer. His hobbies are scientific Python and surfing.

One of the biggest challenges in machine learning today is producing the training data. We'll compare different methods for augmenting a medical imagery training dataset for supervised learning. The different augmentation methods are assessed with respect to their impact on cost, network accuracy, and overfitting. We'll focus on prostate cancer data from the Joint Pathology Center, which is being used in the White House Cancer Moonshot project.
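
As one concrete example of the kind of augmentation being compared (a generic label-preserving transform, not the specific pipeline evaluated in the session), a horizontal flip of a single-channel image can be written as a simple CUDA kernel:

    // Horizontal flip of a single-channel image: one of the simplest
    // label-preserving augmentations used to enlarge a training set.
    __global__ void flip_horizontal(const float* src, float* dst, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height)
            dst[y * width + (width - 1 - x)] = src[y * width + x];
    }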

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Healthcare and Life Sciences; Medical Imaging; Federal

Day: TBD
Time: TBD
Location: TBD

S7319 - New Approaches to the Direct Solution of Large-Scale Banded Linear Systems

Samuel Rodriguez Bernabeu Research Engineer , Barcelona Supercomputing Center (BSC)
Samuel Rodriguez Bernabeu is an aerospace engineer with strong knowledge in applied math, parallel programming models, and computer architecture. Samuel works at the Spanish National Supercomputing Institute (BSC) on the design and development of new algorithms for solving large-scale systems of linear equations on supercomputers using direct methods. Before that, he was responsible for the optimization of critical on-board software components for different aerospace projects. His research interests lie in the fields of numerical linear algebra, accuracy and stability of numerical methods, and parallel computing.

We approach the problem of solving large-scale, extremely ill-conditioned banded linear systems using direct methods. Unlike traditional approaches, we focus on limiting the memory footprint of the algorithms rather than the FLOP count. To reduce the memory demand, BLAS-3 pre- and post-processing of the linear system are required. While this considerably increases the number of calculations required to solve the system, most of this work can be done very efficiently on the GPU. In this way, using GPUs allows us to solve much larger problems than state-of-the-art banded direct solvers on modern architectures. We'll present results for problems arising from realistic oil and gas scenarios, and we'll show that these techniques allow us to solve systems of tens of millions of equations using significantly less memory than currently available direct banded solvers.

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7320 - Optimizing Efficiency of Deep Learning Workloads through GPU Virtualization

Tim Kaldewey Performance Architect, IBM Watson
Tim Kaldewey is the performance architect for the Watson Innovations (R&D) group at IBM. Before joining IBM, Tim worked at Oracle's special projects group as a senior researcher with a focus on high-performance data management. He joined IBM Research in 2010 and moved to his current position in the Watson Group when it was established in 2014. He received his Ph.D. in computer science from the University of California Santa Cruz in 2010. Tim has published over two dozen articles, which include two best paper awards at major conferences. He is also an adjunct faculty member at the University of Pennsylvania, where he teaches selected GPU acceleration topics.
David K. Tam Performance Analyst, IBM
David Tam is a performance analyst in the Power Systems Performance Department at the IBM Canada Lab. David has worked on performance analysis tools development, performance optimization of Watson Services on Power Systems, and performance analysis of hardware-accelerated databases. In 2016, he received an IBM Outstanding Technical Achievement Award for his work on deep optimization of Watson Services on Power Systems. David received his B.A., M.S., and Ph.D. in computer engineering from the University of Toronto in 1999, 2003, and 2010, respectively. He is the author or co-author of eight technical papers.
Michael Gschwind Chief Engineer, Machine Learning & Deep Learning, IBM
Michael Gschwind is chief engineer for machine learning and deep learning at IBM Systems, where he leads the development of hardware/software integrated products for cognitive computing. During his career, Michael has been a technical leader for IBM's key transformational initiatives, leading the development of the OpenPOWER hardware architecture as well as the software interfaces of the OpenPOWER software ecosystem. In previous assignments, he was a chief architect for Blue Gene, POWER8, POWER7, and Cell BE. Michael is a fellow of the IEEE, an IBM Master Inventor, and a member of the IBM Academy of Technology.

Cognitive applications are reshaping the IT landscape with entire data centers designed and built solely for that purpose. Though computationally challenging, deep learning networks have become a critical building block to boost the accuracy of cognitive offerings like Watson. We'll present a detailed performance study of deep learning workloads and how sharing accelerator resources can improve throughput by a factor of three, effectively turning a four-GPU commodity cloud system into a high-end, 12-GPU supercomputer. Using Watson workloads from three major areas that incorporate deep learning technology (language classification, visual recognition, and speech recognition), we document the effectiveness and scalability of this approach.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Performance Optimization
Industry Segments: Cloud Services

Day: TBD
Time: TBD
Location: TBD

S7321 - Accelerating Document Retrieval and Ranking for Cognitive Applications

Tim Kaldewey Performance Architect, IBM Watson
Tim Kaldewey is the performance architect for the Watson Innovations (R&D) group at IBM. Tim received his Ph.D. in computer science from the University of California, Santa Cruz, in 2010. He joined IBM Research in 2010 and moved to his current position in the Watson Group when it was established in 2014. Before joining IBM, he worked at Oracle's special projects group as a senior researcher with a focus on high-performance data management. Tim has published over two dozen articles, which include two best paper awards at major conferences. He is also an adjunct faculty member at the University of Pennsylvania, where he teaches selected GPU acceleration topics.
David Wendt Programmer, IBM
David Wendt is a senior staff member of the IBM Watson Performance team in Research Triangle Park, NC. He has a Master's degree in Electrical Engineering from Johns Hopkins University. He is also an IBM Master Inventor with 12 granted US patents in software development.

Based on a comprehensive performance study of Watson workloads, we'll deep dive into optimizing critical retrieve and rank functions using GPU acceleration. The performance of cognitive applications like answering natural language questions heavily depends on quickly selecting the relevant documents needed to generate a correct answer. While analyzing the question to determine appropriate search terms, weights, and relationships is relatively quick, retrieving and ranking a relevant subset from millions of documents is a time-consuming task. Only after completing it can any advanced natural language processing algorithms be effective.

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Federal

Day: TBD
Time: TBD
Location: TBD

S7322 - Persistent Kernel: Real-Time, Low-Latency and High-Performance Computation on Pascal

Julien Bernard Software Engineer, Laboratoire d'études spatiales et d'instrumentation en astrophysique
Julien Bernard has been a research associate at LESIA - Observatoire de Paris since 2015. Julien is responsible for the design and development of accelerated solutions in real-time environments within the Green Flash European Project (Horizon 2020). He graduated with honors in embedded software engineering from École d'Ingénieur Denis Diderot (EIDD - Engineering school) at Paris University.

Learn how to design real-time, low-latency, and high-throughput systems based on GPUs, using GPUDirect for efficient data transfer. We'll demonstrate how a persistent kernel provides the ability to handle a continuous data stream without any intermediary, bypassing CPU execution and reducing latency and jitter. We'll also see how this strategy is used on our NVIDIA DGX-1-based demonstrator in the context of Green Flash, a European project that aims to build a prototype for the next-generation real-time controller targeting the European Extremely Large Telescope's adaptive optics instrumentation.
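
A minimal sketch of the persistent-kernel pattern (simplified to a single thread block; the Green Flash pipeline additionally feeds the GPU via GPUDirect): the kernel is launched once and then loops over incoming frames, so no per-frame launch overhead or jitter is incurred. The flag and buffer names are illustrative only; the flags are assumed to live in host-mapped (zero-copy) or managed memory written by the producer.

    __global__ void persistent_pipeline(volatile int* ready, volatile int* quit,
                                        const float* frame, float* result, int n)
    {
        __shared__ int frame_ready, stop;
        while (true) {
            if (threadIdx.x == 0) { frame_ready = *ready; stop = *quit; }  // one thread polls
            __syncthreads();                                               // broadcast the flags
            if (stop) return;
            if (frame_ready) {
                for (int i = threadIdx.x; i < n; i += blockDim.x)
                    result[i] = 2.0f * frame[i];     // stand-in for the real processing
                __threadfence_system();              // make results visible to the host
                __syncthreads();                     // whole block finished this frame
                if (threadIdx.x == 0) *ready = 0;    // acknowledge; producer may publish the next frame
            }
            __syncthreads();
        }
    }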

Level: Intermediate
Type: Talk
Tags: Astronomy and Astrophysics; Performance Optimization; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7324 - Bringing NVIDIA GPUs to the PGAS/OpenSHMEM World: Challenges and Solutions

Dhabaleswar K. (DK) Panda Professor and University Distinguished Scholar, The Ohio State University
Highly-Rated Speaker
Dhabaleswar K. (DK) Panda is a professor and University Distinguished Scholar of Computer Science and Engineering at Ohio State University. D.K. has published over 400 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP, and RoCE) open-source software package, developed by his research group, is used by more than 2,675 organizations in 83 countries. This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade, including the current No. 1. More than 402,000 downloads of this software have taken place from the project's website alone. He is an IEEE fellow and a member of ACM.

Learn about techniques and solutions that bring GPU computing to the world of partitioned global address space (PGAS) models, especially with the emerging OpenSHMEM paradigm. PGAS models are gaining attention for providing shared memory abstractions that make it easy to develop parallel applications with dynamic and irregular communication patterns. However, the existing OpenSHMEM standards do not support direct communication on GPU memory. We'll discuss simple extensions to the OpenSHMEM model to address this issue. We'll also present challenges and solutions in designing NVIDIA® CUDA®-aware runtimes to support these extensions and optimize data movement using CUDA IPC and GPUDirect RDMA features. And we'll demonstrate the impact of these concepts to application performance.
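
As background, standard host-side OpenSHMEM already provides the one-sided put/get style shown below (a minimal sketch of the existing model, without the GPU-memory extensions discussed in the session):

    #include <shmem.h>

    // Each processing element (PE) allocates a buffer on the symmetric heap and
    // writes directly into its neighbor's copy; the session discusses extending
    // this model so that such buffers can reside in GPU memory.
    int main(void)
    {
        shmem_init();
        int me = shmem_my_pe();
        int npes = shmem_n_pes();

        long* buf = (long*) shmem_malloc(sizeof(long));            // symmetric allocation
        *buf = -1;

        long value = (long) me;
        shmem_putmem(buf, &value, sizeof(long), (me + 1) % npes);  // one-sided put to the next PE
        shmem_barrier_all();                                       // ensure all puts completed

        shmem_free(buf);
        shmem_finalize();
        return 0;
    }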

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing; Programming Languages; Tools and Libraries

Day: TBD
Time: TBD
Location: TBD

S7325 - Behavioral Additive Manufacturing: Adaptive 3D Printing Using Multi-Agent Systems and Deep Learning

Alisa Andrasek Director, University College London, Wonderlab/Biothing
Alisa Andrasek is an architect and designer, whose work focuses on robotics and computational processes for innovation in architecture, design, construction, and novel manufacturing. She is a director of Biothing, co-director of Bloom Games, co-founder of AI-Build, and director of Wonderlab Research at the UCL Bartlett. Alisa holds a professorship at the European Graduate School and has taught at the AA DRL, Columbia GSAPP, UPenn, and RMIT. She has received numerous awards and her work is part of the permanent collections at the Centre Pompidou Paris, New Museum NY, Storefront NY, FRAC Collection, and TB-A21 Vienna.

We'll introduce autonomously constructed architecture using multi-agent systems (MAS) and deep learning. The 3D printing path adapts in real time to unpredictable material behavior, using an NVIDIA Jetson module mounted on an industrial robotic arm. We'll explain path generation, real-time visual tracking of the material, recomputation of robotic targets, and, finally, experiments with real-time MAS adaptation toward emergent stable structures, illustrated with code and video recordings of the 3D printing process and the printed structures.

Level: All
Type: Talk
Tags: Intelligent Machines and IoT; Computer Vision and Machine Vision; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7326 - Real-Time Vertical Relief Profile and Free Space Estimation for Low-Altitude UAV-Based Sprayer

Victor Sakharchuk AI lead, Kray Technologies
Victor Sakharchuk is the AI lead at Kray Technologies, which he joined in 2015. He develops industrial robotic and drone-based sprayers for crop protection. In 2014, Victor joined a volunteer UAV development team for the Ukrainian armed forces as the lead developer of a GPS-free optical navigation system. He previously worked on projects involving 3D rendering, image and video processing, and 3D reconstruction for mobile and desktop platforms at ScopicSoftware in Massachusetts. From 2009 to 2010, he worked as a developer of electronics hardware and PC interfaces on a digital scanning microscope project at Microptik BV in the Netherlands. He graduated from Volyn State University, Ukraine, with an M.S. in theoretical physics.

We'll explore how on-board vertical relief profile and free-space estimation algorithms for autonomous drones can benefit from modern mobile GPUs like the NVIDIA® Tegra® X1 to enable safe and precise flight at extremely low altitudes and high speeds. We'll discuss how GPUs can speed up computer vision algorithms, allowing real-time performance on drone on-board computers. We'll also look at using the direct parallelism of NVIDIA CUDA® cores to speed up dynamic programming-based occupancy grid segmentation and vertical relief profile estimation as a piecewise linear approximation. Additionally, we'll show how shared memory usage can help speed up a dense correlation stereo algorithm, allowing real-time performance at a subpixel level.
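
As a rough sketch of the shared-memory technique mentioned for dense correlation stereo, the following simplified CUDA block-matching kernel caches a left-image tile (plus halo) in shared memory and scans a small disparity range against the right image. The window size, disparity range, and image layout are generic assumptions rather than the speaker's implementation; the host-side launch is omitted.

#include <cuda_runtime.h>

#define TILE   16
#define RADIUS 2              // (2*RADIUS+1)^2 SAD window
#define MAX_D  32             // disparities tested per pixel

__global__ void blockMatchStereo(const unsigned char *left,
                                 const unsigned char *right,
                                 unsigned char *disparity,
                                 int width, int height)
{
    __shared__ unsigned char tile[TILE + 2 * RADIUS][TILE + 2 * RADIUS];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;

    // Cooperative load of the left-image tile plus halo (clamped at borders).
    for (int dy = threadIdx.y; dy < TILE + 2 * RADIUS; dy += TILE)
        for (int dx = threadIdx.x; dx < TILE + 2 * RADIUS; dx += TILE) {
            int gx = min(max(blockIdx.x * TILE + dx - RADIUS, 0), width  - 1);
            int gy = min(max(blockIdx.y * TILE + dy - RADIUS, 0), height - 1);
            tile[dy][dx] = left[gy * width + gx];
        }
    __syncthreads();

    if (x >= width || y >= height) return;

    int bestD = 0, bestCost = 0x7fffffff;
    for (int d = 0; d < MAX_D; ++d) {
        int cost = 0;                                  // sum of absolute differences
        for (int wy = -RADIUS; wy <= RADIUS; ++wy)
            for (int wx = -RADIUS; wx <= RADIUS; ++wx) {
                int l  = tile[threadIdx.y + RADIUS + wy][threadIdx.x + RADIUS + wx];
                int rx = min(max(x - d + wx, 0), width  - 1);
                int ry = min(max(y + wy,     0), height - 1);
                int diff = l - (int)right[ry * width + rx];
                cost += diff < 0 ? -diff : diff;
            }
        if (cost < bestCost) { bestCost = cost; bestD = d; }
    }
    disparity[y * width + x] = (unsigned char)bestD;
}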

Level: Intermediate
Type: Talk
Tags: Intelligent Machines and IoT; Computer Vision and Machine Vision

Day: TBD
Time: TBD
Location: TBD

S7328 - The NVIDIA Iray Light Transport Simulation and Rendering System

Alexander Keller Director of Research, NVIDIA
Alexander Keller is a director of research at NVIDIA, leading advanced rendering research. Before, he had been the chief scientist at Mental Images, responsible for research and conception of products and strategies, including the design of the NVIDIA Iray renderer. Prior to industry, Alexander worked as a full professor for computer graphics and scientific computing at Ulm University, where he co-founded the UZWR (Ulmer Zentrum fur wissenschaftliches Rechnen) and received an award for excellence in teaching. Alexander holds a Ph.D. in computer science, has authored more than 27 granted patents, and has published more than 50 papers, mainly in the area of quasi-Monte Carlo methods and photorealistic image synthesis using ray tracing.
Carsten Wächter Senior Software Engineer, NVIDIA
Carsten Wachter is a senior software engineer at NVIDIA, based in Berlin, and one of the leading contributors to the NVIDIA Iray rendering system and co-writer of its prototype. Carsten is an expert in GPU programming, quasi-Monte Carlo methods, and light transport simulation, including ray tracing. He holds a Ph.D. in computer science, which he received in 2007 from the University of Ulm, Germany, for his dissertation "Quasi-Monte Carlo Light Transport Simulation by Efficient Ray Tracing." His diploma thesis treated "Realtime Ray Tracing."

We reason about the design decisions that led to the system architecture of NVIDIA Iray. The scalable parallelization from single devices to clusters of GPU systems required new approaches to motion blur simulation, anti-aliasing, and fault tolerance, which are based on consistent sampling that at the same time enables push-button rendering with only a minimal set of user parameters. We then dive into technical details about light transport simulation, especially on how Iray deals with geometric light sources, importance sampling, decals, and material evaluation in order to be efficient on GPUs. It is remarkable how well the physically based system extends to modern workflows such as light path expressions and matte objects. The separation of material definition and implementation has been key to the superior performance and rendering quality and resulted in the emerging standard MDL (material definition language).

Level: Advanced
Type: Talk
Tags: Rendering and Ray Tracing

Day: TBD
Time: TBD
Location: TBD

S7329 - Open-Source Tools for GPU Programming Assignments in Large Classroom Settings

Abdul Dakkak Ph.D. Student, University of Illinois Urbana-Champaign
Abdul Dakkak is a computer science Ph.D. student at the University of Illinois Urbana-Champaign. Abdul got his undergraduate degree in pure math from the University of Toledo. His research interests include computers, high-level programming languages, and distributed application orchestration.
Carl Pearson Ph.D. Student, University of Illinois - Urbana Champaign
Carl Pearson is an electrical and computer engineering Ph.D. student at the University of Illinois Urbana-Champaign. He got his undergraduate degree in engineering from Harvey Mudd College. His research interests include high-level kernel programming languages, distributed applications, and workload characterization. When he is not doing research, he is helping run the local chapter of Amnesty International.
Cheng Li Ph.D. Student, University of Illinois - Urbana Champaign
Cheng Li is a computer science Ph.D. student at the University of Illinois Urbana-Champaign. She got her bachelor's and master's degrees from the University of Michigan. Her research interests include hardware/software co-design that improves application performance, energy efficiency, and system utilization by employing heterogeneous parallel computing techniques, such as GPU.

Teaching with GPUs is a challenge because of the need for special hardware and software resources. This is exacerbated when class enrollment is in the thousands. We'll showcase open-source tools developed at the University of Illinois Urbana-Champaign and share insights gathered while teaching thousands of students from over 130 countries. Two tools will be presented: (1) WebGPU, an online portal for GPU programming where students are presented with labs that test NVIDIA® CUDA® concepts; students develop their code within the browser, and the system then autogrades the submission. (2) RAI, an interactive command-line tool for project submissions; students specify steps to run their project, which is deployed to a worker node and run within a container. Both systems are designed for fault tolerance, scalability to thousands of concurrent submissions, resilience to buffer overflows, and expansion using on-premises or cloud compute resources.

Level: Beginner
Type: Talk
Tags: Tools and Libraries; Data Center and Cloud Computing; Programming Languages

Day: TBD
Time: TBD
Location: TBD

S7330 - Sentiment Analysis Through the Use of Unsupervised Deep Learning

Stephen McGough Senior Lecturer (Associate Professor), Durham University
Stephen McGough is a senior lecturer (equivalent to an associate professor in the U.S.) in computing sciences at Durham University, UK. Stephen obtained his Ph.D. in the area of parallel simulation and has worked for many years in the areas of parallel computing and simulation. This has led to over 50 publications in the area of parallel computing, including receiving the NVIDIA best paper award at HiPC 2012. His research focuses on the use of novel computing technologies to solve real-world challenges, and has made him a key player in the NVIDIA CUDA Research Center.

It is estimated that 85% of worldwide data is held in unstructured/unlabelled formats, increasing at a rate of roughly 7 million digital pages per day. Exploiting these large datasets can open the door to providing policy makers, corporations, and end-users with unprecedented knowledge for better planning, decision making, and new services. Deep learning and probabilistic topic modeling have shown great potential for analysing such datasets, helping to discover anomalies, unravel underlying patterns and trends, and find similar texts within a dataset. We'll illustrate how a combined unsupervised deep learning and topic modeling approach can be used for sentiment analysis, requiring minimal feature engineering or prior assumptions and outperforming state-of-the-art approaches to sentiment analysis.

Level: All
Type: Talk
Tags: Accelerated Analytics; Deep Learning and AI; Federal

Day: TBD
Time: TBD
Location: TBD

S7331 - Massively Parallel Landscape-Evolution Modelling using General Purpose Graphical Processing Units

Stephen McGough Senior Lecturer (Associate Professor), Durham University
Stephen McGough is a senior lecturer (equivalent to an associate professor in the U.S.) in computing sciences at Durham University, UK. Stephen obtained his Ph.D. in the area of parallel simulation and has worked for many years in the areas of parallel computing and simulation. This has led to over 50 publications in the area of parallel computing, including receiving the NVIDIA best paper award at HiPC 2012. His research focuses on the use of novel computing technologies to solve real-world challenges, and has made him a key player in the NVIDIA CUDA Research Center.

Landscape Evolution Modeling (LEM) is used to understand how landscapes evolve over millions of years. It is based on a regular grid of cells, each representing a height in the landscape. For each simulated year, we compute how water flows between cells, the total water flowing through each cell, and the amount of erosion and deposition. Traditionally, due to computational complexity, such simulations have only been performed on trivially small landscapes of around 5,000 cells, taking 5 hours to compute 1,000 years. However, researchers wish to perform simulations on massive landscapes (50+ million cells) over millions of years. We'll demonstrate PARALEM, a GPGPU-enabled LEM capable of two to three orders of magnitude speedup compared to the best-in-class LEM software.
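
To make the per-cell structure of such a model concrete, here is a minimal CUDA sketch of one step, steepest-descent (D8) flow routing on a regular height grid. It is a generic illustration of the cell-parallel pattern, not PARALEM code, and the host-side launch is omitted.

#include <cuda_runtime.h>

// Each thread finds the steepest-descent neighbour of its cell; water from the
// cell is then routed to that neighbour in later steps of the model.
__global__ void flowDirectionD8(const float *height, int *flowTo,
                                int nx, int ny)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;

    int   idx  = y * nx + x;
    float h    = height[idx];
    float best = 0.0f;          // steepest drop found so far
    int   dest = idx;           // default: no downhill neighbour (a pit)

    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;
            int nxp = x + dx, nyp = y + dy;
            if (nxp < 0 || nxp >= nx || nyp < 0 || nyp >= ny) continue;
            float drop = h - height[nyp * nx + nxp];
            if (drop > best) { best = drop; dest = nyp * nx + nxp; }
        }
    flowTo[idx] = dest;
}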

Level: All
Type: Talk
Tags: Computational Physics; HPC and Supercomputing; Earth Systems Modeling

Day: TBD
Time: TBD
Location: TBD

S7332 - Accelerated Astrophysics: Using NVIDIA® DGX-1™ to Simulate and Understand the Universe

Brant Robertson Associate Professor of Astronomy and Astrophysics, University of California, Santa Cruz
Brant Robertson is an Associate Professor in the Department of Astronomy and Astrophysics at the University of California, Santa Cruz. His research interests include theoretical topics related to galaxy formation, dark matter, hydrodynamics, and numerical simulation methodologies. Brant was previously an assistant professor at the University of Arizona from 2011-2015, held a Hubble Fellowship in the Astronomy Department at the California Institute of Technology from 2009-2011, and a Spitzer and Institute Fellowship at the Kavli Institute for Cosmological Physics and Enrico Fermi Institute at the University of Chicago from 2006-2009. Brant earned his Ph.D. in astronomy from Harvard University in 2006, and received his B.S. in physics and astronomy at the University of Washington, Seattle in 2001. He can be found on Twitter at @brant_robertson.

Get an overview of how GPUs are used by computational astrophysicists to perform numerical simulations and process massive survey data. Astrophysics represents one of the most computationally heavy sciences, where supercomputers are used to analyze enormous amounts of data or to simulate physical processes that cannot be reproduced in the lab. Astrophysicists strive to stay on the cutting edge of computational methods to simulate the universe or process data faster and with more fidelity. We'll discuss two important applications of GPU supercomputing in astrophysics. We'll describe the astrophysical fluid dynamics code CHOLLA that runs on the GPU-enabled supercomputer Titan at Oak Ridge National Lab and can perform some of the largest astrophysical simulations ever attempted. Then we'll describe the MORPHEUS deep learning framework that classifies galaxy morphologies using the NVIDIA DGX-1 deep learning system.

Level: All
Type: Talk
Tags: Astronomy and Astrophysics; HPC and Supercomputing
Industry Segments: Higher Education / Research; Government / National Labs

Day: TBD
Time: TBD
Location: TBD

S7334 - Computational Focus-Tunable Near-eye Displays

Nitish Padmanaban PhD Student, Stanford Computational Imaging Lab
Nitish Padmanaban is a second year Ph.D. student in the Stanford Computational Imaging lab. Nitish did his undergraduate degree at UC Berkeley focusing on signal processing, and now works on opto-computational displays for VR. In particular, he has spent the last year working on building and evaluating displays to alleviate the vergence-accommodation conflict, and is now investigating the role of the vestibular system in causing simulator sickness in VR.

We'll explore unprecedented display modes afforded by computational focus-tunable near-eye displays, with the goal of increasing visual comfort and providing more realistic and effective visual experiences in virtual and augmented reality. Applications of VR/AR systems range from communication, entertainment, education, collaborative work, simulation, and training to telesurgery, phobia treatment, and basic vision research. In every immersive experience, the primary interface between the user and the digital world is the near-eye display. Many characteristics of near-eye displays that define the quality of an experience, such as resolution, refresh rate, contrast, and field of view, have improved significantly over the last few years. However, a pervasive source of visual discomfort prevails: the vergence-accommodation conflict (VAC). Further, natural focus cues are not supported by any existing near-eye display.

Level: All
Type: Talk
Tags: Virtual Reality and Augmented Reality

Day: TBD
Time: TBD
Location: TBD

S7339 - A Sleepless Eye on Patient Monitors: Real-Time AI in Healthcare

Adam Lichtl Founder & CEO, Delta Brain Inc.
Adam Lichtl is the founder and CEO of Delta Brain Inc., a Los Angeles-based artificial intelligence healthcare startup focused on bringing reliable pattern recognition to continuous patient monitoring data. He received his B.S. in physics from Caltech, and a Ph.D. in computational physics from Carnegie Mellon University, where he also spent his evenings earning his MBA from the Tepper School of Business. After completing a RIKEN/BNL research fellowship at Brookhaven National Laboratory, he worked as a vice president of Quant Strategies at Morgan Stanley, and as director of research at a large aerospace company.
Kevin Lung Co-Founder & Director of Engineering, Delta Brain Inc.
Kevin Lung is co-founder and director of engineering at Delta Brain Inc. Previously, he worked at a large aerospace company, where he was a key developer in the creation of high-performance rocket engine simulations. Kevin received his B.S. in engineering physics from UC Berkeley, and his Ph.D. in particle physics from UCLA. His skillset includes software engineering, statistical optimization, and machine learning.

Critical medical decisions are made each second, and are often informed by the real-time interpretation of complex or subtle patterns in continuous patient monitoring data. Manual review is intermittent and imperfect, but traditional automation attempts have been unreliable and often suffer from high false positive rates, limiting their practical utility in clinical settings. Recent advances in deep learning algorithms and GPU acceleration enable the creation of streaming systems that reliably, continuously, and tirelessly pick out patterns and trends to support timely and appropriate clinical decisions for the benefit of the patient. We'll describe the purpose, design, and impact of one such system, as created by Delta Brain Inc.

Level: Intermediate
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7340 - Hydra: A Framework for Data Analysis in Massively Parallel Platforms

Antonio Augusto Alves Junior Post-doc, University of Cincinnati
Antonio Augusto began his research career as an undergraduate student in physics, studying the modeling of dissipative quantum systems. As a master's student, Antonio undertook a thesis in theoretical physics devoted to the study of the electromagnetic field confined in cavities with moving boundaries. His collaboration with the LHCb experiment at CERN began in 2005, when he obtained a Ph.D. position at the Centro Brasileiro de Pesquisas Físicas in Rio de Janeiro, joining the local group involved in the construction of the detector. In 2009, Antonio started a two-year INFN fellowship for foreign physicists at the "Sezione di Roma." His skills in physics, together with his ability to work with software and to develop specific analysis tools and methodologies, were determinant in forming a close and effective working group. In 2015, he joined the LHCb group at the University of Cincinnati to work on the development of software for data analysis.

We'll discuss Hydra, a templatized, header-only, C++11-compliant library for data analysis on massively parallel platforms, targeting, but not limited to, the field of high-energy physics research. Hydra supports the description of particle decays via phase-space Monte Carlo generation, generic function evaluation, data fitting, multidimensional adaptive numerical integration, and histogramming.

Level: Intermediate
Type: Talk
Tags: Computational Physics; Accelerated Analytics; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7341 - Using OpenACC for NGS Techniques to Create a Portable and Easy-to-Use Code Base

Sunita Chandrasekaran Assistant Professor, University of Delaware
Sunita Chandrasekaran is an assistant professor at the University of Delaware with the Computer & Information Sciences Department. Her research interests include exploring suitability of high-level programming models and runtime systems for HPC and embedded platforms along with exploring challenges while migrating scientific applications to such systems. Her research publications include developing and using parallel programming models, building performance and power modeling for GPUs, constructing compiler and runtime frameworks, and adapting scientific applications on parallel computing platforms. Sunita holds a Ph.D. from NTU, Singapore with specialization on designing software for FPGAs.

Happy with your code but rewriting it every time a hardware platform changes? Know NVIDIA® CUDA® but want to use a higher-level programming model? OpenACC is a directive-based technique that enables more science and less programming. The model facilitates reusing a code base on more than one platform. This session will help you: (1) learn how to incrementally improve a bioinformatics code base using OpenACC without losing performance, and (2) explore how to apply optimization techniques and the challenges encountered in the process. We'll share our experience using OpenACC for DNA next-generation sequencing techniques.
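
As a minimal illustration of the directive style discussed here (not the speakers' NGS code), the following C++ loop is offloaded to an accelerator by adding a single OpenACC pragma; if the directive is ignored by the compiler, the same source still builds and runs on the CPU. The per-element scoring is a generic placeholder.

#include <cstdio>

int main() {
    const int n = 1 << 20;
    float *score  = new float[n];
    float *weight = new float[n];
    for (int i = 0; i < n; ++i) weight[i] = 0.5f;

    // copyin/copyout clauses describe data movement; the compiler generates the kernel.
    #pragma acc parallel loop copyin(weight[0:n]) copyout(score[0:n])
    for (int i = 0; i < n; ++i)
        score[i] = weight[i] * i;

    printf("score[42] = %f\n", score[42]);
    delete[] score;
    delete[] weight;
    return 0;
}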

Level: Intermediate
Type: Talk
Tags: Computational Biology; Programming Languages; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7342 - GPU Data Mining in Neuroimaging Genomics via Explorative Analysis and Interactive Visualization

Robert Zigon Sr Staff Research Engineer, Beckman Coulter
Bob Zigon is a senior staff research engineer at Beckman Coulter, a life sciences company, where he has worked for 14 years. Bob was the technical lead at Beckman Coulter on a GPU-accelerated product called Kaluza, which is used in leukemia and lymphoma research around the world. He has undergraduate degrees in computer science and mathematics, as well as a master's degree in computer science, from Purdue University, where he is currently enrolled in the Ph.D. program for computer science. His interests are in high performance computing, machine learning, and numerical analysis.

Large datasets of imaging and genomic data have become available for research into the correlation between genome and brain structure for Alzheimer's disease. We'll present a GPU-enabled tool that permits interactive correlation between the attributes of the MRI voxels and single nucleotide polymorphisms in DNA sequences of Alzheimer's patients. The system runs on a desktop PC and is several orders of magnitude faster than the Matlab version.
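
As a sketch of the kind of massively parallel computation involved, the following CUDA kernel assigns one thread to each (voxel, SNP) pair and computes a Pearson correlation across subjects. The data layout and names are assumptions for illustration and are not taken from the presented tool; the host-side launch is omitted.

#include <cuda_runtime.h>

__global__ void voxelSnpCorrelation(const float *voxels,   // [nVoxels][nSubjects], row-major
                                    const float *snps,     // [nSnps][nSubjects], row-major
                                    float *corr,            // [nVoxels][nSnps]
                                    int nVoxels, int nSnps, int nSubjects)
{
    int v = blockIdx.y * blockDim.y + threadIdx.y;
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= nVoxels || s >= nSnps) return;

    const float *x = voxels + (size_t)v * nSubjects;
    const float *y = snps   + (size_t)s * nSubjects;

    // Accumulate the sums needed for a single-pass Pearson correlation.
    float sx = 0.f, sy = 0.f, sxx = 0.f, syy = 0.f, sxy = 0.f;
    for (int i = 0; i < nSubjects; ++i) {
        sx  += x[i];        sy  += y[i];
        sxx += x[i] * x[i]; syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    float n   = (float)nSubjects;
    float cov = sxy - sx * sy / n;
    float vx  = sxx - sx * sx / n;
    float vy  = syy - sy * sy / n;
    corr[(size_t)v * nSnps + s] = cov * rsqrtf(vx * vy + 1e-12f);
}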

Level: Intermediate
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Accelerated Analytics; Computational Biology; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7343 - Optimizer's Toolbox: Fast CUDA Techniques for Real-Time Image Processing

Sarah Kabala High-Performance Graphics Engineer, Aechelon Technology, Inc.
Sarah Kabala is a graphics engineer at Aechelon Technology, pioneers in image generator hardware and software for aircraft simulators and geospecific Earth database creation. Her development work includes real-time image-processing kernels for filter effects and object tracking and artist-friendly tools to detect features in aerial imagery with machine learning. Following a passion for applied programming, Sarah left her hometown Ph.D. program at Iowa State University to join Aechelon. Raised by a blind parent, she began thinking about vision and imagery at an early age. In her free time, Sarah plays Tetris and builds Duplos with her nephews.

Take your kernels to the next level with performance-enhancing techniques for all levels of the CUDA memory hierarchy. We'll share lessons gleaned from implementing demanding image-processing algorithms into the real-time visual simulation world. From CPU prototype to optimized GPU implementation, one algorithm saw 150,000X speedup. Techniques to be presented include: instantaneous image decimation; CDF via warp shuffle; block and grid shapes for easy-to-program cache optimization; designing XY-separable kernels and their intermediate data; and sliding window tradeoffs for maximum cache locality. Straightforward examples will make these optimizations easy to add to your CUDA toolbox.
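
One of the listed techniques, a CDF via warp shuffle, boils down to a warp-level inclusive prefix sum. Here is a minimal sketch written against the CUDA 8-era __shfl_up (CUDA 9 and later rename it __shfl_up_sync); the 32-bin histogram use case is an illustrative assumption, not the speaker's kernel.

// Warp-shuffle inclusive prefix sum: no shared memory or extra synchronization
// is needed within the warp.
__device__ int warpInclusiveScan(int value)
{
    int lane = threadIdx.x & 31;
    #pragma unroll
    for (int offset = 1; offset < 32; offset <<= 1) {
        int n = __shfl_up(value, offset);   // value from lane - offset
        if (lane >= offset) value += n;
    }
    return value;                            // lane i now holds the sum of lanes 0..i
}

// Example use: turn one 32-bin histogram (one bin per lane) into its CDF.
__global__ void histogramToCdf(const int *hist, int *cdf)
{
    int count = hist[threadIdx.x];           // assumes blockDim.x == 32
    cdf[threadIdx.x] = warpInclusiveScan(count);
}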

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Video and Image Processing; Performance Optimization
Industry Segments: Defense; Software; Manufacturing; Media & Entertainment; Aerospace

Day: TBD
Time: TBD
Location: TBD

S7344 - Kokkos – The C++ Performance Portability Programming Model

Christian Trott Senior Member of Technical Staff, Sandia National Laboratories
Christian Trott is a high performance computing expert with experience in designing and implementing software for GPU and MIC compute-clusters. Christian's prior scientific work focused on computational material research using Ab-Initio calculations, molecular dynamic simulations, and Monte Carlo methods. As of 2015, Christian is a senior member of technical staff at the Sandia National Laboratories. He is a core developer of the Kokkos programming model with a large role in advising applications on adopting Kokkos to achieve performance portability for next-generation supercomputers. He earned a Ph.D. from the University of Technology Ilmenau in theoretical physics.
H. Carter Edwards Principal Member of Technical Staff, Sandia National Laboratories
Highly-Rated Speaker
H. Carter Edwards is the principal investigator and architect for the Kokkos project at Sandia National Laboratories. Carter has over three decades of experience in modeling and simulation software development and over two decades of experience in HPC, parallel processing, and C++ software development. For the last several years, his HPC focus has been on algorithms and programming models for thread-scalable and performance portable parallelism across next-generation platform node architectures. Carter has a B.S. and M.S. in aerospace engineering and a Ph.D. in computational mathematics. He represents Sandia on the ISO C++ language standard committee.

Kokkos is a programming model developed at Sandia National Laboratories for enabling application developers to achieve performance portability for C++ codes. It is now the primary programming model at Sandia for porting production-level applications to modern architectures, including GPUs. We'll discuss the core abstractions of Kokkos for parallel execution as well as data management, and how they are used to provide a critically important set of capabilities for the efficient implementation of a wide range of HPC algorithms. We'll present performance evaluations on a range of platforms to demonstrate the state of the art of performance portability. This will include data from Intel KNL-based systems as well as IBM Power8 with NVIDIA® NVLink™-connected NVIDIA Tesla® P100 GPUs. We'll also provide an overview of how Kokkos fits into the larger exascale project at the Department of Energy, and how it is used to advance the development of parallel programming support in the C++ language standard.
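
A minimal example of the Kokkos abstractions mentioned above: a Kokkos::View holds the (possibly device-resident) data, and the same parallel_for/parallel_reduce dispatch to CUDA, OpenMP, or serial backends depending on how Kokkos is configured. This is a toy axpy, not a Sandia application kernel.

#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char *argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1 << 20;
        // Views allocate in the default execution space's memory (e.g., GPU memory
        // when the CUDA backend is enabled).
        Kokkos::View<double*> x("x", n), y("y", n);

        Kokkos::parallel_for("init", n, KOKKOS_LAMBDA(const int i) {
            x(i) = 1.0;
            y(i) = 2.0;
        });

        Kokkos::parallel_for("axpy", n, KOKKOS_LAMBDA(const int i) {
            y(i) = 3.0 * x(i) + y(i);
        });

        double sum = 0.0;
        Kokkos::parallel_reduce("check", n, KOKKOS_LAMBDA(const int i, double &s) {
            s += y(i);
        }, sum);
        printf("sum = %f\n", sum);
    }
    Kokkos::finalize();
    return 0;
}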

Level: All
Type: Talk
Tags: HPC and Supercomputing; Programming Languages

Day: TBD
Time: TBD
Location: TBD

S7345 - High-Performance Broadcast Designs for Streaming Applications on Multi-GPU InfiniBand Clusters

Dhabaleswar K. (DK) Panda Professor and Distinguished University Scholar, The Ohio State University
Highly-Rated Speaker
Dhabaleswar K. (DK) Panda is a professor and University Distinguished Scholar of Computer Science and Engineering at Ohio State University. D.K. has published over 400 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP, and RoCE) open-source software package, developed by his research group, is used by more than 2,675 organizations in 83 countries. This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade, including the current No. 1. More than 402,000 downloads of this software have taken place from the project's website alone. He is an IEEE fellow and a member of ACM.

Learn about recent developments in middleware design to boost the performance of GPU-based streaming applications. Several runtimes already support and optimize GPU communication using various NVIDIA® CUDA® features. Similarly, some runtimes use InfiniBand hardware multicast to boost broadcast performance for host-based communications. We'll focus on the challenges in combining and fully utilizing GPUDirect RDMA (GDR) and InfiniBand hardware multicast technologies in tandem to design high-performance heterogeneous broadcast support for streaming applications. We'll also present the associated challenges and designs in supporting reliability for clusters with multi-HCA and multi-GPU configurations, and we'll present and analyze performance evaluations of the proposed designs on various system configurations.

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing; Tools and Libraries; Data Center and Cloud Computing

Day: TBD
Time: TBD
Location: TBD

S7346 - Real Time American Sign Language Video Captioning Using Deep Neural Networks

Syed Ahmed Research Assistant, Rochester Institute of Technology
Syed Tousif Ahmed is majoring in computer engineering at RIT and works there as a research assistant in the Future Everyday Technology Lab and in the Center on Access Technology. Syed's interests lie in computer vision, machine learning, embedded systems, and cryptography.

We'll demonstrate how to build an end-to-end video captioning system using deep neural networks. The specific application we'll discuss is an American Sign Language video captioning system. We'll discuss implementation details of the neural network with popular frameworks, like TensorFlow and Torch, as well as how to deploy the system on embedded platforms, like the NVIDIA® Jetson™ TX1 and NVIDIA SHIELD™ tablet, to achieve real-time captioning of live videos.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Computer Vision and Machine Vision; Media and Entertainment
Industry Segments: Media & Entertainment

Day: TBD
Time: TBD
Location: TBD

S7347 - A Deep Hierarchical Model for Joint Object Detection and Semantic Segmentation

Zhao Chen Machine Learning Software Intern, NVIDIA
Zhao Chen received his B.A. in physics and mathematics from Harvard University in 2011, and is expected to receive his Ph.D. in physics with a minor in statistics from Stanford University in 2017. His graduate work with large X-ray imaging experiments has led to articles in top physics journals, but also fanned his interest in big data. His first foray into deep learning and AI came from the machine learning course at Stanford, where he won a best project award for his work in segmenting glioblastoma brain tumors from MRI scans. Since then, he has worked on various deep learning projects that range from computer vision techniques for video game AI to deep RNNs for Twitter analytics. He also enjoyed an internship with the Deep Learning Applied Research group at NVIDIA, where he built deep computer vision models to help cars learn to see. He is currently looking forward to receiving his degree and dedicating his time fully to the exciting field of AI and machine learning research.

How do we tackle multiple vision tasks from within the same deep neural network? We'll address this problem by proposing a neural network architecture that can simultaneously segment and detect objects within an image. We'll begin with a brief overview of deep learning as applied to computer vision, and various popular methods for object detection and semantic segmentation. We'll then propose our model: a hierarchical architecture that explicitly allows fine-grain information from one task to aid in the performance of coarser tasks. We'll show that our multi-task network outperforms and is faster than networks trained to tackle each task independently. We'll then visualize our network results on the Cityscapes data set and discuss potential applications of our ideas, especially in the context of autonomous driving.

Level: All
Type: Talk
Tags: Computer Vision and Machine Vision; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7348 - Deep Learning in Ford's Autonomous Vehicles

Bryan Goodman Manager and Technical Leader, Autonomous Vehicles Machine Learning, Ford Motor Company
Bryan Goodman leads Ford Motor Company's machine learning for autonomous vehicles department, including groups in Michigan, California, and Israel. Bryan has applied machine learning and optimization methods to business analytics problems at Ford for 15 years. He also started a group working on machine learning applications for autonomous vehicles and advanced driver assistance systems. Bryan received his B.S. in mathematics and chemistry from Hope College, in Holland, Michigan, and his Ph.D. in physical chemistry and computational science and engineering from the University of Illinois.

We'll provide an overview of some of the models Ford is using to fuse sensor information, and give some examples of the performance optimization. Ford is using deep learning for autonomous vehicle perception across a multitude of sensors. It is important that these models have optimized performance to process high-resolution images, lidar point clouds, and other sensor inputs in a timely fashion. Ford is exploring a variety of methods to push the run-time performance to new limits and maximize the use of the resources available. These include modifying the underlying models, the data structures, and the inference engine itself.

Level: All
Type: Talk
Tags: Self-Driving Cars; Performance Optimization; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7349 - Getting Started with GPUs for Linux Virtual Desktops on VMware Horizon

Tony Foster Principal Technical Marketing Engineer for EUC Solutions, Dell EMC
Tony Foster, AKA the WonderNerd, has been involved in the virtualization industry since 2005. He has architected, deployed, and supported virtualization systems for organizations large and small in the public and private sectors. In recent years he has branched out into GPU architectures and is exploring ways to leverage them in the data center. Tony has been at the forefront of designing GPU solutions at Dell EMC for the last five years. This includes significant contributions to the VCE Technology Extension for Compute offered by the company. Tony is also active in the social communities as a VMware vExpert, VMware EUC Champion, and an NVIDIA GRID Community Advisor. You can often find him presenting at conferences around both virtualization and GPUs.
Trey Johnson Sr. Solutions Architect, Dell EMC
Trey Johnson started his EUC involvement in a newly created role of PC Specialist, where he helped automate the AutoCAD lab by implementing diskless workstations at a small community college in Ocala, Florida. From there, he moved to the University of Florida and focused on general systems design and administration with a heavy emphasis on EUC automation, starting with Novell ZENworks. After moving to Shands HealthCare at UF, he took over the Citrix farm and ZENworks environments; in addition, he implemented VMware in 2004. After working with Rapid Applications and Ron Oglesby, he returned to UF to implement virtual desktops in 2008. Trey has implemented and managed desktops as a service (DaaS) for cloud providers and continued to expand his knowledge while working for VCE, EMC, and Dell in the converged solutions area.

You've just been tasked with building a Linux VDI environment for an engineering team with graphics requirements. Now what? Join an NVIDIA GRID Community Advisor to learn the basics of setting up Linux VDI desktops with GPU capabilities and see the results we captured when we built it in the lab. This is a session for those wanting to get started with Linux virtual desktops that need GPU capabilities.

Level: Beginner
Type: Talk
Tags: Graphics Virtualization

Day: TBD
Time: TBD
Location: TBD

S7350 - Deep Learning as a Service: Experiences in Building GPU-Enabled HPC Clusters

Brian Belgodere Research Software Engineer, IBM Research
Brian Belgodere is a software engineer at IBM Research, working on the Cognitive Compute Cluster developing and building tightly coupled development tools for composing cognitive solutions. Brian has worked in distributed systems, security and compliance, service management, and systems automation. Previously, he worked for IBM's Global Business Services Division and worked as part of the Spatial Analytics and Smarter Water Practices. Brian holds a B.S. in finance and in economics from Carnegie Mellon University and a J.D. from the University of Pittsburgh.

Conducting deep learning research and development requires a combination of cutting-edge hardware, elastic software frameworks, and a collaborative research community. We'll provide the scaffolding for participants to construct an enterprise-scale, GPU-enabled high performance computing solution for machine learning and data science by drawing on the experiences gained while IBM Research built its Cognitive Computing Cluster. We'll start by discussing how to build a secure, shared-resource computing cluster optimized for deep learning. Next, we'll cover how to provide deep learning frameworks supporting speech, vision, language, and text processing and their underlying primitives. Finally, we'll discuss how to build a best practice knowledge base to improve research quality and accelerate discovery.

Level: Intermediate
Type: Talk
Tags: Data Center and Cloud Computing; HPC and Supercomputing; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7351 - Applying GPU Technology to Combat System Integration and Maintenance

Christopher Crouch Associated Member of Engineering Staff, Lockheed Martin
Christopher Crouch is an Associate Member of the Engineering Staff in the Ship Integration and Test organization at Lockheed Martin Rotary and Mission Systems (RMS). He joined the graphics team at Lockheed Martin's Surface Navy Innovation Center (SNIC) in 2015. Chris applies 3D graphics technologies such as OpenGL, DirectX, Iray, augmented reality, and virtual reality to many different engineering disciplines throughout Lockheed Martin. He holds a Bachelor of Science in computer science from Rowan University.
Rich Rabbitz Principal Member of Engineering Staff, Lockheed Martin
Rich Rabbitz is a principal member of the engineering staff in the Ship Integration and Test organization at Lockheed Martin Rotary and Mission Systems. Rich has led the graphics group in Lockheed Martin's Surface Navy Innovation Center since its opening in 2014. He applies 3D graphics technologies such as OpenGL, CUDA, OptiX, Iray, augmented reality, and virtual reality to many different engineering disciplines throughout Lockheed Martin. Rich holds an M.S. in engineering in computer graphics from the University of Pennsylvania. He is also a professor in the computer science department at Rowan University.

Lockheed Martin Rotary and Mission Systems has a rich history of integrating combat systems into naval ships and buildings. The integration of complex radar and support systems into modern war-fighting entities demands the use of a unique set of design and simulation tools to verify and optimize engineering designs before production begins. After the combat system is in the field, it is important to equip the warfighter with informative training and maintenance systems. The goal is to keep the combat system fully operational at all times. GPU technologies such as OpenGL, CUDA, OptiX, and Iray, along with virtual reality and augmented reality, make these unique design and maintenance environments possible. These design practices are being examined in the Surface Navy Innovation Center through dedicated research for domestic and international combat system integration and maintenance.

Level: All
Type: Talk
Tags: Virtual Reality and Augmented Reality; Manufacturing Industries; Federal

Day: TBD
Time: TBD
Location: TBD

S7354 - NUFFT Volume Reconstruction for Synchrotron MicroTomographic Data Using GPUs

Dinesh Kumar Computational Post Doctorate Fellow, Lawrence Berkeley National Laboratory
Dinesh Kumar is a computational postdoctoral fellow at Lawrence Berkeley National Laboratory. Dinesh is involved in developing HPC tools for analyzing synchrotron X-ray data, such as tomography and scattering data. His previous work includes non-rigid image registration for GYN cancer patients at Virginia Commonwealth University and simulation of geophysical mass flows over natural terrain during his Ph.D. work at the University at Buffalo.

We'll discuss a GPU implementation of non-uniform fast Fourier transform (NUFFT)-based volume reconstruction of synchrotron tomographic data. A Python interface manages the workflow through either a GUI or a CLI.

Level: All
Type: Talk
Tags: In-Situ and Scientific Visualization; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7355 - Learning Large-Scale Multimodal Data Streams: Ranking, Mining, and Machine Comprehension

Winston Hsu Professor, National Taiwan University
Winston Hsu is a researcher dedicated to large-scale image/video retrieval/mining, visual recognition, and machine intelligence. He is keen to advance research toward business deliverables via academia-industry collaborations and co-founding startups. Winston is a professor in the Department of Computer Science and Information Engineering, National Taiwan University, a visiting scientist at Microsoft Research (2014) and IBM T.J. Watson Research (2016) for visual cognition, and co-leads the Communication and Multimedia Lab (CMLab). He is the director and principal investigator for the NVIDIA AI Lab, the first in Asia. Winston received his Ph.D. from Columbia University, New York, in 2007. Before that, he was a founding engineer in CyberLink Corp. He serves as an associate editor for IEEE Multimedia Magazine and IEEE Transactions on Multimedia. He also has lectured several highly rated technical tutorials in ACM Multimedia 2008/2009, SIGIR 2008, and IEEE ICASSP 2009/2011.
Hung-Yi Lee Assistant Professor, National Taiwan University
Hung-yi Lee is an assistant professor of the Department of Electrical Engineering, National Taiwan University, with a joint appointment at the Department of Computer Science & Information Engineering. He was the keynote speaker of a deep learning workshop held by Academia Sinica and the distinguished lecturer of the International Congress on Big Data Taipei Satellite Session. He gave a six-hour tutorial, "Understanding Deep Learning in One Day," three times. He is developing a series of deep learning techniques that let machines understand spoken content. He achieved the following breakthroughs in 2016: (1) developing a new task, a TOEFL listening comprehension test taken by machine, and proposing a new deep learning model to tackle it, (2) proposing to apply deep reinforcement learning to spoken content retrieval, and (3) proposing a new deep learning framework, Audio Word2Vec, to discover word patterns from audio signals without any supervision.

We'll demonstrate how to design end-to-end neural networks that leverage large-scale multimodal data streams for ranking (recommendation), mining human behaviors and interests, and machine comprehension jointly across modalities such as images, videos, audio, and 3D models. We'll present effective neural networks for considering both sequential (temporal) and spatial (convolutional) variations, along with numerous strategies for cross-modal learning. We'll show how to tackle cross-domain problems (for example, images vs. 3D models, audio vs. text) and how to leverage freely available web data for training in a semi-supervised or unsupervised manner. We'll describe breakthroughs in 3D model retrieval, understanding human activities from social media, listening comprehension tests, and more.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Signal and Audio Processing

Day: TBD
Time: TBD
Location: TBD

S7356 - MVAPICH2-GDR: Pushing the Frontier of HPC and Deep Learning

Dhabaleswar K. (DK) Panda Professor and University Distinguished Scholar, The Ohio State University
Highly-Rated Speaker
Dhabaleswar K. (DK) Panda is a professor and University Distinguished Scholar of Computer Science and Engineering at Ohio State University. D.K. has published over 400 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP, and RoCE) open-source software package, developed by his research group, is used by more than 2,675 organizations in 83 countries. This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade, including the current No. 1. More than 402,000 downloads of this software have taken place from the project's website alone. He is an IEEE fellow and a member of ACM.
Khaled Hamidouche Research Scientist, The Ohio State University
Khaled Hamidouche is a research scientist in the Department of Computer Science and Engineering at Ohio State University. Khaled is a member of the Network-Based Computing Laboratory, led by Dr. D.K. Panda. His research interests include high-performance interconnects, parallel programming models, accelerator computing, and high-end computing applications. His current focus is on designing high-performance unified MPI, PGAS, and hybrid MPI+PGAS runtimes for InfiniBand clusters and their support for accelerators. Khaled is involved in the design and development of the popular MVAPICH2 library and its derivatives MVAPICH2-MIC, MVAPICH2-GDR, and MVAPICH2-X. He has published over 50 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities in academic journals and conferences, and is a member of ACM.
Hari Subramoni Research Scientist, Ohio State University
Dr. Hari Subramoni is a research scientist in the Department of Computer Science and Engineering at the Ohio State University, USA. His current research interests include high performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network topology aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, big data and cloud computing.

Learn about the latest developments in the MVAPICH2-GDR library, which helps MPI developers exploit maximum performance and scalability on HPC clusters with NVIDIA GPUs. Multiple designs will be highlighted to boost the performance of HPC applications, focusing on GPUDirect RDMA (GDR)-based asynchronous communication, non-blocking collectives, support for unified memory, and datatype processing. Furthermore, targeting emerging deep learning frameworks, we'll present novel designs and enhancements to the MVAPICH2-GDR library to accommodate the large-message and dense GPU computing requirements of DL frameworks. Using a scheme co-designed between MVAPICH2-GDR and the Caffe workflow, we'll present OSU-Caffe, which supports an MPI-based distributed and scalable DL framework. Performance and scalability numbers for OSU-Caffe on various system configurations and datasets will also be presented.
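
For illustration, here is a CUDA-aware MPI program in the style MVAPICH2-GDR (or any CUDA-aware MPI) supports: device pointers are passed directly to standard MPI calls and the library selects the transport (GPUDirect RDMA, IPC, or staged copies). This is a toy two-rank exchange under those assumptions, not an MVAPICH2-specific API.

#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float *d_buf;
    cudaMalloc(&d_buf, n * sizeof(float));      // plain device allocation

    if (rank == 0) {
        cudaMemset(d_buf, 0, n * sizeof(float));
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);           // device pointer passed to MPI
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d floats directly into GPU memory\n", n);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}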

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing; Deep Learning and AI; Tools and Libraries

Day: TBD
Time: TBD
Location: TBD

S7357 - Warping & Blending for Multi-Display System using NVIDIA DesignWorks

Doug Traill Manager, Pro Visualization Solution Architecture team, NVIDIA
Highly-Rated Speaker
Doug Traill leads the Professional Visualization Solution Architecture team at NVIDIA. He manages a team of leading industry solutions architects who provide technical sales leadership for NVIDIA's Quadro, GRID, and Advanced Rendering products. Prior to NVIDIA, Doug worked at Silicon Graphics for nine years in various technical roles, including solutions architect and visualization product manager. During his career, Doug has helped design and build some the world's largest visualization centers. He holds a B.S. in electronic systems and microprocessor engineering from the University of Glasgow, U.K., as well as an M.S. of telecommunications business management from King's College London, U.K.

We'll describe how to scale up from one display to many for high-end visualization. You'll learn about NVIDIA's Warp and Blend APIs on Windows and Linux, which allow you to create a truly seamless logical display composed of many individual display outputs. With this capability, you can project graphics onto curved surfaces and implement the required transformations entirely on the GPU, without any external hardware.

Level: Intermediate
Type: Talk
Tags: Large Scale and Multi-Display Visualization

Day: TBD
Time: TBD
Location: TBD

S7359 - Performant Deep Reinforcement Learning: Latency, Hazards, and Pipeline Stalls in the GPU Era ... and How to Avoid Them

Mark Hammond CEO & Founder, Bonsai
Mark Hammond is the founder and CEO of Bonsai. Mark has a deep passion for understanding how the mind works and has been thinking about AI throughout his career. Upon graduating from Caltech with a degree in computation and neural systems, Mark went on to positions at Microsoft and numerous startups and academia, including turns at Numenta and the Yale neuroscience department.

The headache of latency, hazards, and pipeline stalls has reared its head again, taking a new form in the GPU era. In the realm of deep reinforcement learning, stateful, interactive simulation-based workloads push this to the extreme, necessitating a handoff to the simulator on every iteration - and that simulator may not even be running on the same machines as the deep reinforcement learning model! We'll explore lessons learned on how to avoid these performance degrading modern hazards. Attendees will learn tricks and techniques - including approaches to pool multiple concurrent simulations for use with single networks - that they can employ in their own systems to increase performance with their deep reinforcement learning workloads.

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Deep Learning and AI; Performance Optimization; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7360 - Deep Learning in Business Conversation Analysis

Wonkyum Lee S/W Engineer, Gridspace
Wonkyum Lee is the lead speech engineer at Gridspace, where he leads speech recognition and speech signal analysis projects. He recently graduated from Carnegie Mellon University with research experience in deep learning for speech processing. He has led multiple research projects, such as GPU-based ASR, keyword search, and noise-robust speech recognition. He also has research experience in signal processing for wireless communication and defense systems.
Anthony Scodary EVP of Engineering, Co-founder, Gridspace
Anthony Scodary previously worked at NASA's Jet Propulsion Laboratory, where he was the instrument engineer for the REMS and APXS instruments on the Curiosity Mars rover from development through surface operations. He also worked on the Juno mission to Jupiter, unmanned Global Hawk missions, and the Phoenix lander. Anthony has a bachelor's degree in physics from Stanford and a master's in aeronautics and astronautics.

Gridspace uses GPU-accelerated deep learning to analyze conversational speech on phone calls. We'll outline our DNN-based approach as well as several commercial applications of call grading. Our GPU-based software stack provides a novel way to process large-scale speech data. Results from a recent case study show automated call grading to be as accurate as human grading and highly scalable in production. Deep call analysis with 100% coverage has never been achieved before. We'll also discuss how this system can be improved by training continuously without expert supervision.

Level: Intermediate
Type: Talk
Tags: Finance; Deep Learning and AI; Signal and Audio Processing; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7362 - Benchmarking the New Unified Memory of CUDA 8

Frank Zhao Software Architect, Dell EMC
Frank Zhao is a software architect and innovator in Dell EMC's Office of the CTO. Frank has many years of engineering experience in storage, data protection and now leads relevant advanced development projects about heterogeneous computing (GPU+CPU) in cloud, micro-service, big data in-memory analytics, and more. Frank has delivered a number of invited tech talks at Hadoop Summit, LinuxCon, ContainerCon, SNIA SDC, and more. He is an ACM member and a named inventor of four granted U.S. patents with an additional 40 pending.
Yifan Sun College Co-op Student, Dell EMC

We'll evaluate the impact of CUDA 8's new unified memory on applications with benchmarks, and share practices on how to tune or build high-performance apps. Since CUDA 6, unified memory has aimed at simplifying the programmability of heterogeneous memory management while maintaining good performance. However, practical limitations have prevented applications from fully taking advantage of it. The CUDA 8 release highlights an updated unified memory that both simplifies programmability and improves performance, especially when paired with the new Pascal GPU architecture. We'll evaluate the new system, benchmark its performance, and share our best practices in tuning code, which can serve as a reference for app developers. In addition, we'll explore options and solutions for moving and exchanging data efficiently between heterogeneous devices, such as NVMe/NVRAM, in modern data center or cloud environments.
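
A minimal sketch of the CUDA 8 unified memory features such benchmarks exercise: cudaMallocManaged provides a single pointer valid on host and device, and cudaMemPrefetchAsync (Pascal and later) migrates pages ahead of use instead of relying on demand paging. The kernel and sizes are placeholders for illustration.

#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 24;
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));   // one pointer for host and device
    for (int i = 0; i < n; ++i) data[i] = 1.0f;    // first touched on the CPU

    int device;
    cudaGetDevice(&device);
    // Prefetch to the GPU so the kernel does not pay demand-paging faults.
    cudaMemPrefetchAsync(data, n * sizeof(float), device);

    scale<<<(n + 255) / 256, 256>>>(data, n);

    // Prefetch back before the CPU reads the result.
    cudaMemPrefetchAsync(data, n * sizeof(float), cudaCpuDeviceId);
    cudaDeviceSynchronize();
    printf("data[0] = %f\n", data[0]);

    cudaFree(data);
    return 0;
}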

Level: Intermediate
Type: Talk
Tags: Data Center and Cloud Computing

Day: TBD
Time: TBD
Location: TBD

S7363 - Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation

Cheng-Han Du Postdoctoral Researcher, National Taiwan University
Cheng-han Du is a postdoctoral researcher in the Institute of Applied Mathematical Sciences, National Taiwan University. His research interests are high performance computing and hardware acceleration for deep learning and photonic simulation. Cheng-han received his B.S. from the Department of Computer Science and Information Engineering, National Taiwan University, in 2007, and his M.S. and Ph.D. from the Graduate Institute of Photonics and Optoelectronics at the same university in 2009 and 2014, respectively.

We propose several techniques for efficient multi-GPU acceleration of a direct linear system solver designed for finite-difference frequency-domain analysis of photonic structures. The algorithm is based on the compressed hierarchical Schur method (CHiS), in which redundant computation can be avoided given knowledge of duplicated physical structures and of the numerical elimination process. Since high-intensity matrix computations are the major workload in the CHiS algorithm, they can be divided into multiple panels and processed by multiple GPUs. Our implementation uses multithreading to control multiple GPUs. Performance analysis shows that this workload division yields significantly better scale-up with four GPUs than naive GPU acceleration.
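
As a rough sketch of the multithreaded pattern described above, the following program divides independent panels across GPUs with one host thread per device. The per-panel work here is a placeholder cuBLAS GEMM on scratch buffers, not the actual CHiS factorization kernels, and the panel count and sizes are illustrative assumptions.

#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <algorithm>
#include <thread>
#include <vector>

void processPanels(int device, int firstPanel, int lastPanel, int m) {
    cudaSetDevice(device);                    // all work in this thread targets 'device'
    cublasHandle_t handle;
    cublasCreate(&handle);

    float *A, *B, *C;
    cudaMalloc(&A, (size_t)m * m * sizeof(float));
    cudaMalloc(&B, (size_t)m * m * sizeof(float));
    cudaMalloc(&C, (size_t)m * m * sizeof(float));
    const float alpha = 1.0f, beta = 0.0f;

    for (int p = firstPanel; p < lastPanel; ++p) {
        // In the real solver, each panel would be loaded and updated here.
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, m, m,
                    &alpha, A, m, B, m, &beta, C, m);
    }
    cudaDeviceSynchronize();
    cudaFree(A); cudaFree(B); cudaFree(C);
    cublasDestroy(handle);
}

int main() {
    int nGpus = 0;
    const int nPanels = 64, m = 1024;
    cudaGetDeviceCount(&nGpus);

    std::vector<std::thread> workers;
    for (int g = 0; g < nGpus; ++g) {
        int per = (nPanels + nGpus - 1) / nGpus;   // panels per GPU, rounded up
        workers.emplace_back(processPanels, g, g * per,
                             std::min(nPanels, (g + 1) * per), m);
    }
    for (auto &t : workers) t.join();
    return 0;
}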

Level: Advanced
Type: Talk
Tags: Computational Physics; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7364 - Using Deep Learning for Earlier Detection of Acute Infarction of the Brain

Barbaros Erdal Assistant Professor, The Ohio State University Wexner Medical Center
Barbaros Erdal received his Ph.D. in electrical and computer engineering in 2012 from The Ohio State University and began working as an assistant professor at the OSU College of Medicine, Department of Radiology. He serves as the assistant chief of medical imaging informatics at OSU Wexner Medical Center, where he also serves as director of radiology computing and imaging information sciences. He is an active member of many professional societies, such as the Radiological Society of North America and the Society for Imaging Informatics in Medicine, where he teaches courses on topics such as image processing, business intelligence, and artificial intelligence.

We'll discuss how deep learning algorithms can be successfully used in the medical imaging domain. We'll cover how to address challenges and avoid pitfalls: (1) those inherited from medical image acquisition protocols, (2) in handling 16- vs. 8-bit images, and (3) in handling 2D vs. 3D datasets, using autonomous detection of acute infarction of the brain as a use case. Examples will be given using the NVIDIA® DIGITS™ platform. It is assumed that registrants are already familiar with the fundamentals of image processing and deep learning.

Level: Intermediate
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7366 - Building a GPU-enabled OpenStack Cloud for HPC

Blair Bethwaite Senior HPC Consultant, Monash University
Blair Bethwaite has worked in distributed computing at Monash University, in Melbourne, Australia, for 10 years, and with OpenStack for the last four. Having served as team lead, architect, administrator, user, researcher, and occasional hacker, Blair's unique perspective as a science power-user, developer, and system architect has helped guide the evolution of the research computing engine central to Monash's 21st Century Microscope.

M3 is the latest-generation system of the MASSIVE project, an HPC facility specializing in characterization science (imaging and visualization). Using OpenStack as the compute provisioning layer, M3 is a hybrid HPC/cloud system, custom-integrated by Monash's R@CMon Research Cloud team. Built to support Monash University's high-throughput instrument processing requirements, M3 is split roughly evenly between GPU-accelerated and CPU-only nodes. We'll discuss the design and technology used to build this innovative platform, as well as the approaches and challenges involved in building GPU-enabled and HPC clouds.

Level: All
Type: Talk
Tags: Data Center and Cloud Computing; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7367 - GPU-Accelerated Similarity Searching in a Database of Short DNA Sequences

Richard Wilton Associate Research Scientist, Johns Hopkins University
Highly-Rated Speaker
Richard Wilton works on petabyte-scale databases in the Institute for Data Intensive Engineering and Science in the Department of Physics and Astronomy at Johns Hopkins University. He designed and implemented data-transformation workflows for the Pan-STARRS astronomical survey database. He is the lead developer of Arioc, a GPU-based short-read DNA sequence aligner that is a key component in the preparation of data for the NIH-funded Terabase Search Engine project.

The challenge: do interactive similarity searching in a SQL database that contains billions of short DNA sequences. The response: this database query is amenable to GPU acceleration because efficient numerical computation can be carried out in parallel on large numbers of independent data items. Implementation details and performance will be discussed, with emphasis on the integration of GPU computation with the database server environment.
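
To make the "independent data items" point concrete, here is a hedged sketch (not the Arioc or Terabase Search Engine code; the 2-bit base packing, read length, and sizes are assumptions) in which each GPU thread scores one packed database sequence against the query by XOR and population count:

```cuda
// Minimal sketch: sequences packed 2 bits per base into 64-bit words; one thread
// computes the mismatch count between one database sequence and the query.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>
#include <vector>

const int WORDS_PER_SEQ = 4;   // 4 x 32 bases = 128-base reads (illustrative)

__global__ void similarityKernel(const uint64_t* db, int numSeqs,
                                 const uint64_t* query, int* mismatches) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numSeqs) return;
    int diff = 0;
    for (int w = 0; w < WORDS_PER_SEQ; ++w) {
        uint64_t x = db[(size_t)i * WORDS_PER_SEQ + w] ^ query[w];
        // collapse each 2-bit base to a single "differs" bit, then count
        uint64_t basesDiffer = (x | (x >> 1)) & 0x5555555555555555ULL;
        diff += __popcll(basesDiffer);
    }
    mismatches[i] = diff;         // small values = near matches for the query
}

int main() {
    const int numSeqs = 1 << 20;
    std::vector<uint64_t> hDb((size_t)numSeqs * WORDS_PER_SEQ, 0), hQ(WORDS_PER_SEQ, 0);
    uint64_t *dDb, *dQ; int *dOut;
    cudaMalloc(&dDb, hDb.size() * sizeof(uint64_t));
    cudaMalloc(&dQ,  hQ.size()  * sizeof(uint64_t));
    cudaMalloc(&dOut, numSeqs * sizeof(int));
    cudaMemcpy(dDb, hDb.data(), hDb.size() * sizeof(uint64_t), cudaMemcpyHostToDevice);
    cudaMemcpy(dQ,  hQ.data(),  hQ.size()  * sizeof(uint64_t), cudaMemcpyHostToDevice);
    similarityKernel<<<(numSeqs + 255) / 256, 256>>>(dDb, numSeqs, dQ, dOut);
    cudaDeviceSynchronize();
    printf("scored %d sequences\n", numSeqs);
    cudaFree(dDb); cudaFree(dQ); cudaFree(dOut);
    return 0;
}
```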

Level: Intermediate
Type: Talk
Tags: Computational Biology; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7368 - PowerAI: A Co-Optimized Software Stack for AI on Power

Michael Gschwind Chief Engineer, Machine Learning & Deep Learning, IBM
Michael Gschwind is chief engineer for machine learning and deep learning at IBM Systems, where he leads the development of hardware/software integrated products for cognitive computing. During his career, Michael has been a technical leader for IBM's key transformational initiatives, leading the development of the OpenPOWER hardware architecture as well as the software interfaces of the OpenPOWER software ecosystem. In previous assignments, he was a chief architect for Blue Gene, POWER8, POWER7, and Cell BE. Michael is a fellow of the IEEE, an IBM Master Inventor, and a member of the IBM Academy of Technology.

We'll introduce PowerAI and the S822LC for HPC. PowerAI is an optimized software stack for AI designed to take advantage of Power processor performance features, and in particular of the new NVLink interface between Power and the NVIDIA Tesla P100 GPU accelerator, first introduced with the S822LC for HPC. We'll introduce performance enhancements in PowerAI, including IBM Caffe with performance optimizations centered on enhanced communications, as well as other enhancements to frameworks, libraries, and the deep learning ecosystem for Power. With its high-performance NVLink connection, the new-generation S822LC for HPC server is the first to offer a sweet spot of scalability, performance, and efficiency for deep learning applications. Together, these hardware and software enhancements enabled the first release of PowerAI to achieve best-in-industry training performance for AlexNet and VGGNet.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Tools and Libraries; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7369 - In the Blink of an Eye: High-Performance Analytics on NVIDIA GPUs

Partha Sen CEO, Fuzzy Logix
A passion for solving complex business problems using quantitative methods, data mining, and pattern recognition began as a hobby before leading Partha Sen to found Fuzzy Logix and develop its flagship product, DB Lytix, in 2007. Before Fuzzy Logix, Partha held senior management positions at Bank of America, where his achievements included leading the initiative to build a quantitative model-driven credit rating methodology for the commercial loan portfolio. In the portfolio strategies group, Partha led a team to devise strategies for hedging the credit risk for the bank's commercial loan portfolio and for minimizing the impact of mark-to-market volatility of the portfolio of hedging instruments. Prior to Bank of America, Partha held managerial positions at Ernst and Young and Tata Consultancy Services. He has a bachelor of engineering, with a major in computer science and a minor in mathematics, from the Indian Institute of Technology. He also has an MBA from Wake Forest.

By combining the enormous storage capacity of massively parallel processing (MPP) databases and Hadoop platforms with GPU-based analytics, a new architecture of fast Accelerated Analytics--with the ability to scale--will be discussed. We'll focus on benchmarks and early customer results in the use of high-performance, parallelized analytics on NVIDIA chips. We'll show how this GPU environment can be linked to the millions or billions of rows of data in databases or Hadoop clusters. We'll also cover solutions to business problems that were previously considered unsolvable using conventional CPU-based analytics. Use cases from finance, retail, healthcare, and manufacturing industries will be described.

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Algorithms; Finance; Federal

Day: TBD
Time: TBD
Location: TBD

S7370 - Efficient Maximum Flow Algorithm and Applications

Hugo Braun Intern, NVIDIA
Hugo Braun is a software engineer intern at NVIDIA, working on graph analytics applications on GPUs.
Nikolay Sakharnykh Senior Developer Technology Engineer, NVIDIA
Nikolay Sakharnykh is a senior developer technology engineer at NVIDIA, where he works on accelerating HPC and graph analytics applications on GPUs.

Maximizing data flow is one of the most important graph problems and has numerous applications across various computational domains: transportation networks, power routing, image segmentation, social network clustering, and recommendation systems. Many efficient algorithms have been developed for this problem, most of them trying to minimize computational complexity. However, not all of these algorithms map well to massively parallel architectures like GPUs. We'll present a novel GPU-friendly approach based on the MPM algorithm that achieves a 5-20x speedup over the state-of-the-art multithreaded CPU implementation from the Galois library on general graphs with various diameters. We'll also discuss some real-world applications of the maximum flow problem in computer vision for image segmentation and in data analytics to find communities in social networks.

Level: All
Type: Talk
Tags: Accelerated Analytics; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7372 - Functional Safety: Towards the Development of ISO26262 Compliant GPU Applications

Richard Bramley GPU Architecture: Functional Safety Architect, NVIDIA
Richard Bramley joined NVIDIA last year working in the GPU compute architecture department in charge of functional safety for GPU products. Richard spent the first part of his career working as a silicon systems architect at ST Microelectronics in the areas of video compression, consumer electronics, mobile phone chipset architecture, and ADAS and functional safety products. He has a Ph.D. in signal processing architectures from the University of Birmingham in the U.K.

We'll consider what it means to be compliant with prevailing functional safety standards, and learn the basics of functional safety and how the prevailing standard, ISO26262, can apply to GPUs and GPU programming. Functional safety is an important consideration for many applications of GPU computing; autonomous driving, robotics, and medical devices are just a few examples. Very often the development of the application takes precedence, leaving functional safety considerations to the end of development or, even worse, as an afterthought. If functional safety is considered and planned from the start, the results can be better and the price tag much lower. We'll explain the support that NVIDIA has implemented inside GPUs for functional safety and the various tools and methodologies that are available to support ISO26262 compliance. One part of the talk will concentrate on hardware and systems, the other on software development.

Level: All
Type: Talk
Tags: Self-Driving Cars; Computer Vision and Machine Vision; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7373 - Deep Neural Networks for Non-Equilibrium Molecular Dynamics

Jonathan Belof Physicist, Lawrence Livermore National Laboratory
Jon Belof is a staff scientist, group leader, and ASC/PEM program leader at Lawrence Livermore National Laboratory, where he leads projects to develop predictive material models for use in multi-physics simulations. Jon was a postdoctoral fellow in high-energy density physics at LLNL from 2010-2011, where he designed and executed experiments to probe phase transformation and plasticity at high strain rate on the National Ignition Facility, the Omega laser, and at the Los Alamos Neutron Science Center. His prior postdoctoral work at the University of South Florida was focused on applications of quantum chemistry to materials-based chemical weapon detection. Jon's doctoral dissertation was on the development of statistical mechanics methods to better rationally design nanomaterials. Prior to his academic work, Jon served as a consultant and advisor to the U.S. defense and intelligence communities.
Edward W. Lowe, Jr. (Will) Senior Data Scientist , FitNow, Inc
Will is the senior data scientist at FitNow, the creators of the Lose It! weight loss mobile app. He is the inventor of the Snap It algorithm for food object recognition and calorie estimation from food images. Will was an NSF CyberInfrastructure Fellow in Transformative Computational Sciences at Vanderbilt University from 2011-2014, where he directed the development of a GPU-accelerated machine learning suite for computer-aided drug design and cheminformatics. He was a trainee in the NIH Integrative Training in Therapeutic Discovery Program from 2009-2011 at Vanderbilt University, where he developed GPU-accelerated pharmacophore mapping algorithms that leveraged machine learning. Will's doctoral dissertation was an experimental and computational study of the radical mechanism of a quantum tunneling enzyme.

Molecular dynamics simulation of matter far from equilibrium presents one possible approach to the discovery of non-equilibrium constitutive relations, but is limited to coarse-grained Hamiltonians that include electronic effects only implicitly. We'll explore the possibility that deep neural networks -- when trained over the appropriate atomic states -- may provide the Hamiltonian for a molecular dynamics simulation, thus providing a sub-grid representation of variables at spatial and temporal scales that cannot otherwise be explicitly resolved. The advent of GPU-accelerated training of deep neural networks, and specifically recent improvements to the cuDNN library, now makes it feasible to handle the large and high-dimensional datasets inherent to such systems. Finally, we'll elucidate a few of the challenges in DNN-coupled dynamics, such as obeying the constraints of momentum and energy conservation.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Computational Physics; Computational Biology
Industry Segments: Government / National Labs; Higher Education / Research

Day: TBD
Time: TBD
Location: TBD

S7374 - The Next Evolution of Big Data: GPU Acceleration

Keith Kraus Senior Applied Solutions Engineer, NVIDIA
Keith Kraus is a senior applied solutions engineer for NVIDIA. Prior to NVIDIA, Keith was an associate principal engineer in the Accenture Cyber Security Lab. Over the past year, Keith has done extensive data engineering, systems engineering, and data visualization work in the cybersecurity domain. His main focus is on building a GPU-accelerated big data solution for advanced threat detection and cyber-hunting capabilities. Previously, Keith was a member of a research team that built a tool designed to optimally place automated defibrillators in urban environments. Keith graduated from Stevens Institute of Technology with a B.Eng. in computer engineering and an M.Eng. in networked information systems.
Joshua Patterson Applied Solutions Engineering Director , NVIDIA
Joshua Patterson is the Director of Applied Solution Engineering at NVIDIA and a former White House Presidential Innovation Fellow. Prior to NVIDIA, Josh worked with leading experts across public sector, private sector, and academia to build a next generation cyber defense platform. His current passions are graph analytics, machine learning, and GPU data acceleration. Josh also loves storytelling with data, and creating interactive data visualizations. Josh holds a B.A. in economics from the University of North Carolina at Chapel Hill and an M.A. in economics from the University of South Carolina Moore School of Business.

Traditionally, enterprises have used complex big data lambda architectures to handle massive quantities of data in a way that balances data latency and throughput. Learn how GPU-accelerated software has revolutionized big data and simplified the data architecture.

Level: Beginner
Type: Talk
Tags: Accelerated Analytics; Tools and Libraries; Federal

Day: TBD
Time: TBD
Location: TBD

S7375 - Petabyte Data Pipelines: Massively Distributed SQL Data Warehouse on GPUs

Rodrigo Aramburu CEO, BlazingDB
Rodrigo Aramburu started his career at Deloitte Consulting in New York. After building pension analytics and lifecycle management software for the public sector and financial services industry, Rodrigo decided to break out, move to Peru, and start an analytics services company, called Simply. In Peru, he led and built a business that was producing tools and advanced analytics and modeling services for multinationals and government institutions. BlazingDB was founded to support one of Simply's larger analytics engagements.
Felipe Aramburu CTO, BlazingDB
Felipe Aramburu is CTO at BlazingDB, a high-performance, massively distributed, GPU-accelerated database. Prior to BlazingDB, he built high-performance analysis platforms and applications, powered by massive datasets. Felipe tries to reinvent the wheel on a consistent basis, sometimes in an effort to annoy his younger brother and CEO, though ultimately this led to the birth of BlazingDB. He is more hacker/scientist than developer.

Level: All
Type: Talk
Tags: Accelerated Analytics; Data Center and Cloud Computing; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7377 - Tooling to Containerize an OpenCL Visual Analytics Platform

Lee Butterman Infrastructure Engineer, Graphistry, Inc.
Lee Butterman has a background in computer science, and has worked for a dozen and a half years everywhere from academia to his college's LGBTA to industry.

Containerization is taking over software delivery for good reason, and NVIDIA Docker brings the same potential to GPU applications. We'll dive into how Graphistry leverages NVIDIA Docker to deliver its visual analytics platform through a combination of tooling and process. By combining docker best practices, NVIDIA Docker, and a new open source container infrastructure we developed, we dramatically streamlined our build/release/test process for both our cloud and on-premise deployments. We'll discuss the configuration for NVIDIA Docker, deployment validation scripts, co-locating GPU containers, friendly Jenkins-based control, and other practical steps. Our examples will center around enterprise and cloud design constraints such as air-gapped deployment.

Level: Beginner
Type: Talk
Tags: Tools and Libraries; Data Center and Cloud Computing

Day: TBD
Time: TBD
Location: TBD

S7378 - Deep Learning Approaches to Timeseries Data

Miro Enev Solution Architect, Deep Learning, NVIDIA
Miro Enev is driven by a passion for understanding how the human mind works. He is most excited about unlocking the potential of deep learning to solve the data-driven challenges faced by the clients he engages as a solution architect at NVIDIA. Miro has expertise in machine learning, software engineering, experimental design with smart sensors (IoT), computer security, and computational neuroscience. He studied cognitive science at UC Berkeley, neuroscience at Yale and the University of California at San Diego, and went on to a Ph.D. in machine learning and cybersecurity/IoT at the University of Washington.
Jeff Weiss Director, West Territory SAs, NVIDIA
Jeff Weiss is a director who leads the West Territory SA team within the Solution Architecture & Engineering group at NVIDIA. Prior to joining NVIDIA, Jeff's background included a seven-year stint at VMware as an EUC staff engineer, as well as time at Symantec and Sun Microsystems. Along with his current focus on NVIDIA GPU-enabled computing solutions, his experience includes HPC, datacenter business continuity/disaster recovery solutions, software infrastructure identity management, and email security/archiving tools.

We'll survey successful deep learning (DL) applications within several domains featuring continuous streaming (time-series) data, and give an overview of which network architectures have yielded results and why these networks work. Architectures reviewed include RNNs (dynamic models and prediction), CNNs (for frequency-transformed time-series data, i.e., spectrograms), autoencoders (anomaly detection and unsupervised data-structure visualization), and deep MLPs (sliding-window event detection and classification). Example case studies include industrial (industrial robotics, automotive telematics, prognostics/zero-downtime), IoT (event and anomaly detection, information leakage attacks/defenses), and financial (limit books, mortgage risk markets) applications.

Level: Intermediate
Type: Talk
Tags: Intelligent Machines and IoT; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7379 - V-PANE: Large-Scale, Real-Time Textured Surface Reconstruction From Lidar, GPS/IMU, and Video

Kerry Moffitt Scientist, Raytheon BBN Technologies
Kerry Moffitt developed real-time 3D entertainment games for a decade, and in 2005 turned his attention to serious games and information visualization. He likes bridging worlds. From 2008 to 2011, he served as technology integrator on a multi-disciplinary team that built a first-person damage control training game that every U.S. Navy recruit now plays in their basic training. He likes science. Ultimately, he wishes to enlighten and empower individuals with tools to generate, access, organize, and visualize information, because he believes that helps people understand themselves, each other, and the world, which in turn helps make a decent society work.

Dive deep into the design and implementation of V-PANE (Virtual Perspectives Augmenting Natural Experience), a 5-GPU system that performs truncated signed distance function (TSDF)-based 360-degree textured surface reconstruction using lidar with fused infrared and visible-spectrum video and renders the results in real time. The vehicle-mounted system updates its 3D model continuously, and simultaneously writes to disk all volume and texture data as the vehicle moves through the environment, reading parts of the model back into GPU memory when necessary. Learn about the system's bus bandwidth management and heavily optimized GPU activities: lidar resampling, lidar integration, pixel ray casting, video frustum projection, frame buffer assembly, and image compression and decompression.
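
For orientation, the following is a much simplified, hedged sketch of a generic TSDF voxel update (this is not the V-PANE implementation; the identity camera pose, pinhole intrinsics, and grid sizes are illustrative assumptions): each GPU thread projects one voxel into a depth image and folds the observation into a running truncated signed distance average.

```cuda
// Generic TSDF integration sketch: one thread updates one voxel from a depth image.
#include <cuda_runtime.h>

struct TsdfVoxel { float dist; float weight; };

__global__ void integrateDepth(TsdfVoxel* grid, int dimX, int dimY, int dimZ,
                               float voxelSize, const float* depth, int width, int height,
                               float fx, float fy, float cx, float cy, float truncation) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= dimX || y >= dimY || z >= dimZ) return;

    // voxel center in camera coordinates (identity pose assumed for simplicity)
    float px = (x + 0.5f) * voxelSize, py = (y + 0.5f) * voxelSize, pz = (z + 0.5f) * voxelSize;
    if (pz <= 0.0f) return;

    // pinhole projection into the depth image
    int u = (int)(fx * px / pz + cx);
    int v = (int)(fy * py / pz + cy);
    if (u < 0 || u >= width || v < 0 || v >= height) return;

    float d = depth[v * width + u];
    if (d <= 0.0f) return;                        // no range return for this pixel

    float sdf = d - pz;                           // signed distance along the viewing ray
    if (sdf < -truncation) return;                // voxel is far behind the surface
    float tsdf = fminf(1.0f, sdf / truncation);   // truncate to [-1, 1]

    // running weighted average, so repeated observations refine the surface
    TsdfVoxel vox = grid[(z * dimY + y) * dimX + x];
    float w = vox.weight + 1.0f;
    vox.dist = (vox.dist * vox.weight + tsdf) / w;
    vox.weight = w;
    grid[(z * dimY + y) * dimX + x] = vox;
}

int main() {
    const int dim = 128, width = 640, height = 480;
    TsdfVoxel* dGrid;  float* dDepth;
    cudaMalloc(&dGrid, sizeof(TsdfVoxel) * dim * dim * dim);
    cudaMemset(dGrid, 0, sizeof(TsdfVoxel) * dim * dim * dim);
    cudaMalloc(&dDepth, sizeof(float) * width * height);
    cudaMemset(dDepth, 0, sizeof(float) * width * height);
    dim3 block(8, 8, 8), blocks((dim + 7) / 8, (dim + 7) / 8, (dim + 7) / 8);
    integrateDepth<<<blocks, block>>>(dGrid, dim, dim, dim, 0.01f, dDepth,
                                      width, height, 525.f, 525.f, 319.5f, 239.5f, 0.05f);
    cudaDeviceSynchronize();
    cudaFree(dGrid); cudaFree(dDepth);
    return 0;
}
```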

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7381 - Deep Learning for Human-Centered Semi-Autonomous Driving

Lex Fridman Postdoctoral Researcher, Massachusetts Institute of Technology (MIT)
Lex Fridman is a postdoc at MIT, working on computer vision and deep learning approaches in the context of self-driving cars with a human-in-the-loop. His work focuses on large-scale, real-world data, with the goal of building intelligent systems that have real world impact. Lex received his B.S., M.S., and Ph.D. from Drexel University, where he worked on applications of machine learning, computer vision, and decision fusion techniques in a number of fields, including robotics, active authentication, activity recognition, and optimal resource allocation on multi-commodity networks. Before joining MIT, Lex was at Google working on machine learning and decision fusion methods for large-scale behavior-based authentication.

We'll show how deep convolutional networks can be used to sense both the state of the driver and the external driving scene to achieve a safe semi-autonomous driving experience. At the core of our talk is a demonstration of using an NVIDIA DGX-1 and NVIDIA DRIVE PX 2 to train and run, respectively, a deep end-to-end network that takes the visual scene inside and outside the car as input and produces shared-control decisions as output. The demo presents a case study of a distracted driver in imminent danger and how an intelligent shared autonomy system can step in to determine a safe trajectory that avoids the danger. Lastly, we show the challenges of semi-autonomous driving and how deep learning can help solve those challenges with both decoupled sensing-planning approaches and end-to-end learning approaches.

Level: All
Type: Talk
Tags: Self-Driving Cars; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7382 - GPUs Unleashed: Analysis of Petascale Molecular Simulations with VMD

John Stone Senior Research Programmer, University of Illinois Urbana-Champaign
Highly-Rated Speaker
John Stone is a senior research programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology, and associate director of the NVIDIA CUDA Center of Excellence at the University of Illinois. John is the lead developer of VMD, a high-performance molecular visualization tool used by researchers all over the world. His research interests include molecular visualization, GPU computing, parallel processing, ray tracing, haptics, and virtual environments. John was named an NVIDIA CUDA Fellow in 2010. In 2015, he joined the Khronos Group Advisory Panel for the Vulkan Graphics API. He also provides consulting services for projects involving computer graphics, GPU computing, and high performance computing.

We'll showcase recent successes in the use of GPUs to accelerate challenging molecular simulation analysis tasks on the latest NVIDIA® Tesla® P100 GPUs on both Intel and IBM/OpenPOWER hardware platforms, and large-scale runs on petascale computers such as Titan and Blue Waters. We'll highlight the performance benefits obtained from die-stacked memory on the Tesla P100, the NVIDIA NVLink interconnect on the IBM "Minsky" platform, and the use of NVIDIA CUDA® just-in-time compilation to increase the performance of data-driven algorithms. We will present results obtained with OpenACC parallel programming directives, current challenges, and future opportunities. Finally, we'll describe GPU-accelerated machine learning algorithms for tasks such as clustering of structures resulting from molecular dynamics simulations.

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing; Accelerated Analytics; Computational Chemistry

Day: TBD
Time: TBD
Location: TBD

S7383 - Accelerating Cyber Threat Detection with GPU

Joshua Patterson Applied Solutions Engineering Director , NVIDIA
Joshua Patterson is the Director of Applied Solution Engineering at NVIDIA and a former White House Presidential Innovation Fellow. Prior to NVIDIA, Josh worked with leading experts across public sector, private sector, and academia to build a next generation cyber defense platform. His current passions are graph analytics, machine learning, and GPU data acceleration. Josh also loves storytelling with data, and creating interactive data visualizations. Josh holds a B.A. in economics from the University of North Carolina at Chapel Hill and an M.A. in economics from the University of South Carolina Moore School of Business.

Analyzing vast amounts of enterprise cyber security data to find threats is hard. Cyber threat detection is also a continuous task, and because of financial pressure, companies have to find optimized solutions for this volume of data. We'll discuss the evolution of big data architectures used for cyber defense and how GPUs are allowing enterprises to do better threat detection more efficiently. We'll discuss (1) briefly, the evolution from traditional platforms to lambda architectures with new approaches like Apache Kudu, and ultimately to GPU-accelerated solutions; (2) current GPU-accelerated database, analysis, and visualization technologies (such as Kinetica and Graphistry) and the problems they solve; (3) the need to move beyond traditional table-based data stores to graphs for more advanced data exploration, analytics, and visualization; and (4) the latest advances in GPU-accelerated graph analytics and their importance for improved cyber threat detection.

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Deep Learning and AI; Federal

Day: TBD
Time: TBD
Location: TBD

S7387 - Speeding Up Conjugate Gradient Solvers by 10x

Mathias Wagner Sr. Developer Technology Engineer, NVIDIA
Mathias Wagner is a member of NVIDIA's European developer technology team working on high performance computing and scientific applications. Before joining NVIDIA, he worked as a postdoc in high-energy physics in Europe and the U.S., focusing on lattice quantum chromodynamics simulations using GPUs. Mathias holds a Ph.D. in theoretical physics from Darmstadt University of Technology.

On the path to exascale, high performance computing is adopting ever-wider processors that need more parallelism. The energy required to move data and the available bandwidth pose significant challenges. See how an efficient implementation of iterative Krylov solvers can help deal with these issues. As an example, we present the block conjugate gradient solver in QUDA, a library for lattice quantum chromodynamics. We demonstrate how an efficient implementation can overcome scaling issues and achieve a 10X speedup compared to a regular conjugate gradient solver.
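
For orientation, here is a minimal, hedged sketch of a standard (non-block) conjugate gradient loop on the GPU using cuBLAS. It is not the QUDA solver, which applies the sparse lattice QCD operator and blocks many right-hand sides together; the dense identity test matrix and sizes below are placeholder assumptions.

```cuda
// Standard CG iteration driven by cuBLAS; the matrix-vector product is a dense
// GEMV here purely for illustration.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    const int n = 1024, maxIter = 1000;
    const double tol = 1e-10;

    // A = identity (so the solve is trivial); replace with a real SPD operator
    std::vector<double> hA((size_t)n * n, 0.0), hb(n, 1.0);
    for (int i = 0; i < n; ++i) hA[(size_t)i * n + i] = 1.0;

    double *A, *b, *x, *r, *p, *Ap;
    cudaMalloc(&A, sizeof(double) * n * n);
    cudaMalloc(&b, sizeof(double) * n);  cudaMalloc(&x, sizeof(double) * n);
    cudaMalloc(&r, sizeof(double) * n);  cudaMalloc(&p, sizeof(double) * n);
    cudaMalloc(&Ap, sizeof(double) * n);
    cudaMemcpy(A, hA.data(), sizeof(double) * n * n, cudaMemcpyHostToDevice);
    cudaMemcpy(b, hb.data(), sizeof(double) * n, cudaMemcpyHostToDevice);
    cudaMemset(x, 0, sizeof(double) * n);

    cublasHandle_t h;  cublasCreate(&h);
    const double one = 1.0, zero = 0.0;

    // r = b - A*x = b (x starts at zero); p = r
    cudaMemcpy(r, b, sizeof(double) * n, cudaMemcpyDeviceToDevice);
    cudaMemcpy(p, r, sizeof(double) * n, cudaMemcpyDeviceToDevice);
    double rr;  cublasDdot(h, n, r, 1, r, 1, &rr);

    for (int k = 0; k < maxIter && rr > tol; ++k) {
        cublasDgemv(h, CUBLAS_OP_N, n, n, &one, A, n, p, 1, &zero, Ap, 1);  // Ap = A*p
        double pAp;  cublasDdot(h, n, p, 1, Ap, 1, &pAp);
        double alpha = rr / pAp, negAlpha = -alpha;
        cublasDaxpy(h, n, &alpha, p, 1, x, 1);       // x += alpha * p
        cublasDaxpy(h, n, &negAlpha, Ap, 1, r, 1);   // r -= alpha * Ap
        double rrNew;  cublasDdot(h, n, r, 1, r, 1, &rrNew);
        double beta = rrNew / rr;
        cublasDscal(h, n, &beta, p, 1);              // p = r + beta * p
        cublasDaxpy(h, n, &one, r, 1, p, 1);
        rr = rrNew;
    }
    printf("final residual norm^2: %g\n", rr);

    cublasDestroy(h);
    cudaFree(A); cudaFree(b); cudaFree(x); cudaFree(r); cudaFree(p); cudaFree(Ap);
    return 0;
}
```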

Level: Intermediate
Type: Talk
Tags: Computational Physics; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7388 - Developing an Improved Generalized Eigensolver with Limited CPU Offloading

Joshua Romero Graduate Student, Stanford University
Joshua Romero is a graduate student at Stanford University. His graduate research includes the numerical analysis of high-order computational methods for computational fluid dynamics simulations and the development of software to perform simulations using these techniques on modern computing hardware, with emphasis on GPUs. His recent work has focused on the development of general scientific computing applications for GPU hardware.

We'll explore strategies to reduce CPU dependencies within existing hybrid CPU/GPU LAPACK routines, such as those implemented in the open-source MAGMA library. This will be carried out in the context of developing an improved generalized eigensolver, written in CUDA Fortran, for the open-source Quantum ESPRESSO library. The solver aims to replace offloaded subblock CPU computations within the existing hybrid algorithms with GPU-resident subblock computations to limit dependencies on available CPU resources. We'll cover performance considerations and strategies used in developing the solver, including the use of profiling tools available within the CUDA toolkit. Additionally, we'll provide an example of developing software using CUDA Fortran.

Level: Intermediate
Type: Talk
Tags: Algorithms; Computational Physics; Performance Optimization; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7390 - GPU-Accelerated VDI for Car Design Environments

Masashi Okubo Group Leader, Honda R&D Co., Ltd.
Masashi Okubo is a large project leader on Honda's Next-Gen EWS project. Masashi leads evolution and optimization of the engineering workstation environment, which is accelerated for CAD/CAE/PLM solutions for all of Honda's designers, engineers, and related company engineers. He has over 20 years of experience in optimization, computer simulation, and system development/project management for the automobile development process for stylists and designers.
Hiroshi Konno Assistant Project Leader, Honda R&D Co., Ltd.
Hiroshi Konno is an assistant leader of this project. He has over twelve years of experience as an infrastructure administrator in workstation engineering. He deployed a vDGA environment in 2014. For this project, he was responsible for infrastructure construction and deployment of the GRID vGPU environment.
Yuma Takahashi CAD Administrator, Honda R&D Co., Ltd.
Yuma participated in this project as administrator of CAD applications for automobile development. She has over 5 years of experience as an administrator of CAD applications. Her interests include optimized engineering workstation environments.

Honda's evolutionary new project, internally called the "Next-gen Engineering Workstation (EWS) Project," is designed to optimize usage of our CAD-VDI environment for R&D offices and factories. The project's challenge is to move from the existing physical EWS and pass-through VDI environments to an NVIDIA GRID™ vGPU environment, all while improving user density (CCU/server), usage monitoring, resource optimization for designers, and flexible resource reallocation. Honda successfully deployed more than 4,000 concurrent CAD-VDI users in its initial phase, with aggressive plans to further increase utilization. This session will review the project's challenges and Honda's future vision.

Level: All
Type: Talk
Tags: Manufacturing Industries; Graphics Virtualization

Day: TBD
Time: TBD
Location: TBD

S7391 - Turbocharging VMD Molecular Visualizations with State-of-the-Art Rendering and VR Technologies

John Stone Senior Research Programmer, University of Illinois Urbana-Champaign
Highly-Rated Speaker
John Stone is a senior research programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology, and associate director of the NVIDIA CUDA Center of Excellence at the University of Illinois. John is the lead developer of VMD, a high-performance molecular visualization tool used by researchers all over the world. His research interests include molecular visualization, GPU computing, parallel processing, ray tracing, haptics, and virtual environments. John was named an NVIDIA CUDA Fellow in 2010. In 2015, he joined the Khronos Group advisory panel for the Vulkan graphics API. He also provides consulting services for projects involving computer graphics, GPU computing, and high performance computing.

State-of-the-art molecular simulations pose many challenges for effective visualization and analysis due to their size, timescale, and the growing complexity of the structures under study. Fortunately, a panoply of new and emerging technologies can address these challenges. We'll describe our experiences and progress adapting VMD, a widely used molecular visualization and analysis tool, to exploit new rasterization APIs such as EGL and Vulkan, and the NVIDIA OptiX(TM) ray tracing API for interactive, in-situ, and post-hoc molecular visualization on workstations, clouds, and supercomputers, highlighting the latest results on IBM POWER hardware. Commodity VR headsets offer a tremendous opportunity to make immersive molecular visualization broadly available to molecular scientists, but they present many performance challenges for both rasterization- and ray tracing-based visualization. We'll present results from our ongoing work adapting VMD to support popular VR HMDs.

Level: Intermediate
Type: Talk
Tags: In-Situ and Scientific Visualization; Virtual Reality and Augmented Reality; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7393 - Maximizing GPU Throughput Across Multiple Streams - Tips and Tricks

Chuck Seberino Principal Software Engineer, GPU Sequencing, Roche Sequencing Solutions
Chuck Seberino is a principal software engineer at Roche Sequencing Solutions, where he tackles big data problems processing genomic data in real time. Previously, he was at Complete Genomics doing similar work on its high-throughput sequencing. Prior to Chuck's time in life sciences, he worked for government, defense, and robotics companies, including Raytheon Missile Systems and Silicon Graphics. He has developed software for graphics, visual simulation, and GPGPU applications for over 20 years. Chuck has refocused his GPU and HPC expertise into the life sciences space, where he is pursuing an M.S. in bioinformatics at Stanford. He holds a degree in electrical engineering from the University of Arizona.

Efficiently utilizing one or more GPUs involves finding the right balance in three areas of CUDA programming: data movement, hardware architecture, and multi-level parallelism. CUDA Streams can be a powerful way to increase processor throughput if you can manage them properly. We'll go through some use case examples, synchronization pitfalls, and profiler cases to help identify ways to speed up your application.
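
As a minimal illustration of the stream-overlap pattern (the chunk count, sizes, and toy kernel are assumptions, not the Roche pipeline), the sketch below splits a buffer into chunks and gives each chunk its own stream so copies and kernels from different chunks can overlap:

```cuda
// Overlapping transfers and compute with CUDA streams; pinned host memory is
// required for cudaMemcpyAsync to overlap with kernels in other streams.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scaleKernel(float* data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

int main() {
    const int numChunks = 4;
    const int chunkElems = 1 << 20;
    const size_t chunkBytes = chunkElems * sizeof(float);

    float* hData;                      // pinned (page-locked) host buffer
    cudaMallocHost(&hData, numChunks * chunkBytes);
    for (int i = 0; i < numChunks * chunkElems; ++i) hData[i] = 1.0f;

    float* dData;
    cudaMalloc(&dData, numChunks * chunkBytes);

    cudaStream_t streams[numChunks];
    for (int c = 0; c < numChunks; ++c) cudaStreamCreate(&streams[c]);

    for (int c = 0; c < numChunks; ++c) {
        float* hChunk = hData + (size_t)c * chunkElems;
        float* dChunk = dData + (size_t)c * chunkElems;
        cudaMemcpyAsync(dChunk, hChunk, chunkBytes, cudaMemcpyHostToDevice, streams[c]);
        scaleKernel<<<(chunkElems + 255) / 256, 256, 0, streams[c]>>>(dChunk, chunkElems, 2.0f);
        cudaMemcpyAsync(hChunk, dChunk, chunkBytes, cudaMemcpyDeviceToHost, streams[c]);
    }
    cudaDeviceSynchronize();           // wait for all streams; per-stream sync is also possible

    printf("first element after round trip: %f\n", hData[0]);
    for (int c = 0; c < numChunks; ++c) cudaStreamDestroy(streams[c]);
    cudaFree(dData);  cudaFreeHost(hData);
    return 0;
}
```

Profiling a pattern like this makes the overlap, or the lack of it when pageable host memory is used, immediately visible on the profiler timeline.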

Level: Intermediate
Type: Talk
Tags: Performance Optimization; Tools and Libraries; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7396 - Zen and the Art of vGPU Selection

Jeremy Main Lead Solution Architect - GRID, NVIDIA
Jeremy Main is the Lead Solution Architect for NVIDIA GRID, working with manufacturing companies in Japan and APAC to assist in their adoption and deployment of vGPU-accelerated desktops and applications. He is also the developer of the configuration and resource monitoring tool GPUProfiler, which can be used to provide data-driven insight into how applications and user workflows use GPU and system resources in both physical and virtual environments. Before joining NVIDIA, Jeremy led the development of several 3D virtualization products, graphics remoting for RDS and virtual environments, as well as 3D CAD software kernel development. Jeremy received his Bachelor of Science from the University of Utah.

We'll discuss how to use application resource profiling in physical workstations or virtual machines to determine the best GPU, profile, and edition for customer workloads. Actual application profiling data will be used to demonstrate how profile selection is not just about frame buffer alone. The main tool used for profiling will be GPUProfiler; NVIDIA-smi will monitor the selected profile's fitness for the workload.

Level: Advanced
Type: Talk
Tags: Graphics Virtualization

Day: TBD
Time: TBD
Location: TBD

S7397 - GPU-Accelerated Deep Learning Framework for Cyber-Enabled Manufacturing

Adarsh Krishnamurthy Assistant Professor, Iowa State University
Adarsh Krishnamurthy is an assistant professor in the mechanical engineering department at Iowa State University, where he currently leads the Integrated Design and Engineering Analysis (IDEA) lab. Prior to this, he was a post-doctoral researcher in the bioengineering department at UC San Diego. He received his Ph.D. in mechanical engineering from UC Berkeley and his bachelor's and master's degrees from the Indian Institute of Technology, Madras. His research interests include computer-aided design (CAD), GPU and parallel algorithms, cyber-enabled manufacturing, biomechanics, patient-specific heart modeling, solid mechanics, computational geometry, and ultrasonic non-destructive testing. He has more than 10 years of experience in developing GPU algorithms for interactive mechanical CAD, including spline evaluations, surface intersections, minimum distance computations, and volume integration.
Aditya Balu Ph.D. Student, Iowa State University
Aditya Balu is a Ph.D. student at Iowa State University, working under joint supervision of Dr. Adarsh Krishnamurthy and Dr. Soumik Sarkar. Previously, he worked at FMC Technologies, Inc., for two years. His research interests include design for manufacturing, machine learning, and high performance computing applications to solve mechanical engineering problems. He received his undergraduate degree in 2014 from Birla Institute of Technology and Sciences, Pilani.

We'll present a GPU-accelerated deep-learning framework for cyber-manufacturing, which enables real-time feedback to designers regarding the manufacturability of a computer-aided design model. We'll talk about a 3D-convolutional neural network-based approach for learning the manufacturability of a mechanical component. The 3D-CNN can recognize the features in a CAD model and classify it to be manufacturable or non-manufacturable with a greater accuracy than traditional rule-based methods. We'll discuss a novel GPU-accelerated voxelization algorithm used to discretize the CAD model and prepare it for deep learning. We'll briefly outline the challenges in training a 3D-CNN using complex CAD models on a GPU (NVIDIA TITAN X) with limited memory. Finally, we'll touch upon different methods to extend the framework to other manufacturing processes, such as additive manufacturing and milling.
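
As a much simplified, hedged illustration of producing a voxel occupancy grid for a 3D CNN (the talk describes a GPU voxelization of the CAD model itself; here the surface is assumed to be pre-sampled into points, and all sizes are illustrative):

```cuda
// Each thread marks the voxel containing one surface sample point; the resulting
// occupancy grid is the kind of dense 3D input a 3D CNN consumes.
#include <cuda_runtime.h>
#include <vector>

__global__ void occupancyKernel(const float3* samples, int numSamples,
                                float* grid, int dim, float3 minCorner, float voxelSize) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numSamples) return;
    int x = (int)((samples[i].x - minCorner.x) / voxelSize);
    int y = (int)((samples[i].y - minCorner.y) / voxelSize);
    int z = (int)((samples[i].z - minCorner.z) / voxelSize);
    if (x < 0 || y < 0 || z < 0 || x >= dim || y >= dim || z >= dim) return;
    grid[(z * dim + y) * dim + x] = 1.0f;   // occupancy channel fed to the 3D CNN
}

int main() {
    const int dim = 64, numSamples = 100000;
    std::vector<float3> hSamples(numSamples, make_float3(0.5f, 0.5f, 0.5f));
    float3* dSamples;  float* dGrid;
    cudaMalloc(&dSamples, numSamples * sizeof(float3));
    cudaMalloc(&dGrid, sizeof(float) * dim * dim * dim);
    cudaMemset(dGrid, 0, sizeof(float) * dim * dim * dim);
    cudaMemcpy(dSamples, hSamples.data(), numSamples * sizeof(float3), cudaMemcpyHostToDevice);
    occupancyKernel<<<(numSamples + 255) / 256, 256>>>(dSamples, numSamples, dGrid, dim,
                                                       make_float3(0.f, 0.f, 0.f), 1.0f / dim);
    cudaDeviceSynchronize();
    cudaFree(dSamples); cudaFree(dGrid);
    return 0;
}
```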

Level: Intermediate
Type: Talk
Tags: Intelligent Machines and IoT; Deep Learning and AI; Computer Aided Engineering; Manufacturing Industries; Computational Fluid Dynamics

Day: TBD
Time: TBD
Location: TBD

S7400 - GPU-Cloud Photorealistic Rendering for the Next Generation of Cloud CAD Tools

Miguel Arias CEO, Prefixa
Miguel Arias is founder of Prefixa, a high tech company focused on 3D visualization solutions. Miguel has been working with computer vision and 3D technology for more than 20 years. He holds a Ph.D. in electrical engineering from Laval University (Quebec, Canada), and an engineering degree and M.E. in electronics from Guanajuato University (Mexico).

We'll introduce OneRender, a photorealistic elastic cloud solution for accelerated rendering. OneRender is connected with Onshape.com, a leading cloud CAD solution. We'll present a general overview of these two platforms and explain how they connect to each other. We'll talk about the challenges and solutions in communicating complex geometries from the Onshape CAD format to the OneRender format, and in continuously maintaining consistency with any change in the former. The OneRender core engine is built on top of the NVIDIA OptiX framework for ray tracing and GPU-based acceleration. Furthermore, OneRender can launch multiple GPU clouds in parallel to accelerate the rendering process. We'll then give an overview of GPU usage versus user arrivals and workload growth. Finally, we'll show examples of real-world CAD designs and photorealistic renderings.

Level: All
Type: Talk
Tags: Rendering and Ray Tracing

Day: TBD
Time: TBD
Location: TBD

S7401 - Daino: A High-level Framework for Parallel and Efficient AMR on GPUs

Mohamed Wahib Postdoctoral Researcher, RIKEN Advanced Institute for Computational Science
Mohamed Wahib is a postdoctoral researcher in the HPC Programming Framework Research Team at RIKEN Advanced Institute for Computational Science. Mohamed joined RIKEN AICS in 2012 after he received a Ph.D. in computer science from Hokkaido University, Japan. Prior to his graduate studies, he worked as a researcher at Texas Instruments R&D for four years. Mohamed's research is focused on accelerators and data-centric programming models.

We'll present a high-level framework for producing parallel and efficient adaptive mesh refinement (AMR) code on GPU-accelerated supercomputers. AMR methods reduce the computational requirements of problems by increasing resolution only in areas of interest. However, in practice, efficient AMR implementations are difficult, considering that the mesh hierarchy management must be optimized for the underlying hardware. The architectural complexity of GPUs can make efficient AMR particularly challenging on GPU-accelerated supercomputers. We'll present a compiler-based, high-level framework that can automatically transform serial uniform mesh code annotated by the user into parallel adaptive mesh code optimized for GPU-accelerated supercomputers. We show experimental results on three production applications. The speedups of code generated by our framework are comparable to hand-written AMR code while achieving good strong and weak scaling up to 3,640 GPUs.

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing; Programming Languages; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7402 - Transfer Learning for Attractions Detection and Travel Industry

Artem Semyanov CEO, dpllab.com
Artem Semyanov is a data science team lead at Find Attractions Inc (dpllab.com, travellisabot.com). He uses deep convolutional nets to classify user-generated images of city objects with small training data samples, and recurrent neural nets for automatic image captioning with style change. Previously, he performed quantitative research at University College London and the Higher School of Economics in Moscow, implementing vector autoregressions to analyze dynamic equilibrium in commodities and stock exchange markets.

We'll discuss aspects of working with small training datasets of man-made objects and other famous tourist attractions, and of adapting classifiers to user-generated pictures. For example, we replace the last layer of the Inception v3 convolutional net and apply transfer learning. We'll also investigate a number of aspects of building dialog systems for the travel industry.

Level: Beginner
Type: Talk
Tags: Deep Learning and AI; Tools and Libraries
Industry Segments: Media & Entertainment

Day: TBD
Time: TBD
Location: TBD

S7404 - An Approach to a High-Performance Decision Tree Optimization Within a Deep Learning Framework

Yigal Jhirad Head of Quantitative and Derivatives Strategies , Cohen & Steers
Yigal D. Jhirad is senior vice president and a portfolio manager and director of Quantitative and Derivatives Strategies at Cohen & Steers, where he also heads the Investment Risk Committee. Prior to joining the firm in 2007, Yigal was an executive director in the institutional equities division of Morgan Stanley, where he headed the company's portfolio and derivatives strategies effort. He was responsible for developing, implementing, and marketing quantitative and derivatives risk-managed products to a broad array of institutional clients, including hedge funds, active and passive funds, pension funds, and endowments. Yigal holds a B.S. from the Wharton School. He is a financial risk manager, as certified by the Global Association of Risk Professionals.
Blay Tarnoff Senior Application Developer and Database Architect, Cohen & Steers
Blay A. Tarnoff is a senior applications developer and database architect, specializing in array programming and database design and development. Blay has developed equity and derivatives applications for program trading, proprietary trading, quantitative strategy, and risk management. He is currently a consultant at Cohen & Steers and was previously at Morgan Stanley.

We'll examine an innovative approach using an optimized algorithm to create a decision tree as the basis for regime-dependent and pattern classification of financial and macroeconomic time-series data. Implemented in a supervised and unsupervised learning framework, the algorithm relies on the GPU for high performance computing and on the host processor to further integrate the results in a deep learning framework. We also implement random number generation, in part, using a hardware quantum-based true random number generator, balanced with the pseudo-random number generator in CUDA, so as to optimize overall speed where an exhaustive search is not feasible.
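
A hedged sketch of the blending idea follows (the quantum TRNG source is replaced by a placeholder host buffer, and all names and sizes are assumptions): cuRAND supplies the pseudo-random stream on the GPU, and adding an independent uniform value modulo 1 keeps the blended output uniform.

```cuda
// Blend cuRAND pseudo-random numbers with externally supplied "true" random
// values by taking the fractional part of their sum.
#include <cuda_runtime.h>
#include <curand.h>
#include <vector>

__global__ void blendKernel(const double* pseudo, const double* trueRnd, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    double s = pseudo[i] + trueRnd[i];
    out[i] = s - (double)(int)s;           // fractional part: uniform in [0,1)
}

int main() {
    const int n = 1 << 20;
    double *dPseudo, *dTrue, *dOut;
    cudaMalloc(&dPseudo, n * sizeof(double));
    cudaMalloc(&dTrue,   n * sizeof(double));
    cudaMalloc(&dOut,    n * sizeof(double));

    // pseudo-random stream from cuRAND
    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_XORWOW);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);
    curandGenerateUniformDouble(gen, dPseudo, n);

    // placeholder for values read from a hardware quantum TRNG device
    std::vector<double> hTrue(n, 0.3141592653589793);
    cudaMemcpy(dTrue, hTrue.data(), n * sizeof(double), cudaMemcpyHostToDevice);

    blendKernel<<<(n + 255) / 256, 256>>>(dPseudo, dTrue, dOut, n);
    cudaDeviceSynchronize();

    curandDestroyGenerator(gen);
    cudaFree(dPseudo); cudaFree(dTrue); cudaFree(dOut);
    return 0;
}
```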

Level: All
Type: Talk
Tags: Finance; Accelerated Analytics; Algorithms; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7405 - Bifrost: A Python/C++ Framework for Easy High-Throughput Computing

Miles Cranmer Research Assistant, Harvard-Smithsonian Center for Astrophysics
Miles Cranmer is a physics undergraduate at McGill University and research assistant in Lincoln Greenhill's research group at the Harvard-Smithsonian Center for Astrophysics, specializing in software instrumentation for radio telescopes. He loves astroinformatics and machine learning, and has a profound interest in singularities - both black holes and superintelligence alike.

Bogged down trying to build a fast GPU processing pipeline? We'll present a solution: Bifrost, a framework for rapidly composing real-time data collection and analysis pipelines. Real-time data processing lies at the heart of most modern radio telescopes, and while hardware capabilities and data collection rates advance to the petascale regime, development of efficient real-time processing codes remains difficult and time-consuming. Bifrost solves this problem by combining a TensorFlow-like Python API with a library of common algorithms and highly efficient data transport. We'll describe the design and implementation of this framework, and demonstrate its use as the backend for a large radio telescope.

Level: Intermediate
Type: Talk
Tags: Astronomy and Astrophysics; Tools and Libraries
Industry Segments: Higher Education / Research; Government / National Labs

Day: TBD
Time: TBD
Location: TBD

S7406 - Targeted Sequencing for All on S5: GPUs Make It Happen

Mohit Gupta Senior Staff Software Engineer, Thermo Fisher Scientific
Highly-Rated Speaker
Mohit Gupta is a senior staff software engineer in the Clinical Sequencing Division of Life Sciences Solutions, a part of Thermo Fisher Scientific, Inc. He is responsible for development and acceleration of algorithms used in data analysis of PGM, Proton, S5, and S5XL DNA sequencers, with a particular focus on GPU computing. Previously, Mohit worked as senior research and development engineer with Mirafra Technologies, Bangalore, India, in the area of electronic design automation working on compiler for hardware description languages like Verilog. He holds a bachelor's degree in electrical engineering from the Indian Institute of Technology, Bombay, India, and an M.S. in computer engineering from the University of California, San Diego. He has published or presented in conferences and workshops like ICCAD, GTC, and DFMY.

See how GPUs are playing a central role in advancing Ion Torrent's targeted sequencing workflow. We'll talk about the S5 DNA sequencer from Ion Torrent, which is enabling democratization of the sequencing market and accelerating research in precision medicine at a breathtaking pace with the help of GPUs. We'll highlight our work in liquid biopsy and non-invasive prenatal testing, and how the breadth of our semiconductor chip technology offerings allows sequencing to scale from small panels to exomes. We'll give an overview of our analysis pipeline and the latest and greatest in algorithm development and acceleration on GPUs, as well as our experiences ranging from the Fermi to the Pascal GPU architectures.

Level: All
Type: Talk
Tags: Computational Biology; Healthcare and Life Sciences

Day: TBD
Time: TBD
Location: TBD

S7407 - Scaling Investigations: Next-Generation Visual Graph Analytics through GPU Cloud Streaming

Leo Meyerovich CEO, Graphistry, Inc.
Leo Meyerovich co-founded Graphistry, Inc. in 2014 to scale investigations through intelligent visual graph analysis. Part of Graphistry's approach is to connect browsers to GPU clusters, which builds upon the founding team's work at UC Berkeley on the first parallel web browser, declarative GPU-accelerated data visualization, and program synthesis. Leo's most referenced research is in language-based security. Previously, he worked on the first functional reactive web language (OOPSLA best paper, NSF GRFP), parallel web browser (PLDI, Qualcomm Innovation Fellow), and sociological foundations of programming languages (OOPSLA best paper, SIGPLAN annual highlight).

Scaling visual investigations is a tough problem. Analysts in areas like cyber security, anti-fraud, ML model tuning, and network operations are struggling to see their data and how it connects. We'll discuss where visual graph analytics gets used and how Graphistry is dramatically streamlining the analyst experience. For example, when using visual graph models for exploring security event logs, we can load events around an incident and quickly determine the root cause, scope, and progression. We'll demonstrate how we solve three technical aspects of scaling visual graph analysis: streamlining investigation workflows, visualizing millions of events in the browser, and fast analytics. Core to our approach, our platform connects GPUs in the client to GPUs on the server. The result is an investigation experience that feels like a "Netflix for data" and can be used by anyone with a browser.

Level: All
Type: Talk
Tags: Accelerated Analytics; Data Center and Cloud Computing; Federal

Day: TBD
Time: TBD
Location: TBD

S7411 - How Advances in Deep Learning and Computer Vision Can Empower the Blind Community

Anirudh Koul Senior Data Scientist, Microsoft
Anirudh Koul is a senior data scientist at Microsoft. Anirudh brings a decade of production-oriented applied research experience on petabyte-scale social media datasets. An entrepreneur at heart and driven by innovation, he runs a mini-startup team within Microsoft, prototyping ideas using deep learning techniques for social good. Anirudh has worked on a variety of machine learning, natural language processing, deep learning, computer vision, and scalability-related projects at Yahoo, Microsoft, and Carnegie Mellon University. A regular at hackathons, he has won close to three dozen awards, including top-three finishes for three years consecutively in the world's largest private hackathon with 16,000 participants. He has also been invited to showcase some of his recent work at a White House AI event, HBO, and National Geographic, and also to the Prime Minister of Canada.
Saqib Shaikh Senior Software Engineer, Microsoft
Saqib Shaikh is a senior software engineer at Microsoft, where he has worked for 10 years. Saqib has developed a variety of internet-scale services and data pipelines powering Bing, Cortana, Edge, MSN, and various mobile apps. Being blind, Saqib is passionate about accessibility and universal design. He serves as an internal consultant for teams including Windows, Office, Skype, and Visual Studio, and has spoken at several international conferences. Saqib has won three Microsoft hackathons in the past year. His interests focus on the intersection between AI and HCI and the application of technology for social good.

Motivated by making technology more accessible, we'll explore how deep learning can enrich image understanding that can, in turn, enable the blind community to experience and interact with the physical world in a more holistic manner than has ever been possible before. Going beyond research papers, we'll explore object recognition, image captioning, and visual question answering in the real world, along with practical tricks to train and squeeze some of these models for inference on mobile phones. By the end of the session, you'll develop intuition about what works and what doesn't, understand the practical limitations during development, and know how to use these techniques for your own applications.

Level: Beginner
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Computer Vision and Machine Vision

Day: TBD
Time: TBD
Location: TBD

S7412 - Towards Scene Understanding Under Challenging Illumination Conditions for ADAS Systems

Srinivas Kruthiventi S S Senior Software Engineer I , Harman International Industries
Srinivas Kruthiventi S S is a senior software engineer working in the research division of Harman International Industries, India. He received his M.S. from the Indian Institute of Science (IISc), Bangalore, with research work in the areas of computer vision and deep learning in 2016. Srinivas received his bachelor's degree in engineering physics from the Indian Institute of Technology Madras (IIT M) in 2012. He completed a research internship on speech processing at the International Institute of Information Technology Hyderabad (IIIT-H). His research interests include deep learning, machine learning, and artificial intelligence.
Pratyush Sahay Software Engineer, HARMAN International Industries
Pratyush Sahay is a software engineer working in the advanced technologies division at Harman International Industries, India, since 2013. Pratyush completed his master's degree from the Indian Institute of Technology, Madras (IIT-M), with research specialization in the areas of computer vision and 3D geometry and photometry. His research interests include deep learning, machine learning, and computer vision.

We'll provide insights leading to visual scene understanding under challenging illumination conditions using deep learning techniques accelerated over NVIDIA GPUs. We'll discuss our approaches to: (1) mitigate severe illumination challenges posed in a poorly lit environment, (2) perform object detection effectively in such scenarios, and (3) use GPU acceleration to achieve reasonable throughput for ADAS systems. Finally, we'll present compelling results achieved by our system on a publicly available low-light benchmark dataset.

Level: Intermediate
Type: Talk
Tags: Self-Driving Cars; Deep Learning and AI; Computer Vision and Machine Vision

Day: TBD
Time: TBD
Location: TBD

S7413 - High-Performance Machine Learning for Weather Prediction Applications

Hatem Ltaief Senior Research Scientist, KAUST
Highly-Rated Speaker
Hatem Ltaief is a senior research scientist in the Extreme Computing Research Center at KAUST, where he also advises several KAUST students in their M.S. and Ph.D. research. Hatem's research interests include parallel numerical algorithms, fault-tolerant algorithms, parallel programming models, and performance optimizations for multicore architectures and hardware accelerators. His current research collaborators include Aramco, Total, Observatoire de Paris, Cray, NVIDIA, and Intel. Hatem received his engineering degree from Polytech Lyon at the University of Claude Bernard Lyon I, France, an M.S. in applied mathematics from the University of Houston, and a Ph.D. in computer science from the University of Houston. From 2008 to 2010, he was a research scientist in the Innovative Computing Laboratory in the Department of Electrical Engineering and Computer Science at the University of Tennessee, Knoxville.

Learn how statistical modeling is revolutionizing weather and climate prediction applications. Such models offer high fidelity in theory and are increasingly viewed as potential replacements for actual simulations. Their main drawbacks are the expensive flop count and memory footprint of the computations involving the large dense covariance matrix, which make them unrealistic in practice. By exploiting the low-rank structure of the matrix and redesigning the underlying linear algebra in terms of batch operations, the fidelity of the model is not only maintained, but the corresponding performance achieved on GPUs is also unprecedented. Low-rank matrix computations on GPUs boost existing machine learning algorithms for weather prediction applications and open new research directions.
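
As a small illustration of the batched-operations idea (a hedged sketch only, not the code discussed in the talk; tile sizes, ranks, and the batch count are assumptions), the snippet below issues thousands of identical small low-rank tile products as a single strided-batched cuBLAS call:

```cuda
// Thousands of small GEMMs of identical shape issued as one batched cuBLAS call,
// rather than one kernel launch per tile.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    const int m = 64, k = 16, batch = 4096;    // tall-skinny low-rank factors U, V (m x k)
    const double alpha = 1.0, beta = 0.0;

    // each tile's contribution: C_i = U_i * V_i^T, with all tiles packed contiguously
    std::vector<double> hU((size_t)m * k * batch, 0.01);
    std::vector<double> hV((size_t)m * k * batch, 0.01);

    double *dU, *dV, *dC;
    cudaMalloc(&dU, hU.size() * sizeof(double));
    cudaMalloc(&dV, hV.size() * sizeof(double));
    cudaMalloc(&dC, (size_t)m * m * batch * sizeof(double));
    cudaMemcpy(dU, hU.data(), hU.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dV, hV.data(), hV.size() * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t handle;  cublasCreate(&handle);
    cublasDgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_T,
                              m, m, k, &alpha,
                              dU, m, (long long)m * k,
                              dV, m, (long long)m * k,
                              &beta,
                              dC, m, (long long)m * m,
                              batch);
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(dU); cudaFree(dV); cudaFree(dC);
    return 0;
}
```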

Level: Intermediate
Type: Talk
Tags: Algorithms; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7415 - Enhance Multi-Contrast MRI Reconstruction for Improved Diagnosis with Deep Learning Powered by NVIDIA GPUs

Enhao Gong PhD Candidate, Stanford University
Enhao Gong is a Ph.D. candidate at Stanford University. His work focuses on medical imaging, image processing, machine learning, computer vision, and optimization. Enhao's recent research interest is in applying deep learning in compressed sensing image reconstruction.

Advanced computation powered by GPUs is changing the clinical decision-making process. We'll present an exciting example of using NVIDIA GPUs for multi-contrast magnetic resonance imaging exams. Neurological disorders result in great clinical challenges and high societal burdens. Multi-contrast MRI exams are frequently used for diagnosis because the various tissue contrasts provide complementary diagnostic information to distinguish normal tissue from pathology. However, the cost of acquiring these multiple sequences is extensive scanning time, which significantly increases both the diagnosis cost and patients' discomfort, and limits the acquired image quality. We'll propose a new approach to accelerate multi-contrast imaging using a deep learning approach powered by GPUs. Validated on both patients and healthy subjects, we'll demonstrate that we can significantly reduce scanning time while improving image resolution and quality and preserving the diagnostic information.

Level: All
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Video and Image Processing; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7417 - GPU Acceleration of Monte Carlo Simulation for Capital Markets and Insurance

Serguei Issakov Global Head of Quantitative Research and Development, Senior Vice President , Numerix
Serguei Issakov is global head of Quantitative Research and Development at Numerix, where he oversees quantitative research and development globally, including pricing, market and counterparty risk, funding, capital, margin, and valuation adjustments in all asset classes: fixed income, inflation, credit, equities, FX, commodities, and hybrids. Previously, Serguei held research positions in theoretical physics at the Nordic Institute for Theoretical Physics in Copenhagen, the University of Paris (Laboratory of Theoretical Physics and Statistical Models), the University of Oslo, and the Center for Advanced Study in Oslo. He has published over 40 papers in mathematics and theoretical physics and is a co-author of the Issakov-Ouvry-Wu equations in fundamental quantum statistical mechanics. He holds a Ph.D. in theoretical and mathematical physics from Moscow Institute of Physics and Technology, from the Theory Group led by Physics Nobel Laureate Vitaly Ginzburg.

Learn about CUDA-based GPU acceleration of Monte Carlo simulations in the financial industry for pricing, risk management, and regulatory calculations. We'll provide an overview of three use cases. (1) Pricing with tens or hundreds of thousands of Monte Carlo "paths" or scenarios, depending on the complexity of the financial instrument. (2) For new international regulatory capital requirements introduced in January 2016 and also for new margin requirements that have been in effect since September 2016, we'll discuss calculation of the cost of capital and margin throughout the life of a portfolio, which requires nested Monte Carlo simulation. (3) Since the insurance industry uses a smaller number of Monte Carlo paths for pricing, we'll consider other approaches to take advantage of GPU acceleration, such as grouping similar policies together and policy code optimizations. We stress the importance of NVLink for accelerating the pricing of insurance policies.

Level: All
Type: Talk
Tags: Finance

Day: TBD
Time: TBD
Location: TBD

S7418 - Low-Communication FFT with Fast Multipole Method

Cris Cecka Senior Research Scientist, NVIDIA
Cris Cecka joined NVIDIA Research in 2015 to combine and deploy his interests in developing advanced numerical algorithms and software. Previously, Cris worked at the new Institute for Applied Computational Science at Harvard University as a lecturer and research scientist, where he developed courses on parallel computing and robust software development for scientific computing. He also worked in the Mathematics Department at the Massachusetts Institute of Technology as a research associate, where he focused on developing and applying integral equation methods and generalized N-body problems using hierarchical methods. He received his Ph.D. from Stanford University in computational and mathematical engineering in 2011.

We'll review a successful method for accelerating the 1D FFT by reducing the amount of communication required. The resulting method discards nearly two-thirds of the communication in exchange for the application of many hierarchically structured dense matrices, which can be applied efficiently via the fast multipole method (FMM). This FMM is formulated to be maximally computationally efficient on modern architectures and to require little auxiliary space and data. We'll review the formulation, stages of computation, free parameters and heuristics for choosing them, and efficient implementation strategies for an optimized FMM-FFT distributed across many GPUs. We'll present results obtained on up to eight Tesla P100 GPUs that show 1.2-2.2x speedup over the distributed 1D FFT provided by cuFFT-XT 8.0.

Level: Intermediate
Type: Talk
Tags: Algorithms; Performance Optimization; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7420 - Deep Learning of Cancer Images for Precision Medicine

Olivier Gevaert Assistant Professor, Stanford University
Olivier Gevaert is an assistant professor at Stanford University focusing on developing machine-learning methods for biomedical decision support from multi-scale biomedical data. He is an electrical engineer by training, with additional training in artificial intelligence, and holds a Ph.D. in bioinformatics from the University of Leuven, Belgium. He continued his work as a postdoc in radiology at Stanford and started his lab at the Department of Medicine, Biomedical Informatics. His lab focuses on multi-scale biomedical data fusion primarily in oncology and neuroscience.

We'll demonstrate a deep learning framework to predict survival of lung cancer patients by using convolutional networks to learn high-dimensional representations of tumor phenotypes from CT images and clinical parameters. We'll evaluate our framework on three independent cohorts with survival data, and show how the addition of clinical data improves performance. Furthermore, we'll describe how image noise can improve the robustness of our model to delineation errors and introduce the concept of priming, which helps improve performance when trained on one cohort and tested on another.

Level: All
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Video and Image Processing; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7422 - Real and Virtual Proving of Automated Driving in Berlin's Mixed Traffic

Ilja Radusch Director Smart Mobility, Fraunhofer FOKUS
Ilja Radusch is head of the Daimler Center for Automotive Information Technology Innovations - an Institute of Daimler AG at the University of Technology in Berlin - and director of the competence center for Automotive Services and Communication Technologies at the Fraunhofer-Institute FOKUS. Ilja works in the fields of collaborative mobility, especially connected and automated driving, security, simulation, and validation of location-based applications. He is responsible for various large-scale projects with industry partners as well as national and international research projects, and he consults for industry and public authorities in the area of connected and automated mobility.

Validating automated driving in city traffic requires a new approach that combines traditional proving by collecting lots of data from real vehicles with virtual proving for the software and its base data. City traffic raises a number of unique challenges that are hard to solve without machine learning. For training and validation of ML components, lots of high-quality ground truth data is required. Additionally, traditional software components often rely on specific base data - like HD map data - that requires constant in-field validation. We developed a comprehensive tool suite to generate, collect, segment, and label huge amounts of sensor data for algorithm validation or as ground truth for machine learning.

Level: All
Type: Talk
Tags: Self-Driving Cars; Tools and Libraries

Day: TBD
Time: TBD
Location: TBD

S7423 - Community Detection on the GPU

Antonino Tumeo Research Scientist, Pacific Northwest National Laboratory
Highly-Rated Speaker
Antonino Tumeo has been a research scientist in the Pacific Northwest National Laboratory's High Performance Computing group since 2011. Antonino joined PNNL in 2009 as a postdoctoral research associate. Previously, he was a postdoctoral researcher at Politecnico di Milano. His research interests are modeling and simulation of high-performance architectures, hardware-software codesign, FPGA prototyping, and GPGPU computing. He received his M.S. in informatic engineering in 2005 and Ph.D. in computer engineering in 2009 from Politecnico di Milano in Italy.
Mahantesh Halappanavar Research Scientist, Pacific Northwest National Laboratory
Mahantesh Halappanavar is a senior research scientist in the Advanced Computing, Mathematics, and Data Division at the Pacific Northwest National Laboratory. Mahantesh is a member of the Data Sciences Group and led the Data Analytics team. His research interests broadly include parallel graph algorithms, data-intensive computing, scientific computing, high performance computing, and machine learning. He was a member of the High Performance Computing group from 2009 to 2014. Mahantesh is also an adjunct faculty member in the department of computer science at Old Dominion University, and formerly with the department of computer science at Washington State University.

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Algorithms; Federal

Day: TBD
Time: TBD
Location: TBD

S7424 - Introduction and Techniques with NVIDIA Voxels

Rama Hoetzlein Graphics Research Engineer, NVIDIA
Rama Hoetzlein is the lead architect of NVIDIA Voxels (GVDB) at NVIDIA, where he investigates applications of sparse volumes to 3D printing, scientific visualization, and motion pictures. In 2010, Rama's interdisciplinary thesis work in media arts at the University of California, Santa Barbara, explored creative support tools for procedural modeling. He studied computer science and fine arts at Cornell University, and co-founded the Game Design Initiative at Cornell in 2001.

We'll explore NVIDIA Voxels, a new open source SDK framework for generic representation, computation, and rendering of voxel-based data. We'll introduce the features of the new SDK and cover applications and examples in motion pictures, scientific visualization, and 3D printing. NVIDIA Voxels, based on GVDB Sparse Volume technology and inspired by OpenVDB, manipulates large volumetric datasets entirely on the GPU using a hierarchy of grids. The second part of the talk will cover in-depth use of the SDK, with code samples, and coverage of the design aspects of NVIDIA Voxels. A sample code walk-through will demonstrate how to build sparse volumes, render high-quality images with NVIDIA OptiX(TM) integration, produce dynamic data, and perform compute-based operations.

Level: All
Type: Talk
Tags: In-Situ and Scientific Visualization; Computational Physics; Manufacturing Industries; Media and Entertainment

Day: TBD
Time: TBD
Location: TBD

S7425 - 3D Printing with NVIDIA Voxels

Rama Hoetzlein Graphics Research Engineer, NVIDIA
Rama Hoetzlein is the lead architect of NVIDIA Voxels (GVDB). He studied computer science and fine arts at Cornell University, where he co-founded the Game Design Initiative in 2001. In 2010, Rama's interdisciplinary thesis work in media arts at the University of California, Santa Barbara, explored creative support tools for procedural modeling. His current work investigates applications of sparse volumes to 3D printing, scientific visualization, and motion pictures at NVIDIA.

As printing hardware improves, 3D printing allows for unique processes, finer details, better quality control, and a wider range of materials. With these improvements comes the need for greater computational power and control over 3D-printed objects. We introduce NVIDIA Voxels as an open source SDK for voxel-based 3D printing workflows. Traditional workflows are based on processing polygonal models and STL files for 3D printing. However, such models don't allow for continuous interior changes in color or density, for descriptions of heterogeneous materials, or for user-specified support lattices. Using the new NVIDIA Voxels SDK, we demonstrate practical examples of design workflows for complex 3D printed parts with high-quality ray-traced visualizations, direct data manipulation, and 3D printed output.

Level: All
Type: Talk
Tags: Manufacturing Industries; AEC Industries

Day: TBD
Time: TBD
Location: TBD

S7426 - Automated Truck Driving - Lane Keeping for Platooning based on Drive PX2

Devid Will Manager Automated Driving Functions, fka Forschungsgesellschaft Kraftfahrwesen mbH Aachen
Devid Will has worked as manager of Automated Driving Functions at Forschungsgesellschaft Kraftfahrwesen mbH Aachen since August 2016. Besides traditional approaches for ADAS development, his work focuses on verifying the potential of new techniques in the context of function development from an integration point of view. Previously, he was a research assistant at the Institute for Automotive Engineering, RWTH Aachen University. Devid studied mechanical engineering at the University of Applied Sciences in Aachen, earning a diploma degree, and he holds a master's degree in automotive engineering and transport from RWTH Aachen University.

We'll present our achievements in the field of automated truck driving, especially in the use case of lane keeping in platooning scenarios based on mirror cameras. The whole functionality (lane detection, generation of control parameters, controller, and arbitration) runs on the NVIDIA DRIVE PX 2 with three cameras attached to it. We'll conclude with next steps, especially increasing the number of cameras for a better representation of the surroundings.

Level: All
Type: Talk
Tags: AI for In-Vehicle Applications; Self-Driving Cars; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7428 - Singularity: Containers for Scientific Reproducibility, Mobility and High Performance Computing

Gregory Kurtzer HPC Systems Architect and developer, Lawrence Berkeley National Laboratory
Gregory Kurtzer has created many open source initiatives related to HPC, including CentOS Linux, Warewulf, Perceus, and, most recently, Singularity. Gregory serves as a member of the OpenHPC Technical Steering Committee and is the IT HPC systems architect and software developer for Lawrence Berkeley National Laboratory.

Learn about Singularity, a container system designed to support the computational needs of scientists, including scientific reproducibility, extreme mobility of compute, and easy integration with high-performance computational resources. Singularity supports all standard computational workflows, is resource manager agnostic, and natively supports GPUs.

Level: All
Type: Talk
Tags: HPC and Supercomputing; Tools and Libraries

Day: TBD
Time: TBD
Location: TBD

S7429 - Real-World Tales of GPU-Accelerated Desktops and Apps - Implementers Share Best Practices

Luke Wignall Sr Mgr Perf Eng / Tech Marketing, NVIDIA
Highly-Rated Speaker
Luke Wignall leads the Performance Engineering and Technical Marketing team supporting NVIDIA GRID. He came to NVIDIA after working as an owner of an integrator/VAR, as a sales engineer, solution architect, consultant, and system administrator with both VMware and Citrix technologies in both public and private industry. An early evangelist of virtualization, Luke now sees the ability to bring GPUs to the end-user experience as the missing "special sauce" that brings virtual desktops to the next level.
Pat Lee VP Product Marketing, VMware
Pat Lee is vice president of product management for VMware Desktop and Application products. Among his responsibilities across the VMware portfolio, Pat oversees the Remote Experience team responsible for 3D graphics, remote display protocols, remote device access, desktop clients, thin clients, web clients, and mobile clients.

Experts from various industries join us for a roundtable discussion of their experiences implementing GPU-accelerated virtual desktops and apps. You'll learn how Windows 10 is creating new urgency around including GPUs in VDI deployment architectures; how to design environments for greater scale, superior user experience, and lower cost; and how the latest features in VMware Horizon and NVIDIA GRID(TM) can make desktop virtualization for every use case a reality.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization

Day: TBD
Time: TBD
Location: TBD

S7432 - Real-Time Analytics Powered by GPU-Accelerated Databases

Chris Prendergast Managing Director of Global Business Development and Alliances, Kinetica
Chris Prendergast is managing director of Global Business Development and Alliances at Kinetica. Prior to joining Kinetica, Chris led a team of specialists evangelizing new use cases to help SAP customers leverage HANA for Accelerated Analytics. Chris was an early employee of both Cloudera and Hortonworks, and has over 10 years of experience working with open source technology. For the better part of the last decade, he has been bringing organizations together to solve customer challenges in the U.S. and abroad. This work spans cloud, on premise, business intelligence, platforms, and applications. Chris has a B.S. in business management from the University of West Virginia, College of Business & Economics.

To stream data in real time, organizations need to figure out how to build an effective streaming pipeline that is easy to manage, low cost, scalable, and future-proofed. To address this challenge, enterprise architects have stitched together different tools, yielding decent results on predetermined analytics. But adding new data sources, new analytics, and greater scale can limit the long-term value of such a pipeline. New solutions have emerged that can ingest many disparate datasets into one platform and open up net-new analytics. We'll describe a groundbreaking in-memory database technology powered by GPUs that enables high-speed data ingest and real-time data analytics. Drawing from real-world production use cases and live demos, we'll demonstrate how it's possible to perform advanced analytical queries across billions of rows of data in under a second.

Level: Beginner
Type: Talk
Tags: Accelerated Analytics; Deep Learning and AI; Intelligent Machines and IoT; Real-Time Graphics

Day: TBD
Time: TBD
Location: TBD

S7433 - How to Achieve Real-Time Analytics on a Data Lake Using GPUs

Mark Brooks Principal Solutions Engineer, Kinetica
Mark Brooks, principal systems engineer at Kinetica, has more than 20 years of experience designing and developing distributed systems. At Kinetica, Mark is championing the role of GPUs in accelerating large-scale, in-memory analytics. Previously, Mark spent four years at Cloudera implementing Hadoop-based solutions in a range of verticals, including healthcare, telco, and financial services. Prior to that, Mark held technical, engineering, and consulting roles at a variety of software companies. He holds a BA from UC Berkeley.

The complexities associated with development and ongoing management of a data lake that aims to deliver real-time analytic response can be costly and overwhelming. To get real-time analytic response on live, streaming data, consider plugging a GPU-accelerated database into your data lake. GPUs are often embedded in compute-intensive technologies like video games, cars, and mobile devices. They're now gaining traction in the data center. This talk will describe how a GPU-accelerated, scale-out, in-memory database brings orders of magnitude more compute power, with a significantly smaller hardware footprint, to provide unrivaled analytic capabilities. Get the latest information on GPUs, and how their multi-core architecture can process many computations efficiently and quickly, making them ideal for today's streaming datasets and IoT use cases.

Level: Beginner
Type: Talk
Tags: Accelerated Analytics; Deep Learning and AI; Intelligent Machines and IoT; Real-Time Graphics

Day: TBD
Time: TBD
Location: TBD

S7434 - Scaling Cyber Analysis with End-to-End GPU Acceleration

Brad Bebee Solutions Architect, SYSTAP, LLC
Brad Bebee is the CEO of SYSTAP, LLC, leading efforts to deliver graphs at scale with GPU and graph technologies. An expert in graphs and large-scale analytics, his background includes software development, telecommunications, and information retrieval. He has implemented large-scale analytics using Hadoop and Accumulo. He is leading the integration of GPU technologies for graph analytics into business and mission applications.

There are billions of network events every day. Analyzing these large event graphs is critical for effective cyber defense. However, the sheer volume of events overwhelms existing tools and systems. Graph analytics forms the basis for complex processing on large event graphs to identify anomalies as smaller sub-graphs for exploration. This session will unveil ongoing work on the first capability for end-to-end GPU acceleration of network traffic analysis, accelerating both analytics and visualization. We'll discuss community detection algorithms such as Newman Spectral Modularity and how to accelerate them with Blazegraph DASL and nvGraph. We'll also demonstrate the first integration with GPU-accelerated visualization (Graphistry), in which an analyst used community detection and graph traversals to successfully discover a network exfiltration.
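
As background for readers new to the quantity these methods optimize, the small host-side CUDA C++ sketch below computes Newman modularity Q for a toy undirected graph and community assignment. The graph, labels, and function name are invented for illustration; the systems described in the talk compute and optimize this at GPU scale rather than on the host.

    // Toy, host-side computation of Newman modularity Q for an undirected
    // graph given as an edge list plus a community label per vertex.
    // The graph and labels below are invented purely for illustration.
    #include <cstdio>
    #include <utility>
    #include <vector>

    float modularity(int numVertices,
                     const std::vector<std::pair<int, int>>& edges,
                     const std::vector<int>& community) {
        const float m = static_cast<float>(edges.size());   // total edge count
        std::vector<float> degree(numVertices, 0.0f);
        std::vector<float> internal(numVertices, 0.0f);     // intra-community edges per label
        std::vector<float> degSum(numVertices, 0.0f);       // degree mass per label

        for (const auto& e : edges) {
            degree[e.first]  += 1.0f;
            degree[e.second] += 1.0f;
            if (community[e.first] == community[e.second])
                internal[community[e.first]] += 1.0f;
        }
        for (int v = 0; v < numVertices; ++v)
            degSum[community[v]] += degree[v];

        float q = 0.0f;
        for (int c = 0; c < numVertices; ++c)
            q += internal[c] / m - (degSum[c] / (2.0f * m)) * (degSum[c] / (2.0f * m));
        return q;
    }

    int main() {
        // Two obvious communities, {0,1,2} and {3,4,5}, joined by one edge.
        std::vector<std::pair<int, int>> edges = {
            {0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3}};
        std::vector<int> community = {0, 0, 0, 1, 1, 1};
        printf("Q = %.3f\n", modularity(6, edges, community));
        return 0;
    }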

Level: All
Type: Talk
Tags: Accelerated Analytics; Federal

Day: TBD
Time: TBD
Location: TBD

S7435 - Adapting DL to New Data: An Evolutionary Algorithm for Optimizing Deep Networks

Steven Young Research Scientist in Deep Learning, Oak Ridge National Laboratory
Steven Young is a researcher at Oak Ridge National Laboratory working in the Computational Data Analytics Group. His research focuses on applying deep learning to challenging datasets using HPC to enable faster training and quicker discovery. He has a Ph.D. in computer engineering from the University of Tennessee, where he studied machine learning in the Machine Intelligence Lab.

Deep learning has seen a surge of success in imaging and speech applications, thanks to its relatively automatic feature generation and, in the case of convolutional neural networks, high-accuracy classification abilities. While these models learn their parameters through data-driven methods, model selection (that is, architecture construction) through hyper-parameter choices remains a tedious and highly intuition-driven task. To address this, multi-node evolutionary neural networks for deep learning (MENNDL) is proposed as a method for automating network selection on computational clusters through hyper-parameter optimization performed via genetic algorithms. MENNDL is capable of evolving not only the numeric hyper-parameters (for example, number of hidden nodes or convolutional kernel size), but also the arrangement of layers within the network.
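
For readers who have not seen hyper-parameter search phrased as evolution, here is a deliberately tiny, self-contained sketch of the idea in CUDA C++ host code. It is not MENNDL; the genome fields, population size, and the stub fitness function are all assumptions, and a real system would replace the stub with a full GPU training run per candidate.

    // Toy evolutionary hyper-parameter search: score a population of
    // candidate configurations, keep the best, recombine and mutate the
    // rest. The fitness function is a stub standing in for a training run.
    #include <algorithm>
    #include <cstdio>
    #include <cstdlib>
    #include <random>
    #include <vector>

    struct Genome {
        int numFilters;   // e.g., filters in a conv layer (hypothetical)
        int kernelSize;   // e.g., 3, 5, 7
    };

    // Stub fitness: pretend 64 filters with 5x5 kernels is the sweet spot.
    float fitness(const Genome& g) {
        return -std::abs(g.numFilters - 64) - 10.0f * std::abs(g.kernelSize - 5);
    }

    int main() {
        std::mt19937 rng(42);
        std::uniform_int_distribution<int> filters(8, 128), kernels(1, 4);

        std::vector<Genome> pop(16);
        for (auto& g : pop) g = {filters(rng), 2 * kernels(rng) + 1};

        auto byFitness = [](const Genome& a, const Genome& b) {
            return fitness(a) > fitness(b);
        };
        for (int gen = 0; gen < 20; ++gen) {
            std::sort(pop.begin(), pop.end(), byFitness);
            // Keep the top half, refill the bottom half with mutated crossovers.
            for (size_t i = pop.size() / 2; i < pop.size(); ++i) {
                const Genome& p1 = pop[rng() % (pop.size() / 2)];
                const Genome& p2 = pop[rng() % (pop.size() / 2)];
                pop[i] = {p1.numFilters, p2.kernelSize};              // crossover
                if (rng() % 4 == 0)                                   // mutation
                    pop[i].numFilters = std::max(1, pop[i].numFilters + (int)(rng() % 17) - 8);
            }
        }
        std::sort(pop.begin(), pop.end(), byFitness);
        printf("best: %d filters, %dx%d kernel\n",
               pop[0].numFilters, pop[0].kernelSize, pop[0].kernelSize);
        return 0;
    }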

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; HPC and Supercomputing
Industry Segments: Higher Education / Research; Government / National Labs

Day: TBD
Time: TBD
Location: TBD

S7436 - Boosting Visual Object Tracking Using Deep Features and GPU Implementations

Michael Felsberg Professor, Linköping University
Michael Felsberg received a Ph.D. in engineering from the University of Kiel, Germany, in 2002. Since 2008, he has been a full professor and the head of the Computer Vision Laboratory at Linköping University, Sweden. His current research interests include signal processing methods for image analysis, computer and robot vision, and machine learning. He has published more than 100 reviewed conference papers, journal articles, and book contributions. He was a recipient of awards from the German Pattern Recognition Society in 2000, 2004, and 2005, from the Swedish Society for Automated Image Analysis in 2007 and 2010, from the Conference on Information Fusion in 2011 (Honorable Mention), from the CVPR Workshop on Mobile Vision 2014, and from ICPR 2016 (best scientific paper in computer vision). He has achieved top ranks on various challenges (VOT: 3rd 2013, 1st 2014, 2nd 2015, 1st 2016; VOT-TIR: 1st 2015 and 2016; OpenCV Tracking: 1st 2015; KITTI Stereo Odometry: 1st 2015, March). He has coordinated the EU projects COSPAL and DIPLECS, he is an associate editor of the Journal of Mathematical Imaging and Vision, Journal of Image and Vision Computing, Journal of Real-Time Image Processing, Frontiers in Robotics and AI. He was publication chair of the International Conference on Pattern Recognition 2014 and track chair in 2016. He was the general co-chair of the DAGM symposium in 2011, and he will be general chair of CAIP 2017.

We'll explain how to use Deep Features for enabling state-of-the-art results in visual object tracking. Visual object tracking is a difficult task in three respects, since (1) it needs to be performed in real time, (2) the only available information about the object is an image region in the first frame, and (3) the internal object model needs to be updated in each frame. The use of Deep Features gives significant improvements in the accuracy and robustness of the object tracker, but straightforward frame-wise updates of the object model become prohibitively slow for real-time performance. By introducing a compact representation of Deep Features, a smart updating mechanism, and systematically exploiting GPU implementations for feature extraction and optimization, real-time performance is achievable without jeopardizing tracking quality.

Level: Advanced
Type: Talk
Tags: Computer Vision and Machine Vision; Intelligent Video Analytics; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7437 - Deep Learning-based Accelerated Analytics for Medical Imaging

Di Zhao Dr., Chinese Academy of Sciences
Dr. Di Zhao introduced their research in Show Your Science at GTC 2016.

Medical Accelerated Analytics includes electronic health records, medical imaging, genomic data, and more, with medical imaging accounting for more than 90 percent of that data. How to apply medical big data in clinical practice is a question that concerns both medical and computational researchers, and deep learning and GPU computing provide an excellent answer. We'll introduce our research on deep learning-based diagnosis of diseases such as Alzheimer's disease and mild cognitive impairment, and discuss the current status and approaches of deep learning-based medical Accelerated Analytics.

Level: All
Type: Talk
Tags: Healthcare and Life Sciences; Accelerated Analytics; Deep Learning and AI; Medical Imaging
Industry Segments: Healthcare & Life Sciences; Higher Education / Research

Day: TBD
Time: TBD
Location: TBD

S7438 - Build Systems: Combining CUDA and Modern CMake

Robert Maynard Staff R&D Engineer, Kitware, Inc.
Robert Maynard joined Kitware in 2010 as a research and development engineer. He is one of the primary developers of VTK-M and an active contributor to CMake, SMTK, CMB, ParaView, and VTK.

Learn all about CMake's new CUDA support and how best to combine it with "modern" CMake usage requirements. CMake is an open-source, cross-platform meta build generator. This year CMake was updated to fully support CUDA as a first-class language on all major platforms. This enables projects to fully leverage "modern" target-based features inside projects that require CUDA compilation. We'll iteratively develop the CMake logic for a sample project using modern CMake with a focus on CUDA. We'll cover transitive usage requirements, how to request language standard levels, mix language libraries, CUDA separable compilation, and generating export configuration files. We expect people to already have some familiarity with the CMake language.

Level: Intermediate
Type: Talk
Tags: Tools and Libraries

Day: TBD
Time: TBD
Location: TBD

S7439 - How Video Analytics Help to Improve Efficiency for Broadcasting Industry

Jin Huang CTO, Arcvideo, Inc.
Jin Huang is CTO of Arcvideo, Inc., a leading video solution company in China that provides video transcoding, intelligent video analytics, and private and public cloud video service for broadcasting, OTT, and education customers.

Arcvideo is a top video solution provider in China, targeting broadcasting companies, TV stations, and the booming game/entertainment live broadcasting and online education markets. Video codec, intelligent video analytics, universal end device player, and cloud video service are the four pillars of our product line. We'll discuss GPU-accelerated intelligent video analytics, which plays an increasingly important role in video-related products and services, bringing more efficiency to handling tons of emerging video content and better interaction between end users and their video interests.

Level: Intermediate
Type: Talk
Tags: Media and Entertainment; Video and Image Processing; Intelligent Video Analytics

Day: TBD
Time: TBD
Location: TBD

S7440 - Create High-Quality Materials from Scans with MDL and Substance

Jerome Derel Chief Product Officer, Allegorithmic
Jerome Derel is an engineer and product designer who worked for seven years at Dassault Systemes as a visualization expert. He worked on the Design Studio and CATIA Design teams, leading projects all meant to produce high-quality virtual materials. Jerome joined Allegorithmic in 2014 as chief product officer.
Pierre Maheut Product Manager, Allegorithmic
Pierre Maheut is a product manager and senior industrial designer at Allegorithmic. He spent eight years at Dassault Systemes as a CATIA creative design expert and portfolio manager. He graduated in mechanical engineering, industrial product design, and innovation management, with a strong industrial design background.

A worldwide leader for procedural texturing in the gaming industry with its Substance technology, Allegorithmic has partnered with NVIDIA to release Substance Designer 5.5, the first MDL visual editor to efficiently author material and transport the material definition across all supporting software. We'll present a full customer workflow, from high-resolution image scanning to actual MDL-defined material that could serve as reference, similarly to those available through Substance Source. We'll demonstrate customer use cases and present results (at GTC 2016 we showcased Hyundai and Harley-Davidson) with a live demo of Substance solutions with NVIDIA(R) Iray rendering on an NVIDIA VCA cluster, as well as an update on new features of Substance Painter 2.5 and Substance Designer 6.0 to be released in January 2017.

Level: All
Type: Talk
Tags: Rendering and Ray Tracing; AEC Industries; Manufacturing Industries; Media and Entertainment

Day: TBD
Time: TBD
Location: TBD

S7441 - Assembly Chain Training with Professional VR by Optis

Nicolas Dalmasso Innovation Director, Optis
Nicolas Dalmasso created his company, SimplySim, in 2008 with the goal of providing highly accurate real-time simulation middleware to compete with Virtools and Unity. SimplySim was acquired by Optis in 2011 to bring real-time and VR capabilities to the Optis portfolio. After driving the development and deployment of the different real-time products available at Optis (Theia-RT, HIM, and VR Xperience), Nicolas is now leading innovation at the corporate level. Nicolas studied computer graphics and advanced computer science at the University of Nice and Polytech Engineering School.

Optis has been involved in advanced optical simulation for the past 25 years and has recently invested in VR for virtual prototyping. Its latest HIM, built for human ergonomics evaluation, combines with advanced, real-time, physics-based rendering to enable precise environment reproduction for prototyping or training. We'll present the latest integration for assembly line training with HTC Vive and feedback powered by NVIDIA(R) PhysX(R). Companies such as Tesla Motors and Bentley are proud early adopters of this solution. We'll demonstrate our software and show customer use cases and their data to explain how to improve the VR experience with haptics and audio simulation in the future.

Level: All
Type: Talk
Tags: Virtual Reality and Augmented Reality; Manufacturing Industries; Rendering and Ray Tracing

Day: TBD
Time: TBD
Location: TBD

S7442 - Pruning Convolutional Neural Networks for Resource-Efficient Inference

Pavlo Molchanov Research Scientist, NVIDIA
Pavlo Molchanov has been a research scientist at NVIDIA since 2015, working on efficient algorithms for deep learning and novel computer vision applications. He received his Ph.D. in the area of signal processing from Tampere University of Technology, Finland, in 2014.

We'll introduce a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. The approach is based on interleaving greedy criteria-based pruning with fine-tuning by backpropagation -- a computationally efficient procedure that maintains good generalization in the pruned network. We'll propose a new criterion based on Taylor expansion that approximates the change in the cost function induced by pruning network parameters. We'll focus on transfer learning, where large pretrained networks are adapted to specialized tasks. The proposed criterion demonstrates superior performance compared to other criteria, for example, the norm of kernel weights or feature map activation, for pruning large CNNs after adaptation to fine-grained classification tasks (Birds-200 and Flowers-102), relying only on first-order gradient information. We'll also show that pruning can lead to more than 10x theoretical reduction in adapted 3D-convolutional filters.
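
A minimal host-side sketch of the first-order criterion the abstract describes follows: for each feature map, average activation times gradient over the batch and spatial positions and take the absolute value; maps with the smallest score become pruning candidates. The flat layout, sizes, data, and function name are hypothetical, and this is not the authors' implementation.

    // Taylor-expansion pruning criterion (illustrative sketch, not the
    // authors' code): score(c) = | mean over batch and positions of
    // activation * gradient | for each feature map c.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // activations and gradients stored as [batch][channel][height*width]
    std::vector<float> taylorCriterion(const std::vector<float>& act,
                                       const std::vector<float>& grad,
                                       int batch, int channels, int spatial) {
        std::vector<float> score(channels, 0.0f);
        for (int b = 0; b < batch; ++b)
            for (int c = 0; c < channels; ++c) {
                float sum = 0.0f;
                for (int s = 0; s < spatial; ++s) {
                    int i = (b * channels + c) * spatial + s;
                    sum += act[i] * grad[i];   // z * dC/dz
                }
                score[c] += sum / spatial;
            }
        for (int c = 0; c < channels; ++c)
            score[c] = std::fabs(score[c] / batch);
        return score;
    }

    int main() {
        const int batch = 2, channels = 3, spatial = 4;
        std::vector<float> act(batch * channels * spatial, 0.5f);
        std::vector<float> grad(batch * channels * spatial, 0.1f);
        auto s = taylorCriterion(act, grad, batch, channels, spatial);
        for (int c = 0; c < channels; ++c) printf("map %d: %.4f\n", c, s[c]);
        return 0;
    }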

Level: All
Type: Talk
Tags: Deep Learning and AI
Industry Segments: Higher Education / Research; Software

Day: TBD
Time: TBD
Location: TBD

S7444 - What the Profiler is Telling You: Optimizing GPU Kernels

Christoph Angerer Developer Technology Engineer, NVIDIA
Christoph Angerer is a developer in NVIDIA's European Developer Technology team. Based in Munich, Germany, he works with developers accelerating applications on GPUs. He holds a Ph.D. in computer science from ETH Zurich in Switzerland.
Jakob Progsch Developer Technology Engineer, NVIDIA
Jakob Progsch is a member of NVIDIA's European developer technology team working on scientific and machine learning applications. Jakob graduated with a master's in computational science and engineering from ETH Zurich in Switzerland.

In this session we explore how to analyze and optimize the performance of kernels running on the GPU. Working with a real-world example, we will walk through an analysis-driven process leading to a series of kernel-level optimizations, using NVIDIA's profiling tools as an example. Attendees will learn about the fundamental performance limiters—instruction throughput, memory throughput, and latency—and we will present strategies to identify and tackle each type of limiter. This session is accompanied by Session S7445, which considers performance optimization at application level.
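
As a flavor of the limiter analysis the session walks through, the sketch below is an illustration only, not the session's example code: two copy kernels do the same logical work but profile very differently, because the coalesced version is bound by achievable memory bandwidth while the strided version wastes most of each memory transaction. Kernel names and sizes are arbitrary.

    // Two trivially different copy kernels that a profiler separates
    // clearly: coalesced loads versus strided loads.
    #include <cuda_runtime.h>

    __global__ void copyCoalesced(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];        // neighbouring threads read neighbouring words
    }

    __global__ void copyStrided(const float* in, float* out, int n, int stride) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[(size_t)i * stride % n];   // scattered reads defeat coalescing
    }

    int main() {
        const int n = 1 << 24;
        float *in, *out;
        cudaMalloc(&in,  n * sizeof(float));
        cudaMalloc(&out, n * sizeof(float));
        dim3 block(256), grid((n + 255) / 256);
        copyCoalesced<<<grid, block>>>(in, out, n);
        copyStrided<<<grid, block>>>(in, out, n, 32);
        cudaDeviceSynchronize();
        cudaFree(in); cudaFree(out);
        return 0;
    }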

Level: All
Type: Talk
Tags: Performance Optimization; Algorithms; Tools and Libraries; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7445 - What the Profiler is Telling You: Optimizing Whole Application Performance

Jakob Progsch Developer Technology Engineer, NVIDIA
Jakob Progsch is a member of the European developer technology team at NVIDIA, where he works on scientific and machine learning applications. Jakob graduated with a master's in computational science and engineering from ETH Zurich in Switzerland.
Mathias Wagner Sr. Developer Technology Engineer, NVIDIA
Mathias Wagner is a member of the European developer technology team at NVIDIA, where he works on high performance computing and scientific applications. Before joining NVIDIA, he worked as a postdoc in high-energy physics in Europe and the U.S. focusing on lattice quantum chromodynamics simulations using GPUs. Mathias holds a Ph.D. in theoretical physics from Darmstadt University of Technology.

In this session we explore how to analyze and optimize the performance of GPU-accelerated applications. Working with a real-world example, attendees will learn how to analyze application performance by measuring data transfers, unified memory page migrations, inter-GPU communication, and performing critical path analysis. Using the example application, and using NVIDIA's profiling tools as an example tool set, we will walk through various optimizations and discuss their impact on the performance of the whole application. This session is accompanied by Session S7444, which considers performance optimization of GPU kernels.
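
One concrete application-level pattern of the kind the session covers is overlapping host-device transfers with kernel execution. The sketch below is a generic illustration with a placeholder kernel and made-up sizes, not the session's code: the work is split into chunks across CUDA streams, and pinned host memory lets the copy for one chunk overlap with the kernel working on another.

    // Overlap H2D copies, kernels, and D2H copies across streams.
    #include <cuda_runtime.h>

    __global__ void scale(float* data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int nChunks = 4, chunk = 1 << 20;
        float* hostBuf;
        cudaMallocHost(&hostBuf, nChunks * chunk * sizeof(float));  // pinned, so copies can be async
        float* devBuf;
        cudaMalloc(&devBuf, nChunks * chunk * sizeof(float));

        cudaStream_t streams[nChunks];
        for (int s = 0; s < nChunks; ++s) cudaStreamCreate(&streams[s]);

        for (int c = 0; c < nChunks; ++c) {
            float* h = hostBuf + (size_t)c * chunk;
            float* d = devBuf  + (size_t)c * chunk;
            cudaMemcpyAsync(d, h, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, streams[c]);
            scale<<<(chunk + 255) / 256, 256, 0, streams[c]>>>(d, chunk, 2.0f);
            cudaMemcpyAsync(h, d, chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, streams[c]);
        }
        cudaDeviceSynchronize();

        for (int s = 0; s < nChunks; ++s) cudaStreamDestroy(streams[s]);
        cudaFreeHost(hostBuf);
        cudaFree(devBuf);
        return 0;
    }

In a timeline view, the copies in stream c+1 line up under the kernel of stream c, which is exactly the overlap the profiler makes visible.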

Level: Intermediate
Type: Talk
Tags: Performance Optimization; Algorithms; Tools and Libraries; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7447 - Image Restoration with Neural Networks

Orazio Gallo Senior Research Scientist, NVIDIA
Orazio Gallo joined NVIDIA Research in 2011, where he has worked on several computer vision and computational photography projects. These include studies on several aspects of stack-based photography, alternatives to traditional image processing pipelines, new paradigms for capturing and consuming photos, and displays that leverage human perception mechanisms to provide a more immersive visual experience. Orazio is an associate editor of the journal Signal Processing: Image Communication. He earned an M.S. in biomedical engineering from Politecnico di Milano, Italy, and a Ph.D. in computer engineering from the University of California at Santa Cruz.
Iuri Frosio Senior Research Scientist, NVIDIA
Iuri Frosio got his Ph.D. in biomedical engineering at the Politecnico of Milano, Italy, in 2006. He was a research fellow at the Computer Science Department of the University of Milan from 2003 and an assistant professor in the same department from 2006 to 2013. In the same period, he worked as a consultant for various companies in Italy and the U.S. He joined NVIDIA in 2014 as a senior research scientist. He is author of 12 international patents, one Italian patent, 16 papers in international journals, two book chapters, and 36 papers in international conferences.

We'll show how image restoration tasks, such as image denoising and demosaicking, super-resolution, JPEG deblocking, and image inpainting, performed with neural networks can beat state-of-the-art methods. In particular, we'll show how results significantly improve when the network is trained to evaluate images in the same way in which humans do, that is, when perceptual loss functions are used in training. We've done extensive research on the use of both differentiable and non-differentiable loss functions. To the best of our knowledge, we are the first to propose a method to use a non-differentiable loss function for a neural network.

Level: All
Type: Talk
Tags: Deep Learning and AI; Video and Image Processing

Day: TBD
Time: TBD
Location: TBD

S7449 - Driving the Assembly of the Zebrafish Connectome through Deep Learning

Ishtar Nyawira Co-President, Timmy Global Health: Pitt Chapter, University of Pittsburgh
Ishtar Nyawira is a computer science major at the University of Pittsburgh (class of 2018). Upon entering her freshman year, she chose to study biology but quickly grew interested in computer science, despite having little background in the field. After changing her major in her third year, she became wholly dedicated to educating herself inside and outside of the classroom in the fields of computer science. After she graduates with a B.S. in computer science and a minor in Korean, she will pursue a Ph.D. in machine learning or computer science. She works at the Pittsburgh Supercomputing Center on a machine learning project that will harness the power of deep learning to automate the process of high-resolution biomedical image annotation. Her current research interests include machine learning and deep learning, natural language processing and computational linguistics, software engineering, biological modeling and simulation, and the pairing of HPC and AI.
Nick Nystrom Senior Director of Research, Pittsburgh Supercomputing Center
Nick Nystrom is senior director of research at the Pittsburgh Supercomputing Center. Nick leads the scientific research and future technology teams of PSC, including the user support for scientific applications, biomedical, and public health applications groups, as well as a core team targeting strategic applications, allocations, and project management. He is principal investigator for "Bridges," a new kind of supercomputer that converges HPC and HPDA and aims to aid researchers who are new to HPC. His research interests include machine learning and data analytics, genomics, causal modeling, coupling HPC applications and AI, graph algorithms, hardware and software architecture, software engineering for HPC, and performance modeling. Nick earned his B.S. in chemistry, math, and physics and his Ph.D. in quantum chemistry from the University of Pittsburgh.

Tracing pathways through large volumes of data is an incredibly tedious, time-consuming process that significantly encumbers progress in neuroscience and the tracing of neurons through an organism. We'll explore the potential for applying deep learning to the automation of high-resolution scanning electron microscope image data segmentation. We've started with neural pathway tracing through 5.1GB of whole-brain serial-section slices from larval zebrafish collected by the Center for Brain Science at Harvard. This kind of manual image segmentation requires years of careful work to properly trace the neural pathways in an organism as small as a zebrafish larva, which is approximately 5mm in total body length. Automating this process could vastly improve productivity, which would lead to faster data analysis and more breakthroughs in understanding the complexity of the brain.

Level: All
Type: Talk
Tags: Deep Learning and AI; HPC and Supercomputing
Industry Segments: Healthcare & Life Sciences

Day: TBD
Time: TBD
Location: TBD

S7452 - Cutting Edge OptiX Ray Tracing Techniques for Visualization of Biomolecular and Cellular Simulations in VMD

John Stone Senior Research Programmer, University of Illinois at Urbana-Champaign
Highly-Rated Speaker
John Stone is a senior research programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology, and associate director of the NVIDIA CUDA Center of Excellence at the University of Illinois. John is the lead developer of VMD, a high-performance molecular visualization tool used by researchers all over the world. His research interests include molecular visualization, GPU computing, parallel processing, ray tracing, haptics, and virtual environments. John was awarded as an NVIDIA CUDA Fellow in 2010. In 2015, he joined the Khronos Group advisory panel for the Vulkan graphics API. He also provides consulting services for projects involving computer graphics, GPU computing, and high performance computing.

We'll present the latest advances in the use of NVIDIA® OptiX™ for high-fidelity rendering of state-of-the-art biomolecular and cellular simulations, focusing on the OptiX-based ray-tracing engines in VMD, which are heavily used both for interactive progressive ray tracing (local and remote) and for batch-mode in-situ or post-hoc visualization of petascale molecular dynamics simulations.

Level: All
Type: Talk
Tags: Rendering and Ray Tracing; In-Situ and Scientific Visualization; Healthcare and Life Sciences; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7453 - NVIDIA Advanced Rendering Products for End Users

Phil Miller Senior Director, Advanced Rendering Products, NVIDIA
Phillip Miller is senior director of NVIDIA's commercial advanced rendering offerings, ranging from Iray and mental ray, which ship within leading design and entertainment products, to IndeX technology, which is used in large data visualization. Phil has been with NVIDIA for seven years and has led software products for over 20 years, including the entertainment offerings at Autodesk and the web design product line at Adobe. He holds a master's of architecture from the University of Illinois and is a registered architect.

Learn about NVIDIA products you can use within popular 3D tools, like 3ds Max, Maya, Rhino, and Cinema4D, or to scale rendering from other products across render farms. The new range of NVIDIA® Iray® plug-in products will be discussed, along with recent advances in NVIDIA mental ray®. We'll also cover how the Iray SDK is employed and how Iray is exposed in each product to better support native workflows.

Level: Intermediate
Type: Talk
Tags: Rendering and Ray Tracing; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7456 - SQream DB: Analyzing Customer Behavioral Data to Drive Revenue, the GPU Way

Arnon Shimoni Senior Solutions Architect, SQream Technologies
Arnon Shimoni is a senior solutions architect at SQream. Arnon started out as a programmer working on SQream DB's core, and has since shifted to overseeing implementing SQream DB for customers in a variety of fields. Arnon is a keen technologist, interested in parallel computing, GPUs, and communication and information systems.

We'll explore how we deployed a GPU-based analytics database in a telecom, and how it created value for the customer in just a few days. Discover how major enterprises have successfully harnessed next-generation database technology to understand and use multi-terabytes of customer behavior data, using SQream DB powered by NVIDIA GPUs. We'll explore how a GPU solution can be applied to a real-world customer use case, and highlight the benefits and challenges of deploying a GPU-enabled solution. We'll also include actual performance benchmarks and screenshots from the deployed solution.

Level: All
Type: Talk
Tags: Accelerated Analytics

Day: TBD
Time: TBD
Location: TBD

S7458 - Deploying Unique DL Networks as Micro-Services with TensorRT, User-Extensible Layers, and GPU REST Engine

Chris Gottbrath Accelerated Computing Product Manager, NVIDIA
Chris Gottbrath has been attending GTC since around 2010. He's learned something every time, which is why he keeps coming back. He also likes to share what he knows. For the first few years, he gave talks about the awesome TotalView parallel debugger. Later, as an NVIDIA employee, he's given talks about various kinds of libraries and tools that developers can use to create unique solutions on the CUDA platform.

Once you have trained your neural network to do some unique and interesting task, you might wonder how to make it available to colleagues, collaborators, or perhaps the world. One of the best ways to do that is to create a REST-based microservice. Then anyone with the URL can make a request and get an answer from your neural network. We'll show how three technologies come together to make that possible: 1. TensorRT provides low-latency, high-throughput inference; 2. Custom layer support in TensorRT allows you to express your unique deep learning secret sauce within TensorRT; 3. GPU REST Engine gives you a fast and easy way to create a GPU-powered microservice. We'll show the steps necessary for you to start creating your own deep learning-powered microservices.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Tools and Libraries

Day: TBD
Time: TBD
Location: TBD

S7459 - VR Rendering Improvements Featuring Autodesk VRED

Michael Nikelsky Senior Principal Engineer , Autodesk
Michael Nikelsky is a senior principal engineer at Autodesk working on the VRED product line. His field of work focuses on rendering and ray-tracing techniques as well as shader development. He has a diploma in computer science from the University of Koblenz-Landau, Germany.
Kai Ingo Esser Senior Engineer, Developer Technology, NVIDIA
Ingo Esser is a senior developer technology engineer in the Professional Solutions Group at NVIDIA, where he helps ISVs improve their rendering algorithms. These ISVs mostly work in the automotive and the oil and gas domains, where either rendering complex surfaces or visualizing large datasets is an issue. Ingo has a diploma in computer science from the chair for Computer Graphics and Multimedia at the RWTH Aachen, Germany.

Autodesk and NVIDIA will provide an update on current VR rendering features and their use in the wild. We'll introduce a new VR SLI extension and give an overview of other VR-related hardware features and algorithms. We'll then discuss their integration into Autodesk VRED and showcase the performance improvements achieved by leveraging features like VR SLI, single-pass stereo, and occlusion culling, and how they help create stunning VR experiences using complex, real-world automotive engineering and design datasets.

Level: Intermediate
Type: Talk
Tags: Virtual Reality and Augmented Reality; Manufacturing Industries; Rendering and Ray Tracing
Industry Segments: Manufacturing; Architecture / Engineering / Construction

Day: TBD
Time: TBD
Location: TBD

S7461 - Spectral Rendering on the GPU

Juan Cañada Head of Rendering, Next Limit Technologies
Juan Canada joined Next Limit to work on research projects, later moving to the newly formed Maxwell Render research team. Since then Juan has held several positions in the team, leading it since 2007. He holds a bachelor's degree in mechanical engineering and a degree in environmental sciences. Outside the office, Juan used to describe himself as an acceptable guitar player, although his skills have deteriorated since the birth of his beautiful daughter. To try to stop himself thinking about rendering all the time, he is an avid scuba diver and underwater photographer, although sometimes, when he looks at how light behaves under the sea, he realizes how much work we have left to do!

Maxwell Render for the GPU was released in 2016. We'll present some state-of-the-art techniques used for advanced ray-tracing topics such as spectral rendering, dispersion, and multilight.

Level: Intermediate
Type: Talk
Tags: Rendering and Ray Tracing

Day: TBD
Time: TBD
Location: TBD

S7463 - Next-Generation GPU Rendering: High-End Production Features on GPU

Vladimir Koylazov CTO, Chaos Group
Highly-Rated Speaker
Vladimir Koylazov (Vlado) has more than 15 years of software development experience, the majority of which he spent developing and improving the render engine V-Ray. Passionate about 3D graphics and programming, Vlado is the driving force behind Chaos Group's software solutions. Vladimir is CTO of Chaos Software and one of the original creators of the V-Ray renderer.
Blagovest Taskov Lead Developer, Chaos Group
Blagovest Taskov is the lead of the V-Ray RT GPU developer team at Chaos Group. He works on some of the latest advancements in V-Ray RT GPU, including improved OpenCL support, performance optimizations, and many rendering features.

Take a look at the next generation of GPU-accelerated rendering. See how advances such as MDL materials, procedural shading, and adaptive lighting algorithms are changing how high-end CG productions are created.

Level: Intermediate
Type: Talk
Tags: Rendering and Ray Tracing

Day: TBD
Time: TBD
Location: TBD

S7465 - Deep Learning for 3D Design and Making

Mike Haley Senior Director, Machine Intelligence, Autodesk, Inc.
Mike Haley leads the Machine Intelligence group at Autodesk, focused on groundbreaking machine learning technologies for the future of making things, which includes everything from 3D digital design to how physical creation or assembly occurs. His team develops the strategies for applying machine learning as well as performing research and development on techniques unique to designing and making. For the last several years, Mike's team has been focused on bringing geometric shape-analysis and high-scale machine learning techniques to 3D design information with the intent to make software a true partner in the design process. Formerly, Mike led the move of Autodesk products from the desktop to the cloud by driving the adoption of scalable distributed compute and data technology. Prior to joining Autodesk, Mike has performed research and product development in the fields of volumetric graphics, distributed multimedia, computer vision, and embedded systems.

We'll look at the application of deep learning to design information to provide AI-assisted 3D design as well as AI-assisted robotic assembly during the manufacturing process. Autodesk is working on facilitating a more efficient and open design-manufacture-use cycle using intelligent sensors, data aggregation, and deep learning. We'll discuss the DeepForm project for generating novel 3D forms as well as an intelligent robotic assembly project for making industrial robotic assembly a closed loop, general-purpose solution that is amenable to environmental and design changes.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Intelligent Machines and IoT; Manufacturing Industries

Day: TBD
Time: TBD
Location: TBD

S7466 - Production-Quality, Final-Frame Rendering on a GPU

Panagiotis Zompolas CTO, Redshift Rendering
Panagiotis Zompolas is a video game industry veteran driven by a passion for computer graphics and hardware. Panos has worked with GPUs since the days of the 3dfx and has closely followed the GPU compute revolution since its inception in the mid-2000s. Panos' career in the video game industry includes leading companies like Sony Computer Entertainment Europe and Double Helix Games (now Amazon Games). He has led teams of graphics programmers in the creation of render engines, spanning several generations of hardware. This experience, tied with his passion for the industry, is one of the key pillars of Redshift's success.
Robert Slater VP Engineering, Redshift
Robert Slater is a seasoned GPU software engineer and video game industry veteran, with a vast amount of experience in and passion for the field of programming. As a programmer, Rob has worked for companies such as Electronic Arts, Acclaim, and Double Helix Games (now Amazon Games). During this time, he was responsible for the core rendering technology at each studio, driving their creative and technical development. Rob's graphics engine programming experience and know-how ensures that Redshift is always at the forefront of new trends and advances in the industry.

We'll discuss the latest features of Redshift, the GPU-accelerated renderer running on NVIDIA GPUs that is redefining the industry's perception towards GPU final-frame rendering. A few customer work examples will be demonstrated. This talk will be of interest to industry professionals who want to learn more about GPU-accelerated production-quality rendering as well as software developers who are interested in GPU-accelerated rendering.

Level: Intermediate
Type: Talk
Tags: Rendering and Ray Tracing; Media and Entertainment

Day: TBD
Time: TBD
Location: TBD

S7467 - Multi-Dimensional Deep Learning for Medical Images

Bradley Erickson Director, Radiology Informatics Lab, Mayo Clinic
Brad Erickson received his M.D. and Ph.D. from Mayo Clinic. He went on to train in radiology, then completed a neuroradiology fellowship at Mayo, and has been on staff at Mayo for 20 years. He does clinical neuroradiology, has been chair of the Radiology Informatics Division, and is currently associate chair for research. He has been vice chair of information technology for Mayo Clinic. He has been awarded multiple external grants, including NIH grants on MS, brain tumors, polycystic kidney disease, and medical image processing. He is a former president of the Society of Imaging Informatics in Medicine, is the chair of the board of directors for the American Board of Imaging Informatics, and is on the board of the IHE USA. He holds several patents and has been involved in three startup companies.

Machine learning and deep learning have been applied to medical images to predict tumor type, genomics, and therapy effects. They can also be used to segment images, such as to define a tumor. While some traditional machine learning work has been multi-dimensional and multi-parametric, very few deep learning applications have gone beyond applying photographic networks to medical image problems. As such, they ignore some of the rich information available in other dimensions (3D and time) as well as parameter space (other types of images). We'll discuss some of the challenges and early results in extending traditional 2D convolutional neural networks to n-dimensional images, including space, time, and other parametric image types. Challenges include representational issues as well as computational ones (for example, memory constraints). Applications we'll show include multi-dimensional image segmentation of brain tumors as well as prediction of tumor genomics and therapy response.

Level: All
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Medical Imaging
Industry Segments: Healthcare & Life Sciences; Higher Education / Research

Day: TBD
Time: TBD
Location: TBD

S7468 - Deep Packet Inspection Using GPUs

Wenji Wu Principal Network Research Investigator, Fermilab
Wenji Wu is a principal network research investigator at the Core Computing Division, Fermilab, where he has worked on high-speed networking and bulk data transfer. His research focus is utilizing multicore and manycore architectures to address performance challenges in high-speed networks. Wenji is now responsible for two DOE network research projects, the MDTM project and the BigData Express project. He is also working on the WireCAP project, and a GPU-based network traffic monitoring and analysis project.

In high-speed networks, packet-based network traffic monitoring and analysis applications require a large amount of computing power and high I/O throughput. These applications face extreme performance and scalability challenges. GPUs have been widely applied to accelerate general-purpose scientific and engineering computing, and the GPU architecture fits well with the characteristics of packet-based network monitoring and analysis applications. Fermilab network research group's prototype GPU-based network traffic monitoring and analysis system consists of two major components: a lossless packet capture engine that supports 10/40GE commodity NICs, using our WireCAP technology; and a complete set of GPU libraries for network traffic analysis. Our GPU libraries currently support per-packet deep inspection analysis; support for per-flow deep inspection analysis is anticipated shortly.

Level: All
Type: Talk
Tags: Data Center and Cloud Computing; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7469 - Parallel Depth-First Search on GPU

Maxim Naumov Sr. Research Scientist, NVIDIA
Highly-Rated Speaker
Maxim Naumov is a senior research scientist at NVIDIA. His interests include parallel algorithms, numerical linear algebra, optimization, and graphs. Maxim contributes to the nvGRAPH data analytics library and has led the development of the AmgX library, which provides distributed algebraic multigrid, Krylov, and relaxation-based schemes. He has also worked on the cuBLAS, cuSPARSE, and cuSOLVER(RF) libraries that are part of the CUDA toolkit. In the past, Maxim held different positions at NVIDIA, including on the CUDA Platform and Emerging Applications teams, and at Intel in the Microprocessor Technology Lab and Computational Software Lab. Maxim received his Ph.D. in computer science, with a specialization in computational science and engineering, in 2009 and his B.S. in computer science and mathematics in 2003, both from Purdue University - West Lafayette.
Alysson Vrielink Sr. Research Scientist, SLAC National Accelerator Laboratory, Stanford University
Alysson Vrielink is currently working towards her Ph.D. in Electrical Engineering at Stanford University, conducting research on high frequency, high average power radiofrequency (RF) sources at SLAC National Accelerator Laboratory. Her interests include high performance parallel computing for scientific applications, applied mathematics and numerical methods, classical electromagnetism, and novel RF structure design. She is a Siemann Graduate Fellow, holds an NSERC postgraduate doctoral award, and was recently elected as student representative to the American Physical Society, Division of Beam Physics. She received her B.S. in Engineering Physics from the University of British Columbia in 2013.

The Depth-First Search (DFS) algorithm is a fundamental building block used in many higher level applications, such as topological sort and connectivity and planarity testing of graphs. We'll briefly review prior results and propose two novel variations of parallel DFS on DAGs. The first traverses the graph three times in a breadth-first search-like fashion. The second assigns a weight to each edge, such that the shortest path from root to a node corresponds to the DFS path. The parallel algorithm visits all nodes in the graph multiple times and as a result computes the DFS parent relationship, pre- (discovery) and post-order (finish) time for every node. In some cases, the parallel DFS on GPU can outperform sequential DFS on CPU by up to 6x. However, the performance of the algorithm depends highly on the structure of the graph, and is related to the length of the longest path and the degree of nodes in the graph.
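
For reference, the quantities the parallel variants compute (DFS parent, pre-order/discovery time, and post-order/finish time) are the same ones a sequential DFS produces. The minimal Python sketch below is only that sequential baseline on a small hypothetical DAG, not the GPU algorithm itself.

# Sequential reference: DFS parent, discovery (pre) and finish (post) times.
def dfs(graph, root):
    parent, pre, post = {root: None}, {}, {}
    clock = 0
    def visit(u):
        nonlocal clock
        pre[u] = clock; clock += 1
        for v in graph.get(u, []):
            if v not in parent:     # first discovery defines the DFS parent
                parent[v] = u
                visit(v)
        post[u] = clock; clock += 1
    visit(root)
    return parent, pre, post

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(dfs(dag, "a"))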

Level: All
Type: Talk
Tags: Algorithms; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7471 - Combining NVIDIA Docker and Databases to Enhance Agile Development and Optimize Resource Allocation

Sophie Voisin Research & Development Associate, Oak Ridge National Laboratory
Sophie Voisin is an R&D associate at Oak Ridge National Laboratory developing high performance computing methods for geospatial data analysis for the GIST group. Sophie received her Ph.D. in computer science and image processing from the Université de Bourgogne (France) in 2008 and joined ORNL in 2010 to work on numerous image processing-related projects, successively performing quantitative analysis of neutron 2D and 3D image data; developing new techniques for eye-gaze data analysis, for which she is a co-recipient of an R&D 100 award (2014); and now implementing multidimensional image processing algorithms on GPU platforms for high performance computing of satellite imagery.
Christopher Davis Geospatial Software Engineer , Oak Ridge National Laboratory
Chris Davis is a geospatial software engineer at Oak Ridge National Laboratory. He is the engineering lead for a high performance computing effort for solving geospatial data problems. He has over 15 years of scientific and engineering software development experience, ranging from proof-of-concept to full production code bases. Over this time, he has accumulated domain experience with EO, multi-spectral, hyper-spectral, LiDAR, and FMV data processing. He has contributed to several components in the ENVI remote sensing data processing software package. He holds a B.S. in electrical engineering from George Mason University.

Learn how to use NVIDIA Docker combined with database analysis to improve your agile development process, generalize hardware requirements, speed up deployment, and identify optimal configurations. Discover how to leverage the resource isolation of Docker containers to test different GPU-architecture performances and resource allocation to optimize system use and maximize processing throughput. Learn how to test this resource isolation using agile methods including development of a processing chain from multi-threaded CPU, to single GPU, and finally to multi-GPU architecture. Hear our observations about compilation timing, execution performance, resource allocation, and generation of CUDA binaries within containers while showcasing an automated image registration pipeline.

Level: Intermediate
Type: Talk
Tags: Data Center and Cloud Computing; HPC and Supercomputing; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7472 - Comparative Study of CNN Models for Detection of Clouds in Overhead Imagery

Byung Hoon Park R&D Staff Scientist, Oak Ridge National Laboratory
Byung Hoon Park, an R&D staff member at Oak Ridge National Laboratory, has worked as a data scientist focusing on high performance computing. His past research includes computational statistics, data mining, discriminative probabilistic graphical models, and distributed parallel machine learning algorithms. Recently he has participated in a number of HPC software R&D projects that aim to bring HPC capabilities into scientific and application domains, including biology, climate, healthcare, and quality assurance of airborne imagery. He also serves as an adjunct associate professor in the Department of Business Analytics and Statistics at the University of Tennessee, Knoxville.

Learn how to improve pixel-wise image quality and geolocation accuracy by leveraging high-end hybrid computing resources. This particular test case uses deep learning to detect and mask cloud objects and other imagery content that reduces image quality and usability in overhead imagery. Timely results are attained by expediting the selection and deployment of a deep learning model for the cloud detection problem in overhead imagery. An optimum deep learning model is selected by evaluating a set of convolutional neural networks for their ability to detect cloud objects. Each network is evaluated using a number of open-source neural network packages to give comparative performance results. In addition, two complementary image segmentation techniques are implemented in parallel, one operating on CPUs and the other on GPUs, to rapidly obtain candidate regions for cloud objects at a fine resolution.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7473 - Towards a Fully Automated, High-Performance Pipeline for Stereo Reconstruction from Earth Observing Satellites

Dave Kelbe Postdoctoral Research Associate; Imaging Scientist, Oak Ridge National Laboratory
Dave Kelbe is an imaging scientist with a passion for applying satellite remote sensing technology to global geospatial challenges. Currently embedded in the Computational Science and Engineering Division at Oak Ridge National Laboratory in Knoxville, Tennessee, Dave's research relies heavily upon high-performance GPU computing to process large volumes of satellite imagery at scale. Dave received a Ph.D. in imaging science with an emphasis on remote sensing and 3D image processing from Rochester Institute of Technology.

Learn how CPU-GPU parallelization is used for high-throughput 3D surface point cloud generation from Earth-observing satellites. Stereo photogrammetry, used in computer vision applications, analyzes the parallax between image pairs to estimate depth. However, extending this workflow to satellite imagery presents computational challenges; notably, near-continuous streams of gigapixel-sized images. We leverage multicore and multiple Tesla K80 GPUs to assemble a fully automated pipeline capable of rapidly processing large image streams. Initial timings demonstrated an 89x (~10x over OpenMP, multicore scaling) performance improvement over its publicly available version. We'll share lessons learned in extending stereo reconstruction algorithms into satellite imaging, at scale.
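
The parallax-to-depth relation at the heart of stereo photogrammetry can be illustrated with the standard rectified pinhole-stereo formula, depth = focal length x baseline / disparity. The toy snippet below (my own illustration with invented numbers, not the speakers' pipeline) shows that smaller measured parallax implies a more distant surface.

# Toy depth-from-disparity illustration (invented values).
import numpy as np
focal_length_px = 4000.0                       # hypothetical focal length in pixels
baseline_m = 150.0                             # hypothetical distance between viewpoints
disparity_px = np.array([8.0, 4.0, 2.0])       # parallax measured between image pairs
depth_m = focal_length_px * baseline_m / disparity_px
print(depth_m)                                 # smaller parallax -> farther surface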

Level: Intermediate
Type: Talk
Tags: Computer Vision and Machine Vision; HPC and Supercomputing; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7474 - Scalable Enterprise Visualization

Jack Greasley Head of New Technology, The Foundry
Jack Greasley is the Head of New Technology at The Foundry, located in London, UK.
George Matos Bunsen Project Leader, The Foundry
George Matos is a Bunsen project leader at The Foundry, located in London, UK.

We'll discuss Bunsen, a large-scale visualization framework that prepares and optimizes engineering, architectural, and other CAD and CAM data. Bunsen is a cloud-hosted solution that reads and writes various industry standard file formats (for example, Revit, SOLIDWORKS, Rhino, Maya, Max, Siemens, and Microstation) and provides powerful tools for processing and conversion. It runs on public cloud solutions, such as AWS or Google, or within your own data center or on-prem cloud. All hardware and software are provisioned in the cloud and are usable from any laptop, tablet, or phone with a web browser. Within Bunsen, the user can create sets of reusable rules to process data for visualization and output. You can think of these rules as company standards relating to lighting, materials, colors, and how to reduce object complexity. Possible visualization output platforms include rendering and animation, virtual reality, augmented reality, and real-time game engines, such as Unreal and Unity. Bunsen doesn't mean you change your workflow -- it is a framework to automate, document, and accelerate your existing workflows.

Level: All
Type: Talk
Tags: AEC Industries; Manufacturing Industries; Rendering and Ray Tracing

Day: TBD
Time: TBD
Location: TBD

S7475 - Using the Entire GPU: Accelerating Analytics and Visualizing Outputs

Todd Mostak Founder and CEO, MapD
Todd Mostak is the founder and CEO of MapD, a pioneer in building GPU-tuned analytics and visualization applications for the enterprise. Todd conceived of the idea of using GPUs to accelerate the extraction of insights from large datasets while conducting his Harvard graduate research on the role of Twitter in the Arab Spring. Frustrated by the inability of conventional technologies to allow interactive exploration of these multi-million row datasets, Todd built one of the first GPU-based databases. Upon completion of his studies at Harvard, Todd joined MIT as a research fellow at the Computer Science and Artificial Intelligence Laboratory, focusing on GPU databases and visualization before founding MapD in late 2013. Todd received his undergraduate degree from the University of North Carolina at Chapel Hill in economics and anthropology.

We'll discuss the approach to and advantages of using GPUs to not only power through large-scale database queries but also use the graphics pipeline of the GPU to rapidly and efficiently visualize the outputs of billions of rows of data. The application of the GPU for both query and render results in a fast system for multi-terabyte scale analytic challenges. We'll cover the high-level benefits of the approach and delve into the technical details associated with GPU-powered databases, server side rendering, and other software refinements needed to squeeze the maximum amount of performance from this exceptional hardware platform.

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Real-Time Graphics; AI Startup; Federal

Day: TBD
Time: TBD
Location: TBD

S7477 - IFM: Intelligent Flying Machines Counting Inventory

Marc Gyongyosi CEO and Founder, Intelligent Flying Machines
Marc Gyongyosi is a senior at Northwestern University and the CEO and founder of Intelligent Flying Machines, a data analytics company using computer vision and robotics to automate indoor data capture. IFM's first product is an end-to-end system that fully automates inventory counts in warehouses, saving individual facilities several million dollars in costs every year. Marc's expertise is in visual-inertial navigation, GPU programming, and industrial robotics. Previously, Marc worked at a startup developing a novel 3D vision system for self-driving cars with funding from the National Science Foundation and Google. Before that, he worked with BMW's Advanced Robotics R&D group in Munich, where he developed lightweight collaborative robots.

We'll describe how Intelligent Flying Machines is leveraging NVIDIA GPU technology to fully automate inventory counting in warehouses with flying robots. Three different topics will be covered: First, we'll talk about the challenges of commercializing advanced robotics technology for industrial applications and how IFM has developed a framework that enables effective deployment and implementation. Then, we'll discuss recent advances in leveraging an onboard Jetson TX1 GPU for highly accurate, long-distance visual inertial navigation in a warehouse environment. Finally, we'll show how IFM is using deep learning to enable its flying robots to adapt to different types of warehouses and identify key pieces of information in the environment.

Level: Intermediate
Type: Talk
Tags: Intelligent Machines and IoT; Computer Vision and Machine Vision; Deep Learning and AI; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7478 - Using OpenACC to Parallelize Irregular Algorithms on GPUs

Sunita Chandrasekaran Assistant Professor, University of Delaware
Sunita Chandrasekaran is an assistant professor in the Computer and Information Sciences Department at the University of Delaware. Her research interests include exploring the suitability of high-level programming models and runtime systems for HPC and embedded platforms, along with the challenges of migrating scientific applications to such systems. Sunita's research publications cover developing and using parallel programming models, building performance and power models for GPUs, constructing compiler and runtime frameworks, and adapting scientific applications to parallel computing platforms. She holds a Ph.D. from NTU, Singapore, with a specialization in designing software for FPGAs.
Arnov Sinha Graduate Student, University of Delaware
Arnov Sinha is a master's student at the University of Delaware in the Computer and Information Sciences Department. His research focus includes high performance computing, specifically leveraging high-level programming models to target and optimize computationally heavy codes running on parallel architectures. Arnov is also interested in deep learning and computer vision. He obtained his bachelor's degree in engineering from University of Mumbai, India.

We'll dive deeper into using OpenACC and explore potential solutions to the challenges faced while parallelizing an irregular algorithm, the sparse Fast Fourier Transform (sFFT). We'll analyze code characteristics using profilers and discuss the optimizations applied, things we did right, things we did wrong, the roadblocks we faced, and the steps taken to overcome them. We'll highlight how to compare data reproducibility between accelerators in heterogeneous platforms, and report on the algorithmic changes needed to move an irregular code from sequential to parallel with OpenACC. The results will demonstrate how to create a portable, productive, and maintainable codebase without compromising on performance using OpenACC.

Level: Intermediate
Type: Talk
Tags: Programming Languages; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7479 - Mars Rovers to End-to-End Inspection Solutions: GPUs for Machine Intelligence

Mark Woods Head of Autonomy and Robotics, SCISYS
Mark Woods is the Head of Autonomy and Robotics at SCISYS.

Benefit from our experiences applying NVIDIA GPUs and libraries to implement high performance deep learning and vision systems, from training on high powered workstations to deployment on embedded systems. GPUs provide the performance to enable us to exploit approaches and algorithms developed for space in terrestrial spin-off applications. We will use examples and demonstrations from our Mars rover development systems to show how we were easily able to leverage GPUs to advance our R&D work on autonomous science into practical terrestrial applications for the automated inspection of the built environment. Use our experience to judge how you might apply GPUs to your challenging problems.

Level: All
Type: Talk
Tags: Manufacturing Industries; Computer Vision and Machine Vision; Deep Learning and AI; Intelligent Machines and IoT

Day: TBD
Time: TBD
Location: TBD

S7481 - Big Image-Omics Data Analytics for Clinical Outcome Prediction

Junzhou Huang Associate Professor, University of Texas at Arlington
Junzhou Huang is an associate professor in the department of computer science and engineering at the University of Texas, Arlington. He received the B.E. from Huazhong University of Science and Technology, Wuhan, China, his M.S. from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, and his Ph.D. in computer science at Rutgers University in New Jersey. Junzhou's major research interests include machine learning, computer vision, and big medical data analytics. He was globally selected as one of the 10 emerging leaders in multimedia and signal processing by the IBM T.J. Watson Research Center in 2010. His work won the MICCAI Young Scientist Award 2010, the FIMH Best Paper Award 2011, the MICCAI Young Scientist Award Finalist 2011, the STMI Best Paper Award 2012, the NIPS Best Reviewer Award 2013, the MICCAI Best Student Paper Award Finalist 2014 and the MICCAI Best Student Paper Award 2015. He received the NSF CAREER Award 2016.

We'll introduce how to develop big image-omics data analytics algorithms with GPU computing tools for clinical outcome prediction from pathological images and cell profiling data of cancer patients. Recent technological innovations are enabling scientists to capture image-omics data at increasing speed and resolution, where image-omics refers to both image data (pathology images or radiology images) and omics data (genomics, proteomics, or metabolomics) captured from the same patient. This is generating a deluge of heterogeneous data from different views. Thus, a compelling need exists to develop novel data analytics tools to foster and fuel the next generation of scientific discovery in image-omics data-related research. However, the major computational challenges are due to the unprecedented scale and complexity of heterogeneous image-omics data analytics. There is a critical need for large-scale modeling and mining strategies to bridge the gap and facilitate knowledge discovery from complex image-omics data. We'll introduce our recent work on developing novel deep learning methods to detect cells in terapixel histopathological images with a 10,000x+ speedup and to automatically discover biomarkers for clinical outcome prediction.

Level: All
Type: Talk
Tags: Healthcare and Life Sciences; Accelerated Analytics; Deep Learning and AI; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7482 - Advances in Real-Time Graphics at Pixar

Dirk Van Gelder Software Engineer, Pixar Animation Studios
Dirk Van Gelder joined Pixar Animation Studios in 1997 as a software engineer for the Academy Award®-nominated film "A Bug's Life" and the Academy Award®-winning short film "Geri's Game," working on animation software and the studio's first use of subdivision surfaces. Dirk has worked on software for every Pixar movie since, including the ground-up rewrite of the studio's proprietary animation system, Presto. Currently Dirk leads the Presto Character team within the Pixar Studio Tools Department.
David Yu Senior Graphics Software Engineer, Pixar Animation Studios
David Yu is a senior graphics software engineer at Pixar.
Pol Jeremias-Vila Graphics Engineer, Pixar
Pol Jeremias-Vila is passionate about technology and art. He grew up near Barcelona and moved to California in 2006. Since then, Pol has researched computer graphics and worked on multiple games for companies such as LucasArts and SoMa Play. Today, he helps create movies at Pixar Animation Studios. In his spare time, he has co-founded Shadertoy.com and Beautypi. When he is not programming, you'll find him running, reading, or watching movies.

Explore how real-time graphics are used at Pixar Animation Studios. We'll describe the unique needs for film production and our custom solutions, including Presto and our open-source projects Universal Scene Description (USD), OpenSubdiv, and Hydra. Don't miss this great opportunity to learn about graphics, algorithms, and movies!

Level:
Type: Talk
Tags: Media and Entertainment; Rendering and Ray Tracing; Real-Time Graphics

Day: TBD
Time: TBD
Location: TBD

S7483 - SpaceNet Satellite Imagery Deep Learning Implementations and Performance

Todd Stavish CTO, CosmiQ Works
Todd Stavish is a co-founder and CTO of CosmiQ Works, a division of In-Q-Tel Labs. CosmiQ Works's mission is to help the intelligence community leverage new and emerging commercial space capabilities against mission problems. At CosmiQ Works, Todd leads the SpaceNet Challenge, a corpus of commercial satellite imagery and associated algorithm design competitions. The goal of SpaceNet is to foster innovation in the development of computer vision to automatically extract information from remote sensing data. Before working at CosmiQ, Todd was the technical lead on In-Q-Tel's big data, geospatial, and commercial space investments. He spent his early career working in Silicon Valley startups.
Todd Bacastow Director, Strategic Alliances, Digital Globe
Todd Bacastow is director of strategic alliances for DigitalGlobe, a leading provider of geospatial information and insight. He works to incubate, launch, and grow products for DigitalGlobe's Insight line of business, which brings together satellite imagery, geospatial data, and analytic technologies to answer critical questions for decision makers. He joined DigitalGlobe through the acquisitions of GeoEye and SPADAC, where he was a leader in geospatial predictive analytic software and expertise, working closely with the founding leadership team as manager of strategic initiatives.

The commercialization of the geospatial industry has led to an explosive amount of data being collected to characterize our changing planet. One area for innovation is the application of computer vision and deep learning to extract information from satellite imagery at scale. SpaceNet's objective is to release remote sensing data (for example, satellite imagery) to the public to enable developers and data scientists. Today, map features such as roads, building footprints, and points of interest are primarily created through manual techniques. We believe that advancing automated feature extraction techniques will serve important downstream uses of map data, including humanitarian and disaster response, as recently observed by the need to map buildings in Haiti during the response to Hurricane Matthew. Furthermore, we think that solving this challenge is an important stepping stone to unleashing the power of advanced computer vision algorithms applied to a variety of remote sensing data applications in both the public and private sector.

Level: All
Type: Talk
Tags: Federal; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7484 - Object Detection, Super-Resolution on Satellite Imagery

Patrick Hagerty Director of Research, CosmiQ Works
Patrick Hagerty is the director of research at CosmiQ Works, an In-Q-Tel Lab focusing on Space 3.0 startups. Previously, Patrick was an applied research mathematician for the Department of Defense working on high performance computing and emerging technologies. He received his Ph.D. in mathematics from the University of Michigan in the area of geometric mechanics.

A new frontier in the commercialization of space is emerging, with companies offering lower-cost access to space-based remote sensing. While capabilities vary across current and future providers of space-based imagery, we'll investigate how the application of modern imagery analysis techniques increases the complementary value of multiple remote sensing solutions. We architect and train deep convolutional neural networks to enhance lower resolution imagery using higher resolution imagery: super-resolution. During the super-resolution process, the peak signal-to-noise ratio (PSNR) is not uniform throughout the image. To assist the imagery analyst, it is preferable to maximize PSNR gain in areas of interest. We investigate the distribution of PSNR gain during the super-resolution of a satellite image, and compare PSNR results with the accuracy of object detection algorithms to measure the impact of super-resolution on standard computer vision problems.
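
For readers unfamiliar with the metric, PSNR follows the standard definition PSNR = 10 * log10(MAX^2 / MSE). The short sketch below (my own illustration with placeholder data, not the speaker's code) computes it for an image tile; applying it per tile is one way to see how super-resolution gain varies across an image.

# PSNR sketch with placeholder imagery (standard definition).
import numpy as np
def psnr(reference, estimate, max_val=255.0):
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

ref = np.random.randint(0, 256, (256, 256), dtype=np.uint8)      # placeholder tile
est = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255)   # noisy "reconstruction"
print(psnr(ref, est))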

Level: Advanced
Type: Talk
Tags: Federal; Computer Vision and Machine Vision; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7487 - Infrastructure Differentiation in Satellite Imagery with Convolutional Neural Networks

Adam Van Etten Research Scientist, IQT
Adam Van Etten is a research scientist at CosmiQ Works, an In-Q-Tel Lab, where he applies machine learning and computer vision techniques to satellite imagery data, focused on problems of interest to the U.S. government. Prior to In-Q-Tel, Adam was a data scientist for L-3 Data Tactics at DARPA headquarters developing tools and scalable algorithms for big data analysis. He received his Ph.D. in physics from Stanford University and bachelors in physics and astronomy from the University of Washington.

We'll discuss efforts to leverage state-of-the-art deep learning frameworks for the task of broad area search in satellite imagery. Infrastructure projects with very different purposes often look very similar in satellite imagery, and we'll explore the ability of deep learning frameworks to disentangle such classes. A similar complication applies to vehicles viewed from space, where object sizes are often only a couple dozen pixels.

Level: Intermediate
Type: Talk
Tags: Federal; Deep Learning and AI; Computer Vision and Machine Vision

Day: TBD
Time: TBD
Location: TBD

S7489 - Clustering GPUs with Ethernet

Fazil Osman Distinguished Engineer, Broadcom Limited
Fazil Osman is a Distinguished Engineer in Broadcom's Compute and Connectivity Division. For the last few years, he has been responsible for Broadcom's Ethernet NIC strategy as data centers move to a heterogeneous computing environment. One of his focus areas has been providing direct connectivity between GPUs in order to enable high performance, scalable machine learning platforms. He has also been responsible for new SoCs that enable the disaggregation and sharing of HDDs and SSDs at scale in the data center. He started at Broadcom in the Core Switching Group, defining its software strategy for enabling white box Ethernet switches by opening up the switch to software developers. Prior to Broadcom, he was part of multiple startups, including serving as CTO of Astute Networks and XLNT Designs. At Astute Networks, he led the development of the first fully software programmable 10 Gbps TCP termination engine, and at XLNT Designs, he developed multiple Ethernet switch products.

As GPUs are more widely deployed for machine learning, training is being done over larger datasets than ever before, resulting in longer training times. Reducing training time from days to hours or less requires clustering large numbers of GPUs. As more users start to see the benefits of machine learning for their businesses, there is also a need to provide users with on-demand access to these data center-based clusters. The ideal technology for such large-scale clustering in the data center is Ethernet. We'll discuss the work Broadcom is doing with NVIDIA to enable GPUDirect using its RoCE v2 line of Ethernet NICs.

Level: All
Type: Talk
Tags: Data Center and Cloud Computing; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7491 - LipNet: End-to-End Sentence-Level Lipreading

Yannis Assael DPhil Student, University of Oxford
Yannis Assael graduated from the Department of Applied Informatics, University of Macedonia, Greece, in 2013. He was awarded a full scholarship from the Hellenic State Scholarships Foundation to study for an M.S. in computer science at the University of Oxford, where he received the Tony Hoare Prize for the best overall performance in 2014. His studies focused in the fields of artificial intelligence and machine learning. In 2015, he continued for an M.Res. in machine learning at Imperial College London under the HiPEDS Scholarship. Having obtained the second highest mark, he went back to the University of Oxford to study for a D.Phil. degree in machine learning under the Oxford - Google DeepMind Graduate Scholarship. Throughout his studies, he has participated in more than 50 freelance and consulting projects, and his machine learning research projects have attracted the attention of media several times, including the BBC, Reuters, and New Scientist.
Brendan Shillingford DPhil Student, University of Oxford
Brendan Shillingford graduated from the University of British Columbia after studying a Combined Honours degree in Computer Science and Statistics. He now studies for a DPhil (PhD) in Machine Learning in the Department of Computer Science at the University of Oxford as a Clarendon Scholar.

LipNet is the first end-to-end sentence-level lipreading model that simultaneously learns spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 95.2% accuracy on the sentence-level, overlapped-speaker split task, outperforming experienced human lip readers and the previous 86.4% word-level state-of-the-art accuracy (Gergen et al., 2016).

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Video and Image Processing

Day: TBD
Time: TBD
Location: TBD

S7493 - DNA for Automated Driving

Thomas Halva Labella Project Manager, Driver Assistance, Elektrobit Automotive GmbH, Elektrobit
Thomas Halva Labella is the project manager for driver assistance at Elektrobit, which he joined in 2007. In this role, he manages R&D projects on the topics of autonomous driving and HD maps. Thomas received a Ph.D. in 2007 from Université Libre de Bruxelles, Belgium, on swarm intelligence and swarm robotics. The publications related to his work have been cited more than 1,000 times. He holds an M.S. in computer science engineering from Politecnico di Milano, Italy, on autonomous group-robotics and control algorithms based on fuzzy logic.

We'll discuss an architecture that enables discrete driver assistance systems to work in tandem, and that is enabling automakers to develop complex systems more quickly and efficiently. We'll show a reference implementation of a valet parking function built using the architecture and accessing maps from the cloud.

Level: Intermediate
Type: Talk
Tags: Self-Driving Cars; AI for In-Vehicle Applications

Day: TBD
Time: TBD
Location: TBD

S7495 - Optimizing Application Performance with CUDA Profiling Tools

Sanjiv Satoor Senior Engineering Manager, Developer Tools, NVIDIA
Sanjiv Satoor is a manager for the CUDA profiler development team at NVIDIA, Pune, India. He received his Master's degree in Computer Science and Engineering from IIT Mumbai and Bachelor's degree from IIT Chennai.
Mayank Jain Engineering Manager, Developer Tools , NVIDIA
Mayank Jain is a manager of the Developer tools team at NVIDIA, Pune, India.

This session will provide an overview of new features added in NVIDIA Visual Profiler and nvprof. It will show how these profiling tools can be used to identify optimization opportunities at the application, kernel, and source-line levels. It will include some examples on how the Visual Profiler Unified Memory and NVLink analysis features can be used to identify performance bottlenecks.

Level: All
Type: Talk
Tags: Tools and Libraries; Performance Optimization
Industry Segments: Higher Education / Research

Day: TBD
Time: TBD
Location: TBD

S7497 - Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification

Xiaodong Yang Research Scientist , NVIDIA
Xiaodong Yang is a research scientist at NVIDIA. His research interests include computer vision, machine learning, deep learning, and multimedia analytics. He has been working on large-scale image and video classification, hand gesture and activity recognition, dynamic facial analysis, video surveillance event detection, multimedia search, and computer vision-based assistive technology. He received his Ph.D. from City University of New York in 2015 and B.S. from Huazhong University of Science and Technology in 2009.

We'll present a novel framework to combine multiple layers and modalities of deep neural networks for video classification, which is fundamental to intelligent video analytics, including automatic categorizing, searching, indexing, segmentation, and retrieval of videos. We'll first propose a multilayer strategy to simultaneously capture a variety of levels of abstraction and invariance in a network, where the convolutional and fully connected layers are effectively represented by the proposed feature aggregation methods. We'll further introduce a multimodal scheme that includes four highly complementary modalities to extract diverse static and dynamic cues at multiple temporal scales. In particular, for modeling the long-term temporal information, we propose a new structure, FC-RNN, to effectively transform the pre-trained fully connected layers into recurrent layers. A robust boosting model is then introduced to optimize the fusion of multiple layers and modalities in a unified way. In the extensive experiments, we achieve state-of-the-art results on benchmark datasets.
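
One plausible reading of the FC-RNN idea described above is that a pre-trained fully connected layer is reused as the input transform of a recurrent layer, with new hidden-to-hidden weights added on top. The PyTorch sketch below is only my schematic reading under that assumption; layer sizes, initialization details, and the nonlinearity are invented, not the authors' implementation.

# Schematic FC-RNN sketch: h_t = relu(W_fc x_t + U h_{t-1}), with W_fc pre-trained.
import torch
import torch.nn as nn

class FCRNN(nn.Module):
    def __init__(self, pretrained_fc: nn.Linear):
        super().__init__()
        hidden = pretrained_fc.out_features
        self.fc = pretrained_fc                      # stand-in for a pre-trained FC layer
        self.recurrent = nn.Linear(hidden, hidden)   # new hidden-to-hidden weights U
    def forward(self, frames):                       # frames: (time, batch, features)
        h = torch.zeros(frames.size(1), self.fc.out_features)
        outputs = []
        for x_t in frames:
            h = torch.relu(self.fc(x_t) + self.recurrent(h))
            outputs.append(h)
        return torch.stack(outputs)

fc = nn.Linear(512, 256)                             # hypothetical pre-trained FC layer
print(FCRNN(fc)(torch.randn(10, 4, 512)).shape)      # torch.Size([10, 4, 256])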

Level: All
Type: Talk
Tags: Media and Entertainment; Deep Learning and AI; Intelligent Video Analytics; AI for In-Vehicle Applications

Day: TBD
Time: TBD
Location: TBD

S7502 - Generative Adversarial Networks

Ian Goodfellow Research Scientist, OpenAI
Ian Goodfellow is a research scientist at OpenAI. He is best known as the inventor of generative adversarial networks and as the lead author of the Deep Learning textbook. He has studied under Andrew Ng at Stanford University and Yoshua Bengio and Aaron Courville at Université de Montréal. Prior to joining OpenAI, he was a senior research scientist at Google Brain. His research interests include generative models and machine learning security.

Generative adversarial networks are machine learning models that can generate new data drawn from the same distribution as the training data. They are widely used for image generation tasks and are beginning to be used for video generation and reinforcement learning. We'll describe the basics of how GANs work and summarize their latest applications.
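
To make the adversarial training idea concrete, the toy sketch below (my own minimal illustration on 1D data, not any specific published model) trains a generator to produce samples a discriminator cannot distinguish from a simple Gaussian "training distribution".

# Minimal GAN training loop on toy 1D data.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # noise -> sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0            # "training data": N(3, 0.5)
    fake = G(torch.randn(64, 8))
    # Discriminator step: push real toward 1 and generated samples toward 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: try to fool the discriminator (generated samples toward 1).
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())         # should drift toward 3.0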

Level: All
Type: Talk
Tags: Deep Learning and AI; Algorithms; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7503 - Massively Parallel Algorithm and Implementation of RI-MP2 Energy Calculations for Multi-GPU Supercomputers

Michio Katouda Research Scientist, RIKEN
Michio Katouda is a researcher in theoretical and computational chemistry. He received his Ph.D. in chemistry from Waseda University in Tokyo in 2011. Afterwards, he was appointed as a research scientist in the Computational Molecular Science Research Team at the RIKEN Advanced Institute for Computational Science. His main research interests are the development of efficient computation techniques and massively parallel algorithms for molecular electronic structure theories, such as Moller-Plesset perturbation theory and density functional theory, for large molecules and extended systems. He is a developer of the massively parallel RI-MP2 code in the NTChem and GAMESS-US software packages.

We performed a massively parallel multi-GPU implementation of resolution-of-identity second-order Moller-Plesset perturbation (RI-MP2) energy calculations suitable for large molecules on CPU/GPU hybrid supercomputers. We'll give an overview of the implementation and report the results of a performance evaluation using up to 1,349 nodes and 4,047 GPUs of the TSUBAME 2.5 supercomputer. GPU computation speeds up the RI-MP2 calculations considerably (4.1-6.6 times), and the parallel scalability of the present GPU implementation with the number of nodes is good. A measured peak performance of 514.7 TFLOPS is attained for the GPU job of (C96H24)2 using 1,349 nodes and 4,047 GPUs of TSUBAME 2.5, much higher than that of the CPU jobs (87.5 TFLOPS). We'll also present an application to the inter-molecular interaction analysis of nano-carbon molecular assemblies such as nanographenes.

Level: Advanced
Type: Talk
Tags: Computational Chemistry; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7504 - Improving Consumer Compliance Through Better Product Recommendation - New Skin Advisor Tool Powered by AI

Matthew L. Barker, Ph.D. Principal Data Scientist, Procter & Gamble
Matthew L. Barker is a Principal Data Scientist in R&D Quantitative Sciences at Procter & Gamble. After completing a Ph.D. in Statistics at the University of Kentucky, Matt has worked at P&G for 16 years. His current focus is on data science & machine learning involving unstructured or semi-structured data from sensors, images, video, and text. Matt also has computer science experience with Linux system administration, solving numerical analysis problems, and applying several different programming languages.

Consumers currently struggle to find the right cosmetic skin care products suited to their personal needs and preferences. Hundreds of brands and product forms sit next to each other on store shelves without simple and intuitive means for consumers to determine what's right for them. A new skin advisor tool has been developed to deliver a personalized beauty consultation tailored to a consumer's unique skin needs, right at her fingertips. We identified that getting the right level of educational information to the consumer, combined with understanding her concerns and aesthetic preferences, can drive product compliance. We collected over 50,000 images of women of known chronological age and built a deep convolutional neural network model that could not only predict a woman's visible skin age with great accuracy but also identify which areas of her face she should focus her skincare on to improve her skin appearance. Skin age accuracy was validated against image gradings from over 350 dermatologists. We'll discuss how we used NVIDIA GPUs and deep learning techniques to develop this new tool.

Level: All
Type: Talk
Tags: Deep Learning and AI; Video and Image Processing

Day: TBD
Time: TBD
Location: TBD

S7505 - Enable GPU-Accelerated Simulation Practices on the Cloud with Rescale

Fanny Treheux Director of Solutions, Rescale
Fanny Treheux is responsible for leading solution engineering at Rescale. Previously, Fanny spent 10 years at Dassault Systemes SIMULIA as a sales engineer addressing customer simulation needs across multiple industries, including high tech, life science, aerospace, and automotive.

We'll review the benefits of leveraging NVIDIA GPU technology through Rescale, a cloud-based simulation platform. Through concrete engineering use cases and benchmark results, we'll illustrate performance gains with GPUs across a large selection of simulation software.

Level: All
Type: Talk
Tags: Computer Aided Engineering; Data Center and Cloud Computing; Computational Fluid Dynamics; AI Startup
Industry Segments: Aerospace; Automotive; Cloud Services; Defense; Energy / Oil & Gas; Financial Services; Manufacturing; IT Services

Day: TBD
Time: TBD
Location: TBD

S7506 - Rolling in the Deep: How to Debug Machine Learning Call Stacks

Martin Bakal Product Manager, Rogue Wave Software
Marty Bakal is a product manager for TotalView, an industry-leading debugger. With over 25 years of experience, his expertise spans multiple industries, including medical devices and electronics. Some of Martin's areas of expertise include product line engineering, modeling, validation and verification testing, agile development, and internet of things and embedded systems. He also has spent time consulting for and supporting customers (developers) in many different industries.

Python is a popular language for deep learning, but debugging calls to existing C/C++ code in shared libraries can be extremely challenging. Untangling the confusing maze of library calls, data translations, and linked-in CUDA code can be convoluted and time-consuming, as neither Python nor C/C++ debuggers provide a comprehensive view across the languages. We'll look at how Python-C/C++ transformations combined with a multi-threaded, multi-process debugger help you understand what's going on within your deep learning code.

Level: All
Type: Talk
Tags: Tools and Libraries; Deep Learning and AI
Industry Segments: Software; Aerospace; Energy / Oil & Gas; Financial Services

Day: TBD
Time: TBD
Location: TBD

S7507 - Compute Preemption and TotalView Have Made Debugging Pascal Much More Seamless

Martin Bakal Product Manager, Rogue Wave Software
Marty Bakal is a product manager for TotalView, an industry-leading debugger. With over 25 years of experience, his expertise spans multiple industries, including medical devices and electronics. Some of his areas of expertise include product line engineering, modeling, validation and verification testing, agile development, internet of things, and embedded systems. He also has spent time consulting for and supporting customers and developers in many different industries.
Larry Edelstein Sales Engineer, Rogue Wave Software
Larry Edelstein has been working in software development for over 25 years. From mainframes to desktops through GPU clusters and the EC2 cloud, he has seen platforms and practices come and go. Larry has worked with many companies throughout his career, from start-ups to well-established industry leaders, creating software, leveraging his systems engineering expertise, and mentoring junior engineers. Larry has a Bachelor of Science in Computer Science from Cornell University and lives in the Bay Area.

With Pascal, NVIDIA released compute preemption built right into the card. Debugging is now much smoother because when we stop a thread on the GPU, we no longer stop the whole GPU, enabling interactive debugging on single-GPU systems and debugging of multiple processes using the same GPU. TotalView, the leading multi-threaded Linux debugger, has also invested in improving its architecture to support multi-GPU systems at scale, resulting in a much more seamless debugging experience. Come get a better understanding of the latest technology and how and where we are looking to go next.

Level: All
Type: Talk
Tags: Tools and Libraries; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7508 - Medical ImageNet: A Resource for Machine Learning from Medical Images

Curtis Langlotz Professor of Radiology and Biomedical Informatics, Stanford University
Curtis P. Langlotz, M.D., Ph.D., serves as professor of radiology and biomedical informatics and associate chair for information systems in the Department of Radiology at Stanford University and as a medical informatics director for Stanford Health Care, responsible for the computer technology that supports the Stanford Radiology practice. Curtis's research is focused on reducing diagnostic errors through real-time decision support systems and other information technologies. His biomedical informatics laboratory develops novel machine learning and natural language processing algorithms that provide intelligent assistance to radiologists, clinicians, patients, and other consumers of diagnostic imaging information. Curtis has founded three health care information technology companies, including Access Radiology in 1992, eDictation in 1998, and Montage Healthcare Solutions in 2010, which was acquired by Nuance Communications in 2016.

Machine learning research on medical images has lagged similar work on conventional visible light images due to the added complexity of medical images and the lack of available annotated large image sets. To address this limitation, Stanford researchers are creating a massive clinical imaging research resource, containing de-identified versions of all Stanford radiology images, annotated with concepts from a medical imaging ontology, and linked to genomic data, tissue banks, and information from patients' electronic medical records. This dataset contains 0.5 petabyte of clinical radiology data, comprising 4.5 million studies, and over 1 billion images. The broad long-term objective of this resource is to dramatically reduce diagnostic imaging errors by: (1) facilitating reproducible science through standardization of data and algorithms for medical image machine learning research, (2) enabling patients to participate in the scientific enterprise by volunteering their data for these experiments, (3) spurring innovation by hosting competitions on clinically validated image sets, and (4) disseminating the resulting data, informatics tools, and decision support algorithms to the widest possible scientific audience. We'll review progress toward creation of the Stanford Medical ImageNet, including details of database structure and contents, and recent results from deep learning experiments on the data it contains.

Level: Beginner
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Computer Vision and Machine Vision; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7510 - Apache Spark and GPUs for Scaling Deep Learning Libraries

Joseph Bradley Software Engineer, Databricks, Inc
Joseph Bradley is an Apache Spark committer and PMC member working on machine learning at Databricks. Previously, he was a postdoc at UC Berkeley, after receiving his Ph.D. in machine learning from Carnegie Mellon University.
Tim Hunter Software Engineer, Databricks, Inc
Tim Hunter is a software engineer at Databricks and contributes to the Apache Spark MLlib project. During his Ph.D. at UC Berkeley, he built distributed machine learning systems starting with Spark version 0.2.

Apache Spark has become a popular tool for data warehousing, ETL, and advanced analytics. Meanwhile, deep learning has become one of the most powerful classes of machine learning methods, in large part due to the computational power of modern machines with GPUs and specialized hardware. Spark and GPUs combine well for large deep learning workflows: Spark can handle ETL and data management, and it can distribute data parallel tasks to scale out across many GPUs.
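
As a hedged sketch of that pattern (not the speakers' code), the PySpark snippet below lets Spark handle the data plumbing and fan data-parallel scoring out across executor partitions. The "model" here is a trivial stand-in; in practice each partition would load a deep learning model onto its worker's GPU before scoring its batch of records.

# PySpark sketch: distribute per-partition scoring across executors.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gpu-scoring-sketch").getOrCreate()
records = spark.sparkContext.parallelize(range(1000), numSlices=8)

def score_partition(rows):
    # Stand-in for loading a GPU-backed model once per partition, then scoring.
    model = lambda x: x * x
    for row in rows:
        yield model(row)

print(records.mapPartitions(score_partition).take(5))
spark.stop()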

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Deep Learning and AI; Data Center and Cloud Computing; Federal

Day: TBD
Time: TBD
Location: TBD

S7511 - Scaling Scale: How Distributed GPU Databases Get to 1 Trillion Rows

Bill Maimone VP of Engineering, MapD
Bill Maimone is the vice president of engineering at MapD. Bill joined the company from Anaplan, where he served as vice president of engineering. Before that, he was the vice president of engineering for Chatter at Salesforce as well as a senior vice president and the CTO of big data analytics pioneer, Actian. Bill began his career at Oracle, where he spent two decades. During that time, he held a number of senior engineering roles, culminating as a vice president with responsibility for over 500 members of the R&D team across four continents. He holds an M.S. and a B.S. in computer science and a B.S. in journalism from the Massachusetts Institute of Technology.

The ability to use GPUs to power real-time analytics past the billion row threshold is already here. But what about a trillion rows? The technical challenges to overcome that hurdle are more complex and require a delicate balance of memory management, data serialization over the network, servers working in lockstep, and managing redundancy and single points of failure. We'll outline and demonstrate how MapD tackled this problem and, more importantly, how you can visualize the outputs of various queries.

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Data Center and Cloud Computing; AI Startup; Federal

Day: TBD
Time: TBD
Location: TBD

S7512 - Deep Learning for Self-Driving Cars

Raquel Urtasun Associate Professor, University of Toronto
Raquel Urtasun is an associate professor in the Department of Computer Science at the University of Toronto and a Canada Research Chair in machine learning and computer vision. Raquel received her Ph.D. from the Ecole Polytechnique Federal de Lausanne in 2006 and did her postdoc at MIT and UC Berkeley. Her research interests include machine learning, computer vision, robotics, and remote sensing. Her recent work involves perception algorithms for self-driving cars, deep structured models, and exploring problems at the intersection of vision and language. Her lab was selected as an NVAIL lab. She is a recipient of an NVIDIA Pioneers of AI Award, an NSERC Steacie Award (given to the best six scientists in Canada), an Early Researcher Award, two Google Faculty Research Awards, an Amazon Faculty Research Award, a Connaught New Researcher Award, and a Best Paper Runner-up Prize awarded at CVPR. She is program chair of CVPR18 and an editor of IJCV.

We'll talk about deep learning for self-driving cars, including sensing, perception, localization, and mapping. The talk will be non-technical and will summarize my group's latest results in the field.

Level: All
Type: Talk
Tags: Computer Vision and Machine Vision; Deep Learning and AI; Self-Driving Cars

Day: TBD
Time: TBD
Location: TBD

S7513 - Applications of Generative Adversarial Networks to Drug Discovery in Oncology and Infectious Diseases

Artur Kadurin Head of Segmentation Group, Mail.ru
Artur Kadurin is the head of the segmentation group at Mail.ru, one of the world's largest internet holding companies. He has worked for the company since 2011 in various technical roles in the fields of advertising, search, social media networks, and game development. He is a high-ranked online poker professional and has specialized in machine learning since 2009. He graduated from Kuban State University as a systems programmer and did his graduate work at the Steklov Mathematical Institute and at Insilico Medicine, where he is a consultant. His current research interests are in generative adversarial networks and applications of GANs to healthcare. Over the past several years, he developed an active interest in longevity research and age-related diseases.
Polina Mamoshina Sr. Research Scientist, Pharmaceutical Artificial Intelligence, Insilico Medicine, Inc
Polina Mamoshina is a senior research scientist at Insilico Medicine, Inc., a Baltimore-based bioinformatics and deep learning company focused on reinventing drug discovery and biomarker development and a part of the computational biology team of Oxford University Computer Science Department. Polina graduated from the Department of Genetics of Moscow State University. She was one of the winners of GeneHack, a 48-hour hackathon on bioinformatics at the Moscow Institute of Physics and Technology attended by hundreds of young bioinformaticians from across Russia. Polina is involved in multiple deep learning projects at the Pharmaceutical Artificial Intelligence division of Insilico Medicine, working on the drug discovery engine and developing biochemistry, transcriptome, and cell-free nucleic acid-based biomarkers of aging and disease. She recently co-authored seven academic papers in peer-reviewed journals.
Alex Zhavoronkov CEO, Insilico Medicine, Inc
Alex Zhavoronkov, PhD, is the CEO of Insilico Medicine, Inc., a company applying the latest advances in artificial intelligence to drug discovery, biomarker development, and aging research, headquartered at the Emerging Technology Centers at the Johns Hopkins University at Eastern in Baltimore. He is also the CSO of the Biogerontology Research Foundation, a UK-based registered charity supporting aging research worldwide, director of the International Aging Research Portfolio (IARP) knowledge management project, and head of the Regenerative Medicine Laboratory at the Federal Clinical Research Center for Pediatric Hematology, Oncology and Immunology, one of the largest children's cancer centers in the world, performing over 300 bone marrow transplantations annually since 2012.

Recent advances in deep learning and specifically in generative adversarial networks have demonstrated surprising results in generating new images and videos upon request, even using natural language as input. We'll present the first application of generative adversarial autoencoders (AAE) for generating novel molecules with a defined set of parameters. In the first proof of concept experiment, we developed a seven-layer AAE architecture with the latent middle layer serving as a discriminator. As an input and output, the AAE uses a vector of binary fingerprints and concentration of the molecule. In the latent layer, we also introduced a neuron responsible for growth inhibition percentage, which, when negative, indicates the reduction in the number of tumor cells after the treatment. To train the AAE, we used the NCI-60 cell line assay data for 6252 compounds profiled on MCF-7 cell line. The output of the AAE was used to screen 72 million compounds in PubChem and select candidate molecules with potential anti-cancer properties. This approach is a proof of concept of an artificially intelligent drug discovery engine, where AAEs are used to generate new molecular fingerprints with the desired molecular properties. We'll also present the applications of this approach to discovering new anti-infective drugs and present the roadmap for generating drugs for rare diseases and even for individual patients.
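
As a schematic sketch only (not the authors' seven-layer AAE), the snippet below shows an autoencoder over binary fingerprint vectors plus a concentration column, with one latent neuron supervised to track a target property such as growth inhibition. The adversarial matching of the latent prior is omitted for brevity, and all sizes and data are invented.

# Schematic fingerprint autoencoder with a property-tied latent neuron (toy data).
import torch
import torch.nn as nn

FP_BITS, LATENT = 2048, 64
encoder = nn.Sequential(nn.Linear(FP_BITS + 1, 256), nn.ReLU(), nn.Linear(256, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, FP_BITS + 1))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

fingerprints = torch.randint(0, 2, (512, FP_BITS)).float()   # fake binary fingerprints
concentration = torch.rand(512, 1)                           # fake concentration column
inhibition = torch.rand(512, 1)                              # fake growth-inhibition labels
x = torch.cat([fingerprints, concentration], dim=1)

for _ in range(200):
    z = encoder(x)
    recon = decoder(z)
    # Reconstruction loss plus a supervised term tying latent neuron 0 to the property.
    loss = nn.functional.mse_loss(recon, x) + nn.functional.mse_loss(z[:, :1], inhibition)
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())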

Level: All
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Computational Biology

Day: TBD
Time: TBD
Location: TBD

S7514 - Deep Representation and Reinforcement Learning for Anomaly Detection and Control in Multi-Modal Aerospace Applications

Soumalya Sarkar Senior Research Scientist , United Technologies Research Center
Dr. Soumalya Sarkar is a senior research scientist at United Technologies Research Center. He has six years of experience in statistical signal processing and machine learning, deep learning, probabilistic graphical models, anomaly detection and estimation, sensor fusion, image processing, and situation awareness. He serves as co-principal investigator for government (DARPA TRADES) and corporate research funded programs in the area of machine learning for complex systems. Along with dual master's degrees in mathematics and mechanical engineering, Soumalya received a Ph.D. in mechanical engineering focused on machine learning in electro-mechanical applications from Penn State University in 2015. He has received numerous awards and accolades, including the "best paper of the session" award at ACC 2012 and research and innovation grants over the last five years. He is a member of IEEE, ASME, and the PHM Society, and has co-authored 35 peer-reviewed publications.

We'll discuss how deep autoencoders (DAE) and deep reinforcement learning (DRL) can be formulated to address multimodal anomaly detection and additive manufacturing control problems in the aerospace domain. DAE-based representation learning uses a multi-layered neural-net architecture to model complex data non-linearity. We use the DAE via an NVIDIA GPU implementation for: (1) unsupervised fault disambiguation from big multimodal data, and (2) structural health monitoring (crack detection) from experimental video frames of aerospace material. In the second half of the talk, we show how a guided policy search (GPS)-based DRL framework can be implemented to optimally plan and generalize nozzle trajectory dynamics in a wide range of cold-spray additive manufacturing applications.
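As background on how a deep autoencoder can flag anomalies, here is a minimal reconstruction-error sketch in PyTorch: the network is trained only on "healthy" data, and samples it reconstructs poorly are flagged. The feature dimension, threshold rule, and synthetic data are hypothetical and stand in for the speaker's multimodal sensor model.

import torch
import torch.nn as nn

D = 64  # hypothetical number of fused sensor features per sample
ae = nn.Sequential(
    nn.Linear(D, 32), nn.ReLU(), nn.Linear(32, 8),                 # encoder
    nn.ReLU(), nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, D),      # decoder
)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

normal = torch.randn(2048, D)             # stand-in for healthy operating data
for _ in range(200):                       # train to reconstruct normal behavior only
    batch = normal[torch.randint(0, len(normal), (128,))]
    loss = nn.functional.mse_loss(ae(batch), batch)
    opt.zero_grad(); loss.backward(); opt.step()

# Score new samples: large reconstruction error => likely anomaly (e.g., a fault signature)
with torch.no_grad():
    err = ((ae(normal) - normal) ** 2).mean(dim=1)
    threshold = err.mean() + 3 * err.std()          # simple 3-sigma rule on training error
    new = torch.randn(16, D) * 3.0                  # deliberately off-distribution samples
    flags = ((ae(new) - new) ** 2).mean(dim=1) > threshold
    print(flags)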

Level: Intermediate
Type: Talk
Tags: Intelligent Machines and IoT; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7515 - Eliminating the Regular Expression with Neural Networks

Tim Delisle CEO, Datalogue
Tim Delisle is the CEO and co-founder of Datalogue, a venture-backed company that uses deep learning to automate data preparation. Before Datalogue, Tim worked at Merck on the data science and insights team, where he dealt first-hand with the problems that Datalogue solves with deep learning. He was part of the inaugural class in health tech at Cornell Tech, where he studied computer science and its intersection with the health care industry. His master's project involved the use of deep learning to automate data preparation. During his leisure time he enjoys cooking, biking and hanging out with his dog.

Regular expressions are as old as computing itself. Our deep learning-based approaches aim to retire this tool from the modern data scientist's tool bag. The regular expression is often introduced to computer scientists early in their college education, often in their first discrete structures course. In that context, it is an incredible tool for describing languages, grammars, and syntax. In practice, though, developers all over the world use regular expressions to detect data types or parse certain structures. Even for common use cases such as email or phone validation, regular expressions that capture the full breadth of cases can become untenably large. We'll show how neural networks can learn approximations of regular expressions so that modern data scientists and developers never have to write one again.
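As a toy illustration of the idea (not Datalogue's model), a small character-level network can be trained to accept or reject strings that a hand-written regex would otherwise have to validate. The architecture, the synthetic "email-like" data, and all hyperparameters below are hypothetical.

import random, string
import torch
import torch.nn as nn

CHARS = string.printable

def encode(s, max_len=40):
    ids = [CHARS.index(c) for c in s[:max_len]]
    return torch.tensor(ids + [0] * (max_len - len(ids)))   # pad with id 0 (crude, fine for a toy)

# Synthetic email-like positives and random negatives (stand-ins for real labeled data)
def sample():
    if random.random() < 0.5:
        return "".join(random.choices(string.ascii_lowercase, k=5)) + "@example.com", 1.0
    return "".join(random.choices(CHARS[:-6], k=random.randint(3, 20))), 0.0

class CharClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(len(CHARS), 16)
        self.rnn = nn.GRU(16, 32, batch_first=True)
        self.out = nn.Linear(32, 1)
    def forward(self, x):
        _, h = self.rnn(self.emb(x))
        return self.out(h[-1]).squeeze(-1)

model, bce = CharClassifier(), nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(300):
    batch = [sample() for _ in range(64)]
    x = torch.stack([encode(s) for s, _ in batch])
    y = torch.tensor([lbl for _, lbl in batch])
    loss = bce(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

print(torch.sigmoid(model(encode("alice@example.com").unsqueeze(0))))  # high score => "matches"

The appeal over a regex is that edge cases are handled by adding labeled examples rather than by growing the pattern.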

Level: All
Type: Talk
Tags: Accelerated Analytics; Deep Learning and AI; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7516 - TorontoCity Benchmark: Towards Building Large-Scale 3D Models of the World

Shenlong Wang Ph.D. Student, University of Toronto
Shenlong Wang is a fourth-year Ph.D. student at the University of Toronto, advised by Professor Raquel Urtasun. His research interests lie on the intersection of computer vision and machine learning. Before coming to Toronto, Shenlong received his B.S. and M.S. from Northwestern Polytechnical University. He has also spent time working at research labs of Microsoft, Snapchat, the Chinese Academy of Sciences, and The Hong Kong Polytechnic University.

We'll introduce the TorontoCity HD mapping benchmark, which covers the full greater Toronto area with 712.5 square-km of land, 8,439 km of roads, and around 400,000 buildings. Our benchmark provides different perspectives of the world captured from airplanes, drones, and cars driving around the city. Manually labeling such a large-scale dataset is infeasible. Instead, we propose to utilize different sources of high-precision maps to create our ground truth. Towards this goal, we develop algorithms that allow us to align all data sources with the maps while requiring minimal human supervision. We have designed a wide variety of tasks, including building height estimation (reconstruction), road centerline and curb extraction, building instance segmentation, building contour extraction (reorganization), semantic labeling, and scene-type classification (recognition). Our pilot study shows that most of these tasks are still difficult for modern convolutional neural networks.

Level: All
Type: Talk
Tags: Computer Vision and Machine Vision; Deep Learning and AI; HD Mapping

Day: TBD
Time: TBD
Location: TBD

S7517 - Mastering Computational Chemistry with Deep Learning

Olexandr Isayev Professor, University of North Carolina at Chapel Hill
Olexandr Isayev is a research assistant professor at UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill. His research interests focus on making sense of chemical data with molecular modeling and machine learning. Before joining UNC in 2013, Olexandr was a post-doctoral research fellow at Case Western Reserve University and a scientist at a government research lab. In 2008, he received his Ph.D. in computational chemistry. He received the "Emerging Technology Award" from the American Chemical Society and the GPU computing award from NVIDIA in 2014.

Deep learning is revolutionizing many areas of science and technology, especially image, text, and speech recognition. We'll demonstrate several examples of how a deep neural network trained on quantum mechanical (QM) DFT calculations can learn an accurate and fully transferable potential for organic molecules and materials. In a recent paper (1), we introduced ANAKIN-ME (Accurate NeurAl networK engINe for Molecular Energies), or ANI for short. ANI is a new method designed with the intent of developing fully transferable neural network potentials that utilize symmetry functions to build single-atom atomic environment vectors as a molecular representation. Through a series of case studies, we'll show that ANI-1 is chemically accurate compared to reference DFT calculations on much larger molecular systems than those included in the training dataset, with root mean square errors as low as 0.56 kcal/mol. As the results clearly show, the ANI method is a potential game changer for molecular simulation.
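For readers unfamiliar with symmetry functions, the sketch below evaluates a Behler-Parrinello-style radial symmetry function for one atom from its neighbors' distances, which is the kind of quantity an atomic environment vector is built from. The eta/Rs/cutoff parameters and coordinates are made up; this is a conceptual illustration, not the ANI-1 descriptor set.

import numpy as np

def cutoff(r, r_c=5.2):
    """Smooth cosine cutoff: goes to zero at r_c (parameter value is hypothetical)."""
    return np.where(r <= r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_symmetry(coords, i, eta=4.0, r_s=1.0, r_c=5.2):
    """Radial symmetry function G_i = sum_j exp(-eta (R_ij - R_s)^2) * f_c(R_ij)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    d = np.delete(d, i)                                   # exclude the atom itself
    return np.sum(np.exp(-eta * (d - r_s) ** 2) * cutoff(d, r_c))

# Toy molecule: a handful of atoms at arbitrary positions (angstroms)
coords = np.array([[0.0, 0.0, 0.0],
                   [0.96, 0.0, 0.0],
                   [-0.24, 0.93, 0.0],
                   [1.5, 1.5, 0.2]])

# An atomic environment vector is built by evaluating many (eta, r_s) pairs per atom
aev = [radial_symmetry(coords, 0, eta, r_s) for eta in (0.5, 2.0, 8.0) for r_s in (0.5, 1.0, 2.0)]
print(np.round(aev, 3))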

Level: Advanced
Type: Talk
Tags: Computational Biology; Computational Chemistry; Healthcare and Life Sciences; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7519 - Developer Tools for Automotive, Drones and Intelligent Cameras Applications

Sebastien Domine VP Software Engineering, Developer Tools , NVIDIA
Highly-Rated Speaker
Sebastien Domine has been working at NVIDIA for 17 years, where he is currently VP of Software Engineering, Developer Tools. He has spent most of his tenure at NVIDIA creating developer tools that foster the adoption of GPU technologies for graphics and compute applications. Prior to NVIDIA, he held software engineering positions at THQ/GameFX, where he worked on 3D PC games (Sinistar Unleashed), and at Katrix and Nichimen Graphics, where he worked on digital content creation tools for 3D modelers and animators. Sebastien holds a Diplome d'Ingenieur from EPITA (school of software engineering, Paris, France).

Embedded development systems are more powerful than ever. With this trend comes the ever-growing complexity of delivering real-time applications that can capitalize on all the potential computational horsepower of the system. The application developer needs to design new software IP, easily port the application to the embedded system, and then optimize and maximize CPU and GPU utilization, data acquisition, and transfers to deliver a reliable real-time visual computing experience that can fulfill even the most demanding computational requirements. In this tutorial-style talk, the audience will learn about recommended development flows for the latest embedded systems. We'll cover the developer tools available in each of the software development kits provided for the automotive, embedded, and mobile platforms. For each platform, we'll dissect and present important lessons learned from the development of showcase applications demonstrating advanced autonomous driving and intelligent video analytics use cases. The audience will learn which tools are available for each platform, the purpose of each tool, and the value it provides.

Level: Intermediate
Type: Talk
Tags: Tools and Libraries; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7520 - DeepLumen: Fast and Accurate Segmentation of Coronary Arteries for Improved Cardiovascular Care

Kersten Petersen Senior Medical Imaging Researcher, HeartFlow
Kersten Petersen has been a senior medical imaging researcher at HeartFlow since 2014. He especially enjoys research at the boundary of medical image analysis and machine learning. Kersten has published state-of-the-art segmentation algorithms for anatomical structures, including coronary arteries, mammograms, and the spine. He obtained a Ph.D. in computer science from the University of Copenhagen, Denmark in 2012, and focused on deep learning for medical imaging in the lab of Andrew Ng at Stanford in 2011. Previously, he graduated with an M.S. in computer science from the University of Freiburg, Germany, including a research stay at the University of Western Australia.

Learn about HeartFlow's unique approach for better diagnosis and treatment of cardiovascular disease. From CT images, HeartFlow creates a complete geometric and physiologic model of the patient's coronary anatomy. Blood flow is simulated using computational fluid dynamics to functionally assess narrowings of the coronary artery. HeartFlow's approach is approved by regulatory bodies and in commercial use around the world today. We'll focus on DeepLumen, the fast and highly accurate method for extracting coronary arteries from a CT scan. It is formulated as a novel 3D rotational CNN that exploits translational and cyclic symmetries. DeepLumen is shown to be at least as accurate as expert radiologists in quantifying disease compared to invasive catheterization measurements.

Level: Intermediate
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Algorithms; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7521 - Ocean Circulation on GPUs

Hailong Liu Prof., LASG, Institute of Atmospheric Physics, Chinese Academy of Sciences
Hailong Liu is a senior scientist of LASG, Institute of Atmospheric Physics, Chinese Academy of Sciences, and also a professor of University of Chinese Academy of Sciences. Hailong got his Ph.D. in meteorology from the Graduate School of Chinese Academy of Sciences in 2002. He has been developing an ocean circulation model since he got his Ph.D. Recently, his team tried to port their model to GPUs.

We'll show the development of an ocean circulation model in China and recent work to port the whole model to GPUs using OpenACC. The preliminary results of the performance of the GPU version also will be shown.

Level: All
Type: Talk
Tags: Computational Fluid Dynamics; Earth Systems Modeling; Computer Aided Engineering

Day: TBD
Time: TBD
Location: TBD

S7524 - Impacts and Paradigms Enabled by GPUs in Engineering Simulations of Discrete Elements

Nicolin Govender Senior Scientist, Research Center Pharmaceutical Engineering GmbH / CSIR: Center for High Performance Computing
Nicolin Govender is a senior research scientist at the Research Center Pharmaceutical Engineering (RCPE GmbH) in Austria and the Center for High Performance Computing (CHPC) in South Africa. Nicolin is associated with several world-leading institutes as an affiliated scientist. He is a member of the ATLAS Collaboration at CERN in Geneva, Switzerland, where he works on projects associated with computing in high-energy physics. He is also a visiting researcher at the Ecole Mines Douai and the University of Lille in France, working on collaborative projects in granular mechanics, and at the University of Utah, working on collaborative projects related to mining and minerals engineering. Nicolin has published over 25 journal papers spanning high-energy physics, DEM, and high performance computing, with an h-index of 25. He is the developer of BLAZE-DEM, currently the fastest DEM code in the world and capable of simulating polyhedral particles on the GPU.
Daniel N. Wilke Senior Lecturer (PhD) in the Department of Mechanical and Aeronautical Engineering, University of Pretoria
Daniel N. Wilke is a senior lecturer in the Department of Mechanical and Aeronautical Engineering at the University of Pretoria. Daniel is a design optimization researcher who investigates computational finite and discrete element applications within the Centre for Asset and Integrity Management (C-AIM). C-AIM focuses on life cycle management of physical assets for key industries in South Africa, including the optimization of industrial processes, which requires computationally demanding large-scale analyses. He proposed gradient-only optimization in 2006 as an alternative optimization formulation that allows multi-fidelity simulation models to be used when accurate sensitivities are available. He is also a co-developer of the discrete element computational platform BlazeDEM-GPU. Since 2015, Daniel has been a Tuks Young Research Leadership Fellow, a program aimed at driving and developing research excellence within Africa.

We'll explore the impact of the GPU on engineering simulations of discrete elements and glimpse into the future of simulation and engineering training. We consider the roles played by the open-source BlazeDEM-GPU framework we developed, as well as the commercial framework XPS, developed specifically for the pharmaceutical industry by RCPE GmbH (Research Center Pharmaceutical Engineering GmbH), which allows engineers to simulate process changes before they are actually implemented. Industrial-scale discrete element simulations remain a big challenge, but the GPU architecture is changing that perception fast, as demonstrated by the open-source framework BlazeDEM-GPU and the commercial framework XPS. However, engineering simulation is still characterized by the analyze-wait-modify-analyze cycle or, more recently, the batch-analyze-wait-modify-batch-analyze cycle. The GPU is enabling a new and alternative paradigm, interactive simulation and design (ISD), as demonstrated by BlazeDEM-GPU. We'll explore the algorithmic development of BlazeDEM-GPU in detail, with a short historical tour outlining its development as GPU architectures changed from Kepler to Pascal, enabling higher fidelity models, the natural progression from the conventional analysis cycle toward ISD, and the various roles machine learning can play.

Level: All
Type: Talk
Tags: Computational Physics; Computational Biology; Computational Chemistry; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7525 - GI Next: Global Illumination for Production Rendering on GPUs

Enzo Catalano Senior Graphics Software Engineer, NVIDIA ARC GmbH
Enzo Catalano is a senior graphics software engineer at NVIDIA, working on the core of the mental ray rendering engine, which has become a standard for photorealistic rendering across the film, visual effects, and design industries. He previously worked for many years in the field of visual effects as a developer of pipeline tools, custom shaders, and ad-hoc solutions like simulation tools. Prior to that, he was involved in the development of custom multimedia installations across Europe.
Rajko Yasui-Schoeffel Senior Graphics Software Engineer, NVIDIA ARC GmbH
Rajko Yasui-Schöffel is a senior graphics software engineer at NVIDIA, working on GPU acceleration of mental ray, a production renderer widely used for photorealistic rendering across the film, visual effects, and design industries. Before that, he worked on the NVIDIA OptiX GPU ray tracing engine. Rajko has extensive experience in real-time computer graphics: he co-founded a small company specialized in visualization of large architectural datasets and worked on the OpenGL-accelerated rendering engine of the mental images RealityServer project.

Learn how to accelerate the computation of global illumination (a very expensive part of the rendering process) with the aid of GPUs. Porting a production renderer to take advantage of GPUs is a considerable effort and often requires rewriting the whole engine; moreover, custom shaders may not be accessible in source code and often introduce performance penalties if not especially adapted to the accelerator. However, function calls to the renderer's API from within shaders may be intercepted and thus costly functions in the render core may be accelerated outside of the shader code. One such render core API function is the calculation of the global illumination contribution, and it is this part that we accelerate on the GPU.

Level: Intermediate
Type: Talk
Tags: Rendering and Ray Tracing; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7527 - Unstructured Low-Order Finite-Element Earthquake Simulation Using OpenACC on Pascal GPUs

Takuma Yamaguchi Master course student, The University of Tokyo
Takuma Yamaguchi is a master's student in the Department of Civil Engineering at the University of Tokyo. His research focus is on high performance computing targeting earthquake simulations. More specifically, his work performs fast crustal deformation computation enhanced by GPUs. Takuma has a B.E. from the University of Tokyo.

We'll show a method that decreases random memory accesses on GPUs by splitting up calculations appropriately. The target application is unstructured low-order finite element analysis, a core application for manufacturing analyses. To reduce the memory access cost, we apply the element-by-element method for matrix-vector multiplication in the analysis. This method conducts local matrix-vector computation for each element in parallel. Atomic operation and cache hardware in GPUs have improved, and we can exploit the data locality in the element-node connectivity by using atomic functions to add local results. We port codes to GPUs using OpenACC directives and attain high performance with low development costs. We'll also describe the performance on NVIDIA DGX-1, which contains eight Pascal GPUs.
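To make the element-by-element idea concrete, here is a small NumPy sketch of an EBE matrix-vector product: each element's local matrix is applied independently and the results are scatter-added to global nodes, which is the CPU analogue of the GPU atomic additions described above. The toy mesh and local matrices are made up.

import numpy as np

# Toy 1D mesh: 5 nodes, 4 two-node elements (connectivity lists global node ids per element)
n_nodes = 5
conn = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
# One 2x2 local "stiffness" matrix per element (hypothetical values)
k_local = np.tile(np.array([[2.0, -1.0], [-1.0, 2.0]]), (len(conn), 1, 1))

def ebe_matvec(conn, k_local, u):
    """y = K u computed element-by-element, without assembling the global sparse matrix."""
    u_elem = u[conn]                                   # gather nodal values per element: (n_elem, 2)
    y_elem = np.einsum('eij,ej->ei', k_local, u_elem)  # local matvec for every element in parallel
    y = np.zeros_like(u)
    np.add.at(y, conn, y_elem)                         # scatter-add (atomicAdd on the GPU)
    return y

u = np.arange(n_nodes, dtype=float)
print(ebe_matvec(conn, k_local, u))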

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing; Computational Physics; Computer Aided Engineering; Manufacturing Industries; Computational Fluid Dynamics

Day: TBD
Time: TBD
Location: TBD

S7528 - Quick and Robust: Constructing Suffix Arrays Using GPU

Pallav Kumar Baruah Associate Professor, Sri Sathya Sai Institute of Higher Learning
Pallav Kumar Baruah is an associate professor and head of the Department of Mathematics and Computer Science at Sri Sathya Sai Institute of Higher Learning in Puttaparthi, India. After completing his master's degree in mathematics in 1990, Pallav embarked on teaching and research in the area of mathematical analysis and differential equations, culminating with a Ph.D. in 1994. His research interests include differential equations, nonlinear systems, parallel processing, multicore computing, high performance computing and bio-informatics with HPC. Pallav has about 120 publications in national and international journals and conferences. Widely known for his contributions to research in high performance and parallel computing, he was conferred the NVIDIA Innovation Award in IEEE HiPC 2014, Goa, in addition to 12 Best Paper & Poster awards in various IEEE conferences such as HiPC, MDM, and PDGC.

We'll present GPU-based methods for constructing suffix arrays for genome data. Genome analysis often requires sequence alignment to find similarity among several genome sequences, which in turn requires efficient pattern matching algorithms that find substring matches in reasonable time. The best algorithms known to suit the indexing needs of genome analysis are based on suffix trees, suffix arrays, Burrows-Wheeler transforms, and the FM-index. A suffix tree is a tree containing all the suffixes of the given sequence. In bioinformatics applications, the suffix array has to be constructed for humongous data, which is a time-consuming process. By reducing the suffix array construction time, we can effectively reduce the overall computation time of these applications. Difference Cover Modulo 3 is one of the techniques that enables suffix array construction in linear time, but implementing it well requires a different architecture from conventional CPUs. The GPU architecture, with its thousands of cores, coupled with various CUDA features, enables a suffix array implementation that delivers 10x performance over the sequential one.
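For readers new to suffix arrays, the sketch below builds one with the simple prefix-doubling approach (an O(n log^2 n) reference method, not the linear-time Difference Cover Modulo 3 / skew algorithm described in the talk); it shows what the GPU version must ultimately compute.

def suffix_array(s):
    """Prefix-doubling suffix array construction (simple reference version)."""
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]
    tmp = [0] * n
    k = 1
    while True:
        # Sort suffixes by (rank of first k chars, rank of the next k chars)
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        # Re-rank: suffixes with equal keys share a rank
        tmp[sa[0]] = 0
        for j in range(1, n):
            tmp[sa[j]] = tmp[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = tmp[:]
        if rank[sa[-1]] == n - 1:   # all ranks distinct: done
            break
        k *= 2
    return sa

genome = "ACGTACGTGACG"
sa = suffix_array(genome)
print(sa)
print([genome[i:] for i in sa])    # suffixes in lexicographic order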

Level: All
Type: Talk
Tags: Computational Biology; Algorithms; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7533 - Leveraging Deep Learning And Drones In Capital Projects Monitoring: Case Study

Adam Wisniewski Director, PricewaterhouseCoopers
Adam Wiśniewski is a director and co-founder of PwC Drone Powered Solutions Global Centre of Excellence. Adam is focused on practical implementation of technologies that increase situational awareness of all stakeholders of the construction process. He's responsible for the rollout of the solution across the PwC network globally. Adam also leads the PwC CEE practice specializing in deep learning for image data analytics. Adam has over 10 years of experience in operational consulting and geospatial data analytics. He has worked for clients from various industries all over the world. Adam also has practical experience in operating drones as a certified VLOS and BVLOS pilot.

Recent advancements in drone technology and machine learning enabled PwC to enhance automatic analysis of photogrammetric and engineering documentation of a construction site. They trained a deep neural network to automatically identify features in aerial images and output a map of objects in the image such as asphalt, cars, and cement. This solution enhances construction progress monitoring, supports litigation and asset management, and provides a competitive edge with fast, scalable, and very accurate analytics. We'll discuss techniques and the main challenges of turning drone images into value.

Level: All
Type: Talk
Tags: Intelligent Video Analytics; Deep Learning and AI; Intelligent Machines and IoT

Day: TBD
Time: TBD
Location: TBD

S7534 - Leveraging Deep Learning and GPUs to Accelerate Surveillance Video to Insight

Tom Edlund Director of Software Engineering Group, Briefcam
Tom Edlund joined BriefCam in 2011 as director of the Software Engineering Group and was promoted to vice president of R&D in 2012. Under his leadership, BriefCam's R&D team has expanded in scope, developing professional, consumer, and mobile versions of the company's core Video Synopsis technology. Previously, at MindArk, Tom led the development of Entropia Universe, a 3D platform for online entertainment, communication, and e-commerce. At Modo Paper, he was part of an R&D team developing image analysis applications. Tom graduated from Chalmers University of Technology, Gothenburg, Sweden, with an M.S. in physics, mathematics and computer science. He's an avid hiker and a volunteer at Tech Career, a unique student mentoring program, where he holds weekly C#, ASP .NET, and HTML5 lessons with Ethiopian-Israeli young adults to prepare and integrate them into Israel's thriving high-tech sector.

Law enforcement and enterprise security personnel are increasingly being drowned by the sheer volume of video stream data. We'll discuss Briefcam's general purpose video analysis engine that tackles this problem by using GPUs and deep learning to break live or archived video into structured data with rich metadata. From this metadata, a wide range of applications are possible to manage the barrage of video, such as rapid video review, video search, statistics, and alerts.

Level: All
Type: Talk
Tags: Intelligent Video Analytics; Deep Learning and AI; Federal

Day: TBD
Time: TBD
Location: TBD

S7535 - Potential Field Solutions of the Solar Corona: Converting a PCG Solver from MPI to MPI+OpenACC

Ronald Caplan Computational Scientist, Predictive Science Inc.
Ronald Caplan is a computational scientist whose main interests are in developing and optimizing numerical methods for simulating physics-based models and their implementations in parallel high performance computing environments. His research currently focuses on the continued development and optimization of Predictive Science's magnetohydrodynamic codes used to study the solar corona and heliosphere, as well as providing computational solutions for additional projects.

We'll describe a real-world example of adding OpenACC to a legacy MPI FORTRAN Preconditioned Conjugate Gradient code, and show timing results for multi-node, multi-GPU runs. The code's application is obtaining 3D spherical potential field (PF) solutions of the solar corona using observational boundary conditions. PF solutions yield approximations of the coronal magnetic field structure and can be used as initial/boundary conditions for MHD simulations with applications to space weather prediction. We'll highlight key tips and strategies used when converting the MPI code to MPI+OpenACC, including linking Fortran code to the cuSparse library, using CUDA-aware MPI, maintaining performance portability, and dealing with multi-node, multi-GPU run-time environments. We'll show timing results for three increasing-sized problems for running the code with MPI-only (up to 1728 CPU cores), and with MPI+GPU (up to 60 GPUs) using NVIDIA K80 and P100 GPUs.
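As a generic illustration of the MPI side of such a solver (a sketch under assumptions, not Predictive Science's code), the example below runs plain conjugate gradient on a row-distributed diagonal SPD system with mpi4py; the only communication is the allreduce of dot products, the pattern that a CUDA-aware MPI can perform directly on GPU buffers. The preconditioner is omitted. Run with, e.g., mpirun -np 4 python cg_mpi.py.

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank owns a slice of a (trivially diagonal) SPD system A x = b
n_local = 1000
diag = np.random.rand(n_local) + 1.0       # local diagonal of A (SPD by construction)
b = np.random.rand(n_local)

def pdot(u, v):
    """Global dot product: local partial sum + allreduce (the communication hot spot)."""
    return comm.allreduce(float(u @ v), op=MPI.SUM)

x = np.zeros(n_local)
r = b.copy()                                # r = b - A x with x = 0
p = r.copy()
rs_old = pdot(r, r)
for it in range(200):
    Ap = diag * p                           # local matvec (no halo exchange needed for a diagonal A)
    alpha = rs_old / pdot(p, Ap)
    x += alpha * p
    r -= alpha * Ap
    rs_new = pdot(r, r)
    if np.sqrt(rs_new) < 1e-10:
        break
    p = r + (rs_new / rs_old) * p
    rs_old = rs_new

if rank == 0:
    print("converged in", it + 1, "iterations")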

Level: Intermediate
Type: Talk
Tags: Astronomy and Astrophysics; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7536 - New Video Search Capabilities for Public Safety through Intelligent Video Analytics and Deep Learning

Mahesh Saptharishi Chief Technology Officer, Avigilon
Mahesh Saptharishi has more than 17 years of experience developing intelligent video analytics technology as well as software and camera hardware specifically for the security industry. As CTO of Avigilon, he is responsible for driving innovation in the company's product, intellectual property portfolios, identifying strategic technology capabilities, and exploring new business opportunities. Previously, Mahesh served as Avigilon's senior vice president of analytics and data science. He was also president, CTO and co-founder of VideoIQ, Inc. He co-founded and led the core analytics team at Broad Reach Technologies, Inc., where he was vice president of research and development. A pioneer in the surveillance industry, he received his doctorate in machine learning from Carnegie Mellon University, where he was involved in research on autonomously navigating vehicles at the Robotics Institute and the Institute for Complex Engineering Systems. He has also authored multiple peer-reviewed scientific publications, articles, and patents.

For security teams working to ensure public safety, the ability to minimize incident response time and speed forensic investigations is critical. We'll discuss a new end-to-end architecture and video search engine, built on deep learning and GPUs, that is being deployed to solve this problem.

Level: All
Type: Talk
Tags: Intelligent Video Analytics; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7537 - Applying Deep learning and GPUs to Intelligent Video Analytics for Law Enforcement

Chiao-fe Shu Chief Technologist, IBM Intelligent Video Analytics
Chiao-Fe Shu is a chief technologist at IBM Intelligent Video Analytics.

We'll explore solutions IBM is developing for a range of video analysis challenges that law enforcement professionals are tackling, covering both static and moving cameras (body-worn, police car). We'll discuss applications such as advanced video search, face redaction, and facial recognition analytics.

Level: All
Type: Talk
Tags: Intelligent Video Analytics; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7539 - Petascale Molecular Dynamics Simulations from Titan to Summit

James Phillips Senior Research Programmer, University of Illinois
Highly-Rated Speaker
James Phillips is a senior research programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. He has a Ph.D. in physics from the University of Illinois. Since 1999, James has been the lead developer of the highly scalable parallel molecular dynamics program NAMD, for which he received a Gordon Bell Award in 2002. His research interests include improving the performance and accuracy of biomolecular simulations through parallelization, optimization, hardware acceleration, better algorithms, and new methods.

The highly parallel molecular dynamics code NAMD is used on the GPU-accelerated Cray XK7 Blue Waters and ORNL Titan machines to perform petascale biomolecular simulations, including a 64-million-atom model of the HIV virus capsid. In 2007, NAMD was one of the first codes to run on a GPU cluster, and it's now being prepared for the ORNL Summit supercomputer, which will feature IBM Power9 CPUs, NVIDIA GPUs, and the NVLink CPU-GPU interconnect. Learn the opportunities and pitfalls of taking GPU computing to the petascale, along with recent NAMD performance advances and early results from the Summit Power8+/P100 "Minsky" development cluster.

Level: Intermediate
Type: Talk
Tags: HPC and Supercomputing; Computational Chemistry

Day: TBD
Time: TBD
Location: TBD

S7540 - CAE Productivity and GPU Technology

Wayne Mindle Director of Sales & Marketing, CertaSIM, LLC
Highly-Rated Speaker
Wayne Mindle is director of Sales and Marketing at CertaSIM, LLC, the U.S. and Canadian distributor of the IMPETUS Afea Software Suite. Wayne has worked for several major aerospace companies, a consulting company for the FAA, and, prior to his association with CertaSIM, spent 15 years at Livermore Software Technology Corp. as the lead technical sales engineer. He earned his Ph.D. from Northwestern University in the area of applied mechanics, more specifically finite element analysis as applied to the area of nonlinear explicit transient dynamic problems.

We'll present performance results for the NVIDIA Tesla P100. Simulation is the key to greater productivity in many areas of product development and GPU technology plays a crucial role in achieving that goal. We'll use the simulation of a full 3D particle compaction process to compare run times with the NVIDIA Tesla K40. The results are generated from a commercially available nonlinear explicit transient dynamic finite element solver that takes full advantage of GPU technology for parallelization. The commercial software used to create the finite element mesh includes newly developed meshing techniques that make it easy to create the model. We'll also discuss details of the commercially available hardware used to perform the simulation, which has been certified for the P100.

Level: All
Type: Talk
Tags: Computer Aided Engineering; HPC and Supercomputing; Computational Fluid Dynamics; Manufacturing Industries

Day: TBD
Time: TBD
Location: TBD

S7541 - VR for AEC Design Reviews: Accelerating Projects and Reducing Costs

Ron Swidler Principal, The Gettys Group
Ron Swidler received his bachelor's degree from the University of Illinois and is affiliated with the Hospitality Leadership Advisory Board at Kendall College and the DePaul University, Driehaus College of Business Advisory Board. Ron also has served as an adjunct professor at Kendall College School of Hospitality; Institut Paul Bocuse in Lyon, France; Institut Paul Bocuse, Shanghai, China; and Haaga Helia University, Helsinki, Finland. Ron has been a guest lecturer at the Kellogg School of Business and NHTV University in Breda, Netherlands. He has been a keynote speaker at a number of industry conferences, including the HNN Data Conference, BITAC Global, HD Summit, and GastroPro Europe. He also has appeared as a guest on several radio shows and podcasts, including TravelingGlenn, The Savvy Traveler, and more.

Architectural and interior design are often hampered by the disconnect between a designer's vision for a proposed project and the client's ability to truly understand and share in that vision. One of the great promises of VR is to offer true-to-scale immersive exploration of a digital space; and the great promise of photorealistic rendering is to faithfully predict how lighting and materials of a project will look.

Level: All
Type: Talk
Tags: Virtual Reality and Augmented Reality; AEC Industries

Day: TBD
Time: TBD
Location: TBD

S7543 - Effectively Scaling Deep Learning Frameworks to 40 GPUs and Beyond

Andrew Gibiansky Machine Learning Engineer, Baidu SVAIL
Andrew Gibiansky is a machine learning and systems engineer at Baidu Silicon Valley AI Lab (SVAIL). Before working on deep learning at SVAIL, he spent several years working at the intersection of genetics, computer science, and robotics. Prior to that, Andrew graduated with a degree in mathematics from Harvey Mudd College.

A variety of deep learning frameworks now make it simple to train deep neural networks of many types. However, scaling deep learning frameworks to large models with data-parallel training on many GPUs remains a challenge, as the default utilities for inter-device and inter-node communication provided by these frameworks are often not optimal. Using examples from several frameworks, we demonstrate that linear strong scaling to many nodes and many devices can be achieved by augmenting deep learning frameworks with CUDA-aware MPI allreduce and allgather operations, which allow them to be used in an HPC setting where multi-GPU nodes are connected by high-speed InfiniBand interconnects. We'll show that these operations allow us to quickly train very large speech recognition models.
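To make the allreduce idea concrete (a generic data-parallel pattern, not SVAIL's implementation), the sketch below averages a gradient buffer across ranks with mpi4py; with a CUDA-aware MPI build the same call can operate on GPU buffers, but here a plain NumPy array stands in for the gradient.

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Pretend each rank computed a gradient on its own shard of the batch
grad = np.random.rand(1_000_000).astype(np.float32)

# Sum gradients across all ranks in place, then divide to get the average.
# With a CUDA-aware MPI, this buffer could live in GPU memory instead of host memory.
comm.Allreduce(MPI.IN_PLACE, grad, op=MPI.SUM)
grad /= size

# The averaged gradient is now identical on every rank and can be applied to the local
# copy of the model, keeping all data-parallel replicas in sync.
if comm.Get_rank() == 0:
    print("averaged gradient norm:", np.linalg.norm(grad))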

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7544 - Efficient Inference for WaveNet Audio Synthesis Models

Andrew Gibiansky Machine Learning Engineer, Baidu SVAIL
Andrew Gibiansky is a machine learning and systems engineer at Baidu Silicon Valley AI Lab (SVAIL). Before working on deep learning at SVAIL, he spent several years working at the intersection of genetics, computer science, and robotics. Prior to that, Andrew graduated with a degree in mathematics from Harvey Mudd College.

WaveNet is a generative neural network architecture for audio in the time domain. Due to the high sampling frequency of audio signals and the sequential dependencies between timesteps, inference in a WaveNet model is incredibly expensive, and can take many minutes to generate a single second of audio with an unoptimized implementation. We implement custom WaveNet inference kernels and demonstrate that an efficient implementation on a CPU or a GPU can provide faster than realtime audio generation, even though neither platform is perfectly suited to such a task due to the effective lack of parallelism and high compute requirements. To our knowledge, this is the first demonstration that neural audio generation can be done efficiently enough to deploy in a production text-to-speech system.
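One standard trick behind fast autoregressive inference is to cache each dilated layer's past activations so every new sample costs only O(1) work per layer instead of recomputing the whole receptive field. The sketch below shows that caching idea with plain NumPy and made-up weights; it is a conceptual stand-in, not the speakers' optimized CPU/GPU kernels.

import numpy as np

class DilatedLayer:
    """One dilated layer that caches past inputs so each new sample costs O(1) work."""
    def __init__(self, dim, dilation, rng):
        self.w_now = rng.standard_normal((dim, dim)) * 0.1
        self.w_past = rng.standard_normal((dim, dim)) * 0.1
        self.buf = np.zeros((dilation, dim))    # ring buffer of the last `dilation` inputs
        self.pos = 0
    def step(self, x):
        past = self.buf[self.pos]               # the input from `dilation` steps ago
        self.buf[self.pos] = x                  # overwrite the slot we just consumed
        self.pos = (self.pos + 1) % len(self.buf)
        return np.tanh(self.w_now @ x + self.w_past @ past) + x   # residual connection

rng = np.random.default_rng(0)
dim = 16
layers = [DilatedLayer(dim, d, rng) for d in (1, 2, 4, 8)]   # hypothetical dilation stack
w_out = rng.standard_normal(dim) * 0.1

x = np.zeros(dim)                               # seed input
samples = []
for _ in range(100):                            # generate one "audio" sample at a time
    h = x
    for layer in layers:
        h = layer.step(h)
    sample = np.tanh(w_out @ h)
    samples.append(sample)
    x = np.full(dim, sample)                    # feed the sample back in (toy embedding)

print(np.round(samples[:5], 3))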

Level: Advanced
Type: Talk
Tags: Deep Learning and AI; Signal and Audio Processing; Tools and Libraries

Day: TBD
Time: TBD
Location: TBD

S7545 - High-Speed Robotic Weeding

Lee Redden CTO, Blue River Technology
Lee Redden is Blue River Technology's CTO and co-founder. Lee has a background in robotics, computer vision, and machine learning, and has worked in research labs at NASA's Johnson Space Center, the Johns Hopkins Applied Physics Lab, and Stanford. He grew up in Nebraska, where his grandfather and uncle have large farms and he worked summers detasseling corn. Lee earned a B.S. with honors from the University of Nebraska-Lincoln and an M.S. from Stanford University, and went on leave as a Ph.D. student to start Blue River.

Blue River Technology builds "See & Spray" robots for agricultural applications. Its current product sees, detects, optimizes, and acts on 10% of the lettuce produced in the U.S. and is capable of plant-by-plant care. We'll go through the milestones in developing and deploying computer vision systems into a market where high reliability is expected, data is biased, compute platforms need to be rugged, and the system needs to run in real time.

Level: Beginner
Type: Talk
Tags: Intelligent Machines and IoT; Computer Vision and Machine Vision; Deep Learning and AI; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7547 - SVNet: A CNN-Based Object Detection for ADAS

Junhwan Kim CEO, StradVision, Inc.
As CEO of StradVision, Inc., Junhwan Kim's key responsibilities are to further accelerate the company's growth in the automotive sector and create new momentum in other sectors, such as surveillance, robots, and smartphones. Previously, he was CEO of Olaworks, Inc., where he was the technical leader in deploying face analysis software with major smartphone OEMs. He joined Intel Korea as an engineering manager after Intel's acquisition of Olaworks. Junhwan began his career as a senior researcher at Samsung Electronics, representing the company at Device Management Working Group in Open Mobile Alliance. He holds a Ph.D. in computer science from Cornell University and a CFA charter.

We'll discuss how we made a competent CNN-based object detection software for ADAS using GPU hardware. StradVision has developed SVNet, a CNN-based object detection for ADAS. SVNet is robust for bad weather/lighting conditions, small object sizes, and occlusion. We'll describe automotive customers' requests, technical challenges, and our solution using GPU hardware. We'll also discuss the impact of GPUs on achieving significant enhancements to the performance of our algorithm.

Level: All
Type: Talk
Tags: Self-Driving Cars; Deep Learning and AI; Computer Vision and Machine Vision

Day: TBD
Time: TBD
Location: TBD

S7549 - Deep Learning Acceleration of Progress toward Delivery of Fusion Energy

William Tang Principal Research Physicist, Princeton Plasma Physics Laboratory, Princeton University
William Tang of Princeton University is principal research physicist at the Princeton Plasma Physics Laboratory for which he served as chief scientist (1997-2009) and is currently lecturer with rank and title of professor in astrophysical sciences, and member of the executive board for the Princeton Institute for Computational Science and Engineering, which he helped establish and served as associate director (2003-2009). William is internationally recognized for expertise in the mathematical formalism and associated computational applications dealing with electromagnetic kinetic plasma behavior in complex geometries -- with over 200 publications with more than 150 peer-reviewed papers and an "h-index" or "impact factor" of 44 on the Web of Science, including well over 7,000 total citations. William has taught for over 30 years and has supervised numerous Ph.D. students, including recipients of the Presidential Early Career Award for Scientists and Engineers in 2000 and 2005. He is also head of the Intel Parallel Computing Center at the Princeton Institute for Computational Science & Engineering at Princeton University.

Expediting delivery of fusion power -- identified by the 2015 CNN "Moonshots for the 21st Century" series as one of six grand challenges for the modern world -- can be enabled by engaging big-data-driven machine/deep learning predictive methods. Princeton's associated project has access to over a half-petabyte of the EUROFUSION/JET disruption database, and its new FRNN (Fusion Recurrent Neural Net) code exhibits excellent scaling to nearly 200 GPUs. We'll target extending this exciting trend on NVIDIA's powerful SATURN V to its nearly 1,000 GPUs (124 nodes with eight Pascal P100 GPUs per node) in time for presentation at GTC 2017.

Level: All
Type: Talk
Tags: Deep Learning and AI; Computational Physics; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7550 - Utilizing GPUs for Avionics Maintenance and Safety

Jason Fairey Research Engineer, Analatom Incorporated
Jason Fairey is an AI/data mining engineer at Analatom, working on aircraft maintenance systems using parallel computation with GPU-enabled systems. His experience with the aerospace industry includes working for Rockwell Collins as a software engineer. Jason is also a Ph.D. candidate at Washington State University focusing on graph isomorphism and compression.

Replacing aircraft components at unexpected times can be extremely costly, both in terms of financial strain and customer satisfaction. The alternative is to repair before the problem arises, but replacing parts or components before they are worn out also wastes considerable amounts of money. To determine when a part is going to fail, technicians make judgments based on the alerts and sensor data that the part has provided. Alerts are often too late to be useful, and raw sensor data is not easy to review consistently even with detailed visualizations. Two components of Analatom's IMAS (Intelligent Management Analysis System) software solution that are improved with the use of GPUs are complex multivariate comparative analysis of sensor streams and a fast, approximate query system. The query system works by creating packed indexes for each row/column in a database of information. By storing a subset of these indexes in memory on the GPU, each thread can handle a row or column without needing access to the entire database. The system needs to perform large numbers of queries over ever-expanding results, so distributing this work across sections of the GPU allows for more efficient processing.

Level: All
Type: Talk
Tags: Federal; Accelerated Analytics; Algorithms; Manufacturing Industries; Intelligent Machines and IoT
Industry Segments: Aerospace; Defense; Manufacturing

Day: TBD
Time: TBD
Location: TBD

S7551 - Deep Unconstrained Gaze Estimation with Synthetic Data

Shalini De Mello Senior Research Scientist, NVIDIA
Shalini De Mello has been a senior research scientist at NVIDIA Research since 2013. Shalini's interests are in computer vision and machine learning technology for human-computer interaction and smart interfaces. Prior to joining NVIDIA Research, she worked as a senior computer vision engineer at NVIDIA. Her work includes NVIDIA's shipping products for head pose tracking, hand gesture recognition, face detection, video stabilization, and libraries for the development of computer vision applications on mobile platforms. She received doctoral and master's degrees in electrical and computer engineering from the University of Texas at Austin in 2008 and 2004, respectively.

Gaze tracking in unconstrained conditions, including inside cars, is challenging where traditional gaze trackers fail. We've developed a CNN-based algorithm for unconstrained, head-pose- and subject-independent gaze tracking, which requires only consumer-quality color images of the eyes to determine gaze direction, and points along the boundary of the eye, pupil, and iris. We'll describe how we successfully trained the CNN with millions of synthetic photorealistic eye images, which we rendered on the NVIDIA GPU for a wide range of head poses, gaze directions, subjects, and illumination conditions. Among appearance-based gaze estimation techniques, our algorithm has best-in-class accuracy.

Level: Intermediate
Type: Talk
Tags: Computer Vision and Machine Vision; AI for In-Vehicle Applications

Day: TBD
Time: TBD
Location: TBD

S7553 - Exploring Sparsity in Recurrent Neural Networks

Sharan Narang Researcher, Baidu
Sharan Narang is a researcher at Baidu's Silicon Valley AI Lab (SVAIL), working in the systems team. He has played an important role in improving the performance and programmability of the deep learning framework used by researchers at SVAIL. Sharan's research has focused on reducing the memory requirements of deep learning models, exploring techniques like pruning neural network weights and quantization to achieve this goal. He has also proposed the DSD training flow, which improved the accuracy of deep learning applications by ~5%. Prior to Baidu, Sharan worked on next-generation mobile processors at NVIDIA.

Recurrent neural networks are widely used to solve a variety of problems. As the quantity of data and the amount of available compute have increased, model sizes have also grown. We'll describe an approach to reduce the parameter count of RNNs using a simple pruning schedule without increasing the training time. The reduction in parameters achieves two goals. It helps reduce the size of the neural network, allowing it to be deployed on mobile and embedded devices. It also helps speed up evaluation time for inference. We'll demonstrate how this technique works for vanilla RNNs and the more complex gated recurrent units.
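To illustrate the flavor of magnitude-based pruning during training, here is a simplified stand-in for a pruning schedule: the target sparsity ramps up over training steps and the smallest-magnitude weights are masked to zero. The ramp shape, thresholding rule, and all constants are hypothetical, not the schedule presented in the talk.

import numpy as np

def target_sparsity(step, start=1000, end=10000, final=0.90):
    """Ramp sparsity from 0 to `final` between `start` and `end` training steps."""
    if step < start:
        return 0.0
    if step > end:
        return final
    return final * (step - start) / (end - start)

def prune_mask(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights given by `sparsity`."""
    if sparsity <= 0.0:
        return np.ones_like(weights, dtype=bool)
    k = int(sparsity * weights.size)
    threshold = np.partition(np.abs(weights).ravel(), k)[k]
    return np.abs(weights) >= threshold

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512))             # a stand-in recurrent weight matrix
for step in range(0, 12000, 2000):              # in real training the mask is reapplied every N steps
    mask = prune_mask(w, target_sparsity(step))
    w = w * mask                                 # pruned weights stay zero from here on
    print(step, "sparsity:", 1.0 - mask.mean())

The payoff described in the abstract comes from the final sparse matrices: they are smaller to store and cheaper to multiply at inference time.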

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7554 - Deep Learning Application Development on Multi-GPU/ Multi-Node Environment

Toshiki Sakai Data Scientist, NTT DOCOMO, INC.
Toshiki Sakai is a data scientist at NTT DOCOMO, one of the bigget mobile phone operators in Japan, where he works on data mining, computer vision, and multimedia software applications. Prior to joining NTT DOCOMO, Toshiki completed his M.A. at the University of Tokyo, where he achieved a distinction for his research in the field of vision science -- understanding visual information processing system and its relation to cognition, action, and the brain.

We'll give a brief overview of our deep learning applications, such as image recognition and taxi demand forecasting, and how we have accelerated our development using NVIDIA Docker, the NVIDIA DGX-1 AI supercomputer, and tens of GPU servers. As deep learning applications become widespread, it becomes more essential for engineers to quickly adapt deep learning to new data and to efficiently seek optimal configurations. To improve development speed on shared GPU resources, we developed a job management system that provides a separate learning environment for each engineer using NVIDIA Docker and queuing functions on the multi-GPU/multi-node system. This system helps us improve our productivity and create more sophisticated solutions to offer better services.

Level: All
Type: Talk
Tags: Deep Learning and AI; Accelerated Analytics

Day: TBD
Time: TBD
Location: TBD

S7555 - The Virtual Frontier: Computer Graphics Challenges in Virtual Reality

Morgan McGuire NVIDIA Research, NVIDIA
Morgan McGuire is the author of "The Graphics Codex" and co-author of "Computer Graphics: Principles & Practice" and "Creating Games." He cochaired the I3D'08, I3D'09, NPAR'10, and HPG'17 conferences, and was the founding editor and editor-in-chief of the Journal of Computer Graphics Techniques. Morgan contributed to many commercial products, including NVIDIA GPUs, the Unity game engine, and the game series "Titan Quest," "Marvel Ultimate Alliance," and "Skylanders." He is a professor of computer science at Williams College and has worked with NVIDIA since 2009. He holds a B.S. and M.Eng. from MIT and an M.S. and Ph.D. from Brown University.

Video game 3D graphics are approaching cinema quality thanks to the mature platforms of massively parallel GPUs and the APIs that drive them. Consumer head-mounted virtual reality is a new domain that poses exciting new opportunities and challenges in a wide-open research area. We'll present the leading edge of computer graphics research for VR across the field. It highlights emerging methods for reducing latency, increasing frame rate and field of view, and matching rendering to both display optics and the human visual system while maximizing image quality.

Level: All
Type: Talk
Tags: Virtual Reality and Augmented Reality; Rendering and Ray Tracing
Industry Segments: Media & Entertainment; Games; Higher Education / Research

Day: TBD
Time: TBD
Location: TBD

S7557 - Miovision's Deep Learning Traffic Analytics System for Real-World Deployment

Kurtis McBride CEO, Miovision Technologies Inc.
Kurtis McBride co-founded Miovision Technologies Inc. in 2005 and serves as its chief executive officer. A leader in the computer vision field, Kurtis is a driven entrepreneur responsible for Miovision Technologies' sales and technological vision. Previously, he held positions in product management, product marketing, product development, and software design at Nortel Networks and Cypress Semiconductor.

Miovision generates traffic analytics on over 16,000 hours of video every week from over 50 countries around the world using an NVIDIA GPU cloud-based system. Using surveillance-quality video, our system combines a deep convolutional neural network (CNN) with quality assurance agents, who review, verify, and correct, as needed, the CNN results. Using this hybrid approach, we're able to provide customers with accurate traffic analytics, such as traffic volume, class, and movements, and apply agent feedback to identify which real-world environmental conditions, lighting conditions, or perspectives contribute to CNN mislabeling or missed vehicles, pedestrians, or bicycles. Using the human corrections, Miovision can retrain the CNN to continuously improve its accuracy. We'll describe our traffic analytics pipeline and the use of sparse CNN representations to achieve robust state-of-the-art accuracy at faster than real-time performance.

Level: All
Type: Talk
Tags: Intelligent Video Analytics; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7558 - Porting and Optimization of Search of Neighbor-Particle by Using OpenACC

Takaaki Miyajima Researcher, Japan Aerospace Exploration Agency
Takaaki Miyajima is a researcher for the Japan Aerospace Exploration Agency. Takaaki's research interests include parallel programming, design methodology for heterogeneous platform, and algorithm implementation on GPU or FPGA. He received his Ph.D. from Keio University, Japan in 2015.

The MPS method is a type of particle method (not a stencil computation) used for computational fluid dynamics, and the search for neighbor particles is its main bottleneck. We'll show our porting efforts and three optimizations of the neighbor-particle search using OpenACC. We evaluate our implementations on Tesla K20c, GeForce GTX 1080, and Tesla P100 GPUs, achieving speedups of 45.7x, 96.8x, and 126.1x, respectively, compared with a single-thread Ivy Bridge CPU.
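As background on what a neighbor-particle search does, here is a plain NumPy cell-list sketch (not the OpenACC code discussed in the talk): particles are binned into cells whose size equals the interaction radius, so each particle only has to check the 27 surrounding cells instead of every other particle.

import numpy as np
from collections import defaultdict
from itertools import product

def neighbor_search(pos, radius):
    """Return pairs (i, j), i < j, with |pos[i] - pos[j]| < radius, via a uniform cell grid."""
    cell_of = np.floor(pos / radius).astype(int)
    cells = defaultdict(list)
    for i, c in enumerate(map(tuple, cell_of)):
        cells[c].append(i)
    pairs = []
    for i, c in enumerate(map(tuple, cell_of)):
        for off in product((-1, 0, 1), repeat=3):              # the 27 neighboring cells
            for j in cells.get(tuple(np.add(c, off)), []):
                if j > i and np.linalg.norm(pos[i] - pos[j]) < radius:
                    pairs.append((i, j))
    return pairs

rng = np.random.default_rng(0)
particles = rng.random((500, 3))        # toy particle cloud in a unit box
print(len(neighbor_search(particles, radius=0.1)), "interacting pairs")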

Level: Intermediate
Type: Talk
Tags: Computational Fluid Dynamics; Computer Aided Engineering

Day: TBD
Time: TBD
Location: TBD

S7559 - Digital Driving License

Jorrit Kuipers CEO, robotTUNER
Jorrit Kuipers is the founder of robotTUNER and founder of Green Dino. He is a member of the Dutch standardization committee on robotics, vice president and treasurer of the Dutch Academy of Technology & Innovation, and a Ph.D. candidate in man-machine systems at Delft University of Technology. He has received the National ICT Award for small business (2004), the National Insurance Innovation Award (2015), and the FIA Road Safety Innovation Award (2016).

We'll present the 'digital driving license' project, which aims to create a driving license for autonomous vehicles. We believe that standardized testing and assessment of the autonomous operating systems of robot vehicles is necessary to accelerate the use of autonomous machines in public space. We suggest using the methodology our company has successfully applied since 2003 to train and assess human drivers with 3D simulation. Data from more than 100,000 human drivers gives insight into driving skills and styles, and serves as a reference for measuring the performance of robot vehicles. The methodology is also applicable to human-robot interaction in other domains; our 3D simulations have been shown to reduce accident involvement of young human drivers by 59%. In the Netherlands, we are initiating an ISO certification project called 'digital driving license' and invite stakeholders to join. We use DRIVE PX 2 for real-time assessment of an autonomous vehicle's driving skills while driving on the road: we run a 3D simulation on the DRIVE PX 2 containing an exact copy of the real world, import and classify sensor data in the 3D simulation, and our AI reconstructs the actual traffic situation and follows the decision process of the self-driving software. Performance on driving tasks is measured and compared with performance data from human drivers or peers, and reports show the strengths and weaknesses of the assessed software. The real-time assessment with DRIVE PX 2 is also used to validate our virtual test battery: we are building a virtual library of use cases to assess the software of autonomous vehicles without going on the road. To determine the transfer to the real world, we play back the scenarios in the real world, measure on-road performance with the real-time assessment, and compare it with performance in the virtual test environment. We did the same for humans and demonstrated valid correlations.

Level: All
Type: Talk
Tags: Self-Driving Cars; AI for In-Vehicle Applications
Industry Segments: Automotive

Day: TBD
Time: TBD
Location: TBD

S7560 - Machine Learning Applications in the Radiology Department and Beyond

Synho Do Assistant Medical Director, Massachusetts General Hospital and Harvard Medical School
Dr. Synho Do is director of the Laboratory of Medical Imaging and Computation and an assistant professor of radiology at Harvard Medical School and assistant medical director for Advanced Health Technology Engineering, Research, and Development within the Massachusetts General Physicians Organization. As a NIH T32 fellow, Synho received clinical training in the Cardiac MR PET CT program. He then built his team of scientists, clinicians, and mentors as an instructor at Massachusetts General Hospital, Harvard Medical School. His research interests are healthcare data machine learning, high performance computing, nonlinear system identification, complex system modeling, and clinical workflow understanding. He has an M.S. in electrical engineering (cryptosystem analysis) and a Ph.D. in biomedical engineering (nonlinear biological system analysis).

Learn about state-of-the-art and practical medical image machine learning projects that will be tested in hospitals. Presently, high performance computing systems are the most crucial components of a machine learning system; they are a relatively inexpensive and very efficient tool in medical imaging. In addition, there are many open-source algorithms, published network topologies, and pre-trained neural network parameters, and solutions to error messages or tough questions can be found through online communities. These novel tools and techniques are a great opportunity for people in medical imaging, bioinformatics, and radiology practice to expand their horizons. We'll discuss three topics: (1) projects and applications that can be developed and easily implemented; (2) projects that remain challenging with current technologies and how to overcome them; and (3) exciting new fields that we can tackle together.

Level: All
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Medical Imaging
Industry Segments: Healthcare & Life Sciences; Higher Education / Research

Day: TBD
Time: TBD
Location: TBD

S7561 - GPU Acceleration of a Large Eddy Simulation Software for High-Pressure, Supercritical Reacting Flows

Ramanan Sankaran Computational Scientist, Oak Ridge National Laboratory
Ramanan Sankaran is a computational scientist at the Oak Ridge National Laboratory. He received his Ph.D. in mechanical engineering from the University of Michigan, Ann Arbor. Ramanan performs numerical studies of reacting and multiphase flows using high performance computing to understand the fundamental characteristics of fluid flows in engineering applications. He is an expert in computational combustion and engineering with more than 17 years of experience in the modeling and simulation of combustion. His research focuses on numerically studying various combustion phenomena such as auto-ignition, turbulent premixed, and non-premixed combustion and combustion chemical kinetics. He also develops scalable and massively parallel software tools for combustion and engineering simulations and analysis of large simulation datasets.

RAPTOR is a massively parallel flow solver for the simulation of turbulent combustion. In preparation for the upcoming Summit system at the Oak Ridge Leadership Computing Facility, a performance-portable and GPU-ready version of RAPTOR has been developed. A combination of programming models has been used to convert the distributed memory parallel code into a hybrid parallel code with multiple levels of parallelism. Major performance-critical kernels have been reimplemented in C++ using the Kokkos programming model, and the main flow solver has been accelerated using OpenMP compiler directives. We'll present the performance characteristics of RAPTOR on the IBM Minsky system for a high-pressure, supercritical reacting flow problem with applications in the aerospace and energy industries.

Level: Intermediate
Type: Talk
Tags: Computational Fluid Dynamics; HPC and Supercomputing; Computer Aided Engineering

Day: TBD
Time: TBD
Location: TBD

S7562 - Deep Learning to Enable Real-Time Gravitational Wave and Multimessenger Astrophysics

Daniel George Scientist, University of Illinois at Urbana-Champaign, National Center for Supercomputing Applications
Daniel George is a Ph.D. student in astronomy, pursuing the computational science and engineering concentration, at the University of Illinois at Urbana-Champaign. He obtained his bachelor's degree in engineering physics from IIT Bombay. He is currently a research assistant in the Gravity Group at the National Center for Supercomputing Applications and a member of the LIGO collaboration working at the interface of deep learning, high performance computing, and gravitational wave and multimessenger astrophysics. His long-term interests lie in applying cutting-edge computer science and technology, especially machine learning and artificial intelligence, to accelerate discoveries in the fundamental sciences.
Eliu Huerta Gravity Group Leader, University of Illinois at Urbana-Champaign, National Center for Supercomputing Applications
Eliu Huerta is the head of the Gravity Group at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign. Eliu obtained a master's degree in applied mathematics and theoretical physics and a Ph.D. in theoretical astrophysics at the University of Cambridge, U.K. His work is at the interface of analytical and numerical general relativity, and on the exploitation of advanced cyberinfrastructure facilities to create scenarios for multi-messenger astrophysics. He is a member of the LIGO Scientific Collaboration, the NANOGrav Consortium, and the Dark Energy Survey.

aLIGO, the Advanced Laser Interferometer Gravitational-Wave Observatory, went online last year and very rapidly produced data confirming Einstein's prediction of gravitational waves. This discovery and the success of the detector open the door for a new dimension to be added to and combined with electromagnetic instruments (optical telescopes, radio telescopes, etc.), dramatically increasing our potential to understand the workings of deep space and the astronomical phenomena at the origins of the universe. The project used data produced by CACTUS HPC simulations to build datasets for training a DNN with the MXNet framework. Prediction accuracy increased over classical waveform analysis, while the hardware required shrank from hundreds of CPUs to a single GPU, on which predictions were achieved with a latency of 1 millisecond. The work was done on the Blue Waters supercomputer and at the Innovation Lab at NCSA. The reduction in the "pipeline size" (the number of processors needed to make a detection) and the improved latency open up the potential for multi-messenger astrophysics, where an observation "heard" by the gravitational wave detector can be used to steer a detector in the visible or other electromagnetic bands toward the source.

Level: All
Type: Talk
Tags: HPC and Supercomputing; Deep Learning and AI; Astronomy and Astrophysics; Computational Physics
Industry Segments: Higher Education / Research

Day: TBD
Time: TBD
Location: TBD

S7563 - Deep Patient: Predict the Medical Future of Patients with Deep Learning

Riccardo Miotto Research / Data Scientist, Icahn School of Medicine at Mount Sinai, New York
Riccardo Miotto is a research and data scientist in the Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai in New York and a member of the Institute for Next Generation Healthcare directed by Dr. Joel Dudley. Riccardo's work encompasses the design of algorithms for information retrieval, machine learning, and data mining applied to clinical data for personalized medicine and medical search engines. Previously, Riccardo worked on clinical trial search engines through free-text eligibility criteria processing, and on machine learning applied to music information retrieval, in particular semantic discovery and recommendation, automatic tagging, and cover identification. He obtained his Ph.D. in information engineering from the University of Padova, Italy.
Joel Dudley Associate Professor, Icahn School of Medicine at Mount Sinai, New York
Joel Dudley is a recognized leader in applying biomedical big data to healthcare and drug discovery. He currently holds positions as Associate Professor of Genetics and Genomic Sciences and Director of Biomedical Informatics at the Icahn School of Medicine at Mount Sinai. He also directs the newly formed Institute for Next Generation Healthcare at Mount Sinai. Prior to Mount Sinai, he held positions as Co-founder and Director of Informatics at NuMedii, Inc., one of the first companies to apply big data to drug discovery, and Consulting Professor of Systems Medicine in the Department of Pediatrics at Stanford University School of Medicine. His work is focused on developing and applying pioneering computational methods to bring about a next generation of medicine that leverages advances in diagnostics, wearables and digital health to enable new approaches to precision medicine and scientific wellness.

Precision medicine initiatives bring tremendous opportunities to speed up scientific discovery and promote quality improvement in medicine. However, they also raise big challenges in dealing with massive data from heterogeneous sources, such as electronic health records (EHRs), -omics, and wearables. Traditional data mining and statistical learning methods tend to favor clean and structured data and may not be able to effectively utilize the rich information embedded in biomedical data. The latest breakthroughs in deep learning provide a unique opportunity to retrieve information from complex and heterogeneous sources. We'll review advances in deep learning applied to precision medicine and next-generation healthcare, with a special focus on Deep Patient, a general-purpose patient representation from EHRs that facilitates clinical predictive modeling and medical analysis.

Level: Intermediate
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7565 - Distributed Deep Learning on AWS Using MXNet

Joseph Spisak Sr. Mgr - Product Management, Amazon
Joseph Spisak has experience driving strategies and technical/business engagements around machine learning-based cloud workloads such as computer vision, natural language processing, video summarization and analysis, and speech recognition. He has led cross-team business and technical engagements with tier 1 customers, managed P&Ls at the $200 million-plus level, and built relationships from engineering teams all the way to C-level executives.
Mu Li Sr. Applied Scientist, Amazon
Mu Li manages app submissions, maintaining and improving developer applications at Amazon.

Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding, and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer-friendly deep learning frameworks. During this tutorial, members of Amazon's machine learning team will provide a short background on deep learning, focusing on relevant application domains, and an introduction to using the powerful and scalable deep learning framework MXNet. You'll gain hands-on experience targeting a variety of applications, including computer vision and recommendation engines, and learn how to use preconfigured deep learning AMIs and CloudFormation templates to speed your development.
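
To make the data-parallel setup concrete, here is a minimal sketch (an illustration under assumptions, not the tutorial's actual materials) of multi-GPU training with MXNet's Module API; the synthetic data, toy network, and two-GPU context list are placeholders.

    import mxnet as mx
    import numpy as np

    # Toy multilayer perceptron defined with MXNet's symbolic API.
    data = mx.sym.Variable('data')
    fc1  = mx.sym.FullyConnected(data=data, num_hidden=128, name='fc1')
    act1 = mx.sym.Activation(data=fc1, act_type='relu', name='relu1')
    fc2  = mx.sym.FullyConnected(data=act1, num_hidden=10, name='fc2')
    net  = mx.sym.SoftmaxOutput(data=fc2, name='softmax')

    # Synthetic data standing in for a real training set.
    X = np.random.rand(1000, 784).astype('float32')
    y = np.random.randint(0, 10, (1000,))
    train_iter = mx.io.NDArrayIter(X, y, batch_size=64, shuffle=True)

    # Bind to several GPUs; the 'device' kvstore aggregates gradients on-GPU.
    mod = mx.mod.Module(symbol=net, context=[mx.gpu(0), mx.gpu(1)])
    mod.fit(train_iter,
            optimizer='sgd',
            optimizer_params={'learning_rate': 0.1},
            kvstore='device',
            num_epoch=5)

Broadly, swapping the kvstore for a distributed store such as 'dist_sync' is how the same script scales from multiple GPUs in one machine to multiple machines.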

Level: Advanced
Type: Talk
Tags: Deep Learning and AI; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7567 - Practical VR for the Scivis Community

Kees van Kooten Scientific Visualization Software Engineer, NVIDIA
Kees van Kooten is a scientific visualization software engineer at NVIDIA, where he works with researchers in many fields to advance the state of visualization for high performance computing. Previously, Kees spent a large part of his professional life working in the video games industry, starting with PlayStation 3 development at Playlogic Game Factory in the Netherlands and later at Havok, providing real-time game physics solutions like the Havok FX physics effects package used by Rainbow Six: Siege. Intermittently, he has also worked in the medical industry doing volume and protein visualization. Kees holds an M.S. in computer graphics and visualization from the University of Eindhoven.

Until recently, immersive visualization with VR was an extremely useful but rather costly and impractical way to explore data in many scientific areas. Affordable consumer VR solutions provide an enticing answer to that problem, but a general software solution enabling this hardware for scientific visualization does not exist yet. Interaction in VR has requirements that differ greatly from a traditional 2D interface, and the rendering performance requirements are very strict. Supporting VR in your custom rendering engine or visualization tool of choice therefore requires a large engineering effort, for which there are typically no resources available. We'll present a way to solve this problem: we show by example how a scientific visualization tool such as ParaView (or any other) can be connected to a fully fledged game engine such as Unreal, combining the data processing capabilities of the former with the VR-specific rendering pipeline of the latter. This lets you send any dataset, as seen in the scientific visualization tool, directly to the VR environment, so you can set up your VR environment and quickly create and prototype interactions within that world while still using your scientific visualization tool as you always have.

Level: All
Type: Talk
Tags: In-Situ and Scientific Visualization; Virtual Reality and Augmented Reality

Day: TBD
Time: TBD
Location: TBD

S7569 - High-Performance Data Loading and Augmentation for Deep Neural Network Training

Trevor Gale Student, Northeastern University
Trevor Gale is a computer engineering student at Northeastern University. His interests include high performance computing, machine learning, and general-purpose graphics processors. He has previously worked on scalable deep neural network training on many-GPU distributed systems, developed graph algorithms for GPU clusters, and built tools to study the memory reliability of GPUs. In 2016, Trevor interned at Samsung Research America on the General-Purpose Acceleration Framework team, where he worked on the dMath distributed mathematics library and the Expresso deep learning framework.
Steven Eliuk Project Lead, Samsung
Steven Eliuk is a graduate of the University of Alberta's Computing Science department, where he completed his Ph.D. in distributed algorithms in applied sciences. He has received numerous awards from the Natural Sciences and Engineering Research Council of Canada, the Alberta Ingenuity Fund, the University of Alberta, and more. His previous experience includes IBM, where he was director at the Servier Virtual Cardiac Center, which focused on reducing radiation and enhancing CT scans in pediatrics. He currently leads a distributed algorithms group at Samsung Electronics, which has focused on primitive acceleration for machine learning since 2013. The team has produced the most performant version of distributed Caffe while attending to numerical stability and providing strict mathematical guarantees where possible, delivering better accuracy and no loss of precision when scaling from one to 128 GPUs.

Next-generation GPUs have revealed that data loading and augmentation can be a major bottleneck to accelerating deep neural network training on many-GPU distributed systems. This work presents the design and implementation of a high-performance data loading and augmentation system for the Expresso deep learning framework developed by Samsung. Our system leverages multiple levels of parallelism and automatic runtime performance tuning to achieve speedups of 15.5% on average across our experiments.
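
The core idea, overlapping CPU-side loading and augmentation with GPU training, can be illustrated with a generic sketch (an assumed pattern, not Expresso's implementation); the loader, augmentation, and training step below are placeholders.

    import queue
    import threading
    import numpy as np

    def augment(batch):
        # Illustrative augmentation: random horizontal flip of NHWC image batches.
        if np.random.rand() < 0.5:
            batch = batch[:, :, ::-1, :]
        return batch

    def prefetcher(load_batch, q, num_batches):
        # Load and augment on the CPU while the GPU consumes earlier batches.
        for _ in range(num_batches):
            q.put(augment(load_batch()))
        q.put(None)  # Sentinel: no more data.

    def train(load_batch, train_step, num_batches, depth=4):
        q = queue.Queue(maxsize=depth)  # Bounded queue caps host memory use.
        t = threading.Thread(target=prefetcher, args=(load_batch, q, num_batches))
        t.start()
        while True:
            batch = q.get()
            if batch is None:
                break
            train_step(batch)  # e.g., copy to the GPU and run forward/backward.
        t.join()

    # Example usage with synthetic data and a no-op training step.
    load_batch = lambda: np.random.rand(32, 224, 224, 3).astype('float32')
    train(load_batch, train_step=lambda b: None, num_batches=10)

Real systems add further levels of parallelism (multiple worker processes, pinned host memory, asynchronous copies), which is where runtime performance tuning comes in.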

Level: Advanced
Type: Talk
Tags: Deep Learning and AI; HPC and Supercomputing

Day: TBD
Time: TBD
Location: TBD

S7570 - vGPU-Enabled Linux VDI for the Masses

Mike Bantz Virtualization Engineer, DigitalGlobe
Mike Bantz is a senior virtualization engineer at DigitalGlobe.

DigitalGlobe is several years into a company-wide effort to further enable its workforce and its customers to access the world's largest archive of satellite imagery quickly and easily from almost anywhere in the world. We're now several hundred active users into an initiative that has positioned DigitalGlobe's Linux virtual desktop environment as a core requirement for onboarding development teams, rather than just a complement to their Windows VDI sessions. We've gone all-in with Dell and NVIDIA to provide all Linux VDI users with a vGPU-backed desktop to better serve the needs of a rapidly widening array of skillsets and workloads across the enterprise. We'll describe our challenges, our frustrations, our somewhat unexpected popularity and growth, and what has become our continued success.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Data Center and Cloud Computing
Industry Segments: IT Services; Software; Other

Day: TBD
Time: TBD
Location: TBD

S7571 - High-Performance Deep Learning on Embedded Devices with MXNet

Aran Khanna Software Engineer, Amazon Web Services
Aran Khanna is a software engineer working on machine learning infrastructure and applications for embedded devices in the AWS Deep Learning group. Aran is a recent graduate from Harvard College, where he studied computer science and mathematics and conducted research on digital privacy with Harvard's Institute for Quantitative Social Science.

Learn how to compile and run an optimized version of the MXNet deep learning framework for various embedded (IoT) devices, and see the wide range of exciting applications that running deep-network inference in near real time on "edge" devices opens up. Specifically, we'll show performance numbers for a variety of MXNet-based deep learning models running on Raspberry Pis as well as TK1 processors, demonstrating the massive efficiency gains MXNet yields on embedded devices over comparable frameworks. We'll then demo the power of real-time image processing with deep learning models through an example application walkthrough. Finally, we'll demonstrate how to use AWS IoT services to massively augment the flexibility and reliability of the models running in our example application.
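
For orientation, this is a minimal sketch of single-image inference with a pretrained MXNet checkpoint, the kind of loop an edge device would run; the 'resnet-18' checkpoint files, the 224x224 input shape, and the random input are assumptions for illustration.

    import mxnet as mx
    import numpy as np

    # Load a symbol and weights previously saved with mx.model.save_checkpoint.
    sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-18', 0)

    # On an embedded board, inference typically runs in the CPU context.
    mod = mx.mod.Module(symbol=sym, context=mx.cpu(), label_names=None)
    mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
    mod.set_params(arg_params, aux_params, allow_missing=True)

    # Forward one (here random) image and report the top class.
    img = np.random.rand(1, 3, 224, 224).astype('float32')
    mod.forward(mx.io.DataBatch(data=[mx.nd.array(img)]))
    probs = mod.get_outputs()[0].asnumpy()
    print('predicted class:', int(probs.argmax()))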

Level: Intermediate
Type: Talk
Tags: Intelligent Machines and IoT; Performance Optimization; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7572 - Extending Mahout-Samsara Linear Algebra DSL to Support GPU Clusters

Suneel Marthi Senior Principal Engineer, Red Hat Inc.
Suneel Marthi is a senior principal engineer in the Office of Technology at Red Hat Inc. He is a member of the Apache Software Foundation and a committer and PMC member on Apache Mahout, Apache Pirk, and Apache OpenNLP, as well as a regular contributor to several big data projects like Apache Flink, Apache Streams, and Apache PredictionIO. Suneel is a regular speaker at conferences and has spoken at Flink Forward 2015 and 2016, Apache Big Data Europe and North America, and Hadoop Summit Europe 2014.
Trevor Grant Open Source Analytics Technical Evangelist Committer, Apache Mahout Project, IBM
Trevor Grant is a data scientist, a committer on Apache Mahout, a contributor to Apache Streams (incubating), Apache Zeppelin, and Apache Flink, and an open source technical evangelist at IBM. Trevor holds an M.S. in applied math and an MBA from Illinois State University. He is an organizer of the newly formed [Military] Vets in Big Data and has presented at Flink Forward, ApacheCon, Apache Big Data, and other meetups nationwide.

Data scientists love tools like R and Scikit-Learn, as they offer a convenient and familiar syntax for analysis tasks. However, these systems are limited to operating serially on datasets that can fit on a single node and don't allow for distributed execution. Mahout-Samsara is a linear algebra environment that offers both an easy-to-use Scala DSL and efficient distributed execution for linear algebra operations. Data scientists transitioning from R to Mahout can use the Samsara DSL for large-scale data sets with familiar R-like semantics. Machine learning and deep learning algorithms built with the Mahout-Samsara DSL are automatically parallelized and optimized to execute on distributed processing engines like Apache Spark and Apache Flink accelerated natively by CUDA, OpenCL, and OpenMP. We'll look at Mahout's distributed linear algebra capabilities and demonstrate an EigenFaces classification using Distributed SSVD executing on a GPU cluster. Machine learning practitioners will come away from this talk with a better understanding of how Samsara's linear algebra environment can help simplify developing highly scalable, CPU/GPU-accelerated machine learning and deep learning algorithms by focusing solely on the declarative specification of the algorithm without having to worry about the implementation details of a scalable distributed engine or having to learn to program with native math libraries.

Level: Intermediate
Type: Talk
Tags: Accelerated Analytics; Deep Learning and AI; Algorithms

Day: TBD
Time: TBD
Location: TBD

S7574 - Streaming 10K Video Using GPUs and the Open Projection Format

Sean Safreed Cofounder/Product Owner, Pixvana
Highly-Rated Speaker
Sean Safreed is a veteran of the computer graphics industry and has been a product manager, strategist, developer, and marketing manager for leading software packages used by hundreds of thousands of professional filmmakers and motion graphics designers. Among his software credits are Commotion, Knoll Light Factory, QuickTime VR, and the industry-leading desktop color correction suite Magic Bullet. Prior to Pixvana, Sean co-founded Red Giant, which has grown to offer more than 50 products with a team that spans the United States and Canada. In addition to being very successful in the film and video industry for its product offering and user experience, Red Giant has financed and produced small independent films that have gone on to win awards, serve as tutorials, and inspire the storytelling community. Before founding Red Giant in 2002, Sean spent the '90s on the Apple QuickTime team and at Silicon Graphics, where he was part of the OpenGL product team.

Pixvana has developed a cloud-based system for processing VR video that can stream up to 12K video at HD bit rates. The process is called field-of-view adaptive streaming (FOVAS). FOVAS converts equirectangular spherical-format VR video into tiles on AWS in a scalable GPU cluster. Pixvana's scalable cluster in the cloud delivers over an 80x improvement in tiling and encoding times. The output is compatible with standard streaming architectures, and the projection is documented in the Open Projection Format. We'll cover the cloud architecture, GPU processing, the Open Projection Format, and current customers using the system at scale.

Level: Intermediate
Type: Talk
Tags: Video and Image Processing; Virtual Reality and Augmented Reality; Media and Entertainment
Industry Segments: Cloud Services; Media & Entertainment

Day: TBD
Time: TBD
Location: TBD

S7575 - From Cracks to Hard Hats: Focusing on Industrial Computer Vision

Sean True Director of Machine Learning, Smartvid.io, Inc.
Sean True is director of Machine Learning at Smartvid.io. Prior to Smartvid.io, Sean worked at Semantic Machines, Ab Initio, and Interactions Corporation. He co-founded Inboxer and Audiotrieve, and was director of Development/Multimedia Technology at Dragon Systems. He's excited about machine learning for text, images, and speech, and has worked on both biological and machine vision as a Ph.D. candidate at MIT. His interests include fishing, welding, BBQ, and embedded systems, preferably in creative combinations.

In a case study-driven presentation, we'll show specific examples of how GPU-enabled deep neural networks are powering new methods for analyzing the content of photos and videos from industrial contexts. First, we'll present a collaboration between Smartvid.io and Engineering News-Record, the leading publication in the architecture, engineering, and construction vertical. This ongoing initiative leverages computer vision techniques and semantic approaches to help identify and indicate safe and unsafe situations in jobsite photos. Second, we'll present a collaboration with Arup, a London-based engineering firm, on the use of specific classifiers to localize and measure cracks and related defects in infrastructure.

Level: All
Type: Talk
Tags: AEC Industries; Deep Learning and AI; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7577 - Data Science Bowl Lung Challenge

Bram van Ginneken Professor of Functional Image Analysis , Radboud University Medical Center
Bram van Ginneken is professor of functional image analysis at Radboud University Medical Center. He is chair of the Diagnostic Image Analysis Group within the Department of Radiology and Nuclear Medicine. Bram also works for Fraunhofer MEVIS in Bremen, Germany, and is one of the founders of Thirona, a company that provides quantitative analysis of chest CT scans. Bram studied physics at the Eindhoven University of Technology and at Utrecht University. In 2001, he obtained his Ph.D. at the Image Sciences Institute on computer-aided diagnosis in chest radiography. From 2001 through 2009, he led the Computer-Aided Diagnosis group at ISI, where he still has an associated faculty position. He has authored or co-authored over 150 publications in international journals. He is also associate editor of IEEE Transactions on Medical Imaging, a member of the editorial board of Medical Image Analysis, and is involved in organizing challenges in medical image analysis.

Deep learning is currently overhauling the field of medical image analysis and computer-aided diagnosis. Recent results in various areas show that deep networks that analyze the contents of medical images, trained with large amounts of data, obtain results close to or better than human experts for diagnostic tasks in radiology, pathology, ophthalmology, and dermatology. One particular area is the analysis of chest computed tomography (CT) scans. This is of particular interest because screening with low-dose CT for lung cancer is currently being implemented on a large scale in the United States and other countries, after large studies have shown that this is the most promising strategy to reduce the number of deaths due to lung cancer, by far the largest cancer killer. Screening for lung cancer will produce many millions of CT scans that under current guidelines would have to be analyzed by radiologists. Automation could streamline and improve that process, and reduce the high costs associated with screening. We'll show the background of CT image analysis, explain how clinical experts read CT scans following the current guidelines, and show results from deep learning, in particular from the Data Science Bowl lung challenge.

Level: Beginner
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Video and Image Processing; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7578 - Accelerating your VR Applications with VRWorks

Cem Cebenoyan Director of Engineering, NVIDIA
Cem Cebenoyan heads up the Game Engines and Core Tech team at NVIDIA, focusing on working with game engines and building solutions to next-gen rendering challenges for games. He's been at NVIDIA for (practically) his whole life, leading teams in graphics, games, computer vision, and automotive.
Edward Liu Developer Technology Engineer, NVIDIA
Edward Liu is a developer technology engineer at NVIDIA and a huge graphics enthusiast. He earned a graduate degree from the Georgia Institute of Technology, where he also worked as a research assistant on fluid simulation, optimization, and rendering. At NVIDIA, he researches and develops cutting-edge rendering techniques that could improve the visual quality and performance of future graphics applications.

Across graphics, audio, video, and physics, the NVIDIA VRWorks suite of technologies helps developers maximize performance and immersion for VR applications. We'll explore the latest features of VRWorks, explain the VR-specific challenges they address, and provide application-level tips and tricks to take full advantage of these features. Special focus will be given to the details and inner workings of our latest VRWorks feature, Lens Matched Shading, along with the latest VRWorks integrations into Unreal Engine and Unity.

Level: Intermediate
Type: Talk
Tags: Virtual Reality and Augmented Reality; Game Development

Day: TBD
Time: TBD
Location: TBD

S7579 - Towards Practical Problems in Deep Learning for Radiology Image Analysis

Quanzheng Li Associate Professor, Massachusetts General Hospital
Dr. Quanzheng Li is an associate professor of radiology at Massachusetts General Hospital, Harvard Medical School, where he is the director of the image reconstruction and artificial intelligence program in the Gordon Center and a principal investigator at the Center for Clinical Data Science. Quanzheng is the recipient of the 2015 IEEE Nuclear and Plasma Sciences Society Early Achievement Award, an associate editor of IEEE Transactions on Image Processing, and an editorial board member of Theranostics. His research interests include image reconstruction and analysis methods in PET, SPECT, CT, and MRI, as well as data science in health and medicine.

There has been growing interest in applying deep learning to radiology image analysis tasks such as tissue characterization, which is a key component of computer-aided diagnosis systems used for automatic lesion detection and further clinical planning. In practice, however, developing a robust and reliable deep learning model for computer-aided diagnosis is still highly challenging due to the combination of the high heterogeneity in medical images and the relative lack of training samples. Specifically, annotation and labeling of medical images is much more expensive and time-consuming than in other applications and often involves manual labor from multiple domain experts. We'll propose a multi-stage, self-paced learning framework using a convolutional neural network (CNN) to classify computed tomography image patches. The key contribution is that we augment the size of the training set by refining the unlabeled instances with a self-paced learning CNN. By implementing the framework on the high performance computing server of the NVIDIA DGX-1, we obtained experimental results showing that the self-paced boosted network consistently outperformed the original network even with very scarce manual labels. Such a performance gain is obtained by increasing the computational load, which is becoming feasible thanks to the computational power provided by the DGX-1, in exchange for human labor. Applications with limited training samples, such as medical image analysis, can benefit from the proposed framework.
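
As a rough illustration of the self-paced idea (a schematic sketch under assumptions, not the authors' exact pipeline), confidently pseudo-labeled patches are folded into the training pool between stages, with the selection threshold relaxed over time; train_cnn and predict_proba are placeholder callables.

    import numpy as np

    def self_paced_training(train_cnn, predict_proba,
                            labeled, unlabeled, stages=3, start_thresh=0.95):
        """labeled: list of (patch, label) pairs; unlabeled: numpy array of patches."""
        threshold = start_thresh
        model = train_cnn(labeled)                    # Stage 0: scarce manual labels.
        for stage in range(stages):
            probs = np.asarray(predict_proba(model, unlabeled))  # (N, num_classes).
            conf = probs.max(axis=1)
            labels = probs.argmax(axis=1)
            keep = conf >= threshold                  # "Easy" examples first.
            pseudo = [(x, int(y)) for x, y in zip(unlabeled[keep], labels[keep])]
            labeled = labeled + pseudo                # Grow the training set.
            unlabeled = unlabeled[~keep]
            model = train_cnn(labeled)                # Retrain on the larger pool.
            threshold *= 0.98                         # Relax the pace slightly.
        return model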

Level: All
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7580 - ZipML: Faster Machine Learning via Low-Precision Communication and Computation

Ce Zhang Assistant Professor, ETH Zurich
Ce Zhang is an assistant professor in computer science at ETH Zürich. He believes that by making data—along with the processing of data—easily accessible to non-CS users, we have the potential to make the world a better place. His current research focuses on building data systems to support machine learning and help facilitate other sciences. Before joining ETH, Ce was advised by Christopher Ré. He finished his Ph.D. round-tripping between the University of Wisconsin-Madison and Stanford University, and spent another year as a postdoctoral researcher at Stanford. His Ph.D. work produced DeepDive, a trained data system for automatic knowledge-base construction. He participated in the research efforts that won the SIGMOD Best Paper Award (2014) and SIGMOD Research Highlight Award (2015), and was featured in special issues including "Best of VLDB" (2015) and Nature (2015).
Dan Alistarh Assistant Professor, IST Austria and ETH Zurich
Dan Alistarh is an Assistant Professor at IST Austria, currently visiting ETH Zurich on an SNF Ambizione Fellowship. Previously, he was a Researcher at Microsoft Research, Cambridge, UK, and a Postdoctoral Associate at MIT CSAIL. He received his PhD from the EPFL, under the guidance of Prof. Rachid Guerraoui. His research focuses on distributed algorithms and concurrent data structures, and spans from algorithms and lower bounds, to practical implementations.

We'll present new techniques for training machine learning models using low-precision computation and communication. We'll start by briefly outlining new theoretical results proving that, surprisingly, many fundamental machine learning tools, such as dense generalized linear models, can be trained end-to-end (samples, model, and gradients) using low precision (as little as one bit per value), while still guaranteeing convergence. We'll then explore the implications of these techniques with respect to two key practical applications: multi-GPU training of deep neural networks, and compressed sensing for medical and astronomical data.
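
The key primitive behind such schemes is unbiased stochastic rounding onto a coarse grid, sketched below in numpy (an illustration of the idea, not the ZipML code; the grid construction and bit width are assumptions).

    import numpy as np

    def stochastic_quantize(x, bits=2):
        """Round x stochastically onto a uniform grid so that E[quantized] == x."""
        levels = 2 ** bits - 1                  # Grid intervals across [-max|x|, +max|x|].
        scale = np.abs(x).max() + 1e-12
        y = (x / scale + 1.0) / 2.0 * levels    # Map values into [0, levels].
        low = np.floor(y)
        q = low + (np.random.rand(*x.shape) < (y - low))   # Stochastic rounding.
        return (q / levels * 2.0 - 1.0) * scale            # Map back to the original range.

    g = np.random.randn(5).astype('float32')
    print(g, stochastic_quantize(g, bits=2))

Because the rounding is unbiased, stochastic gradient descent can still converge in expectation even when samples, the model, and gradients are stored and communicated at low precision.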

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Performance Optimization

Day: TBD
Time: TBD
Location: TBD

S7582 - Synergistic Decisions Through Multistream Deep Learning

William Rorrer Program Manager, Harris
Will Rorrer has worked with the Harris Corporation for over 15 years, providing management and guidance to key business units. These include the Night Vision operations team, the Jagwire program for streaming, cataloging, and analyzing full-motion video, and research and development on deep learning tools and applications. Throughout his career, Will has been honored to support the National Geospatial-Intelligence Agency and other parts of the Department of Defense in using high-tech capabilities to solve global security problems.

We'll walk through several use cases Harris has developed to illustrate the benefit of harnessing multiple input sources for deep learning using NVIDIA GPUs and discuss the implications of this research in the wider remote sensing community. Deep learning is rapidly being applied to virtually every industry that has a "big data" problem. In the realm of remote sensing, deep learning has been applied to automatic feature extraction for a variety of individual data types such as electro-optical panchromatic and multi-spectral imagery, point clouds, and video. One of the less explored benefits of deep learning application to remote sensing data is the ability to incorporate multiple streams of data into the same neural network. When leveraging multiple modalities in the same model, a synergistic decision can be made from the data that reveals more information than either of the individual data types can provide alone.

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Accelerated Analytics; Federal

Day: TBD
Time: TBD
Location: TBD

S7585 - Building the World's First AI for Retail Banking, or, How to Do Deep Learning in a 185-Year-Old Bank

Stephen Piron Founder, DeepLearni.ng
Stephen Piron is a technology entrepreneur and a very mediocre computer programmer. He's started tech businesses on both sides of the Atlantic that have been featured in Wired, Fortune, and the Economist. Between startups, he worked trading FX for the world's largest hedge fund. He tried to be a VC once; he was fired in less than two months.

We'll discuss putting into production the world's first AI for retail banking. This work was done in 2015 by the software startup DeepLearni.ng for a large international bank. The algorithm we built ingests vast amounts of data and, over time, learns how to select the most effective treatment option for a customer based on past behavior, for example, through phone calls, email alerts, or SMS notifications. Our system is the foundation for the bank's multiyear plan to embed AI into virtually all areas of its retail business. We'll discuss (1) training and iterating through different neural network models on a GPU cloud cluster, hardware that at the time was foreign to the bank; (2) strategies for working within the constraints of a large organization with inconsistent and disparate datasets and embedded legacy systems; and (3) navigating the bank's privacy concerns as an external software company.

Level: Intermediate
Type: Talk
Tags: Finance; Deep Learning and AI; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7588 - Deep Watershed Transform for Instance Segmentation

Min Bai PhD Student, University of Toronto
Min Bai is a Ph.D. student at the University of Toronto in the machine learning and computer vision group, supervised by Professor Raquel Urtasun. Min received a bachelor's degree in electrical engineering from the University of Waterloo in 2013, during which she completed numerous internships, including two at NVIDIA in the Advanced Technologies Group. Min then spent two years in the wireless systems team at Apple, before joining the exciting field of machine learning.

Learn about the design, training, and analysis of a state-of-the-art, deep learning-based, instance-level segmentation pipeline enabled by NVIDIA DGX-1. Instance segmentation is the task of assigning semantic class labels to each pixel of an image (for example, car, person, etc.), as well as a coherent instance identifier such that every pixel belonging to the same object instance shares the same identifier. This has a wide array of applications, including object recognition and tracking, pose estimation, and scene understanding. In the context of autonomous driving, this will allow vehicles to accurately delineate multiple vehicles and pedestrians within an image. We'll present a simple yet powerful end-to-end convolutional neural network to tackle this task with state-of-the-art performance on the challenging Cityscapes Instance-Level Segmentation task. Our model consists of two independently trained individual deep neural networks with innovative training targets, followed by joint fine-tuning. The 30 million parameter network is trained on the new NVIDIA DGX-1 deep learning accelerator in approximately 30 hours. This is a 50% speedup compared to the NVIDIA Maxwell TITAN X, and is immeasurably faster than any CPU implementation.

Level: Intermediate
Type: Talk
Tags: Computer Vision and Machine Vision; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7590 - Passengers: Awakening VR, When Film Meets VR

Damien Fagnou CTO, MPC
Highly-Rated Speaker
Damien Fagnou is the global head of VFX Operations at MPC, where he brings together his expertise in software and production to evolve and refine the creation processes across all feature film VFX work. After finishing university with an M.S. in computer science in France, he worked for an animated series implementing the technology to speed up the motion capture pipeline and rendering. He later accepted a job to help set up the workflow at Attitude Studios and then took on the role of Tools and Workflow Programmer at Climax in the U.K. In 2003, he transferred his skills to the film industry and started at leading VFX post-production studio MPC to work on Troy, implementing preview tools and city rendering scripts. In 2005, Damien became R&D lead on Charlie and the Chocolate Factory, 10,000 BC, and Narnia. He then moved closer to production and became MPC's stereographer working on movies, including Pirates of the Caribbean: On Stranger Tides, the Harry Potter films, and Prometheus. After a few years in production, he returned to his software roots and became global head of Software overseeing software development efforts across the company.
Francesco Giordana Researcher, MPC
Francesco Giordana started his career in real-time rendering and video games, first in a research lab and then at Guerrilla Games. He then spent four years at Double Negative VFX, where he wrote a GPU-accelerated fur system and led an R&D team dedicated to the development of digital creatures for film. After that he joined ILM for two years, focusing on real-time digital acting, in particular facial performance capture and real-time rendering of characters. Finally, he joined MPC to lead the development of real-time technologies for film, with a special focus on virtual production and VR.

We'll present our journey to create a real-time VR experience leveraging Film VFX workflows and assets. We'll illustrate this by talking about our work to create the Passengers: Awakening VR Experience and also some work we are doing in the Virtual Production space. We'll detail some of the challenges the developers needed to overcome — from asset build technique complexity to major differences in offline rendering and 90fps real-time VR workflows. Finally, we'll conclude on future work and discussion about where these VR workflows can directly apply to Film VFX creation and virtual production.

Level: Beginner
Type: Talk
Tags: Media and Entertainment; Virtual Reality and Augmented Reality; Real-Time Graphics
Industry Segments: Media & Entertainment

Day: TBD
Time: TBD
Location: TBD

S7592 - AI and Deep Learning in Trading

Gaurav Chakravorty Head of Trading Strategy Development, qplum
Gaurav Chakravorty is a co-founder and head of strategy development at qplum, an online asset management firm with completely data science-driven strategies. qplum is also the first successful money manager to provide all its strategies via APIs. Gaurav has been one of the early pioneers in machine learning based high-frequency trading. He built the most profitable algorithmic trading group at Tower Research from 2005-2010 and was the youngest partner in the firm.

We'll talk about how artificial intelligence has led to market-leading innovation in trading and the huge opportunity for using deep learning in trading today. There are three dominant trades: fast information extraction ("speed trade"), trade construction ("stat arb"), and prediction ("market timing"). AI has been very successful in all three. We have been key innovators in the speed trade, having started with a $10,000 risk limit and, over the last 10 years, making more than $1.4 billion in profits. The reason is a purist adherence to AI. There is a huge opportunity for using deep learning in the prediction part of the trade, which is not latency sensitive and is mostly about high accuracy. Our mission is to make investing a science, a research-driven utility, rather than the competition or game that it is today. Deep learning has had a lot of success in bringing method to social science settings. We believe that over the next five to 10 years every trading operation will become deep learning based, and at this time there is a lot of opportunity for innovation using deep learning in trading.

Level: All
Type: Talk
Tags: Finance; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7595 - Building Truly Large-Scale Medical Image Databases: Deep Label Discovery and Open-Ended Recognition

Le Lu Staff Scientist, National Institutes of Health
Le Lu has served as a staff scientist since 2013 in the Department of Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Bethesda, Maryland. His research is focused on medical image understanding and semantic parsing that fits into new clinical practices, especially in the areas of preventive early cancer detection/diagnosis and the development of precise novel imaging biomarkers, via large-scale imaging protocols and statistical (deep) learning principles. Le worked on various core R&D problems in colonic polyp and lung nodule CADx systems, and on vessel and bone imaging, at Siemens Corporate Research and Siemens Healthcare from 2006 to 2013, where his last post was senior staff scientist. He has been named on 18 U.S. and international patents and is the inventor or co-inventor of 32 inventions. Le has authored over 90 peer-reviewed papers. He received his Ph.D. in computer science from Johns Hopkins University in 2007. He won the Mentor of the Year award in the staff scientist/clinician category.

The recent rapid and tremendous success of deep neural networks on many challenging computer vision tasks derives from the accessibility of the well-annotated ImageNet and PASCAL VOC datasets. Nevertheless, unsupervised image categorization (that is, categorization without ground-truth labeling) is much less investigated, yet it is critically important and difficult when annotations are extremely hard to obtain via the conventional "Google Search" plus crowdsourcing approach (exactly how ImageNet was constructed). We'll present recent work on building two truly large-scale radiology image databases at NIH to boost development in this important domain. The first is a chest X-ray database of 110,000+ images from 30,000+ patients, where the image labels were obtained by sophisticated natural language processing-based text mining and the image recognition benchmarks were conducted using weakly supervised deep learning. The other database contains about 216,000 CT/MRI images with key medical findings from 618,000+ patients, where a new looped deep pseudo-task optimization framework is proposed for joint mining of deep CNN features and image labels. Both medical image databases will be released to the public.

Level: Intermediate
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Computer Vision and Machine Vision; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7596 - DSD: Dense-Sparse-Dense Training for Deep Neural Networks

Song Han Ph.D. candidate, Stanford University
Song Han is a fifth-year Ph.D. student with Prof. Bill Dally at Stanford University. His research focuses on energy-efficient deep learning computing at the intersection between machine learning and computer architecture. He proposed deep compression that can compress state-of-the-art CNNs by 10-49x while fully preserving prediction accuracy. Song designed EIE: Efficient Inference Engine, a hardware accelerator that can make inference directly on the compressed sparse model, which gives significant speedup and energy saving. His work has been covered by TheNextPlatform, TechEmergence, Embedded Vision, and O'Reilly. His work received the Best Paper Award in ICLR'16, Best Poster Award in Stanford Cloud Workshop'16, and Best Paper Honorable Mention in NIPS'16 EMDNN workshop. Before joining Stanford, Song graduated from Tsinghua University.

Learn a new technique to prevent deep learning optimizers from getting stuck in local minima and to produce better optimization results. We'll introduce DSD, a dense-sparse-dense training method that regularizes neural networks by pruning and then restoring connections. Our method learns which connections are important during the initial dense training. It then regularizes the network by pruning the unimportant connections and retraining to a sparser and more robust solution with the same or better accuracy. Finally, the pruned connections are restored and the entire network is retrained again. This increases the dimensionality of the parameters, and thus the model capacity, relative to the sparser model. DSD training achieves superior optimization performance. We'll highlight our experiments using GoogLeNet, VGGNet, and ResNet on ImageNet; NeuralTalk on Flickr-8K; and DeepSpeech-1&2 on the WSJ dataset. These show that the accuracy of CNNs, RNNs, and LSTMs can significantly benefit from DSD training. At training time, DSD incurs only one extra hyperparameter: the sparsity ratio in the S step. At testing time, DSD doesn't change the network architecture or incur any inference overhead. The consistent and significant performance gain of DSD in our experiments highlights the inadequacy of current deep learning training methods, while showing that DSD effectively finds better solutions.
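
The schedule itself can be summarized in a short, framework-agnostic sketch (the training callables below are placeholders; in practice the sparse phase keeps the pruning mask applied after every gradient update inside a deep learning framework).

    import numpy as np

    def magnitude_mask(weights, sparsity):
        # Keep the largest-magnitude weights; zero out the `sparsity` fraction.
        thresh = np.quantile(np.abs(weights), sparsity)
        return (np.abs(weights) >= thresh).astype(weights.dtype)

    def dsd_train(weights, train_dense, train_sparse, sparsity=0.5):
        # D: initial dense training learns which connections matter.
        weights = train_dense(weights)
        # S: prune small weights and retrain under the fixed sparsity mask.
        mask = magnitude_mask(weights, sparsity)
        weights = train_sparse(weights * mask, mask)
        # D: restore the pruned connections and retrain the dense network again,
        # typically with a reduced learning rate.
        weights = train_dense(weights)
        return weights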

Level: All
Type: Talk
Tags: Deep Learning and AI; Algorithms
Industry Segments: Higher Education / Research

Day: TBD
Time: TBD
Location: TBD

S7600 - ChainerMN: Scalable Distributed Deep Learning with Chainer

Takuya Akiba Researcher, Preferred Networks, Inc.
Takuya Akiba is a researcher at Preferred Networks, Inc., working on research and development for making deep learning faster and more scalable. He received a Ph.D. in information science and technology from the University of Tokyo, Japan, in 2015.

We'll present ChainerMN, a multi-node distributed deep learning framework, together with the basics of distributed deep learning. Even though GPUs are continuously gaining more computation throughput, it is still very time-consuming to train state-of-the-art deep neural network models. For better scalability and productivity, it is paramount to accelerate the training process by using multiple GPUs. To enable high-performance and flexible distributed training, we developed ChainerMN, built on top of Chainer. We'll first introduce the basic approaches to distributed deep learning. Then, we'll explain the design choice, basic usage, and implementation details of Chainer and ChainerMN. We'll report benchmark results and discuss the future directions of distributed deep learning.
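
For reference, the typical ChainerMN pattern looks roughly like the sketch below (a minimal example; the small MLP and MNIST dataset are placeholders, and argument details may differ between versions). Each MPI process drives one GPU, the optimizer wrapper all-reduces gradients, and the dataset is scattered from rank 0; it would normally be launched with mpiexec, one process per GPU.

    import chainer
    import chainer.functions as F
    import chainer.links as L
    import chainermn

    class MLP(chainer.Chain):
        def __init__(self):
            super(MLP, self).__init__()
            with self.init_scope():
                self.l1 = L.Linear(None, 100)
                self.l2 = L.Linear(100, 10)
        def __call__(self, x):
            return self.l2(F.relu(self.l1(x)))

    comm = chainermn.create_communicator()       # One MPI process per GPU.
    device = comm.intra_rank                     # Map each local rank to a GPU.

    model = L.Classifier(MLP())
    model.to_gpu(device)

    # The wrapper all-reduces gradients across workers after each backward pass.
    optimizer = chainermn.create_multi_node_optimizer(chainer.optimizers.Adam(), comm)
    optimizer.setup(model)

    # Rank 0 loads the data; scatter_dataset splits it across all workers.
    train = chainer.datasets.get_mnist()[0] if comm.rank == 0 else None
    train = chainermn.scatter_dataset(train, comm)

    iterator = chainer.iterators.SerialIterator(train, batch_size=128)
    updater = chainer.training.StandardUpdater(iterator, optimizer, device=device)
    trainer = chainer.training.Trainer(updater, (10, 'epoch'))
    trainer.run()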

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; HPC and Supercomputing; AI Startup

Day: TBD
Time: TBD
Location: TBD

S7601 - Caffe2: A New Lightweight, Modular, and Scalable Deep Learning Framework

Yangqing Jia Research Scientist, Facebook
Yangqing Jia is a research scientist at Facebook.

Caffe2 is a new lightweight, modular, and scalable deep learning framework refactored from the previous Caffe. Caffe2 is widely used at Facebook for production to enable new AI experiences. We'll explain the strengths of Caffe2 and many improvements we made from the original Caffe.

Level: All
Type: Talk
Tags: Deep Learning and AI; Tools and Libraries

Day: TBD
Time: TBD
Location: TBD

S7602 - Zoom, Enhance, Synthesize! Magic Image Upscaling and Material Synthesis Using Deep Learning

Andrew Edelsten Senior Developer Technologies Manager, NVIDIA
Andrew Edelsten has worked in the games and visual arts industry for 20 years. Starting his career making computer games and 3D engines in Australia in the mid 90s, Andrew had a two-year sojourn in Europe before starting work at NVIDIA in 2010. For the last year, Andrew and his team have been researching novel deep learning approaches to games industry pain points with the overarching goal to make games more beautiful and engaging.

Recently, deep learning has revolutionized computer vision and other recognition problems. Everyday applications using such techniques are now commonplace, with more advanced tasks being automated at a growing rate. During 2016, "image synthesis" techniques started to appear that used deep neural networks to apply style transfer algorithms for image restoration. We'll review some of these techniques and demonstrate their application in image magnification to enable "super resolution" tools. We'll also discuss recent discoveries by NVIDIA Research that use approaches based on AI, machine learning, and deep learning to greatly improve the process of creating game-ready materials. Using these novel techniques, artists can use standard DSLR, or even cell-phone, cameras to create full, renderable materials in minutes. We'll conclude by showing how developers can integrate these methods into their existing art pipelines.

Level: Intermediate
Type: Talk
Tags: Game Development; Media and Entertainment; Deep Learning and AI
Industry Segments: Games; Media & Entertainment; Architecture / Engineering / Construction; Manufacturing

Day: TBD
Time: TBD
Location: TBD

S7603 - In the Midst of a Revolution: How Computational Pathology is Transforming Clinical Practice and Biomedical Research

Thomas Fuchs Associate Professor, Memorial Sloan Kettering Cancer Center
Thomas Fuchs is director of computational pathology at Memorial Sloan Kettering Cancer Center and teaches biomedical machine learning as associate professor at Weill-Cornell in New York City. Excited by the tremendous potential of large-scale machine learning in medicine, Thomas is leading the transformation of pathology from a qualitative to a quantitative discipline at Memorial Sloan Kettering, the world's leading cancer center. Previously, Thomas was a rocket scientist at NASA's Jet Propulsion Laboratory, where he developed computer vision systems for space exploration to make sure the Mars Rover Curiosity didn't get stuck. Thomas conducted his postdoctoral studies in computer vision at the California Institute of Technology after receiving a Ph.D. in machine learning from ETH Zurich and an M.S. in mathematics from TU Graz.

At Memorial Sloan Kettering, we're building a computational pathology AI based on hundreds of NVIDIA GPUs and a petabyte of clinical data to change the future of medical diagnosis and research. Pathology is a cornerstone of clinical care and cancer research since it underpins all diagnostic grading and staging of specimens. Despite its importance, the microscopic histopathologic assessment of tumor tissue is still a manual and hence laborious, error-prone, and highly subjective process. Our goal is to facilitate the transformation of pathology from a qualitative to a quantitative discipline by building an artificial intelligence for pathology. An AI that changes how pathologists work in their clinical routines and how they conduct research. An AI that will enable them to be not only faster and more efficient, but also that helps their work be more reproducible and more objective. In addition to clinical care, it will facilitate large-scale, quantitative screening for correlations between tissue morphology and genetic panels like MSK-IMPACT.

Level: All
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI; Medical Imaging

Day: TBD
Time: TBD
Location: TBD

S7604 - Wide Learning: Why Deeper Isn't Necessarily Better in Deep Learning

Anton van den Hengel Professor, University of Adelaide (Australia)
Anton van den Hengel is the director of the Australian Centre for Visual Technologies, and a professor of computer science at the University of Adelaide. Anton has been a CI on over $50 million in research funding, and leads a group of over 60 researchers working in computer vision and machine learning. He has had research funding from Canon, BAES, Google, Bayer, and Microsoft, and has received a number of awards, including the Pearcey Award for Innovation and the CVPR Best Paper Award. He has published over 250 papers and has eight patents being exploited.

The current trend towards ever deeper neural networks is not necessarily the best path. Progressively deeper neural networks have delivered increases in performance, along with some incredibly large architectures. New methods have been developed to allow these very deep networks to be trained, and specifically to overcome the vanishing gradient problem. One interpretation of these very deep networks is as an ensemble classifier made up of many sub-structures of varying sizes. We'll propose a new, wider architecture that outperforms these very deep networks. Our architecture not only produces better accuracy, it also generalizes better. The approach arises from a new interpretation of large neural networks as ensemble classifiers and from the paths that gradients take through the network during training. The result is a network that is more accurate, faster to train, and more robust, but which is also better suited to resource-constrained situations, as occur in many practical applications. We'll provide a host of examples, including semantic image segmentation, 3D from a single image, visual question answering, object detection, and more.

Level: All
Type: Talk
Tags: Deep Learning and AI; Video and Image Processing

Day: TBD
Time: TBD
Location: TBD

S7605 - Convolutional Neural Networks for Modeling Temporal Biomarkers and Disease Predictions

Narges Razavian Assistant Professor, New York University Langone Medical Center
Narges Razavian is an assistant professor in the prediction analytics group at the New York University Medical Center, where she focuses on artificial intelligence, deep learning, and graphical models for high-dimensional structured input and output problems. She also focuses on using temporal electronic health records and medical imaging data for early disease detection. Previously, she was a postdoc researcher at New York University in the clinical machine learning lab working with Dr. David Sontag. Before that, she got her Ph.D. from Carnegie Mellon University on the topic of probabilistic graphical models for protein structure dynamics modeling.

Lab values and biomarkers are often irregularly and asynchronously measured, making them difficult to use in predictive modeling. However, temporal trends can still be recovered from these measurements and are important for predicting disease onsets. We'll present a novel model of high-dimensional temporal input and high-dimensional output. Our model is composed of two convolutional neural network components. The first component is an efficient convolution-based formulation of multivariate kernel regression, which allows us to estimate each biomarker at each time point from the rest of the biomarker time series. The second component is a multi-resolution, multi-task convolutional neural network that recovers temporal trends most predictive of up to 170 diseases. We'll show how this multi-task formulation allows us to retain the correlation structure among the diseases throughout the training. Our experiments on data from 298K individuals over 8 years, up to 100 common lab measurements, and 171 diseases show that the temporal signatures learned via convolution are significantly more predictive than baselines commonly used for early disease diagnosis.
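
To see how kernel regression over irregular samples can be phrased as convolution, consider this one-dimensional numpy sketch (a standard Nadaraya-Watson formulation; the Gaussian kernel width and window size are illustrative assumptions, not the authors' settings).

    import numpy as np

    def kernel_smooth(values, observed_mask, sigma=2.0, radius=6):
        t = np.arange(-radius, radius + 1)
        kernel = np.exp(-0.5 * (t / sigma) ** 2)           # Gaussian weights.
        num = np.convolve(values * observed_mask, kernel, mode='same')
        den = np.convolve(observed_mask.astype(float), kernel, mode='same')
        return num / np.maximum(den, 1e-8)                 # Nadaraya-Watson estimate.

    # Example: a lab value measured at only a few of 30 time points.
    values, mask = np.zeros(30), np.zeros(30)
    for i, v in [(3, 1.2), (11, 1.8), (24, 0.9)]:
        values[i], mask[i] = v, 1.0
    print(kernel_smooth(values, mask))   # A dense estimate at every time point.

In the model described above, a multivariate, convolution-based version of this estimate also draws on the other biomarker series, rather than applying a single fixed kernel to each series in isolation.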

Level: Intermediate
Type: Talk
Tags: Healthcare and Life Sciences; Deep Learning and AI

Day: TBD
Time: TBD
Location: TBD

S7608 - Exploring the Latent Visual Space Between Adjectives with Generative Adversarial Networks

Damian Borth Director Deep Learning Competence Center, German Research Center for Artificial Intelligence (DFKI)
Damian Borth is the director of the Deep Learning Competence Center at the German Research Center for Artificial Intelligence (DFKI) in Kaiserslautern. Damian's research focuses on large-scale multimedia opinion mining applying machine learning and in particular deep learning to mine insights (trends, sentiment) from online media streams. His work has been awarded the Best Paper Award at ACM ICMR 2012, the McKinsey Business Technology Award 2011, and a Google Research Award in 2010. Damian serves as a member of the steering group at the VolkswagenStiftung, the review committee at the Baden-Württemberg Stiftung, and several other steering- and program committees of international conferences and workshops. Damian did his postdoctoral research at UC Berkeley and the International Computer Science Institute in Berkeley. He received his Ph.D. from the University of Kaiserslautern. He was also a visiting researcher at the Digital Video and Multimedia Lab at Columbia.
Federico Raue Researcher, German Research Center for Artificial Intelligence (DFKI)
Federico Raue is a researcher at the German Research Center for Artificial Intelligence, where he works on synchronization of deep learning architectures for multimodal signal fusion.

Generative adversarial networks (GANs) have been applied to multiple problems, such as image generation and image completion. One interesting feature of GANs is exploration of the latent space, where new elements can appear through interpolation between two seed elements. With this in mind, we're interested in exploring the latent space in terms of adjective-noun pairs (ANPs) able to capture subjectivity in visual content, such as "cloudy sky" vs. "pretty sky." Although it is challenging for humans to find a smooth transition between two ANPs (similar to a color gradient or color progression), the presented GANs are capable of generating such a gradient in the adjective domain and finding new ANPs that lie in this (subjective) transition. As a result, GANs offer a more quantified interpretation of this subjective progression and explainability of the underlying latent space.
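
The latent "gradient" between two adjective-noun pairs amounts to decoding a sequence of interpolated codes, roughly as in this sketch (the generator is a placeholder callable; linear blending is one common choice, spherical interpolation is another).

    import numpy as np

    def latent_gradient(generator, z_a, z_b, steps=8):
        # Decode a sequence of codes blended between two seeds, e.g. the codes
        # behind a "cloudy sky" image and a "pretty sky" image.
        images = []
        for alpha in np.linspace(0.0, 1.0, steps):
            z = (1.0 - alpha) * z_a + alpha * z_b   # Interpolated latent code.
            images.append(generator(z))             # Decode with the generator G.
        return images

    # Example with random codes and an identity "generator" stand-in.
    z_a, z_b = np.random.randn(100), np.random.randn(100)
    frames = latent_gradient(lambda z: z, z_a, z_b, steps=5)
    print(len(frames))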

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Computer Vision and Machine Vision

Day: TBD
Time: TBD
Location: TBD

S7611 - Mitigating Disasters with GPU-Based Deep Learning from Twitter?

Ian Lumb Solutions Architect, Univa Corporation
Ian Lumb is a solutions architect at Univa Corporation, where he collaborates with customers and partners to develop and deliver solutions that meet their stated requirements. Through this engagement, Ian continues to develop his expertise in HPC and big data analytics, an undertaking that routinely involves making use of GPUs. Ian has presented and published on the application of machine learning in various scientific disciplines. Over about two decades, Ian has developed an international reputation as an evangelist who focuses on the intersection of IT and the sciences. Having pursued geophysics at the postgraduate level, and as the author of numerous articles published in peer-reviewed journals, Ian is well placed to explore IT-science intersections from a unique perspective.

By including "credible data" extracted from the Twitter social networking service, the study of earthquakes and tsunamis is being systematically transformed into a big data analytics problem. The challenge of establishing geophysically credible tweets is considered through a combination of deep learning and semantics (that is, knowledge representation). More specifically, tweet classification via GPU-based platforms for deep learning is compared and contrasted with previous work based on the use of in-memory computing via Apache Spark. Although there remains cause for optimism in augmenting traditional scientific data with that derived from social networking, ongoing research is aimed at providing utility in practice. The motivation for success remains strong, as establishing a causal relationship between earthquakes and tsunamis remains problematical, and this in turn complicates any ability to deliver timely, accurate messaging that could prove life-critical. Finally, we'll consider the applicability of this approach to other disaster scenarios (for example, the Deepwater Horizon oil spill).

Level: Intermediate
Type: Talk
Tags: Deep Learning and AI; Earth Systems Modeling
Industry Segments: Higher Education / Research; Energy / Oil & Gas; Internet / Telecommunications; Software

Day: TBD
Time: TBD
Location: TBD

S7612 - ZhuSuan: A Deep Learning Library on GPUs

Jun Zhu Associate Professor, Tsinghua University
Jun Zhu is an associate professor at Tsinghua University, an adjunct faculty member at Carnegie Mellon University, and a deputy director of the State Key Lab for Intelligent Technology and Systems. His principal interests lie in statistical machine learning for solving scientific and engineering problems. Jun received his Ph.D. in computer science from Tsinghua in 2009 and did postdoctoral research at Carnegie Mellon University from 2009 to 2011. He has published over 80 peer-reviewed papers in prestigious conferences and journals and is an associate editor for IEEE Trans. on PAMI and Artificial Intelligence. He served as area chair/senior PC for ICML (2014-2017), NIPS (2013, 2015), IJCAI (2013-2017), UAI (2014-2017), AAAI (2016, 2017), and AISTATS (2017), and was a local co-chair of ICML 2014. Jun is a recipient of the CCF Distinguished Ph.D. Thesis Award, the IEEE Intelligent Systems "AI's 10 to Watch" Award, the NSFC Excellent Young Scholar Award, and the CCF Young Scientist Award.

We'll introduce ZhuSuan, a new Python library for deep learning on GPUs. Unlike existing deep learning libraries, which are mainly designed for supervised tasks, ZhuSuan is distinguished by its deep roots in Bayesian inference, and thus supports various kinds of generative models: both traditional hierarchical Bayesian models and the recent deep generative models that have been extensively studied for unsupervised learning.
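
This is not ZhuSuan's API; as a generic sketch of the kind of model such a library targets, the following computes the evidence lower bound (ELBO) for a small variational autoencoder, i.e., variational Bayesian inference over per-example latent variables, in plain PyTorch:

    # Hypothetical sketch (data and latent dimensions are assumptions):
    # maximize ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)) with a Gaussian q.
    import torch
    import torch.nn as nn

    X_DIM, Z_DIM = 784, 16
    encoder = nn.Linear(X_DIM, 2 * Z_DIM)   # outputs mean and log-variance of q(z|x)
    decoder = nn.Linear(Z_DIM, X_DIM)       # parameters (logits) of p(x|z)

    def elbo(x):
        stats = encoder(x)
        mu, logvar = stats[:, :Z_DIM], stats[:, Z_DIM:]
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        recon = nn.functional.binary_cross_entropy_with_logits(
            decoder(z), x, reduction='none').sum(dim=1)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1)
        return -(recon + kl).mean()

    loss = -elbo(torch.rand(32, X_DIM).round())   # negative ELBO to minimize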