Computer architectures and hardware acceleration for deep learning
Monday, 10am - noon
In the last decade, deep learning has emerged as the dominant paradigm across a wide spectrum of AI tasks. Both large industrial companies and the research community have been exploring and developing deep-learning systems to drive current and future applications, from automatic speech recognition and intelligent photo tagging on social networks to the disruptive deployment of self-driving cars and autonomous drones. Nevertheless, the predictive power of deep-learning models comes at the cost of high compute requirements for both the training and inference stages. Serving such workloads effectively has a direct impact on the long-term quality of a model, because long training cycles prohibit regular model updates, as well as on the response time experienced by users and the power consumption on mobile platforms and in data centers. To address these challenges, specialized computing solutions are being sought for both the training and inference stages. In this tutorial, we will present the current landscape of computer architectures for deep-learning workloads, covering both stages. Starting with a characterization of the workloads and performance requirements of training and inference, emphasis will be placed on accelerator design approaches for efficient server-based training as well as custom hardware architectures tailored for performing inference on embedded and mobile platforms. The examined hardware designs will cover novel designs proposed by the research community together with industrial chipsets that currently pave the way for the high-performance execution of deep-learning workloads.
- Characterization of deep-learning workloads
- Performance requirements of deep-learning applications
- Computer architectures and custom hardware designs for deep neural networks from both the research and industrial communities
- Future opportunities and open problems in hardware design for deep learning
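To give a flavour of what workload characterization involves, a back-of-the-envelope count of the multiply-accumulate (MAC) operations in a single convolutional layer already shows why inference is compute-hungry. The sketch below is generic; the layer dimensions are illustrative and not drawn from the tutorial material:

```python
def conv2d_cost(h, w, c_in, c_out, k, stride=1, pad=0):
    """MACs and parameter count of one 2-D convolutional layer."""
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    macs = h_out * w_out * c_out * k * k * c_in   # one MAC per weight per output pixel
    params = k * k * c_in * c_out + c_out         # weights + one bias per output channel
    return macs, params

# Illustrative layer: 224x224 RGB input, 64 output channels, 3x3 kernel
macs, params = conv2d_cost(224, 224, 3, 64, 3, stride=1, pad=1)
# macs == 86,704,128: roughly 87M MACs from just 1,792 parameters,
# for a single early layer of a single inference pass
```

Multiplying such per-layer counts across a full network, and again across frames per second or training iterations, is the kind of arithmetic that motivates the accelerator designs surveyed in this tutorial.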
10:00 - 10:30 Workload characteristics and performance requirements of deep-learning applications
10:30 - 10:50 Hardware accelerators from the industry: Strengths and limitations
10:50 - 11:00 Break
11:00 - 11:30 Hardware accelerators in research: Current landscape
11:30 - 12:00 Open challenges and future opportunities
Dr. Stylianos Venieris is a Researcher at the Samsung AI Center in Cambridge, UK. He received his PhD in Reconfigurable Hardware and Deep Learning from Imperial College London in 2018 and his MEng degree in Electrical and Electronic Engineering from Imperial College London in 2014. His research interests include methodologies for the principled and automated mapping of deep-learning algorithms onto mobile and embedded computing platforms, as well as the design of custom hardware accelerators for the high-performance, energy-efficient deployment of deep neural networks.
Dr. Ilias Leontiadis is a Senior Research Scientist at Samsung AI. Before that, he was a Senior Researcher at Telefonica Research and a research fellow at the University of Cambridge. He received his PhD from University College London (UCL).
His research interests include mobile systems, deep learning, and networks. He is currently working on enabling mobile devices to support on-device AI. Furthermore, he conducts research on edge and cloud offloading of complex AI tasks from resource-constrained devices such as wearables and IoT devices. Finally, he is an associate professor at the Graduate School of Economics in Barcelona.
Royson Lee is a software engineer at the Samsung AI Center in Cambridge, UK. He received his MPhil degree in Computer Science from the University of Cambridge in 2018 and his BEng degree in Computing from Imperial College London in 2017. Before university, he won numerous hackathons and Capture the Flag hacking competitions, and he was a gold medalist in IT Network Systems Administration at the 2011 WorldSkills Singapore competition. He also received his Diploma in Information Security from Nanyang Polytechnic in 2012 and is a recipient of the CSIT-Nanyang Scholarship. His current research interests include automated machine learning, computer vision, and distributed deep learning.
COSMOS (Cloud-Enhanced Open Software Defined Mobile Wireless Testbed for City-Scale Deployment)
Organizers: Ivan Seskar, Dipankar Raychaudhuri (Rutgers University), Gil Zussman (Columbia University)
Monday, 2 - 5pm
Presenters: Tingjun Chen (Columbia University), Michael Sherman (Rutgers University)
Wireless network testbeds are important for realistic, at-scale experimental evaluation of new radio technologies, protocols, and network architectures. With a somewhat belated reality check on 5G, large test and demonstration sites have become even more important in the validation of next-generation wireless platforms. To address at least some of the challenges of advancing fundamental wireless research, the US National Science Foundation (NSF), in collaboration with a 28-member industry consortium, has formed a public-private partnership to support the creation of up to four city-scale experimental platforms: the NSF Platforms for Advanced Wireless Research (PAWR) initiative.
This tutorial will introduce the PAWR COSMOS ("Cloud enhanced Open Software defined MObile wireless testbed for city-Scale deployment") platform. COSMOS is a joint project involving Rutgers, Columbia, and NYU, along with several partner organizations including New York City, City College of New York, University of Arizona, Silicon Harlem, and IBM. The COSMOS advanced wireless testbed is being deployed in New York City with a technical focus on ultra-high-bandwidth, low-latency wireless communications with tightly coupled edge computing, and an emphasis on millimeter-wave (mmWave) radio communications and dynamic optical switching.
Once fully deployed, the COSMOS testbed will support at-scale experimentation with novel advanced wireless broadband and communication technologies in both the sub-6 GHz and mmWave frequency bands in West Harlem, New York City, which is representative of a densely populated urban environment. The COSMOS platform provides a mix of fully programmable software-defined radio (SDR) nodes for flexible wireless experimentation. It also includes novel 100 Gbps+ fiber, free-space optical, and microwave backhaul technologies, interconnected with a software-defined networking (SDN) switching fabric for minimal latency and flexibility in setting up experimental network topologies. Moreover, the remote accessibility of COSMOS lowers the barrier to experimentation with radio and wireless technology and thus improves education and research productivity. The goal of this tutorial is to introduce the COSMOS testbed's management framework (OMF), its measurement library (OML), and its main technology capabilities.
The first part of the tutorial will focus on the SDR aspects: attendees will learn the basics of testbed usage and the OMF testbed management framework, including how to manage reservations, image the nodes, orchestrate their experiments, and collect measurements. Experimenters will be able to try two SDR-based examples:
- A channel sounding experiment supporting up to 100 MHz baseband bandwidth based on a customized FPGA implementation (https://wiki.cosmos-lab.org/wiki/tutorials)
- A real-time full-duplex wireless link demonstration using customized self-interference cancellation hardware circuitry integrated with the SDRs (Columbia FlexICoN project).
The second part of the tutorial will focus on experimentation with heterogeneous cloud computing capabilities (i.e., CPUs, GPUs, and server-side FPGAs) of the COSMOS platform. To illustrate the use of distributed computational resources, attendees will deploy the OpenAirInterface (OAI) SDR-based LTE experimental ecosystem by using the Open Source MANO (OSM) orchestrator.
The third part of the tutorial is devoted to optical experimentation and will show the tools and services designed to configure and monitor the performance of optical paths and topologies of the COSMOS testbed. In particular, the SDN framework will allow testbed users to implement experiments with application-driven control of optical and data-networking functionalities. Customized Python scripts, along with a Ryu OpenFlow controller, will be used to demonstrate the programmability of the COSMOS optical network.
Schedule (Monday, Oct. 21, 2019):
14:00 - 14:30 COSMOS testbed introduction
14:30 - 15:30 Part 1: Basic testbed usage with SDR - management and measurement tools
15:30 - 15:45 Coffee Break
15:45 - 16:15 Part 2: Edge cloud computing capabilities
16:15 - 17:00 Part 3: Optical experimentation tools and services
- A whiteboard for instructions, and a projector for slides
- Reliable internet: Wi-Fi for attendees, with at least one wired high-speed link
- Bring a laptop with an SSH client installed
- Register for an account
- Please use "2019 Mobicom" as the "Organization" in the form.
- Set up SSH client and upload key
Democratizing video analytics
Friday, 10am - noon
The goal of this tutorial is to bring to a wider audience a highly extensible video analytics software stack and to empower everyone to build practical, real-world video analytics applications with cutting-edge machine learning algorithms. We will introduce Rocket, a hybrid edge-cloud live video analytics system (built on C# .NET Core), and host three hands-on labs to walk through:
- How to set up and run the video analytics system
- How to plug in DNN models across the edge and cloud
- How to develop end-to-end applications based on live videos
Tutorial participants should bring their own laptops; a laptop with a CUDA-capable GPU is required. We will provide Azure cloud services (e.g., VMs, DBs) and on-premise hardware (e.g., cameras) to all participants during the tutorial. All attendees must register in advance to secure their place. Research experience in machine learning or computer vision is NOT required.
Session details
10:00 – 10:15 Rocket pipeline introduction
10:15 – 10:45 Hands-on Lab 1: Setting up Rocket to do alerting & counting in videos based on objects in a region of interest. The Rocket pipeline will use a cascade of DNNs.
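The cascade idea used in Lab 1 can be sketched generically: a cheap first stage decides whether a frame is worth analyzing at all, the expensive DNN runs only on frames that pass, and only detections inside the region of interest are counted. The following is an illustrative Python sketch with stub stages, not Rocket's actual C# API; the frame format and function names are hypothetical:

```python
def cheap_gate(frame):
    """Stand-in for a lightweight first-stage model (e.g. a motion
    filter or tiny DNN): decides whether the heavy detector should run."""
    return frame["motion"] > 0.2

def heavy_detector(frame):
    """Stand-in for an expensive object-detection DNN: returns the
    (x, y) centres of detected objects."""
    return frame["objects"]

def in_roi(point, roi):
    """True if a detection centre falls inside the region of interest."""
    x, y = point
    x0, y0, x1, y1 = roi
    return x0 <= x <= x1 and y0 <= y <= y1

def count_in_roi(frames, roi):
    """Cascade: gate each frame cheaply, run the heavy DNN only on
    frames that pass, and count detections inside the ROI."""
    total = 0
    for frame in frames:
        if not cheap_gate(frame):   # early exit saves the expensive stage
            continue
        total += sum(in_roi(p, roi) for p in heavy_detector(frame))
    return total

# Illustrative stream: the second frame is gated out by the cheap stage
frames = [{"motion": 0.5, "objects": [(10, 10), (90, 90)]},
          {"motion": 0.1, "objects": [(10, 10)]}]
hits = count_in_roi(frames, (0, 0, 50, 50))
# hits == 1: only the first frame passes the gate, and only (10, 10) is in the ROI
```

The design point the lab demonstrates is exactly this trade: the cheap stage filters most frames so the expensive DNN runs rarely, which is what makes live analytics affordable on edge hardware.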
10:45 – 10:50 Break
10:50 – 11:05 Hands-on Lab 2: Containerizing the video analytics module and executing it across the edge and cloud using AzureML.
11:05 – 11:10 Break
11:10 – 11:50 Hands-on Lab 3: Plugging in a live camera feed and building an end-to-end video analytics application (smart crosswalk).
Affiliation: Microsoft Research
Address: 14865 NE 36th St, Redmond, WA 98052, United States
Bio: Ganesh Ananthanarayanan is a Researcher at Microsoft Research. His research interests are broadly in systems & networking, with recent focus on live video analytics, cloud computing & large-scale data analytics systems, and Internet performance. He has published over 30 papers in systems & networking conferences such as USENIX OSDI, ACM SIGCOMM, and USENIX NSDI. His work on "Video Analytics for Vision Zero" on analyzing traffic camera feeds won the Institute of Transportation Engineers 2017 Achievement Award as well as the "Safer Cities, Safer People" US Department of Transportation Award. He has collaborated with and shipped technology to Microsoft's cloud and online products such as the Azure Cloud, Cosmos (Microsoft's big data system), and Skype. He is a member of the ACM Future of Computing Academy. Prior to joining Microsoft Research, he completed his Ph.D. at UC Berkeley in Dec 2013, where he was also a recipient of the UC Berkeley Regents Fellowship.
Affiliation: Microsoft Research Asia
Address: No.5, Danling street, T2-12467, Haidian District, Beijing, P.R.China 100080
Bio: Yunxin Liu is a Senior Researcher at Microsoft Research Asia (MSRA). His research interests are mobile and edge systems, focusing on system optimizations, power management, security and privacy, sensing, and edge AI. His research work has been published in top conferences and journals such as MobiSys, MobiCom, NSDI, CCS, ToN, and TMC; transferred into multiple Microsoft products including Visual Studio, XBOX XDK, and Windows Phone; and featured in news media including ABC News, The Register, NetworkWorld, and many others. He has served as a TPC member of conferences such as MobiSys, WWW, and INFOCOM. He received the MobiCom 2015 Best Demo Award, the PhoneSense 2011 Best Paper Award, and the SenSys 2018 Best Paper Runner-up Award.
Affiliation: Microsoft Research
Address: 14865 NE 36th St, Redmond, WA 98052, United States
Bio: Yuanchao Shu is currently a Researcher with the Mobility and Networking Research Group at Microsoft Research. His research interests lie broadly in mobile and wireless systems, networked control and optimization, and mobile security and privacy. His research results have been published at top-tier venues including MobiCom, MobiSys, UbiComp, SenSys, JSAC, TMC, TPDS, and USENIX Security. He served as Registration Chair of MobiCom 19, Award Chair of ACM TURC 17/18, Publication Co-Chair of SmartCom 17, and as a TPC member of SEC, Globecom, ICC, EdgeSys, and others. He won the IEEE WCNC Best Paper Award, the ACM SenSys Best Paper Runner-up Award, and the IEEE INFOCOM Best Demo Award, and was the recipient of the ACM China Doctoral Dissertation Award and an IBM PhD Fellowship.
Building Embedded AI Systems - A Practical Approach
Friday, 2 - 5pm
In this tutorial, we will bring together embedded software engineering and machine learning expertise to teach a practical approach to designing and developing embedded AI systems. In particular, we will use the eSense earable computing platform and the Raspberry Pi Zero to develop end-to-end inference pipelines for two learning objectives: recognition of physical activities and of spoken keywords. Twenty-five earable/Raspberry Pi Zero pairs will be used during the tutorial to teach participants techniques for efficient and robust 1) data acquisition, 2) data exploration, 3) data preparation, 4) model development, 5) model optimisation, and 6) model execution in such resource-constrained environments. Participants are expected to have some familiarity with Node.js, Python, TensorFlow (and TensorFlow Lite), and deep neural networks.
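The data-preparation stage of such a pipeline can be sketched generically: raw sensor samples are split into fixed-length, overlapping windows, and each window is reduced to a small feature vector before being fed to a classifier. The sketch below is a minimal, dependency-free illustration; the window length, hop size, and features are illustrative choices, not the tutorial's actual configuration:

```python
def sliding_windows(samples, win_len, hop):
    """Split a stream of sensor samples into fixed-length, overlapping windows."""
    return [samples[i:i + win_len]
            for i in range(0, len(samples) - win_len + 1, hop)]

def window_features(window):
    """Toy per-window features for activity recognition: mean and peak-to-peak."""
    return {"mean": sum(window) / len(window),
            "p2p": max(window) - min(window)}

# Illustrative 1-D accelerometer-magnitude stream: 4-sample windows, 50% overlap
stream = [0.0, 0.1, 0.9, 1.0, 0.2, 0.1, 0.0, 0.8]
feats = [window_features(w) for w in sliding_windows(stream, 4, 2)]
```

On a real deployment the feature vectors (or the raw windows themselves) would be passed to an optimised model, e.g. via TensorFlow Lite, which is where the model-optimisation and model-execution steps of the tutorial come in.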
By the end of this tutorial, participants will have acquired an understanding of the challenges and opportunities of embedded AI systems, their design and development processes, and the critical enabling technologies.
Session Details
Part 1 (2:00 PM to 2:45 PM): Embedded AI Systems - A Gentle Introduction
Part 2 (3:00 PM to 3:45 PM): Device and Data Exploration
Part 3 (4:00 PM to 4:45 PM): Model Development and Execution
Tutors: Pervasive Systems Research, Nokia Bell Labs Cambridge
Dr. Chulhong Min is a research scientist at Nokia Bell Labs Cambridge and works on Pervasive AI Systems.
Dr. Alessandro Montanari is a research scientist at Nokia Bell Labs Cambridge and works on Pervasive AI Systems.
Dr. Mo Alloulah is a research scientist at Nokia Bell Labs Cambridge and works on Pervasive AI Systems.
Dr. Fahim Kawsar leads Pervasive Systems research at Nokia Bell Labs Cambridge and holds a Design United professorship at TU Delft.