Comprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications
Steve Steele
Director, ML Platforms | Arm
© 2017 Arm Limited
Arm Technical Symposium 2017
Agenda
• What is Artificial Intelligence?
• What are the opportunities and challenges in AI? Innovation, growth, maturity
• Machine learning
• Arm technology for AI: software, specialized acceleration, hardware, platform & tools
What is Artificial Intelligence?
AI Presents Significant Opportunity for Innovation
VR/MR
Robotics
Drones
Shipping & logistics
IoT
Home, surveillance & analytics
Automotive
Mobile
The Opportunity and Challenge of AI
AI and Machine Learning in 2020: devices, algorithms, and connected services
• $4.8 billion for chips
• Autonomous driving and industrial applications
• Robotics open up server and knowledge-sharing services
• Connected services predicted to be very valuable
• Algorithms change daily
Machine Learning is a Subset of Artificial Intelligence
• AI means many things to many people
• ML itself has a lot of depth
Artificial Intelligence: Machine Learning, Perception & Vision, Natural Language Processing, Knowledge Representation, Planning & Navigation, Generalized Intelligence
Why Artificial Intelligence (AI) is Exploding Now
Availability of increased data sourced at the edge with ubiquitous powerful compute!
[Figure: Compute, 2010 and 2015]
Data (IP traffic): 2016 – 1 zettabyte; 2020 – 2.3 zettabytes
1 zettabyte = 10^21 bytes
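As a rough worked example of what the two traffic figures imply: assuming smooth compound growth between 2016 and 2020 (an assumption, since the slide gives only the two endpoints), the implied yearly growth rate comes out at roughly 23%. The function and variable names below are illustrative only.

```python
def compound_annual_growth(start, end, years):
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1.0 / years) - 1.0


ZETTABYTE = 10**21  # bytes

# Endpoints taken from the slide; everything in between is assumed smooth.
traffic_2016 = 1.0 * ZETTABYTE
traffic_2020 = 2.3 * ZETTABYTE

rate = compound_annual_growth(traffic_2016, traffic_2020, years=2020 - 2016)
print(f"Implied annual growth in IP traffic: {rate:.1%}")
```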
Neural Networks (NN) Can Now Outperform Humans
Data for ImageNet Large Scale Visual Recognition Challenge
[Chart: Top-5 error rate (%) on ImageNet, axis from 0 to 30, with the human error rate marked; computer vision entries followed by deep learning entries]
• Deep learning introduced in 2012, resulting in big improvements
• Error rates have now stabilized at ~3%
(Source: ImageNet and Andrej Karpathy)
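For readers unfamiliar with the metric, a minimal sketch of how a top-5 error rate is computed: a prediction counts as correct if the true label is among the five highest-scoring classes. The scores and labels below are made-up toy data, not ImageNet results, and the function name is illustrative.

```python
def top5_error_rate(scores_per_image, true_labels):
    """Percentage of images whose true label is not among the 5 top-scoring classes."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Class indices ordered from highest to lowest score, truncated to five.
        top5 = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:5]
        if label not in top5:
            misses += 1
    return 100.0 * misses / len(true_labels)


# Toy data: 2 images, 6 classes (hypothetical scores).
scores = [
    [0.05, 0.12, 0.40, 0.20, 0.15, 0.08],  # true class 2 is ranked first
    [0.30, 0.25, 0.20, 0.15, 0.09, 0.01],  # true class 5 is ranked last (sixth)
]
labels = [2, 5]

error = top5_error_rate(scores, labels)
print(f"Top-5 error: {error:.1f}%")
```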
Distributed Intelligence
• Cloud servers: training + inference
• Regional servers: training + inference
• Edge devices: sensing, training, inference & actuation
Capabilities Migrating to the Edge
Why is On-device ML Driving AI to the Edge?
Bandwidth
Power
Cost
Latency
Privacy
AI Applications at the Edge on Arm
Detect plant diseases
Sort cucumbers
Detect Caltrain delays
The Arm ML Platform
Arm ML Platform Enables
Efficiency
Flexibility
Freedom
Components of Arm ML Platform
Software
Hardware
Specialized Acceleration
Software Development
Software Architecture Overview
• Applications
• Domain-specific high-level libraries: Mobile, Autonomous, People
• Third-party libraries and benchmarks
• ML frameworks: Caffe, MXNet, Torch, TensorFlow, Android NN
• Compute libraries for NEON, GPU
• Programmable: Arm Cortex-M CPUs, Cortex-A CPUs, Arm Mali GPUs
• Spirit
• 3rd party accelerators
Compute Library from Arm
Faster, advanced processing
What is the Compute Library?
• Delivers faster processing
• Offers OpenCV and OpenVX compatibility
• Functions for CV and deep learning algorithms
• 4.6x faster than stock OpenCV on NEON
• Use as a plug-in backend for your own runtime implementation
• Optimized for Arm CPU and GPU
• OS and platform agnostic
• No fee, MIT license
Available now: https://developer.arm.com/technologies/compute-library
Compute Library from Arm
Partners
+80 functions
Hardware
ML on Cortex CPUs
Instruction Sets for AI
Cortex-A
• Additional dot product instructions (Cortex-A55 and Cortex-A75)
• New Scalable Vector Extension (SVE) instructions
• Flexibility in multi-core computing with Arm DynamIQ technology
Closely-coupled acceleration
• Improved performance and efficiency (for broader use cases)
• Connect accelerators with DynamIQ
Cortex-M
• Optimized CMSIS-DSP libraries for matrix multiplication
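To illustrate why dot product instructions matter for quantized inference, the Python model below sketches the arithmetic only. It assumes the behaviour of the Armv8.2-A signed dot product (SDOT) on a 128-bit vector, where each of four 32-bit lanes accumulates four signed 8-bit products. The function names are illustrative and this is not Arm code.

```python
def _wrap_int32(x):
    """Wrap an integer to signed 32-bit, as a fixed-width accumulator would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sdot(acc, a, b):
    """Model a 4-lane signed dot product: each lane adds four int8 x int8 products."""
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    assert all(-128 <= v <= 127 for v in a + b)
    out = []
    for lane in range(4):
        # Each 32-bit lane consumes four consecutive 8-bit elements of a and b.
        products = (a[4 * lane + i] * b[4 * lane + i] for i in range(4))
        out.append(_wrap_int32(acc[lane] + sum(products)))
    return out


# 16 multiply-accumulates expressed as a single vector operation.
result = sdot([0, 0, 0, 0], list(range(16)), [1] * 16)
print(result)
```

Without such an instruction, the same work needs separate widening multiplies and adds, which is why an int8 matrix multiplication benefits from it.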
New DynamIQ-based CPUs for New Possibilities
• Cortex-A75 processor: >50% more performance compared to current devices
• Cortex-A55 processor: 2.5x greater power efficiency compared to current devices
Estimated device performance using SPECINT2006; final device results may vary.
Comparison using Cortex-A73 at 2.4GHz vs Cortex-A75 at 3GHz.
Comparison using Cortex-A53 in 28nm devices vs Cortex-A55 in 16nm devices.
DynamIQ: New Cluster Design for New Cores
Arm DynamIQ big.LITTLE systems:
• Greater product differentiation and scalability
• Improved energy efficiency and performance
• SW compatibility with Energy Aware Scheduling (EAS)
Private L2 and shared L3 caches
• Local cache close to processors
• L3 cache shared between all cores
DynamIQ Shared Unit (DSU)
• Contains L3, Snoop Control Unit (SCU) and all cluster interfaces
Additional instructions for ML
[Diagram: Cortex-A75 and Cortex-A55 32b/64b cores, each with a private L2 cache, attached to the DynamIQ Shared Unit (DSU), which holds the shared L3 cache, SCU, async bridges, peripheral port, ACP and AMBA4 ACE]
Example DynamIQ big.LITTLE configurations: 1b+2L, 1b+3L, 1b+4L, 1b+7L, 2b+6L, 4b+4L
ML on Mali GPUs
Mali GPUs: Increasing ML Throughput and Efficiency
Increasing efficiency
• GEMM (general matrix multiply) represents the core functionality of ML algorithms
• Mali-G72 has several optimizations to improve ML inference: a less power-hungry FMA unit and a bigger L1 cache in the execution engine
• Mali-G72 is the most efficient Mali GPU for machine learning
[Chart: relative energy efficiency, axis from 0.9 to 1.2, two series; 17% efficiency gain]
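A minimal reference sketch of GEMM, to show why the fused multiply-add (FMA) unit and cache behaviour dominate its cost. It assumes the usual definition C = alpha * A * B + beta * C and counts one multiply-add per inner-loop step. It is a plain Python illustration with hypothetical names, not the Mali or Compute Library implementation.

```python
def gemm(alpha, a, b, beta, c):
    """Return (alpha * (a @ b) + beta * c, multiply-add count) for lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a)
    assert len(c) == m and all(len(row) == n for row in c)
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    fma_count = 0
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-add per step
                fma_count += 1
            out[i][j] += alpha * acc
    return out, fma_count


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[0, 0], [0, 0]]
result, fmas = gemm(1.0, a, b, 1.0, c)
print(result, fmas)
```

An M x K by K x N product performs M * N * K multiply-adds, and every element of A and B is reused many times, so both a cheaper FMA and a larger L1 cache reduce the energy per result.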
Specialized Acceleration
Computer Vision (CV)
Spirit: Object Detection at the Edge
• Direct from sensor (no ISP)
• Real-time
• High resolution, wide range of scale
• Very detailed object description
• Small area
• Energy efficient
[Figure: detections labelled head facing right, head facing forwards, upper body facing right, upper body facing forward, full body facing right, full body facing forward; person being tracked]
Image analysis: trajectory, pose, identity, gesture
Spirit for Object Detection and Localization
[Block diagram: Sensor; ISP (image stream); Spirit CV pre-processor (sensor interface, feature extraction, Classifier 1, Classifier 2, acceleration) with a metadata stream (regions of interest); CPU; GPU. Example labels: Beth, Ben]
Comparison with Neural Network Framework Solutions
YOLO
Spirit
SSD Neural Network
Summary
Arm’s ML Computing Platform
Power-efficient and scalable architecture enables AI on battery-constrained devices
+
Flexible software with standard APIs and ML frameworks simplifies implementation and provides portability
+
World’s largest ecosystem for devices delivers broad applicability and rich capabilities
Greater capability for ML solutions
For further information…
https://developer.arm.com
[email protected]
Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!
The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks