Comprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications

Steve Steele
Director, ML Platforms | Arm

© 2017 Arm Limited

Arm Technical Symposium 2017

Agenda

What is Artificial Intelligence?

What are the opportunities and challenges in AI?
• Innovation
• Growth
• Maturity
• Machine learning

Arm technology for AI
• Software
• Specialized Acceleration
• Hardware
• Platform & tools

What is Artificial Intelligence?


AI Presents Significant Opportunity for Innovation

• VR/MR
• Robotics
• Drones
• Shipping & logistics
• IoT
• Home, surveillance & analytics
• Automotive
• Mobile

The Opportunity and Challenge of AI

AI and Machine Learning in 2020: devices, algorithms, and connected services

$4.8 billion for chips
• Autonomous driving and industrial applications
• Robotics open up server and knowledge-sharing services
• Connected services predicted to be very valuable
• Algorithms change daily

Machine Learning is a Subset of Artificial Intelligence

AI means many things to many people, and ML itself has a lot of depth.

Artificial Intelligence
• Machine Learning
• Perception & Vision
• Natural Language Processing
• Knowledge Representation
• Planning & Navigation
• Generalized Intelligence

Why Artificial Intelligence (AI) is Exploding Now

Availability of increased data sourced at the edge with ubiquitous powerful compute!

[Figure: growth of compute and data (years shown: 2010, 2015). IP traffic: 1 zettabyte in 2016, 2.3 zettabytes in 2020.]

1 zettabyte = 10^21 bytes
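
To put the traffic figures above in perspective, the following is a minimal sketch of the arithmetic they imply. It assumes the two numbers are yearly IP traffic totals for 2016 and 2020 and that growth is smooth between them; the function name `compound_annual_growth` is hypothetical and not from the presentation.

```python
# Figures quoted on the slide, read here as IP traffic per year in zettabytes
# (that reading is an assumption).
ZETTABYTE = 10**21  # bytes

traffic_2016_zb = 1.0
traffic_2020_zb = 2.3


def compound_annual_growth(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` after `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


growth = compound_annual_growth(traffic_2016_zb, traffic_2020_zb, 2020 - 2016)
extra_bytes = (traffic_2020_zb - traffic_2016_zb) * ZETTABYTE

print(f"implied growth per year: {growth:.1%}")
print(f"additional traffic in 2020 vs 2016: {extra_bytes:.2e} bytes")
```

Under these assumptions the implied growth is roughly 23% per year.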

Neural Networks (NN) Can Now Outperform Humans

Data for ImageNet Large Scale Visual Recognition Challenge

[Figure: Top-5 error on ImageNet. Y-axis: top-5 error rate (%), 0 to 30. Entries are grouped into Computer Vision and Deep Learning, with the human error rate marked for reference. Source: ImageNet and Andrej Karpathy.]

• Deep learning introduced in 2012, resulting in big improvements
• Error rates have now stabilized at ~3%
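
The top-5 error metric used in the chart counts a prediction as wrong only when the true label is absent from the five highest-scoring classes. A minimal sketch of that calculation follows; the scores and labels are made-up illustration data, not ImageNet results, and `top_k_error` is a hypothetical helper name.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k entries.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Four made-up samples over seven classes (illustration only).
scores = [
    [0.30, 0.25, 0.15, 0.10, 0.10, 0.06, 0.04],
    [0.30, 0.25, 0.15, 0.10, 0.10, 0.06, 0.04],
    [0.05, 0.05, 0.40, 0.20, 0.10, 0.12, 0.08],
    [0.05, 0.05, 0.40, 0.20, 0.10, 0.12, 0.08],
]
labels = [0, 6, 4, 0]

print("top-1 error:", top_k_error(scores, labels, k=1))
print("top-5 error:", top_k_error(scores, labels, k=5))
```

Top-5 error is never higher than top-1 error on the same predictions, which is why the two should not be compared directly.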

Distributed Intelligence

• Cloud servers: training + inference
• Regional servers: training + inference
• Edge devices: sensing, training, inference & actuation

Capabilities Migrating to the Edge

Why is On-device ML Driving AI to the Edge?

Bandwidth

Power

Cost


Latency

Privacy

AI Applications at the Edge on Arm

• Detect plant diseases
• Sort cucumbers
• Detect Caltrain delays

The Arm ML Platform


Arm ML Platform Enables

• Efficiency
• Flexibility
• Freedom

Components of Arm ML Platform

• Software
• Hardware
• Specialized Acceleration

Software Development


Software Architecture Overview

• Applications
• Domain-specific high-level libraries: Mobile, Autonomous, People
• Third-party libraries and benchmarks: Caffe, MXNet, Torch, TensorFlow, Android NN
• Compute libraries for NEON, GPU
• Programmable: Arm Cortex-M CPUs, Cortex-A CPUs, Arm Mali GPUs
• Spirit
• 3rd party accelerators

Compute Library from Arm

Faster, advanced processing

What is the Compute Library?
• Functions for CV and deep learning algorithms
• Optimized for Arm CPU and GPU
• OS and platform agnostic
• No fee, MIT license

Delivers faster processing
• 4.6x faster than stock OpenCV on NEON

Offers OpenCV and OpenVX compatibility
• Use as a plug-in backend for your own runtime implementation

Available now: https://developer.arm.com/technologies/compute-library

Compute Library from Arm

Partners

Functions: +80

Hardware


ML on Cortex CPUs


Instruction Sets for AI

Cortex-A
• Additional dot product instructions (Cortex-A55 and Cortex-A75)
• New Scalable Vector Extension (SVE) instructions
• Flexibility in multi-core computing with Arm DynamIQ technology

Closely-coupled acceleration
• Connect accelerators with DynamIQ

Cortex-M
• Optimized CMSIS-DSP libraries for matrix multiplication
• Improved performance and efficiency (for broader use cases)
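
The dot product instructions listed for Cortex-A55 and Cortex-A75 multiply groups of four 8-bit values and accumulate each group's sum into a 32-bit lane. The sketch below emulates that lane-wise behaviour in plain Python for the signed case. It is an illustration of the arithmetic only, not Arm's implementation: the function name `sdot_emulated` is hypothetical, and 32-bit wraparound is deliberately not modelled.

```python
def sdot_emulated(acc, a, b):
    """Emulate a 4-way signed 8-bit dot product with per-lane accumulation.

    acc: accumulator lanes (one integer per lane)
    a, b: signed 8-bit inputs, four values per accumulator lane
    Returns a new list of accumulator lanes. Overflow of the 32-bit
    accumulators is NOT modelled in this sketch.
    """
    if not (len(a) == len(b) == 4 * len(acc)):
        raise ValueError("expected four 8-bit elements per accumulator lane")
    if any(not -128 <= v <= 127 for v in list(a) + list(b)):
        raise ValueError("inputs must fit in signed 8 bits")

    result = []
    for lane, total in enumerate(acc):
        for i in range(4):
            total += a[4 * lane + i] * b[4 * lane + i]
        result.append(total)
    return result


# A 128-bit vector holds sixteen 8-bit values, giving four accumulator lanes.
a = list(range(16))
b = [1] * 16
print(sdot_emulated([0, 0, 0, 0], a, b))
```

Grouping four multiply-accumulates per lane is what makes this operation a good fit for the quantized 8-bit inner loops common in ML inference.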

New DynamIQ-based CPUs for New Possibilities

• Cortex-A75 processor: >50% more performance compared to current devices
• Cortex-A55 processor: 2.5x greater power efficiency compared to current devices

Estimated device performance using SPECINT2006, final device results may vary.
Comparison using Cortex-A73 at 2.4GHz vs Cortex-A75 at 3GHz.
Comparison using Cortex-A53 in 28nm devices vs Cortex-A55 in 16nm devices.

DynamIQ: New Cluster Design for New Cores

Arm DynamIQ big.LITTLE systems:
• Greater product differentiation and scalability
• Improved energy efficiency and performance
• SW compatibility with Energy Aware Scheduling (EAS)

Private L2 and shared L3 caches
• Local cache close to processors
• L3 cache shared between all cores

DynamIQ Shared Unit (DSU)
• Contains L3, Snoop Control Unit (SCU) and all cluster interfaces

Additional instructions for ML

[Diagram: Cortex-A75 and Cortex-A55 32b/64b cores, each with a private L2 cache, attached to the DynamIQ Shared Unit (DSU). DSU labels: SCU, shared L3 cache, peripheral port, async bridges, ACP, AMBA4 ACE.]

Example: DynamIQ big.LITTLE configurations
• 1b+7L
• 2b+6L
• 4b+4L
• 1b+2L
• 1b+3L
• 1b+4L
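
The configuration labels above use the form "<big>b+<little>L". As a small sketch, they can be parsed and checked against a per-cluster core limit. The limit of eight cores is an assumption stated here for illustration and does not appear on the slide; `parse_config` and `is_valid` are hypothetical helper names.

```python
import re

# Assumption for this sketch: a single DynamIQ cluster holds at most eight cores.
MAX_CORES_PER_CLUSTER = 8


def parse_config(text):
    """Split a label such as '1b+7L' into (big_cores, little_cores)."""
    match = re.fullmatch(r"(\d+)b\+(\d+)L", text)
    if match is None:
        raise ValueError(f"unrecognized configuration label: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_valid(text):
    """True if the total core count fits within the assumed cluster limit."""
    big, little = parse_config(text)
    return 1 <= big + little <= MAX_CORES_PER_CLUSTER


configs = ["1b+7L", "2b+6L", "4b+4L", "1b+2L", "1b+3L", "1b+4L"]
for label in configs:
    big, little = parse_config(label)
    print(f"{label}: {big} big + {little} LITTLE = {big + little} cores, valid={is_valid(label)}")
```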


ML on Mali GPUs


Mali GPUs: Increasing ML Throughput and Efficiency

Increasing efficiency
• GEMM (general matrix multiply) represents the core functionality of ML algorithms
• Mali-G72 has several optimizations to improve ML inference
• Less power-hungry FMA unit
• Bigger L1 cache in the execution engine
• Mali-G72 is the most efficient Mali GPU for machine learning

[Figure: relative energy efficiency, y-axis 0.9 to 1.2, showing a 17% efficiency gain.]
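
For readers unfamiliar with GEMM, the operation named on the Mali slide computes C = alpha * A * B + beta * C. The following is a minimal, unoptimized reference sketch in plain Python, intended only to show the arithmetic that GPU kernels accelerate; the function name `gemm` and the sample matrices are illustrative assumptions, not Arm code.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: returns alpha * (a @ b) + beta * c for row-major lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")

    return [
        [
            # Each output element is one multiply-accumulate chain of length k.
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
print(gemm(1, a, b, 1, c))
```

Every output element is a chain of fused multiply-adds, which is why the efficiency of the FMA unit and the cache feeding it matter for this workload.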

Specialized Acceleration


Computer Vision (CV)


Spirit: Object Detection at the Edge

• Direct from sensor (no ISP)
• Real-time
• High resolution, wide range of scale
• Very detailed object description
• Small area
• Energy efficient

Detected objects
• Head facing right
• Head facing forwards
• Upper body facing right
• Upper body facing forward
• Full body facing right
• Full body facing forward

Image analysis of the person being tracked
• Trajectory
• Pose
• Identity
• Gesture

Spirit for Object Detection and Localization

[Diagram labels: Sensor; ISP (image stream); Spirit CV pre-processor with sensor interface, feature extraction, classifier 1 and classifier 2 (metadata stream: regions of interest); CPU; GPU; acceleration. Example labels in the image: "Beth", "Ben".]

Comparison with Neural Network Framework Solutions

• YOLO
• Spirit
• SSD Neural Network

Summary


Arm’s ML Computing Platform

Power-efficient and scalable architecture enables AI on battery-constrained devices

+

Flexible software with standard APIs and ML frameworks simplifies implementation and provides portability

+

World’s largest ecosystem for devices delivers broad applicability and rich capabilities

Greater capability for ML solutions

For further information…

https://developer.arm.com
[email protected]

Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!

The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks