Chris Goodyer
Overview of Arm
• HPC engagements
Arm partner information
• Latest deployment information
Arm Software Ecosystem
• Software stack enablement
• Arm's priorities for libraries and applications
Filmed at the Arm HPC User Group at SC17 in Denver.
The Mont-Blanc European Exascale supercomputing project is based on ARM power-efficient technology, using the Cavium ThunderX2 ARM server processor to power its new High Performance Computing (HPC) prototype, together with an HPC software infrastructure for ARM that includes tools, code stacks, libraries and more. The ambition of the Mont-Blanc project is to define the architecture of an Exascale-class compute node based on the ARM architecture that is capable of being manufactured at industrial scale. The Mont-Blanc 3 system is being built by a consortium that includes Atos, ARM, AVL (the Austrian powertrain developer) and seven academic institutions, including the Barcelona Supercomputing Center (BSC). It brings ARM to HPC with high memory bandwidth and high core counts on Cavium's custom ARMv8 cores, which use out-of-order execution and can run at up to 3 GHz. The ThunderX2 is expected to deliver twice the integer and floating-point performance of the ThunderX1, along with twice the memory bandwidth.
Filmed in 4K60 at Supercomputing 2017 in Denver using Panasonic GH5 ($1999 at Amazon.com) on firmware 2.1 (aperture priority, AF continuous tracking) with Leica 12mm f1.4 ($1297 at Amazon.com) with Sennheiser MKE440 stereo shotgun microphone ($325 at Amazon.com), get $25 off renting cameras and lenses with my referral link at https://share.lensrentals.com/x/wWbHqV
Dell EMC shows some of its latest machine learning and deep learning products for the enterprise market, enabling enterprises to address opportunities in areas such as fraud detection, image processing, financial investment analysis, personalized medicine and more. The new Dell EMC PowerEdge C4140 Machine Learning and Deep Learning Ready Bundle is an accelerator-based platform for demanding cognitive workloads. Powered by latest-generation NVIDIA V100 GPU accelerators with PCIe and NVLink high-speed interconnect technology, plus two Intel Xeon Scalable processors, it brings high performance computing (HPC) and data analytics capabilities to mainstream enterprises worldwide.
Dell EMC also powers some of the fastest supercomputers in the world, such as the "Stampede2" system built for the Texas Advanced Computing Center (TACC) at The University of Texas at Austin, with Intel Xeon Phi 7250 processors across 4,200 nodes connected by Intel Omni-Path fabric. Developed in collaboration with Dell EMC, Intel and Seagate, it ranks No. 12 on the TOP500 list of the most powerful computer systems worldwide. Simon Fraser University's "Cedar" supercomputer was built for big data, including artificial intelligence, with 146 Dell EMC PowerEdge C4130 servers with NVIDIA Tesla P100 GPUs. Canada's most powerful academic supercomputer ranks No. 94 on the TOP500 and No. 13 on the Green500, helping researchers chart new territory in areas such as studying the continually changing DNA code in bacteria.
The Nvidia DGX Station is the world's first and fastest personal supercomputer for leading-edge AI development at the developer's desk. It has the computing capacity of four server racks in a desk-friendly package, using less than one twentieth the power. It's the only personal supercomputer with four Nvidia Tesla V100 GPUs, next-generation Nvidia NVLink, and the new Tensor Core architecture. The DGX Station delivers 3X the training performance of today's fastest workstations, with 480 TFLOPS of FP16 Tensor Core performance (four V100s at roughly 120 TFLOPS each) in a water-cooled chassis. Designed to be whisper quiet at one tenth the noise of other deep learning workstations, it's built for easy experimentation at the office.
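For a sense of how software actually reaches those Tensor Cores, here is a minimal sketch using cuBLAS's cublasGemmEx call with FP16 inputs and FP32 accumulation, which is the usual route to Tensor Core throughput on the V100. This assumes a CUDA 9-era toolkit; the matrix size is arbitrary and the matrices are left uninitialized, since the point is only to illustrate the call, not to compute anything meaningful.

```c
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <stdio.h>

int main(void)
{
    const int n = 1024;                 /* square matrices, column-major (cuBLAS default) */
    __half *dA, *dB;
    float  *dC;
    cudaMalloc((void **)&dA, (size_t)n * n * sizeof(__half));
    cudaMalloc((void **)&dB, (size_t)n * n * sizeof(__half));
    cudaMalloc((void **)&dC, (size_t)n * n * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);  /* opt in to Tensor Core math */

    const float alpha = 1.0f, beta = 0.0f;
    /* C = alpha * A * B + beta * C, with FP16 inputs and FP32 accumulation */
    cublasStatus_t st = cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                     n, n, n, &alpha,
                                     dA, CUDA_R_16F, n,
                                     dB, CUDA_R_16F, n,
                                     &beta,
                                     dC, CUDA_R_32F, n,
                                     CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP);
    printf("cublasGemmEx returned %d\n", (int)st);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Deep learning frameworks do essentially this under the hood when mixed-precision training is enabled, which is where the headline FP16 TFLOPS numbers come from.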
Red Hat Enterprise Linux is now fully supported on ARM server-optimized SoCs designed for cloud and hyperscale, telco and edge computing, as well as high performance computing. Supported SoCs include the Cavium ThunderX2 and the Qualcomm Centriq 2400, along with OEM partner systems such as the HPE Apollo 70, the culmination of a multi-year collaboration with silicon and hardware partners and the upstream community. Over the past seven years, Red Hat has helped drive open standards and develop communities of customers and partners into a broad ecosystem. Red Hat's goal was to deliver a single operating platform across multiple 64-bit ARMv8-A server-class SoCs from various suppliers, built from the same sources with a consistent feature set, so that customers can deploy across a range of server implementations while maintaining application compatibility.
Fujitsu is developing a very powerful ARM processor for its Post-K exascale supercomputer, intended to have a much wider impact on the HPC market than a single system. RIKEN, Japan's largest and most prestigious scientific research institute, will be the recipient of the Post-K system. This HPC-optimized ARM processor is being designed in collaboration with ARM and integrates SVE (Scalable Vector Extension), which extends the vector processing capabilities of AArch64 (64-bit) execution in the ARM architecture. SVE allows implementations to choose vector lengths that scale from 128 to 2048 bits, and it enables advanced vectorizing compilers to extract more fine-grained parallelism from existing code, reducing software deployment effort for high performance scientific computing. SVE also supports a vector-length agnostic (VLA) programming model, in which the same binary adapts to whatever vector length the hardware provides (see the sketch below). When the Post-K supercomputer is ready, which may be around 2020-2022, and if it lives up to its near-exascale performance promise, it will be eight times faster than today's most powerful supercomputer in the world, China's Sunway TaihuLight. The Post-K system will be used to model climate change, predict disasters, develop drugs and fuels, and run other scientific simulations. The Fujitsu Post-K ARM processors are likely to be 10nm FinFET chips fabricated by TSMC, and will feature high-bandwidth memory and the Tofu 6D interconnect mesh developed for the original K supercomputer.
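As a rough illustration of what vector-length agnostic code looks like, here is a minimal DAXPY-style loop written with the Arm C Language Extensions (ACLE) for SVE. The function name daxpy_vla and the scaffolding are my own, not Fujitsu's or Arm's code; it assumes a compiler with SVE support (for example GCC 8+ with -march=armv8-a+sve).

```c
#include <arm_sve.h>   /* ACLE SVE intrinsics; requires SVE enabled at compile time */
#include <stddef.h>

/* y[i] = a * x[i] + y[i], written once and run unchanged on any SVE
 * vector length from 128 to 2048 bits: svcntd() reports how many
 * doubles fit in a vector on the machine it is actually running on. */
void daxpy_vla(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i += svcntd()) {
        svbool_t    pg = svwhilelt_b64_u64(i, n);   /* predicate masks off the tail */
        svfloat64_t vx = svld1_f64(pg, &x[i]);      /* predicated load of x */
        svfloat64_t vy = svld1_f64(pg, &y[i]);      /* predicated load of y */
        vy = svmla_n_f64_m(pg, vy, vx, a);          /* vy += a * vx (fused multiply-add) */
        svst1_f64(pg, &y[i], vy);                   /* predicated store back to y */
    }
}
```

Because the loop steps by svcntd() and uses a predicate for the remainder, there is no fixed vector width baked into the binary, which is the property that lets one code base target both narrow and very wide SVE implementations.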
Cray announces the world's first production-ready ARM-powered supercomputer, based on the Cavium ThunderX2 64-bit ARMv8-A processor and added to the Cray XC50 supercomputer line, available in both liquid-cooled and air-cooled cabinets in the second quarter of 2018. It features a full software environment, including the Cray Linux Environment, the Cray Programming Environment and ARM-optimized compilers, along with ARM libraries and tools for running today's supercomputing workloads over the Cray Aries interconnect, with ARM's upcoming SVE technology seen as the most efficient path to achieving the vision of exascale. Cray's enhanced compilers and programming environment extract more performance from the Cavium ThunderX2 processors, up to 20 percent faster than other publicly available ARMv8 compilers such as LLVM and GNU.
Cray is currently working with multiple supercomputing centers on the development of ARM-based supercomputing systems, including various labs in the United States Department of Energy and the GW4 Alliance, a coalition of four leading research-intensive universities in the UK. Through an alliance with Cray and the Met Office in the UK, GW4 is designing and building "Isambard," an Arm-based Cray XC50 supercomputer. The GW4 Isambard project aims to deliver the world's first Arm-based, production-quality HPC service. My video includes an interview with Professor Simon McIntosh-Smith from the University of Bristol, who says that ease of use, robustness and performance are all critical for a production service, and that their early experiences with Cray's ThunderX2 systems and end-to-end ARM software environment are very promising. All of the real scientific codes they've tried so far have worked out of the box, and they're also seeing performance competitive with the best in class. Having access to Cray's optimized HPC software stack of compilers and libraries, in addition to all of the open-source tools, has been a real advantage.
Cavium announces that ThunderX2 ARM server systems are now available for customers in the server and high performance supercomputing markets; partners include Bull/Atos, Cray, Gigabyte, Penguin, Ingrasys/Foxconn and HPE. After seven years of work by partners in the ARM server ecosystem (and seven years of my ARM server video-blogging), high performance ARM server systems are finally launching for cloud computing and high performance computing markets worldwide. The Cavium ThunderX2 server SoC integrates fully out-of-order, high-performance custom cores supporting single- and dual-socket configurations. ThunderX2 is optimized for high computational performance, delivering outstanding memory bandwidth and memory capacity. The new line of ThunderX2 processors includes multiple SKUs for both scale-up and scale-out applications and is fully compliant with the Armv8-A architecture specification as well as the Arm Server Base System Architecture (SBSA) and Arm Server Base Boot Requirements (SBBR) standards.
The ThunderX2 SoC family is supported by a comprehensive software ecosystem ranging from platform-level systems management and firmware to commercial operating systems, development environments and applications. Cavium has actively engaged in server industry standards groups such as UEFI and delivered numerous reference platforms to a broad array of community and corporate partners. Cavium has also demonstrated its leadership role in the open source software community, driving upstream kernel enablement and toolchain optimization, actively contributing to Linaro's Enterprise and Networking Groups, investing in key Linux Foundation projects such as DPDK, OpenHPC, OPNFV and Xen, and sponsoring the FreeBSD Foundation's Armv8 server implementation.
Hewlett Packard Enterprise (HPE) unveils its HPC-optimized Cavium ThunderX2 ARM-powered high performance computing platforms. The Apollo 70 brings disruptive ARM HPC processor technology with maximum memory bandwidth, familiar management and performance tools, and the density and scalability required for large HPC cluster deployments. HPE Labs also shows The Machine, which is likewise powered by a Cavium ThunderX2; it is HPE's vision for the future of computing, since by 2020 one hundred billion connected devices will generate far more demand for computing than today's infrastructure can accommodate.
The Machine is a custom-built device made for the era of big data. HPE says it has created the world's largest single-memory computer. The R&D program is the largest in the history of HPE, the former enterprise division of HP that split apart from the consumer-focused division. If the project works, it could be transformative for society, but it is no small effort, as it could require a whole new kind of software. HPE's prototype can accommodate up to 160 terabytes of memory, enough to simultaneously work with the data held in every book in the Library of Congress five times over, or approximately 160 million books. According to HPE, it has never been possible to hold and manipulate whole data sets of this size in a single-memory system, and this is just a glimpse of the immense potential of Memory-Driven Computing. Following the Gen-Z Consortium's vision, and based on the current prototype, HPE expects the architecture can scale to an exabyte-scale single-memory system and, beyond that, to a nearly limitless pool of memory: 4,096 yottabytes. For context, that is 250,000 times the entire digital universe today. With that amount of memory, HPE says it would be possible to simultaneously work with every digital health record of every person on earth, every piece of data from Facebook, every trip of Google's autonomous vehicles, and every data set from space exploration all at the same time, getting to answers and uncovering new opportunities at unprecedented speeds.
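Taking HPE's own figures at face value, the arithmetic behind those comparisons works out roughly as follows (these are back-of-the-envelope checks, not HPE's published calculations):

$$\frac{160\ \text{TB}}{160\times 10^{6}\ \text{books}} \approx 1\ \text{MB per book}, \qquad \frac{4{,}096\ \text{YB}}{250{,}000} \approx 16.4\ \text{ZB},$$

the latter being the implied size of today's digital universe in HPE's comparison.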
Wu Feng, co-founder of the Green500, talks about the challenges of reaching exascale through energy-efficient, massively parallel supercomputing, at the Denver Supercomputing 2017 conference. Wu Feng is a Professor and Turner Fellow of Computer Science with additional appointments in Electrical & Computer Engineering, Health Sciences, and Biomedical Engineering and Mechanics at Virginia Tech (VT). At VT, he directs the Synergy Laboratory, which conducts research at the synergistic intersection of systems software, middleware, and application software; of particular note is his high-performance computing (HPC) research in the areas of green supercomputing, accelerator-based parallel computing, and bioinformatics. Prior to joining VT, he spent seven years at Los Alamos National Laboratory, where he began his journey in green supercomputing in 2001 with Green Destiny, a 240-node supercomputer that fit in five square feet and consumed only 3.2 kW of power when booted diskless. This work ultimately created the impetus for the Green500.