eMAG is a family of high-performance ARM server processors designed by Ampere Computing. Ampere’s introduction of eMAG to the market follows on from, and completes, the X-Gene3 design begun by AppliedMicro. eMAG processors target server workloads that can take advantage of a high core count with high throughput. First generation eMAG processors are based on the Skylark microarchitecture, a design started by AppliedMicro. Fabricated on TSMC’s 16FF+ process, those processors feature up to 32 cores operating at up to 3.3 GHz. Memory: DDR4 channels at up to 2666 MT/s with ECC, up to 1 TiB per socket. I/O: 42 PCIe Gen 3 lanes. TDP: up to 125 W. Second generation eMAG processors are planned for 2019. Those chips will be based on Ampere’s Quicksilver microarchitecture and bring an array of new features and improvements, developed in part by the staff that Ampere hired from Qualcomm’s ARM Server team.
SmugMug achieves 40% cost savings by migrating their photo-serving tier to EC2 A1 instances. SmugMug was able to move their software stack (PHP, Nginx, HAProxy) to A1 instances with minimal effort, and getting everything up and running on A1 was like working with any other EC2 instance.
The Neoverse N1 CPU is optimized for a wide range of cloud native server workloads executing at a world-class compute efficiency. This enables an infrastructure transformation where processing is pushed to the edge where data is generated, thereby providing more scalability than moving all data to centralized datacenters.
The Arm Neoverse E1 CPU delivers best-in-class throughput efficiency. It incorporates a new simultaneous multithreading (SMT) microarchitecture design. With SMT, the processor can execute two threads concurrently, resulting in better aggregate throughput.
The Neoverse E1 delivers 2.1x more compute performance, 2.7x more throughput performance and 2.4x better throughput efficiency compared to the Cortex-A53. The design is highly scalable to support throughput demands for next generation edge to core data transport.
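As a rough reading of those figures, and assuming that "throughput efficiency" means throughput per watt (the text does not define it), the two throughput numbers together imply how much more power the E1 would draw than the Cortex-A53. The helper name below is made up for this sketch and does not come from Arm.

```python
def implied_power_ratio(throughput_gain: float, efficiency_gain: float) -> float:
    """Power ratio implied by a throughput gain and a throughput-per-watt gain.

    Assumes efficiency = throughput / power, so power = throughput / efficiency.
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return throughput_gain / efficiency_gain


# Figures quoted above: 2.7x throughput, 2.4x throughput efficiency.
ratio = implied_power_ratio(2.7, 2.4)
print(f"Implied power relative to Cortex-A53: {ratio:.3f}x")
```

Under that assumption the E1 would use only slightly more power than the Cortex-A53 while moving 2.7x the data, but this is an inference from the quoted ratios, not a published figure.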
Jon Masters, Computer Architect and Chief Arm Architect at Red Hat, has extensive experience with the deeper levels of CPU and software optimization, adapting and preparing the ecosystem for cloud and supercomputing. He talks about all the latest Arm servers, including upcoming systems based on the Marvell ThunderX2, Qualcomm Centriq 2400 and Ampere eMAG, compares ARM at 10nm with Intel at 14nm, describes his involvement in fighting the industry’s Meltdown and Spectre vulnerabilities, and explains some of the latest work done by Linaro in this space.
Fujitsu A64FX is the new fastest Arm processor in the world. Built on 7nm, it delivers 2.7 TFLOPS per chip, making it suitable for high-end HPC and AI, and Fujitsu aims to use it to create the world’s fastest supercomputer by 2021. A64FX is the first processor using the new Armv8-A Scalable Vector Extension (SVE) to accelerate a wide range of large-scale scientific computing, including deep learning. Fujitsu is working closely with Linaro to enrich the Arm HPC ecosystem. A64FX will be featured in the post-K computer, a supercomputer being developed by Fujitsu and RIKEN as a successor to the K computer, which achieved the world’s highest performance in 2011. The organizations are striving to achieve post-K application execution performance up to 100 times that of the K computer. It offers a number of features, including broad utility supporting a wide range of applications, massive parallelization through the Tofu interconnect, low power consumption, and mainframe-class reliability.
You can watch Fujitsu’s keynote at Linaro Connect here
The Works on Arm cluster is run by Packet for Arm to provide test, development, and data center CI/CD resources for community projects to build on arm64. The project also includes weekly video office hours, a weekly newsletter, and a channel on the Packet Community Slack and Freenode IRC (#worksonarm) for community discussion.
Tao Wang, Leader of the Talent Development Working Group at the Green Computing Consortium, talks about the consortium’s work to bring better energy efficiency to the Chinese server market. China might mandate that ARM servers be used to reduce the power consumption of cloud services, for which demand is growing very fast in China. Filmed at Linaro Connect Hong Kong.
You can find the slideshow about this here: https://www.slideshare.net/linaroorg/hkg18319-dr-tao-wang-gcc-step-into-green-computing-cornsortium
HXT Semiconductor is a partnership between Qualcomm and the local government of China’s Guizhou Province to create ARM server chipsets for the Chinese market. HXT is working with Linaro in the open source community, within the Linaro Enterprise Group (LEG), to get good Linux support on the ARM server. Announcements of HXT ARM server products are still to come.
Arm ServerReady is a program to make sure that the ecosystem is enabled to support the ARM server, so that all the operating systems just work and can be installed without a lot of patches. ODMs and silicon providers are asked to work with ARM to comply with the standards so that everything works as expected. Linaro LEG also ran an SBSA QEMU effort that is well aligned with the Arm ServerReady program, letting people run the tests even before the hardware is available.
You can find the slideshow about this here: https://www.slideshare.net/linaroorg/hkg18317-arm-server-ready-program
GIGABYTE shows their Cavium ThunderX2 Workstation, an upcoming product for ARM software developers to optimize their code for the ARM server market. It will feature dual Cavium ThunderX2 processors with 4 channels of RDIMM/LRDIMM DDR4 2666/2400 MHz memory per socket and a total capacity of 16 DIMMs. Networking will include a dedicated MLAN port. Other specifications are still being adjusted for the final product, which GIGABYTE and Cavium are discussing with potential customers in order to meet demand.
The R181-T90 is a 1U dual-socket general purpose ThunderX2 rack server with 8 channels of RDIMM / ECC UDIMM DDR4 memory, 24 x DIMM slots, 1 x 25GbE SFP28 LAN port, 1 x 10GbE SFP+ LAN port (optional), 12 x 2.5” hot-swap HDD bays, 2 x OCP mezzanine slots (PCIe 3.0 x16), an Aspeed AST2500 management controller, and a 1+1 1600W 80 PLUS Platinum PSU.
The R281-T91 is a 2U dual-socket general purpose ThunderX2 rack server with 8 channels of RDIMM / ECC UDIMM DDR4 memory, 24 x DIMM slots, 1 x 25GbE SFP28 LAN port, 1 x 10GbE SFP+ LAN port (optional), 24 x 2.5” hot-swap HDD bays, 8 x PCIe 3.0 expansion slots, an Aspeed AST2500 management controller, and a 1+1 1600W 80 PLUS Platinum PSU. The R181-T90 and R281-T91 will be available to order from July 2018.
The H261-T60 is a 2U, 4-node density-optimized ThunderX2 server with dual ThunderX2 CN9975 sockets in each node (8 x sockets in total) and rear access to the node trays. The sockets will support CPUs of up to 195W TDP. Each node supports 4 channels of RDIMM / ECC UDIMM DDR4 memory, with 64 x DIMM slots for the system in total. The system contains a total of 8 x SFP28 10G/25G LAN ports, 4 dedicated management ports, 12 x 3.5” SATA/SAS hot-swap HDD/SSD bays, 8 x low profile PCIe Gen3 expansion slots and 4 x OCP Gen3 mezzanine slots, and includes an Aspeed AST2500 remote management controller and a 1+1 2200W 80 PLUS Platinum redundant PSU. The H261-T60 will be available for shipping in late September or early October 2018.
Find more information on GIGABYTE’s server products at http://b2b.gigabyte.com
Patrick Kennedy, Editor-in-Chief at ServeTheHome.com talks about the independent benchmarks on ThunderX2 that he published at ServeTheHome.com as Cavium announced General Availability of the ThunderX2 ARM Server at their event in San Francisco last month.
The ThunderX2 family includes over 40 different SKUs for both scale up and scale out applications, ranging from top bin 32 core 2.5GHz parts to 16-core 1.6GHz parts, mapping directly across Intel’s Xeon Skylake server CPUs from highest end Platinum to low end SKUs. With list prices for volume SKUs (32 core 2.2GHz and below) ranging from $1795 to $800, the ThunderX2 family offers 2-4X better performance per dollar compared to Xeon Skylake family of processors. The ThunderX2 family is fully compliant with Armv8-A architecture specifications as well as the Arm Server Base System Architecture and Arm Server Base Boot Requirements standards. The ThunderX2 SoC family is supported by a comprehensive software ecosystem, ranging from platform level systems management and firmware to commercial Operating Systems, Development Environments and Applications. Cavium has actively engaged in server industry standards groups such as UEFI and delivered numerous reference platforms to a broad array of community and corporate partners. Cavium has also demonstrated its leadership role in the Open Source software community driving upstream kernel enablement and toolchain optimization, actively contributing to Linaro’s Enterprise and Networking Groups, investing in key Linux Foundation projects such as DPDK, OpenHPC, OPNFV and Xen and sponsoring the FreeBSD Foundation’s Armv8 server implementation.
The new Arm Allinea Studio release is a comprehensive and integrated tools suite to help Scientific computing, HPC and Enterprise developers to achieve best performance on modern server-class Arm-based platforms. Check out https://developer.arm.com/hpc for more info.
Singularity enables users to have full control of their environment. Singularity containers can be used to package entire scientific workflows, software and libraries, and even data. This means that you don’t have to ask your cluster admin to install anything for you – you can put it in a Singularity container and run. Did you already invest in Docker? The Singularity software can import your Docker images without having Docker installed or being a superuser. Need to share your code? Put it in a Singularity container and your collaborator won’t have to go through the pain of installing missing dependencies. Do you need to run a different operating system entirely? You can “swap out” the operating system on your host for a different one within a Singularity container. As the user, you are in control of the extent to which your container interacts with its host. There can be seamless integration, or little to no communication at all. Read more: http://singularity.lbl.gov/index.html
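As a minimal sketch of the Docker import described above: Singularity’s `exec` subcommand accepts a `docker://` URI, so a command can be run inside a container built from a Docker Hub image without Docker being installed. The helper function name and the `ubuntu:18.04` image are illustrative choices for this example, not something taken from the Singularity documentation, and the command is only executed if a `singularity` binary is found on the PATH.

```python
import shutil
import subprocess


def singularity_exec_argv(docker_image, command):
    """Build the argv that runs `command` inside a container made from a Docker image.

    `docker_image` is a Docker Hub reference such as "ubuntu:18.04" (illustrative).
    """
    return ["singularity", "exec", f"docker://{docker_image}", *command]


# Run a different operating system's userland on the host: print its release info.
argv = singularity_exec_argv("ubuntu:18.04", ["cat", "/etc/os-release"])
print(" ".join(argv))

if shutil.which("singularity"):
    # Only attempt the run when Singularity is actually installed.
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    print(result.stdout or result.stderr)
else:
    print("singularity not found on PATH; command not executed")
```

On a host running a different distribution, the output of `cat /etc/os-release` would show the container’s operating system rather than the host’s, which is the "swap out" behaviour the paragraph describes.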
Here I film at the UK launch of the Blade Shadow cloud PC gaming service at the Meltdown London e-sports bar, where they have set up a bunch of Shadow PCs running their cloud-based gaming PC streaming service, here powering their League of Legends tournament. Blade Shadow is a French startup that I also interviewed at CES. They have now launched their service in the UK and have also activated their West Coast USA server to serve customers in California. For about $35 per month you get remote access to “your own” Xeon gaming desktop with an Nvidia GTX1080 GPU, 12GB RAM and a 256GB SSD, running Windows 10 Pro.
Since 2011, Mont-Blanc has pushed the adoption of Arm technology in High Performance Computing by deploying Arm-based prototypes, enhancing the system software ecosystem and projecting the performance of current systems in order to develop new, more powerful and less power-hungry HPC platforms based on Arm SoCs.
In the talk, Filippo introduces the latest Mont-Blanc system, called Dibona, designed and integrated by Bull/ATOS, the coordinator and industrial partner of the project. He also talks about tests performed at BSC of the Arm software tools (HPC compiler and mathematical libraries) as well as the Dynamic Load Balancing (DLB) technique and the MUltiscale Simulator Architecture (MUSA).
At SC17, Qualcomm and Mellanox jointly showcase the Mellanox ConnectX-5, a super-fast 100 Gb/s networking card. This Arm-based solution is enabled and ready for the most demanding compute and storage workloads on the Qualcomm Centriq 2400, the world’s first 10nm server processor.
The Qualcomm Centriq 2400 processor, the world’s first and only 10nm server processor, is based on the Qualcomm Falkor CPU, QDT’s own Armv8-based custom CPU core design, and delivers leading-edge aggregate performance. It also delivers phenomenal performance-per-dollar. With a list price of $1,995, the 48-core Qualcomm Centriq 2460 processor delivers 4X better performance-per-dollar versus Intel’s highest-performance Skylake processor, the Intel Xeon Platinum 8180. With a list price of $1,373, the 46-core Qualcomm Centriq 2452 processor offers 3X better performance-per-dollar versus the Intel Xeon Gold 6152. And, with a list price of $888, the 40-core Qualcomm Centriq 2434 processor offers 2X better performance-per-dollar versus the Intel Xeon Silver 4116. Qualcomm Centriq 2400 delivers 2.5x better performance per watt than a competing x86 server processor running the same SPECint_rate2006 benchmark.
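To see what a performance-per-dollar claim like these implies about raw performance, the ratio can be unwound with the two list prices. The sketch below is only illustrative: the function name is made up, and the competitor price of $10,009 for the Xeon Platinum 8180 is an assumption that does not appear in the text above.

```python
def implied_relative_performance(ppd_ratio, price, competitor_price):
    """Raw performance relative to a competitor, implied by a perf-per-dollar claim.

    If (perf / price) = ppd_ratio * (competitor_perf / competitor_price),
    then perf / competitor_perf = ppd_ratio * price / competitor_price.
    """
    if price <= 0 or competitor_price <= 0:
        raise ValueError("prices must be positive")
    return ppd_ratio * price / competitor_price


# ASSUMPTION: competitor list price in USD; not stated in the text above.
ASSUMED_XEON_8180_PRICE = 10009.0

# Centriq 2460: $1,995 list price and a claimed 4X performance-per-dollar.
relative = implied_relative_performance(4.0, 1995.0, ASSUMED_XEON_8180_PRICE)
print(f"Implied Centriq 2460 performance vs Xeon Platinum 8180: {relative:.2f}x")
```

Under that assumed price, a 4X performance-per-dollar advantage corresponds to somewhat lower raw performance at roughly a fifth of the cost. The result depends entirely on the price and benchmark used for the comparison.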