Machine Inference Performance

The core selling point of the Xavier platform is its machine inference performance. The Volta GPU and the DLA blocks together represent significant processing power in a compact, low-power platform.

To demonstrate the machine learning inference prowess of the system, NVIDIA provides the Jetson boards with a slew of software development kits as well as hand-tuned frameworks. The TensorRT framework in particular does a lot of heavy lifting for developers and is the main API through which the GPU’s Tensor units and the DLA are put to use.

NVIDIA prepared a set of popular ML models for us to test, and we were able to precisely configure how the models were run on the platform. All the models running on the GPU and its Tensor cores could be run in quantized INT8 form, or in FP16 or FP32. The batch sizes were also configurable, but we’ve kept it simple and only showcase the results with a batch size of 32 images, as NVIDIA claims this is the more representative use-case for autonomous machines.
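For readers unfamiliar with what "quantized INT8" means in practice, here is a minimal sketch of symmetric per-tensor quantization. It only illustrates the general idea; it is not TensorRT's calibration procedure, and the function names and weight values are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale factor for the whole tensor (assumption: symmetric scheme).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical FP32 weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
# The rounding error per value is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The trade-off is visible in `max_error`: each value is stored in 8 bits instead of 32, at the cost of a small rounding error bounded by half of `scale`.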

Tegra Xavier AGX - NVIDIA TensorRT - GPU Performance

The results of the GPU benchmarks are a bit esoteric because we have few comparison points against which we can evaluate the performance of the AGX. Among the clearer results is that inferencing performance in absolute terms reaches rather high rates, particularly in the INT8 and FP16 modes, which is sufficient to run a variety of inferencing tasks on a large number of input sets per second. The only real figure we can compare to anything in the mobile market is the VGG16 result, which can be set against the AImark results in our most recent iPhone XS review, where Apple’s new NPU scored 39 inferences/second.

Tegra Xavier AGX - NVIDIA TensorRT - DLA vs GPU Performance

NVIDIA also made it possible to benchmark the DLA blocks, although this came with some caveats. The current version of the TensorRT framework is still a bit immature and doesn’t yet allow running the models in INT8 mode on the DLA, forcing us to resort to comparisons in FP16 mode. Furthermore, I wasn’t able to run the tests with the same large batch size as on the GPU, so I’ve reverted to smaller sizes of 16 and 8 where appropriate. Smaller batch sizes carry more overhead, as proportionally more time is spent on the API side of things and less on actual processing in hardware.
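The batch-size effect can be illustrated with a simple cost model in which every batch pays a fixed per-call overhead on top of the per-image processing time. This is a sketch only: the overhead and per-image figures below are hypothetical and are not measurements from the AGX.

```python
def throughput(batch_size, overhead_ms, per_image_ms):
    """Images per second when each batch pays a fixed per-call overhead."""
    batch_time_ms = overhead_ms + batch_size * per_image_ms
    return batch_size / batch_time_ms * 1000.0


OVERHEAD_MS = 2.0   # hypothetical fixed API/launch cost per batch
PER_IMAGE_MS = 1.0  # hypothetical hardware time per image

for batch in (8, 16, 32):
    rate = throughput(batch, OVERHEAD_MS, PER_IMAGE_MS)
    print(f"batch {batch:2d}: {rate:.1f} images/s")
```

Under this model the fixed cost is spread over more images as the batch grows, so a batch of 8 is penalised more than a batch of 32 even though the hardware time per image is identical.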

The performance of the DLA blocks at first glance seems a bit disappointing, as it is just a fraction of what the Volta GPU is able to showcase. However, raw performance isn’t the main task of the DLA: it serves as a specialized offloading block which is able to operate at higher efficiency points than the GPU. Unfortunately, I wasn’t able to directly measure the power differences between the GPU and the DLA, as introducing my power measurement equipment into the DC power input of the board led to system instabilities, particularly during the current spikes when the benchmarks were launching their workloads. The GPU inference workloads did see board power reach around 45W in its peak performance mode.

NVIDIA's VisionWorks Demos

All the talk about the machine vision and inferencing capabilities of the platform can be something that’s very hard to grasp if you don’t have a more intimate knowledge of the use-cases in the industry. Luckily, NVIDIA’s VisionWorks SDK comes with a slew of example demos and source code projects that one can use as a baseline for one’s commercial applications. Compiling the demos was a breeze as everything was set up for us on the review platform.

Alongside the demo videos, I also opted to showcase the power consumption of the Jetson AGX board. Here we’re measuring the power of the platform at the 19V DC power input with the board in its maximum unlimited performance mode. I had the board’s own fan disabled (it can be annoyingly loud) and instead used an externally-powered 120mm bench fan blowing onto the kit. As a baseline, the board used ~8.7-9W while sitting idle, actively outputting to a 1080p screen via HDMI and connected to Gigabit Ethernet.

The first demo showcases the AGX’s feature tracking capabilities. The input source is a pre-recorded video to facilitate testing. While the video output was limited to 30fps, the algorithm itself was running at upwards of 200-300fps. I did see quite a wide range of jitter in the algorithm fps, although this could be attributed to scheduling noise due to the short duration of the workload while in a limited FPS output mode. In terms of power, we see total system consumption hover around 14W, representing an active power increase of 5W above idle.

The second demo is an application of a Hough transform filter which serves as a feature extraction algorithm for further image analysis. Similarly to the first demo, the algorithm can run at a very high framerate on a single stream, but usually we’d expect a real use-case to use multiple input streams. Power consumption again is in the 14W range for the platform with an average active power of ~4.5W.
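To make the idea concrete, here is a minimal pure-Python sketch of the classic line-detecting Hough transform. It is not the VisionWorks implementation; the function names and the synthetic edge points are hypothetical. Each edge point votes for every line that could pass through it, parameterised as rho = x·cos(theta) + y·sin(theta), and the accumulator cell with the most votes identifies the dominant line.

```python
import math


def hough_lines(points, width, height, theta_steps=180):
    """Accumulate votes in (rho, theta) space for a list of (x, y) edge points."""
    max_rho = int(math.hypot(width, height))
    # Rows are rho values shifted by max_rho (rho can be negative); columns are angles.
    accumulator = [[0] * theta_steps for _ in range(2 * max_rho + 1)]
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            accumulator[rho + max_rho][t] += 1
    return accumulator, max_rho


def strongest_line(accumulator, max_rho, theta_steps=180):
    """Return (rho, theta in degrees, votes) for the cell with the most votes."""
    votes, row, t = max(
        (votes, row, t)
        for row, cells in enumerate(accumulator)
        for t, votes in enumerate(cells)
    )
    return row - max_rho, 180.0 * t / theta_steps, votes


# Hypothetical edge points lying on the horizontal line y = 5.
edge_points = [(x, 5) for x in range(100)]
acc, max_rho = hough_lines(edge_points, width=100, height=100)
rho, theta_deg, votes = strongest_line(acc, max_rho)
```

For this synthetic input the strongest cell corresponds to a line whose normal points straight along the y axis (theta of 90 degrees) at a distance of 5 pixels from the origin, with one vote per edge point.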

The motion estimation demo determines motion vectors of moving objects in a stream, a relatively straightforward use-case in automotive applications.
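As a rough illustration of what a motion vector is, the following sketch uses exhaustive block matching with a sum-of-absolute-differences cost. This is a textbook technique chosen for simplicity and is an assumption on my part; it is not necessarily how the VisionWorks demo computes its vectors, and all names and frame contents are hypothetical.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
        for j in range(size)
        for i in range(size)
    )


def motion_vector(prev, curr, x, y, size=2, search=2):
    """Find the (dx, dy) shift at which the block at (x, y) in prev best matches curr."""
    height, width = len(curr), len(curr[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            nx, ny = x + dx, y + dy
            # Skip candidate blocks that would fall outside the frame.
            if 0 <= nx <= width - size and 0 <= ny <= height - size:
                cost = sad(prev, curr, x, y, nx, ny, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]


def make_frame(x, y, width=8, height=8):
    """Build a black frame containing one small textured 2x2 object at (x, y)."""
    frame = [[0] * width for _ in range(height)]
    frame[y][x], frame[y][x + 1] = 200, 150
    frame[y + 1][x], frame[y + 1][x + 1] = 100, 50
    return frame


prev_frame = make_frame(2, 2)
curr_frame = make_frame(4, 3)  # the object moved 2 px right and 1 px down
vector = motion_vector(prev_frame, curr_frame, 2, 2)
```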

The fourth VisionWorks demo is a computational implementation of EIS (electronic image stabilisation), where, given an input video stream, the system crops out margins of the frame and uses this space as a stabilisation window within which the resulting output stream can elastically bounce, reducing smaller juddery motions.
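The crop-window idea can be sketched in a few lines along a single axis. This is a simplified illustration under my own assumptions (a moving-average smoother and hypothetical pixel values), not the algorithm used by the VisionWorks demo: the camera path is smoothed, and the crop window is shifted by the difference between the smoothed and the measured position, limited to the size of the cropped margin.

```python
def stabilise(offsets, margin, window=3):
    """Return per-frame crop-window shifts (pixels) that counteract camera shake.

    offsets: measured camera position per frame along one axis, in pixels.
    margin:  pixels cropped from each side; the window cannot move further than this.
    window:  number of frames in the moving average used as the smooth path.
    """
    half = window // 2
    shifts = []
    for i, actual in enumerate(offsets):
        neighbours = offsets[max(0, i - half): i + half + 1]
        smooth = sum(neighbours) / len(neighbours)
        # Move the crop window from the measured position towards the smooth path.
        correction = smooth - actual
        # The margin is the only room available, so clamp the shift to it.
        shifts.append(max(-margin, min(margin, correction)))
    return shifts


shaky_path = [0, 0, 30, 0, 0]  # hypothetical jolt of 30 px in the middle frame
shifts = stabilise(shaky_path, margin=10)
```

The clamp is the reason EIS only helps with smaller motions: a jolt larger than the margin cannot be fully absorbed by the crop window.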

Finally, the most impressive demo which NVIDIA provided was the “DeepStream” demo. Here we see a total of 25 720p video input streams played back simultaneously while the system performs basic object detection in every single one of them. This workload represents a much more realistic heavy use-case that is able to take advantage of the processing power of the AGX module. As you might expect, power consumption of the board also rose dramatically, averaging around 40W (31W active work).
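The "active" power figures quoted for the demos are simply the measured board power minus the idle baseline. The short calculation below reproduces them from the numbers in the text, assuming the upper end of the ~8.7-9W idle range; the per-stream figure is my own rough derivation and not something that was measured.

```python
IDLE_W = 9.0  # assumption: upper end of the ~8.7-9W idle range quoted above

# Total board power at the DC input, as quoted for each demo.
measured_total_w = {
    "feature tracking": 14.0,
    "DeepStream (25 streams)": 40.0,
}

# Active power is whatever the workload adds on top of the idle baseline.
active_w = {name: total - IDLE_W for name, total in measured_total_w.items()}

# Rough derived figure: active power spent per 720p stream in the DeepStream demo.
per_stream_w = active_w["DeepStream (25 streams)"] / 25

for name, watts in active_w.items():
    print(f"{name}: {watts:.1f}W active")
print(f"DeepStream per stream: {per_stream_w:.2f}W")
```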


51 Comments


  • xype - Friday, January 4, 2019 - link

    AnandTech is my reminder to turn the ad blocker back on if I turned it off for some reason. It’s insane how big of an improvement in experience it is to block ads on AnandTech. Reply
  • Cellar Door - Friday, January 4, 2019 - link

    It is just a matter of time before we will get a message 'turn off your adblocker to proceed' - at that point I will abandon this site. For now, ublock origin keeps this site in check for me. Reply
  • DanNeely - Friday, January 4, 2019 - link

    FYI, 99% of the time I've found I could block notice complaining about having blocked various 3rd party malware distribution domains and still read the site with my crap blockers running. Reply
  • TheinsanegamerN - Friday, January 4, 2019 - link

    Or just use the anti ad blocker blocker in ublock origin. Reply
  • HollyDOL - Friday, January 4, 2019 - link

    I have to admit, AT taught me to install adblock, the level of ad annoyance climbed too high for me.
    I am still willing to pay a sub for a spam-free AT access.
    Reply
  • linuxgeex - Friday, November 8, 2019 - link

    It was THG that got me using AdBlock, but these days I turn off AdBlock on most of the sites I frequent and instead rely on ScriptSafe and Stylus to selectively disable the cruft. It's a little more work for me, but it allows sites I care about to still get revenue from the less annoying ad content, and I cross my fingers that they will learn to insert less annoying ads. Animated = blocked. Sound = blocked. Video = blocked. Causes content to jump around while loading = blocked. Inserts ads that look like navigation features = blocked (I'm looking at You, Google) Reply
  • Ryan Smith - Friday, January 4, 2019 - link

    "Why are there video ads automatically playing on each one of the Anandtech pages?"

    Our publisher (Future) has decided that they want to have this ad unit on every page. Unfortunately there's not much more I can say than that; it's their call.
    Reply
  • thesavvymage - Friday, January 4, 2019 - link

    :( Reply
  • thesavvymage - Friday, January 4, 2019 - link

    Could you at least speak to them on ad appropriateness? Mine are the usual low effort clickbait spam ads, or "The One Thing All Cheaters Have In Common" and "Seattle: Cable Companies are furious over this tiny device".

    Like I understand your publishers have to advertise, but crappy advertising like this gets the adblock treatment, point blank. It's an extremely frustrating experience for what is supposed to be a professional site.
    Reply
  • Ryan Smith - Saturday, January 5, 2019 - link

    "Could you at least speak to them on ad appropriateness?"

    It's something we discuss on a regular basis. Like any other ad-supported operation we're largely at the whims of the overall advertising market: who is willing to buy ads and at what price. On the whole, advertisers are being very cautious right now, especially with written publications.

    Future's size helps a lot with this, since they're a top publisher and can move some very large deals. Not that it's a dire situation or anything nearly like that, but continual erosion in ad rates makes it difficult to get any ads rolled back.
    Reply
