
ExNull-Tyelor [.. ] and by also setting the Animation Update setting to "In Update" rather than "Manual Update", I was able to get between 40-50 FPS depending on which SkeletonAnimation we are rendering.

Did you perhaps confuse the two update modes? In Update would be the default, and Manual Update would mean that no automatic update happens and you need to manually call update on the SkeletonAnimation component yourself.

    Mario Could you (or your artists) show the stats from the metrics view for each skeleton?

    Hey, certainly! Here is the Metrics View for each of the three skeletons we have rendering on the GameScene:

    Player:

    Opponent:

    Crowd:

    Mario As a general guideline, have a look at the performance section and see how it applies to your specific setup: http://en.esotericsoftware.com/spine-metrics#Performance

    I've read through those guidelines before while we were still working in Cocos2d-x, but it never hurts to read through them again, maybe I missed something or forgot something crucial to the performance here (though these same skeletons ran quite well in Cocos2d-x compared to Unity).

    Harald Did you perhaps confuse the two update modes? In Update would be the default, and Manual Update would mean that no automatic update happens and you need to manually call update on the SkeletonAnimation component yourself.

    No, and I apologize if my question was confusing. I initially had the skeletons set to Manual Update mode and was calling Update on the skeleton myself within a controller class. I had meant that I commented out the manual calls to Update in the controller class's Update method and switched the skeletons to In Update instead. I'm not sure whether this actually affected performance, however. I can confirm that lowering the number of skeletons rendered from 3 to 2 to 1 gave a performance improvement each time, though. Nearly up to 55 45 FPS for just the Player skeleton.

      Thanks for posting the additional info. In general, the skeleton metrics values are all on the high side, which however might be justified depending on your scene setup, having only three skeletons visible. Obviously it would always help to reduce the vertex, bone, constraint, etc. counts, and especially avoiding using clipping polygons.

      FYI: There is also a section on performance recommendations on the spine-unity documentation page here:
      https://esotericsoftware.com/spine-unity#Performance

      ExNull-Tyelor Opponent:

      Please note that the opponent skeleton is using a clipping polygon, and also has a high vertex count of 1900 vertices. So if the clipping polygon could be avoided, this would be much recommended as mentioned in the documentation. You could have a quick try whether disabling clipping at the opponent SkeletonAnimation component improves the situation by changing the Inspector to Debug mode and at the SkeletonAnimation component disabling Use Clipping. (Afterwards you can change the Inspector back to normal mode of course).

      Please also be sure to measure FPS and general timings with Deep profile setting disabled and with release builds, otherwise profiler augmentations might distort the actual timings.

        Harald Thanks for posting the additional info. In general, the skeleton metrics values are all on the high side, which however might be justified depending on your scene setup, having only three skeletons visible. Obviously it would always help to reduce the vertex, bone, constraint, etc. counts, and especially avoiding using clipping polygons.

        If you want a bit of extra context, check out Election Year Knockout. We are making a sequel to our game, and so the Spine characters are essentially the entire game, hence the high metric count.

        Harald FYI: There is also a section on performance recommendations on the spine-unity documentation page here:

        Thanks! I went and read through there, but didn't see any "a-ha!"s or gotchas that I think I might've missed there unfortunately...

        Harald You could have a quick try whether disabling clipping at the opponent SkeletonAnimation component improves the situation by changing the Inspector to Debug mode and at the SkeletonAnimation component disabling Use Clipping.

        Harald Please also be sure to measure FPS and general timings with Deep profile setting disabled and with release builds, otherwise profiler augmentations might distort the actual timings.

        I went and gave these a quick test, making sure I wasn't using a development build/profiling. With only the Opponent spine rendering, and nothing else, we get around 28 FPS on the Android TV regardless of whether clipping is enabled or disabled (we are targeting 60 FPS in game and 30 FPS in menu). The moment another SkeletonAnimation is added (either the player or crowd) we lose about 10 FPS.

        Something else I'd like to point out is that the exact same Player and Crowd Skeletons, along with different Opponent Skeletons are currently used in Cocos2d-x and run at around 60 FPS on the same Android TV. I don't think any of us anticipated such a massive difference in performance for the Spine Runtimes between C++/Cocos2d-x and C#/Unity, and it's quite disheartening to be honest since most of our players on Election Year Knockout are on Android TV...

          ExNull-Tyelor I wouldn't necessarily put the blame on the runtime itself, it sounds like the Unity renderer might be at fault here, especially if the Cocos2d-x version runs smoothly. Our Cocos2d-x renderer is actually a custom thing that directly calls into OpenGL.

          I'm not super familiar with Unity's renderer options, but my guess is whatever you selected might be too much for Android TV devices. These devices generally use SoCs with GPUs that are terrible when it comes to alpha blended overdraw and complex fragment shaders. If you're using a Unity renderer that uses a complex fragment shader internally, that might be what is causing the performance issues. What Unity version and renderer are you using?

            Mario I wouldn't necessarily put the blame on the runtime itself, it sounds like the Unity renderer might be at fault here, especially if the Cocos2d-x version runs smoothly. Our Cocos2d-x renderer is actually a custom thing that directly calls into OpenGL.

            If that were indeed the case I'd expect the bottleneck to be in the 'GPU Usage' and not the 'CPU Usage' in the Unity Profiler, however I see the exact opposite. Specifically in my first post you can see the calls on the CPU that are taking the most time are all stemming from SkeletonAnimation.LateUpdate with over 55 ms used to LateUpdate the Skeleton's mesh.
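For readers following the numbers, the frame-time arithmetic behind this observation can be sketched as below. This is only an illustration: the function names are hypothetical, and it assumes the 55 ms reported by the profiler is real per-frame CPU cost rather than profiler overhead.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at a given target frame rate."""
    return 1000.0 / target_fps


def fps_ceiling(cpu_ms_per_frame):
    """Upper bound on FPS if the CPU alone needs this many milliseconds per frame."""
    return 1000.0 / cpu_ms_per_frame


print(round(frame_budget_ms(60), 2))  # 16.67 ms budget for the 60 FPS in-game target
print(round(frame_budget_ms(30), 2))  # 33.33 ms budget for the 30 FPS menu target
print(round(fps_ceiling(55.0), 1))    # 18.2 FPS at best with 55 ms spent in LateUpdate
```

A 55 ms LateUpdate would be more than three times the 60 FPS budget on its own, which is why the measurement conditions (Deep Profile, development build) matter so much when reading such a number.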

            Mario I'm not super familiar with Unity's renderer options, but my guess is whatever you selected might be too much for Android TV devices. These devices generally use SoCs with GPUs that are terrible when it comes to alpha blended overdraw and complex fragment shaders. If you're using a Unity renderer that uses a complex fragment shader internally, that might be what is causing the performance issues. What Unity version and renderer are you using?

            We're not using HDRP/URP or anything like that, just the default renderer that Unity is packaged with for a 2D Mobile Project template. Specifically we are using Unity 2021.3.20f1 at the moment, one of the currently supported and suggested versions for Unity. We have the Adaptive Performance package as well, but haven't done anything with that API yet. As far as I'm aware this should be one of the most lightweight setups for Unity rendering and Spine.

            We also aren't doing any complex blending or anything like that on our Crowd nor our Player, and are just using the default Spine Shaders/Materials for those as well. On the opponent we are doing a slightly modified version of the Spine shader for outlines, but even with only the Player and Crowd instantiated we get about _ FPS (on a release build without profiling). The Crowd and Player do both have their slots colored programmatically on their Awake calls, to allow for player customization and matching the crowd to the background colors, but this is only done once on Awake. We also did this same exact thing, on the same exact skeletons, in EYKO using Cocos2d-x, so it seems very unlikely that this could be the issue (especially since the bottleneck definitely appears to be in the CPU usage). 😞

            Edit: After profiling without Deep Profiling (as I had done for my very first post), I do indeed see the issue is likely GPU bound, though I'm struggling to see how Unity's renderer is so much less efficient than Cocos2d-x's OpenGL renderer. I know that Unity's renderer is heavy but I still assumed far better performance than this on TV. Removing the use of an outline (or custom rim-light shader) does actually get us about 10 extra FPS: from an unstable 20 to around 30 FPS (though extremely unstable) with all three skeletons. However, we are still not getting 60 FPS. The profiler now indicates that we are likely GPU bound, since the biggest CPU call is now Gfx.WaitForPresentOnGfxThread, which means the CPU is waiting a while for the GPU to finish rendering. Why this might be the case I'm not sure, as without any Skeletons these scenes easily hit a stable 60 FPS.

            With the nature of our game we can't really afford to change the way the artists create these skeletons, and I don't really see much way to simplify our renderer or shaders much more than they already are to get better performance, unless there is a simplified Spine shader suitable for our use case on Mobile/TV devices.

            Edit 2: I've opened a forum post here on the Unity Forums to hopefully get some opinions, suggestions, and recommendations from Unity-specific experts/developers.

              ExNull-Tyelor We're not using HDRP/URP or anything like that, just the default renderer that Unity is packaged with for a 2D Mobile Project template. Specifically we are using Unity 2021.3.20f1 at the moment, one of the currently supported and suggested versions for Unity. We have the Adaptive Performance package as well, but haven't done anything with that API yet. As far as I'm aware this should be one of the most lightweight setups for Unity rendering and Spine.

              Please note that the standard render pipeline is not the most lightweight pipeline. Universal Render Pipeline (URP), which is the successor of the Lightweight Render Pipeline (LWRP), is usually the first recommendation for mobile devices. It's the lightweight counterpart to High Definition Render Pipeline (HDRP).

              Could you please share a screenshot of your Project Settings - Player settings? You could have a try whether using the IL2CPP scripting backend improves the situation.

                Harald Could you please share a screenshot of your Project Settings - Player settings? You could have a try whether using the IL2CPP scripting backend improves the situation.

                Certainly! (Sorry the image is so long, the spoiler tag doesn't shorten the empty space the image takes up on these forums, or I'd have hidden it in a spoiler.)

                After setting our scripting backend to IL2CPP (which I thought I already did, but must've done only for iOS), as well as disabling the outline shader effect, we get... an unstable 30 FPS still with all three skeletons (though it might be more stable than without IL2CPP). Most of the time here is still being spent waiting for the GPU in the script as far as I can tell (Gfx.WaitForPresentOnGfxThread).

                Some other testing has shown that using URP, with the default Spine URP Example 2D Scene from Spine, I only get about 15 FPS on the Android TV (while profiling), with most of the time being spent waiting for the GPU in the script (Gfx.WaitForPresentOnGfxThread).

                After disabling all the point and directional lights in that scene I get around 57-59 FPS (while profiling) but with spikes down to 30 FPS, seemingly on a cyclical thing. Again these spikes seem to also be due to time being spent waiting for the GPU (Gfx.WaitForPresentOnGfxThread).

                I'll report back here once I test all three skeletons rendering with URP next.

                Edit: Welp. Trying URP with the three skeletons, Player, Opponent, and Crowd, is not boding well so far.

                Using URP with the default Spine Skeleton shaders only gives us about 15 FPS now, whereas the default renderer with default shaders got us around 30 FPS (albeit unstably). Using URP with the Universal Render Pipeline/2D/Spine/Sprite shader gives us a whopping 6 FPS 😳 Then finally using URP with the Universal Render Pipeline/2D/Spine/SkeletonLit shader gives us 10 FPS. GPU Instancing doesn't seem to make a difference here either.

                In all three of these tests there were no normal maps, emission maps, fragment shaders (outside of any you've used for your shaders in Spine's Unity Package), no light sources, etc. It was quite literally three skeletons, a camera, a global light 2d, and a single canvas with a single TextMeshPro on it to display the FPS averaged over the last 60 frames. About as simple of a scene as we could make for this test.
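An FPS readout "averaged over the last 60 frames" like the one described above can be sketched as a rolling window over frame times. This is a language-neutral illustration in Python, not the actual test script: the class and method names are hypothetical, and in a Unity project the per-frame delta would presumably come from something like Time.unscaledDeltaTime.

```python
from collections import deque


class FpsAverager:
    """Rolling FPS average over the most recent `window` frames (hypothetical helper)."""

    def __init__(self, window=60):
        # A bounded deque discards the oldest frame time once the window is full.
        self.frame_times = deque(maxlen=window)

    def add_frame(self, delta_seconds):
        self.frame_times.append(delta_seconds)

    def average_fps(self):
        total = sum(self.frame_times)
        # Frames divided by elapsed time; 0.0 until at least one frame is recorded.
        return len(self.frame_times) / total if total > 0 else 0.0


averager = FpsAverager(window=60)
for _ in range(60):
    averager.add_frame(1.0 / 30.0)  # simulate sixty frames at 30 FPS
print(round(averager.average_fps(), 1))
```

Dividing the frame count by the summed frame time (rather than averaging per-frame FPS values) keeps a few very short frames from inflating the result.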

                After disabling all the point and directional lights in that scene

                How many point and directional lights do you have? I assume you had no such thing in Cocos2d-x. The way those are implemented in Unity's renderers means your fragment shader is likely super complex, which would explain the high GPU side load on those low-performance Android TV chipsets.

                  Mario How many point and directional lights do you have? I assume you had no such thing in Cocos2d-x. The way those are implemented in Unity's renderers means your fragment shader is likely super complex, which would explain the high GPU side load on those low-performance Android TV chipsets.

                  We have no lighting in either our Cocos2d-x game or our Unity game. I was referring to Spine's URP 2D Example Scene which is preloaded with several point and directional lights. (Specifically the one that has two StretchymanURPs and the RaptorProURP skeleton in it.)

                  Edit: I've updated the post above with more testing using the URP with our three skeletons (the Player, Opponent, and Crowd). The results were... less than promising 😞

                  Edit 2: I'd also like to add that I just attempted to disable all post-processing, shadows, and lighting in the URP Asset data, and only get about 20 FPS with that setup (though it's still unstable with spikes down to 15 FPS).

                  @ExNull-Tyelor Very sorry to hear that the situation did not improve with using the URP pipeline! To be sure, could you perhaps share the settings of the used Universal Render Pipeline Asset that is assigned at Project Settings - Graphics - Scriptable Render Pipeline Settings. Also please check whether under Project Settings - Quality there is no Render Pipeline Asset assigned which overrides the one in the Graphics section. If you haven't already, please be sure to set everything under the Lighting section to the minimal settings, to e.g. not use multiple secondary per-pixel lights.

                  Additionally, could you please share your Material settings you are using at your skeletons? Does the situation change if you are using an unlit URP shader (either Universal Render Pipeline/Spine/Skeleton)?

                    Harald To be sure, could you perhaps share the settings of the used Universal Render Pipeline Asset that is assigned at Project Settings - Graphics - Scriptable Render Pipeline Settings.

                    Sure! Here are the settings I used for my URP Asset during testing:

                    Then the settings in the Renderer Data Asset are:

                    Harald Also please check whether under Project Settings - Quality there is no Render Pipeline Asset assigned which overrides the one in the Graphics section.

                    I assigned the URP Asset to each Render Pipeline Asset slot for each Quality Level in Project Settings - Quality and not in the Project Settings - Graphics since I assumed this would give us greater control to have more complex Render Pipelines for Desktop builds vs Mobile builds. Should I assign it through the Graphics settings instead?

                    Harald Additionally, could you please share your Material settings you are using at your skeletons? Does the situation change if you are using an unlit URP shader (either Universal Render Pipeline/Spine/Skeleton)?

                    That was one of the shaders I was attempting to use for this experiment actually. The settings I'm using are:

                    I'm using the same general settings for the other two skeletons materials as well (one for the Opponent and two for the Player).

                      Hate to be a bother, but any extra help or information you might be able to offer @Harald ?

                      A small test showed a rotating capsule was able to hit 60 FPS on this Android TV while using URP. Though the RenderFrame was taking nearly 10 ms 😳

                        ExNull-Tyelor Sorry for the wait! Yesterday was a public holiday and Harald will be back today, so please wait a little longer for him.

                          Sorry for the late reply!

                          ExNull-Tyelor I assigned the URP Asset to each Render Pipeline Asset slot for each Quality Level in Project Settings - Quality and not in the Project Settings - Graphics since I assumed this would give us greater control to have more complex Render Pipelines for Desktop builds vs Mobile builds. Should I assign it through the Graphics settings instead?

                          No, that's fine as well, you just need to be sure which render pipeline asset will be used, to tweak the actually used asset.

                          While I don't think it will improve the situation, you could have a try changing Light Blend Styles entry Rim to Mask Texture Channel from R to None. I doubt that any unnecessary pass will be triggered if no lights are active at all, but just to be sure no unnecessary overhead is generated.

                          ExNull-Tyelor That was one of the shaders I was attempting to use for this experiment actually. The settings I'm using are:

                          Here you are using a lit 2D shader (truncated, but likely Universal Render Pipeline/2D/Spine/Skeleton Lit). Please use an unlit non-2D shader if you don't need any lighting applied. Like Universal Render Pipeline/Spine/Skeleton.

                            Misaki ExNull-Tyelor Sorry for the wait! Yesterday was a public holiday and Harald will be back today, so please wait a little longer for him.
                            Harald Sorry for the late reply!

                            Not a problem at all, and I apologize for trying to bother you all on your holiday!

                            Harald Here you are using a lit 2D shader (truncated, but likely Universal Render Pipeline/2D/Spine/Skeleton Lit). Please use an unlit non-2D shader if you don't need any lighting applied. Like Universal Render Pipeline/Spine/Skeleton.

                            Interestingly, changing the Skeleton's shaders to the one you suggested, Universal Render Pipeline/Spine/Skeleton, causes the Skeletons to stop rendering entirely.

                            Edit: Actually both the Universal Render Pipeline/Spine/Skeleton and Universal Render Pipeline/Spine/Skeleton Lit cause the Skeleton to stop rendering entirely, as does Universal Render Pipeline/Spine/Sprite. Though Universal Render Pipeline/Spine/Outline/Skeleton-OutlineOnly does render only the outline of the Skeleton. Only the Spine shaders under Universal Render Pipeline/2D/Spine/ (Skeleton Lit and Sprite) are actually rendering somewhat properly, but the Sprite shader creates outlines between all the symbols that aren't in the actual atlas file.

                            Harald While I don't think it will improve the situation, you could have a try changing Light Blend Styles entry Rim to Mask Texture Channel from R to None. I doubt that any unnecessary pass will be triggered if no lights are active at all, but just to be sure no unnecessary overhead is generated.

                            Doing this with the Lit Skeleton shaders doesn't seem to give any noticeable improvement in performance. Though I can't seem to use the un-lit shaders like you suggested above to see if there is a difference there...

                              Not a problem at all, and I apologize for trying to bother you all on your holiday!

                              No need to apologize, how should you know! 🙂

                              ExNull-Tyelor Interestingly, changing the Skeleton's shaders to the one you suggested, Universal Render Pipeline/Spine/Skeleton, causes the Skeletons to stop rendering entirely.

                              Terribly sorry, my mistake, for some reason I incorrectly assumed that the unlit Universal Render Pipeline/Spine/Skeleton shader would render normally with 2D renderer, however it is not finding the respective shader pass and does not render anything at all. We have just pushed a commit to the 4.1 branch which adds an unlit URP 2D shader, available under Universal Render Pipeline/2D/Spine/Skeleton.

                              A new Spine URP Shaders 4.1 UPM package is available for download here as usual:
                              https://esotericsoftware.com/spine-unity-download
                              Please let us know if this improves the situation.

                                Harald Terribly sorry, my mistake, for some reason I incorrectly assumed that the unlit Universal Render Pipeline/Spine/Skeleton shader would render normally with 2D renderer, however it is not finding the respective shader pass and does not render anything at all. We have just pushed a commit to the 4.1 branch which adds an unlit URP 2D shader, available under Universal Render Pipeline/2D/Spine/Skeleton.

                                A new Spine URP Shaders 4.1 UPM package is available for download here as usual:
                                https://esotericsoftware.com/spine-unity-download
                                Please let us know if this improves the situation.

                                Hey no problem at all! I'm happy to report that after updating the package and then switching the shaders to Universal Render Pipeline/2D/Spine/Skeleton it basically doubles the frame rate from 15 FPS to 30-34 FPS on our Android TV device while profiling a Development Build!

                                Still not the 60 FPS we had hoped for, but we can probably get away with having a lower FPS for TVs 😅

                                Thank you and the rest of the Esoteric Software team so much for the help you have provided us! If you ever have any other help, suggestions, improvements or fixes that you think might help us out, we'd love to hear them as well.

                                For a little bit more info, the frame debugger is showing that two of our meshes can't be SRP batched, but I'm not entirely sure why. Could it have something to do with this CBUFFER thing, since I assume the Android TV wouldn't support Vulkan?

                                The Gfx.WaitForPresentOnGfxThread seems to be quicker on a cyclical pattern as well, which seems odd to me. You can see this on the CPU Usage with these Gfx.PresentFrame spikes as well, taking between 13ms and 28ms to complete, which causes Gfx.WaitForPresentOnGfxThread to fluctuate between 5ms and 22ms.

                                Anywho, thank you again so much for the help you've already given us and for this un-lit 2d skeleton patch 😃

                                  ExNull-Tyelor Hey no problem at all! I'm happy to report that after updating the package and then switching the shaders to Universal Render Pipeline/2D/Spine/Skeleton it basically doubles the frame rate from 15 FPS to 30-34 FPS on our Android TV device while profiling a Development Build!

                                  Very glad to hear that it improved the situation, thanks for the info and for your kind words! 🙂 Please also be sure to judge the final timings with Development Build disabled.

                                  ExNull-Tyelor For a little bit more info the frame debugger is showing that two of our meshes can't be SRP batched, but I'm not entirely sure why. Could it have something to do with this CBUFFER thing, since I assume the Android TV wouldn't support Vulkan.

                                  This is indeed strange. Which Graphics API are you using in your Player settings? You could explicitly set a graphics API like OpenGLES3 instead of "Auto" to be sure which one is used. We will do some more investigations on our end, so far we received the expected SRP batching of skeletons with the new unlit shader on our end when using e.g. OpenGLES3.

                                  Apart from that, I see 29 draw calls in your profiler screenshot, but you mentioned you only have 3 skeletons active. I assume that each of your skeletons is not using a single atlas page texture but multiple. How many are you using per skeleton? While I'm not sure how you set up the skins of your characters, you could consider either packing combined skins to a single atlas page, or grouping attachment images differently to atlases (or changing draw order, if possible), so that you don't have as many necessary texture switches, like avoiding ABABABA and instead grouping to e.g. AABBBAA which will then result in draw calls ABA.
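The ABABABA versus AABBBAA point can be illustrated with a small sketch. It assumes, as a simplification, that a new draw call starts only when the atlas page changes between consecutive attachments in draw order; other causes of splits (such as blend mode changes) are ignored, and the function name is hypothetical.

```python
from itertools import groupby


def draw_call_pages(page_sequence):
    """Collapse consecutive attachments that share an atlas page into one draw call."""
    return [page for page, _ in groupby(page_sequence)]


interleaved = "ABABABA"  # attachments alternate between atlas pages A and B
grouped = "AABBBAA"      # same seven attachments, grouped by page in draw order

print(len(draw_call_pages(interleaved)))  # 7 draw calls
print("".join(draw_call_pages(grouped)))  # ABA, i.e. 3 draw calls
```

With three skeletons and 29 draw calls, the same counting applied to each skeleton's draw order would show where the texture switches come from.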

                                  ExNull-Tyelor The Gfx.WaitForPresentOnGfxThread seems to be quicker on a cyclical pattern as well, which seems odd to me. You can see this on the CPU Usage with these Gfx.PresentFrame spikes as well, taking between 13ms and 28ms to complete, which causes Gfx.WaitForPresentOnGfxThread to fluctuate between 5ms and 22ms.

                                  A quick guess is that this could be due to unlucky timing when waiting for VSync, taking a bit longer and then having to wait for the next VSync timepoint.
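That guess can be sketched numerically. The example below assumes a 60 Hz display and strict VSync, where a finished frame is only presented at the next refresh boundary; both are assumptions for illustration, and the function names are hypothetical.

```python
import math


def presented_frame_time_ms(render_ms, refresh_hz=60.0):
    """Round a frame's render time up to the next VSync boundary."""
    interval = 1000.0 / refresh_hz
    # A frame always occupies at least one full refresh interval.
    intervals = max(1, math.ceil(render_ms / interval))
    return intervals * interval


def fps(frame_ms):
    return 1000.0 / frame_ms


print(round(fps(presented_frame_time_ms(13.0))))  # 60: fits within one 16.67 ms interval
print(round(fps(presented_frame_time_ms(17.0))))  # 30: just misses, waits for the next VSync
print(round(fps(presented_frame_time_ms(28.0))))  # 30
```

Under these assumptions, frame times that hover around one refresh interval flip between 60 and 30 FPS, which would look like the cyclical spikes described above.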