ExNull-Tyelor Hey no problem at all! I'm happy to report that after updating the package and then switching the shaders to Universal Render Pipeline/2D/Spine/Skeleton it basically doubles the frame rate from 15 FPS to 30-34 FPS on our Android TV device while profiling a Development Build!

Very glad to hear that it improved the situation, thanks for the info and for your kind words! 🙂 Please also be sure to judge the final timings with Development Build disabled.

ExNull-Tyelor For a little bit more info the frame debugger is showing that two of our meshes can't be SRP batched, but I'm not entirely sure why. Could it have something to do with this CBUFFER thing, since I assume the Android TV wouldn't support Vulkan.

This is indeed strange. Which Graphics API are you using in your Player settings? You could explicitly set a graphics API like OpenGLES3 instead of "Auto" to be sure which one is used. We will do some more investigations on our end, so far we received the expected SRP batching of skeletons with the new unlit shader on our end when using e.g. OpenGLES3.
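One way to confirm which API the device actually ends up using is to log it at startup. This is a minimal sketch: the component name GraphicsApiLogger is hypothetical, while SystemInfo.graphicsDeviceType and SystemInfo.graphicsDeviceVersion are standard Unity properties.

```csharp
using UnityEngine;

// Hypothetical helper component (not part of spine-unity).
// Logs the graphics API that the running player selected.
public class GraphicsApiLogger : MonoBehaviour
{
    void Start()
    {
        // Reports a GraphicsDeviceType value such as OpenGLES3 or Vulkan.
        Debug.Log("Active graphics API: " + SystemInfo.graphicsDeviceType);
        // Reports the driver-level version string of the active API.
        Debug.Log("Graphics device version: " + SystemInfo.graphicsDeviceVersion);
    }
}
```

Attach it to any GameObject in the first scene and read the output in the device log (for example via adb logcat) when running on the Android TV.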

Apart from that, I see 29 draw calls in your profiler screenshot, but you mentioned you only have 3 skeletons active. I assume that each of your skeletons is not using a single atlas page texture but multiple. How many are you using per skeleton? While I'm not sure how you set up the skins of your characters, you could consider either packing combined skins to a single atlas page, or grouping attachment images differently to atlases (or changing draw order, if possible), so that you don't have as many necessary texture switches, like avoiding ABABABA and instead grouping to e.g. AABBBAA which will then result in draw calls ABA.
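The ABABABA versus AABBBAA point can be illustrated with a small, engine-independent sketch. It assumes each character stands for the atlas page (material) of one attachment in draw order, and that a new draw call is needed whenever the page changes. The class and method names are hypothetical and this is not the spine-unity implementation.

```csharp
// Hypothetical illustration: estimates draw calls for a skeleton from the
// sequence of atlas pages used by its attachments in draw order.
public static class DrawCallEstimator
{
    // Each character in drawOrder names the atlas page of one attachment.
    // Consecutive attachments on the same page share a single draw call.
    public static int CountDrawCalls(string drawOrder)
    {
        int drawCalls = 0;
        char previousPage = '\0';
        foreach (char page in drawOrder)
        {
            if (page != previousPage)
            {
                drawCalls++;          // page switch: a new draw call starts
                previousPage = page;
            }
        }
        return drawCalls;
    }
}

// DrawCallEstimator.CountDrawCalls("ABABABA") returns 7
// DrawCallEstimator.CountDrawCalls("AABBBAA") returns 3 (draw calls A, B, A)
```

The same seven attachments cost seven draw calls in the alternating order but only three once attachments from the same page are adjacent in the draw order.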

ExNull-Tyelor The Gfx.WaitForPresentOnGfxThread seems to be quicker on a cyclical pattern as well, which seems odd to me. You can see this on the CPU Usage with these Gfx.PresentFrame spikes as well, taking between 13ms and 28ms to complete, which causes Gfx.WaitForPresentOnGfxThread to fluctuate between 5ms and 22ms.

A quick guess is that this could be due to unlucky timing when waiting for VSync, taking a bit longer and then having to wait for the next VSync timepoint.

    Harald This is indeed strange. Which Graphics API are you using in your Player settings? You could explicitly set a graphics API like OpenGLES3 instead of "Auto" to be sure which one is used. We will do some more investigations on our end, so far we received the expected SRP batching of skeletons with the new unlit shader on our end when using e.g. OpenGLES3.

    I had it set to use Vulkan only, since on Android with Unity only Vulkan supports Dynamic Resolution. Setting it back to only OpenGLES3 causes the spikes above 60 FPS to become more narrow, gaining us about 2 FPS interestingly enough.

    Harald Apart from that, I see 29 draw calls in your profiler screenshot, but you mentioned you only have 3 skeletons active. I assume that each of your skeletons is not using a single atlas page texture but multiple. How many are you using per skeleton? While I'm not sure how you set up the skins of your characters, you could consider either packing combined skins to a single atlas page, or grouping attachment images differently to atlases (or changing draw order, if possible), so that you don't have as many necessary texture switches, like avoiding ABABABA and instead grouping to e.g. AABBBAA which will then result in draw calls ABA.

    Is there some more information about this somewhere that I can give to the artists? I'm certain that this is what's happening, as only one of the skeletons has more than one atlas and uses skins (the player spine uses two atlases with several skin variations). In the inspector because of this it creates 29 materials for this one skeleton. But I don't do the art or work with Spine directly myself as the engineer on the team, so I'm not exactly sure how to explain this to the artists or help them solve this issue. Some additional context is that we are also dynamically coloring many of the region and mesh attachments for the player's skeleton as well. Would this, in addition to the second atlas, be causing the ABABABA[...] rendering loop, or is it solely on how the spine itself was created by the artists/how the atlas was exported.

    Rendering only the other two skeletons, opponent and crowd, brings us up to 50 FPS as well.

    Harald A quick guess is that this could be due to unlucky timing when waiting for VSync, taking a bit longer and then having to wait for the next VSync timepoint.

    Unfortunately that couldn't be it, as not only do I have VSync disabled, but both Every VBlank and Every Second VBlank are both ignored on Android, iOS, and tvOS according to the Project Settings window.

    Another curiosity is that the crowd skeleton is the only one I can see in the Frame Debugger. No frame that I click on seems to contain the opponent skeleton anywhere that I can see (and it uses an entirely different atlas than the crowd does). This is a screenshot of the frame debugger without the player skeleton rendering:

    Harald Very glad to hear that it improved the situation, thanks for the info and for your kind words! 🙂 Please also be sure to judge the final timings with Development Build disabled.

    In testing this I found with all three skeletons the Release Build had no noticeable improvement in FPS over the Development Build oddly enough. Perhaps because this scene and project are so simple?

      ExNull-Tyelor I had it set to use Vulkan only, since on Android with Unity only Vulkan supports Dynamic Resolution. Setting it back to only OpenGLES3 causes the spikes above 60 FPS to become more narrow, gaining us about 2 FPS interestingly enough.

      Thanks for the info! We will try if we can reproduce this issue using Vulkan as rendering API then.

      ExNull-Tyelor Is there some more information about this somewhere that I can give to the artists? I'm certain that this is what's happening, as only one of the skeletons has more than one atlas and uses skins (the player spine uses two atlases with several skin variations).

      You can find a short explanation here:
      https://esotericsoftware.com/spine-unity#Material-Switching-and-Draw-Calls
      Improving the draw order or grouping attachments differently to atlas page textures could certainly reduce the number of draw calls when you have only few atlas textures (i.e. few Materials displayed at the bottom in the skeleton's Inspector). What does the current list of materials at your skeleton look like in the Inspector, do you only see two materials alternating many times, or do you see many different materials (e.g. because you're using blend modes at certain slots and slot material overrides)? In any case, grouping attachments (ordering them so that they are in sequence in the draw order) that require the same material would "fuse" two or more draw calls to a single one.

      You might also check whether runtime repacking fits all your used attachments into a single atlas texture; see the sub-section "Runtime Repacking" here:
      https://esotericsoftware.com/spine-unity#Combining-Skins

      ExNull-Tyelor Some additional context is that we are also dynamically coloring many of the region and mesh attachments for the player's skeleton as well.

      If you only use slot colors and not slot material overrides (the CustomMaterialOverride property), this should not matter, since it's only writing different vertex colors then. It can be done in a single draw call.
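As a rough sketch of the slot-color approach described above: the component below tints one slot by writing its color values, which only changes vertex colors and does not create an extra material. It assumes the spine-unity 4.1/4.2 API, where Slot exposes R, G, B and A properties. The component name and the slot name "torso" are hypothetical.

```csharp
using UnityEngine;
using Spine;
using Spine.Unity;

// Hypothetical example component: tints a single slot via its slot color.
public class SlotTintExample : MonoBehaviour
{
    public SkeletonAnimation skeletonAnimation;
    public string slotName = "torso"; // hypothetical slot name
    public Color tint = Color.red;

    void Start()
    {
        Slot slot = skeletonAnimation.Skeleton.FindSlot(slotName);
        if (slot == null)
        {
            Debug.LogWarning("Slot not found: " + slotName);
            return;
        }
        // Slot colors end up in the mesh's vertex colors, so the slot keeps
        // using the same material as before.
        slot.R = tint.r;
        slot.G = tint.g;
        slot.B = tint.b;
        slot.A = tint.a;
    }
}
```

Note that animations which key the slot color, or a call to SetSlotsToSetupPose, will overwrite these values again.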

      BTW: Are you using Additive blend mode at some slots perhaps? If so, and if you're not already using the default PMA workflow, you might want to switch from straight alpha to PMA, then additive slots can be rendered in a single draw-call.

      ExNull-Tyelor Unfortunately that couldn't be it, as not only do I have VSync disabled, but both Every VBlank and Every Second VBlank are both ignored on Android, iOS, and tvOS according to the Project Settings window.

      Ok, thanks for the info.

      ExNull-Tyelor Rendering only the other two skeletons, opponent and crowd, brings us up to 50 FPS as well.

      Hm, this unfortunately still does not sound stellar. 🤕

      ExNull-Tyelor Another curiosity is that the crowd skeleton is the only one I can see in the Frame Debugger. No frame that I click on seems to contain the opponent skeleton anywhere that I can see (and it uses an entirely different atlas than the crowd does). This is a screenshot of the frame debugger without the player skeleton rendering:

      That's strange indeed. You could do a test in the Unity Editor and watch the Game view while going back and forth in the draw-call-timeline at the top of the screenshot, then you should see when the crowd appears in the list of draw calls. If the list is completely different, that won't help of course..

      ExNull-Tyelor In testing this I found with all three skeletons the Release Build had no noticeable improvement in FPS over the Development Build oddly enough. Perhaps because this scene and project are so simple?

      Ok, that at least saves compile time for different builds when testing then 🙂. A quick guess is that it could be because it's GPU limited and any optimizations would mainly affect performance on the CPU side. Anyway, that's just speculation of course. 🙂

        Thanks, I'll give this a read through and show it to the artists as well!

        Harald What does the current list of materials at your skeleton look like in the Inspector, do you only see two materials alternating many times, or do you see many different materials (e.g. because you're using blend modes at certain slots and slot material overrides)? In any case, grouping attachments (ordering them so that they are in sequence in the draw order) that require the same material would "fuse" two or more draw calls to a single one.

        I see the same two materials alternating back and forth 29 times, which is why I had assumed that this could be part of the issue and that "fusing" these 29 draw calls into 2 or 3 I'd think would get us closer to the 50 FPS with all three skeletons rendering. We also aren't doing any type of blending or anything with these spines, just the "Normal" blending mode.

        Harald You might also check whether runtime repacking fits all your used attachments into a single atlas texture; see the sub-section "Runtime Repacking" here:
        https://esotericsoftware.com/spine-unity#Combining-Skins

        I can try that out when we start using the skins we've implemented in this test scene, but right now I have only applied a single skin through the SkeletonAnimation inspector window. I might be doing something wrong too, because a very quick test I threw together to test the Runtime Repacking feature gave me a malformed atlas and several errors in the Console window with the following format:
        Graphics.CopyTexture called with region not fitting in source element: trying to copy from region (x:2218, y:1147, width:194, height:136) of mip 0 (size: 2048x2048)

        I did set those textures to Read/Write as well, and it didn't improve the situation there.

        The script I used for this test was mostly CnP from your Spine Examples for Mix and Match, but here it is anyways in case you spot something I'm doing entirely wrong. My biggest guess as to what I'm doing wrong is that I'm using the wrong material, since the skeleton uses two materials (alternating back and forth 29 times as I mentioned earlier).

        using UnityEngine;
        using Spine.Unity;
        using Spine.Unity.AttachmentTools;
        
        public class DummyRepacker : MonoBehaviour
        {
            public SkeletonAnimation Skeleton;
        
            public Texture2D runtimeAtlas;
            public Material runtimeMaterial;
            
            void Start()
            {
                // Create a repacked skin.
                Spine.Skin repackedSkin = new Spine.Skin("repacked skin");
                Spine.Skin skin = Skeleton.skeleton.Data.FindSkin("male");
                repackedSkin.AddSkin(skin);
                skin = Skeleton.skeleton.Data.FindSkin("male_head_1");
                repackedSkin.AddSkin(skin);
        
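                // Repack all attachments of the combined skin into one runtime atlas texture,
                // using the primary material of the first atlas asset as the material template.
                repackedSkin = repackedSkin.GetRepackedSkin("repacked skin", Skeleton.SkeletonDataAsset.atlasAssets[0].PrimaryMaterial, out runtimeMaterial, out runtimeAtlas);

                // Use the repacked skin (setting it once is sufficient).
                Skeleton.Skeleton.SetSkin(repackedSkin);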
                Skeleton.Skeleton.SetSlotsToSetupPose();
                Skeleton.AnimationState.Apply(Skeleton.Skeleton); // skeletonMecanim.Update() for SkeletonMecanim
        
                // You can optionally clear the cache after multiple repack operations.
                AtlasUtilities.ClearCache();
                Resources.UnloadUnusedAssets();
            }
        }

        Harald That's strange indeed. You could do a test in the Unity Editor and watch the Game view while going back and forth in the draw-call-timeline at the top of the screenshot, then you should see when the crowd appears in the list of draw calls. If the list is completely different, that won't help of course..

        Sorry about that, it was just my misunderstanding of how the Frame Debugger is displayed. With both the Crowd and the Scam Caller they get properly rendered with SRP Batch in two draw calls, and if I disable one or the other I see a single draw call in SRP Batch now. So it's only the player spine that isn't getting properly batched, even with OpenGLES3 enabled.

        Harald Ok, that at least saves compile time for different builds when testing then 🙂. A quick guess is that it could be because it's GPU limited and any optimizations would mainly affect performance on the CPU side. Anyway, that's just speculation of course. 🙂

        Actually that makes a lot of sense, I hadn't thought of that. The CPU is barely being hit in the tests I'm running, so everything is GPU bound. I'm sure that the Unity Profiler uses hardly any GPU bandwidth, and likely just eats up CPU time instead.

          ExNull-Tyelor I see the same two materials alternating back and forth 29 times, which is why I had assumed that this could be part of the issue and that "fusing" these 29 draw calls into 2 or 3 I'd think would get us closer to the 50 FPS with all three skeletons rendering. We also aren't doing any type of blending or anything with these spines, just the "Normal" blending mode.

          Thanks for the additional info, that's good news regarding potential improvement, as 29 draw calls for a single skeleton sounds close to worst-case order. Then we would definitely recommend changing draw order or grouping of attachment images to atlas pages.

          ExNull-Tyelor I can try that out when we start using the skins we've implemented in this test scene, but right now I have only applied a single skin through the SkeletonAnimation inspector window. I might be doing something wrong too, because a very quick test I threw together to test the Runtime Repacking feature gave me a malformed atlas and several errors in the Console window with the following format:
          Graphics.CopyTexture called with region not fitting in source element: trying to copy from region (x:2218, y:1147, width:194, height:136) of mip 0 (size: 2048x2048)

          I did set those textures to Read/Write as well, and it didn't improve the situation there.

          Did you check whether (apart from "Read/Write Enabled") the other requirements are met? You can find them in the note box listing "Important Note: If repacking fails or creates unexpected results, it is most likely due to any of the following causes:" in the Runtime Repacking section.

          If all of these are met, could you perhaps send us a minimal Unity project that still shows this issue? You can send it as a zip package to contact@esotericsoftware.com, briefly mentioning this forum thread URL so that we know the context.

          ExNull-Tyelor Sorry about that, it was just my misunderstanding of how the Frame Debugger is displayed. With both the Crowd and the Scam Caller they get properly rendered with SRP Batch in two draw calls, and if I disable one or the other I see a single draw call in SRP Batch now. So it's only the player spine that isn't getting properly batched, even with OpenGLES3 enabled.

          No need to apologize. That's good to hear at least. Then merging at least some of the player-skeleton's draw calls by avoiding unnecessary atlas page texture switches should improve the situation.

            Harald Thanks for the additional info, that's good news regarding potential improvement, as 29 draw calls for a single skeleton sounds close to worst-case order. Then we would definitely recommend changing draw order or grouping of attachment images to atlas pages.
            Harald No need to apologize. That's good to hear at least. Then merging at least some of the player-skeleton's draw calls by avoiding unnecessary atlas page texture switches should improve the situation.

            When the artist is ready to make this change for the new player character skeleton, I'll be sure to let you know how much of an improvement, if any, that we see 🙂

            Harald Did you check whether (apart from "Read/Write Enabled") the other requirements are met? You can find them in the note box listing "Important Note: If repacking fails or creates unexpected results, it is most likely due to any of the following causes:" in the Runtime Repacking section.

            If all of these are met, could you perhaps send us a minimal Unity project that still shows this issue? You can send it as a zip package to [contact@esotericsoftware.com](mailto:contact@esotericsoftware.com), briefly mentioning this forum thread URL so that we know the context.

            As far as I can tell all of these requirements are met, except that the Player's second atlas isn't a power of two texture. But Unity claims it does default the Texture Import Setting Non-Power of Two to None, so I believe the final requirement is fulfilled there as well. I do say claim, because I cannot actually see the Non-Power of Two setting anywhere on any of the textures I selected, nor in the Project/Unity Settings for defaults.

            I'm sending the minimal Unity project over to your support email right now. Thanks again for all your help so far!

              ExNull-Tyelor I'm sending the minimal Unity project over to your support email right now. Thanks again for all your help so far!

              Thanks for the reproduction project, we received everything.

              The problem is that the player_character texture is of size 4096x4096 while the "Max Size" settings are set to 2048x2048. If you increase max size accordingly to 4096x4096, repacking finishes without any error.
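The Graphics.CopyTexture error quoted earlier in the thread follows from simple bounds arithmetic: the atlas region coordinates refer to the original 4096x4096 page, while the imported texture was capped at 2048x2048. A minimal sketch of that check, with hypothetical class and method names (this is not how spine-unity or Unity implement it):

```csharp
// Hypothetical illustration of why the copy region did not fit.
public static class RegionCheck
{
    // Returns true if the rectangle lies completely inside the texture.
    public static bool FitsInTexture(int x, int y, int width, int height,
                                     int textureWidth, int textureHeight)
    {
        return x >= 0 && y >= 0
            && x + width <= textureWidth
            && y + height <= textureHeight;
    }
}

// Region from the error message, against the capped import size:
// RegionCheck.FitsInTexture(2218, 1147, 194, 136, 2048, 2048) returns false
// Same region, after raising "Max Size" to 4096:
// RegionCheck.FitsInTexture(2218, 1147, 194, 136, 4096, 4096) returns true
```

Here x + width is 2218 + 194 = 2412, which exceeds 2048 but fits within 4096.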

                Harald Thanks for the reproduction project, we received everything.

                The problem is that the player_character texture is of size 4096x4096 while the "Max Size" settings are set to 2048x2048. If you increase max size accordingly to 4096x4096, repacking finishes without any error.

                Doh! 🤦‍♂️ I figured it would be something simple like that, that I overlooked. Thanks for the quick response!

                I'll report back when I gauge the performance after atlas repacking on Android TV.

                Sorry, I tried to edit my post, but somehow lost my permissions to edit my own posts?

                Any who, reporting back with my findings now. Interestingly while repacking the skin does reduce the Player skeleton's draw calls to just 1, it isn't SRP Batched with the statement:
                SRP: Node is not compatible with SRP batcher

                So it doesn't get batched with the Crowd and Opponent Spines in the RenderLoop.DrawSRPBatcher call in the Frame Debugger, as shown here:

                Although the result isn't what I hoped for, it's interesting nonetheless that Atlas Repacking doesn't seem to improve the performance on this Android TV significantly, if at all, as we still get around 36 FPS on average. Nearly all of the frame time is still spent on Gfx.PresentFrame, so the GPU is still being railed by this small number of, what I'd hope to be, relatively simple draw calls... 😥

                  ExNull-Tyelor Harald A quick guess is that this could be due to unlucky timing when waiting for VSync, taking a bit longer and then having to wait for the next VSync timepoint.

                  Unfortunately that couldn't be it, as not only do I have VSync disabled, but both Every VBlank and Every Second VBlank are both ignored on Android, iOS, and tvOS according to the Project Settings window.

                  What I forgot to mention earlier is that the target Android device may simply ignore the VSync settings and force VSync (which is rather likely). It's also explicitly mentioned in this video by Unity staff:

                  Sorry, I tried to edit my post, but somehow lost my permissions to edit my own posts?

                  Sorry for the troubles. Due to many AI-created posts recently, we unfortunately had to disable editing of posts for users, sorry about that!

                  So it doesn't get batched with the Crowd and Opponent Spines in the RenderLoop.DrawSRPBatcher call in the Frame Debugger, as shown here:

                  Batching will normally only be possible when using the same texture (and shader). One exception to this is when Unity behind-the-scene packs multiple textures which are set to import type "2D/Sprite" (which is the case with your textures) to a single atlas. This packing by Unity happens at build-time however. As a result, when repacking your player skin at runtime, it ends up in a separate newly created texture, which will take up a separate draw call.

                  I'm not sure why it reads "Node is not compatible with SRP batcher" though. You might try disabling SRP batching, in which case normal batching is used instead. Sometimes (not so rarely, according to the Unity forums) this even leads to improved performance.
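For a quick A/B comparison on the device, the SRP Batcher can also be switched from script instead of through the render pipeline asset settings. This is a minimal sketch: the component name SrpBatcherToggle is hypothetical, while GraphicsSettings.useScriptableRenderPipelineBatching is the Unity property that enables or disables the SRP Batcher at runtime.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical test component: switches the SRP Batcher on or off at startup
// so that both configurations can be profiled with the same build setup.
public class SrpBatcherToggle : MonoBehaviour
{
    public bool useSrpBatcher = false;

    void Start()
    {
        GraphicsSettings.useScriptableRenderPipelineBatching = useSrpBatcher;
        Debug.Log("SRP Batcher enabled: " + GraphicsSettings.useScriptableRenderPipelineBatching);
    }
}
```

Whether this helps depends on the scene and device, so it is only worth keeping if the profiler shows an improvement.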

                  ExNull-Tyelor Although the result isn't what I hoped for, it's interesting nonetheless that Atlas Repacking doesn't seem to improve the performance on this Android TV significantly, if at all, as we still get around 36 FPS on average. Nearly all of the frame time is still spent on Gfx.PresentFrame, so the GPU is still being railed by this small number of, what I'd hope to be, relatively simple draw calls... 😥

                  Really sorry to hear that. We will perform some investigations on our side as well, we'll let you know once we figure something out.

                    ExNull-Tyelor While I haven't found any project settings which would have any noticeable impact, I discovered something else with your characters which has a massive negative impact on overdraw, and is likely the cause of the low framerate:

                    In your animations, all attachments at slots seem to be always enabled, and are "disabled" by setting the slot color alpha value to 0. However, this still renders the attachment, even if the shader then only draws fully transparent pixels. Given that mobile devices are very sensitive to overdraw, avoiding drawing invisible attachments should yield some very noticeable performance improvements.

                    So you could either:

                    • a) Change the animation data, either in the Spine Editor or via an import post-processing step (e.g. using a SkeletonDataModifierAsset): add keys to the animations that disable the attachments at slots while alpha is 0, and keys that re-enable them when they become visible (more coding work involved).
                    • b) Change the MeshGenerator code so that it does not render attachments at slots with alpha == 0.
                      To achieve that, in the file MeshGenerator.cs replace any occurrence of:
                      if (!slot.Bone.Active)
                      with:
                      if (!slot.Bone.Active || slot.A == 0.0f)
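Before choosing between the two options above, it can help to measure how many attachments are actually affected. The following diagnostic is only a sketch: the component name is hypothetical, and it assumes the spine-unity 4.1/4.2 API, where Skeleton.DrawOrder is an ExposedList of Slot and Slot exposes Attachment and A.

```csharp
using UnityEngine;
using Spine;
using Spine.Unity;

// Hypothetical diagnostic component: counts slots that still have an
// attachment set although their slot color alpha is 0. Such attachments are
// submitted for rendering but only produce fully transparent pixels.
public class InvisibleAttachmentCounter : MonoBehaviour
{
    public SkeletonAnimation skeletonAnimation;

    // Shown in the Inspector; updated once per frame.
    public int invisibleAttachmentCount;

    void LateUpdate()
    {
        Skeleton skeleton = skeletonAnimation.Skeleton;
        if (skeleton == null) return;

        ExposedList<Slot> drawOrder = skeleton.DrawOrder;
        int count = 0;
        for (int i = 0, n = drawOrder.Count; i < n; i++)
        {
            Slot slot = drawOrder.Items[i];
            if (slot.Attachment != null && slot.A == 0f) count++;
        }
        invisibleAttachmentCount = count;
    }
}
```

A consistently high count while an animation plays suggests that much of the overdraw comes from attachments hidden via alpha.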

                      Harald What I forgot to mention earlier is that the target Android device may simply ignore the VSync settings and force VSync (which is rather likely). It's also explicitly mentioned in this video by Unity staff:

                      Interesting point to note, thanks for the info! Unity should probably change that warning message to be more clear then (and show it when "Disabled" is selected still 😅 )

                      Harald Sorry for the troubles. Due to many AI-created posts recently, we unfortunately had to disable editing of posts for users, sorry about that!

                      No need to be sorry, that's totally understandable. Everyone and their dog is using ChatGPT now lol.

                      Harald Batching will normally only be possible when using the same texture (and shader). One exception to this is when Unity behind-the-scene packs multiple textures which are set to import type "2D/Sprite" (which is the case with your textures) to a single atlas. This packing by Unity happens at build-time however. As a result, when repacking your player skin at runtime, it ends up in a separate newly created texture, which will take up a separate draw call.

                      I'm not sure why it reads "Node is not compatible with SRP batcher" though. You might try disabling SRP batching, in which case normal batching is used instead. Sometimes (not so rarely, according to the Unity forums) this even leads to improved performance.

                      Ah, your explanation actually makes a lot of sense. It probably just is reading that it's not compatible with the SRP Batcher because the texture was created at runtime instead of build time, like you pointed out. I'll have a go at disabling SRP Batching though, and see what changes (if any) that makes to performance.

                      Harald Really sorry to hear that. We will perform some investigations on our side as well, we'll let you know once we figure something out.

                      No need to be sorry, you've already helped so much! Without your awesome software this game project wouldn't have been possible to begin with 😄

                      Harald In your animations, all attachments at slots seem to be always enabled, and are "disabled" by setting the slot color alpha value to 0. However, this still renders the attachment, even if the shader then only draws fully transparent pixels. Given that mobile devices are very sensitive to overdraw, avoiding drawing invisible attachments should yield some very noticeable performance improvements.

                      So you could either:
                      a) Change the animation data, either in the Spine Editor or via an import post-processing step (e.g. using a SkeletonDataModifierAsset): add keys to the animations that disable the attachments at slots while alpha is 0, and keys that re-enable them when they become visible (more coding work involved).
                      b) Change the MeshGenerator code so that it does not render attachments at slots with alpha == 0.
                      To achieve that, in the file MeshGenerator.cs replace any occurrence of:
                      if (!slot.Bone.Active)
                      with:
                      if (!slot.Bone.Active || slot.A == 0.0f)

                      Oh wow! That was the trick to get us up to a comfortable 60 FPS. Specifically I used your second option, as I had actually already assumed that was how it worked because that's how the Cocos2d-x MeshRenderer.cpp works. Thank you so much for this, that one change brought the time to present a frame on the Android TV down from 25 ms to 4 ms, which makes presenting a frame about 6.25 times faster!
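The relation between the 25 ms and 4 ms present times can be worked out explicitly. The sketch below uses hypothetical helper names and only covers the time to present a frame, not the full frame time, so the resulting values are upper bounds rather than measured frame rates.

```csharp
// Hypothetical helpers for converting frame times and comparing them.
public static class FrameTimeMath
{
    // Frames per second that a given per-frame time would allow.
    public static double MsToFps(double milliseconds)
    {
        return 1000.0 / milliseconds;
    }

    // How many times faster the "after" time is compared to "before".
    public static double Speedup(double beforeMs, double afterMs)
    {
        return beforeMs / afterMs;
    }
}

// FrameTimeMath.MsToFps(25.0) returns 40.0
// FrameTimeMath.MsToFps(4.0) returns 250.0
// FrameTimeMath.Speedup(25.0, 4.0) returns 6.25
```

A factor of 6.25 corresponds to 625% of the original rate, that is, an increase of 525%.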

                      As for why the attachments are keyed the way they are: according to the artists, when they first started the original game back in late 2018, the attachment visibility keying was buggy for them, so they had to use alpha keying to disable/enable attachments instead. That workflow then just stuck around, even after Spine improved these features to be usable for them (I started on that project in April 2019, so I have no experience with Spine or their workflow back then). But when we checked the Cocos2d-x source, it had multiple spots where the MeshRenderer returned early if the attachment or slot had an opacity of 0. I'm sure there's a good reason for this difference, but I suppose I just incorrectly assumed that Spine's Unity renderer worked similarly.

                      Thank you again so much for all your help and support, and thank everyone else on the Esoteric Team for this awesome piece of software 😄

                        ExNull-Tyelor Oh wow! That was the trick to get us up to a comfortable 60 FPS. Specifically I used your second option, as I had actually already assumed that was how it worked because that's how the Cocos2d-x MeshRenderer.cpp works. Thank you so much for this, that one change brought the time to present a frame on the Android TV down from 25 ms to 4 ms, which makes presenting a frame about 6.25 times faster!

                        Very glad to hear that it finally worked out, thanks for the quick feedback! 🙂

                        ExNull-Tyelor But when we checked the Cocos2d-x Source it had multiple spots where the MeshRenderer returned early if the attachment or slot had an opacity of 0. I'm sure there's a good reason for this difference, but I suppose I just incorrectly assumed that the Spine's Unity Renderer worked similarly.

                        While the core runtime functionality is behaving exactly the same (they are kept in sync code-wise), the engine-specific runtime wrappers for rendering, asset management, and so on differ.

                        The reason for rendering enabled attachments with an alpha value of 0 is that users of the runtime might use a custom shader and utilize the vertex alpha channel in a custom way to transport arbitrary information. However, as this is a very rare case, we will modify the MeshGenerator class accordingly (on the 4.2-beta branch going forward) so that the default option is to discard attachments with an alpha value of 0.

                        ExNull-Tyelor Thank you again so much for all your help and support, and thank everyone else on the Esoteric Team for this awesome piece of software 😄

                        Thanks very much for your kind words, really glad to hear that you like using Spine! 😃

                        10 days later

                        I TOLD YOU GUYS WEEKS AGO! Listen to your elders. 😃

                        These devices generally use SoCs with GPUs that are terrible when it comes to alpha blended overdraw and complex fragment shaders.