Changeset View
Standalone View
source/simulation2/system/ComponentDataHolder.cpp
- This file was added.
/* Copyright (C) 2022 Wildfire Games. | ||||||||||||||||
* This file is part of 0 A.D. | ||||||||||||||||
* | ||||||||||||||||
* 0 A.D. is free software: you can redistribute it and/or modify | ||||||||||||||||
* it under the terms of the GNU General Public License as published by | ||||||||||||||||
* the Free Software Foundation, either version 2 of the License, or | ||||||||||||||||
* (at your option) any later version. | ||||||||||||||||
* | ||||||||||||||||
* 0 A.D. is distributed in the hope that it will be useful, | ||||||||||||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of | ||||||||||||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||||||||||||||||
* GNU General Public License for more details. | ||||||||||||||||
* | ||||||||||||||||
* You should have received a copy of the GNU General Public License | ||||||||||||||||
* along with 0 A.D. If not, see <http://www.gnu.org/licenses/>. | ||||||||||||||||
*/ | ||||||||||||||||
#include "precompiled.h" | ||||||||||||||||
#include "ComponentDataHolder.h" | ||||||||||||||||
#include <cstdlib> | ||||||||||||||||
StanUnsubmitted Done Inline Actions
Stan: | ||||||||||||||||
Done Inline ActionsStill missing the newline :) Stan: Still missing the newline :) | ||||||||||||||||
Done Inline Actions
Stan: | ||||||||||||||||
size_t AlignRequirement(size_t size, size_t align) | ||||||||||||||||
{ | ||||||||||||||||
Done Inline ActionsAlignment must be accounted. vladislavbelov: Alignment must be accounted. | ||||||||||||||||
// https://stackoverflow.com/questions/61647725/custom-allocator-and-memory-align | ||||||||||||||||
return size % align == 0 ? size : size + (align - size % align); | ||||||||||||||||
Done Inline ActionsI suspect something should be happening with align here but I don't know what. Mercury: I suspect something should be happening with align here but I don't know what. | ||||||||||||||||
Done Inline Actionspropably: phosit: propably:
`m_Offset = std::max(size, align)`
`m_Start = aligned_alloc(align, m_PoolSize)` | ||||||||||||||||
Done Inline ActionsVulcan doesn't seem to like std::aligned_alloc, is it not on C++17? Mercury: Vulcan doesn't seem to like std::aligned_alloc, is it not on C++17? | ||||||||||||||||
Done Inline ActionsAnd std::max(size, align) is wrong, sorry for the advice. phosit: And `std::max(size, align)` is wrong, sorry for the advice. | ||||||||||||||||
Done Inline ActionsMight work with boost::align::aligned_alloc but only if you really need it. Stan: Might work with boost::align::aligned_alloc but only if you really need it. | ||||||||||||||||
Done Inline ActionsI found this expression: size % align == 0 ? size : size + (align - size % align); Might be overkill for our use case but supposedly covers all potential alignment situations. Regarding aligned_alloc I think that is only used either for interfacing with esoteric apis or as a potential performance optimization. My testing shows no performance improvement but rather a slight but consistent slowdown when using std::aligned_alloc or an equivalent cross platform function that I wrote. So it seems like malloc is the best choice. Here is the aligned alloc function I wrote, in case we do need it for some reason: std::pair<void*, void*> AlignedStorage(size_t align, size_t size) { void* storage; align = std::max(align, (size_t)32); void* start = storage = malloc(size + align); // the while loop could be replaced with bit shift while (reinterpret_cast<size_t>(start) % align != 0) start = static_cast<unsigned char*>(start) + 1; return std::make_pair(storage, start); } Mercury: I found this expression:
size % align == 0 ? size : size + (align - size % align);
on… | ||||||||||||||||
} | ||||||||||||||||
Done Inline Actions
Stan: https://trac.wildfiregames.com/wiki/Coding_Conventions | ||||||||||||||||
/** | ||||||||||||||||
Done Inline ActionsCan you reserve memory using an array or something ? https://code.wildfiregames.com/rP24517 Stan: Can you reserve memory using an array or something ? https://code.wildfiregames.com/rP24517 | ||||||||||||||||
Done Inline ActionsI could do Mercury: I could do
m_Start = static_cast<void*>(new unsigned char[m_PoolSize]);
But that seems less… | ||||||||||||||||
Done Inline ActionsCould be a unique_ptr of that size using make unique? ,(Just wondering) Stan: Could be a unique_ptr of that size using make unique? ,(Just wondering) | ||||||||||||||||
Done Inline ActionsYeah probably could be, I don't think there is any need but I guess there isn't really any cost either. Mercury: Yeah probably could be, I don't think there is any need but I guess there isn't really any cost… | ||||||||||||||||
* @class CComponentDataHolder | ||||||||||||||||
* @brief This is a pool allocator for a type with constant size. It can grow as needed but performs best with a high 'chunkCapacity' | ||||||||||||||||
Done Inline ActionsNot implemented on msvc and unlikely to ever be. There is _aligned_malloc/free instead. Which is worse than using boost. lyv: Not implemented on msvc and unlikely to ever be. There is _aligned_malloc/free instead. Which… | ||||||||||||||||
*/ | ||||||||||||||||
CComponentDataHolder::CComponentDataHolder(size_t size, size_t align, uint32_t chunkCapacity) | ||||||||||||||||
phositUnsubmitted Done Inline ActionsIs there a difference between chunk and pool? phosit: Is there a difference between chunk and pool? | ||||||||||||||||
MercuryAuthorUnsubmitted Done Inline ActionsRemoved reference to chunk, changed to pool. I don't know what the proper nomenclature is here. Mercury: Removed reference to chunk, changed to pool.
I don't know what the proper nomenclature is here. | ||||||||||||||||
{ | ||||||||||||||||
m_Offset = AlignRequirement(size, align); | ||||||||||||||||
Done Inline Actionsredundant phosit: redundant | ||||||||||||||||
Done Inline Actions
Stan: | ||||||||||||||||
m_OpenSlots.reserve(chunkCapacity / 2); | ||||||||||||||||
m_PoolSize = chunkCapacity * m_Offset; | ||||||||||||||||
phositUnsubmitted Done Inline ActionsThis should be chunkCapacity * size + m_Offset phosit: This should be `chunkCapacity * size + m_Offset` | ||||||||||||||||
MercuryAuthorUnsubmitted Done Inline ActionsSize is considered when calculating m_Offset with AlignRequirement. I renamed it m_SlotSize to make the purpose clearer. Mercury: Size is considered when calculating m_Offset with AlignRequirement.
I renamed it m_SlotSize to… | ||||||||||||||||
MercuryAuthorUnsubmitted Done Inline ActionsIncidentally I think AlignmentRequirement isn't needed and we could just use size directly. m_ChunkSize = size; Accounting for alignment beyond that seems like trying to solve a problem that will never come up in this use case, even if it is technically necessary if you are making a universal allocator which could handle any data type. Mercury: Incidentally I think AlignmentRequirement isn't needed and we could just use size directly. | ||||||||||||||||
MakePool(); | ||||||||||||||||
} | ||||||||||||||||
CComponentDataHolder::~CComponentDataHolder() | ||||||||||||||||
{ | ||||||||||||||||
for (std::byte* pool : m_Pools) | ||||||||||||||||
free(pool); | ||||||||||||||||
} | ||||||||||||||||
Done Inline Actions
nit: Could use similar idiom from ::ResetState here to free all buffers in the m_Pools vector jprahman: nit: Could use similar idiom from ::ResetState here to free all buffers in the m_Pools vector | ||||||||||||||||
Done Inline Actions
? Stan: ? | ||||||||||||||||
void CComponentDataHolder::ResetState() | ||||||||||||||||
{ | ||||||||||||||||
while (m_Pools.size() > 1) | ||||||||||||||||
phositUnsubmitted Done Inline ActionsI would use a for loop so .size() is only called once phosit: I would use a for loop so `.size()` is only called once | ||||||||||||||||
StanUnsubmitted Done Inline ActionsCould just cache the variable. Stan: Could just cache the variable. | ||||||||||||||||
phositUnsubmitted Done Inline ActionsThat was not what i ment. I descriped it wrong :') std::for_each(m_Pools.begin() + 1, m_Pools.end(), free); m_Pools.resize(1); phosit: That was not what i ment. I descriped it wrong :')
I think the loop-body should not interact… | ||||||||||||||||
{ | ||||||||||||||||
Done Inline ActionsI assume both are past the end, so it should be >= phosit: I assume both are past the end, so it should be `>=` | ||||||||||||||||
Done Inline ActionsShould we check for size ? Stan: Should we check for size ? | ||||||||||||||||
Done Inline ActionsLike checking if size is one? If it is then begin + 1 == end so for_each will do nothing anyway. Mercury: Like checking if size is one? If it is then begin + 1 == end so for_each will do nothing anyway. | ||||||||||||||||
free(m_Pools.back()); | ||||||||||||||||
m_Pools.pop_back(); | ||||||||||||||||
} | ||||||||||||||||
m_Back = m_Pools.front(); | ||||||||||||||||
Done Inline Actions
Stan: | ||||||||||||||||
m_OpenSlots.clear(); | ||||||||||||||||
Done Inline Actions
Shouldn't you reserve capacity for m_OpenSlots after it changes? Stan: Shouldn't you reserve capacity for m_OpenSlots after it changes? | ||||||||||||||||
} | ||||||||||||||||
Done Inline Actions
Stan: | ||||||||||||||||
std::byte* CComponentDataHolder::Allocate() | ||||||||||||||||
{ | ||||||||||||||||
std::byte* output; | ||||||||||||||||
Done Inline Actions
No need for the offset here ? Stan: No need for the offset here ? | ||||||||||||||||
Done Inline ActionsOffset is added on line 60 regardless of if realloc is called. Mercury: Offset is added on line 60 regardless of if realloc is called. | ||||||||||||||||
if (m_OpenSlots.empty()) | ||||||||||||||||
{ | ||||||||||||||||
if (m_Back == m_End) | ||||||||||||||||
MakePool(); | ||||||||||||||||
output = m_Back; | ||||||||||||||||
m_Back += m_Offset; | ||||||||||||||||
} | ||||||||||||||||
else | ||||||||||||||||
{ | ||||||||||||||||
output = m_OpenSlots.back(); | ||||||||||||||||
m_OpenSlots.pop_back(); | ||||||||||||||||
} | ||||||||||||||||
return output; | ||||||||||||||||
} | ||||||||||||||||
Done Inline Actionsconst ? Stan: const ? | ||||||||||||||||
Done Inline ActionsMy understanding of const keyword on methods is that it informs the compiler that none of the object's members will be altered within this call. Deallocate alters m_OpenList. Mercury: My understanding of const keyword on methods is that it informs the compiler that none of the… | ||||||||||||||||
void CComponentDataHolder::Deallocate(IComponent* ptr) | ||||||||||||||||
{ | ||||||||||||||||
Done Inline Actions
Stan: | ||||||||||||||||
ptr->SetInvalid(); | ||||||||||||||||
m_OpenSlots.push_back(reinterpret_cast<std::byte*>(ptr)); | ||||||||||||||||
} | ||||||||||||||||
void CComponentDataHolder::MakePool() | ||||||||||||||||
{ | ||||||||||||||||
m_Back = static_cast<std::byte*>(malloc(m_PoolSize)); | ||||||||||||||||
Done Inline Actions
Stan: | ||||||||||||||||
Done Inline Actions^ Stan: ^ | ||||||||||||||||
m_Pools.push_back(m_Back); | ||||||||||||||||
m_End = m_Back + m_PoolSize; | ||||||||||||||||
} | ||||||||||||||||
std::vector<std::byte*> CComponentDataHolder::Pools() const { return m_Pools; } | ||||||||||||||||
size_t CComponentDataHolder::Offset() const { return m_Offset; } | ||||||||||||||||
size_t CComponentDataHolder::PoolSize() const { return m_PoolSize; } | ||||||||||||||||
std::byte* CComponentDataHolder::Back() const { return m_Back; } | ||||||||||||||||
Done Inline Actionsnullptr; Stan: nullptr; | ||||||||||||||||
Done Inline Actions
Stan: | ||||||||||||||||
Done Inline Actions
Stan: | ||||||||||||||||
Done Inline ActionsDesign question, is there a particular reason to implement an explicit generator with a ::Next() method rather than having CComponentDataHolder follow the iterator protocol with ::begin() + ::end() methods? Main reason I can think of is that iterators don't always play nice when the underlying container mutates during iteration, but would be curious since implementing the iterator protocol would enable range for loops instead of while (item = gen.next()) jprahman: Design question, is there a particular reason to implement an explicit generator with a ::Next… | ||||||||||||||||
Done Inline Actions
Mercury: 1. I don't know how.
2. IMO it's more complicated without clear benefit. | ||||||||||||||||
Done Inline ActionsAlternative to checking IsValid() on each pointer when iterating (and wasting CPU on invalid elements) would be to allocate extra memory for a header placed before the object itself, and keep a linked list pointer there. CComponentDataHolder could then keep track of two linked lists, one of free elements ready to allocate, one of already allocated elements to iterate over. Iteration then is really really cheap as only allocated elements are iterated over. The list can also be kept in memory address order (with next being the first allocated/free element with a higher memory address, similarly for prev) so that scans through the list always increase in addresses so the CPU cache prefetching works as expected. I've done this sort of thing in my day job, and it's pretty simple to have a struct Header { Header* next; Header* prev; } just before each object (extra space needs to be allocated). Then allocation simply flips the pointers to move the header stub from the free list to the allocated list. If we're worried about the size of two pointers (16 bytes), the pointers could be replaced with smaller 32 or 16 bit offset values from a base pointer instead. Also, to increase the space savings, the double linked list could be replaced with a single linked list because each element is fixed size, and going backwards in the list could be implemented by iterating backwards until the next free/allocated element is located (free/allocated bit would be needed in the header, but that could be stashed in the low order bits of the next pointer since at least a few bits at the bottom of next would be 0). There is other metadata that could be added to that header too (things like ensuring the object is deallocated to the correct memory pool, and other assertions for correctness), but that's more of a stretch. jprahman: Alternative to checking IsValid() on each pointer when iterating (and wasting CPU on invalid… | ||||||||||||||||
Done Inline ActionsMy thinking here was that in order to reduce lag the primary target shouldn't be speed in general but rather speed in the worst case turns. Real-time specialized code. I think those worse case turns are at the start of a battle, when large numbers of projectiles spawn. Any performance boost down in the 'valleys' is just bonus to the primary goal of reducing the height of the 'peaks' of the turn length graph. During these peaks there will be no empty slots. The number of operations per step is 2 for iterating the pool directly or with a linked list. What led me to believe that not using the list would be better is the reduced bookkeeping on allocation and deallocation. Some probably half baked further reasoning based on my limited understanding of hardware:
So with this thinking in mind I decided the best plan would be to write both versions and test them against each other. So far I have only written this one. It could end up that even if no list is better in isolation list is better combined with code which takes advantage of the down time better. One option here is calculating a dynamic time budget for SpiderMonkey GC slice, rather then the current strategy of always giving 30ms per turn. Incidentally I think further optimization could be found in the GC code by removing the shrink every 500 turns. Similarly, down time could be used to compact the pools. I know some components related to pathing cannot be moved, will have to find out if any other component types can't move. Mercury: My thinking here was that in order to reduce lag the primary target shouldn't be speed in… | ||||||||||||||||
Done Inline ActionsI think we're ok with this version for now. Was mainly thinking back to some similar allocators I've written in the past. But this current version already gives a nice performance boost, so it's likely not worth the trouble in this case. On component movement, I'd expect it'll be pretty tricky to try moving objects around here. jprahman: I think we're ok with this version for now. Was mainly thinking back to some similar allocators… | ||||||||||||||||
// This is a generator to iterate over the allocator pool, it performs best under high strain due to density. | ||||||||||||||||
SComponentDataGenerator::SComponentDataGenerator(const CComponentDataHolder* data) | ||||||||||||||||
{ | ||||||||||||||||
m_Data = data; | ||||||||||||||||
Done Inline Actions
Stan: | ||||||||||||||||
m_LastIndex = m_Data->Pools().size() - 1; | ||||||||||||||||
Reset(); | ||||||||||||||||
} | ||||||||||||||||
void SComponentDataGenerator::SetPoolIndex(uint32_t index) | ||||||||||||||||
{ | ||||||||||||||||
Done Inline Actions::Pools() returns by value not reference, so this is copying the underlying vector. IMO CComponentDataHolder::Pools() should be updated to return by reference to avoid a copy jprahman: ::Pools() returns by value not reference, so this is copying the underlying vector. IMO… | ||||||||||||||||
m_PoolIndex = index; | ||||||||||||||||
m_Ptr = m_Data->Pools()[index]; | ||||||||||||||||
if (index == m_LastIndex) | ||||||||||||||||
m_Back = m_Data->Back(); | ||||||||||||||||
Done Inline ActionsCurious why these are separate cases, instead of only using the first case? jprahman: Curious why these are separate cases, instead of only using the first case? | ||||||||||||||||
Done Inline ActionsWe are storing when the end of the current pool is. If it's the one we are currently extending with new allocations when m_OpenList is empty then we need m_Back. Otherwise this is an older pool that has already hit max size so the end is the start plus the max pool size. Mercury: We are storing when the end of the current pool is. If it's the one we are currently extending… | ||||||||||||||||
else | ||||||||||||||||
m_Back = m_Ptr + m_Data->PoolSize(); | ||||||||||||||||
} | ||||||||||||||||
void SComponentDataGenerator::Reset() { SetPoolIndex(0); } | ||||||||||||||||
IComponent* SComponentDataGenerator::Next() | ||||||||||||||||
{ | ||||||||||||||||
IComponent* output; | ||||||||||||||||
while (m_Ptr < m_Back) | ||||||||||||||||
{ | ||||||||||||||||
output = reinterpret_cast<IComponent*>(m_Ptr); | ||||||||||||||||
Done Inline ActionsDo we expect there to be many "holes" of invalid objects? Say when lots of units are dying? jprahman: Do we expect there to be many "holes" of invalid objects? Say when lots of units are dying? | ||||||||||||||||
Done Inline Actionsyes, see response on line 86. Mercury: yes, see response on line 86. | ||||||||||||||||
m_Ptr += m_Data->Offset(); | ||||||||||||||||
if (output->IsValid()) | ||||||||||||||||
return output; | ||||||||||||||||
} | ||||||||||||||||
if (m_PoolIndex == m_LastIndex) | ||||||||||||||||
return nullptr; | ||||||||||||||||
SetPoolIndex(m_PoolIndex + 1); | ||||||||||||||||
return Next(); | ||||||||||||||||
} |