BTreeSet and BTreeMap append optimization
When the largest key of one tree is less than the smallest key of the other, the two trees can be joined using only O(log n) operations.
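A minimal illustration of the scenario (my sketch, not code from the thread): the key ranges are disjoint, so an O(log n) join is possible in principle, but std's `append` still does an O(n + m) rebuild.

```rust
use std::collections::BTreeSet;

fn main() {
    // Every key in `left` is smaller than every key in `right`, so the two
    // trees could in principle be joined in O(log n). std's `append` still
    // handles this correctly, but by rebuilding both trees in O(n + m).
    let mut left: BTreeSet<u32> = (0..1_000).collect();
    let mut right: BTreeSet<u32> = (1_000..2_000).collect();
    left.append(&mut right);

    assert_eq!(left.len(), 2_000);
    assert!(right.is_empty());
    assert_eq!(left.first(), Some(&0));
    assert_eq!(left.last(), Some(&1_999));
}
```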
Maybe logarithmic merge should be a separate method.
I'm ready to work on it.
UPD: Also, in some cases it may be better to push the elements of one tree into the other with O(min_length · log(max_length)) insertions instead of rebuilding both trees in O(n + m) operations (as in #32526).
But the problem is that it's hard to determine which approach is better in a particular case.
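The strategy described in the UPD can be sketched with a hypothetical helper (`merge_smaller_into_larger` is not a std API, just an illustration of the idea):

```rust
use std::collections::BTreeSet;

// Hypothetical helper (not a std API): move the elements of the smaller set
// into the larger one, costing O(min_length * log(max_length)) insertions
// instead of the O(n + m) rebuild that `append` performs today.
fn merge_smaller_into_larger<T: Ord>(a: &mut BTreeSet<T>, b: &mut BTreeSet<T>) {
    // Ensure `a` is the larger set so that we move the fewer elements.
    if a.len() < b.len() {
        std::mem::swap(a, b);
    }
    // `take` leaves an empty set behind in `b`, matching `append`'s contract.
    for x in std::mem::take(b) {
        a.insert(x);
    }
}

fn main() {
    let mut big: BTreeSet<u32> = (0..100_000).collect();
    let mut small: BTreeSet<u32> = (200_000..200_010).collect();
    merge_smaller_into_larger(&mut big, &mut small);
    assert_eq!(big.len(), 100_010);
    assert!(small.is_empty());
}
```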
CC @gereeter
Andrei Tonkikh at 2016-07-07 06:59:06
Am I missing something? I don't see append() on BTreeSet
Alfie John at 2016-11-25 11:17:36
@alfiedotwtf what's wrong with this one? https://doc.rust-lang.org/std/collections/struct.BTreeSet.html#method.append
Andrei Tonkikh at 2016-11-25 11:38:11
Gah! I don't know how I managed to land on old documentation:
https://doc.rust-lang.org/1.6.0/std/collections/struct.BTreeSet.html
Thanks @xosmig :)
Alfie John at 2016-11-25 11:43:32
I would love to see this fixed! I have recently developed a data structure [1], which is supposed to be faster than `BTreeSet` for a specific usage pattern, but I can't write a fair benchmark until `BTreeSet` supports an O(k + log n) delete-range operation.

[1] https://users.rust-lang.org/t/announcing-teardowntree/8259
Kirill Khazan at 2016-12-05 11:20:09
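For reference, the closest thing std offers today for deleting a range of keys is linear; a sketch of the O(n) `retain` workaround (my example, assuming integer keys):

```rust
use std::collections::BTreeSet;

fn main() {
    let mut set: BTreeSet<u32> = (0..10).collect();

    // std has no O(k + log n) delete-range; `retain` visits every element,
    // so this workaround is O(n) no matter how small the deleted range is.
    set.retain(|x| !(3..7).contains(x));

    let expected: BTreeSet<u32> = (0..3).chain(7..10).collect();
    assert_eq!(set, expected);
}
```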
I think perhaps the current `append` implementation is a little problematic. Arguably, one would expect a "batch" operation like `set.append(other_set)` to generally be the better choice when you have to insert one set into the other, because presumably it can take into consideration certain optimizations that cannot be made when inserting one item at a time. However, the O(n + m) complexity means that if, say, n is large and m is small, the cost of calling `append` is enormous.

I ran into this problem today: in a relatively large computational problem, I called `append` roughly 500k times with n ~ 100k and m ~ 10, and my program would not finish within half an hour. After changing to `insert` in a loop, the program finished in 2 seconds.

There are two problems here, as I see it:

- `append` sounds like a straightforward batch operation, so at least I personally wouldn't expect it to have such a severe drawback compared to repeated insertion for unbalanced set sizes.
- This drawback is not communicated by the docs for `append` itself.

Of course, ideally `append` would work well in all cases, but deciding when to pick each strategy is surely not a simple problem...

Andreas Borgen Longva at 2020-08-18 19:03:34
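The pitfall described in the comment above can be reproduced in miniature (a sketch with far smaller sizes than the 500k calls mentioned; the asymptotics, not the timings, are the point):

```rust
use std::collections::BTreeSet;

fn main() {
    let mut acc: BTreeSet<u32> = (0..100_000).collect();

    // `append` rebuilds the whole tree: O(n + m) even when m is tiny.
    let mut batch: BTreeSet<u32> = (100_000..100_010).collect();
    acc.append(&mut batch);

    // Inserting item by item costs only O(m log n) when m << n, which is
    // why switching to a plain `insert` loop made the program finish quickly.
    let batch2: BTreeSet<u32> = (100_010..100_020).collect();
    for x in batch2 {
        acc.insert(x);
    }

    assert_eq!(acc.len(), 100_020);
}
```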
cc @ssomers (due to their many improvements to BTreeMap/BTreeSet in the last couple of years)
memoryruins at 2020-08-18 21:36:09
Rest assured, I was subscribed... It's not hard to improve this in many cases, as Andlon proved, but hard to prove it doesn't become worse in some other situations. A lot of benchmarking to do.
Stein Somers at 2020-08-18 21:56:47
I was bitten by this in the context of purposefully trying to merge a smaller `BTreeSet` into a larger one[^1]. Neither the function name nor the documentation helped to diagnose the issue. I think that improving the documentation for `append` would already help significantly.

[^1]: "Small to Large" (😅) optimization: https://soi.ch/wiki/smaller-to-larger/
Matheus Cardoso at 2023-11-17 21:37:40
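The small-to-large technique from the footnote above, sketched with plain `insert` loops (my illustration, assuming the merge order is ours to choose):

```rust
use std::collections::BTreeSet;

fn main() {
    // Repeatedly merge many sets, always moving elements out of the smaller
    // set. Each element then moves at most O(log k) times over k merges,
    // avoiding the blow-up of repeatedly moving the large set's elements.
    let mut sets: Vec<BTreeSet<u32>> = (0..8u32)
        .map(|i| (i * 10..i * 10 + 10).collect())
        .collect();

    while sets.len() > 1 {
        let mut b = sets.pop().unwrap();
        let a = sets.last_mut().unwrap();
        // "Small to large": keep the bigger set in place, move the smaller.
        if a.len() < b.len() {
            std::mem::swap(a, &mut b);
        }
        for x in b {
            a.insert(x);
        }
    }

    assert_eq!(sets[0].len(), 80);
    assert_eq!(sets[0].first(), Some(&0));
    assert_eq!(sets[0].last(), Some(&79));
}
```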