So we have enough interest. Cool! Nobody expressed a wish to host yet, so I will take the first turn.
I have always loved sorting algorithm animations, for example at
https://www.toptal.com/developers/sorting-algorithms. I even wrote a sorting algorithm library at one point, which I never published. Hence the first challenge: implement a sorting algorithm!
Task and rules
You will write a
sort function. At the very least, this function should be able to sort a sequence of integers in ascending order. You can use the following example sequences to help yourself assess whether you are on the right track, although I will be testing your submissions with other sequences as well. Input sequence on the left, expected output sequence on the right:
Code: Select all
[] -> []
[1] -> [1]
[1, 2, 3] -> [1, 2, 3]
[3, 2, 1] -> [1, 2, 3]
[8, 4, 7, 3, 9, 2, 7, 5] -> [2, 3, 4, 5, 7, 7, 8, 9]
[2, 1, 1, 1, 2, 1, 2, 2, 1, 2] -> [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
Below is an example of a Python function that meets this minimal requirement:
Code: Select all
# This is an implementation of gnome sort. You can think of it as a gnome
# comparing two adjacent elements at a time, optionally swapping the elements
# and moving one position left or right at a time.
def sort(data):
index = 0
end = len(data) - 1
while index < end:
left = data[index]
right = data[index + 1]
if right < left:
# not sorted, swap the elements and move back
data[index:(index + 2)] = [right, left]
index = max(0, index - 1)
else:
# already sorted, move forward
index += 1
Of course, a solution may include other functions as well, but the function with the name
sort is understood to be the entry point.
You may accept the input data in a form that is conventional for your programming language, for example a list in Haskell, a table in Lua or two separate iterator arguments in C++. It is also up to you whether you modify the data in place, like in the example above, or leave the input data intact and produce a sorted copy as output.
Not everyone may be familiar with sorting algorithms. It is allowed to study existing sorting algorithms before you start. You may implement an existing algorithm,
including the example above; you do not need to invent a new one. However, you are not allowed to look at example code, pseudocode, visualizations or other explanatory material on sorting algorithms
while programming. In the end, you have to get your code to work on your own.
You are allowed to submit multiple solutions. It is also allowed to cooperate with other participants. You can work alone or in a team on a case-by-case basis, so for example one solution alone, one with partner A and one with partner B.
Awards
Solutions, rather than people, are awarded emoji badges based on particular qualities. Of course, submitters will be credited for their submissions. Other than that, nobody will earn points and no definitive winner will be chosen.
There is a broad range of possible badges, in order to cater to different strengths. Note that some badges are negative.
Spoiler (Show/Hide)
A solution that meets the minimal requirement is considered to pass, which earns it a
. A solution can receive up to five additional
for additional features:
- Supporting arbitrary data types, for example also strings.
- Supporting descending order as an option.
- Supporting not just descending order but arbitrary special sorting criteria, for example case-insensitive for strings (note that there are ways to support custom criteria generically).
- Being stable, which means that equivalent elements of the input will keep the same relative order.
- Multithread support.
A solution can earn up to six
for ethical and esthetical code qualities:
- Being completely DRY ("Don't Repeat Yourself").
- Modularity, promoting generic code reuse.
- Being very clear in name choice and markup.
- Useful comments (documenting the "why" rather than the "what").
- Striking me as very creative or original (admittedly subjective).
- Being very amusing (again, subjective).
In each of the following categories, a
is awarded to the solution with the best asymptotic complexity. In the first two categories, a
is also awarded to the solution with the best complexity for input data with exactly eight elements. A
is awarded if a solution includes costly operations that do not fit in one of the below categories.
- Total number of order-determining operations, such as comparisons or radix computations.
- Total number of copies/moves. Swapping two elements always counts as three copies, even if your programming language has a notation that hides this or if you use arithmetic tricks that technically avoid copying. For fairer comparison with Haskell and similarly immutable programming languages, solutions that modify the input are pretended to work on a copy.
- Amount of additional memory required. Again, for fairer comparison with immutable programming languages, solutions that modify the input are pretended to copy it first.
- Cache misses: how many times a theoretical CPU with a L1 cache that can fit 10k elements would need to load a different part of the input from RAM or L2+ cache.
- Thread synchronization operations, such as mutex locks. This category is only awarded if there are at least two multi-threaded solutions.
A solution can earn a
for each type of input for which it performs particularly well and a
for each type of input that can seriously degrade its performance (or even break it).
If I find enough time, I will port all solutions to C++ and measure their actual speed. Solutions that modify the input will be wrapped to operate on a copy and lazy datastructures will be ported using
Okasaki (explanation
here), unless I see a (relatively) straightforward way to do it using just iterators. In each input size category from 10 to 10M, by steps of a factor 10, a
goes to the solution with the least total user CPU time. If any solution is multithreaded, there is also a
in the largest four categories for the least elapsed wall clock time.
If a solution has a positive quality I did not think of beforehand, but which I think deserves a reward, I may give it a single
.
Deadline
Sunday, January 1 2023 at 23:00 GMT. I will try to publish the results within two weeks after the deadline.
Questions
Please feel free to ask for clarifications. Discussion is also welcome.