
Compile-time Bounds Checking in Flux

29 November 2023 08:00

When using bounds checked interfaces, there are often situations during optimisation where the compiler is able to statically prove that a bounds check will always pass, allowing it to omit the checking code entirely. The simplest example would be something like so:

int get_first_elem(const std::array<int, 5>& arr) {
    return arr.at(0);
}

Compiler Explorer

When optimising this function, the compiler knows that the array arr always has five elements, and it knows that we’re asking for the element at index zero. Therefore, even though we’re using the bounds-checked at() function, the optimiser has enough information to remove the check and unconditionally return the first element of the array.

This is a great optimisation, particularly for libraries like Flux which perform bounds checking universally by default. But if the compiler is able to detect when a bounds check will always pass, can it also do the converse – that is, detect when a bounds check will always fail?

The answer is that yes, of course it can! Let’s say we change the above function to get_last_elem(), but we make an off-by-one error in the implementation and ask for index 5 by mistake:

int get_last_elem(const std::array<int, 5>& arr) {
    return arr.at(5); // oops, off-by-one
}

Compiler Explorer

If we look at the generated code, we can see that the compiler is now unconditionally throwing an exception from this function, as it knows that the bounds check is certain to fail.

Generating code that fails efficiently is all very well, but what would be even better is if we could somehow turn this into a compile-time error instead: that is, to have the compiler report an error to the user if it detects a bounds violation during compilation.

The great news is that Flux can now do exactly this! Let’s change our erroneous get_last_elem() to use the flux::read_at() function instead of std::array::at():

int get_last_elem(const std::array<int, 5>& arr) {
    return flux::read_at(arr, 5); // oops, off-by-one
}

Now, with GCC or Clang at -O1 or higher, we’ll get a compiler error message instead:

error: call to 'flux::static_bounds_check_failed' declared with attribute error: out-of-bounds sequence access detected
  304 |                     static_bounds_check_failed();

Compiler Explorer

Of course, this is just a simple example, but GCC (in particular) is able to perform the equivalent “lifting” of errors in some surprisingly complicated situations. The higher your optimisation settings, the more likely it is that the compiler will be able to turn your run-time bounds check into a compile-time one.

How does it work?

Note that this is nothing whatsoever to do with constexpr evaluation: in that case we’ll always get a compile error if we attempt to read outside the bounds of an array, by the rules of the abstract machine. Rather, this is for code that is (notionally) run time, but where the optimiser has enough information to “see through” it using a combination of function inlining and constant propagation.
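For contrast, here is what happens in a genuinely constant-evaluated context, where the language itself rejects the out-of-bounds access (a quick illustrative sketch):

constexpr int oops()
{
    std::array<int, 5> arr{};
    return arr.at(5); // would throw at run time
}

constexpr int x = oops(); // ERROR: call to oops() is not a constant expression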

This trick works by making use of the GCC compiler extension __builtin_constant_p (also available in Clang). Given an expression, this function returns whether the compiler knows the expression to be a constant under the current optimisation settings. Using this, we can write a bounds-checking function that (if possible) will perform a compile-time bounds check:

template <typename Index>
void bounds_check(Index idx, Index limit)
{
    if (__builtin_constant_p(idx) && __builtin_constant_p(limit)) {
        if (idx < Index{0} || idx >= limit) {
            /* ...report error at compile time... */
        }
    } else {
        /* ...perform normal run-time bounds check... */
    }
}

Now, if the compiler statically knows the values of idx and limit, and the bounds check fails, then we can raise an error at compile time.

But wait a second – how do we deliberately cause a compile error here? We can’t use static_assert or anything similar because that would always fail unconditionally. Remember, we’re not dealing with things that the C++ language considers to be compile-time constants – things like template parameters or constexpr variables on which we could static_assert or if constexpr. Rather, these are things that the language considers to be run-time values; it just so happens that the optimiser has extra information about them.
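To see the problem concretely: a function parameter is never a constant expression as far as the language is concerned, so a naive attempt like the following sketch cannot work:

template <typename Index>
void bounds_check(Index idx, Index limit)
{
    // ERROR when instantiated: idx and limit are not constant
    // expressions, so they can't appear in a static_assert condition
    static_assert(idx < limit);
}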

The solution is to use another GCC extension, the error attribute (again, also available in Clang). If we declare a function with this attribute, like so:

[[gnu::error("out-of-bounds sequence access detected")]]
void static_bounds_check_failed();

then compilation will fail if a call to this function doesn’t eventually get optimised away. The nice thing is that (with GCC at least) we end up with a compiler backtrace which shows precisely where the original offending code is, even through a long chain of inlined functions.
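Putting the two extensions together, the whole mechanism looks roughly like this – a simplified sketch of the idea rather than Flux’s actual implementation (here __builtin_trap() stands in for Flux’s real run-time error handling):

[[gnu::error("out-of-bounds sequence access detected")]]
void static_bounds_check_failed();

template <typename Index>
void bounds_check(Index idx, Index limit)
{
    if (__builtin_constant_p(idx) && __builtin_constant_p(limit)) {
        if (idx < Index{0} || idx >= limit) {
            // If the optimiser cannot remove this call, compilation fails
            static_bounds_check_failed();
        }
    } else {
        if (idx < Index{0} || idx >= limit) {
            __builtin_trap();
        }
    }
}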

Limitations

In case it isn’t clear, turning certain run-time errors into compile time errors like this relies on compiler optimisations, and extensions which allow us to ask the compiler what the optimiser “sees” and inject an error if the check fails.

If you’re compiling without optimisation, or with an unsupported compiler, then you’ll get a program which fails an ordinary bounds check when you run it – which is, of course, exactly what you asked for! Being able to detect certain bounds violations at compile time should be seen as a nice bonus, rather than something to rely on. (What was that, Mr Hyrum?)

If for some reason you don’t want this to happen – perhaps you’re trying to test your error handling code, for example – then you can disable it by defining the preprocessor symbol FLUX_DISABLE_STATIC_BOUNDS_CHECKING before #include-ing Flux.

Thanks

A big thank you to Matthias Kretz for showing me this trick at CppCon and suggesting I use it in Flux.

If anyone knows how to perform the equivalent trick with MSVC please let me know!

Parameter Passing in Flux versus Ranges

16 August 2023 08:00

Regular readers of this irregular blog will know that I’m a huge fan of C++20 Ranges and the ‘collection-orientated’ style of code that they allow. For the last few months, I’ve been working on a new C++20 library called Flux, which aims to provide many of the same facilities as Ranges as well as offering improved safety, ease-of-use, and in some cases better runtime efficiency.

In the first of what might turn out to be a series comparing Flux and Ranges, I want to look at how the two libraries differ in their handling of arguments to sequence adaptors.

The similarities

Let’s start with the similarities. Both Flux and Ranges differentiate between eagerly evaluated algorithms and lazy adaptors, the latter of which are often called ‘views’ in the Ranges world.

The eager algorithms are functions that take one or more sequences and immediately iterate over them, either returning the result of a calculation or modifying a sequence in-place. Flux and Ranges use the same approach here: algorithm implementations take their arguments as forwarding references, meaning they can accept lvalues, rvalues, const and non-const arguments alike. Because algorithms immediately operate on their arguments and return you the answer, there are no major lifetime issues to worry about: if you pass in an rvalue temporary and it disappears after the call is finished, everything is okay. The algorithm has done its job while the object was alive and we (usually) no longer need to worry about it.

Adaptors on the other hand are a little different. In both Flux and Ranges, adaptors don’t do any actual work when you first construct them. Instead, they return new sequences which only perform their operations as you later iterate over them. We call this kind of ‘just in time’ work lazy evaluation.

Both libraries implement adaptors in more-or-less the same way, as classes templated on some underlying sequence type, with a data member representing the underlying sequence:

template <typename Base>
class some_adaptor {
    Base base_;

    /* ...other stuff... */
};

If Base owns the elements it points to, then again, there are no lifetime issues here: Base owns the elements, and some_adaptor owns the Base, and so everything is fine. This works recursively as you build up an adaptor pipeline, with each adaptor templated on its ‘parent’ in the adaptor chain.

The problem comes in the case where Base is non-owning, and refers to elements with distinct lifetime – think of something like a span or string_view. In this case, it’s the programmer’s responsibility to ensure that the original elements outlive the adaptor object that refers to them, otherwise you’re going to have a bad time.

All of this is true for both Flux and Ranges – in C++ we don’t have a garbage collector to clean up after us, or a borrow checker to make sure our references’ lifetimes are in order. The difference between the libraries comes in how they pass arguments into their adaptor classes, and in particular how Flux tries to make potentially long-lived references explicit.

Reviewing Ranges

Before we get on to talking about Flux, let’s review how things are done in the Ranges world. Let’s say we have a line of code like this:

auto filtered = std::views::filter(get_range(), pred);

and have a look at what happens depending on what get_range() returns.

Although certain views::xxx() functions can do more sophisticated things depending on the types you provide, views::filter() is a relatively simple one: it always constructs a new ranges::filter_view.

But it’s still not all that simple. There are actually five different possible behaviours here, depending on whether get_range() returns an lvalue or an rvalue, whether the returned range satisfies the std::ranges::view concept or not, and, if it is a view, whether that view is copyable.

We’ll start off with the easiest case. If get_range() returns an rvalue view, it’s simply moved into a newly constructed filter_view object as we’d probably expect. Nothing surprising about that.

If get_range() returns an lvalue view on the other hand, it will be copied into the new filter_view. Ranges requires that copying a view, if it works, is a constant-time operation, so this isn’t going to be too expensive. If the view is non-copyable, we’ll get a compile error.

What about if get_range() returns a non-view? Well, that’s where things get interesting. For an rvalue non-view, the object returned from get_range() will be wrapped in a ranges::owning_view object, which is basically a move-only wrapper around the returned range. (We can’t know whether some arbitrary range is O(1) copyable, so Ranges disallows copying in order to maintain the view semantic requirements.) As the name suggests, in this case the filter_view will end up owning the returned range.

This behaviour is a relatively recent change to Ranges – previously there was no owning_view and attempting to pass an rvalue non-viewable_range into an adaptor was a compile error. This was changed in P2415 in 2021 and retroactively applied to C++20.

There is one last case to consider: when get_range() returns an lvalue non-view. This time, the returned reference is wrapped in a ranges::ref_view, which stores a pointer to the returned range and passes that into the filter_view. Because the filter_view now holds a pointer to the underlying range, it means that we need to be careful to make sure that we don’t allow the underlying range to be destroyed prematurely. In fact, we need to be pretty careful about any changes we make to the underlying range while we’re using the filter_view, as we might accidentally end up invalidating it.

Let’s sum that up again: what happens when we pass a range argument r into an adaptor?

  • rvalue + view: r is moved into the adaptor
  • lvalue + view: r is copied into the adaptor (in constant time), or a compile error if r is non-copyable.
  • rvalue + non-view: r is moved into the adaptor via an owning_view, meaning the adaptor will be non-copyable
  • lvalue + non-view: the address of r is stored in a ref_view, which is in turn stored in the adaptor
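In code, the four cases look something like this (a sketch, assuming a suitable predicate pred and a compiler that implements P2415):

std::vector vec{1, 2, 3, 4, 5};
std::span<int> sp{vec};

auto a = std::views::filter(std::span{vec}, pred);        // rvalue view: moved in
auto b = std::views::filter(sp, pred);                    // lvalue view: copied in
auto c = std::views::filter(std::vector{1, 2, 3}, pred);  // rvalue non-view: wrapped in an owning_view
auto d = std::views::filter(vec, pred);                   // lvalue non-view: &vec stored in a ref_view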

It’s the implicit referencing in the last case that is potentially problematic, because it’s not particularly obvious when forming a range adaptor whether there might be lifetime issues. If you use Ranges a lot then the rules become etched in your memory, but not every C++ programmer uses Ranges (or even C++) every day. It also means that if we’re not careful, innocent-looking code can end up causing dangling views:

auto filter_even(const std::vector<int>& vec)
{
    return std::views::filter(vec, [](int i) { return i % 2 == 0; });
}

auto evens = filter_even({1, 2, 3, 4, 5}); // Oh no!!

Here, vec is initialised from an rvalue temporary and its address is then stored in the returned object, which becomes dangling when the temporary is destroyed at the end of the statement. Any later use of evens is going to end up attempting to read from deleted memory – but that’s not necessarily obvious unless you’re well-versed in how range adaptors work.

An aside: causing const confusion

There is another potential surprise with the way Ranges does things, which is when it comes to const iteration. This doesn’t affect views::filter because that’s never const-iterable, but it would be noticeable with something like views::take:

std::vector vec{1, 2, 3, 4, 5};

const auto taken = std::move(vec) | std::views::take(3);

taken.front() = 10; // Error, assigning to const reference

Here we’re moving vec into the take_view, meaning that the view owns the elements – and since we declared taken to be const, we can’t change those elements later. Specifically, ranges::range_reference_t<decltype(taken)> is const int&, meaning that the last line is a compile error.

On the other hand, if we were to do something like this:

std::vector vec{1, 2, 3, 4, 5};

const auto taken = vec | std::views::take(3);

taken.front() = 10; // Fine, vec is now [10, 2, 3, 4, 5]

Now taken holds a ref_view, which holds a std::vector<int>*, and the top-level const no longer has any effect on element access: range_reference_t<decltype(taken)> is non-const int&, meaning the assignment works just fine.

Again, if you’re comfortable enough with C++ to know the difference between T const* and T* const, and if you’ve used ranges enough to know when adaptors take implicit references, then this behaviour is exactly what you’d expect. For everyone else, the apparent lack of const-correctness in this case might come as a bit of a surprise.

Getting that sinking feeling

Although Flux treats arguments to sequence algorithms in much the same way as Ranges does, things are rather different when it comes to sequence adaptors.

In Flux, an adaptor is considered a sink function, meaning it always takes ownership of the object you pass to it. In other words, adaptors in Flux always behave as if they take their sequence arguments by value. Let’s have a look at an example:

auto is_even = [](int i) { return i % 2 == 0; };

auto evens = flux::filter(std::array{1, 2, 3, 4, 5}, is_even);

Here we’re passing an rvalue std::array into the filter adaptor function. The adaptor will take ownership of the array, which is moved into the returned object – similar to what would happen with the equivalent Ranges code since P2415, but without wrapping in an owning_view.

The same thing happens if we explicitly move() into the adaptor:

std::array array{1, 2, 3, 4, 5};

auto evens = flux::filter(std::move(array), is_even);

Again, we’re passing an rvalue argument that is moved into the resulting adaptor object.

Where things differ is when we pass an lvalue argument, like so:

std::array array{1, 2, 3, 4, 5};

auto evens = flux::filter(array, is_even);

As with all Flux adaptors, filter() behaves as if it takes its arguments by value, so evens will be initialised with a copy of the array. Value semantics means there are no possible lifetime issues in this case – the lifetime of evens is distinct from that of array – and modifying array cannot put the filter adaptor into an unexpected state.

But wait, isn’t there an obvious downside to this approach? Copying five ints is one thing, but I definitely don’t want to accidentally pass a multi-megabyte std::vector into a function by value!

Fortunately, Flux has you covered. Flux adaptors behave as if they take their arguments by value, but the ‘as if’ part is important. What actually happens is that an implicit copy is performed only for types for which std::is_trivially_copyable_v is true. Passing a non-trivially-copyable lvalue is a compile error:

std::vector<int> vec = get_huge_vector();

auto evens = flux::filter(vec, is_even); // ERROR
                                         // vec is not trivially copyable

Non-trivial types like vector must be passed as rvalues, meaning you need to create an explicit copy if copying is the behaviour you want. Flux provides a copy() function for this purpose, or you can use auto() in C++23:

std::vector<int> vec = get_huge_vector();

auto evens = flux::filter(flux::copy(vec), is_even); // Explicit copy
// or
auto evens = flux::filter(auto(vec), is_even); // Explicit copy (C++23)
// or
auto evens = flux::filter(std::move(vec), is_even); // Explicit move

In other words, if copying is (likely to be) ‘cheap’ then Flux will do the copy for you, otherwise you need to explicitly request it. Not only does this prevent accidental expensive copies, but it also makes it obvious in your source code when a potentially expensive copy is happening. As an added bonus, the ‘as if’ behaviour enables us to avoid an extra move that might sometimes be needed with ‘true’ pass-by-value.

Note that is_trivially_copyable isn’t a perfect proxy for ‘is cheap to copy’, because trivially copyable types can still be arbitrarily large – something like std::array<int, 1'000'000> for example. I’ve toyed with the idea of adding a size cutoff for implicit copies, but it’s hard to find a good upper limit and I’m reluctant to have some arbitrary N for which array<T, N> works but array<T, N+1> doesn’t, as that would be very surprising and bad for generic code. In any case, huge stack-allocated trivially copyable classes are probably rare in practice, and where they do exist you’re much more likely to want to pass them by reference as we’ll see below.

References really required?

Value semantics are great, but there are times when we actually do want to pass references around.

For such cases, Flux provides explicit pass-by-reference into adaptors using the flux::ref() function. For example:

std::vector vec{1, 2, 3, 4, 5};

auto evens = flux::filter(flux::ref(vec), is_even);

The ref() function accepts an lvalue sequence and returns a specialisation of the internal ref_adaptor type, which works very much like Ranges’ ref_view in that it stores a pointer to the passed-in sequence.

As such, we do now need to be careful with lifetimes to make sure that the referred-to object outlives the object holding the reference. The key difference is that with Flux, taking a reference is clearly visible at the call-site.
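As a mental model, the stored state looks something like this – a simplified sketch, since the real ref_adaptor is an internal Flux type with rather more to it:

template <typename Seq>
class ref_adaptor {
    Seq const* ptr_; // non-owning: the Seq must outlive this adaptor

public:
    constexpr explicit ref_adaptor(Seq const& seq)
        : ptr_(std::addressof(seq)) {}

    constexpr Seq const& base() const { return *ptr_; }
};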

Just as with Ranges, since vec and evens are now referring to the same data we again need to be careful not to invalidate the filter state by modifying vec “behind its back”. But since it’s now more obvious that the same data is being referred to twice, this situation is easier to avoid.

Similarly, consider the Flux equivalent of the ‘dangling view’ Ranges code we saw earlier. In Flux, trying to call filter(vec, pred) with an lvalue vector is not going to work – it will be a compile error, as we saw above. Instead, we would need to explicitly opt in to doing the wrong thing, and if we do so then it becomes immediately apparent that something strange is going on:

auto filter_even(const std::vector<int>& vec)
{
    // This now looks very fishy
    return flux::filter(flux::ref(vec), [](int i) { return i % 2 == 0; });
}

auto evens = filter_even({1, 2, 3, 4, 5});

With the explicit use of flux::ref() it’s now much more obvious that we’re returning a reference to vec from our function, with the potential lifetime problems that that entails.

Making mutability more marked

Mutable references are bad.

Memory-safe languages like Rust, Swift and Hylo (formerly Val) provide their safety guarantees in part by strictly limiting how and when mutable references can be used. Specifically, in these languages we can have either a single mutable reference to an object, or any number of immutable references, but not both at the same time – the so-called “law of exclusivity”. (Swift and Hylo don’t directly expose references to the user, but that’s what’s going on behind the scenes.)

If we could somehow follow this same rule in C++ then in theory we would be able to achieve a similar level of memory safety. While this is very difficult to do in general without compiler or language support, there are a couple of ways in which we can try to make it easier – for example by making mutable references more obvious in our code.

To this end, flux::ref(x) actually takes a const reference x and internally stores a T const*. That is, it behaves pretty much the same as std::cref(), with the added constraint that its argument must be a const-iterable flux::sequence. The returned object is freely copyable (indeed, trivially copyable) so you can pass it multiple times to functions if you wish:

std::vector<int> vec = get_vector();

auto ref = flux::ref(vec); // stores std::vector<int> const*

auto zipped = flux::zip(ref, flux::drop(ref, 1)); // Okay, copies ref twice

for (auto [a, b] : zipped) {
    // ...
}

If you actually want to pass a mutable reference into a Flux adaptor, you need to say flux::mut_ref(x). This is slightly more to type than just ref(), helping to emphasise that you should reach for a const reference first; it also means that it’s more obvious in our code where we are passing a mutable reference so we can try to avoid having more than one of them at any time. Unlike with ref(), the object returned from mut_ref() is move-only, to try to help prevent accidentally creating multiple mutable references.

This means that if we tried to write the same zip code again but with a mutable reference, we’re going to get a compile error:

std::vector<int> vec = get_vector();

auto mref = flux::mut_ref(vec); // stores std::vector<int>*

auto zipped = flux::zip(mref, flux::drop(mref, 1)); // ERROR
    // mref is not copyable

We could try to avoid this compile error in a couple of ways, either by moving mref twice:

std::vector<int> vec = get_vector();

auto mref = flux::mut_ref(vec);

auto zipped = flux::zip(std::move(mref), flux::drop(std::move(mref), 1));
// Compiles, but looks very wrong

Or by calling mut_ref() on the same argument twice:

std::vector<int> vec = get_vector();

auto zipped = flux::zip(flux::mut_ref(vec),
                        flux::drop(flux::mut_ref(vec), 1));
// Ditto

But in either case the result is something that just looks wrong.

The equivalent Ranges code for the last example is more concise:

std::vector<int> vec = get_vector();

auto zipped = std::views::zip(vec, std::views::drop(vec, 1));

but in this case the fact that we’re storing multiple mutable references to vec is not at all obvious.

The Flux iteration model also helps to avoid multiple referencing, because Flux cursors do not in general constitute a “borrow” in Rust terminology, unlike STL iterators – but that’s a topic for another post.

An aside: const confusion clarified?

We saw earlier that for Ranges, the const-accessibility of elements of an adapted range depends on whether the source range was an lvalue or an rvalue, and whether it models the view concept:

std::vector vec{1, 2, 3, 4, 5};

auto view1 = std::views::take(vec, 3);
view1.front() = 10; // Okay

const auto view2 = std::views::take(vec, 3);
view2.front() = 10; // Still okay

auto view3 = std::views::take(auto(vec), 3); // copy vec
view3.front() = 10; // Okay

const auto view4 = std::views::take(auto(vec), 3);
view4.front() = 10; // Error, assigning to const reference

With Flux on the other hand, sequence adaptors always take their arguments as-if by value, and we need to explicitly opt-in to operating on references with flux::ref() or mut_ref(). This makes things somewhat less surprising: if we request const- or mutable reference access via ref() or mut_ref() respectively, then that’s what we get, regardless of the const-ness of sequence object that holds the ref_adaptor. Otherwise, if we transfer ownership of the elements into the adaptor, then the const-ness of the elements follows the const-ness of the sequence object as we would expect.

The following example demonstrates the various cases:

std::vector vec{1, 2, 3, 4, 5};

auto seq1 = flux::take(vec, 3); // Error, vec is not trivially_copyable

auto seq2 = flux::take(flux::ref(vec), 3); // Want read-only access to elements of vec
seq2.front().value() = 10; // Error, elements are immutable, as requested

auto const seq3 = flux::take(flux::mut_ref(vec), 3); // Want read-write access to elements of vec
seq3.front().value() = 10; // Okay, elements are mutable, as requested

auto seq4 = flux::take(auto(vec), 3); // No referencing, we own a copy of vec
seq4.front().value() = 10; // Okay, seq4 is non-const

const auto seq5 = flux::take(auto(vec), 3); // Declared const this time
seq5.front().value() = 10; // Error, seq5 is const

Note that unlike the standard library, Flux’s front() operation returns an optional to avoid undefined behaviour in the case where the sequence is empty, so we need to add .value() to access the element itself. And yes, flux::optional<T> allows T to be a reference.

In summary

Like C++20 Ranges, Flux provides a variety of sequence adaptors which you can use to build up a pipeline of operations. Ranges implicitly stores the address of lvalue non-views passed to range adaptors, which can hide lifetime dependencies.

Flux on the other hand semantically always takes adaptor arguments by value, except that you need to make an explicit copy if copying might be expensive. To pass a reference to a sequence, you need to explicitly form one with flux::ref or mut_ref. By giving the latter a longer name and making it move-only, Flux tries to encourage you to use const references whenever possible and avoid having multiple mutable references to a sequence. Flux’s explicit referencing also tries to make the const-complexity story a little less mysterious than with Ranges.

If you found this interesting and want to know more then you can find me on whatever Twitter is called now, or check out Flux on GitHub.

Experimenting with Modules in Flux

3 August 2023 08:00

TL;DR: Flux, my C++20 sequence-orientated programming library, now has experimental support for being used as a module. Provided you’re using a recent compiler and don’t mind a few rough edges, you can say import flux and get going.

The future is (almost) here!

Trying it out

If you fancy giving it a try, you’ll need a recent version of one of the three major compilers: I’ve got it working with Clang 16, GCC 13.1 and MSVC 17.6, but slightly older compilers might work too.

First, grab a fresh checkout of Flux from Github. Then cd to the checkout directory and save the following as modules_test.cpp:

import flux;

int main()
{
    constexpr int arr[] = {1, 2, 3, 4, 5};
    static_assert(flux::sum(arr) == 15);
}

Clang 16

Starting in the top-level directory of the Flux checkout, the first step is to compile the binary module interface (BMI) with:

> clang++ -std=c++20 -I include/ --precompile -x c++-module module/flux.cpp

(You could also provide -DNDEBUG here to compile the library without extra debug checks, or your preferred Flux configuration flags if you wish.)

This should generate a file called flux.pcm. Next, we need to generate an object file to link to:

> clang++ -c flux.pcm

which should output flux.o. Finally, we can compile our test program:

> clang++ -std=c++20 -fprebuilt-module-path=. flux.o modules_test.cpp -o modules_test

If this compiles and outputs the modules_test executable, then congratulations! Everything is working.

GCC 13

For GCC, we need to provide the -fmodules-ts flag to enable modules support: just selecting C++20 mode is not enough.

> g++-13 -std=c++20 -fmodules-ts -I include/ module/flux.cpp -c

This should generate both the BMI (in a folder called gcm.cache/flux.gcm) and the flux.o object file.

Next we can compile the test file with:

> g++-13 -std=c++20 -fmodules-ts flux.o modules_test.cpp -o modules_test

Note that this needs to be done from the directory that contains the gcm.cache folder. Presumably it’s possible to persuade GCC to look elsewhere for BMIs, but I wasn’t able to find the right options.

MSVC 17.6

For MSVC, open up a Visual Studio developer prompt and cd to your Flux checkout. Then run:

> cl.exe /std:c++20 /EHsc /I include /c /interface /TP module\flux.cpp

This will generate the flux.ifc BMI and flux.obj object file.

We can then compile the test program with

> cl.exe /std:c++20 /EHsc flux.obj modules_test.cpp

Note that at the time of writing, MSVC doesn’t seem to like the flux::ref() function when using Flux via an import. It works fine via #include, and when importing in Clang and GCC, so it might just be a compiler bug.

As a workaround, you can say flux::from(std::cref(seq)) instead of flux::ref(seq), which is semantically equivalent (though quite ugly).

CMake

If you’re using Clang or MSVC and are feeling particularly brave, you can also try out modular Flux using CMake.

You’ll need to add Flux as a subproject and configure it with FLUX_BUILD_MODULE=On, which will add an extra library target called flux-mod. You should then be able to use target_link_libraries() to link your own executable with flux-mod, which will take care of adding all the necessary build commands.

Currently this uses Victor Zverovich’s modules.cmake rather than CMake’s very new built-in module support, although that might change at some point as the built-in CMake support matures.

Modularising the code

Flux is a C++20 library that uses lots of cutting-edge features, but the lack of compiler and build system support meant that I had to write it as a “traditional” single-header library rather than using modules.

Then a few days ago I was watching Daniel Ruoso’s C++Now 2023 talk, “The Challenges of Implementing C++ Header Units”. The whole talk is great (if a bit depressing), but I was particularly interested in a slide near the end, where Daniel presents a simple way for authors of existing (non-modular) libraries to produce a wrapper module:

Screenshot from CppNow

That… actually seemed pretty easy, so I thought I’d give it a go. I whipped up a very quick test file:

module;

#include <flux.hpp>

export module flux;

export namespace flux {
    // Just export a few names to see if this works
    using flux::sequence;
    using flux::filter;
    using flux::map;
    using flux::write_to;
}

Amazingly, once I’d figured out the correct Clang command-line incantations, it seemed to work – which was very different to my experiences of trying modules up to this point. Flushed with success, I went through and added all of the other public names to the export namespace block: about 300 in all.

I now had a modular version of Flux that worked with Clang and MSVC. GCC didn’t like it for some reason – and to be honest, I didn’t like it all that much either. Duplicating all the public names in two places seemed very inelegant, and it would be annoying to have to remember to update the module file for every public symbol added to the library.

Hunting for an alternative, I looked at the only major project I know of that currently provides a public module: Victor Zverovich’s {fmt}. Following this example, I went through and added a FLUX_EXPORT macro to all the public names in the library, which expands to export if FLUX_MODULE_INTERFACE is defined, and nothing otherwise.
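Presumably the macro boils down to something like this sketch (the exact definition in Flux may differ):

#ifdef FLUX_MODULE_INTERFACE
#  define FLUX_EXPORT export
#else
#  define FLUX_EXPORT
#endif

// ...which is then applied to each public name in the headers:
FLUX_EXPORT
template <typename T>
class optional;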

I then re-wrote the module file as

module;

#include <array>
#include <compare>
/* ... all our other stdlib includes ... */

export module flux;

#define FLUX_MODULE_INTERFACE
#include <flux.hpp>

If I understand correctly – and that’s quite a big “if”, as I’m still new to modules! – this attaches all of the names in the library to the flux module, which seems like… what we want? But by first #include-ing all of the standard library headers outside of the module purview, we avoid taking ownership of any of the stdlib stuff.

I tried out the new approach and found that not only did it work just as well with Clang and MSVC, but GCC was also much happier now.

…but is it fast?

The short answer: yes.

I was interested to see what, if any, speedup we’d get using Flux via a module versus traditional header. As a test, I used a small program which solves the Sushi For Two problem from Conor Hoekstra’s Top 10 Algorithms set (which Conor also talks about in his fantastic “New Algorithms in C++23” talk – look out for it when it drops online!). Here it is:

#include <algorithm>
#include <vector>

#ifdef USE_MODULES
import flux;
#else
#include <flux.hpp>
#endif

constexpr int sushi_for_two(std::vector<int> const& sushi)
{
    return 2 * flux::ref(sushi)
                .chunk_by(std::equal_to{})
                .map(flux::count)
                .pairwise_map(std::ranges::min)
                .max()
                .value();
};

static_assert(sushi_for_two({2, 2, 2, 1, 1, 2, 2}) == 4);
static_assert(sushi_for_two({1, 2, 1, 2, 1, 2}) == 2);
static_assert(sushi_for_two({2, 2, 1, 1, 1, 2, 2, 2, 2}) == 6);

int main()
{}

The #includes at the top are intended to replicate a somewhat realistic scenario – most real-world TUs are going to use some standard library facilities, after all.

I then used Clang’s -ftime-trace facility to look at the internal timings during compilation:

  • Without modules, the #include <flux.hpp> line took around 570ms to process on my laptop (of which around half was #include-ing other standard library headers, and the rest parsing the Flux code itself)

  • With modules, the import flux step took 57ms – almost exactly ten times faster!

Of course, this is only a single, unscientific test with one compiler on one machine, but the numbers certainly look encouraging.

Looking forward

All this playing around has finally – three years after C++20 was finished! – got me excited about modules. There’s still a very long way to go yet in terms of seamless compiler and build system support, but things seem to be heading in the right direction.

Flux will remain a header-based library for the foreseeable future, but I’ll keep trying to improve the modules story. And who knows, maybe by the time C++26 rolls around we’ll finally be able to (mostly) consign #include to the history books.

Ranges and Forwarding References

4 February 2022 08:00

Arthur O’Dwyer recently blogged about “universal references” vs “forwarding references”, where he puts the case that T&& (where T is a deduced template parameter) should be properly called a “forwarding reference” rather than the original term (coined by Scott Meyers), “universal reference”.

I happen to agree that “forwarding reference” is a better name. But Arthur goes on to say

If you see code using std::forward<T> without an originating T&&, it’s almost certainly buggy. If you see code using (deduced) T&& without std::forward<T>, it’s either buggy or it’s C++20 Ranges. (Ranges ill-advisedly uses value category to denote lifetime rather than pilferability, so Ranges code tends to forward rvalueness much more conservatively than ordinary C++ code does.)

It’s true that ranges uses value category as a proxy for lifetime in the shape of borrowed_range, which I’ve written about before. But that’s not actually why ranges uses T&& without std::forward – it would do so even if it didn’t care about trying to save you from dangling iterators. It’s not buggy, it’s the right thing to do, and it’s something to consider for your own code as well where appropriate.

What does ranges do, anyway?

Every ranges algorithm starts with a function [1] taking an iterator and a sentinel, much like the classic STL algorithms. We can write one ourselves (constraints, projections etc omitted for brevity):

// Overload (0)
template <typename I, typename S, typename T>
size_t count(I iter, S last, const T& value)
{
    size_t n = 0;
    while (iter != last) {
        if (*iter == value) {
            ++n;
        }
        ++iter;
    }
    return n;
}

Now, like the std::ranges algorithms, we want to add a second overload taking a range as a whole, which in turn dispatches to the iter/sentinel version. We could take the range argument by value:

// Don't do this
template <typename R, typename T>
size_t count(R rng, const T& value)
{
    return count(ranges::begin(rng), ranges::end(rng), value);
}

but of course this would be a silly thing to do, at least for non-view ranges. Instead, we should take the range by reference. In this particular case, we’re not modifying the range elements, just examining them, so we could take the range parameter by const ref:

// Overload (1)
template <typename R, typename T>
size_t count(const R& rng, const T& value)
{
    return count(ranges::begin(rng), ranges::end(rng), value);
}

This works perfectly well in many cases. Unfortunately not all cases though – not every range is const-iterable, and in C++20 it’s not particularly difficult to come up with examples which are not. So we’re going to need a third overload, taking a range by non-const reference:

// Overload (2)
template <typename R, typename T>
size_t count(R& rng, const T& value)
{
    return count(ranges::begin(rng), ranges::end(rng), value);
}
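For a concrete example of a range that isn’t const-iterable, consider views::filter: filter_view caches the position of the first matching element inside its begin(), so it provides no const-qualified begin() or end(). A quick sketch using the count() overloads above:

std::vector vec{1, 2, 3, 4, 5};
auto evens = vec | std::views::filter([](int i) { return i % 2 == 0; });

count(evens, 2);      // okay, evens is a non-const lvalue
// const auto& cevens = evens;
// count(cevens, 2);  // won't compile: const filter_view has no begin()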

With these in place, we’ll now do the right thing for both const- and non-const ranges:

const std::vector vec1{1, 2, 3, 4, 5};
count(vec1, 3); // calls overload (1)

std::vector vec2{6, 7, 8, 9, 10};
count(vec2, 8); // calls overload (2)

So, are we done now? Alas not. Consider what happens if we call count() with an rvalue range:

std::vector<int> get_vector();

count(get_vector(), 3);

Here, overload (2) is not viable, because we cannot bind an rvalue argument to a non-const lvalue reference. Overload (0) is of course not viable either as we have a mismatch in the number of arguments, so we’ll end up calling overload (1) taking const R&. But as we’ve already mentioned, not all ranges are const-iterable – and sure enough it’s easy to come up with examples where this would be a problem.

So what’s the solution? You guessed it: yet another count() overload, this time taking an rvalue reference as an argument:

// Overload (3)
template <typename R, typename T>
    requires (not std::is_lvalue_reference_v<R>)
size_t count(R&& rng, const T& value)
{
    return count(ranges::begin(rng), ranges::end(rng), value);
}

Here, R&& is in fact a forwarding reference, so we’ve added a constraint that it’s only viable when R is deduced to a non-reference type – that is, when the argument is an rvalue. Now, things work as we would like:

const std::vector vec1{1, 2, 3, 4, 5};
count(vec1, 3); // const lvalue, calls (1)

std::vector vec2{6, 7, 8, 9, 10};
count(vec2, 8); // non-const lvalue, calls (2)

std::vector<int> get_vector();
count(get_vector(), 99); // rvalue, calls (3)

Happy days! All we need to do is write four separate overloads for every ranges algorithm.

How many overloads?!

But let’s back up a moment. We added a constraint to overload (3) because we didn’t want it to be called with lvalues. But let’s just suppose we… didn’t. What’s the worst that could happen if we called it with lvalue arguments?

template <typename R, typename T>
size_t forwarding_count(R&& rng, const T& value) {
    return count(ranges::begin(rng), ranges::end(rng), value);
}

const std::vector vec1{1, 2, 3, 4, 5};
forwarding_count(vec1, 3);

In this case, R would be deduced to be const std::vector<int>&, and after type substitution we’d end up with a specialisation like this:

size_t forwarding_count(const std::vector<int>& rng, const int& value)
{
    return count(ranges::begin(rng), ranges::end(rng), value);
}

This is exactly the same function as we’d get from calling overload (1) of our original count()!

The same thing happens if we call forwarding_count() with a non-const lvalue argument:

std::vector vec2{6, 7, 8, 9, 10};
forwarding_count(vec2, 8);

This time R would be deduced as (non-const) std::vector<int>& and we’d end up with the exact same function as we would from overload (2) of count().

In other words, if we remove the constraint from overload (3) it will correctly handle const lvalues, non-const lvalues and rvalues – meaning we don’t need overloads (1) or (2) at all!

Our complete and correct count() implementation thus becomes

template <typename I, typename S, typename T>
size_t count(I iter, S last, const T& value)
{
    size_t n = 0;
    while (iter != last) {
        if (*iter == value) {
            ++n;
        }
        ++iter;
    }
    return n;
}

template <typename R, typename T>
size_t count(R&& rng, const T& value)
{
    return count(ranges::begin(rng), ranges::end(rng), value);
}

…and that’s it

For examples like this, using a forwarding reference does the right thing in all cases, even though we aren’t std::forward-ing anything. It’s correct, it’s not buggy, and it saves us from doing extra redundant work.

And that’s why ranges does it.


[1] Ranges algorithms are in fact currently implemented using function objects, but that’s not important for the present discussion

Numeric Range Algorithms for C++20

19 May 2020 08:00

TL;DR: I’ve written C++20 range-based versions of the algorithms in the <numeric> header, and you can get them here


C++20 brings with it updated versions of the many, many algorithms in the <algorithm> header. Sadly, as Peter Dimov recently noted, it does not do so for the “other” algorithms in the <numeric> header.

The reason for this is simply one of time. It turns out that correctly defining the concepts for the numeric algorithms is very tricky to do, and so to avoid ranges “missing the bus” with C++20 these needed to go on the back-burner for a little while. But Christopher Di Bella is currently working on it, so I’m very hopeful we’ll see conceptified, range-based versions of accumulate() and friends in C++23.

As a stop-gap solution in the mean time, I’ve written numeric_ranges.hpp – modern implementations of the most common <numeric> algorithms which you can use with C++20 ranges. You can get it here.

These implementations simply ignore the hardest part of the problem – actually defining the numeric concepts – and so, unlike the other algorithms in the std::ranges namespace, they are unconstrained [1] and don’t offer any protection from ugly, C++98-style template error messages when you get things wrong.

Nonetheless, they do still offer the other benefits of the std::ranges algorithms, for example:

  • They have range-based overloads (of course)
  • They accept optional projections
  • They’re constexpr-callable
  • They’re CPOs, so you can (for example) pass them to other algorithms without needing to wrap them in lambdas.

constexpr std::array arr{1, 2, 3, 4};
std::vector<int> out;

tcb::partial_sum(arr, std::back_inserter(out));
// out contains [1, 3, 6, 10]

const int prod = tcb::inner_product(arr, out, 0);
// prod = (1 * 1) + (2 * 3) + (3 * 6) + (4 * 10) = 65
assert(prod == 65);

constexpr auto sq = [](int i) { return i * i; };
constexpr int sum = tcb::accumulate(arr, 0, {}, sq);
// sum = 1 + 4 + 9 + 16 = 30
static_assert(sum == 30);

Try it out!

Not using C++20 yet?

At the time of writing the only standard library implementation of ranges comes with the brand-new GCC 10.1. Other implementations are sure to follow in due time, and when they do, numeric_ranges.hpp will work with them.

But if you’re not using C++20 or a brand new GCC yet, don’t despair! You can still use all the new ranges goodness in C++17 with NanoRange – and, of course, numeric_ranges.hpp works with NanoRange too [2].

Enjoy!

[1]: Or rather, constrained just enough to avoid ambiguous calls, and no more.

[2]: Before anyone asks, Range-V3 comes with its own implementations of these algorithms, so numeric_ranges.hpp isn’t needed there.

Rvalue Ranges and Views in C++20

12 May 2020 08:00

Back in the mists of time, when the world was a more normal place (well, September last year) I gave a talk at CppCon entitled “An Overview of Standard Ranges”. Unfortunately I ran a little long and didn’t have time to take questions on video at the end, but since my talk was right before lunch I ended up having an informal Q&A session in the room afterwards for about 45 minutes. One of the questions that came up a few times was about the relationship between rvalue ranges and views, and what you can do with them. At the time it occurred to me that it might be useful to write a blog post clarifying things, but since the nomenclature and mechanisms were still somewhat in flux at the time I decided I’d wait until C++20 was finalised.

Well, that was a couple of months ago and now I’ve run out of excuses, so here we go.

Algorithms, rvalues and dangling iterators

Arguably the best thing about the new algorithms in the std::ranges namespace is that we can, at long last, pass range objects to them directly – so for example we can now say

std::vector<int> get_vector();

auto vec = get_vector();
// returns an iterator to the first element with value 99
auto it = std::ranges::find(vec, 99);

rather than having to type

auto it = std::find(vec.begin(), vec.end(), 99);

as we have done for the past 25 years or so.

This is great, but there is a potential problem here. find() returns an iterator into the passed-in range – so what happens if we call it with an rvalue vector, for example?

auto it = std::ranges::find(get_vector(), 99);

The temporary returned by get_vector() will be destroyed at the end of the statement, meaning that it will be unsafe to use – it is a dangling iterator. Dangling pointers and iterators are already a major problem in C++, and we don’t want to make the situation worse by allowing this sort of dangerous behaviour when using ranges. So, what can we do about it?

One solution would be to simply forbid passing rvalue containers to algorithms. In this case the third call to find() above would generate a compile error, which is certainly safer!

This is a tempting solution, and in fact it was briefly used at one point during the ranges standardisation process. However, there are times when passing an rvalue container into an algorithm is perfectly safe, and indeed leads to cleaner code. Consider:

// Print each element separated by spaces
std::ranges::copy(get_vector(),
                  std::ostream_iterator<int>(std::cout, " "));

Forbidding all uses of rvalue containers would make the above a compile error, forcing you to create a variable to hold the result of get_vector() before passing that to copy(). This seems overly heavy-handed – after all, the problem is not with the use of rvalue ranges per se, but with returning iterators into such ranges that people might later try to use.

The solution in C++20 is to have a special type, std::ranges::dangling, which algorithms use in their return types instead of potentially-dangling iterators when passed an rvalue range. So for example

auto vec = get_vector();
auto it = std::ranges::find(vec, 99);

is fine – it will be a std::vector::iterator. But if you were to write

auto it = std::ranges::find(get_vector(), 99);

then the code will still compile, but it is now of type std::ranges::dangling. This type does not provide any operations, so if you try to use it like you would an iterator you’ll get a handy error message involving the dangling type, like this, alerting you to what’s gone wrong.

(Note that this does prevent things like

auto min_val = *std::ranges::min_element(get_vector());
// Error, no operator* in std::ranges::dangling

which is still safe provided the result of get_vector() is non-empty, as the iterator does not out-live the full-expression. In this case however, C++20 provides a new overload of min(), so you can say

auto min_val = std::ranges::min(get_vector()); // Fine

instead.)

Can I borrow your range?

Dangling iterators are a potential problem for most kinds of ranges – we need to be careful to make sure that we do not use a vector iterator or an unordered_map iterator after the parent object has been destroyed, for example.

This is not the case for all ranges though. A std::string_view’s iterators for example actually point into some other character array – so we can safely use a string_view::iterator after the string_view itself has been destroyed (provided the underlying array is still around, of course):

std::string str = "Hello world";
// Weird, but okay:
auto iter = std::string_view{str}.begin();
*iter = 'h';

Types like std::string_view are called borrowed ranges in the new C++20 terminology. A borrowed range is one whose iterators can safely outlive their parent object. So an rvalue std::vector is not a borrowed range, but an rvalue std::span is.
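We can check this directly with the std::ranges::borrowed_range concept:

static_assert(not std::ranges::borrowed_range<std::vector<int>>);  // rvalue vector: iterators may dangle
static_assert(std::ranges::borrowed_range<std::vector<int>&>);     // lvalue vector: borrowed
static_assert(std::ranges::borrowed_range<std::span<int>>);        // rvalue span: borrowed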

Since we don’t need to worry about dangling iterators when we use borrowed ranges, we don’t need to apply the “dangling protection” described above when we pass them as rvalues to algorithms; we can just return iterators as normal:

auto iter = std::ranges::min_element(get_span());
// iter is an iterator, *not* ranges::dangling

By default, all rvalue ranges are assumed to be non-borrowing. If we have a custom type whose iterators can safely “dangle” (say, a custom StringRef type) then we can opt in to allowing the ranges machinery to consider it borrowed by specialising the enable_borrowed_range trait:

// Outside of any namespace
template <>
inline constexpr bool std::ranges::enable_borrowed_range<my::StringRef> = true;

Can I borrow your view?

So now we know what a borrowed range is, and we know that if we pass an non-borrowed rvalue range into an algorithm, it will give us back a std::ranges::dangling object rather than a potentially dangerous dangling iterator.

There’s one other piece of the puzzle left to discuss, namely views.

Informally, views are sometimes described as “ranges which do not own their elements”. This makes them sound quite a lot like borrowed ranges – but in actual fact they are distinct concepts.

A view is a range which is default constructible, has constant-time move and destruction operations and, if it is copyable, then constant-time copy operations (where “constant-time” here means “independent of the number of elements in the range”). So a std::string_view is a view, but a std::vector is not (because its copy and destruction operations are O(N)). These properties mean that we can pass views around by value without worrying too much about the cost of those operations.
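Again, we can query the concept directly:

static_assert(std::ranges::view<std::string_view>);      // O(1) copy/move/destroy
static_assert(not std::ranges::view<std::vector<int>>);  // copy and destruction are O(N)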

Because the compiler can’t readily check the algorithmic complexity of operations, we need to opt in to the view concept, just as with borrowed_range. There are two ways of doing this. Firstly, we can use another trait:

template <>
inline constexpr bool std::ranges::enable_view<my::StringRef> = true;

The second way is by making the view class inherit from the empty std::ranges::view_base base class – but it’s probably more useful to inherit from the std::ranges::view_interface CRTP mixin instead (which in turn inherits from view_base) which also provides default implementations of some useful member functions for free.

Putting it all together

So, views and borrowed ranges are independent concepts:

  • Borrowed: a range whose iterators will not dangle – either an lvalue range, or an rvalue with enable_borrowed_range
  • View: a range with constant-time copy/move/destroy – inherits from view_base or uses enable_view

These are tied together by the std::views::all() function, which takes a borrowed range and turns it into a view (by means of ref_view or, in rare cases, subrange). If it is passed something that is already a view, then views::all() is a no-op.
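A quick sketch of views::all() at work:

std::vector vec{1, 2, 3, 4, 5};

auto v1 = std::views::all(vec);                     // lvalue container: wrapped in a ref_view
auto v2 = std::views::all(std::string_view{"hi"});  // already a view: passed through unchanged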

Together, borrowed ranges and views are called viewable ranges, because they can be turned into views with views::all(). Why is this important? Because viewable ranges are the things that are allowed on the left-hand side of the ranges “pipe” operator:

auto vec = get_vector();

auto v1 = vec | views::transform(func);
// Okay: vec is an lvalue
auto v2 = get_span() | views::transform(func);
// Okay: span is borrowed
auto v3 = subrange(vec.begin(), vec.end()) | views::transform(func);
// Okay: subrange is borrowed (and a view)
auto v4 = get_vector() | views::transform(func);
// ERROR: get_vector() returns an rvalue vector, which is neither
// a view nor a borrowed range

Another way of putting it is that views expect to operate on other views, and the ranges machinery helps you out by first using views::all() to turn a borrowed range into a view when necessary.

Conversely, not all views are borrowed ranges – in fact, most of them are not. This means that if we pass an rvalue view into an algorithm, the “dangling protection” will kick in:

auto vec = std::vector{-3, 2, -1, 0, 1, 2, 3};
auto sq = [](int i) { return i * i; };

auto m = std::ranges::min_element(std::views::transform(vec, sq));

std::cout << "Least square: " << *m << '\n';
// ERROR: no operator* in std::ranges::dangling

Compiler Explorer

The solution here is to save the view in a named variable and pass it to min_element as an lvalue instead. (In this specific case you could use min() instead as we did above, or be really fancy and use a projection – but that’s a topic for another time…)

TL;DR

In summary:

  • A borrowed range is one whose iterator validity doesn’t rely on the lifetime of the parent object
  • Passing a non-borrowed range to an algorithm as an rvalue means that it will return a ranges::dangling rather than an iterator
  • A borrowed range can be wrapped in a view object using views::all()
  • Range adaptors only operate on lvalues, or borrowed range/view rvalues – otherwise you get a compile error

Can you put that in a table for me?

Sure!

T                 borrowed_range<T>   view<T>   viewable_range<T>
vector<int>       ❌                  ❌        ❌
vector<int>&      ✅                  ❌        ✅
string_view       ✅                  ✅        ✅
span<int>         ✅                  ✅        ✅
span<int, 3>      ✅                  ❌        ✅
transform_view    ❌                  ✅        ✅
iota_view         ✅                  ✅        ✅
  • In the borrowed_range column, a ✅ means that std::ranges algorithms will return an iterator rather than ranges::dangling
  • In the viewable_range column, a ✅ means that the given type can be used on the left-hand side of the ranges “pipe” operator
  • The view column is just to show that not all borrowed ranges are views :)

Opt-in UFCS with "using class"

2 May 2019 08:00

Sadly, I’ve never had a chance to read Bjarne Stroustrup’s “Design and Evolution of C++”. I suspect I’d find it fascinating, and that it would probably answer many questions I have in my mind about why things in C++ are the way they are today. Unfortunately the book seems to be in short supply (I assume it’s out of print), and the prices for new copies on Amazon are rather on the steep side.

Nonetheless, many of the papers from the early days of C++ standardisation are still available online, and occasionally one turns up which lets us look back in time at how some of the C++ sausage was made. So it was last week, when Arthur O’Dwyer, the Unstoppable Blogging Machine, wrote about argument-dependent lookup (ADL). As usual with Arthur’s prodigious output, the post was very good, but even more interesting to me was a link it contained to a gem of a paper from 1993.

The paper in question is Stroustrup’s N0262, the original proposal for adding namespaces to C++. If you’re interested in delving in to the history of C++ standardisation, I’d strongly recommend giving it a read.

What we ended up getting in C++98 was very close to what Stroustrup originally proposed. But much of the discussion is quite eye-opening, at least to me. For example:

  1. Even in 1993 – five years before standard C++ would even exist! – there were already concerns about backwards compatibility and migrating “legacy” code

  2. Stroustrup originally proposed an extended form of using-declaration which would have allowed you to add multiple names to the current scope in one go:

    using ns::(foo, bar, baz);
    

    I’m curious as to why this was rejected.

  3. Stroustrup devotes Appendix B to discussing a hypothetical “what if?” design of making a namespace a special kind of class. He calls such a thing a “module”. That’s right, we nearly had C++ modules in 1993! (Well, kind of.)

Classes and namespaces

One of the most interesting things about the paper is that (contrary to the hypothetical “module” discussion above) Dr S. points out that in his design we can consider a class as a special kind of namespace:

It has been suggested that a namespace should be a kind of class, but I don’t think that is a good idea (see Appendix B). The opposite, that a class is a kind of namespace, seems almost obviously true. A class is a namespace in the sense that all operations supported for namespaces can be applied with the same meaning to a class unless the operation is explicitly prohibited for classes.

I’d never really looked at things this way before, but it suddenly makes the use of the using keyword to un-hide base class members look very natural – it has exactly the same meaning as a using-declaration for namespace members. C++ is not a language for which the word “elegant” normally springs to mind, but in this case it almost feels appropriate.

Stroustrup actually takes this class/namespace equivalence to its logical conclusion, and proposes that we should be allowed to use a class name in a using-directive, i.e.

struct X {
    static void f();
    void g();
};

void h()
{
    using namespace X; // a class is-a namespace!
    f(); // Ok, calls X::f()
    g(); // error: object missing for non-static member function
}

The rationale for this is worth quoting in its entirety:

[S]hould it be allowed to specify a using-declaration or a using-directive for a class? My suggested answer is ‘‘yes.’’ My (weak) argument for ‘‘yes’’ is ‘‘why not?’’

(As an aside, one wonders how many features ended up in C++ simply because Stroustrup thought “why not?”)

Using class?

The above part of the proposal didn’t make it into C++98, and to my knowledge has not been re-proposed since: we cannot currently say using namespace X, where X is a class name. But is it an entirely bad idea?

The rest of this post is a bit of an investigation into what we might be able to achieve by allowing it. All of the following is really a thought experiment: although I’ve used the word “proposal” in a few places, I’m definitely not (yet!) proposing that we add this to C++.

So let’s imagine that we added this functionality today. Rather than spell it using namespace X, it would probably be less confusing to say using class X (or equivalently, using struct X), so let’s go with that.

(Curiously, at least if my reading of the standard grammar is correct, this syntax is currently unused. We can say using N::X as a using-declaration for a class X in namespace N, but we cannot spell it using class N::X, even though an elaborated-type-specifier like this is valid in many other places. It appears this particular syntax gap is also exploited by the using enum proposal discussed later.)

Under our proposal, a using class directive would do the same thing as a using namespace directive: make all the names from that “namespace” available in the scope in which the directive appears. So for example:

struct X {
    static void f(int);
    void g();
};

void f(double);

void h()
{
    f(3); // calls ::f(double)

    using struct X;
    f(3); // ::f and X::f considered, X::f is a better match

    g(); // ERROR: no object
}

So far, so… kind of pointless? This doesn’t really seem to add any useful new functionality – we can already make “static” member functions available in the enclosing namespace by making them friends instead, and the fact that this introduces non-static member functions into the current scope – but makes it impossible to call them – seems to be asking for trouble.

What if… we could call them?

But hang on a sec. What if, when made accessible in a scope via a using class directive, we could call a non-static member function, by explicitly passing the “implicit object” as the first argument to the call?

That is, what if we could do the following:

struct X {
    void g(int);
};

void h()
{
    X x;
    using class X;
    g(x, 3); // okay, calls x.g(3)!
}

Specifically, for each non-static member function f(params...) [cv] in X, a synthesised non-member function f(X [cv]&, params...) is considered for the purposes of overload resolution (or f(X [cv]&&, params...) for an rref-qualified member). If the synthesised function is selected, then it is rewritten as a member function call. (The intention is that this matches the current rules for how the implicit object parameter is treated during overload resolution, and the way member and non-member operator overloads are considered in the same overload set.)
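
Although the using class rewrite itself is hypothetical, the “explicit object argument first” calling convention it describes is essentially what std::invoke already does for pointers to member functions. A small illustration:

#include <functional>
#include <iostream>

struct X {
    void g(int i) { std::cout << "X::g(" << i << ")\n"; }
};

int main()
{
    X x;
    // std::invoke passes the object as the first argument and performs
    // the member call x.g(3) -- the same rewriting rule described above
    std::invoke(&X::g, x, 3);
}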

Again, when you look at just a simple example then it seems kind of pointless – g(x, 3) is arguably less readable than x.g(3). However, the magic happens when we bring templates into the mix. Let’s consider the implementation of an imaginary STL-like algorithm:

template <typename R, typename T>
bool contains(const R& range, const T& value)
{
    using class R;

    auto first = begin(range);
    const auto last = end(range);

    while (first != last) {
        if (*first == value) {
            return true;
        }
        ++first;
    }

    return false;
}

Because of using class R, all of the members of R are made available in the current scope. This means that if R is, say, a specialisation of std::vector, then its begin() const and end() const member functions will be considered (and in all likelihood selected as an exact match) to initialise first and last.

With using class, it doesn’t matter to the author of contains() how begin and end are implemented: whether they’re members, static members, free functions or whatever. Any of the following could be passed to contains():

struct A {
    struct iterator;
    iterator begin() const;
    iterator end() const;
};

struct B {
    struct iterator;
    static iterator begin(const B&);
    static iterator end(const B&);
};

struct C {
    struct iterator;
    friend iterator begin(const C&);
    friend iterator end(const C&);
};

struct D;
struct D_iter;

D_iter begin(const D&);
D_iter end(const D&);

Today, C++20 ranges implementations have to go to quite some lengths to handle the different ways that begin() and end() could be written (not to mention size(), data() and a host of others). With this form of opt-in uniform function call syntax provided by using class, almost all of that becomes unnecessary, and is handled by the language itself.
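
To get a feel for those lengths, here is a very rough sketch (in C++20 syntax, with a hypothetical my_begin, and ignoring arrays, poison pills and return-type checks) of the kind of dispatch that ranges::begin has to perform today:

#include <utility>

template <typename R>
constexpr auto my_begin(R&& r)
{
    if constexpr (requires { std::forward<R>(r).begin(); }) {
        return std::forward<R>(r).begin(); // member begin()
    } else {
        return begin(std::forward<R>(r));  // non-member begin(), found by ADL
    }
}

With using class, the single unqualified call begin(range) would cover all of these cases at once.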

Hang on, wasn’t UFCS rejected by the committee?

UFCS for C++ has been proposed in a few forms in the last few years, including by Stroustrup himself. None of the proposals so far have been accepted – Barry Revzin has a nice summary of the situation on his blog. To be clear, I should state that here I am suggesting what Revzin calls “OR2” – that is, all candidates (member and non-member, including those found by ADL) are considered according to the existing overload resolution rules – but only when using non-member syntax (“CS1”), and only when a class’s members have been made available in the current scope via a using class directive.

It’s this last point that differentiates this idea from earlier proposals. As I understand it, the existing UFCS proposals have come unstuck due to the fear of unstable interfaces and “spooky action at a distance”. Depending on which proposal we’re talking about, either adding a new non-member overload (perhaps somewhere in a far-away header), or adding a new member function (perhaps somewhere in a far-away base class) could result in a different function being called when innocent code is recompiled.

With this (proto-)proposal, we can use using class to explicitly say “yes, I know that might happen, and I’m okay with it”. If you’re not okay with it, don’t use using class: you’ll get the same results you do today.

Addendum: using enum

At this point, it’s worth mentioning P1099 “Using enum” by Gašper Ažman and Jonathan Müller, which is looking likely to be part of C++20. Under their proposal, you’ll be able to use a using enum directive to bring the names of enumeration members into the current scope. To take the motivating example from their paper:

enum class rgba_color_channel { red, green, blue, alpha };

std::string_view to_string(rgba_color_channel channel) {
  switch (my_channel) {
    using enum rgba_color_channel;
    case red:   return "red";
    case green: return "green";
    case blue:  return "blue";
    case alpha: return "alpha";
  }
}

The “proposal” presented here is entirely compatible with using enum, and in fact matches its semantics very nicely. Just as using enum makes members of an enumeration available in the current scope, so using class makes members of a class available in the current scope.

So, using class would give us better consistency with namespaces (as envisioned by Stroustrup a quarter of a century ago), would give consistency with using enum in C++20, and solve the UFCS problem. Take that, Vasa!

Quick questions

Here are some off-the-top-of-my-head answers to some off-the-top-of-my-head questions about how a hypothetical using class might work:

  • What’s the difference between using class and using struct?

    Nothing whatsoever. Both should be valid, and they should do exactly the same thing, regardless of which class-key was used when defining the class. Yes, it’s kind of pointless to allow both, but picking one or the other seems like it would be more inconsistent, given they’re almost interchangeable everywhere else in the language.

  • What about using union?

    A union is a kind of class in C++, so it makes sense that you should be able to say using class U where U is a union (subject to the normal UB rules regarding reading from an inactive member).

    Should we additionally allow using union U if and only if U is a union? I can’t see that it would be all that useful, but then again, “why not?”

  • Can I use using class with a forward declaration?

    No. The type must be complete at the point the directive is seen (or at instantiation time for a dependent type in a template).

  • What about private/protected members?

    This is a tricky one. My initial feeling was that only accessible members should be made available via using class: otherwise, we risk having a viable, intended non-member function out-matched by an inaccessible private member, which seems like it would be a pain. It arguably also leaks implementation details of the class into the current scope.

    On the other hand, accessibility testing happens after overload resolution everywhere else in the language (at least as far as I’m aware), so it would probably make more sense not to change that.

  • You’ve only mentioned member functions. What about data members?

    My view is that yes, data members should be made available in the current scope as well, so that we can invoke data members that are callable (e.g. std::function). We could possibly even go further and say that memvar(x) is re-written as x.memvar, which is (roughly) what std::invoke does with PMDs.

    Remember that a local variable in a scope will hide a name that’s merely made available via a using-directive, so this wouldn’t prevent you from naming your local variables whatever you like.

  • What about member types?

    Yep, those too, for consistency with using namespace.

  • What if I say using class X; using class Y; in the same scope?

    Then names from both classes are available in the current scope, just like if you said using namespace N1; using namespace N2;. If the two classes have members with the same name, then overload resolution is performed (in the case of a function call), or it’s ambiguous, and you have to explicitly say which one you mean.

  • Can a using class directive appear at class scope?

    If allowed, this would have the effect of making all of the (public?) names in class B available in class D. I think C++ already has a way of doing this, and adding a second one seems like it would be problematic.

  • Could using class be used to do “full UFCS” (member syntax calling non-member functions too)?

    There is a strong desire in some quarters for a UFCS solution that allows calling non-member functions via the member syntax (“CS2” in Revzin’s terminology). This is akin to what D does, and would allow something similar (at least syntactically) to Rust traits, C# extension methods and Swift extensions. There’s no doubt that the member syntax can often be much more readable, particularly when nesting/chaining many calls.

    So, could using class be extended to allow this? Perhaps, but I haven’t thought about it too deeply, and I’d be hesitant to suggest it.

    In C++, it’s already accepted that an unqualified non-member call can have an open set of candidates: all using class does is offer an opt-in way to extend that set yet further, within a particular scope. By contrast, the set of candidates for a member function call is strictly limited. I think it might be a hard sell to change that.

    That said, if this idea proves popular, and someone comes up with a “full UFCS” implementation, and shows that it doesn’t break any code and allows some nice new patterns, then you never know. Stranger things have happened.

  • using namespace is the devil’s work, and you’re suggesting making things worse!

    It’s true that using namespace – and in particular using namespace std – is often seen as an anti-pattern. But remember that (a) most classes have fewer members than most namespaces, and (b) classes are closed, whereas namespaces are open. This means that (as I see it) there is less risk of unintended consequences with using class than using namespace. I also think that the recommendation would be that using class should normally only be used within function (template) implementations (although banning it outright at namespace scope feels a bit heavy-handed).

In summary

In summary, I read a very old paper, saw something I liked the idea of and had a thought for how it could be used to solve a real problem in C++ today.

This post really contains two proposals in one. Firstly, we re-introduce the idea of allowing a class name in a using-directive, as first suggested by Stroustrup 26 years ago (albeit with a slightly different syntax). Secondly, we allow non-static member function names to participate in overload resolution, if made available via using class.

At this point, this is all very much just an idea, and perhaps a stupid one. It touches on both the third and fourth rails of the C++ standard – name lookup and overload resolution – which are both fearsomely complicated as it is, and which no sane person wants to mess with. I have no doubt there are tricky (and perhaps even obvious) problems I’m overlooking.

Nonetheless, I do like the idea and I think it (perhaps) has merit, and so I thought it was worth writing up as a blog post to get some feedback. Let me know if you think I might be on to something, or if I’ve completely lost the plot.

And who knows, if enough people like it then maybe I’ll write this up more formally…

Beware of copies with std::initializer_list!

2019年2月9日 08:00

Pop quiz: what does the following C++ program do?

#include <memory>
#include <string>
#include <vector>

using namespace std;

int main()
{
    vector<unique_ptr<string>> vec{
        make_unique<string>("foo"),
        make_unique<string>("bar"),
        make_unique<string>("baz")
    };
}

This is a trick question of course: the answer is that the above program will fail to compile. If this surprises you then you’re not alone. When I first wrote something like the above, I expected it to work just fine.

If you want to know why this fails and how we can work around it, then read on.

What’s going on?

Looking at the error message, we can see that the compiler is complaining that it’s trying to use std::unique_ptr’s copy constructor, which is of course deleted.

Which is odd. The calls to std::make_unique() are returning rvalues, so they should simply be moved into place in the vector. Perhaps we need to explicitly use std::move()?

int main()
{
    vector<unique_ptr<string>> vec{
        move(make_unique<string>("foo")),
        move(make_unique<string>("bar")),
        move(make_unique<string>("baz"))
    };
}

Nope, that doesn’t work, and neither should we expect it to – explicitly move()-ing something that is already a prvalue won’t do anything good. (In fact, Clang with -Wall will warn about a pessimistic move in this case, which is rather nice.) What’s going on here? Surely vector’s initializer_list constructor doesn’t always copy, does it?

Let’s have a closer look.

Counting instances

To investigate further, we’ll use a wrapper class which counts how many times a type is constructed, copied, moved and destructed, and prints these stats after main() returns:

#include <cstdio>

template <typename T>
struct instance_counter {

    instance_counter() noexcept { ++icounter.num_construct; }
    instance_counter(const instance_counter&) noexcept { ++icounter.num_copy; }
    instance_counter(instance_counter&&) noexcept { ++icounter.num_move; }
    // Simulate both copy-assign and move-assign
    instance_counter& operator=(instance_counter) noexcept
    {
        return *this;
    }
    ~instance_counter() { ++icounter.num_destruct; }

private:
    static struct counter {
        int num_construct = 0;
        int num_copy = 0;
        int num_move = 0;
        int num_destruct = 0;

        ~counter()
        {
            std::printf("%i direct constructions\n", num_construct);
            std::printf("%i copies\n", num_copy);
            std::printf("%i moves\n", num_move);
            const int total_construct = num_construct + num_copy + num_move;
            std::printf("%i total constructions\n", total_construct);
            std::printf("%i destructions ", num_destruct);
            if (total_construct == num_destruct) {
                std::printf("(no leaks)\n");
            } else {
                std::printf("WARNING: potential leak");
            }
        }
    } icounter;
};

template <typename T>
typename instance_counter<T>::counter instance_counter<T>::icounter{};

template <typename T>
struct counted : T, private instance_counter<T>
{
    using T::T;
};

Let’s now try to construct a vector of counted<std::string> using an initializer list, and see what happens:

int main()
{
    vector<counted<string>> vec{
        "foo", "bar", "baz"
    };
}

This reports

3 direct constructions
3 copies
0 moves
6 total constructions
6 destructions (no leaks)

This reveals the sad truth: things are just as bad as we feared. We are indeed constructing three temporary counted<string> instances, and then copying (not moving!) them into place.

A quick search turns up the explanation, nicely given in this StackOverflow answer. As stated in the answer, an initializer list behaves like an array of const T – and since the elements are const, we cannot move them out of the array.
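
We can verify this directly: the iterator type of a std::initializer_list<E> is specified to be const E*, so there is no way to get a non-const reference to the elements.

#include <initializer_list>
#include <type_traits>

// initializer_list elements are only reachable as const
static_assert(std::is_same<std::initializer_list<int>::iterator,
                           const int*>::value,
              "initializer_list iterators point to const elements");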

Well, that’s disappointing. Can we work around this somehow?

Fill your vector like it’s 1998

We want efficient vector construction, but it seems like initializer_list is off the table. Instead, let’s do things the old-fashioned way, by first constructing the vector and then calling push_back() for each element:

int main()
{
    std::vector<counted<string>> vec;
    vec.push_back("foo");
    vec.push_back("bar");
    vec.push_back("baz");
}

This time (on my system) we get

3 direct constructions
0 copies
6 moves
9 total constructions
9 destructions (no leaks)

This is potentially better – we’re not copying the strings any more – but not ideal. We’re getting six moves, when really only three should be required. Why is that?

The reason is that as we’re adding elements to the vector one-by-one, it needs to resize itself to accommodate the new elements. As it does this, it will move the already-constructed elements into the newly-allocated block of memory (but only if the value type has a noexcept move constructor – otherwise it will copy them into place). But since we know in advance how many elements we are going to add to the vector, we can avoid this reallocation by calling vector::reserve() up front:

int main()
{
    std::vector<counted<string>> vec;
    vec.reserve(3);
    vec.push_back("foo");
    vec.push_back("bar");
    vec.push_back("baz");
}

This time we get

3 direct constructions
0 copies
3 moves
6 total constructions
6 destructions (no leaks)

Just three constructions followed by three moves, which is (nearly) ideal.

This sounds a bit iffy

Using the above method is more efficient in terms of copies than using an initializer list, but one downside is that we cannot make our vector const, since we need to modify it after its initial construction. Fortunately, we can get around that by using an immediately-invoked function expression (“IIFE”), in the form of a lambda:

int main()
{
    const auto vec = [] {
        std::vector<counted<string>> v;
        v.reserve(3);
        v.push_back("foo");
        v.push_back("bar");
        v.push_back("baz");
        return v;
    }();
}

Using immediately-invoked lambdas like this to allow complex initialisation of const variables is a really nice pattern that doesn’t seem to be used all that much in C++.

Fewer lambdas, more templates

So we’ve now got a way of (almost) optimally constructing a vector from a bunch of elements, but having to write lambdas all the time will get tiresome pretty quickly. Can we somehow abstract this into a reusable function? The answer is yes, using a variadic function template:

template <typename T, typename... Args>
std::vector<T> make_vector(Args&&... args)
{
    std::vector<T> vec;
    vec.reserve(sizeof...(Args));
    (vec.push_back(std::forward<Args>(args)), ...);
    return vec;
}

int main()
{
    const auto vec = make_vector<counted<string>>(
        "foo", "bar", "baz"
    );
}

The push_back() call here is in a fold expression, which is a new feature in C++17. For C++11 and 14, we can emulate the same thing by taking advantage of the rules around pack expansions in initializer lists. If you’re squeamish, look away now:

template <typename T, typename... Args>
std::vector<T> make_vector(Args&&... args)
{
    std::vector<T> vec;
    vec.reserve(sizeof...(Args));
    using arr_t = int[];
    (void) arr_t{0, (vec.push_back(std::forward<Args>(args)), 0)...};
    return vec;
}

If you’re interested in how this monstrosity works, check out this StackOverflow question. The gist of it is that we are creating a temporary, unnamed int array and filling it with zeros, and calling push_back() for each element as a side-effect. The cast to void is just to avoid a compiler warning about an unused variable.
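
To make the trick a little more concrete, here is what the pack expansion generates for our three-argument call, written out by hand:

#include <string>
#include <vector>

void expansion_example()
{
    std::vector<std::string> vec;
    vec.reserve(3);
    using arr_t = int[];
    // Each array element is the comma expression (vec.push_back(arg), 0),
    // evaluated left to right
    (void) arr_t{0, (vec.push_back("foo"), 0),
                    (vec.push_back("bar"), 0),
                    (vec.push_back("baz"), 0)};
}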

Running the above code, we can see that we get the same results as before: three constructions followed by three moves.

3 direct constructions
0 copies
3 moves
6 total constructions
6 destructions (no leaks)

Removing the moves

The next question is, do we really need the three moves? Can’t we just construct the strings directly inside the vector? Fortunately we can, by changing our push_back() calls to emplace_back():

template <typename T, typename... Args>
std::vector<T> make_vector(Args&&... args)
{
    std::vector<T> vec;
    vec.reserve(sizeof...(Args));
    using arr_t = int[];
    (void) arr_t{0, (vec.emplace_back(std::forward<Args>(args)), 0)...};
    return vec;
}

int main()
{
    const auto vec = make_vector<counted<string>>(
        "foo", "bar", "baz"
    );
}

This time, we get

3 direct constructions
0 copies
0 moves
3 total constructions
3 destructions (no leaks)

which is perfect, as far as the number of constructor invocations goes. However, there is a (potential) downside to this: emplace_back() will happily call explicit constructors, whereas push_back() (which relies on an implicit conversion to the value type first) will not. This doesn’t make any difference in this case (since std::string has an implicit constructor from const char*), but it would be noticeable with, for example, string_view:

int main()
{
    using namespace std::string_view_literals;

    const auto vec = make_vector<counted<string>>(
        "foo"sv, "bar"sv, "baz"sv
    );
}

The above will work if we use emplace_back(), but will fail with push_back() because the std::string(std::string_view) constructor is explicit. Since the whole point of explicit constructors is that their use should be, well, explicit, my preference would normally be to avoid hiding them and just use push_back().

But we’ve got this far, so we may as well aim for perfection: we can guard against using explicit constructors by inserting a static_assert inside make_vector(), along the lines of

static_assert((std::is_convertible_v<Args, T> && ...),
              "All arguments to make_vector() must be implicitly convertible to T");

and then use emplace_back() for maximal efficiency. An alternative would be to use std::enable_if to disable the function if each argument type is not implicitly convertible to the vector’s value type.
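
In C++17 terms, the enable_if alternative might look something like this (a sketch; make_vector_checked is an illustrative name):

#include <type_traits>
#include <vector>

template <typename T, typename... Args,
          typename = std::enable_if_t<
              std::conjunction<std::is_convertible<Args, T>...>::value>>
std::vector<T> make_vector_checked(Args&&... args);

The difference is that the static_assert gives a friendly error message, whereas the enable_if version silently removes the function from the overload set.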

Finishing touches

We’re getting there now, but there’s another improvement we can make. C++17 introduces class template argument deduction, which allows you to say

auto v = std::vector{1, 2, 3};

which will deduce the vector’s value type as int. We can do something similar with our make_vector() by introducing a tiny bit of metaprogramming:

namespace detail {

template <typename T, typename...>
struct vec_type_helper {
    using type = T;
};

template <typename... Args>
struct vec_type_helper<void, Args...> {
    using type = typename std::common_type<Args...>::type;
};

template <typename T, typename... Args>
using vec_type_helper_t = typename vec_type_helper<T, Args...>::type;

}

Now, detail::vec_type_helper_t<T, Args...> will be T, except if that is void, in which case it will be the common type of the arguments. We can then modify our make_vector() function to use this, defaulting the first template parameter:

template <typename T = void, typename... Args,
          typename V = detail::vec_type_helper_t<T, Args...>>
std::vector<V> make_vector(Args&&... args)
{
    std::vector<V> vec;
    vec.reserve(sizeof...(Args));
    using arr_t = int[];
    (void) arr_t{0, (vec.push_back(std::forward<Args>(args)), 0)...};
    return vec;
}

Now in simple cases we can say

const auto vec = make_vector(1, 2, 3);

and have the value type correctly deduced for us. As an added bonus, this works all the way back to C++11, not just C++17. And of course, we can still specify the value type explicitly too:

const auto vec = make_vector<string>("foo", "bar", "baz");

What’s more, if we don’t supply the value type then this will fail to compile if the arguments have no common type (or a common type which is ambiguous). So, no surprises.
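
For instance (assuming the make_vector() defined above):

int main()
{
    const auto ok = make_vector(1, 2, 3);      // deduced as std::vector<int>
    // const auto bad = make_vector(1, "two"); // error: no common type of
    //                                         // int and const char*
}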

Conclusion

A complete implementation of make_vector() can be found in this gist. It is backwards compatible to C++11, minimises the number of moves and copies, allows deduction of the value type from its arguments, and works with move-only types like std::unique_ptr. It’s superior to vector’s initializer_list constructor whenever the value type is not trivially copyable, and exactly equivalent otherwise.

Of course, the real lesson here is that a function like make_vector() shouldn’t be needed at all. This is something the language should handle properly. It’s especially sad given that if we use a raw dynamic array rather than a vector, things work just fine:

int main()
{
    auto vec = std::unique_ptr<counted<string>[]>{
        new counted<string>[3]{"foo", "bar", "baz"}
    };
}
3 direct constructions
0 copies
0 moves
3 total constructions
3 destructions (no leaks)

All in all, it’s hard to escape the conclusion that C++11’s “uniform initialisation” (or “unicorn initialisation” as some call it) was a mistake – for this and other reasons. Sadly it’s one that may be too late to fix now.

A more useful compile-time quicksort in C++17

2017年6月6日 08:00

A few days ago a post from Björn Fahller did the rounds about a compile-time quicksort in C++, using template metaprogramming. It is, as the author concludes, “of limited usefulness, but kind of cool”. Today, I happened to see a follow-up in D, effectively pointing out that D’s standard library sort() function can be used at compile-time.

This got me thinking: can we implement a normal sort() function in C++, usable at compile-time as in D, but avoiding the need for any unpleasant metaprogramming?

It turns out the answer is yes. Here it is, in all its glory:

namespace cstd {

template <typename RAIt>
constexpr RAIt next(RAIt it,
                    typename std::iterator_traits<RAIt>::difference_type n = 1)
{
    return it + n;
}

template <typename RAIt>
constexpr auto distance(RAIt first, RAIt last)
{
    return last - first;
}

template<class ForwardIt1, class ForwardIt2>
constexpr void iter_swap(ForwardIt1 a, ForwardIt2 b)
{
    auto temp = std::move(*a);
    *a = std::move(*b);
    *b = std::move(temp);
}

template<class InputIt, class UnaryPredicate>
constexpr InputIt find_if_not(InputIt first, InputIt last, UnaryPredicate q)
{
    for (; first != last; ++first) {
        if (!q(*first)) {
            return first;
        }
    }
    return last;
}

template<class ForwardIt, class UnaryPredicate>
constexpr ForwardIt partition(ForwardIt first, ForwardIt last, UnaryPredicate p)
{
    first = cstd::find_if_not(first, last, p);
    if (first == last) return first;

    for(ForwardIt i = cstd::next(first); i != last; ++i){
        if(p(*i)){
            cstd::iter_swap(i, first);
            ++first;
        }
    }
    return first;
}

}

template<class RAIt, class Compare = std::less<>>
constexpr void quick_sort(RAIt first, RAIt last, Compare cmp = Compare{})
{
    auto const N = cstd::distance(first, last);
    if (N <= 1) return;
    auto const pivot = *cstd::next(first, N / 2);
    auto const middle1 = cstd::partition(first, last, [=](auto const& elem){
        return cmp(elem, pivot);
    });
    auto const middle2 = cstd::partition(middle1, last, [=](auto const& elem){
        return !cmp(pivot, elem);
    });
    quick_sort(first, middle1, cmp); // assert(std::is_sorted(first, middle1, cmp));
    quick_sort(middle2, last, cmp);  // assert(std::is_sorted(middle2, last, cmp));
}

template <typename Range>
constexpr auto sort(Range&& range)
{
    quick_sort(std::begin(range), std::end(range));
    return range;
}

There’s nothing particularly magical about this code. I put it together in about 10 minutes, and I can’t take any credit for it. The quicksort implementation is taken from this excellent post by user TemplateRex on StackOverflow, and the rest is just reimplementations of a few standard library algorithms which are (sadly) not marked as constexpr (even in C++17). I copied partition() and find_if_not() from the sample implementations on cppreference, and the others are trivial (for random access iterators, anyway).

(EDIT: TemplateRex – again! – has pointed out on Reddit that std::advance(), next(), prev() and distance() are in fact constexpr in C++17. Unfortunately these changes didn’t make it into GCC 7.1 which I used for testing, but at least the code above will get a little bit smaller once compilers/libraries fully implement the new standard.)

The cool thing is that this version is usable both at compile-time:

int main()
{
    constexpr auto arr = sort(std::array{4, 2, 1, 3});

    std::copy(std::cbegin(arr), std::cend(arr),
              std::ostream_iterator<int>{std::cout, " "});
}

Wandbox

…and at run-time.

int main()
{
    const auto vec = sort(std::vector{4, 2, 1, 3});

    std::copy(std::cbegin(vec), std::cend(vec),
              std::ostream_iterator<int>{std::cout, " "});
}

Wandbox

And that’s really all there is to it. While std::sort() is more complicated than a simple quicksort (from what I understand, std::sort() uses a variety of different algorithms in different situations), on the face of it there doesn’t appear to be any fundamental reason why it (and the other STL algorithms) couldn’t be made available at compile-time, as in D. Perhaps we’ll see it in C++20?

The Case for Optional References

2016年9月19日 08:00

[Note: at the time this post was written (September 2016), the version of std::variant proposed for C++17 permitted reference types as alternatives, although the semantics of assignment were not clear. At the November 2016 meeting the C++ committee chose to resolve this by forbidding variant<T&>. This removes the inconsistency with std::optional<T&>, and thus the main point I was trying to make in this post. Nonetheless, I’d still like to see optional<T&> and I hope we can come to some consensus about the expected assignment semantics at some point in the future.]

I have a confession to make. Whenever I’ve come across code that looks like this:

struct example {
    example() = default;

    example(std::string& s) : str_{s} {}

private:
    boost::optional<std::string&> str_{};
};

there is a little voice inside my head that whispers “why didn’t you just use a pointer?”. Like so, for instance:

struct example {
    example() = default;

    example(std::string& s) : str_{&s} {}

private:
    std::string* str_ = nullptr;
};

This is equivalent to the first example, except that it’s slightly less typing, it doesn’t have any dependencies, and feels in some sense “cleaner”. I personally have always preferred it.

Except, I was wrong. After attending Bjarne Stroustrup’s keynote and this excellent talk at CppCon this morning, I’m persuaded that optional references are a good thing. In this post I hope to be able to convince you of the same.

std::optional<T&>?

It seems I’m not the only one who preferred the pointer form. Whereas boost::optional allows reference types, the standardised version std::optional (forthcoming in C++17, and available now in libstdc++ and libc++ as std::experimental::optional) does not. N3527, revision 3 of the paper proposing std::optional, says:

The intention is that the Committee should have an option to accept optional values without optional references, if it finds the latter concept inacceptable [sic]

The paper also goes on to talk about one potential pitfall of optional<T&>, namely assignment, that we’ll talk about later. But first, the pros.

Expressing intent

Pointers represent memory addresses. When we say

int a = 0;
int* b = &a;

we are assigning the address of a to the variable b. But if we’re using a pointer as an optional reference then this is misleading: we don’t care about memory addresses at all. We only care if it’s nullptr or not. Compare the above with

int a = 0;
optional<int&> b{a};

Here, we are explicitly saying that b is either empty, or a reference to an int. The difference becomes more clear when we look at a function signature:

void f(int* pi);

Is pi allowed to be nullptr here? Without looking at the function documentation, we have no idea. Contrast this with

void g(optional<int&> oi);

Now it is immediately clear that g() takes a reference to an int, or nothing. We are expressing our intent much more clearly.

Bjarne made the point in his keynote (as he has done several times before) that he would like C++ to be more friendly to novice programmers. By introducing std::array<T>, the committee removed one of the major uses of raw pointers inherited from C; permitting optional<T&> would remove another. It would allow us to say “only use raw pointers in low-level code”, and would make the learning and teaching of C++ that little bit easier.

Better syntax

I don’t think I’m alone in finding the syntax for declaring const pointers and pointers-to-const to be anything but intuitive. Any introduction to pointers contains an example such as the following:

int i = 0;
int* a = &i;              // mutable pointer to mutable int
const int* b = &i;        // mutable pointer to const int
int const* c = &i;        // as above, pointer to const int
int* const d = &i;        // const pointer to mutable int
int const * const e = &i; // const pointer to const int

(It’s telling that even after many years of writing C and C++, I still had to double-check that to make sure I got it right.)

Compare this to the equivalent with optional references:

int i = 0;
auto a = optional<int&>{i};
auto b = optional<const int&>{i};
const auto c = optional<int&>{i};
const auto d = optional<const int&>{i};

The use of the template syntax (and AAA style) immediately makes it clear whether it is the wrapper or the contained referent that is const. The same cannot be said for int const * versus int * const, especially for novices.

Again, there is an analogy with std::array vs C arrays: have you ever tried to pass a C array by reference to a function? The syntax is odd to say the least:

void f(const std::string (&arg)[4]); // huh? are we taking the address of arg?

whereas with std::array, it’s obvious what’s going on:

void f(const std::array<std::string, 4>& arg); // oh, I see

As above, clear, consistent, familiar syntax allows us to express our intent more clearly.

No pointer arithmetic

Optional references do not allow pointer arithmetic. This removes a potential source of errors:

int* maybe_increment(int* pi)
{
    if (pi) {
        pi += 1;
    }
    return pi;
}

Did you spot the bug? The above is legal C++ and compiles without warning, but it’s very unlikely that it does what the author intended. Compare this to the optional reference case:

optional<int&> maybe_increment(optional<int&> oi)
{
    if (oi) {
        // oi += 1; // Error, no match for operator+=
        *oi += 1; // That's better
    }
    return oi;
}

Consistency with std::variant

Another new addition to C++17 is std::variant. One of the things that variant allows is to specify that an empty state is permitted by using std::monostate. For example:

auto v = variant<monostate, int>{}; // v is either empty, or contains an int

Such a variant is conceptually the same as an optional<int>, except that the latter has a more specialised, friendlier API. This is in much the same way that pair<int, float> is conceptually the same as tuple<int, float>, except that it provides a more specialised API to directly access the two members.

But here’s the thing: variant<monostate, int&> is permitted, behaving as if it contained a std::reference_wrapper<int>. So why not optional<int&>? Such an inconsistency is needless and confusing.

EDIT: /u/tvaneerd pointed out on Reddit that I was mistaken – the standard says it’s permissible to use a reference wrapper, not necessarily std::reference_wrapper. I believe my point stands though: if variant<monostate, T&> is permitted (and it seems that this is explicitly so), then it is inconsistent to forbid optional<T&>.

Generic programming

The lack of optional references means that generic code which deals with optionals is unnecessarily complicated.

N3527, referenced above, acknowledges this, saying:

Users that in generic contexts require to also store optional lvalue references can achieve this effect, even without direct support for optional references, with a bit of meta-programming.

and providing the following workaround:

template <class T>
struct generic
{
  typedef T type;
};

template <class U>
struct generic<U&>
{
  typedef std::reference_wrapper<U> type;
};

template <class T>
using Generic = typename generic<T>::type;

template <class X>
void generic_fun()
{
  std::optional<Generic<X>> op;
  // ...
}

Urgh. This should not be necessary.

Zero cost

Like std::unique_ptr, optional references can be implemented as a zero-overhead abstraction around a raw pointer. In fact, boost::optional does just this.

There is no reason std::optional<T&> could not do the same.
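
A minimal sketch of such a wrapper might look like this (optional_ref is a hypothetical name, not a standard one):

template <typename T>
class optional_ref {
    T* ptr_ = nullptr;

public:
    optional_ref() = default;
    optional_ref(T& t) : ptr_(&t) {}

    explicit operator bool() const { return ptr_ != nullptr; }
    T& operator*() const { return *ptr_; }

    // copying the pointer means assignment rebinds -- see below
    optional_ref& operator=(const optional_ref&) = default;
};

The object is exactly the size of a pointer, and every operation compiles down to the corresponding pointer operation.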

The assignment problem (and what to do about it)

Having given the plus points of allowing optional references, it’s only fair to give some time to the potential downsides as well. In particular, deciding what to do about assignment appears to be a bit tricky.

Version 2 of the paper proposing std::optional (N3406) goes into a bit more detail about the assignment problem. In short, when assigning one optional reference to another, should it behave like a pointer, or like a built-in language reference?

int i = 0;
int j = 1;

int* p1 = &i;
int* p2 = &j;
p2 = p1; // Pointer is rebound, p2 now points to i

int& r1 = i;
int& r2 = j;
r2 = r1; // No reference rebinding, j now equals 0

auto o1 = optional<int&>{i};
auto o2 = optional<int&>{j};
o2 = o1; // Should we rebind o2, or copy the value?

From what I can gather, this was one of the major points of contention regarding optional references. As the paper makes clear, there are arguments to be made both ways about how it should behave, and it was not obvious as the time the paper was proposed which way optional should go.

However, I believe that we should now firmly come down on the side of rebinding. Why? Recall from above that optional<T> can be regarded as a special case of variant<monostate, T>. So what does variant<monostate, T&> do when we assign to it? I don’t have an implementation of std::variant to hand, but since it permits implementations to store references in a reference_wrapper, ~I believe it must rebind.~

EDIT: As above, it’s been pointed out that I jumped to conclusions. The standard merely says that a reference wrapper is permitted, not that this should be std::reference_wrapper. It seems the rebind vs copy-through debate may still be up in the air for variant.

So, whether the committee intended it or not, it has made a decision; variant permits references, and so whatever it does on assignment, optional should do too. (For what it’s worth, boost::optional rebinds on assignment.)
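
Here is a small demonstration of boost::optional’s rebinding behaviour:

#include <boost/optional.hpp>
#include <cassert>

int main()
{
    int i = 0;
    int j = 1;

    auto o1 = boost::optional<int&>{i};
    auto o2 = boost::optional<int&>{j};

    o2 = o1;   // rebinds: o2 now refers to i
    *o2 = 42;  // writes through to i

    assert(i == 42);
    assert(j == 1); // j is untouched
}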

In summary

By allowing optional references, the committee would add a zero-overhead abstraction which would allow C++ programmers to make their intent clearer, to make code easier to read, to reduce the possibility for errors, and make teaching the language easier. It would also remove an unnecessary incompatibility between optional and variant.

A Quicker Study on Tokenising

2016年1月28日 08:00

I recently stumbled upon this nice blog post by Josh Barczak, comparing the performance of various C++ string tokenisation routines. The crux of it is that by writing low-level C-like code, Josh was able to get better performance than by using Boost or either of the standard library solutions he tried.

This post is meant as a rebuttal, showing that by using the STL properly we can get simple, elegant, generic, reusable code that still performs better than the hand-coded solution.

The problem

So let’s take look at the problem. In his code, Josh takes a reasonably large (~20MB) text file, splits it up into tokens, and then copies those tokens to an output file. The final hand-coded method looks like this:

static bool IsDelim( char tst )
{
    const char* DELIMS = " \n\t\r\f";
    do // Delimiter string cannot be empty, so don't check for it
    {
        if( tst == *DELIMS )
            return true;
        ++DELIMS;
    } while( *DELIMS );

    return false;
}

void DoJoshsWay( std::ofstream& cout, std::string& str)
{
    char* pMutableString = (char*) malloc( str.size()+1 );
    strcpy( pMutableString, str.c_str() );

    char* p = pMutableString;

    // skip leading delimiters
    while( *p && IsDelim(*p) )
        ++p;

    while( *p )
    {
        // note start of token
        char* pTok = p;

        do// skip non-delimiters
        {
            ++p;
        } while( !IsDelim(*p) && *p );

        // clobber trailing delimiter with null
        *p = 0;
        cout << pTok; // send the token

        do // skip null, and any subsequent trailing delimiters
        {
            ++p;
        } while( *p && IsDelim(*p) );
    }

    free(pMutableString);
}

Now I don’t want to pick on Josh, because this code works, and it’s faster than anything else he tried. But… well, let’s just say it’s not a style of code I would enjoy working with. Let’s see how we can come up with something better.

Evolving a splitting algorithm

First, let’s take a step back and look at things from a mile-high view: given an input string str and a set of delimiters delim, how would you describe to someone in plain English how to split the string? Bear in mind that although it’s not required in this case, for other uses we may want to keep empty tokens which occur when we have two consecutive delimiters.

It turns out this isn’t so easy. My effort is the following:

“If the string is empty, then you’re done. Otherwise, take the first character of str, and call it first. If first is a delimiter, then the string begins with an empty token; save the token, remove first from the string and start again. If first is not a delimiter, then scan along the string from first until you find another character which is a delimiter, or else reach the end of the string. Now the interval [first, last) consists of one token; save that token. Remove the closed interval [first, last] from the string and restart.”

Translating this naively into C++, we get something like this:

// Version 1
bool is_delimiter(char value, const string& delims)
{
       for (auto d : delims) {
           if (d == value) return true;
       }
       return false;
}

vector<string>
split(string str, string delims)
{
    vector<string> output;

    while (str.size() > 0) {
        if (is_delimiter(str[0], delims)) {
            output.push_back("");
            str = str.substr(1);
        } else {
            int i = 1;
            while (i < str.size() &&
                   !is_delimiter(str[i], delims))  {
                i++;
            }
            output.emplace_back(str.begin(), str.begin() + i);
            if (i + 1 < str.size()) {
                str =  str.substr(i + 1);
            } else {
                str = "";
            }
        }
    }

    return output;
}

This algorithm actually works, provided you’ve got enough patience – with all the string copies going on, it’s very, very slow. (Though if you use std::experimental::string_view, which doesn’t copy but just updates a couple of pointers, then this actually performs respectably – but we can still do better.)

Now this isn’t great, but it’s at least something to start with. Let’s iterate on it and see where we get. The first thing we want to do is to stop making all the string copies. That’s actually not too difficult. Instead of chopping the front off the string every time we go through the loop, we’ll use a variable to keep track of the “start”. Having done that, and with a minor tidy-up, we arrive at:

// Version 2
bool is_delimiter(char value, const string& delims)
{
       for (auto d : delims) {
           if (d == value) return true;
       }
       return false;
}

vector<string>
split(const string& str, const string& delims)
{
    vector<string> output;
    int first = 0;

    while (str.size() - first > 0) {
        if (is_delimiter(str[first], delims)) {
            output.push_back("");
            ++first;
        } else {
            int second = first + 1;
            while (second < str.size() &&
                   !is_delimiter(str[second], delims))  {
                ++second;
            }
            output.emplace_back(str.begin() + first, str.begin() + second);
            if (second == str.size()) {
                break;
            }
            first =  second + 1;
        }
    }

    return output;
}

Again, this works, and it’s much faster than before. But… well, it’s ugly. We have two different cases depending on whether str[first] is a delimiter or not, and we’re calling is_delimiter() twice in many cases. Can we collapse these cases down to a single one?

It turns out we can, with just a minor change: instead of defining second to start at first + 1, we just initialize it with first instead. Now, if first is a delimiter then the inner while loop will exit immediately and second will never be incremented, so we end up emplacing an empty string in the vector just as we’d like. Once we collapse down the two cases, we end up with version 3 of our algorithm:

// Version 3
bool is_delimiter(char value, const string& delims)
{
       for (auto d : delims) {
           if (d == value) return true;
       }
       return false;
}

vector<string>
split(const string& str, const string& delims)
{
    vector<string> output;
    int first = 0;

    while (first < str.size()) {
        int second = first;
        while (second < str.size() &&
               !is_delimiter(str[second], delims))  {
            ++second;
        }
        output.emplace_back(str.begin() + first, str.begin() + second);
        if (second == str.size()) {
            break;
        }
        first =  second + 1;
    }

    return output;
}

We’ve only removed 4 lines, but this is already beginning to look much better.

Enter the iterator

Up until now, you’ll notice that we haven’t used iterators, but rather first and second have been integer indices into the string. That was deliberate, because a lot of people seem to be put off by iterators. They needn’t be: an iterator is really just an index into some set. All we need to do is to change first = 0 to first = std::cbegin(str), and the str.size() checks into checks against std::cend(str):

// Version 4
bool is_delimiter(char value, const string& delims)
{
       for (auto d : delims) {
           if (d == value) return true;
       }
       return false;
}

vector<string>
split(const string& str, const string& delims)
{
    vector<string> output;
    auto first = cbegin(str);

    while (first != cend(str)) {
        auto second = first;
        while (second != cend(str) &&
               !is_delimiter(*second, delims))  {
            ++second;
        }
        output.emplace_back(first, second);
        if (second == cend(str)) {
            break;
        }
        first =  next(second);
    }

    return output;
}

As you can see, the code is barely any different, and performs identically.

Now, let’s turn our attention to the inner while loop. Slightly reorganised, this is:

auto second = first;
while (second != cend(str)) {
       if (is_delimiter(*second, delims))  {
           return second;
       }
    ++second;
}

What is this snippet really doing? It’s saying “find the first element in the interval [first, end()) which is a delimiter”. Now, “is a delimiter” means “is a member of the set of delimiters”, so if we put these together then the while loop is saying

“Find the first element of the interval [first, end()) which is also in the interval [delims.begin(), delims.end())”

Fortunately for us, there is a standard algorithm that does exactly this: it’s called std::find_first_of(). Let’s update our code to use this algorithm, which gives us the (almost) final version:

// Version 5
vector<string>
split(const string& str, const string& delims)
{
    vector<string> output;
    auto first = cbegin(str);

    while (first != cend(str)) {
        const auto second = find_first_of(first, cend(str),
                                          cbegin(delims), cend(delims));
        output.emplace_back(first, second);
        if (second == cend(str)) break;
        first =  next(second);
    }

    return output;
}

This version still adds empty strings to the output when it comes across two consecutive delimiters. This is sometimes what people want, but sometimes not, so let’s make it an option. Also, the most common case for splitting is to use a single space as a delimiter, so we’ll use that as a default parameter. Making these changes, and putting back the std:: qualifiers that we have so far elided, we get our final string splitter:

// Final version
std::vector<std::string>
split(const std::string& str, const std::string& delims = " ",
      bool skip_empty = true)
{
    std::vector<std::string> output;
    auto first = std::cbegin(str);

    while (first != std::cend(str)) {
        const auto second = std::find_first_of(first, std::cend(str),
                                               std::cbegin(delims), std::cend(delims));
        if (first != second || !skip_empty) {
            output.emplace_back(first, second);
        }
        if (second == std::cend(str)) break;
        first =  std::next(second);
    }

    return output;
}

This code is simple enough that we don’t even need to add comments. The core of it is the find_first_of() call, which is easily looked up even if you can’t guess what it does from the name. But we can do better yet.
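
As a quick sanity check, here’s the finished function in action:

#include <iostream>

int main()
{
    const auto tokens = split("what  a\tlovely\nday", " \t\n");

    for (const auto& tok : tokens) {
        std::cout << '"' << tok << "\" ";
    }
    // prints: "what" "a" "lovely" "day" (the empty token from the
    // double space is skipped by default)
}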

A more generic tokeniser

It’s long been a complaint from those coming to C++ from other languages that there is no split() function for strings in the standard library. The reason is that doing so in a generic way is pretty tricky. Let’s have a try at it now:

// Bad generic split
template <class Input, class Delims>
vector<Input>
split(const Input& input, const Delims& delims,
      bool skip_empty = true)
{
    vector<Input> output;
    auto first = cbegin(input);

    while (first != cend(input)) {
        const auto second = find_first_of(first, cend(input),
                                          cbegin(delims), cend(delims));
        if (first != second || !skip_empty) {
            output.emplace_back(first, second);
        }
        if (second == cend(input)) break;
        first =  next(second);
    }

    return output;
}

Unfortunately, this falls apart as soon as we try to call it with a string literal, because the compiler will complain that it cannot instantiate a vector<const char[17]> (or something similar). Also, what if we don’t want to output a vector? A generic solution should surely let us use whatever container we like. What if we are streaming in the input via istream_iterator? How do we pass the output to an ostream?

This problem is pretty tricky, but it’s not insurmountable. Our splitting algorithm is sound – it will work for anything that models the InputIterator concept. The problem is, what do we do with the tokens once we’ve found them? Actually, the answer is obvious: we should let the caller do whatever they like, by letting them pass in a function which we will call every time we find a token.

Then our generic solution then looks like this:

// Good generic "split"
template <class InputIt, class ForwardIt, class BinOp>
void for_each_token(InputIt first, InputIt last,
                    ForwardIt s_first, ForwardIt s_last,
                    BinOp binary_op)
{
    while (first != last) {
        const auto pos = std::find_first_of(first, last, s_first, s_last);
        binary_op(first, pos);
        if (pos == last) break;
        first = std::next(pos);
    }
}

This simply calls the given function for each token (hence the name), passing the first and one-past-the-end iterators we’ve found. Writing our split() for strings in terms of this generic function is trivial:

vector<string>
split(const string& str, const string& delims = " ",
      bool skip_empty = true)
{
    vector<string> output;
    for_each_token(cbegin(str), cend(str),
                   cbegin(delims), cend(delims),
                   [&output, skip_empty] (auto first, auto second) {
        if (first != second || !skip_empty) {
            output.emplace_back(first, second);
        }
    });
    return output;
}

Our generic for_each_token() is simple, elegant, and it shows off very nicely the power of the STL. All of which is very nice, but pointless if it isn’t fast. Is it fast?

Yes. Yes it is.

Performance

In order to measure performance, we’ll use Josh’s original microbenchmark from here, slightly modified to use a timer based on std::chrono::high_resolution_clock rather than boost::timer. Re-running the original tests on my system (GCC 5.3 with -O3 on a Macbook Pro running OS X El Capitan), and taking the average of 5 runs for each algorithm, I get the following profile:

[Chart: timings for the original benchmark]

As you can see, the results are almost the same as Josh’s, except that Boost does slightly better this time. Josh’s approach is still fastest, with strtok() a close second.

Now let’s add our method, using the generic algorithm above. This is the code I added:

// 5 statements
template <class InputIt, class ForwardIt, class BinOp>
void for_each_token(InputIt first, InputIt last,
                    ForwardIt s_first, ForwardIt s_last,
                    BinOp binary_op)
{
    while (first != last) {
        const auto pos = find_first_of(first, last, s_first, s_last);
        binary_op(first, pos);
        if (pos == last) break;
        first = next(pos);
    }
}

// 2 statements
void DoTristansWay(std::ofstream& cout, std::string str)
{
    constexpr char delims[] = " \n\t\r\f";
    for_each_token(cbegin(str), cend(str),
                   cbegin(delims), cend(delims),
                   [&cout] (auto first, auto second) {
                       if (first != second)
                           cout << string(first, second);
                   });
}

The full code is available in this gist.

This time, the results look like this:

[Chart: timings including the new for_each_token() method]

As you can see, our algorithm is by far the fastest of all the options. The average runtime for the generic algorithm on my system is 220ms, against 285ms for the next best – that’s a 1.3x speed-up.

Not only that, but we’ve done it with just seven statements (using the metric from the original post), as opposed to 21 for the low-level version. 1.3x the performance with 1/3rd of the code? I’ll take that any day of the week. That’s the power of the STL.

Discussion

In the original post, Josh presents the “conventional wisdom” as being the following:

You shouldn’t roll your own code. The standard library was written by gurus and mere mortals won’t do better. Even if you do, you’ll write bugs, and you’ll end up spending more time fixing them than it’s worth.

This is all true, but I would put it slightly differently: I would say that if a standard library tool exists, then use it, but make sure you pick the right tool for the job.

In this case, despite the prevalence of suggestions on the internet, std::stringstream is the wrong tool to use for string splitting. The right tool is std::find_first_of. Once we use that, then not only do we get simple code, but it turns out to be much faster too.

The beauty of the STL is that it provides composable, low-level algorithms from which you can build up complex behavior. Sean Parent makes this argument far better than me; in his fantastic C++ Seasoning talk at Going Native 2013, he shows how you can use the STL algorithms in places where it’s not obvious. The video is very highly recommended.

Sean advocates that every C++ programmer should share at least one useful algorithm publicly per year. Hopefully I’ve now met my quota for 2016.
