
Resources to Help You Dive Deeper into C++

15 January 2021 08:00

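Over the years, many people have asked me for help learning C++. I am no C++ expert, but as someone who has worked with C++ for years, I want to share some high-quality resources that are also suitable for beginners. I hope you find them helpful.

When someone asks me for guidance on C++, I always start by asking about their existing programming experience. Some people are just starting to program and have chosen C++ as their first language; some already know a little C++ and want to learn more; and some have programmed in other languages for years and now want to pick up some C++. Because people come with different backgrounds and different learning goals, I recommend different materials to each group.

One thing I want to mention, though, is that only reading books or watching videos is not the best learning strategy. Whatever stage you are at, putting what you learn into practice is very important, so starting some programming projects will help your learning a great deal.

Another thing I want to mention is that almost all the resources I recommend here are in English. I strongly suggest that you try to learn from English resources, because if you learn programming only through Chinese, you will miss out on the vast majority of good learning materials.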

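What should I do if I am just starting to program and have chosen C++ as my first language?

For programming beginners, I recommend Programming: Principles and Practice Using C++ (2nd edition) by Bjarne Stroustrup, the creator of C++. A Chinese translation exists, but as I said earlier, if you can read some English, I suggest reading the original. The book is thick, so you may not make it all the way through, but however many pages you read, you will learn something.

If you would rather not read a book, C++ expert Kate Gregory offers quite a few video courses on Pluralsight. Her introductory course is Learn to Program with C++. If you join the #include<C++> Discord server, you can ask her directly for a trial code there.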

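What should I do if I have already learned some C++ and want to go deeper?

Perhaps you have used some C++ in a university data structures course, or followed some online tutorials that use C++. What comes next?

From my own experience and from what I have heard, most university programming courses and online tutorials of that kind are of low quality, and the instructors often have only a superficial understanding of C++. Your earlier learning resources may have misled you, leaving you with bad practices or misunderstood concepts. Choosing the right learning materials is therefore essential for learning efficiently.

In this case, I would again recommend Bjarne Stroustrup's Programming: Principles and Practice Using C++ (2nd edition). You can read it faster than a complete beginner would, but using it to fill the gaps in your knowledge systematically is still well worth it. If you prefer video courses, you can start with Kate Gregory's C++ Fundamentals Including C++17.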

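What should I do if I am experienced in another language and want to learn C++?

If you are already proficient in another programming language and want to start learning C++, you can go straight to more advanced material.

As for books, I suggest The C++ Programming Language (4th Edition) by Bjarne Stroustrup. It is one of the best technical books I have read. However, it is also quite thick. If you do not have time to read it and want a brief introduction to C++, you can get A Tour of C++ (2nd edition).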

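I think I have a reasonable understanding of C++. What is next?

Suppose you have spent months studying the material above and feel you have a solid grasp of the basic concepts of C++. What should you do next?

If you have reached this stage, you should be fairly comfortable with most of the following topics: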

How to use const correctly
Templates
References and pointers
Fluent use of the standard library, especially iterators and the standard algorithms
RAII
Destructors
Copy/move constructors and copy/move assignment operators
Move semantics
Operator overloading
Lambda expressions and function objects
Undefined behavior
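As a rough self-check, the short sketch below touches several of these topics at once: RAII through std::unique_ptr, move operations, const member functions, and a standard algorithm driven by a lambda. The Buffer class and the count_even and topics_example functions are hypothetical names invented for this illustration; they do not come from any of the recommended books.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <utility>

// Hypothetical example type: owns a heap array through std::unique_ptr,
// so the memory is released automatically when a Buffer is destroyed (RAII).
class Buffer {
public:
    explicit Buffer(std::size_t size)
        : data_(std::make_unique<int[]>(size)), size_(size) {}

    // Move operations transfer ownership and leave the source empty.
    // Copying is implicitly disabled because unique_ptr cannot be copied.
    Buffer(Buffer&& other) noexcept
        : data_(std::move(other.data_)), size_(std::exchange(other.size_, 0)) {}

    Buffer& operator=(Buffer&& other) noexcept {
        data_ = std::move(other.data_);
        size_ = std::exchange(other.size_, 0);
        return *this;
    }

    std::size_t size() const { return size_; }

    int* begin() { return data_.get(); }
    int* end() { return data_.get() + size_; }
    const int* begin() const { return data_.get(); }
    const int* end() const { return data_.get() + size_; }

private:
    std::unique_ptr<int[]> data_;
    std::size_t size_;
};

// Takes the buffer by const reference and counts even values with a
// standard algorithm and a lambda.
std::ptrdiff_t count_even(const Buffer& buffer) {
    return std::count_if(buffer.begin(), buffer.end(),
                         [](int value) { return value % 2 == 0; });
}

void topics_example() {
    Buffer a(4);
    std::iota(a.begin(), a.end(), 1);   // fills a with 1, 2, 3, 4
    assert(count_even(a) == 2);

    Buffer b(std::move(a));             // ownership moves from a to b
    assert(a.size() == 0);
    assert(b.size() == 4);
}
```

If every line here reads naturally to you, the list above is probably in good shape; if not, the line that puzzles you points at the topic to revisit.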

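If you have reached this stage, finding practical uses for C++ may matter more than learning the language itself. C++ is used for many different purposes, and you can start thinking about how to apply it in the areas that interest you.

This is also a good time to learn the C++ ecosystem. You can spend some time getting to know unit testing libraries such as Catch2, build systems such as CMake, and package managers such as Conan.

Another thing to consider is learning another programming language, especially if C++ is the only one you know. A good next choice is a language very different from C++, for example a dynamically typed language such as JavaScript, Python, or Lisp.

That said, there is still no end of things to learn about the C++ language itself. I will try to list some of my favorite resources below: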

Books

If you still have not read The C++ Programming Language (4th Edition), it remains a very good choice. Beyond that, I have some other books to recommend:

Effective Modern C++ by Scott Meyers
C++ Best Practices by Jason Turner
C++17 - The Complete Guide by Nicolai M. Josuttis

Some books focus on specific areas, for example:

C++ Templates - The Complete Guide, 2nd Edition by David Vandevoorde, Nicolai M. Josuttis, and Douglas Gregor
Mastering the C++17 STL by Arthur O'Dwyer
Functional Programming in C++ by Ivan Čukić
C++ Concurrency in Action, 2nd edition by Anthony Williams

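Conference talks

Conference talks are also a great resource for learning C++. Below are some talks that I personally like and that are suitable for beginners:

Community

Joining a programming community has many benefits: you can ask experts questions, keep up with what others are doing, discuss job-related information, and even make some good friends.

#include<C++>

#include<C++> is a very good C++ community. It offers a friendly environment for discussion, and you can find several well-known figures from the C++ world there. You can join its Discord server and discuss C++ with everyone.

Meetups

Joining the North Denver Metro C++ Meetup was one of the best decisions I made during college. If you have time, attending local C++ meetups is a very good choice. (Because of COVID-19, the vast majority of meetups and conferences are now held online. This has pros and cons, but one big advantage is that you can now attend meetups anywhere in the world.) You can search for local meetups on meetup.com.

Attending conferences

If you are serious about C++, conferences are a great place to meet like-minded people. These are some recurring C++ conferences I know of: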

CppCon
C++Now (tuned toward a more advanced audience)
ACCU
Meeting C++
Pacific++
C++ on Sea
Core C++

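In addition, the ISO C++ website has a list of conferences.

Podcasts

There are quite a few C++ podcasts online, and a number of new ones appeared in 2020 in particular. Of course, all of them require fairly good English listening skills:

Blogs

I recommend using RSS to follow technical blogs. I personally follow more than 200 blogs on C++ and other technical topics. Below are some that I consider the best C++ blogs:

Note that some posts discuss very advanced topics, so you do not need to understand every one of them.

Other resources

Here are some other useful C++ resources:

cppreference is the best documentation site for the C++ language and the standard library API.
Compiler Explorer is an online coding environment that supports C++ and many other languages. It can show the compiled assembly and run programs.
Quick C++ Benchmark is a website for quickly benchmarking C++ code.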

References and further reading

"SG20 Education And Recommended Videos For Teaching C++". Christopher Di Bella, 2021, https://www.cjdb.com.au/sg20-and-videos. Accessed 15 Jan 2021.
"References And Links". #include <C++>, 2021, https://www.includecpp.org/resources/references/. Accessed 16 Jan 2021.
Yaghmour, Shafik. "Where To Get Started Learning C++ And What Resources To Use". Shafik Yaghmour's Blog, 2019, https://shafik.github.io/c++/learning/2019/09/05/getting_started_learning_cpp.html. Accessed 16 Jan 2021.

A Quicker Study on Tokenising

tristanbrindle.com

28 January 2016 08:00

I recently stumbled upon this nice blog post by Josh Barczak, comparing the performance of various C++ string tokenisation routines. The crux of it is that by writing low-level C-like code, Josh was able to get better performance than by using Boost or either of the standard library solutions he tried.

This post is meant as a rebuttal, showing that by using the STL properly we can get simple, elegant, generic, reusable code that still performs better than the hand-coded solution.

The problem

So let’s take a look at the problem. In his code, Josh takes a reasonably large (~20MB) text file, splits it up into tokens, and then copies those tokens to an output file. The final hand-coded method looks like this:

static bool IsDelim( char tst )
{
    const char* DELIMS = " \n\t\r\f";
    do // Delimiter string cannot be empty, so don't check for it
    {
        if( tst == *DELIMS )
            return true;
        ++DELIMS;
    } while( *DELIMS );

    return false;
}

void DoJoshsWay( std::ofstream& cout, std::string& str)
{
    char* pMutableString = (char*) malloc( str.size()+1 );
    strcpy( pMutableString, str.c_str() );

    char* p = pMutableString;

    // skip leading delimiters
    while( *p && IsDelim(*p) )
        ++p;

    while( *p )
    {
        // note start of token
        char* pTok = p;

        do// skip non-delimiters
        {
            ++p;
        } while( !IsDelim(*p) && *p );

        // clobber trailing delimiter with null
        *p = 0;
        cout << pTok; // send the token

        do // skip null, and any subsequent trailing delimiters
        {
            ++p;
        } while( *p && IsDelim(*p) );
    }

    free(pMutableString);
}

Now I don’t want to pick on Josh, because this code works, and it’s faster than anything else he tried. But… well, let’s just say it’s not a style of code I would enjoy working with. Let’s see how we can come up with something better.

Evolving a splitting algorithm

First, let’s take a step back and look at things from a mile-high view: given an input string str and a set of delimiters delim, how would you describe to someone in plain English how to split the string? Bear in mind that although it’s not required in this case, for other uses we may want to keep the empty tokens which occur when we have two consecutive delimiters.

It turns out this isn’t so easy. My effort is the following:

“If the string is empty, then you’re done. Otherwise, take the first character of str, and call it first. If first is a delimiter, then the string begins with an empty token; save the token, remove first from the string and start again. If first is not a delimiter, then scan along the string from first until you find another character which is a delimiter, or else reach the end of the string. Now the interval [first, last) consists of one token; save that token. Remove the closed interval [first, last] from the string and restart.”

Translating this naively into C++, we get something like this:

// Version 1
bool is_delimiter(char value, const string& delims)
{
    for (auto d : delims) {
        if (d == value) return true;
    }
    return false;
}

vector<string>
split(string str, string delims)
{
    vector<string> output;

    while (str.size() > 0) {
        if (is_delimiter(str[0], delims)) {
            output.push_back("");
            str = str.substr(1);
        } else {
            int i = 1;
            while (i < str.size() &&
                   !is_delimiter(str[i], delims))  {
                i++;
            }
            output.emplace_back(str.begin(), str.begin() + i);
            if (i + 1 < str.size()) {
                str =  str.substr(i + 1);
            } else {
                str = "";
            }
        }
    }

    return output;
}

This algorithm actually works, provided you’ve got enough patience – with all the string copies going on, it’s very, very slow. (Though if you use std::experimental::string_view, which doesn’t copy but just updates a couple of pointers, then this actually performs respectably – but we can still do better.)
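The string_view variant mentioned in passing can be sketched roughly as follows. This assumes a C++17 compiler, where std::string_view is available in place of the experimental header; the name split_sv is made up here to keep it apart from the versions in this post. The logic is the same as Version 1, except that chopping the front off the input only moves the start of the view forward rather than copying the rest of the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Same linear scan as Version 1, taking the delimiters as a view.
bool is_delimiter(char value, std::string_view delims)
{
    for (char d : delims) {
        if (d == value) return true;
    }
    return false;
}

// Version 1 rewritten on top of std::string_view (hypothetical name).
// remove_prefix() only advances the view's start; no characters are copied
// except when a finished token is stored in the output vector.
std::vector<std::string>
split_sv(std::string_view str, std::string_view delims)
{
    std::vector<std::string> output;

    while (!str.empty()) {
        if (is_delimiter(str[0], delims)) {
            output.emplace_back();          // empty token
            str.remove_prefix(1);
        } else {
            std::size_t i = 1;
            while (i < str.size() && !is_delimiter(str[i], delims)) {
                ++i;
            }
            output.emplace_back(str.substr(0, i));
            // Drop the token and, if present, the delimiter that follows it.
            str.remove_prefix(i < str.size() ? i + 1 : i);
        }
    }

    return output;
}

void split_sv_example()
{
    const auto tokens = split_sv("a  b", " ");
    // Two consecutive spaces produce one empty token in between.
    assert((tokens == std::vector<std::string>{"a", "", "b"}));
}
```

The two branches of the original are still there, so this only addresses the copying, not the structure of the loop.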

Now this isn’t great, but it’s at least something to start with. Let’s iterate on it and see where we get. The first thing we want to do is to stop making all the string copies. That’s actually not too difficult. Instead of chopping the front off the string every time we go through the loop, we’ll use a variable to keep track of the “start”. Having done that, and with a minor tidy-up, we arrive at:

// Version 2
bool is_delimiter(char value, const string& delims)
{
    for (auto d : delims) {
        if (d == value) return true;
    }
    return false;
}

vector<string>
split(const string& str, const string& delims)
{
    vector<string> output;
    int first = 0;

    while (str.size() - first > 0) {
        if (is_delimiter(str[first], delims)) {
            output.push_back("");
            ++first;
        } else {
            int second = first + 1;
            while (second < str.size() &&
                   !is_delimiter(str[second], delims))  {
                ++second;
            }
            output.emplace_back(str.begin() + first, str.begin() + second);
            if (second == str.size()) {
                break;
            }
            first =  second + 1;
        }
    }

    return output;
}

Again, this works, and it’s much faster than before. But… well, it’s ugly. We have two different cases depending on whether str[first] is a delimiter or not, and we’re calling is_delimiter() twice in many cases. Can we collapse these cases down to a single one?

It turns out we can, with just a minor change: instead of defining second to start at first + 1, we just initialize it with first instead. Now, if first is a delimiter then the inner while loop will exit immediately and second will never be incremented, so we end up emplacing an empty string in the vector just as we’d like. Once we collapse down the two cases, we end up with version 3 of our algorithm:

// Version 3
bool is_delimiter(char value, const string& delims)
{
    for (auto d : delims) {
        if (d == value) return true;
    }
    return false;
}

vector<string>
split(const string& str, const string& delims)
{
    vector<string> output;
    int first = 0;

    while (first < str.size()) {
        int second = first;
        while (second < str.size() &&
               !is_delimiter(str[second], delims))  {
            ++second;
        }
        output.emplace_back(str.begin() + first, str.begin() + second);
        if (second == str.size()) {
            break;
        }
        first =  second + 1;
    }

    return output;
}

We’ve only removed 4 lines, but this is already beginning to look much better.

Enter the iterator

Up until now, you’ll notice that we haven’t used iterators, but rather first and second have been integer indices into the string. That was deliberate, because a lot of people seem to be put off by iterators. They needn’t be: an iterator is really just an index into some set. All we need to do is to change first = 0 to first = std::cbegin(str), and the str.size() checks into checks against std::cend(str):

// Version 4
bool is_delimiter(char value, const string& delims)
{
    for (auto d : delims) {
        if (d == value) return true;
    }
    return false;
}

vector<string>
split(const string& str, const string& delims)
{
    vector<string> output;
    auto first = cbegin(str);

    while (first != cend(str)) {
        auto second = first;
        while (second != cend(str) &&
               !is_delimiter(*second, delims))  {
            ++second;
        }
        output.emplace_back(first, second);
        if (second == cend(str)) {
            break;
        }
        first =  next(second);
    }

    return output;
}

As you can see, the code is barely any different, and performs identically.

Now, let’s turn our attention to the inner while loop. Slightly reorganised, this is:

auto second = first;
while (second != cend(str)) {
    if (is_delimiter(*second, delims)) {
        return second;
    }
    ++second;
}

What is this snippet really doing? It’s saying “find the first element in the interval [first, end()) which is a delimiter”. Now, “is a delimiter” means “is a member of the set of delimiters”, so if we put these together then the while loop is saying

“Find the first element of the interval [first, end()) which is also in the interval [delims.begin(), delims.end())”

Fortunately for us, there is a standard algorithm that does exactly this: it’s called std::find_first_of(). Let’s update our code to use this algorithm, which gives us the (almost) final version:

// Version 5
vector<string>
split(const string& str, const string& delims)
{
    vector<string> output;
    auto first = cbegin(str);

    while (first != cend(str)) {
        const auto second = find_first_of(first, cend(str),
                                          cbegin(delims), cend(delims));
        output.emplace_back(first, second);
        if (second == cend(str)) break;
        first =  next(second);
    }

    return output;
}

This version still adds empty strings to the output when it comes across two consecutive delimiters. This is sometimes what people want, but sometimes not, so let’s make it an option. Also, the most common case for splitting is to use a single space as a delimiter, so we’ll use that as a default parameter. Making these changes, and putting back the std:: qualifiers that we have so far elided, we get our final string splitter:

// Final version
std::vector<std::string>
split(const std::string& str, const std::string& delims = " ",
      bool skip_empty = true)
{
    std::vector<std::string> output;
    auto first = std::cbegin(str);

    while (first != std::cend(str)) {
        const auto second = std::find_first_of(first, std::cend(str),
                                               std::cbegin(delims), std::cend(delims));
        if (first != second || !skip_empty) {
            output.emplace_back(first, second);
        }
        if (second == std::cend(str)) break;
        first =  std::next(second);
    }

    return output;
}

This code is simple enough that we don’t even need to add comments. The core of it is the find_first_of() call, which is easily looked up even if you can’t guess what it does from the name. But we can do better yet.
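To make the effect of the two defaulted parameters concrete, here is a small usage sketch. The split() function is repeated from the final version above so that the example compiles on its own; the split_examples function is only an illustration added here.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// The final split() from above, repeated so the example is self-contained.
std::vector<std::string>
split(const std::string& str, const std::string& delims = " ",
      bool skip_empty = true)
{
    std::vector<std::string> output;
    auto first = std::cbegin(str);

    while (first != std::cend(str)) {
        const auto second = std::find_first_of(first, std::cend(str),
                                               std::cbegin(delims), std::cend(delims));
        if (first != second || !skip_empty) {
            output.emplace_back(first, second);
        }
        if (second == std::cend(str)) break;
        first = std::next(second);
    }

    return output;
}

void split_examples()
{
    // Defaults: split on a single space and drop empty tokens.
    assert((split("one two  three") ==
            std::vector<std::string>{"one", "two", "three"}));

    // skip_empty = false keeps the empty token between consecutive delimiters.
    assert((split("a,,b", ",", false) ==
            std::vector<std::string>{"a", "", "b"}));

    // A leading delimiter yields a leading empty token, but a trailing
    // delimiter does not yield a trailing one, because the loop stops as
    // soon as first reaches the end of the string.
    assert((split(",a", ",", false) == std::vector<std::string>{"", "a"}));
    assert((split("a,", ",", false) == std::vector<std::string>{"a"}));
}
```

The asymmetry between leading and trailing delimiters follows directly from the loop condition; whether it matters depends on the caller.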

A more generic tokeniser

It’s long been a criticism of those coming to C++ from other languages that there is no split() function for strings in the standard library. The reason is that doing so in a generic way is pretty tricky. Let’s have a try at it now:

// Bad generic split
template <class Input, class Delims>
vector<Input>
split(const Input& input, const Delims& delims,
      bool skip_empty = true)
{
    vector<Input> output;
    auto first = cbegin(input);

    while (first != cend(input)) {
        const auto second = find_first_of(first, cend(input),
                                          cbegin(delims), cend(delims));
        if (first != second || !skip_empty) {
            output.emplace_back(first, second);
        }
        if (second == cend(input)) break;
        first =  next(second);
    }

    return output;
}

Unfortunately, this falls apart as soon as we try to call it with a string literal, because the compiler will complain that it cannot instantiate a vector<const char[17]> (or something similar). Also, what if we don’t want to output a vector? A generic solution should surely let us use whatever container we like. What if we are streaming in the input via istream_iterator? How do we pass the output to an ostream?

This problem is pretty tricky, but it’s not insurmountable. Our splitting algorithm is sound – it will work for anything that models the InputIterator concept. The problem is, what do we do with the tokens once we’ve found them? Actually, the answer is obvious: we should let the caller do whatever they like, by letting them pass in a function which we will call every time we find a token.

Our generic solution then looks like this:

// Good generic "split"
template <class InputIt, class ForwardIt, class BinOp>
void for_each_token(InputIt first, InputIt last,
                    ForwardIt s_first, ForwardIt s_last,
                    BinOp binary_op)
{
    while (first != last) {
        const auto pos = std::find_first_of(first, last, s_first, s_last);
        binary_op(first, pos);
        if (pos == last) break;
        first = std::next(pos);
    }
}

This simply calls the given function for each token (hence the name), passing the first and one-past-the-end iterators we’ve found. Writing our split() for strings in terms of this generic function is trivial:

vector<string>
split(const string& str, const string& delims = " ",
      bool skip_empty = true)
{
    vector<string> output;
    for_each_token(cbegin(str), cend(str),
                   cbegin(delims), cend(delims),
                   [&output, skip_empty] (auto first, auto second) {
        if (first != second || !skip_empty) {
            output.emplace_back(first, second);
        }
    });
    return output;
}
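Because for_each_token() only sees iterators and a callback, nothing ties it to strings or to a vector of results. The sketch below is one way to exercise that, assuming C++14 for the generic lambdas: it sums runs of integers separated by sentinel values, and writes string tokens straight to a stream. The helper names sum_runs and tokens_to_lines are invented for this illustration; for_each_token() is repeated from above so the block compiles on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Repeated from above so that the example is self-contained.
template <class InputIt, class ForwardIt, class BinOp>
void for_each_token(InputIt first, InputIt last,
                    ForwardIt s_first, ForwardIt s_last,
                    BinOp binary_op)
{
    while (first != last) {
        const auto pos = std::find_first_of(first, last, s_first, s_last);
        binary_op(first, pos);
        if (pos == last) break;
        first = std::next(pos);
    }
}

// Hypothetical helper: treats `separators` as delimiters inside a vector of
// ints and returns the sum of each non-empty run between them.
std::vector<int> sum_runs(const std::vector<int>& values,
                          const std::vector<int>& separators)
{
    std::vector<int> sums;
    for_each_token(values.cbegin(), values.cend(),
                   separators.cbegin(), separators.cend(),
                   [&sums](auto first, auto last) {
                       if (first != last)
                           sums.push_back(std::accumulate(first, last, 0));
                   });
    return sums;
}

// Hypothetical helper: streams each non-empty token to an ostream, one per
// line, without building any intermediate container of tokens.
std::string tokens_to_lines(const std::string& text, const std::string& delims)
{
    std::ostringstream out;
    for_each_token(text.cbegin(), text.cend(),
                   delims.cbegin(), delims.cend(),
                   [&out](auto first, auto last) {
                       if (first != last) {
                           std::copy(first, last,
                                     std::ostreambuf_iterator<char>(out));
                           out << '\n';
                       }
                   });
    return out.str();
}

void for_each_token_examples()
{
    assert((sum_runs({1, 2, 0, 3, 4, 5, 0, 0, 6}, {0}) ==
            std::vector<int>{3, 12, 6}));
    assert(tokens_to_lines("a b\tc", " \t") == "a\nb\nc\n");
}
```

In both helpers the decision about what to do with a token, including whether to skip empty ones, sits entirely in the lambda.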

Our generic for_each_token() is simple, elegant, and it shows off very nicely the power of the STL. All of which is very nice, but pointless if it isn’t fast. Is it fast?

Yes. Yes it is.

Performance

In order to measure performance, we’ll use Josh’s original microbenchmark from here, slightly modified to use a timer based on std::chrono::high_resolution_clock rather than boost::timer. Re-running the original tests on my system (GCC 5.3 with -O3 on a MacBook Pro running OS X El Capitan), and taking the average of 5 runs for each algorithm, I get the following profile:

[Figure: Original]

As you can see, the results are almost the same as Josh’s, except that Boost does slightly better this time. Josh’s approach is still fastest, with strtok() a close second.
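For readers who want to reproduce this kind of measurement, a std::chrono timer averaged over several runs might look roughly like the sketch below. It is not the benchmark harness used for these numbers: the helper names time_ms and average_ms are invented here, and std::chrono::steady_clock is swapped in as an assumption because it is guaranteed to be monotonic.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds.
template <class Func>
double time_ms(Func&& work)
{
    using clock = std::chrono::steady_clock;   // monotonic clock (assumption)
    const auto start = clock::now();
    std::forward<Func>(work)();
    const auto stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: averages the time over `runs` repetitions.
// Assumes runs > 0.
template <class Func>
double average_ms(Func&& work, int runs)
{
    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        total += time_ms(work);
    }
    return total / runs;
}

void timing_example()
{
    int calls = 0;
    const double average = average_ms([&calls] { ++calls; }, 5);
    assert(calls == 5);        // the callable ran once per repetition
    assert(average >= 0.0);
}
```

A real benchmark would also need to keep the compiler from optimising the measured work away, which this sketch does not attempt.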

Now let’s add our method, using the generic algorithm above. This is the code I added:

// 5 statements
template <class InputIt, class ForwardIt, class BinOp>
void for_each_token(InputIt first, InputIt last,
                    ForwardIt s_first, ForwardIt s_last,
                    BinOp binary_op)
{
    while (first != last) {
        const auto pos = find_first_of(first, last, s_first, s_last);
        binary_op(first, pos);
        if (pos == last) break;
        first = next(pos);
    }
}

// 2 statements
void DoTristansWay(std::ofstream& cout, std::string str)
{
    constexpr char delims[] = " \n\t\r\f";
    for_each_token(cbegin(str), cend(str),
                   cbegin(delims), cend(delims),
                   [&cout] (auto first, auto second) {
                       if (first != second)
                           cout << string(first, second);
                   });
}

The full code is available in this gist.

This time, the results look like this:

[Figure: Modified]

As you can see, our algorithm is by far the fastest of all the options. The average runtime for the generic algorithm on my system is 220ms, against 285ms for the next best – that’s a 1.3x speed-up.

Not only that, but we’ve done it with just seven statements (using the metric from the original post), as opposed to 21 for the low-level version. 1.3x the performance with 1/3rd of the code? I’ll take that any day of the week. That’s the power of the STL.

Discussion

In the original post, Josh presents the “conventional wisdom” as being the following:

You shouldn’t roll your own code. The standard library was written by gurus and mere mortals won’t do better. Even if you do, you’ll write bugs, and you’ll end up spending more time fixing them than it’s worth.

This is all true, but I would put it slightly differently: I would say that if a standard library tool exists, then use it, but make sure you pick the right tool for the job.

In this case, despite the prevalence of suggestions on the internet, std::stringstream is the wrong tool to use for string splitting. The right tool is std::find_first_of. Once we use that, then not only do we get simple code, but it turns out to be much faster too.
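For reference, the stringstream idiom that is so often suggested looks something like the sketch below. This is a generic illustration of that idiom, not the exact code from the benchmark, and the function name split_with_stringstream is made up here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// The commonly suggested idiom: copy the text into a stream, then let
// operator>> skip whitespace and extract one word at a time.
std::vector<std::string> split_with_stringstream(const std::string& str)
{
    std::vector<std::string> output;
    std::istringstream stream(str);   // copies the whole input
    std::string token;
    while (stream >> token) {
        output.push_back(token);
    }
    return output;
}

void stringstream_example()
{
    assert((split_with_stringstream("a b\n c") ==
            std::vector<std::string>{"a", "b", "c"}));
}
```

Besides the extra copy and the stream machinery, this form only splits on whitespace and cannot keep empty tokens without further work, which is part of why it is a poor fit for the job.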

The beauty of the STL is that it provides composable, low-level algorithms from which you can build up complex behavior. Sean Parent makes this argument far better than me; in his fantastic C++ Seasoning talk at Going Native 2013, he shows how you can use the STL algorithms in places where it’s not obvious. The video is very highly recommended.
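To give a flavour of what using an algorithm in a non-obvious place can mean, here is a sketch in the spirit of that talk: moving a contiguous block of elements to a new position in a sequence with nothing but std::rotate. The function name slide and its exact shape are written from memory and should be read as an assumption rather than a quotation from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Moves the block [first, last) so that it ends up next to `pos`, using only
// std::rotate. Requires random access iterators because of the < comparisons.
// Returns the new position of the moved block.
template <class RandomIt>
std::pair<RandomIt, RandomIt> slide(RandomIt first, RandomIt last, RandomIt pos)
{
    if (pos < first) return { pos, std::rotate(pos, first, last) };
    if (last < pos)  return { std::rotate(first, last, pos), pos };
    return { first, last };   // pos lies inside the block: nothing to do
}

void slide_example()
{
    std::vector<int> v{0, 1, 2, 3, 4, 5, 6};

    // Slide the block {1, 2} forward so that it ends just before index 5.
    const auto moved = slide(v.begin() + 1, v.begin() + 3, v.begin() + 5);

    assert((v == std::vector<int>{0, 3, 4, 1, 2, 5, 6}));
    assert(moved.first == v.begin() + 3);
    assert(moved.second == v.begin() + 5);
}
```

Nothing about std::rotate suggests "drag and drop a selection", yet that is exactly what a few lines built on it can do.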

Sean advocates that every C++ programmer should share at least one useful algorithm publicly per year. Hopefully I’ve now met my quota for 2016.