Online Courses I Recommend

I have taken many online courses since 2013, and dropped even more along the way. Overall, I find online courses better suited to self-study than books.

In this post, I have compiled a list of courses that left a strong impression on me. My picks are inevitably biased toward my personal interests, but I hope you find them useful too.

Note that some of the courses I recommend are not MOOCs (massive open online courses) but regular university courses. These courses may have uploaded their materials in certain years and then stopped making them public, so you may need to spend some time searching for the course materials.

Also, the courses I recommend here do not necessarily have Chinese subtitles; my apologies to readers who are not comfortable with English.

Learning Methodology

  • Learning How to Learn — I think the ability to learn is something most people, myself included, lack. Some may feel the advice in this course is all common sense. The problem is that in everyday life you hear all kinds of contradictory "common sense," and we lack the ability to tell effective advice from harmful advice. This course tries to fill exactly that gap with science.

Mathematics

Calculus

Linear Algebra

Discrete Mathematics

Computer Science and Programming

Getting Started

  • Stanford's CS106A, CS106B, and CS106L — I got into computer science by self-studying this course series. The 2008 version I studied back then is dated by today's standards, but updated versions of these courses can be found online.

Computer Graphics and GPU Programming

Programming Languages and Compilers

Systems Programming

Others

Psychology

Physics

How to Find Good Open Courses?

My observation is that traditional university courses (such as those from MIT OCW) have a higher average quality than MOOCs. Unfortunately, most university courses available online lack lecture videos. And while lecture videos are indispensable, supplementary materials and assignments matter just as much.

Among MOOC platforms, I consider Coursera and edX to still be the relatively reliable ones, while course quality on other platforms is more uneven.

Another suggestion: for courses on platforms with reviews, spend some time reading the 1-star and 2-star reviews. Most of them are nonsense, but if something feels off to you after reading them, the course may not be for you. The same method works for judging whether a book is worth reading.

Some Highly Rated Courses I Don't Recommend

As with university courses, most of the online courses I have taken were disappointing. I barely remember the majority of the mediocre ones; below are a few courses I still remember vividly because I found them especially problematic.

  • Wesleyan University's Social Psychology — This course makes many exaggerated claims and teaches theories that have since been debunked, such as "video games cause violence." It also cites controversial studies like the Stanford prison experiment and the Milgram experiment.

  • The University of Edinburgh's Fundamentals of Music Theory — The course is overly simple, and then the exams go far beyond what is taught.

Tip: Use from Instead of to in Function and Variable Names

Today I wrote a code snippet involving several maps. Like many others, I usually name such maps x_to_y:

let mut value_to_var: HashMap<Value, (String, usize)> = HashMap::new();
let mut var_to_num: HashMap<String, usize> = HashMap::new();
let mut num_to_canonical_var = vec![];

When I need to use these maps, my code usually looks like this:

let canonical_variable = num_to_canonical_var[var_to_num.get(var).unwrap()];

As you can see, the ordering is not particularly intuitive. This code converts from variable to number to canonical var, but understanding it requires reading inside out, like unwinding a spiral.

Reading inside out is challenging. For example, despite nearly a decade of C and C++ experience, I still sometimes struggle to decipher C declaration syntax. Others often feel the same.

What if I change all the tos into froms?

let mut var_from_value: HashMap<Value, (String, usize)> = HashMap::new();
let mut num_from_var: HashMap<String, usize> = HashMap::new();
let mut canonical_var_from_num = vec![];

Now we can write the code like this:

let canonical_variable = canonical_var_from_num[num_from_var.get(var).unwrap()];

This code is easier to understand because related names sit closer together. Likewise, we can follow the conversions the code performs by reading from right to left, with no "spiral reading" required.

The same idea also applies to functions. For example, let shader = shader_from_file(file) reads better than let shader = file_to_shader(file). Considering that both functions (at least pure ones) and map data structures exist to create mappings between data, the similarity in naming is no coincidence.

Update 2024-02-10: Several users on Mastodon pointed out that they prefer naming their map variables x_by_y, which serves the same purpose as the x_from_y proposed here.

Update 2024-02-11: Another Mastodon user mentioned that Joel Spolsky has written similar advice. That article also compares the "Apps Hungarian" notation the author favors with the more cumbersome "Systems Hungarian" notation most people are familiar with.

Vulkan Dynamic Rendering Tutorial

Dynamic rendering is a new Vulkan extension released just two months ago. With it, we can skip creating render pass (VkRenderPass) and framebuffer (VkFramebuffer) objects in Vulkan.

Before dynamic rendering, writing a Vulkan renderer always meant creating render pass objects. The render pass API is not easy to use, and we usually need neither multiple subpasses nor multiple input attachments. By contrast, the DirectX 12 API makes render passes optional, to be used only when a programmer needs to optimize for certain mobile devices.

Recently, I started writing a new Vulkan renderer from scratch in Rust with the ash crate, and I naturally wanted to try this new dynamic rendering extension. Online resources about the extension are still scarce, and there were no tutorials on using it, so I ended up reading the extension's specification directly.

I am writing this tutorial in the hope of making the extension known and mastered by more people. To reach a wider audience, I use the Vulkan C API rather than the Rust bindings. The Rust ash crate maps very directly onto the Vulkan C API, but if I made any mistakes while "translating" the Rust code into C++, please let me know.

Initializing Dynamic Rendering

VK_KHR_dynamic_rendering is a device extension, so we need to enable it when creating the device, alongside other device extensions such as VK_KHR_swapchain.

Checking whether your GPU supports the dynamic rendering extension

Like any other extension, we can check whether our physical device supports VK_KHR_dynamic_rendering with vkEnumerateDeviceExtensionProperties. If the results from vkEnumerateDeviceExtensionProperties do not include VK_KHR_dynamic_rendering, we need to update our graphics driver as well as the Vulkan SDK and runtime.

Note: at the time of writing (January 2022), VK_KHR_dynamic_rendering had only just been released, so the latest driver for your graphics card may still not support it. When I wrote this post, I even had to install a "Vulkan beta driver" for my Nvidia card, though that is no longer necessary.

Loading the dynamic rendering extension

Now, before creating the logical device, we add VK_KHR_dynamic_rendering to our extension list:

std::vector<const char*> device_extensions = {
  // ...,
  "VK_KHR_dynamic_rendering"
};

In addition, dynamic rendering is gated behind a feature toggle, so we need to create a VkPhysicalDeviceDynamicRenderingFeaturesKHR structure and pass it into the pNext chain when creating the logical device:

constexpr VkPhysicalDeviceDynamicRenderingFeaturesKHR dynamic_rendering_feature {
    .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DYNAMIC_RENDERING_FEATURES_KHR,
    .dynamicRendering = VK_TRUE,
};

const VkDeviceCreateInfo device_create_info = {
    .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
    .pNext = &dynamic_rendering_feature,
    // ...
    .enabledExtensionCount = static_cast<unsigned int>(device_extensions.size()),
    .ppEnabledExtensionNames = device_extensions.data(),
};

Using Dynamic Rendering in a Command Buffer

In your Vulkan renderer, your command buffer recording likely contains code similar to this:

VK_CHECK(vkBeginCommandBuffer(command_buffer, &command_buffer_begin_info));

VkRenderPassBeginInfo render_pass_begin_info = {
    // ...
};

vkCmdBeginRenderPass(command_buffer, &render_pass_begin_info, VK_SUBPASS_CONTENTS_INLINE);

// Draw calls here

vkCmdEndRenderPass(command_buffer);

VK_CHECK(vkEndCommandBuffer(command_buffer));

With dynamic rendering, we need to replace the VkRenderPassBeginInfo structure as well as the vkCmdBeginRenderPass and vkCmdEndRenderPass calls. In place of VkRenderPassBeginInfo, we use the VkRenderingInfoKHR structure, which looks like this:

typedef struct VkRenderingInfoKHR {
    VkStructureType                        sType;
    const void*                            pNext;
    VkRenderingFlagsKHR                    flags;
    VkRect2D                               renderArea;
    uint32_t                               layerCount;
    uint32_t                               viewMask;
    uint32_t                               colorAttachmentCount;
    const VkRenderingAttachmentInfoKHR*    pColorAttachments;
    const VkRenderingAttachmentInfoKHR*    pDepthAttachment;
    const VkRenderingAttachmentInfoKHR*    pStencilAttachment;
} VkRenderingInfoKHR;

You can see that some fields, such as renderArea, used to be supplied to VkRenderPassBeginInfo, but most of the information this structure needs originally went into render pass creation. In particular, the new VkRenderingAttachmentInfoKHR structure replaces VkAttachmentDescription for describing attachments:

typedef struct VkRenderingAttachmentInfoKHR {
    VkStructureType          sType;
    const void*              pNext;
    VkImageView              imageView;
    VkImageLayout            imageLayout;
    VkResolveModeFlagBits    resolveMode;
    VkImageView              resolveImageView;
    VkImageLayout            resolveImageLayout;
    VkAttachmentLoadOp       loadOp;
    VkAttachmentStoreOp      storeOp;
    VkClearValue             clearValue;
} VkRenderingAttachmentInfoKHR;

Now we can use the structures above to replace our earlier render-pass-related code. This change does mean writing more code in command buffer recording, but only because we moved some code that used to live in render pass creation here:

VK_CHECK(vkBeginCommandBuffer(command_buffer, &command_buffer_begin_info));

const VkRenderingAttachmentInfoKHR color_attachment_info {
    .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO_KHR,
    .imageView = swapchain_image_views_[swapchain_image_index_],
    .imageLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL_KHR,
    .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
    .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
    .clearValue = clear_value,
};

const VkRenderingInfoKHR render_info {
    .sType = VK_STRUCTURE_TYPE_RENDERING_INFO_KHR,
    .renderArea = render_area,
    .layerCount = 1,
    .colorAttachmentCount = 1,
    .pColorAttachments = &color_attachment_info,
};

vkCmdBeginRenderingKHR(command_buffer, &render_info);

// Draw calls here

vkCmdEndRenderingKHR(command_buffer);

VK_CHECK(vkEndCommandBuffer(command_buffer));

Pipeline Creation

Now we can delete all the code that creates render pass and framebuffer objects! Then, when creating the graphics pipeline, instead of supplying a render pass parameter, we provide a VkPipelineRenderingCreateInfoKHR object describing the attachment information:

const VkPipelineRenderingCreateInfoKHR pipeline_rendering_create_info {
    .sType = VK_STRUCTURE_TYPE_PIPELINE_RENDERING_CREATE_INFO_KHR,
    .colorAttachmentCount = 1,
    .pColorAttachmentFormats = &swapchain_image_format_,
};

const VkGraphicsPipelineCreateInfo pipeline_create_info {
  // ...
  .pNext = &pipeline_rendering_create_info,
  // ...
  .renderPass = nullptr, // We no longer need a render pass
  // ...
};

Image Layout Transitions

If everything were this simple, I would be delighted with this extension. In truth, however, render pass objects did do some useful work for us.

With our current code, the validation layer emits the following warning every frame:

pSwapchains[0] images passed to present must be in layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR or VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR but is in VK_IMAGE_LAYOUT_UNDEFINED. The Vulkan spec states: Each element of pImageIndices must be the index of a presentable image acquired from the swapchain specified by the corresponding element of the pSwapchains array, and the presented image subresource must be in the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout at the time the operation is executed on a VkDevice (https://github.com/KhronosGroup/Vulkan-Docs/search?q=VUID-VkPresentInfoKHR-pImageIndices-01296)

The validation layer is telling us that our swapchain image is in the VK_IMAGE_LAYOUT_UNDEFINED layout, but that to be presented, the image must be in either VK_IMAGE_LAYOUT_PRESENT_SRC_KHR or VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR.

We can manually transition the swapchain image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR before presenting:

// draw calls here

vkCmdEndRenderingKHR(command_buffer);

const VkImageMemoryBarrier image_memory_barrier {
    .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
    .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
    .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    .newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
    .image = swapchain_images[swapchain_image_index_],
    .subresourceRange = {
      .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
      .baseMipLevel = 0,
      .levelCount = 1,
      .baseArrayLayer = 0,
      .layerCount = 1,
    }
};

vkCmdPipelineBarrier(
    command_buffer,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // srcStageMask
    VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, // dstStageMask
    0,
    0,
    nullptr,
    0,
    nullptr,
    1, // imageMemoryBarrierCount
    &image_memory_barrier // pImageMemoryBarriers
);

VK_CHECK(vkEndCommandBuffer(command_buffer));

But now, before rendering the next frame, we must transition the swapchain image back to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (here the oldLayout is VK_IMAGE_LAYOUT_UNDEFINED, since we don't need to preserve the presented contents):

VK_CHECK(vkBeginCommandBuffer(command_buffer, &command_buffer_begin_info));

const VkImageMemoryBarrier image_memory_barrier {
    .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
    .dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
    .oldLayout = VK_IMAGE_LAYOUT_UNDEFINED,
    .newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    .image = swapchain_images[swapchain_image_index_],
    .subresourceRange = {
      .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
      .baseMipLevel = 0,
      .levelCount = 1,
      .baseArrayLayer = 0,
      .layerCount = 1,
    }
};

vkCmdPipelineBarrier(
    command_buffer,
    VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,  // srcStageMask
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, // dstStageMask
    0,
    0,
    nullptr,
    0,
    nullptr,
    1, // imageMemoryBarrierCount
    &image_memory_barrier // pImageMemoryBarriers
);

// begin dynamic rendering here

// draw calls

Almost every Vulkan renderer has helper functions to simplify this image layout transition code, but even so, specifying all the transition parameters remains quite tedious. We also need similar layout transitions for the depth buffer and the stencil buffer.

Final Words

In this simple case, the dynamic rendering extension seems about as tedious as creating render pass and framebuffer objects. In multi-pass rendering, however, dynamic rendering may prove more valuable because of the complexity of synchronization. Khronos may also make the extension easier to use in the future.

Acknowledgements

Special thanks to my friend Charles Giessen for proofreading and editing this post!

In addition, after this post was published, many veteran graphics programmers offered valuable insights and feedback.

Implementing a Bilingual Blog in TypeScript

Happy New Year 2022, everyone! Today I want to talk about something different from most of my posts: how I implemented my bilingual blog in TypeScript.

Ever since I created this blog in 2015, I have wanted to make it bilingual in Chinese and English to reach a larger domestic audience, and at the end of 2019 I finally did. My blog's internationalization is probably implemented differently from most: I use no third-party library such as i18next, relying instead mainly on TypeScript's powerful type system.

My approach may not be the most "correct" or scalable one, but I think it is a very good fit for a personal blog. It offers several important advantages:

  • TypeScript's type system guarantees that I never forget to translate an entry
  • I can easily use different layouts for the UI in different languages
  • I don't have to learn an internationalization library just for my blog

So if you want to build a multilingual personal website, I recommend a similar approach.

My blog uses the GatsbyJS static site generator. A static site generator converts Markdown files into HTML pages with the help of templates1.

For blog posts, I simply keep a separate Markdown file per language for each post. For instance, the Chinese text you are reading and the English version of this post are stored in different Markdown files. On the other hand, there is still plenty of text in the template UI that needs translation, such as my self-introduction in the sidebar, the menu options, and the blog post tags.

GatsbyJS uses JavaScript as its template language, and all GatsbyJS templates are React components. I chose TypeScript, which transpiles to JavaScript, as the main development language of this blog, so I naturally developed a TypeScript solution to the internationalization problem, and Gatsby automatically turns all the React components, translation logic included, into static HTML. Conversely, if you use, say, a Python-based static site generator, then ideally you should implement internationalization in Python, so that no translations need to be loaded dynamically when your site loads.

I put most of the internationalization logic in the translation.tsx file:

First, I use an en object to store all the English translation entries.

const en = {
  ai: "AI",
  algorithms: "Algorithms",
  archive: "Archive",
  ...
};

Since en is just a plain object, I can store arbitrary data in its entries, such as JSX objects or even functions:

  all_n_posts: (n: number) => (
    <>
      All <Link to={`/en/archive`}>{n} posts</Link>
    </>
  ),

With the en object defined, we can obtain its type via the typeof operator:

export type Translations = typeof en;

Most programming languages lack reflection capabilities like typeof. Thanks to typeof, we don't have to define the type of the en object explicitly ourselves.

Now that we have the Translations type, we can declare a zh object of this type to store the Chinese translations:

const zh: Translations = {
  ai: "AI",
  algorithms: "算法",
  archive: "博文目录",
  ...
};

This way, the type system ensures that I never forget to translate any entry.

Next, we can put the translations for all languages into a single object. In the JavaScript components, we will use this object to look up specific translation entries:

export const translations = {
  en: en,
  zh: zh,
};

Then we use the keyof operator to obtain a union type of all the languages; in my case, "en" | "zh". keyof is arguably another of TypeScript's very interesting reflection features. Since keyof expects a type rather than an object, we need to prepend another typeof operator:

export type Language = keyof typeof translations;

Whenever I need to explicitly provide the type of the current-language variable, for example when passing the current language as an argument, I use the Language type defined above.

Finally, we use Object.keys to obtain a list of the languages, which lets us loop over all of them:

export const languages = Object.keys(translations) as Language[];

This blog has only two languages, and I don't know any language other than Chinese and English. But apart from treating English as the "default" language, my code does not hard-code any particular language. So if I ever needed to, I could easily add a third language to the site: I would just define another object of type Translations and add it to translations.

We need to pass the page's current language down to its components to use the translations. Then, wherever I need a translation, I write translations[lang]["entry"] (replacing entry with the name of the entry I need). The same works for functions; I just call them like translations[lang]["all_n_posts"](n).

And that is the entire internationalization logic! To add a new translation, I just add the corresponding entries to the en and zh objects. Of course, the most challenging part of maintaining a bilingual blog is always translating the actual posts, and I can't claim I have done a great job translating my posts back into Chinese. But I hope that, at least on the technical side, my approach offers some inspiration.


  1. GatsbyJS is special in that it doesn't work exactly the way I described static site generators. You can visit their website to learn more.

A Small Utility in the C++ Standard Library: std::align

Today I want to talk about the std::align function in the C++ standard library. Given its limited use cases, it is probably one of the least-known functions in the standard library. Below, I will use an arena allocator as the motivating example for std::align.

Arena allocator

An arena allocator is probably the simplest custom memory management strategy. In some literature it is also called a bump allocator or region-based allocator. Despite its simplicity, it is such a frequently used strategy that even the C++ standard library contains an arena allocator implementation, albeit under the name std::pmr::monotonic_buffer_resource.

An arena allocator works by pre-allocating one large block of memory up front. The block can live on the stack, or be allocated on the heap by a single malloc. Afterwards, every time we need memory, we carve a small chunk out of this block, and each allocation amounts to nothing more than bumping a pointer.

Figure 1 – The arena before allocation
Figure 2 – The arena after an allocation

An arena allocator is very fast, especially compared with a function as intricate as malloc. Each allocation merely updates a pointer, and if we only allocate trivially destructible objects, freeing the memory is almost free. Things get more complicated if destructors must run when freeing, since we then need to maintain a list of objects to destroy, but I won't cover that in this post.

The drawback of an arena allocator is that all of its memory can only be freed at once, because the allocator does not track individual allocations. Even so, arena allocators remain very useful whenever we have many different objects to allocate, and they are widely used in domains ranging from compilers to game engines.

A minimal arena allocator implementation

Below is the simplest possible implementation of an arena allocator:

struct Arena {
  std::byte* ptr = 0;
  std::size_t size_remain = 0;

  [[nodiscard]] auto alloc(std::size_t size) noexcept -> void*
  {
    if (size_remain < size) return nullptr;
    
    auto* alloc_ptr = ptr;
    ptr += size;
    size_remain -= size;
    return alloc_ptr;
  }
};

We could also store an end pointer instead of size_remain and compare ptr + size against it, but doing so would not differ meaningfully from the current approach.

This implementation omits many features that are genuinely useful in an arena allocator. For example, we cannot reset the arena and reuse its memory. But I won't expand on those features in this post.

To use our arena allocator, we first construct it from a pre-allocated buffer. We can then allocate memory from it and construct objects on top of the allocated memory:

std::byte buffer[1000];
Arena arena {
  .ptr = buffer, 
  .size_remain = std::size(buffer)
};

auto* ptr = static_cast<std::uint8_t*>(arena.alloc(sizeof(std::uint8_t)));
ptr = new(ptr) std::uint8_t{42};
  
auto* ptr2 = static_cast<std::uint32_t*>(arena.alloc(sizeof(std::uint32_t)));
ptr2 = new(ptr2) std::uint32_t{1729};

Since our types are integers, the placement new here doesn't perform any real work, but the C++ standard requires it in order to start the objects' lifetimes. If we skipped the placement new and directly assigned *ptr = 42 or the like, we would technically commit undefined behavior.

Alignment

Unfortunately, the arena allocator implementation shown above is flawed: the pointer returned by alloc does not necessarily satisfy the alignment requirement of the object we want to create.

In C++, every type and object has an alignment requirement. We can set an object's alignment manually with the alignas keyword, and query a type's alignment requirement with the alignof keyword.

Starting an object's lifetime at a misaligned address is undefined behavior. If we do it anyway, then depending on the processor architecture, we may get very slow memory accesses, or the program may even crash outright.

Normally, we C++ programmers don't worry much about alignment, because the compiler resolves these issues for us, and standard library functions like malloc also handle alignment properly on their own. But once we need a custom memory management strategy, alignment suddenly becomes very important.

Consider our earlier arena allocator. At first the arena is empty. We then allocate 1 byte of memory and construct a std::uint8_t in it; so far so good. But next we allocate 4 more bytes and construct a std::uint32_t. A std::uint32_t requires 4-byte alignment, yet the spot where we allocated those 4 bytes sits exactly 1 byte off from an alignment boundary:

Figure 3 – The arena after allocating one uint8_t and one uint32_t

An improved arena allocator

Before writing an arena allocator that takes alignment into account, let's first write a helper function align_forward. It bumps a given pointer ptr forward to the next address that satisfies the alignment requirement alignment:

[[nodiscard]] inline auto align_forward(std::byte* ptr, std::size_t alignment) noexcept
  -> std::byte*
{
  const auto addr = std::bit_cast<uintptr_t>(ptr);
  const auto aligned_addr = (addr + (alignment - 1)) & -alignment;
  return ptr + (aligned_addr - addr);
}

We first convert our pointer to an integer, then round the address up to the alignment boundary with the expression (addr + (alignment - 1)) & -alignment.

To understand this expression, first consider what the unary minus - does to a binary integer: it flips all the bits and then adds 1 to the result.

For example, if our alignment is 4, it is represented in binary as

0b00000100,

and when we negate it we get -4, which in two's complement is represented as

0b11111100.

I omitted the higher bytes, but you should see the pattern: -alignment is exactly the bit mask that clears the low bits of a misaligned address.

At the end, we need to turn the aligned address aligned_addr from an integer back into a pointer. I chose pointer arithmetic over another cast (std::bit_cast<std::byte*>(aligned_addr)).
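To sanity-check the bit trick, we can evaluate the rounding expression on a few addresses. Here is a standalone sketch (round_up is a name I made up for illustration):

```cpp
#include <cstdint>

// The same (addr + (alignment - 1)) & -alignment trick as in align_forward,
// operating directly on integer addresses. alignment must be a power of two.
constexpr std::uintptr_t round_up(std::uintptr_t addr, std::uintptr_t alignment)
{
  return (addr + (alignment - 1)) & -alignment;
}

// Already-aligned addresses are unchanged; others move up to the next boundary.
static_assert(round_up(0, 4) == 0);
static_assert(round_up(1, 4) == 4);
static_assert(round_up(4, 4) == 4);
static_assert(round_up(13, 8) == 16);
```

Note that negating an unsigned integer is well-defined in C++ (modular arithmetic), which is what makes the mask trick safe here.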

With this helper function, we can implement the alignment-aware arena allocator:

struct Arena {
  std::byte* ptr = 0;
  std::size_t size_remain = 0;

  [[nodiscard]]
  auto aligned_alloc(std::size_t alignment, std::size_t size) noexcept -> void*
  {
    std::byte* aligned_ptr = align_forward(ptr, alignment);
    const size_t size_for_alignment = aligned_ptr - ptr;
    const size_t bump_size = size_for_alignment + size;
    if (size_remain < bump_size) return nullptr;

    ptr = aligned_ptr + size;
    size_remain -= bump_size;
    return aligned_ptr;
  }
};

Note that I renamed the function from alloc to aligned_alloc, since we must now pass the alignment requirement alignment to it explicitly. We call align_forward to get our pointer ptr properly aligned. Then we compute the total number of bytes needed (the bytes used for alignment plus the actual allocation size size). Finally, if there is enough space for the allocation, we bump our pointer ptr past the allocation, decrease the arena's remaining size, and return the aligned pointer.

When using aligned_alloc, we need to pass the alignment:

auto* ptr = static_cast<std::uint8_t*>(
  arena.aligned_alloc(alignof(std::uint8_t), sizeof(std::uint8_t)));
ptr = new(ptr) std::uint8_t{42};
  
auto* ptr2 = static_cast<std::uint32_t*>(
  arena.aligned_alloc(alignof(std::uint32_t), sizeof(std::uint32_t)));
ptr2 = new(ptr2) std::uint32_t{1729};

As you can see, the arena allocator has become slightly more cumbersome to use, though in practice we can wrap the aligned_alloc calls in a template function. What matters is that the memory we allocate will now be properly aligned:

Figure 4 – The alignment-aware arena after allocating one uint8_t and one uint32_t

If you still want the earlier alloc member function that requires no explicit alignment, we can implement it by calling aligned_alloc with the alignment set to alignof(std::max_align_t) (the strictest alignment requirement any scalar type can have on the platform):

[[nodiscard]]
auto alloc(std::size_t size) noexcept -> void*
{
  return aligned_alloc(alignof(std::max_align_t), size);
}

This version of alloc always returns pointers aligned as strictly as std::max_align_t; std::malloc satisfies the same guarantee. If we allocate many small objects with weaker alignment requirements than std::max_align_t, this approach wastes memory, but it at least guarantees that every allocation's alignment requirement is satisfied.

std::align

In C projects, I use essentially the same arena allocator implementation as shown above. But in C++, with help from the standard library, we can do better.

std::align is a standard function defined in the <memory> header. Its interface looks like this:

namespace std {
  auto align(std::size_t alignment,
           std::size_t size,
           void*& ptr,
           std::size_t& space)
  -> void*;
}

With its two pass-by-reference parameters, std::align is not that easy to understand from the declaration alone. In fact, though, its purpose is very close to that of our earlier align_forward function. The first two parameters, alignment and size, are the same two arguments we pass to aligned_alloc, while ptr and space are our arena allocator's internal state.

std::align first checks whether we have enough space to allocate size bytes after the alignment adjustment. If so, it adjusts our pointer ptr, decreases space by the number of bytes used for alignment, and returns the aligned pointer.

With std::align, our code gets greatly simplified:

struct Arena {
  void* ptr = 0;
  std::size_t size_remain = 0;
  
  [[nodiscard]]
  auto aligned_alloc(std::size_t alignment, std::size_t size) noexcept -> void*
  {
    void* res = std::align(alignment, size, ptr, size_remain);
    if (res) {
        ptr = static_cast<std::byte*>(res) + size;
        size_remain -= size;
        return res;
    }
    return nullptr;
  }
};

Since std::align provides equivalent functionality, we no longer need our own align_forward helper, and with it goes the hand-written pointer-to-integer conversion and the cryptic bit manipulation. Our aligned_alloc function becomes almost as simple as the alloc function we started with.

Note that because std::align only bumps ptr to the alignment boundary and decreases size_remain by the bytes used for alignment, we still need to update both variables for the size of the actual allocation.

There is one other small change in the code: std::align requires us to use void*, whereas our previous code used std::byte*. Since we no longer do pointer arithmetic ourselves, void* works just as well, and void* also happens to be the type aligned_alloc needs to return.
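To see these in-place updates in isolation, here is a standalone sketch (the function name is mine, not part of the allocator above):

```cpp
#include <cstddef>
#include <memory>

// Run std::align on a deliberately misaligned pointer inside a
// 16-byte-aligned buffer and report the space left afterwards.
inline std::size_t space_after_align()
{
  alignas(16) std::byte buffer[64];
  void* p = buffer + 1;                    // misaligned by one byte
  std::size_t space = sizeof(buffer) - 1;  // 63 bytes available
  // Request 8-byte alignment for a 4-byte allocation: std::align moves p
  // to buffer + 8 and subtracts the 7 padding bytes from space.
  void* res = std::align(8, 4, p, space);
  return res == nullptr ? 0 : space;       // space is NOT reduced by the 4 bytes
}
```

The returned value is 63 - 7 = 56, confirming that std::align only accounts for the alignment padding, not for the allocation itself.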

Conclusion

I am not sure how much use std::align has outside of custom memory management strategies. Perhaps it could also be used to emulate something like flexible array members. In any case, I am glad that the C++ standard library provides this little utility function.

Fun with Ternary Search

This year is my first year doing the Advent of Code challenge, and today (2021 Day 7)'s challenge is a fun one.

I won't go into the details, but the problem involves finding the minimum of a function. The function takes an integer and returns another integer. An interesting property of the function is that it has a single "valley": everything to the left of the global minimum monotonically decreases, and everything to the right of it monotonically increases.

You can think of the function's output as a bunch of integers like

100, 81, 56, 32, 16, 33, 44, 78, 129

And we want to find the value 16.

Naively, we can evaluate the function at every point in the domain and then find the minimum; a slightly better way is to evaluate the function until the result starts to increase. Both strategies require O(n) time, but since our data is so nicely "sorted," we can do better.

Ternary Search

Ternary search, similar to binary search, exploits our data's pattern and achieves O(log n) asymptotic time. Wikipedia describes it as a technique for "finding the minimum or maximum of a unimodal function," which is exactly the kind of function we want to minimize. The basic idea is simple: if we partition our domain into three segments by two points, left and right, then evaluating the function at left and right gives us several cases:

  • f(left) < f(right)
  • f(left) > f(right)
  • f(left) == f(right)

If f(left) < f(right), then either both left and right are greater than the position of the minimum, or left is less than the position of the minimum while right is greater. In either case, we know the minimum is not to the right of right, so we can discard that part of the domain.

If f(left) > f(right), we can similarly discard everything to the left of left. And if f(left) == f(right), we can discard both sides and keep only the range [left, right].

We can trisect the domain equally into left and right points and run the above process recursively or iteratively. We still need a termination condition: since left and right can get stuck once right - left <= 2, we stop there and fall back to linear search. That gives us the following pseudocode:

var left = domain.min
var right = domain.max
while (right - left) > 3 {
  let left_third = left + (right - left) / 3
  let right_third = right - (right - left) / 3

  if f(left_third) < f(right_third) {
    right = right_third
  } else {
    left = left_third
  }
}

for i in left until right get the smallest f(i)

It is an elegant and fun algorithm, and I am surprised that today is the first time I have heard of it. Hopefully now you also understand how and when to use this algorithm.
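The pseudocode above translates straightforwardly into C++; here is my own sketch (names are illustrative):

```cpp
#include <algorithm>

// Find the minimum of a unimodal function f over the integer range
// [left, right], following the trisection loop described above.
template <typename F>
long long unimodal_min(F f, long long left, long long right)
{
  while (right - left > 3) {
    const long long left_third = left + (right - left) / 3;
    const long long right_third = right - (right - left) / 3;
    if (f(left_third) < f(right_third)) {
      right = right_third;  // the minimum cannot be right of right_third
    } else {
      left = left_third;    // the minimum cannot be left of left_third
    }
  }
  // Fall back to linear search over the few remaining points
  long long best = f(left);
  for (long long i = left + 1; i <= right; ++i) {
    best = std::min(best, f(i));
  }
  return best;
}
```

For instance, minimizing f(x) = (x - 5)² + 16 over [0, 100] returns 16.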

In C++, Don't Use auto Parameters Unthinkingly

Since C++14, we can create lambda expressions with auto parameters, and in C++20 we can even use auto parameters in ordinary functions. With this feature, the style of declaring all parameters auto has caught on among some C++ programmers. However, I believe we should not use auto parameters unless we have no other choice.

Why do people like it?

Writing out concrete types can admittedly be annoying, which is how people end up with the auto parameter style. In template-heavy generic code, this can be a legitimate reason. But in many cases, refactoring lets us avoid writing those "annoying" types, and we can even end up with higher-quality code by doing so.

For example, I found the following code online, slightly modified here for privacy. Granted, spelling out the pair type of the parameter in this code would be too noisy.

std::vector<std::pair<double, double>> pairs;

return std::accumulate(
  pairs.cbegin(), pairs.cend(), 0,
  [](auto acc, const auto& pair) {
      return acc + pair.first * pair.second;
});

At the same time, reading just this small snippet, I have no idea what the code actually does, because a pair's first and second values carry no meaning. What if we turned the elements of pairs into a named struct?

struct Outcome {
  double probability = 0;
  double value = 0;
};

std::vector<Outcome> distribution;

return std::accumulate(
  distribution.cbegin(), distribution.cend(), 0,
  [](double acc, const Outcome& outcome) {
      return acc + outcome.probability * outcome.value;
});

Suddenly, it is easy to see that this code computes the expected value of a discrete random variable!

Unfortunately, instead of looking for ways to give their variables more concrete types, some people become so accustomed to the auto parameter style that they start using it everywhere, even where auto makes the code no more concise:

const std::vector<int> v1 = ...;
const std::vector<int> v2 = ...;
std::vector<int> smaller_ones;

std::ranges::transform(v1, v2, std::back_inserter(smaller_ones),
  [](auto x, auto y) { return std::min(x, y); });

auto parameters generate templates

In some programming languages, such as ML or Rust, the type system can infer the exact type of a function or lambda expression from its definition. These languages also have lighter type annotation syntax, which makes their programmers comfortable omitting the concrete parameter types of lambda expressions. Having written those languages, some people come back to C++ and keep the same convention. But because C++ has the complexities of templates, overloading, and argument-dependent lookup, a C++ compiler cannot perform that kind of type inference. Instead, when we use auto parameters, the compiler generates unconstrained templates. For example, we can use the cppinsights website to see how the compiler transforms the expression [](auto x, auto y) { return x * y + 42; }:

class __lambda_5_2
  {
    public:
    template<class type_parameter_0_0, class type_parameter_0_1>
    inline /*constexpr */ auto operator()(type_parameter_0_0 x, type_parameter_0_1 y) const
    {
      return (x * y) + 42;
    }
    private:
    template<class type_parameter_0_0, class type_parameter_0_1>
    static inline auto __invoke(type_parameter_0_0 x, type_parameter_0_1 y)
    {
      return (x * y) + 42;
    }

  } __lambda_5_2{};

The problem is that template programming is a different experience from ordinary programming. In templates, type errors are caught later than we would like, and our IDE's autocompletion and error detection generally cannot work at their best. This becomes more pronounced once we write lambda expressions longer than a single line, or once we start using auto parameters on ordinary functions in C++20.

Unconstrained templates are dangerous

Even when we do need templates, constraining them is usually the better practice. In one talk, Bjarne Stroustrup (the creator of C++) remarked that auto is the least constrained concept.1

Without constraints, it is easy to pass a type that accidentally matches the interface to a template function. Say we have a 3-dimensional vector struct; naturally, we want to compute dot products with it:

struct Vec3 {
  float x = 0;
  float y = 0;
  float z = 0;
};

auto dot(auto v1, auto v2) {
  return v1.x * v2.x + v1.y * v2.y + v1.z * v2.z;
}

Later, when we add a 4-dimensional vector struct, nothing in C++ stops us from calling the dot function we wrote for 3-dimensional vectors on it:

struct Vec4 {
  float x = 0;
  float y = 0;
  float z = 0;
  float w = 0;
};

dot(Vec4{1, 2, 3, 4}, Vec4{1, 2, 3, 4}); // we want 30, but we get 14

The C++ Core Guidelines also mention that unconstrained templates in highly visible scopes are dangerous, especially in view of argument-dependent lookup.2

Explicit type annotations provide documentation value

Even in other languages free of the C++-specific problems above, explicit parameter types serve as documentation, and they make type checking easier during refactoring. This is why top-level functions without explicit type annotations are considered bad style in the various ML dialects and in Haskell, and why Rust mandates explicit types on all top-level functions.

When we use an unfamiliar API in any statically typed language, type annotations are usually how we figure out what a function does. If we use auto parameters, we leave no hints about the nature of those parameters for other people, or for our future selves.

Conclusion

We cannot always avoid auto parameters. Before C++20, there was no way to use concepts or explicit template syntax with lambda expressions. And in some cases, the convenience and productivity of auto parameters may outweigh their drawbacks. Nevertheless, I consider the drawbacks serious enough to treat auto parameters as a code smell. When we encounter code with auto parameters, we should always ask, "Is it possible to use a concrete type here?" If not, the next question is, "Is it possible to use a concept here?"

Using default parameters to circumvent the type system is an anti-pattern

I was doing some pair programming for a university course project today. In our codebase, we have a Ship class like the following:

public class Ship {
    private final String name;
    private final int length;
    private int hitCount = 0;

    public Ship(String name, int length) {
        this.name = name;
        this.length = length;
    }

    // other methods
    ...
}

Later, a new requirement of the course project had us add another field, which we called captainsQuertersHealth, to the class, so we made the following change:

public class Ship {
    private final String name;
    private final int length;
    private int hitCount = 0;
    private final int captainsQuertersHealth;

    public Ship(String name, int length, int captainsQuertersHealth) {
        this.name = name;
        this.length = length;
        this.captainsQuertersHealth = captainsQuertersHealth;
    }

    // other methods
    ...
}

However, we had tons of unit tests that test against the old interface and don't care about the captainsQuertersHealth field. My partner suggested adding a default parameter. Since the course project is in Java, which doesn't support default parameters, he recommended adding an overloaded constructor:

public Ship(String name, int length) {
  this(name, length, 1);
}

This constructor delegates to the previous constructor and always defaults captainsQuertersHealth to one. It is certainly convenient. However, adding this constructor means the compiler can no longer produce a compile-time error at the call sites that use this constructor yet do care about captainsQuertersHealth. As a result, it would postpone a compiler-catchable bug to runtime.

I convinced my teammate that it was a bad idea, and our final solution was to add a helper in the unit tests:

private Ship createShip(String name, int length) {
  return new Ship(name, length, 1);
}

I am not saying you should never use default parameters or function overloading, but do be aware of their shortcomings and use them judiciously.

What exactly is std::function in C++, and why do we need it?

Yesterday, someone on the #include<C++> discord server asked why we need std::function. Below is my answer to that question.

In C++, objects that can be called like functions can have different types, even when their parameter types and return types are identical

Lambda expressions can be thought of as syntactic sugar for classes that define operator(). For example,

int x = 3;
auto lambda = [x](int y) { return x + y; };

大体上等同于

struct __Lambda {
  int x;

  int operator()(int y) const {
    return x + y;
  }
};

int x = 3;
auto lambda = __Lambda { .x = x };

Thus, every lambda expression has a unique type. For example, in the following snippet,

int x, z;

auto lambda = [x](int y) { return x + y; };
auto lambda2 = [x, z](int y) { return x + y + z; };

尽管lambdalambda2都接收一个int并返回一个int,它们有着不一样的类型。

C++ also has ordinary functions, which are yet again different from classes that implement operator().

The need for std::function

So, how do we store a callable object that takes an int and returns an int, regardless of its concrete type?

std::function is what accomplishes this. For example:

struct S {
  std::function<int(int)> func;
};

A typical use case for storing callables this way is a task system. You may want to store callbacks in a container for later execution:

struct TaskQueue {
  std::queue<std::function<void()>> queue;
  std::mutex mutex;
  std::condition_variable ready;

  // implementations of the member functions omitted
  ...
};

Type Erasure

For func to accept both lambda and lambda2, std::function needs constructors that accept any conforming function object or ordinary function. Implementing this behavior requires type erasure.

There are many ways to implement type erasure in C++, and the details are beyond the scope of this post. The general idea is that std::function needs to store a function pointer plus some extra space for the lambda captures. Because a lambda expression (or function object) can have captures of arbitrary size, this extra data needs to be heap-allocated. However, all major std::function implementations perform a small buffer optimization: if your lambda is small enough to fit into a predefined capacity, all the data is stored inline within the std::function object itself, with no extra heap allocation.
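To make the idea concrete, here is a minimal heap-allocating sketch of type erasure for the int(int) signature (no small buffer optimization; Function and its member names are made up for illustration):

```cpp
#include <memory>
#include <utility>

// A toy std::function<int(int)>: the templated Model remembers the concrete
// callable's type, while callers only ever see the Concept interface.
class Function {
  struct Concept {
    virtual ~Concept() = default;
    virtual int invoke(int) const = 0;
  };

  template <typename F>
  struct Model final : Concept {
    F f;
    explicit Model(F f) : f(std::move(f)) {}
    int invoke(int x) const override { return f(x); }
  };

  std::unique_ptr<const Concept> impl_;

public:
  template <typename F>
  Function(F f) : impl_(std::make_unique<Model<F>>(std::move(f))) {}

  int operator()(int x) const { return impl_->invoke(x); }
};
```

Both lambda and lambda2 from earlier could now be stored in this same Function type, despite having distinct closure types.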

Resources to Help You Learn C++ in Depth

Over the years, many people have asked me for help with learning C++. I am hardly a C++ expert, but as someone who has worked with C++ for years, I want to share some high-quality, beginner-friendly C++ resources here. I hope they are helpful to you.

When someone asks me for guidance on learning C++, I always first ask about their existing programming experience. Some people are just starting to program and have decided to make C++ their first programming language; some already know a little C++ and want to learn more; and some have programmed in other languages for years and are now trying to pick up some C++. Because different people have different backgrounds and learning goals, I recommend different materials.

One thing I want to mention is that merely reading books or watching videos is not the best learning strategy. Whatever stage you are at, putting what you learn into practice is very important, so starting some programming projects will help your learning a lot.

Another thing I want to mention is that the resources I recommend here are essentially all in English. I strongly suggest trying to learn from English resources: if you learn programming only through Chinese, you lose access to the vast majority of good learning materials.

What if I am brand new to programming and chose C++ as my first language?

For programming beginners, I recommend Programming: Principles and Practice Using C++, 2nd edition, by Bjarne Stroustrup (the father of C++). The book has a Chinese translation, but as I said above, if you can read a little English, I suggest reading the original. Since the book is thick, you may not make it through the whole thing, but you will learn something from however many pages you read.

If you would rather not read a book, C++ expert Kate Gregory offers quite a few video tutorials on Pluralsight. Her introductory course is Learn to Program with C++. If you join the #include<c++> discord server, you can ask her there directly for a trial code.

What if I have learned some C++ before and want to go deeper?

Maybe you have used some C++ in a university data structures course, or you have followed some online tutorials that use C++. What should you do next?

From my personal experience and what I have heard, most university programming courses and online tutorials are of low quality, and the instructors often have only a partial understanding of C++. You may have been misled by your earlier learning resources into bad practices or misconceptions about the concepts. Choosing the right learning materials is therefore crucial for learning efficiently.

In this situation, I again recommend Bjarne Stroustrup's Programming: Principles and Practice Using C++, 2nd edition. You can read it faster than a complete beginner would, but it is still very useful for systematically finding and filling gaps in your knowledge. If you prefer video tutorials, you can start with Kate Gregory's C++ Fundamentals Including C++17.

What if I am a veteran of another language and want to learn C++?

If you are already proficient in another programming language and want to start learning some C++, you can go straight to more advanced material.

For books, I suggest Bjarne Stroustrup's The C++ Programming Language, 4th edition. It is one of the best technical books I have ever read. It is, however, also quite thick; if you don't have time for it and want a brief introduction to C++, you can get A Tour of C++, 2nd edition.

I think I have a decent grasp of C++. What's next?

Suppose you have spent months on the materials above and feel you have a decent understanding of the basic concepts of C++. What should you do next?

If you have reached this stage, you should be fairly familiar with most of the following topics:

  • how to use const correctly
  • templates
  • references and pointers
  • proficient use of the standard library, especially iterators and algorithms
  • RAII
  • destructors
  • copy/move constructors and copy/move assignment operators
  • move semantics
  • operator overloading
  • lambda expressions and function objects
  • undefined behavior

If you have reached this stage, then finding practical uses for C++ is probably more important than studying the language itself. C++ is used for many different purposes, and you can start thinking about how to apply it to the domains you are interested in.

This is also a good time to learn the C++ ecosystem: you can spend some time digging into a unit-testing library such as Catch2, a build system such as CMake, and a package manager such as Conan.

Another thing worth considering is learning another programming language, especially if C++ is currently the only language you know. A good next choice is a language drastically different from C++, such as a dynamically typed language like JavaScript, Python, or Lisp.

That said, there is still an endless amount to learn about the C++ language itself. Below I try to list some of my favorite resources:

书籍

如果你仍然没有阅读《C++程序设计语言》第四版(The C++ Programming Language (4th Edition)的话, 这本书仍然是一个非常好的选择。除此之外,我还有一些其他的书可以推荐:

  • Scott Mayer 的《Effective Modern C++》
  • Jason Turner 的《C++ Best Practices》
  • Nicolai M. Josuttis 的《C++17 - The Complete Guide》

还有一些书籍会关注于某些特定的方向,例如:

  • C++ Templates - The Complete Guide, 2nd Edition by David Vandevoorde, Nicolai M. Josuttis, and Douglas Gregor
  • Mastering the C++17 STL by Arthur O'Dwyer
  • Functional Programming in C++ by Ivan Čukić
  • C++ Concurrency in Action, 2nd edition by Anthony Williams

Conference talks

Conference talks are another excellent resource for learning C++. Here are some that I personally enjoy and that are suitable for beginners:

Communities

Joining a programming community has many benefits: you can ask experts questions, keep up with what others are doing, discuss work-related topics, and even make some good friends.

#include<C++>

#include<C++> is a great C++ community. It offers a friendly environment for discussion, and you can find many well-known figures of the C++ world there. You can join its Discord server and discuss C++ with everyone.

Meetups

Joining the North Denver Metro C++ Meetup was one of the best decisions I made during college. If you have time, attending a local C++ meetup is an excellent choice. (Because of COVID-19, most meetups and conferences are currently held online. This has pros and cons, but one big advantage is that you can now attend meetups around the globe.) You can search for local meetups on meetup.com.

Attending conferences

If you are serious about C++, conferences are a great place to meet like-minded people. Here are some recurring C++ conferences I know of:

In addition, there is a list of conferences on the ISO C++ website.

Podcasts

There are quite a few C++ podcasts online, and 2020 in particular saw several new ones emerge. All of them, of course, require decent English listening skills:

Blogs

I recommend using RSS to follow technical blogs. I personally follow more than 200 blogs on C++ and other technical topics. Here are some of the best C++ blogs in my opinion:

Note that some posts discuss very advanced topics, so you shouldn't expect to understand every single post.

Other resources

Here are some other useful C++ resources:

  • cppreference is the best API documentation site for the C++ language and standard library.
  • Compiler Explorer is an online coding environment that supports C++ and many other languages. It can show the compiled assembly and run the program.
  • Quick C++ benchmark is a website for quickly benchmarking C++ code.

References and further reading

Summary of reading: October - December 2020

I read almost nothing for a few months after the lockdown, but I have picked up reading again over the last couple of months.

  • "C++ Best Practices" by Jason Turner — Buying Jason's book is a no-brainer for me considering I started to watch his C++ Weekly in 2016, and he was one of the people who inspired me to delve into C++ at that time. I particularly enjoy the chapter "25. Avoid default In switch Statements," which is a great practice not often mentioned, and "47. Fuzzing and Mutating," which provide concrete instructions on setting up fuzzing and mutating test.

  • "Effective C: An Introduction to Professional C Programming" by Robert C. Seacord — I love this book, and will recommend all C people, not just beginners, to read it. It is really easy to make mistakes when writing C code or using C APIs, and this book tries to mitigate the problem and teach best practices for writing secure C code. Since most commonly recommended C books are decades old, Effective C is a rare book covering up-to-date C standards and practices. Robert certainly knows about both the standard and modern techniques very well.

  • "Elm in Action" by Richard Feldman — This book introduces the Elm programming language from scratch by building a simple frontend application incrementally through chapters. In each chapter, "your boss" gives you more requirements, and the book introduces language features to fulfill the requirements. Even though I used Elm to build a few games before, I still find this book enjoyable as there are many practical Jewels in this book on building production web applications. The sections about inter-oping with Javascript by custom elements (instead of ports) and handling routings for single-page applications are particularly enlighting for me.

  • "Automata and Computability" by Dexter C. Kozen is a textbook I used in my Theory of Computation class. It is more like a course note than a traditional textbook, where topics are split into "lessons." I enjoy the writing style of this book.

  • "Analysis I: Third Edition" by Terence Tao — this is the textbook used for our university' mathematical analysis course. It is a solid read, and the points are conveyed clearly. I also found that I am quite interested in the topic of analysis.

  • "How to Take Smart Notes" by Sönke Ahrens: This book is recommended in the talk on "org-mode for non-programmers" by Noorah Alhasan in Emscs-SF meetup. I am positively surprised by this book. My expectation of "self-help" books is full of platitudes with little insights. Yet this book was one of the most profound books I read this year. And I immediately put the slip-box method described in the book into practice on this same book and other things I learned. The downside of this book is that it does not spend enough time on "How to take smart note," as the title suggests, but instead repeats a lot on "why." Nevertheless, these characteristics are pretty common in this kind of book.

Reread:

In progress:

Factual errors in "These Modern Programming Languages Will Make You Suffer", and why it is a pain to read

Today I stumbled upon the article These Modern Programming Languages Will Make You Suffer after a Twitter outrage. The post is absurd and was indeed painful for me to read. However, it has received 1k+ Medium claps at the time of writing, and I cannot stay silent.

In essence, this article tries to promote functional languages and lists their advantages. As an FP fanboy myself, I love content that encourages the use of functional programming. However, this article is so biased and so full of factual errors that it only shows the author's lack of understanding of both the languages they hate and the languages they try to promote. And I am not even surprised to find that the author was behind another notorious Medium clickbait, Object-Oriented Programming - The Trillion Dollar Disaster.

I will not focus on the article's opinions; various Twitter shitposts hold more extreme views than this article does. Also, it is hard to object to buzzwords such as "bad" or "a mess" in any objective way. Instead, let's focus on the misleading factual errors. I am sure there are still many more factual errors in the sections I missed or in the languages I have no experience with.

Pure functions

Functions that do not mutate(change) any state are called pure

Pure functions are deterministic and have no side effects. "Does not mutate any state" is nowhere near enough to make a function "pure."

Surprisingly, the author gives a correct description of pure functions later in the post. Similar discrepancies occur more than once, which makes me wonder whether large chunks of the article were "borrowed" from elsewhere.

C++

C++ is the perfect punch bag for a lot of reasons, but still, you shouldn't bash on a language if you have no understanding of it.

The developers have to worry about manually releasing and allocating memory.

Do you know what RAII is? Have you ever used C++ or Rust before? The same rebuttal applies to the author's rant about the lack of GC in Rust.

has only rudimentary concurrency mechanisms

Let me reply with a tweet from Bryce Lelbach.

Almost everything in this article is patently wrong, like the claim that "[C++] only rudimentary concurrency mechanisms". C++ has a formally verified memory and execution model, which is the defacto industry standard, and a rich library of concurrency primitives. https://t.co/idtcg7gRCy

— Bryce Adelstein Lelbach (@blelbach) December 8, 2020

In C++, all references are nullable.

In C++, no references are nullable 😉.

JAVA & C#

since the early versions of C# were a Microsoft implementation of Java

C# was an imitation of Java, but it was a new language and was never intended as an implementation of Java.

In particular, C# claims to support functional programming. I must disagree... What functional features should a language have? At the very least, built-in support for immutable data structures, pattern matching, pipe operator for function composition, Algebraic Datatypes.

Those are all great features, but none of them is the essence of functional programming. The first functional language, Lisp, supports none of them.

By the way, C# does support pattern matching. 1 The author seems to acknowledge this fact earlier and forget it later, which again makes me wonder whether parts of the post were "borrowed" from elsewhere.

rudimentary concurrency support

C# is the language that made the async/await paradigm popular.

In C#, all references are nullable.

Except that there is nullable-reference support, and references can be made non-null by default.

Python

Language family: C

What does "The C family languages" even mean? Languages share a syntax resemble C? And how does Python suddenly become a C-family language?

Python is an interpreted language

"Interpreted language" is a common buzzword in this field without a clear definition. Instead of the language itself, a language implementation decides whether it is "interpreted" or "compiled." Plus, there are a lot of middle-ground between an ahead-of-time compiler and a tree-walk interpreter, and most language implementations these days are in the middle ground.

Python is also pretty slow to start up

A Python VM usually boots up in less than 100ms.

Rust

Rust also suffers a lot of unfair ranting for its "low productivity" in this article, and to be honest, all the criticism of Rust here looks like it came from a quick Google search.

The runtime performance of Rust programs is a little faster than Go.

You can't compare the runtime performance of programming languages without context like that. There are a lot of performance design tradeoffs, and a language that runs faster in one circumstance may run slower in another.

The first language on our list with a modern null alternative

C++ has std::optional2 and Java has Optional3.

Developers have to worry about things like boxing and pinning, which typically are done automatically in a garbage-collected language.

Some garbage collectors move memory in a process called compaction, which is why C#, for example, also supports pinning.

Typescript

Even experimental features can be enabled in JavaScript with the use of Babel, which can’t be done for TypeScript.

Totally not true4.

While JavaScript developers can use libraries that help with immutability, TypeScript developers typically have to rely on the native array/object spread operators

Both immutable.js and Rambda, the JavaScript libraries the author mentions, provide TypeScript type definitions, and they are no harder to use than in plain JavaScript.

Functional languages

As someone who tries to promote functional languages, the author should know these languages better. Unfortunately, the author makes even more errors here, probably because they switch from "opinionated rant mode" to actually talking about language features in this section.

Haskell

There's no type system more powerful than Haskell.

No type system can be considered the most "powerful." And by the way, what about dependently typed languages 4?

OCaml

There're three package managers — Opam, Dune, and Esy.

Dune is not a package manager but a build system. It is often used in combination with Opam.

The go-to book for learning OCaml is Real World OCaml. The book hasn't been updated since 2013, and many of the examples are outdated.

The 2nd edition of Real World OCaml is up-to-date and also available freely online.

Scala

Scala has first-class support for immutable data structures (using case classes).

The Scala standard library does provide fantastic support for immutable data structures. However, case classes have nothing to do with those data structures.

Elm

The only time when you will encounter runtime errors with Elm is when interacting with outside JavaScript code.

Unfortunately, the Elm compiler can still generate JavaScript code that throws exceptions at runtime.

interop with JavaScript libraries next to impossible

There are custom elements5 and ports6.

This means that the developers have no access to the vast ecosystem of libraries and components made for React.

You can make a React component a custom element.

We don’t even know whether or not he’s still working full-time on Elm. The language might actually be dead by now.

Evan is still doing work on Elm and interacts with the community regularly.

Reason ML

Just like TypeScript, ReasonML has access to the entire JavaScript ecosystem.

Using Javascript libraries in Reason requires some boilerplates (external), just like in Elm.

React initially was written in OCaml

The first prototype of React was written in Standard ML, rather than OCaml.

Elixir

Language family: ML.

OK, I can accept calling Haskell or Elm part of the ML family (though I disagree), but what is a dynamically typed language doing here?

Conclusion

The article has some good content on pure functions, algebraic data types, pattern matching, and error handling in FP languages. If the author had removed all the biased, incorrect, and misleading content, I would recommend it for people to read. However, the author chose a different path. Unfortunately, the Internet these days rewards clickbait and sensational articles instead of posts with meaningful content.

Also, what worries me is that this kind of blog post pushes people away from functional languages. A minority of trolls makes people lose faith in the whole community. For example, here is one comment on the article on Medium:

Ooh, you're a functional programmer! That explains why I had that feeling you didn't know what you were talking about.

I can stop reading now. I read enough flawed examples, dubious comparisons and more from articles written by your kind.

Rest assured that most people in the functional programming community are friendly and don't have that kind of bias against your favorite language.

Improve Rust Link Time with lld

Today I started to experiment with the WebGPU API, and I chose to use the wgpu-rs implementation in Rust. I am happy with the overall experience, but one difficulty I hit was the long iterative compile time:

(Image: "Compiling" comic. Source: xkcd)

The compile-time of my webgpu project is ridiculously slow (about 10s for this single file), any #Rust expert know how to improve that? https://t.co/SVZYs54L7E

— Lesley Lai | Remember ThePhD (@LesleyLai6) November 2, 2020

For some applications, a slow compile time is OK. Coding hard algorithms requires extensive thinking, and if they compile and pass unit tests, they are likely correct.

By contrast, for graphics and game programming, iteration time is paramount. A lot of the time there is no right or wrong answer to a problem; instead, we need to do a lot of small tweaking.

Fortunately, a person (user Rukai) on the Graphics Programming Discord provided a solution.

What I needed to do was create a config file ~/.cargo/config with the following content:

[build]
rustflags = [
  "-C", "link-arg=-fuse-ld=lld",
]

This flag sets lld as the linker, which is much faster than Rust's default linker. I also needed to install lld on my computer.

This simple change magically brings my iterative compile time under 3s. It is still not ideal from my perspective, but at least I can enjoy working on this project again.

Recursive Modules in OCaml

Recursive modules are an interesting feature in OCaml. To use them, we need the form

module rec module-name : module-signature = module-expr

An explicit signature is required when using recursive modules, since the compiler can no longer deduce the module signature in the presence of recursion.

A typical recursive module looks like the following:

module rec M : sig
  (* explicit signature *)
end = struct
  (* Implementations *)
end

And we can even have mutually recursive modules, for example:

module rec A : sig ...end = struct ... end
and B : sig ... end = struct ... end

My primary use case for recursive modules is in combination with first-class modules. First-class modules are ordinary values that wrap a module; they are a powerful way to introduce dynamic polymorphism in OCaml.

Dynamic polymorphism is usually combined with recursive data types, but OCaml modules are not recursive by default. Thus, recursive modules serve as a valuable addition.

For example, I use first-class modules and recursive modules in my ocamlpt project. Ocamlpt is a path tracer, where its scene contains different kinds of shapes. The signature of a shape is the following:

module type Shape = sig
  type t
  val hit: Ray.t -> t -> Material.hit_record option
  val bounding_box: t -> Aabb.t
end

We want to make shapes polymorphic, so we need first-class modules. In the code below, I introduce a Shape_instance module type that wraps both a shape module and a value of that module's type, and I add a build_shape function that builds first-class modules of the Shape_instance signature. This way, we can store those first-class modules, and whenever we want to use them, we can unwrap one to get a concrete Shape_instance module.

module type Shape_instance = sig
  module S: Shape
  val this: S.t
end

let build_shape
    (type a)
    (module S : Shape with type t = a)
    (shape: a)
  =
  (module struct
    module S = S
    let this = shape
  end : Shape_instance
  )

The above code is good enough for concrete shapes such as spheres or triangles. However, the shapes are organized in a tree structure called a bounding volume hierarchy (BVH), and each BVH node can contain other shapes, including other BVH nodes.

We can implement the BVH node with recursive modules:

module rec Bvh_node : sig
  include Shape
  val create: (module Shape_instance) list -> t
end = struct

type t = {
  left: (module Shape_instance);
  right: (module Shape_instance);
  aabb: Aabb.t;
}

(* Other members of the Bvh_node module *)

(* Creating bvh code from a list of objects *)
let rec create (shapes: (module Shape_instance) list) =
  ...
  (* if shapes contain 3 elements *)
  let left = ...
  and right = ... in
  let aabb = Aabb.union left.aabb right.aabb in
  { left=(build_shape (module Bvh_node) left);
    right=(build_shape (module Bvh_node) right);
    aabb }

end

The above code works like magic, but it would not compile without the rec keyword before Bvh_node, since the create function refers to the Bvh_node module itself.

All in all, recursive modules are a way to support cyclic dependencies among components that a purely hierarchical module system cannot express. Such cyclic dependencies are usually undesirable and can be avoided by changing the software design. Nevertheless, sometimes there are legitimate reasons for a module to depend on itself, especially considering how versatile the OCaml module system is. In those cases, recursive modules are a valuable asset.

Beware of passing mutable lambdas to STL algorithms

Recently, I have seen people passing complex mutable lambdas to standard algorithms. Those usages usually come from one mindset: "Since we want to follow 'no raw loops,' and the choice of STL algorithms is limited, what can we do other than use a mutable lambda to hold our complicated logic?" I think both premises of this thought are wrong. First, "no raw loops" should be treated as an ideal, not dogma. Second, even though STL algorithms cannot cover every use case, we can always write our own algorithms to fit our needs.

I expressed this thought in the following tweet:

Random thought: if you want to pass a complicated mutable lambda to a C++ standard algorithm, you are using the wrong algorithm.

— Lesley Lai (@LesleyLai6) September 21, 2020

And this post tries to expand on that thought a little.

Mutable lambdas destroy the beauty of <algorithm>

Why do we use <algorithm>? Is it because it is "elegant" or "modern"? Or because "some experts said so"? Both are horrible reasons to prefer <algorithm> over loops. For me, <algorithm> provides the following benefits:

  • Less mutable states
  • Declarative
  • Express intent
  • Known correct implementation

Mutable lambdas destroy all of them. First, STL algorithms encapsulate mutable states into small functions, yet we only need a mutable lambda when an algorithm fails to encapsulate all the mutable logic. Second, since mutable states and complex control flow are back, we can no longer call our implementation declarative. Third, since we need complicated logic inside a lambda to stretch the algorithm to perform our task, the algorithm does not express our intent. Fourth, since the algorithm does not express our intent, even though the algorithm itself is correct, there can still be bugs lurking in our own hard-to-understand code.

A LeetCode example

Let's look at the following C++ solution to the LeetCode Two Sum problem by Yacob Cohen-Arazi. The problem is worded as follows: "Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target." LeetCode provides the type signature of the twoSum function, which we cannot change.

std::vector<int> twoSum(std::vector<int>& nums, int target) {
  int idx1{}, idx2{};
  auto process_and_lookup(
      [m = std::unordered_map<int, int>(),
       i = 0, target, &idx1, &idx2]
      (const auto item) mutable {
        auto iter = m.find(target - item);
        if (iter == cend(m)) {
          m[item] = i++;
          return false;
        }
        idx1 = iter->second;
        idx2 = i;
        return true;
      });

  auto iter = std::find_if(
    cbegin(nums), cend(nums), process_and_lookup);
  assert(iter != cend(nums));
  return {idx1, idx2};
}

This version is long, messy, and hard to read. It also contains five mutable states: m, idx1, idx2, i, and target, even though target is never modified. Here is a loop version I wrote that implements essentially the same logic:

std::vector<int> twoSum(std::vector<int>& nums, int target) {
  std::unordered_map<int, int> nums_map;

  const int size = static_cast<int>(nums.size());
  for (int i = 0; i < size; ++i) {
    const auto item = nums[i];
    const auto iter = nums_map.find(target - item);
    if (iter != nums_map.end()) {
      return {iter->second, i};
    }
    nums_map.emplace(item, i);
  }
  throw std::runtime_error{"No solution exist"};
}

This loop version is shorter, easier to understand, and contains only two mutable states: the map nums_map and the index i.

The <algorithm> version lands badly here because std::find_if does not match this problem's intent. std::find_if finds a single element that matches a predicate, but our situation requires finding two elements that together match a predicate. As a result, it does not provide enough useful functionality for this problem and instead becomes an obstacle. I consider this kind of <algorithm> usage an instance of the abstraction inversion anti-pattern, where the abstraction is so unfit for the task that we start to re-implement the details our abstraction was supposed to hide. Such usage makes the code hard to read, introduces potential non-trivial runtime cost, and increases the possibility of introducing bugs. The <algorithm> header tries to address all these adversities, but by using a mutable lambda, we somehow land in a situation worse than the loop counterparts of our functions.

Another example: calculate an inner product until it satisfies a predicate

Dima Savin gave me a tricky problem:

Ok, which algorithm should I use to calculate the inner product until it exceeds a value and get the index when it happened? Didn't find find just the right thing in a table by @code_report that doesn't require a mutable lambda.

— Dima Savin (@dima_savin) September 21, 2020

This problem is tricky to solve with STL algorithms, since STL algorithms are designed to compose sequentially, and as we will see in the loop version, there are multiple interleaved pieces of logic in the iteration.

Thus, I will use the loop version as the starting point. Since Dima does not specify what happens if we don't find such an index, I return the final value of i, which is the index of the last element plus one:

template <std::input_iterator Itr, std::input_iterator Itr2, class T>
auto inner_product_till(
        Itr first1, Itr last1, Itr2 first2, const T upper_bound)
   -> std::size_t
{
  T acc{};
  std::size_t i = 0;
  for (; first1 != last1; ++first1, ++first2, ++i) {
    acc = std::move(acc) + *first1 * *first2;
    if (acc > upper_bound) { return i; }
  }
  return i;
}

This version is certainly not ideal. It contains four mutable states: first1, first2, i, and acc. Nevertheless, the logic inside the loop is simple, and every decent C++ programmer should be able to grasp this code in a relatively short amount of time.

I am satisfied with this version. Even Sean Parent, the person who proposed the "no raw loops" ideology in the first place, would not consider this kind of simple loop, nicely encapsulated in a function, a "raw loop."

A raw loop is any loop inside a function where the function serves purpose larger than the algorithm implemented by the loop — Sean Parent, 2013 1

The std::find_if + mutable lambda version, however, is certainly inferior to the loop version. It contains the same number of mutable states and is significantly harder to read, even for people familiar with this lambda-heavy style of programming:

template <std::input_iterator Itr, std::input_iterator Itr2, class T>
auto inner_product_till(
        Itr first1, Itr last1, Itr2 first2, const T upper_bound)
   -> std::size_t
{
  std::size_t i = 0;
  std::find_if(first1, last1,
              [acc = T{}, first2, upper_bound, &i]
                (const T& elem) mutable {
                  acc = std::move(acc) + elem * *first2;
                  if (acc > upper_bound) return true;
                  ++first2;
                  ++i;
                  return false;
                });
  return i;
}

If we step back a little and think about what logic we are trying to achieve here, we can find two interleaving steps. First, we compute the inner product of the elements we have seen so far. Second, we check whether this partial inner product is greater than upper_bound. If we ignore the "interleaving" part, we can use std::transform and std::partial_sum for the first step and std::find_if for the second:

template <std::input_iterator Itr, std::input_iterator Itr2, class T>
auto inner_product_till(
        Itr first1, Itr last1, Itr2 first2, const T upper_bound)
    -> std::size_t
{
  std::vector<T> products;
  std::transform(first1, last1, first2, std::back_inserter(products),
                 std::multiplies<T>{});
  std::partial_sum(products.begin(), products.end(),
                   products.begin());
  const auto result = std::find_if(products.begin(), products.end(),
                      [&](T e) { return e > upper_bound; });
  return std::distance(products.begin(), result);
}

This version is the closest to my thought process; however, it is also very inefficient, since it allocates extra heap memory and eagerly computes results we may not need. Lazy range views solve the performance problem. If ranges had numeric algorithm support, we could potentially write the following code:

template <std::ranges::input_range Range, class T>
auto inner_product_till(Range r1, Range r2, const T upper_bound)
    -> std::size_t
{
  return std::ranges::distance(
    std::views::transform(r1, r2, std::multiplies<T>{})
    | std::views::partial_sum
    | std::views::take_while([&](T e) { return e <= upper_bound; }));
}

This version is splendid. It does not allocate and it exits early, so in theory it can be as efficient as the raw loop version or the mutable lambda version, and it is certainly much more readable and less error-prone to write than either of them. Unfortunately, none of the algorithms in the <numeric> header are included in the C++20 ranges, so std::views::partial_sum does not exist at the time of writing. Nevertheless, the range-v3 library includes all of those functionalities.

Don't be afraid of writing your own algorithm

Another way to solve the problem is to write your own algorithm. For example, in the case above we could write our own views::partial_sum view adapter.

In practice, our algorithms often do not need to be very generic, as you can always enhance them later when you need to reuse the code. The starting point of an algorithm can merely be "extracting a loop into a function."2

Also, interestingly, the above inner_product_till is an STL-compatible algorithm, and we can treat it as one of the lowest levels of abstraction. If it is well tested, fast, and well behaved, who cares whether it uses loops or other algorithms under the hood? It is not as generic as std::inner_product, but we can always add an initial value and the plus/multiply binary operations as parameters later if we need them.

What about using mutable lambdas in std::generate?

A lot of usages of std::generate use a mutable lambda as a "generator" function. For example, the following code generates the first 20 numbers of the recurrence relation x_0 = 0, x_n = 2x_{n-1} + 1.

int seq[20];

std::generate(std::begin(seq), std::end(seq),
    [x = 0]() mutable {
        return std::exchange(x, x * 2 + 1);
    });

This kind of "generator" usage of std::generate and mutable lambdas is common, and I think, unlike previous examples, they are fine.

This version has one advantage over a loop: the scope of the mutable variable x is constrained to the lambda's scope, and we should strive to make the scope of variables (especially mutable ones) as small as possible. Nevertheless, we can surround a loop with an explicit brace pair to get a similar effect:

int seq[20];

{
  int x = 0;
  for (auto& elem: seq) {
    elem = std::exchange(x, x * 2 + 1);
  }
}

Consider the alternatives to passing mutable lambdas to STL algorithms

To sum everything up, I believe that passing mutable lambdas to STL algorithms other than std::generate or std::generate_n is an anti-pattern we should avoid. There are several alternatives: sometimes we can switch to a better algorithm; sometimes a plain old loop is the better option; and sometimes we can write a customized algorithm to achieve the task.


  1. Sean Parent, 2013. C++ Seasoning. Retrieved September 23, 2020, from http://channel9.msdn.com/Events/GoingNative/2013/Cpp-Seasoning
  2. Writing your algorithm is not rocket science, but the more generic an algorithm is, the more factors we need to consider. Ben Deane's talk Constructing Generic Algorithms: Principles and Practice is an excellent resource on this topic.

The Consequences of Const and Reference Member Variables in C++

In C++, const or reference non-static member variables cause problems. Many experienced C++ programmers know this, but no one seems to have written a standalone article explaining why. Since I keep seeing people ask about it online, I decided to write this post.

Const member variables

In some programming languages, such as Rust, all variables are const by default, and you must explicitly declare a variable mutable. If you have experience with such languages, you may be tempted to mark every variable you don't need to modify as const in C++. This practice has great value, but in C++, member variables are an exception, and we should not apply it to them.

A const member variable disables a class's assignment and move semantics. That sounds reasonable: surely you don't want to reassign or move away from a const variable.

"So what's the problem?" you may ask. "I already said I don't want to modify this variable."

But many operations, such as swap, rely on both assignment and move semantics. If only move semantics were missing, swap could still fall back to copying, but without assignment, swap fails to compile:

struct BadImmutablePoint {
    const int x = 0;
    const int y = 0;
};

int main() {
  BadImmutablePoint p1;
  BadImmutablePoint p2 {42, 55};
  std::swap(p1, p2); // Error
}

Not just swap: every assignment-related operation in the STL is disabled. For example, sorting:

std::vector<BadImmutablePoint> points;
// trying to sort by the x-axis
std::ranges::sort(points, {}, &BadImmutablePoint::x); // Error

But I don't want to modify this member variable!

In C++, the best you can do is make the variable private and expose only a getter.

This still does not prevent member functions inside the class from modifying the variable, but at least no function outside the class can casually change it anymore.

class ImmutablePoint {
    int x_ = 0;
    int y_ = 0;

public:
    constexpr ImmutablePoint() = default;
    constexpr ImmutablePoint(int x, int y) : x_{x}, y_{y} {}
    [[nodiscard]] constexpr auto x() const -> int { return x_; }
    [[nodiscard]] constexpr auto y() const -> int { return y_; }
};

int main() {
    std::vector<ImmutablePoint> points;
    ...
    std::ranges::sort(points, {}, &ImmutablePoint::x); // Ok
}

There is a lot of boilerplate code in the example above.

Honestly, if I were writing this example, I would use a simple aggregate with non-const member variables:

struct Point {
    int x = 0;
    int y = 0;
};

const Point immutable_point {42, 55};

If you really want to get fancy, you can even write a little template to automate the process above:

template <typename T>
class const_wrapper {
    T val_;
public:
    constexpr const_wrapper(const T& val) : val_{val} {}
    constexpr const_wrapper(T&& val) : val_{std::move(val)} {}

    [[nodiscard]] constexpr auto get() const -> const T& { return val_; }
    [[nodiscard]] constexpr operator T() const { return val_; }
};

You can then use the template like this:

struct ImmutablePoint {
    const_wrapper<int> x = 0;
    const_wrapper<int> y = 0;
};

int main() {
    std::vector<ImmutablePoint> points;
    ...
    std::ranges::sort(points, {}, &ImmutablePoint::x); // Ok
}

Reference member variables

C++ references cannot be rebound; this sets them apart from pointers and from the "references" of many other languages. As a result, we run into the same situation as with const. A reference behaves much like a const pointer that can never be null. For example, the following triangle struct has the same problems as the struct with const member variables:

struct BadImmutableTriangle {
    const ImmutablePoint& a;
    const ImmutablePoint& b;
    const ImmutablePoint& c;
};

As before, instead of storing reference members directly, we can store pointer members and expose only getters:

class ImmutableTriangle {
    const ImmutablePoint* a_;
    const ImmutablePoint* b_;
    const ImmutablePoint* c_;

public:
    // No default constructor: the three vertices of the triangle must be provided
    constexpr ImmutableTriangle(
        const ImmutablePoint& a,
        const ImmutablePoint& b,
        const ImmutablePoint& c)
        : a_{&a}, b_{&b}, c_{&c} {}

    [[nodiscard]] constexpr auto a() const -> const ImmutablePoint& { return *a_; }
    [[nodiscard]] constexpr auto b() const -> const ImmutablePoint& { return *b_; }
    [[nodiscard]] constexpr auto c() const -> const ImmutablePoint& { return *c_; }
};

Conveniently, the C++ standard library provides the std::reference_wrapper type, which closely resembles the const_wrapper we just wrote.

struct ImmutableTriangle {
    std::reference_wrapper<const ImmutablePoint> a;
    std::reference_wrapper<const ImmutablePoint> b;
    std::reference_wrapper<const ImmutablePoint> c;
};

std::reference_wrapper is more useful than my const_wrapper. For instance, it lets you store multiple references in a container:

std::vector<ImmutablePoint&> triangles1; // Error
std::vector<std::reference_wrapper<ImmutablePoint>> triangles2; // Ok
std::vector<ImmutablePoint*> triangles3; // Ok

With these, std::ranges::sort(triangles2) sorts by the pointed-to values. Since our points define no default ordering, that line fails to compile, just as we would expect. Conversely, std::ranges::sort(triangles3) compiles, but it sorts by pointer address, which is almost certainly not what we want.

When can you use const or reference member variables?

In some cases, a class's default assignment and move operations are already unavailable or deleted; inheritance hierarchies are a prime example. In those cases, const or reference member variables cost you nothing extra.

We also sometimes need const or reference members in local function objects. Variables captured by reference in a lambda expression become reference member variables of the closure type.

Conclusion

C++ is an imperative programming language. It inherits a lot from C, and const and references were only added to the language later. Assignment plays a vital role in C++. So, like it or not, it is hard to restrict the outside world's freedom to modify just one particular member variable.

Zero is the Devil: Common ways to construct bogus proofs (not yet translated)

It is easy to make mistakes when writing mathematical proofs. Nevertheless, recurring error patterns show up in such proofs, and some of the most common ones involve the innocuous-looking number zero.

Division-by-zero fun

Let's look at the following "proof" of $1 = 2$:

$$
\begin{aligned}
\text{let } a, b \in \mathbb{Z} \text{ such that } a &= b \\
a^2 &= ab \\
a^2 - b^2 &= ab - b^2 \\
(a + b)(a - b) &= b(a - b) \\
a + b &= b \\
a + a &= a \\
2a &= a \\
2 &= 1
\end{aligned}
$$

What is wrong here? We cancel $a - b$ from both sides of the equation, but our premise includes $a = b$, so we have a division-by-zero problem.

Generally, performing cancellation without a zero-check is a terrible idea and should be avoided.

Sets with zero elements

OK, here is another silly "proof" that all objects are the same. We will assume that objects are countable.

Proof:

Let $S$ be the set of all objects, and let the property $P(n)$ mean that all subsets of $S$ of size at most $n$ contain only identical objects. Formally:

$$
P(n) \equiv \forall X \in \text{Pow}(S),\; |X| \leq n \implies \forall o, o' \in X,\, o = o'
$$

where $\text{Pow}(S)$ is the power set of $S$, defined as the set of all subsets of $S$, and $|X|$ denotes the cardinality (element count) of $X$.

We want to prove that $\forall n \geq 1,\, P(n)$, and we will do so by mathematical induction on $n$.

Base case ($n = 1$):

This is trivial, as a singleton set can only contain one object.

Inductive cases:

We take $P(n)$ as our inductive hypothesis, and we need to prove $P(n + 1)$. Without loss of generality, pick an arbitrary set $X \in \text{Pow}(S)$ such that $|X| = n + 1$. Pick two objects $x, x' \in X$; let's show $x = x'$. Let $Y = X \setminus \{x\}$ and $Y' = X \setminus \{x'\}$. Since $|Y| \le n$ and $|Y'| \le n$, the inductive hypothesis tells us that all elements of $Y$ are equal, and likewise for $Y'$. Pick an arbitrary object $y \in Y \cap Y'$. We get $y = x$ because $x, y \in Y'$. Similarly, $y = x'$ because $x', y \in Y$. Thus, $x = x'$, which proves the inductive step and the "theorem" $\forall n \geq 1,\, P(n)$.

Again the mistake here is related to zero: $|Y \cap Y'|$ can well be zero (when $n + 1 = 2$), so we cannot just "pick" an element from it.

If you come from a programming background, it is no coincidence that dividing by zero or picking an element from a zero-element collection causes horrible run-time errors.

I hope you had fun reading this post, just as I had fun writing it.

Summary of reading: January - March 2020 (not yet translated)

  • "Real World OCaml: Functional Programming for the Masses, 2nd edition" by Yaron Minsky, Anil Madhavapeddy, and Jason Hickey - I highly recommend this book for people who want to learn OCaml in depth. However, it does require familiarity with functional programming. I came to understand a lot of advanced ML language features, such as functors (which are very different from Haskell functors) and first-class modules, by reading this book. Also, as the name suggests, this is a "real world" programming book that spends sufficient time on the build system and libraries.

  • "The Formal Semantics of Programming Languages: An Introduction" by Glynn Winskel - The book gives a clear explanation of programming language concepts. It is centered around a little programming language, IMP, which the book defines with several different styles of semantics. On the other hand, since the book is quite dated (1993), its notation looks rather odd today.

  • "Practical Foundations for Programming Languages" by Robert Harper - Maybe I am dumb, but I found this book a dry and tough read. A lot of the time, it reads more like a reference manual than a textbook. If you don't understand the concepts yet, reading this book is probably not an efficient way to learn them; if you already do, you will find the definitions mechanical. Reading the Winskel book at the same time helps a lot in understanding this one. I like this book's emphasis on language statics, which the Winskel book lacks.

  • "Hands-On Design Patterns with C++" by Fedor G. Pikus - I started reading this one after someone recommended it on the cpplang Slack. I like how the book focuses on idiomatic C++ rather than on design patterns per se. My only gripe is that some examples are quite contrived or don't use the right pattern for the problem. For example, the game example in the template-method chapter should be implemented with a component architecture or ECS instead of inheritance. I understand those examples are for demonstration purposes, but they can mislead people who don't know the alternatives.

  • "The Elements of Style" by William Strunk Jr. and E. B. White - A cute little book on how to write English efficiently. I have to say, nevertheless, that understanding the book's points is a far cry from applying them instinctively.

The Impact of noexcept on Code Generation

In C++, if we add noexcept to every function declaration, will our code become more efficient? Things are not that simple. Consider the following snippet:

int g();

int f() {
  return g();
}

I deliberately leave g undefined in this translation unit; otherwise the compiler would have enough information to inline all of g. Even so, every major C++ compiler can figure out that f contains nothing but a tail call to g and generates code like this:

f():
        jmp     g()

Now consider the following snippet:

int g();

int f() noexcept {
  return g();
}

Not knowing whether g throws, the compiler is forced to generate code in f that handles the case where an exception actually is thrown. Below is the assembly produced by different compilers:

msvc

$ip2state$int f(void) DB 02H
        DB      08H
        DB      00H
$cppxdata$int f(void) DB 060H
        DD      imagerel $ip2state$int f(void)

int f(void) PROC                                      ; f, COMDAT
$LN5:
        sub     rsp, 40                             ; 00000028H
        call    int g(void)                         ; g
        npad    1
        add     rsp, 40                             ; 00000028H
        ret     0
int f(void) ENDP                                      ; f

gcc

f():
        sub     rsp, 8
        call    g()
        add     rsp, 8
        ret

clang

f():
        push    rax
        call    g()
        pop     rcx
        ret
        mov     rdi, rax
        call    __clang_call_terminate
__clang_call_terminate:
        push    rax
        call    __cxa_begin_catch
        call    std::terminate()

Dealing with C functions

Now we know that calling non-noexcept functions inside a noexcept function produces inefficient code. So how do we deal with functions that are guaranteed never to throw but are not marked noexcept? Fortunately, Hana Dusíková has already provided us a solution:

Did you ever get an suboptimal code, because you were calling external C function in your noexcept code?

Suffer no more:https://t.co/LA7C76a063

— Hana Dusíková 🍊 (@hankadusikova) June 27, 2020

You can mark the noexcept_cast function as force-inline, so that it incurs no performance penalty even in debug mode.

Conclusion

Be careful with noexcept, especially in higher-order functions that may run user-provided code.
