FIFTH EDITION
Steve Marschner
Cornell University
Peter Shirley
NVIDIA
with
Michael Ashikhmin, Gro Intelligence
Michael Gleicher, University of Wisconsin
Naty Hoffman, Lucasfilm
Garrett Johnson, Rochester Institute of Technology
Tamara Munzner, University of British Columbia
Erik Reinhard, InterDigital, Inc.
William B. Thompson, University of Utah
Peter Willemsen, University of Minnesota Duluth
Brian Wyvill, SceneWizard Software Ltd.
Fifth edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2022 Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Names: Marschner, Steve, author. | Shirley, Peter, author.
Title: Fundamentals of computer graphics / Steve Marschner, Peter Shirley.
Description: 5th edition. | Boca Raton: CRC Press, 2021. | Includes bibliographical references and index.
Identifiers: LCCN 2021008492 | ISBN 9780367505035 (hardback) | ISBN 9781003050339 (ebook)
Subjects: LCSH: Computer graphics.
Classification: LCC T385 .M36475 2021 | DDC 006.6—dc23
LC record available at https://lccn.loc.gov/2021008492
ISBN: 978-0-367-50503-5 (hbk)
ISBN: 978-0-367-50558-5 (pbk)
ISBN: 978-1-003-05033-9 (ebk)
Typeset in Times
by codeMantra
1.2 Major Applications
1.7 Designing and Coding Graphics Programs
2.1 Sets and Mappings
2.2 Solving Quadratic Equations
2.7 Curves and Surfaces
2.8 Linear Interpolation
2.10 Discrete Probability
2.11 Continuous Probability
2.12 Monte Carlo Integration
3.2 Images, Pixels, and Geometry
3.4 Alpha Compositing
4.1 The Basic Ray-Tracing Algorithm
4.3 Computing Viewing Rays
4.4 Ray-Object Intersection
5.1 Point-Like Light Sources
5.2 Basic Reflection Models
5.3 Ambient Illumination
6.3 Computing with Matrices and Determinants
6.4 Eigenvalues and Matrix Diagonalization
7 Transformation Matrices
7.1 2D Linear Transformations
7.2 3D Linear Transformations
7.3 Translation and Affine Transformations
7.4 Inverses of Transformation Matrices
7.5 Coordinate Transformations
8.1 Viewing Transformations
8.2 Projective Transformations
8.3 Perspective Projection
8.4 Some Properties of the Perspective Transform
9.2 Operations Before and After Rasterization
9.3 Simple Antialiasing
9.4 Culling Primitives for Efficiency
10.1 Digital Audio: Sampling in 1D
10.3 Convolution Filters
10.4 Signal Processing for Images
11.1 Looking Up Texture Values
11.2 Texture Coordinate Functions
11.3 Antialiasing Texture Lookups
11.4 Applications of Texture Mapping
11.5 Procedural 3D Textures
12 Data Structures for Graphics
12.1 Triangle Meshes
12.3 Spatial Data Structures
12.4 BSP Trees for Visibility
12.5 Tiling Multidimensional Arrays
13.2 Continuous Probability
13.3 Monte Carlo Integration
13.4 Choosing Random Points
14 Physics-Based Rendering
14.3 Smooth Dielectrics
14.4 Dielectrics with Subsurface Scattering
14.5 A Brute Force Photon Tracer
14.7 Radiometry of Scattering
14.8 Transport Equation
14.9 Materials in Practice
14.10 Monte Carlo Ray Tracing
15.2 Curve Properties
15.3 Polynomial Pieces
15.4 Putting Pieces Together
15.6 Approximating Curves
16.1 Principles of Animation
16.4 Character Animation
16.5 Physics-Based Animation
16.6 Procedural Techniques
16.7 Groups of Objects
17 Using Graphics Hardware
17.1 Hardware Overview
17.2 What Is Graphics Hardware
17.3 Heterogeneous Multiprocessing
17.4 Graphics Hardware Programming: Buffers, State, and Shaders
17.6 Basic OpenGL Application Layout
17.8 A First Look at Shaders
17.9 Vertex Buffer Objects
17.10 Vertex Array Objects
17.11 Transformation Matrices
17.12 Shading with Per-Vertex Attributes
17.13 Shading in the Fragment Processor
17.14 Meshes and Instancing
17.15 Texture Objects
17.16 Object-Oriented Design for Graphics Hardware Programming
17.17 Continued Learning
18.3 Chromatic Adaptation
18.4 Color Appearance
19.2 Visual Sensitivity
19.4 Objects, Locations, and Events
19.5 Picture Perception
20.5 Frequency-Based Operators
20.6 Gradient-Domain Operators
20.7 Spatial Operators
20.10 Other Approaches
20.11 Night Tonemapping
21.1 Implicit Functions, Skeletal Primitives, and Summation Blending
21.3 Space Partitioning
21.4 More on Blending
21.5 Constructive Solid Geometry
21.7 Precise Contact Modeling
21.8 The BlobTree
21.9 Interactive Implicit Modeling Systems
22 Computer Graphics in Games
22.2 Limited Resources
22.3 Optimization Techniques
22.5 The Game Production Process
23.3 Human-Centered Design Process
23.4 Visual Encoding Principles
23.5 Interaction Principles
This edition of Fundamentals of Computer Graphics includes substantial rewrites of the material on shading, light reflection, and path tracing, as well as many corrections throughout. This book now provides a better introduction to the techniques that go by the names of physics-based materials and physics-based rendering and are becoming predominant in actual practice. This material is now better integrated, and we think this book maps well to the way many instructors are organizing graphics courses at present.
The organization of this book remains substantially similar to the fourth edition. As we have revised this book over the years, we have endeavored to retain the informal, intuitive style of presentation that characterizes the earlier editions, while at the same time improving consistency, precision, and completeness. We hope the reader will find the result is an appealing platform for a variety of courses in computer graphics.
The cover image is from Tiger in the Water by J. W. Baker (brushed and air-brushed acrylic on canvas, 16” by 20”, www.jwbart.com).
The subject of a tiger is a reference to a wonderful talk given by Alain Fournier (1943–2000) at a workshop at Cornell University in 1998. His talk was an evocative verbal description of the movements of a tiger. He summarized his point:
Even though modelling and rendering in computer graphics have been improved tremendously in the past 35 years, we are still not at the point where we can model automatically a tiger swimming in the river in all its glorious details. By automatically I mean in a way that does not need careful manual tweaking by an artist/expert.
The bad news is that we have still a long way to go.
The good news is that we have still a long way to go.
The website for this book is http://www.cs.cornell.edu/~srm/fcg5/. We will continue to maintain a list of errata and links to courses that use the book, as well as teaching materials that match the book’s style. Most of the figures in this book are in Adobe Illustrator format, and we would be happy to convert specific figures into portable formats on request. Please feel free to contact us at srm@cs.cornell.edu or ptrshrl@gmail.com.
The following people have provided helpful information, comments, or feedback about the various editions of this book: Ahmet Oğuz Akyüz, Josh Andersen, Beatriz Trinchão Andrade, Zeferino Andrade, Bagossy Attila, Kavita Bala, Mick Beaver, Robert Belleman, Adam Berger, Adeel Bhutta, Solomon Boulos, Stephen Chenney, Michael Coblenz, Greg Coombe, Frederic Cremer, Brian Curtin, Dave Edwards, Jonathon Evans, Karen Feinauer, Claude Fuhrer, Yotam Gingold, Amy Gooch, Eungyoung Han, Chuck Hansen, Andy Hanson, Razen Al Harbi, Dave Hart, John Hart, Yong Huang, John “Spike” Hughes, Helen Hu, Vicki Interrante, Wenzel Jakob, Doug James, Henrik Wann Jensen, Shi Jin, Mark Johnson, Ray Jones, Revant Kapoor, Kristin Kerr, Erum Arif Khan, Mark Kilgard, Fangjun Kuang, Dylan Lacewell, Mathias Lang, Philippe Laval, Joshua Levine, Marc Levoy, Howard Lo, Joann Luu, Mauricio Maurer, Andrew Medlin, Ron Metoyer, Keith Morley, Eric Mortensen, Koji Nakamaru, Micah Neilson, Blake Nelson, Michael Nikelsky, James O’Brien, Hongshu Pan, Steve Parker, Sumanta Pattanaik, Matt Pharr, Ken Phillis Jr, Nicolò Pinciroli, Peter Poulos, Shaun Ramsey, Rich Riesenfeld, Nate Robins, Nan Schaller, Chris Schryvers, Tom Sederberg, Richard Sharp, Sarah Shirley, Peter-Pike Sloan, Hannah Story, Tony Tahbaz, Jan-Phillip Tiesel, Bruce Walter, Alex Williams, Amy Williams, Chris Wyman, Kate Zebrose, and Angela Zhang.
Ching-Kuang Shene and David Solomon allowed us to borrow their examples. Henrik Wann Jensen, Eric Levin, Matt Pharr, and Jason Waltman generously provided images. Brandon Mansfield helped improve the discussion of hierarchical bounding volumes for ray tracing. Philip Greenspun (philip.greenspun.com) kindly allowed us to use his photographs. John “Spike” Hughes helped improve the discussion of sampling theory. Wenzel Jakob’s Mitsuba renderer was invaluable in creating many figures. We are extremely thankful to J. W. Baker for helping create the cover Pete envisioned. In addition to being a talented artist, he was a great pleasure to work with personally.
Many works that were helpful in preparing this book are cited in the chapter notes. However, a few key texts that influenced the content and presentation deserve special recognition here. These include the two classic computer graphics texts from which we both learned the basics: Computer Graphics: Principles & Practice (Foley, Van Dam, Feiner, & Hughes, 1990) and Computer Graphics (Hearn & Baker, 1986). Other texts include both of Alan Watt’s influential books (Watt, 1993, 1991), Hill’s Computer Graphics Using OpenGL (Francis S. Hill, 2000), Angel’s Interactive Computer Graphics: A Top-Down Approach Using OpenGL (Angel, 2002), Hugues Hoppe’s University of Washington dissertation (Hoppe, 1994), and Rogers’ two excellent graphics texts (Rogers, 1985, 1989).
We would like to especially thank Alice and Klaus Peters for encouraging Pete to write the first edition of this book and for their great skill in bringing a book to fruition. Their patience with the authors and their dedication to making their books the best they can be has been instrumental in making this book what it is. This book certainly would not exist without their extraordinary efforts.
Steve Marschner, Ithaca, NY
Peter Shirley, Salt Lake City, UT
February 2021
Steve Marschner is a Professor of Computer Science at Cornell University. He obtained his Sc.B. from Brown University in 1993 and his Ph.D. from Cornell in 1998. He held research positions at Microsoft Research and Stanford University before joining Cornell in 2002. He is the recipient of the 2015 SIGGRAPH Computer Graphics Achievement Award and a co-recipient of a 2003 Technical Academy Award.
Peter Shirley is a Distinguished Research Scientist at NVIDIA. He held academic positions at Indiana University, Cornell University, and the University of Utah. He obtained a B.A. in Physics from Reed College in 1985 and a Ph.D. in Computer Science from the University of Illinois in 1991.
The term computer graphics describes any use of computers to create and manipulate images. This book introduces the algorithmic and mathematical tools that can be used to create all kinds of images—realistic visual effects, informative technical illustrations, or beautiful computer animations. Graphics can be two- or three-dimensional; images can be completely synthetic or can be produced by manipulating photographs. This book is about the fundamental algorithms and mathematics, especially those used to produce synthetic images of three-dimensional objects and scenes.
Actually doing computer graphics inevitably requires knowing about specific hardware, file formats, and usually a graphics API (see Section 1.3) or two. Computer graphics is a rapidly evolving field, so the specifics of that knowledge are a moving target. Therefore, in this book we do our best to avoid depending on any specific hardware or API. Readers are encouraged to supplement the text with relevant documentation for their software and hardware environment. Fortunately, the culture of computer graphics has enough standard terminology and concepts that the discussion in this book should map nicely to most environments.
This chapter defines some basic terminology and provides some historical background, as well as information sources related to computer graphics.
Imposing categories on any field is dangerous, but most graphics practitioners would agree on the following major areas of computer graphics:
Modeling deals with the mathematical specification of shape and appearance properties in a way that can be stored on the computer. For example, a coffee mug might be described as a set of ordered 3D points along with some interpolation rule to connect the points and a reflection model that describes how light interacts with the mug.
Rendering is a term inherited from art and deals with the creation of shaded images from 3D computer models.
Animation is a technique to create an illusion of motion through sequences of images. Animation uses modeling and rendering but adds the key issue of movement over time, which is not usually dealt with in basic modeling and rendering.
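To make the coffee mug example under modeling concrete, the following Python sketch stores a shape as an ordered list of 3D points, uses linear interpolation as the rule that connects them, and attaches a toy reflection model. The names `Reflectance`, `Model`, `point_at`, and `mug_profile`, along with all numeric values, are illustrative assumptions and do not come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class Reflectance:
    """Hypothetical reflection model: a diffuse color and a shininess exponent."""
    diffuse_rgb: Tuple[float, float, float]
    shininess: float


@dataclass
class Model:
    """Ordered 3D points, a linear interpolation rule, and a reflection model."""
    points: List[Point3]
    reflectance: Reflectance

    def point_at(self, t: float) -> Point3:
        """Return the point at parameter t in [0, 1] along the ordered points."""
        if len(self.points) < 2:
            raise ValueError("need at least two points to interpolate")
        t = min(max(t, 0.0), 1.0)
        scaled = t * (len(self.points) - 1)
        i = min(int(scaled), len(self.points) - 2)  # index of the segment start
        f = scaled - i                              # fraction along that segment
        p, q = self.points[i], self.points[i + 1]
        return tuple(a + f * (b - a) for a, b in zip(p, q))


# A crude side profile of a mug: across the base, then up the wall.
mug_profile = Model(
    points=[(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 10.0, 0.0)],
    reflectance=Reflectance(diffuse_rgb=(0.9, 0.9, 0.85), shininess=50.0),
)
```

A real modeling system would use a richer interpolation rule (for instance a spline) and a physically motivated reflection model, but the three ingredients are the same.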
There are many other areas that involve computer graphics, and whether they are core graphics areas is a matter of opinion. These will all be at least touched on in the text. Such related areas include the following:
User interaction deals with the interface between input devices such as mice and tablets, the application, feedback to the user in imagery, and other sensory feedback. Historically, this area is associated with graphics largely because graphics researchers had some of the earliest access to the input/output devices that are now ubiquitous.
Virtual reality attempts to immerse the user into a 3D virtual world. This typically requires at least stereo graphics and response to head motion. For true virtual reality, sound and force feedback should be provided as well. Because this area requires advanced 3D graphics and advanced display technology, it is often closely associated with graphics.
Visualization attempts to give users insight into complex information via visual display. Often, there are graphic issues to be addressed in a visualization problem.
Image processing deals with the manipulation of 2D images and is used in both the fields of graphics and vision.
Three-dimensional scanning uses range-finding technology to create measured 3D models. Such models are useful for creating rich visual imagery, and the processing of such models often requires graphics algorithms.
Computational photography is the use of computer graphics, computer vision, and image processing methods to enable new ways of photographically capturing objects, scenes, and environments.
Almost any endeavor can make some use of computer graphics, but the major consumers of computer graphics technology include the following industries:
Video games increasingly use sophisticated 3D models and rendering algorithms.
Cartoons are often rendered directly from 3D models. Many traditional 2D cartoons use backgrounds rendered from 3D models, which allow a continuously moving viewpoint without huge amounts of artist time.
Visual effects use almost all types of computer graphics technology. Almost every modern film uses digital compositing to superimpose backgrounds with separately filmed foregrounds. Many films also use 3D modeling and animation to create synthetic environments, objects, and even characters that most viewers will never suspect are not real.
Animated films use many of the same techniques that are used for visual effects, but without necessarily aiming for images that look real.
CAD/CAM stands for computer-aided design and computer-aided manufacturing. These fields use computer technology to design parts and products on the computer and then, using these virtual designs, to guide the manufacturing process. For example, many mechanical parts are designed in a 3D computer modeling package and then automatically produced on a computer-controlled milling device.
Simulation can be thought of as accurate video gaming. For example, a flight simulator uses sophisticated 3D graphics to simulate the experience of flying an airplane. Such simulations can be extremely useful for initial training in safety-critical domains such as driving, and for scenario training of experienced users, for example in specific fire-fighting situations that are too costly or dangerous to create physically.
Medical imaging creates meaningful images of scanned patient data. For example, a computed tomography (CT) dataset is composed of a large 3D rectangular array of density values. Computer graphics is used to create shaded images that help doctors extract the most salient information from such data.
Information visualization creates images of data that do not necessarily have a “natural” visual depiction. For example, the temporal trend of the price of ten different stocks does not have an obvious visual depiction, but clever graphing techniques can help humans see the patterns in such data.
A key part of using graphics libraries is dealing with a graphics API. An application program interface (API) is a standard collection of functions to perform a set of related operations, and a graphics API is a set of functions that perform basic operations such as drawing images and 3D surfaces into windows on the screen.
Every graphics program needs to be able to use two related APIs: a graphics API for visual output and a user-interface API to get input from the user. There are currently two dominant paradigms for graphics and user-interface APIs. The first is the integrated approach, exemplified by Java, where the graphics and user-interface toolkits are integrated and portable packages that are fully standardized and supported as part of the language. The second is represented by Direct3D and OpenGL, where the drawing commands are part of a software library tied to a language such as C++, and the user-interface software is an independent entity that might vary from system to system. In this latter approach, it is difficult to write portable code, although for simple programs, it may be possible to use a portable library layer to encapsulate the system-specific user-interface code.
Whatever your choice of API, the basic graphics calls will be largely the same, and the concepts of this book will apply.
Every desktop computer today has a powerful 3D graphics pipeline. This is a special software/hardware subsystem that efficiently draws 3D primitives in perspective. Usually, these systems are optimized for processing 3D triangles with shared vertices. The basic operations in the pipeline map the 3D vertex locations to 2D screen positions and shade the triangles so that they both look realistic and appear in proper back-to-front order.
Although drawing the triangles in valid back-to-front order was once the most important research issue in computer graphics, it is now almost always solved using the z-buffer, which uses a special memory buffer to solve the problem in a brute-force manner.
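The brute-force idea behind the z-buffer can be sketched in a few lines of Python. This is a minimal illustration and not how any particular hardware implements it: it assumes that a smaller z value means closer to the viewer, and the names `make_buffers` and `write_fragment` are hypothetical.

```python
import math


def make_buffers(width, height, background=(0, 0, 0)):
    """Create a color buffer and a depth buffer initialized to 'infinitely far'."""
    color = [[background for _ in range(width)] for _ in range(height)]
    depth = [[math.inf for _ in range(width)] for _ in range(height)]
    return color, depth


def write_fragment(color, depth, x, y, z, rgb):
    """Store the fragment only if it is closer than what the pixel already holds.

    Assumption: smaller z is closer to the viewer.
    Returns True if the pixel was updated.
    """
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = rgb
        return True
    return False
```

Because each pixel keeps only the closest fragment seen so far, the final image does not depend on the order in which the triangles are drawn, which is why no explicit back-to-front sort is needed.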
It turns out that the geometric manipulation used in the graphics pipeline can be accomplished almost entirely in a 4D coordinate space composed of three traditional geometric coordinates and a fourth homogeneous coordinate that helps with perspective viewing. These 4D coordinates are manipulated using 4 × 4 matrices and 4-vectors. The graphics pipeline, therefore, contains much machinery for efficiently processing and composing such matrices and vectors. This 4D coordinate system is one of the most subtle and beautiful constructs used in computer science, and it is certainly the biggest intellectual hurdle to jump when learning computer graphics. A big chunk of the first part of every graphics book deals with these coordinates.
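As a rough preview of how the fourth coordinate helps, the sketch below composes a translation with a toy perspective matrix and then divides by the homogeneous coordinate. The projection used here (onto the plane z = d, with the viewer at the origin looking along +z) is an assumption chosen for brevity and is not necessarily the convention developed later in the book; all function names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def mat_mul(a, b):
    """Compose two 4x4 matrices; the result applies b first, then a."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def translation(tx, ty, tz):
    """Translation as a 4x4 matrix, possible only with the extra coordinate."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def simple_perspective(d):
    """Toy projection onto the plane z = d (illustrative convention only)."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0 / d, 0.0]]


def to_cartesian(h):
    """Divide by the homogeneous coordinate w to recover a 3D point."""
    x, y, z, w = h
    return (x / w, y / w, z / w)


if __name__ == "__main__":
    # Move the point 8 units away, then project onto the plane z = 4.
    m = mat_mul(simple_perspective(4.0), translation(0.0, 0.0, 8.0))
    print(to_cartesian(mat_vec(m, [2.0, 4.0, 0.0, 1.0])))
```

The point (2, 4, 0) is first moved to depth 8 and then scaled by 4/8, so it lands at (1, 2, 4) on the projection plane. Both steps are carried by a single composed 4 × 4 matrix, with the perspective division deferred to the end.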
The speed at which images can be generated depends strongly on the number of triangles being drawn. Because interactivity is more important in many applications than visual quality, it is worthwhile to minimize the number of triangles used to represent a model. In addition, if the model is viewed in the distance, fewer triangles are needed than when the model is viewed from a closer distance. This suggests that it is useful to represent a model with a varying level of detail (LOD).
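One simple way to act on this observation is to keep several versions of a model and pick one by viewing distance. The sketch below is only one possible policy; the function name `select_lod`, the distance thresholds, and the triangle counts are all made-up values for illustration.

```python
def select_lod(distance, thresholds):
    """Return the index of the level of detail to draw (0 is the most detailed).

    thresholds is an increasing list of distances; level i is used while the
    model is closer than thresholds[i], and the coarsest level beyond that.
    """
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)


# Hypothetical triangle counts for four versions of the same model.
triangle_counts = [5000, 1200, 300, 60]
lod_thresholds = [10.0, 50.0, 200.0]

if __name__ == "__main__":
    for d in (3.0, 75.0, 1000.0):
        level = select_lod(d, lod_thresholds)
        print(f"distance {d}: level {level}, {triangle_counts[level]} triangles")
```

Practical systems often base the choice on projected screen size rather than raw distance, and blend between levels to hide the switch.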
Many graphics programs are really just 3D numerical codes. Numerical issues are often crucial in such programs. In the “old days,” it was very difficult to handle such issues in a robust and portable manner because machines had different internal representations for numbers, and even worse, handled exceptions in different and incompatible ways. Fortunately, almost all modern computers conform to the IEEE floating-point standard (IEEE Standards Association, 1985). This allows the programmer to make many convenient assumptions about how certain numeric conditions will be handled.
Although IEEE floating-point has many features that are valuable when coding numeric algorithms, there are only a few that are crucial to know for most situations encountered in graphics. First, and most important, is to understand that there are three “special” values for real numbers in IEEE floating-point:
Infinity (∞). This is a valid number that is larger than all other valid numbers.
Minus infinity (–∞). This is a valid number that is smaller than all other valid numbers.
Not a number (NaN). This is an invalid number that arises from an operation with undefined consequences, such as zero divided by zero.
The designers of IEEE floating-point made some decisions that are extremely convenient for programmers. Many of these relate to the three special values above in handling exceptions such as division by zero. In these cases, an exception is logged, but the programmer can often ignore it. Specifically, for any positive real number a, the following rules involving division by infinite values hold:
$$
\begin{aligned}
+a / (+\infty) &= +0,\\
-a / (+\infty) &= -0,\\
+a / (-\infty) &= -0,\\
-a / (-\infty) &= +0.
\end{aligned}
$$
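Other operations involving infinite values behave the way one would expect. Again for positive a, the behavior is as follows:
$$\infty + \infty = +\infty, \quad \infty - \infty = \mathrm{NaN}, \quad \infty \times \infty = \infty, \quad \infty/\infty = \mathrm{NaN}, \quad \infty/a = \infty, \quad \infty/0 = \infty, \quad 0/0 = \mathrm{NaN}.$$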
The rules in a Boolean expression involving infinite values are as expected:
All finite valid numbers are less than +∞.
All finite valid numbers are greater than –∞.
–∞ is less than +∞.
The rules involving expressions that have NaN values are simple:
Any arithmetic expression that includes NaN results in NaN.
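Any comparison involving NaN (less than, greater than, or equal to) is false, even when NaN is compared with itself. The one exception is the “not equal” test, which is true.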
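These rules are easy to check experimentally. The following Python sketch is one way to do so; it assumes the interpreter's float type is an IEEE double, and the variable names are ours. Python raises an exception for floating-point division by zero instead of returning an infinity, so that case is left out here.

```python
import math

inf = math.inf
nan = math.nan
a = 3.0  # any positive finite value (illustrative choice)

# Division by infinite values gives a signed zero.
pos_zero = a / inf
neg_zero = -a / inf

# Other operations on infinite values.
inf_sum = inf + inf
inf_diff = inf - inf
inf_prod = inf * inf
inf_ratio = inf / inf
inf_over_a = inf / a

# NaN propagates through arithmetic, and ordered or equality tests on it are false.
nan_sum = nan + 1.0
nan_cmp = (nan > 0.0, nan < 0.0, nan == nan)
nan_ne = nan != nan
```

The sign of a zero is invisible to `==`, so `math.copysign(1.0, z)` is a convenient way to tell +0 from -0 when inspecting the results.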
Perhaps the most useful aspect of IEEE floating-point is how divide-by-zero is handled; for any positive real number a, the following rules involving division by zero values hold:
$$+a/(+0) = +\infty, \qquad -a/(+0) = -\infty.$$
There are many numeric computations that become much simpler if the programmer takes advantage of the IEEE rules. For example, consider the expression:
$$a = \frac{1}{\frac{1}{b} + \frac{1}{c}}.$$
Such expressions arise with resistors and lenses. If divide-by-zero resulted in a program crash (as was true in many systems before IEEE floating-point), then two if statements would be required to check for small or zero values of b or c. Instead, with IEEE floating-point, if b or c is zero, we will get a zero value for a as desired. Another common technique to avoid special checks is to take advantage of the Boolean properties of NaN. Consider the following code segment:
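To make this concrete, here is a small Python sketch, assuming the expression in question is the parallel-combination form a = 1/(1/b + 1/c). Because Python raises `ZeroDivisionError` where IEEE arithmetic would return an infinity, the hypothetical helper `ieee_div` emulates the IEEE result; in a language such as C++ on IEEE hardware the plain division operator would usually do. Both function names are ours.

```python
import math

def ieee_div(n: float, d: float) -> float:
    """Divide n by d, returning the IEEE 754 result even when d is zero."""
    try:
        return n / d
    except ZeroDivisionError:
        if n == 0.0 or math.isnan(n):
            return math.nan  # 0/0 and NaN/0 are NaN
        # Nonzero over zero is an infinity whose sign combines both signs.
        return math.copysign(math.inf, n) * math.copysign(1.0, d)

def parallel(b: float, c: float) -> float:
    """Evaluate a = 1 / (1/b + 1/c) with no special-case checks."""
    return ieee_div(1.0, ieee_div(1.0, b) + ieee_div(1.0, c))
```

With b = 0, the term 1/b becomes +∞, the sum stays +∞, and the final division yields zero, which is the physically sensible answer for a short circuit in parallel with anything.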
a = f(x)
if (a > 0) then
    do something
Here, the function f may return “ugly” values such as ∞ or NaN, but the if condition is still well-defined: it is false for a = NaN or a = –∞ and true for a = +∞. With care in deciding which values are returned, often the if can make the right choice, with no special checks needed. This makes programs smaller, more robust, and more efficient.
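The behavior of such a test can be sketched in Python as follows. The function `count_positive` and the sample values are illustrative only; they stand in for whatever f returns in a real program.

```python
import math

def count_positive(values):
    """Count entries that pass the test a > 0, with no special-case checks."""
    return sum(1 for a in values if a > 0)

# A mix of ordinary and "ugly" values such as f might return.
samples = [2.5, math.inf, -math.inf, math.nan, -1.0, 0.0]
n_positive = count_positive(samples)
```

Only 2.5 and +∞ pass the test; NaN and –∞ fail it quietly, without any explicit check for them.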
There are no magic rules for making code more efficient. Efficiency is achieved through careful tradeoffs, and these tradeoffs are different for different architectures. However, for the foreseeable future, a good heuristic is that programmers should pay more attention to memory access patterns than to operation counts. This is the opposite of the best heuristic of two decades ago. This switch has occurred because the speed of memory has not kept pace with the speed of processors. Since that trend continues, the importance of limited and coherent memory access for optimization should only increase.
A reasonable approach to making code fast is to proceed in the following order, taking only those steps which are needed:
Write the code in the most straightforward way possible. Compute intermediate results as needed on the fly rather than storing them.
Compile in optimized mode.
Use whatever profiling tools exist to find critical bottlenecks.
Examine data structures to look for ways to improve locality. If possible, make data unit sizes match the cache/page size on the target architecture.
If profiling reveals bottlenecks in numeric computations, examine the assembly code generated by the compiler for missed efficiencies. Rewrite source code to solve any problems you find.
The most important of these steps is the first one. Most “optimizations” make the code harder to read without speeding things up. In addition, time spent upfront optimizing code is usually better spent correcting bugs or adding features. Also, beware of suggestions from old texts; some classic tricks such as using integers instead of reals may no longer yield speed because modern CPUs can usually perform floating-point operations just as fast as they perform integer operations. In all situations, profiling is needed to be sure of the merit of any optimization for a specific machine and compiler.
Certain common strategies are often useful in graphics programming. In this section, we provide some advice that you may find helpful as you implement the methods you learn about in this book.
A key part of any graphics program is to have good classes or routines for geometric entities such as vectors and matrices, as well as graphics entities such as RGB colors and images. These routines should be made as clean and efficient as possible. A universal design question is whether locations and displacements should be separate classes because they have different operations; e.g., a location multiplied by one-half makes no geometric sense while one-half of a displacement does (Goldman, 1985; DeRose, 1989). There is little agreement on this question, which can spur hours of heated debate among graphics practitioners, but for the sake of example, let’s assume we will not make the distinction.
This implies that some basic classes to be written include
vector2. A 2D vector class that stores an x- and y-component. It should store these components in a length-2 array so that an indexing operator can be well supported. You should also include operations for vector addition, vector subtraction, dot product, cross product, scalar multiplication, and scalar division.
vector3. A 3D vector class analogous to vector2.
hvector. A homogeneous vector with four components (see Chapter 8).
rgb. An RGB color that stores three components. You should also include operations for RGB addition, RGB subtraction, RGB multiplication, scalar multiplication, and scalar division.
transform. A 4 × 4 matrix for transformations. You should include a matrix multiply and member functions to apply to locations, directions, and surface normal vectors. As shown in Chapter 7, these are all different.
image. A 2D array of RGB pixels with an output operation.
In addition, you might or might not want to add classes for intervals, orthonormal bases, and coordinate frames.
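As an illustration of the kind of class meant here, the following is a minimal Python sketch of a 3D vector. The name `Vec3`, its method names, and the choice of a Python list for storage are our assumptions, not a prescribed interface; a production version would more likely be written in a compiled language with a fixed-size array.

```python
from __future__ import annotations
import math

class Vec3:
    """Minimal 3D vector; components live in a length-3 list so indexing is direct."""
    __slots__ = ("e",)

    def __init__(self, x: float = 0.0, y: float = 0.0, z: float = 0.0) -> None:
        self.e = [float(x), float(y), float(z)]

    def __getitem__(self, i: int) -> float:
        return self.e[i]

    def __add__(self, o: Vec3) -> Vec3:
        return Vec3(*(a + b for a, b in zip(self.e, o.e)))

    def __sub__(self, o: Vec3) -> Vec3:
        return Vec3(*(a - b for a, b in zip(self.e, o.e)))

    def __mul__(self, s: float) -> Vec3:
        # Scalar multiplication only; use dot() or cross() for vector products.
        return Vec3(*(a * s for a in self.e))

    __rmul__ = __mul__

    def __truediv__(self, s: float) -> Vec3:
        return self * (1.0 / s)

    def dot(self, o: Vec3) -> float:
        return sum(a * b for a, b in zip(self.e, o.e))

    def cross(self, o: Vec3) -> Vec3:
        ax, ay, az = self.e
        bx, by, bz = o.e
        return Vec3(ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

    def length(self) -> float:
        return math.sqrt(self.dot(self))
```

Keeping `*` for scalar multiplication and giving the dot and cross products explicit names avoids any ambiguity about which product an expression means.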
Modern architecture suggests that keeping memory use down and maintaining coherent memory access are the keys to efficiency. This suggests using single-precision data. However, avoiding numerical problems suggests using double-precision arithmetic. The tradeoffs depend on the program, but it is nice to have a default in your class definitions.
If you ask around, you may find that as programmers become more experienced, they use traditional debuggers less and less. One reason for this is that using such debuggers is more awkward for complex programs than for simple programs. Another reason is that the most difficult errors are conceptual ones where the wrong thing is being implemented, and it is easy to waste large amounts of time stepping through variable values without detecting such cases. We have found several debugging strategies to be particularly useful in graphics.
In graphics programs, there is an alternative to traditional debugging that is often very useful. The downside to it is that it is very similar to what computer programmers are taught not to do early in their careers, so you may feel “naughty” if you do it: we create an image and observe what is wrong with it. Then, we develop a hypothesis about what is causing the problem and test it. For example, in a ray-tracing program we might have many somewhat random looking dark pixels. This is the classic “shadow acne” problem that most people run into when they write a ray tracer. Traditional debugging is not helpful here; instead, we must realize that the shadow rays are hitting the surface being shaded. We might notice that the color of the dark spots is the ambient color, so the direct lighting is what is missing. Direct lighting can be turned off in shadow, so you might hypothesize that these points are incorrectly being tagged as in shadow when they are not. To test this hypothesis, we could turn off the shadowing check and recompile. This would indicate that these are false shadow tests, and we could continue our detective work. The key reason that this method can sometimes be good practice is that we never had to spot a false value or really determine our conceptual error. Instead, we just narrowed in on our conceptual error experimentally. Typically, only a few trials are needed to track things down, and this type of debugging is enjoyable.
In many cases, the easiest channel by which to get debugging information out of a graphics program is the output image itself. If you want to know the value of some variable for part of a computation that runs for every pixel, you can just modify your program temporarily to copy that value directly to the output image and skip the rest of the calculations that would normally be done. For instance, if you suspect a problem with surface normals is causing a problem with shading, you can copy the normal vectors directly to the image (x goes to red, y goes to green, z goes to blue), resulting in a color-coded illustration of the vectors actually being used in your computation. Or, if you suspect a particular value is sometimes out of its valid range, make your program write bright red pixels where that happens. Other common tricks include drawing the back sides of surfaces with an obvious color (when they are not supposed to be visible), coloring the image by the ID numbers of the objects, or coloring pixels by the amount of work they took to compute.
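The normal-vector trick can be sketched as below. The function name `normal_to_rgb` is hypothetical, and the remapping of each component from [–1, 1] to the 8-bit range [0, 255] is one common choice that we assume here; the text only specifies which component goes to which channel.

```python
def normal_to_rgb(n):
    """Map a unit normal (x, y, z), each component in [-1, 1], to 8-bit (r, g, b)."""
    def channel(v):
        v = max(-1.0, min(1.0, v))  # clamp values that stray outside [-1, 1]
        return int(round((v + 1.0) * 0.5 * 255.0))
    x, y, z = n
    return (channel(x), channel(y), channel(z))
```

With this mapping, a surface facing straight down the positive z-axis shows up as a light blue, and abrupt color changes across a smooth surface point to bad normals.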
There are still cases, particularly when the scientific method seems to have led to a contradiction, when there’s no substitute for observing exactly what is going on. The trouble is that graphics programs often involve many, many executions of the same code (once per pixel, for instance, or once per triangle), making it completely impractical to step through in the debugger from the start. And the most difficult bugs usually only occur for complicated inputs.
A useful approach is to “set a trap” for the bug. First, make sure your program is deterministic—run it in a single thread and make sure that all random numbers are computed from fixed seeds. Then, find out which pixel or triangle is exhibiting the bug and add a statement before the code you suspect is incorrect that will be executed only for the suspect case. For instance, if you find that pixel (126, 247) exhibits the bug, then add
if x = 126 and y = 247 then
    print “blarg!”
If you set a breakpoint on the print statement, you can drop into the debugger just before the pixel you’re interested in is computed. Some debuggers have a “conditional breakpoint” feature that can achieve the same thing without modifying the code.
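In Python, such a trap might look like the sketch below. The `render` loop, its `shade` callback, and the `on_trap` parameter are all hypothetical stand-ins for a real renderer; only the pixel coordinates and the message come from the example above.

```python
SUSPECT_PIXEL = (126, 247)  # the pixel observed to exhibit the bug

def render(width, height, shade, on_trap=print):
    """Shade every pixel in a fixed order, springing the trap on the suspect one."""
    image = {}
    for y in range(height):
        for x in range(width):
            if (x, y) == SUSPECT_PIXEL:
                on_trap("blarg!")  # set a breakpoint on this line
            image[(x, y)] = shade(x, y)
    return image
```

Because the loop order is fixed and nothing here is random, the trap fires at the same point on every run, which is what makes the breakpoint useful.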
In the cases where the program crashes, a traditional debugger is useful for pinpointing the site of the crash. You should then start backtracking in the program, using asserts and recompiles, to find where the program went wrong. These asserts should be left in the program for potential future bugs you will add. This again means the traditional step-through process is avoided, because that would not be adding the valuable asserts to your program.
Often, it is hard to understand what your program is doing, because it computes a lot of intermediate results before it finally goes wrong. The situation is similar to a scientific experiment that measures a lot of data, and one solution is the same: make good plots and illustrations for yourself to understand what the data mean. For instance, in a ray tracer you might write code to visualize ray trees so you can see what paths contributed to a pixel, or in an image resampling routine you might make plots that show all the points where samples are being taken from the input. Time spent writing code to visualize your program’s internal state is also repaid in a better understanding of its behavior when it comes time to optimize it.
The discussion of software engineering is influenced by the Effective C++ series (Meyers, 1995, 1997), the Extreme Programming movement (Beck & Andres, 2004), and The Practice of Programming (Kernighan & Pike, 1999). The discussion of experimental debugging is based on discussions with Steve Parker.
There are a number of annual conferences related to computer graphics, including ACM SIGGRAPH and SIGGRAPH Asia, Graphics Interface, the Game Developers Conference (GDC), Eurographics, Pacific Graphics, High Performance Graphics, the Eurographics Symposium on Rendering, and IEEE VisWeek. These can be readily found by web searches on their names.
Much of graphics is just translating math directly into code. The cleaner the math, the cleaner the resulting code; so much of this book concentrates on using just the right math for the job. This chapter reviews various tools from high school and college mathematics and is designed to be used more as a reference than as a tutorial. It may appear to be a hodge-podge of topics and indeed it is; each topic is chosen because it is a bit unusual in “standard” math curricula, because it is of central importance in graphics, or because it is not typically treated from a geometric standpoint. In addition to establishing a review of the notation used in this book, this chapter also emphasizes a few points that are sometimes skipped in the standard undergraduate curricula, such as barycentric coordinates on triangles. This chapter is not intended to be a rigorous treatment of the material; instead, intuition and geometric interpretation are emphasized. A discussion of linear algebra is deferred until Chapter 6 just before transformation matrices are discussed. Readers are encouraged to skim this chapter to familiarize themselves with the topics covered and to refer back to it as needed. The exercises at the end of this chapter may be useful in determining which topics need a refresher.
Mappings, also called functions, are basic to mathematics and programming. Like a function in a program, a mapping in math takes an argument of one type and maps it to (returns) an object of a particular type. In a program, we say “type”; in math, we would identify the set. When we have an object that is a member of a set, we use the ∈ symbol. For example,
a ∈ S,
can be read “a is a member of set S.” Given any two sets A and B, we can create a third set by taking the Cartesian product of the two sets, denoted A × B. This set A × B is composed of all possible ordered pairs (a, b) where a ∈ A and b ∈ B. As a shorthand, we use the notation A² to denote A × A. We can extend the Cartesian product to create a set of all possible ordered triples from three sets and so on for arbitrarily long ordered tuples from arbitrarily many sets.
Common sets of interest include
ℝ: the real numbers;
ℝ⁺: the nonnegative real numbers (includes zero);
ℝ²: the ordered pairs in the real 2D plane;
ℝⁿ: the points in n-dimensional Cartesian space;
ℤ: the integers;
S²: the set of 3D points (points in ℝ³) on the unit sphere.
Note that although S² is composed of points embedded in three-dimensional space, it is a surface that can be parameterized with two variables, so it can be thought of as a 2D set. Notation for mappings uses the arrow and a colon, for example,
$$f : \mathbb{R} \mapsto \mathbb{Z},$$
which you can read as “There is a function called f that takes a real number as input and maps it to an integer.” Here, the set that comes before the arrow is called the domain of the function, and the set on the right-hand side is called the target. Computer programmers might be more comfortable with the following equivalent language: “There is a function called f which has one real argument and returns an integer.” In other words, the set notation above is equivalent to the common programming notation:
    integer f(real)
So the colon-arrow notation can be thought of as a programming syntax. It’s that simple.
The point f (a) is called the image of a, and the image of a set A (a subset of the domain) is the subset of the target that contains the images of all points in A. The image of the whole domain is called the range of the function.
If we have a function f : A ⟼ B, there may exist an inverse function f⁻¹ : B ⟼ A, which is defined by the rule f⁻¹(b) = a where b = f(a). This definition only works if every b ∈ B is an image of some point under f (i.e., the range equals the target) and if there is only one such point (i.e., there is only one a for which f(a) = b). Such mappings or functions are called bijections. A bijection maps every a ∈ A to a unique b ∈ B, and for every b ∈ B, there is exactly one a ∈ A such that f(a) = b (Figure 2.1). A bijection between a group of riders and horses indicates that everybody rides a single horse, and every horse is ridden. The two functions would be rider(horse) and horse(rider). These are inverse functions of each other. Functions that are not bijections have no inverse (Figure 2.2).
An example of a bijection is f : ℝ ⟼ ℝ, with f(x) = x³. The inverse is f⁻¹(x) = ∛x. This example shows that the standard notation can be somewhat awkward because x is used as a dummy variable in both f and f⁻¹. It is sometimes more intuitive to use different dummy variables, with y = f(x) and x = f⁻¹(y). This yields the more intuitive y = x³ and x = ∛y. An example of a function that does not have an inverse is sqr : ℝ ⟼ ℝ, where sqr(x) = x². This is true for two reasons: first, x² = (–x)², and second, no members of the domain map to the negative portions of the target. Note that we can define an inverse if we restrict the domain and range to ℝ⁺. Then, √x is a valid inverse.
Often, we would like to specify that a function deals with real numbers that are restricted in value. One such constraint is to specify an interval. An example of an interval is the real numbers between zero and one, not including zero or one. We denote this (0, 1). Because it does not include its endpoints, this is referred to as an open interval. The corresponding closed interval, which does contain its endpoints, is denoted with square brackets: [0, 1]. This notation can be mixed; i.e., [0, 1) includes zero but not one. When writing an interval [a, b], we assume that a ≤ b. The three common ways to represent an interval are shown in Figure 2.3. The Cartesian products of intervals are often used. For example, to indicate that a point x is in the unit cube in 3D, we say x ∈ [0, 1]³.
Figure 2.1. A bijection f and the inverse function f⁻¹. Note that f⁻¹ is also a bijection.
Figure 2.2. The function g does not have an inverse because two elements of d map to the same element of E. The function h has no inverse because element T of F has no element of d mapped to it.
Figure 2.3. Three equivalent ways to denote the interval from a to b that includes b but not a.
Intervals are particularly useful in conjunction with set operations: intersection, union, and difference. For example, the intersection of two intervals is the set of points they have in common. The symbol ∩ is used for intersection. For example, [3, 5) ∩ [4, 6] = [4, 5). For unions, the symbol ∪ is used to denote points in either interval. For example, [3, 5) ∪ [4, 6] = [3, 6]. Unlike the first two operators, the difference operator produces different results depending on argument order. The minus sign is used for the difference operator, which returns the points in the left interval that are not also in the right. For example, [3, 5) – [4, 6] = [3, 4) and [4, 6] – [3, 5) = [5, 6]. These operations are particularly easy to visualize using interval diagrams (Figure 2.4).
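The interval examples above can be checked with a short Python sketch. The `Interval` class and its field names are our own illustration, not a standard library type; it implements only membership and intersection, and tests union and difference pointwise through membership.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float
    lo_closed: bool = True   # True for "[", False for "("
    hi_closed: bool = True   # True for "]", False for ")"

    def contains(self, x: float) -> bool:
        above = x >= self.lo if self.lo_closed else x > self.lo
        below = x <= self.hi if self.hi_closed else x < self.hi
        return above and below

    def intersect(self, other: "Interval"):
        """Return the common interval, or None if the two do not overlap."""
        # The larger lower bound wins; on a tie it is closed only if both are.
        if self.lo != other.lo:
            src = self if self.lo > other.lo else other
            lo, lo_closed = src.lo, src.lo_closed
        else:
            lo, lo_closed = self.lo, self.lo_closed and other.lo_closed
        # The smaller upper bound wins, with the same tie rule.
        if self.hi != other.hi:
            src = self if self.hi < other.hi else other
            hi, hi_closed = src.hi, src.hi_closed
        else:
            hi, hi_closed = self.hi, self.hi_closed and other.hi_closed
        if lo > hi or (lo == hi and not (lo_closed and hi_closed)):
            return None
        return Interval(lo, hi, lo_closed, hi_closed)

def in_union(a: Interval, b: Interval, x: float) -> bool:
    return a.contains(x) or b.contains(x)

def in_difference(a: Interval, b: Interval, x: float) -> bool:
    return a.contains(x) and not b.contains(x)
```

For instance, `Interval(3, 5, True, False).intersect(Interval(4, 6))` yields the half-open interval [4, 5), matching the first example.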
Although not as prevalent today as they were before calculators, logarithms are often useful in problems where equations with exponential terms arise. By definition, every logarithm has a base a. The “log base a” of x is written logₐ x and is defined as “the exponent to which a must be raised to get x,” i.e.,
$$y = \log_a x \iff a^y = x.$$
Figure 2.4. Interval operations on [3,5) and [4,6].
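Note that the logarithm base a and the function that raises a to a power are inverses of each other. This basic definition has several consequences:
$$\begin{aligned} a^{\log_a x} &= x, \\ \log_a(a^x) &= x, \\ \log_a(xy) &= \log_a x + \log_a y, \\ \log_a(x/y) &= \log_a x - \log_a y, \\ \log_a x &= \log_a b \, \log_b x. \end{aligned}$$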
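Properties of this kind are easy to spot-check numerically. The sketch below assumes a hypothetical helper `log_base`, built on the natural logarithm from Python's `math` module, and evaluates both sides of the product rule and the change-of-base rule for a few arbitrary values.

```python
import math

def log_base(a: float, x: float) -> float:
    """Logarithm of x to base a, for a > 0, a != 1, and x > 0."""
    return math.log(x) / math.log(a)

# Product rule: log_a(xy) = log_a x + log_a y.
product_lhs = log_base(10.0, 6.0)
product_rhs = log_base(10.0, 2.0) + log_base(10.0, 3.0)

# Change of base: log_a x = log_a b * log_b x.
change_lhs = log_base(2.0, 100.0)
change_rhs = log_base(2.0, 10.0) * log_base(10.0, 100.0)
```

Because of rounding, the two sides should be compared with a tolerance (for example `math.isclose`) instead of exact equality.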
When we apply calculus to logarithms, the special number e = 2.718... often turns up. The logarithm with base e is called the natural logarithm. We adopt the common shorthand ln to denote it:
$$\ln x \equiv \log_e x.$$
Note that the “≡” symbol can be read “is equivalent by definition.” Like π, the special number e arises in a remarkable number of contexts. Many fields use a particular base in addition to e for manipulations and omit the base in their notation, i.e., log x. For example, astronomers often use base 10 and theoretical computer scientists often use base 2. Because computer graphics borrows technology from many fields, we will avoid this shorthand.
The derivatives of logarithms and exponents illuminate why the natural logarithm is “natural”:
$$\frac{d}{dx}\log_a x = \frac{1}{x \ln a}, \qquad \frac{d}{dx}a^x = a^x \ln a.$$
The constant multipliers above are unity only for a = e.
A quadratic equation has the form
$$Ax^2 + Bx + C = 0,$$
where x is a real unknown, and A, B, and C are known constants. If you think of a 2D xy plot with y = Ax² + Bx + C, the solution is just whatever x values are “zero crossings” in y. Because y = Ax² + Bx + C is a parabola, there will be zero, one, or two real solutions depending on whether the parabola misses, grazes, or hits the x-axis (Figure 2.5).
To solve the quadratic equation analytically, we first divide by A:
$$x^2 + \frac{B}{A}x + \frac{C}{A} = 0.$$
Then, we “complete the square” to group terms:
$$\left(x + \frac{B}{2A}\right)^2 - \frac{B^2}{4A^2} + \frac{C}{A} = 0.$$
Moving the constant portion to the right-hand side and taking the square root give
$$x + \frac{B}{2A} = \pm\sqrt{\frac{B^2}{4A^2} - \frac{C}{A}}.$$
Subtracting B/(2A) from both sides and grouping terms with the denominator 2A gives the familiar form:
$$x = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}.$$
Here, the “±” symbol means there are two solutions, one with a plus sign and one with a minus sign. Thus, 3 ± 1 equals “two or four.” Note that the term that determines the number of real solutions is
$$D \equiv B^2 - 4AC,$$
which is called the discriminant of the quadratic equation. If D > 0, there are two real solutions (also called roots). If D = 0, there is one real solution (a “double” root). If D < 0, there are no real solutions.
For example, the roots of 2x² + 6x + 4 = 0 are x = –1 and x = –2, and the equation x² + x + 1 = 0 has no real solutions. The discriminants of these equations are D = 4 and D = –3, respectively, so we expect the number of solutions given. In programs, it is usually a good idea to evaluate D first and return “no roots” without taking the square root if D is negative.
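A direct transcription of this advice into Python might look like the sketch below. The function name `solve_quadratic` and the convention of returning a sorted list of roots are our choices, and the code assumes A is nonzero. It uses the textbook formula as is, which can lose precision when B² is much larger than |4AC|.

```python
import math

def solve_quadratic(A: float, B: float, C: float) -> list:
    """Return the real roots of A x^2 + B x + C = 0 (A != 0) in ascending order."""
    D = B * B - 4.0 * A * C  # discriminant
    if D < 0.0:
        return []  # no real roots; the square root is never taken
    if D == 0.0:
        return [-B / (2.0 * A)]  # one "double" root
    s = math.sqrt(D)
    return sorted([(-B - s) / (2.0 * A), (-B + s) / (2.0 * A)])
```

Calling `solve_quadratic(2, 6, 4)` gives the two roots –2 and –1 from the example, while `solve_quadratic(1, 1, 1)` returns an empty list.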
Figure 2.5. The geometric interpretation of the roots of a quadratic equation is the intersection points of a parabola with the x-axis.
In graphics, we use basic trigonometry in many contexts. Usually, it is nothing too fancy, and it often helps to remember the basic definitions.
Although we take angles somewhat for granted, we should return to their definition so we can extend the idea of the angle onto the sphere. An angle is formed between two half-lines (infinite rays stemming from an origin) or directions, and some convention must be used to decide between the two possibilities for the angle created between them as shown in Figure 2.6. An angle is defined by the length of the arc segment it cuts out on the unit circle. A common convention is that the smaller arc length is used, and the sign of the angle is determined by the order in which the two half-lines are specified. Using that convention, all angles are in the range [–π, π].
Figure 2.6. Two half-lines cut the unit circle into two arcs. The length of either arc is a valid angle “between” the two half-lines. Either we can use the convention that the smaller length is the angle, or that the two half-lines are specified in a certain order and the arc that determines angle ϕ is the one swept out counterclockwise from the first to the second half-line.
Each of these angles is the length of the arc of the unit circle that is “cut” by the two directions. Because the perimeter of the unit circle is 2π, the two possible angles sum to 2π. The unit of these arc lengths is radians. Another common unit is degrees, where the perimeter of the circle is 360 degrees. Thus, an angle that is π radians is 180 degrees, usually denoted 180°. The conversion between degrees and radians is
$$\text{degrees} = \frac{180}{\pi}\,\text{radians}, \qquad \text{radians} = \frac{\pi}{180}\,\text{degrees}.$$
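In code, the conversion is a single multiplication in each direction. The two function names below are ours; Python's standard `math` module also provides `math.radians` and `math.degrees`, which do the same job.

```python
import math

def degrees_to_radians(deg: float) -> float:
    """Convert an angle in degrees to radians."""
    return deg * math.pi / 180.0

def radians_to_degrees(rad: float) -> float:
    """Convert an angle in radians to degrees."""
    return rad * 180.0 / math.pi
```

Since most math libraries expect radians, a common source of bugs is passing a value in degrees straight to a trigonometric function.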
Figure 2.7. A geometric demonstration of the Pythagorean theorem.
Given a right triangle with sides of length a, o, and h, where h is the length of the longest side (the hypotenuse, which is always opposite the right angle), an important relation is described by the Pythagorean theorem:
$$a^2 + o^2 = h^2.$$
You can see that this is true from Figure 2.7, where the big square has area (a + o)², the four triangles have the combined area 2ao, and the center square has area h².
Because the triangles and inner square cover the larger square exactly, with no overlap, we have 2ao + h² = (a + o)², which is easily manipulated to the form above.
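We define the sine and cosine of ϕ, as well as the other ratio-based trigonometric expressions, where o is the length of the side opposite ϕ and a is the length of the side adjacent to it:
$$\sin\phi \equiv o/h, \quad \csc\phi \equiv h/o, \quad \cos\phi \equiv a/h, \quad \sec\phi \equiv h/a, \quad \tan\phi \equiv o/a, \quad \cot\phi \equiv a/o.$$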
These definitions allow us to set up polar coordinates, where a point is coded as a distance from the origin and a signed angle relative to the positive x-axis (Figure 2.8). Note the convention that angles are in the range ϕ ∈ (–π, π], and that the positive angles are counterclockwise from the positive x-axis. This convention that counterclockwise maps to positive numbers is arbitrary, but it is used in many contexts in graphics so it is worth committing to memory.
这些定义使我们能够建立极坐标,其中点被编码为与原点的距离和相对于正x轴的带符号角度(图 2.8 )。请注意惯例:角度在 ϕ ∈ (-π, π ] 范围内,并且正角度从正x轴逆时针旋转。逆时针映射到正数的惯例是任意的,但它在图形学的许多情况下都有使用,因此值得记住。
Figure 2.8. Polar coordinates for the point $(x_a, y_a) = (1, \sqrt{3})$ are $(r_a, \phi_a) = (2, \pi/3)$.
图 2.8。点 $(x_a, y_a) = (1, \sqrt{3})$ 的极坐标为 $(r_a, \phi_a) = (2, \pi/3)$。
Trigonometric functions are periodic and can take any angle as an argument. For example, sin(A) = sin(A +2π) . This means the functions are not invertible when considered with the domain R. This problem is avoided by restricting the range of standard inverse functions, and this is done in a standard way in almost all modern math libraries (e.g., Plauger (1991)). The domains and ranges are
三角函数是周期性的,可以以任意角度作为参数。例如,sin( A ) = sin( A +2 π )。这意味着当用域 R 考虑时,函数不可逆。通过限制标准反函数的范围可以避免这个问题,几乎所有现代数学库(例如 Plauger (1991))都以标准方式做到这一点。域和范围是
$$\begin{aligned}
\operatorname{asin} &: [-1, 1] \mapsto [-\pi/2, \pi/2]; \\
\operatorname{acos} &: [-1, 1] \mapsto [0, \pi]; \\
\operatorname{atan} &: \mathbb{R} \mapsto [-\pi/2, \pi/2]; \\
\operatorname{atan2} &: \mathbb{R}^2 \mapsto [-\pi, \pi].
\end{aligned}$$
The last function, atan2(s, c), is often very useful. It takes an s value proportional to sin A and a c value that scales cos A by the same factor, and returns A. The factor is assumed to be positive. One way to think of this is that it returns the polar-coordinate angle of the 2D Cartesian point (x, y) = (c, s) (Figure 2.9).
最后一个函数 atan2( s, c ) 通常非常有用。它采用与 sin A成比例的s值和将 cos A按相同因子缩放的c值并返回A 。假定因子为正。一种思考方式是,它返回极坐标中二维笛卡尔点 ( s, c ) 的角度(图 2.9 )。
Figure 2.9. The function atan2(s,c) returns the angle A and is often very useful in graphics.
图 2.9。函数 atan2 (s,c)返回角度 A,这在图形学中通常非常有用。
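As a small illustration of the polar-coordinate reading of atan2, here is a minimal Python sketch. The helper name `to_polar` is our own and does not come from the text; `math.atan2` takes the sine-like value first, which matches the atan2(s, c) argument order used above.

```python
import math

def to_polar(x, y):
    """Return (r, phi) for the Cartesian point (x, y)."""
    r = math.hypot(x, y)     # distance from the origin
    phi = math.atan2(y, x)   # sine-like argument first, cosine-like second
    return r, phi

# The point (1, sqrt(3)) should map to (r, phi) = (2, pi/3).
r, phi = to_polar(1.0, math.sqrt(3.0))

# Scaling both arguments by the same positive factor leaves the angle unchanged.
same_phi = math.atan2(5.0 * math.sqrt(3.0), 5.0 * 1.0)
```

Because atan2 sees the signs of both arguments, it can place the angle in the correct quadrant, which a plain atan(s/c) cannot do.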
This section lists without derivation a variety of useful trigonometric identities.
本节列出了各种有用的三角恒等式(但不作推导)。
Pythagorean identities:
毕达哥拉斯恒等式:
$$\sin^2 A + \cos^2 A = 1, \qquad \sec^2 A - \tan^2 A = 1, \qquad \csc^2 A - \cot^2 A = 1.$$
Half-angle identities:
半角恒等式:
$$\sin^2(A/2) = \tfrac{1}{2}(1 - \cos A), \qquad \cos^2(A/2) = \tfrac{1}{2}(1 + \cos A).$$
Product identities:
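乘积恒等式:
$$\sin A \sin B = \tfrac{1}{2}\bigl(\cos(A - B) - \cos(A + B)\bigr),$$
$$\sin A \cos B = \tfrac{1}{2}\bigl(\sin(A + B) + \sin(A - B)\bigr),$$
$$\cos A \cos B = \tfrac{1}{2}\bigl(\cos(A + B) + \cos(A - B)\bigr).$$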
The following identities are for arbitrary triangles with side lengths a, b, and c, each with an angle opposite it given by A, B, C, respectively (Figure 2.10),
以下恒等式适用于边长为a、b和c的任意三角形,每个三角形的对角分别由A、B、C给出(图 2.10 ),
$$\frac{\sin A}{a} = \frac{\sin B}{b} = \frac{\sin C}{c} \quad \text{(law of sines)},$$
$$c^2 = a^2 + b^2 - 2ab\cos C \quad \text{(law of cosines)},$$
$$\frac{a + b}{a - b} = \frac{\tan\left(\frac{A + B}{2}\right)}{\tan\left(\frac{A - B}{2}\right)} \quad \text{(law of tangents)}.$$
The area of a triangle can also be computed in terms of these side lengths:
三角形的面积也可以根据这些边长来计算:
$$\text{triangle area} = \frac{1}{4}\sqrt{(a + b + c)(-a + b + c)(a - b + c)(a + b - c)}.$$
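The side-length relations for arbitrary triangles translate directly into code. The sketch below is only an illustration: the function names `triangle_area` and `angle_opposite` are hypothetical, the area uses the standard side-length (Heron-style) formula, and the angle uses the law of cosines.

```python
import math

def triangle_area(a, b, c):
    """Area of a triangle from its three side lengths."""
    s = (a + b + c) * (-a + b + c) * (a - b + c) * (a + b - c)
    if s < 0.0:
        raise ValueError("side lengths do not form a triangle")
    return 0.25 * math.sqrt(s)

def angle_opposite(c, a, b):
    """Angle C (radians) opposite side c, from the law of cosines."""
    cos_c = (a * a + b * b - c * c) / (2.0 * a * b)
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.acos(max(-1.0, min(1.0, cos_c)))

area = triangle_area(3.0, 4.0, 5.0)         # a 3-4-5 right triangle
right_angle = angle_opposite(5.0, 3.0, 4.0)  # the angle opposite the hypotenuse
```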
Traditional trigonometry in this section deals with triangles on the plane. Triangles can be defined on non-planar surfaces as well, and one case that arises in many fields (astronomy, for example) is triangles on the unit-radius sphere. These spherical triangles have sides that are segments of the great circles (unit-radius circles) on the sphere. The study of these triangles is a field called spherical trigonometry. It is not commonly used in graphics, but when it does arise, it can be critical. We won't discuss the details here, but we want the reader to be aware that this area exists for when those problems do arise, and that it offers many useful rules, such as a spherical law of cosines and a spherical law of sines. For an example of the machinery of spherical trigonometry in use, see the paper on sampling triangle lights (which project to a spherical triangle) (Arvo, 1995b).
本节中的传统三角学涉及平面上的三角形。三角形也可以在非平面表面上定义,在许多领域(例如天文学)中出现的三角形是单位半径球面上的三角形。这些球面三角形的边是球面上的大圆(单位半径圆)。这些三角形的研究领域称为球面三角学,在图形学中并不常用,但有时,当它出现时,它至关重要。我们不会在这里讨论它的细节,但希望读者知道当这些问题确实出现时,存在面积,并且有很多有用的规则,例如球面余弦定律和球面正弦定律。有关正在使用的球面三角学机制的示例,请参阅关于采样三角形光(投射到球面三角形)的论文(Arvo,1995b)。
Of more central importance to computer graphics are solid angles. While angles allow us to quantify things like “what is the separation of those two poles in my visual field,” solid angles let us quantify things like “how much of my visual field does that airplane cover.” For traditional angles, we project the poles onto the unit circle and measure the arc length between them on the unit circle. We work with angles often enough that many of us can forget this definition because it is all so intuitive to us now. Solid angles are just as simple, but they may seem more confusing because most of us learn about them as adults. For solid angles, we take the directions that “see” the airplane, project them onto the unit sphere, and measure the resulting area. This area is the solid angle in the same way the arc length is the angle. While angles are measured in radians and sum to 2π (the total length of a unit circle), solid angles are measured in steradians and sum to 4π (the total area of a unit sphere).
对计算机图形学来说,立体角更为重要。虽然角度使我们能够量化诸如“我的视野中两极之间的距离是多少”之类的事物,但立体角使我们能够量化诸如“那架飞机覆盖了我的视野的多少”之类的事物。对于传统角度,我们将柱子投影到单位圆上,并在单位圆上测量它们之间的弧长。我们经常使用角度,以至于我们中的许多人可能会忘记这个定义,因为它现在对我们来说非常直观。立体角同样简单,但它们可能看起来更令人困惑,因为我们大多数人都是成年后才了解它们的。对于立体角,我们将“看到”飞机的可见方向投影到单位球面上并测量面积。这个面积就是立体角,就像弧长就是角度一样。虽然角度以弧度为单位测量,总和为 2 π (单位圆的总长度),但立体角以球面度,总和为 4 π (单位球体的总面积)。
Figure 2.11. These two vectors are the same because they have the same length and direction.
图 2.11。这两个向量是相同的,因为它们具有相同的长度和方向。
A vector describes a length and a direction. It can be usefully represented by an arrow. Two vectors are equal if they have the same length and direction even if we think of them as being located in different places (Figure 2.11). As much as possible, you should think of a vector as an arrow and not as coordinates or numbers. At some point, we will have to represent vectors as numbers in our programs, but even in code, they should be manipulated as objects and only the low-level vector operations should know about their numeric representation (DeRose, 1989). Vectors will be represented as bold characters, e.g., a. A vector’s length is denoted ||a||. A unit vector is any vector whose length is one. The zero vector is the vector of zero length. The direction of the zero vector is undefined.
向量描述长度和方向。它可以用箭头表示。如果两个向量具有相同的长度和方向,即使我们认为它们位于不同位置,它们也是相等的(图 2.11 )。尽可能将向量视为箭头,而不是坐标或数字。在某些时候,我们必须在程序中将向量表示为数字,但即使在代码中,也应该将它们作为对象进行操作,并且只有低级向量操作才应该知道它们的数字表示(DeRose,1989)。向量将表示为粗体字符,例如a 。向量的长度表示为 ||a||。单位向量是长度为 1 的任意向量。零向量是长度为零的向量。零向量的方向未定义。
Vectors can be used to represent many different things. For example, they can be used to store an offset, also called a displacement. If we know “the treasure is buried two paces east and three paces north of the secret meeting place,” then we know the offset, but we don’t know where to start. Vectors can also be used to store a location, another word for position or point. Locations can be represented as a displacement from another location. Usually, there is some understood origin location from which all other locations are stored as offsets. Note that locations are not vectors. As we shall discuss, you can add two vectors. However, it usually does not make sense to add two locations unless it is an intermediate operation when computing weighted averages of a location (Goldman, 1985). Adding two offsets does make sense, so that is one reason why offsets are vectors. But this emphasizes that a location is not an offset; it is an offset from a specific origin location. The offset by itself is not the location.
向量可用于表示许多不同的东西。例如,它们可用于存储偏移量,也称为位移。如果我们知道“宝藏埋在秘密会面地点以东两步、以北三步处”,那么我们知道偏移量,但不知道从哪里开始。向量还可用于存储位置,即位置或点。位置可以表示为与另一个位置的位移。通常,存在某个已知的原点位置,所有其他位置都从该原点位置存储为偏移量。请注意,位置不是向量。正如我们将要讨论的,您可以将两个向量相加。但是,除非在计算位置的加权平均值时需要进行中间运算,否则通常没有必要将两个位置相加(Goldman,1985)。将两个偏移量相加确实有意义,所以这也是偏移量是向量的原因之一。但这强调了位置不是偏移量;它是与特定原点位置的偏移量。偏移量本身不是位置。
Figure 2.12. Two vectors are added by arranging them head to tail. This can be done in either order.
图 2.12.两个向量以首尾相接的方式相加。可以按任意顺序进行。
Figure 2.13. The vector –a has the same length but opposite direction of the vector a.
图 2.13。向量 -a 与向量 a 长度相同,但方向相反。
Vectors have most of the usual arithmetic operations that we associate with real numbers. Two vectors are equal if and only if they have the same length and direction. Two vectors are added according to the parallelogram rule. This rule states that the sum of two vectors is found by placing the tail of either vector against the head of the other (Figure 2.12). The sum vector is the vector that “completes the triangle” started by the two vectors. The parallelogram is formed by taking the sum in either order. This emphasizes that vector addition is commutative:
向量具有与实数相关的大多数常见算术运算。当且仅当两个向量具有相同的长度和方向,它们才相等。两个向量相加符合平行四边形规则。该规则指出,两个向量的和是通过将任一向量的尾部靠在另一个向量的头部上来获得的(图 2.12 )。和向量是“完成由两个向量组成的三角形”的向量。以任意顺序求和可形成平行四边形。这强调了向量加法是交换的:
$$\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}.$$
Note that the parallelogram rule just formalizes our intuition about displacements. Think of walking along one vector, tail to head, and then walking along the other. The net displacement is just the parallelogram diagonal. You can also create a unary minus for a vector: –a (Figure 2.13) is a vector with the same length as a but opposite direction. This allows us to also define subtraction:
请注意,平行四边形规则只是形式化了我们对位移的直觉。想象一下沿着一个向量行走,从尾部到头部,然后沿着另一个向量行走。净位移就是平行四边形的对角线。你也可以创建一个向量的一元减法: – a (图 2.13 )是与a长度相同但方向相反的向量。这使我们能够定义减法:
$$\mathbf{b} - \mathbf{a} \equiv -\mathbf{a} + \mathbf{b}.$$
You can visualize vector subtraction with a parallelogram (Figure 2.14). We can write
可以用平行四边形来直观地展示向量减法(图 2.14 )。我们可以这样写
$$\mathbf{a} + (\mathbf{b} - \mathbf{a}) = \mathbf{b}.$$
Figure 2.14. Vector subtraction is just vector addition with a reversal of the second argument.
图 2.14向量减法只是向量加法加上第二个参数的反转。
Vectors can also be multiplied. In fact, there are several kinds of products involving vectors. First, we can scale the vector by multiplying it by a real number k.
向量也可以相乘。事实上,向量的乘法有好几种。首先,我们可以通过将向量乘以实数k来缩放该向量。
This just multiplies the vector’s length without changing its direction. For example, 3.5a is a vector in the same direction as a, but it is 3.5 times as long as a. We discuss two products involving two vectors, the dot product and the cross product, later in this section, and a product involving three vectors, the determinant, in Chapter 6.
这只是将向量的长度相乘,而不改变其方向。例如,3.5 a是一个与a 方向相同的向量,但它的长度是a的 3.5 倍。我们将在本节后面讨论涉及两个向量的两种乘积,即点积和叉积,并在第 6 章讨论涉及三个向量的乘积,即行列式。
A 2D vector can be written as a combination of any two nonzero vectors which are not parallel. This property of the two vectors is called linear independence. Two linearly independent vectors form a 2D basis, and the vectors are thus referred to as basis vectors. For example, a vector c may be expressed as a combination of two basis vectors a and b (Figure 2.15):
二维向量可以写成任意两个不平行的非零向量的组合。这两个向量的这种性质称为线性独立。两个线性独立的向量构成一个二维基,因此这些向量被称为基向量。例如,向量c可以表示为两个基向量a和b的组合(图 2.15 ):
$$\mathbf{c} = a_c\mathbf{a} + b_c\mathbf{b}. \tag{2.3}$$
Figure 2.15. Any 2D vector c is a weighted sum of any two nonparallel 2D vectors a and b.
图 2.15任何二维向量c都是任意两个非平行二维向量a和b 的加权和。
Note that the weights $a_c$ and $b_c$ are unique. Bases are especially useful if the two vectors are orthogonal, i.e., they are at right angles to each other. It is even more useful if they are also unit vectors, in which case they are orthonormal. If we assume two such “special” vectors x and y are known to us, then we can use them to represent all other vectors in a Cartesian coordinate system, where each vector is represented as two real numbers. For example, a vector a might be represented as
注意,权重 $a_c$ 和 $b_c$ 是唯一的。如果两个向量正交(即它们彼此成直角),基就特别有用。如果它们同时也是单位向量,则更有用,在这种情况下它们是标准正交的。如果我们假设我们知道两个这样的“特殊”向量 x 和 y,那么我们可以用它们来表示笛卡尔坐标系中的所有其他向量,其中每个向量都表示为两个实数。例如,向量 a 可以表示为
$$\mathbf{a} = x_a\mathbf{x} + y_a\mathbf{y},$$
where $x_a$ and $y_a$ are the real Cartesian coordinates of the 2D vector a (Figure 2.16). Note that this is not really any different conceptually from Equation (2.3), where the basis vectors were not orthonormal. But there are several advantages to a Cartesian coordinate system. For instance, by the Pythagorean theorem, the length of a is
其中x a和y a是二维向量a的实数笛卡尔坐标(图 2.16 )。请注意,这与公式 (2.3) 的概念并没有什么不同,因为公式中的基向量不是正交的。但笛卡尔坐标系有几个优点。例如,根据勾股定理, a的长度为
$$\|\mathbf{a}\| = \sqrt{x_a^2 + y_a^2}.$$
Figure 2.16. A 2D Cartesian basis for vectors.
图 2.16.向量的二维笛卡尔基。
It is also simple to compute dot products, cross products, and coordinates of vectors in Cartesian systems, as we’ll see in the following sections.
在笛卡尔系统中计算点积、叉积和向量坐标也很简单,正如我们将在以下章节中看到的。
By convention, we write the coordinates of a either as an ordered pair $(x_a, y_a)$ or a column matrix:
按照惯例,我们将a的坐标写为有序对 ( x a , y a ) 或列矩阵:
$$\mathbf{a} = \begin{bmatrix} x_a \\ y_a \end{bmatrix}.$$
The form we use will depend on typographic convenience. We will also occasionally write the vector as a row matrix, which we will indicate as $\mathbf{a}^T$:
我们使用的形式取决于印刷方便性。我们有时也会将向量写成行矩阵,我们将其表示为 $\mathbf{a}^T$:
$$\mathbf{a}^T = \begin{bmatrix} x_a & y_a \end{bmatrix}.$$
We can also represent 3D, 4D, etc., vectors in Cartesian coordinates. For the 3D case, we use a basis vector z that is orthogonal to both x and y.
我们还可以在笛卡尔坐标系中表示 3D、4D 等向量。对于 3D 情况,我们使用与x和y正交的基向量z 。
The simplest way to multiply two vectors is the dot product. The dot product of a and b is denoted a · b and is often called the scalar product because it returns a scalar. The dot product returns a value related to its arguments’ lengths and the angle ϕ between them (Figure 2.17):
两个向量相乘最简单的方法是点积。a和b的点积表示为a · b ,通常称为标量积,因为它返回标量。点积返回与其参数长度以及它们之间的角度 ϕ 相关的值(图 2.17 ):
$$\mathbf{a}\cdot\mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\phi.$$
The most common use of the dot product in graphics programs is to compute the cosine of the angle between two vectors.
图形程序中点积的最常见用途是计算两个向量之间角度的余弦。
The dot product can also be used to find the projection of one vector onto another. This is the length a→b of a vector a that is projected at right angles onto a vector b (Figure 2.18):
点积也可用于求一个向量在另一个向量上的投影。这是向量 a 以直角投影到向量 b上的长度a → b (图 2.18 ):
$$a{\rightarrow}b = \|\mathbf{a}\|\cos\phi = \frac{\mathbf{a}\cdot\mathbf{b}}{\|\mathbf{b}\|}. \tag{2.5}$$
Figure 2.17. The dot product is related to length and angle and is one of the most important formulas in graphics.
图 2.17。点积与长度和角度有关,是图形学中最重要的公式之一。
The dot product obeys the familiar associative and distributive properties we have in real arithmetic:
点积遵循实数算术中我们熟悉的结合律和分配律:
$$\mathbf{a}\cdot\mathbf{b} = \mathbf{b}\cdot\mathbf{a}, \qquad \mathbf{a}\cdot(\mathbf{b} + \mathbf{c}) = \mathbf{a}\cdot\mathbf{b} + \mathbf{a}\cdot\mathbf{c}, \qquad (k\mathbf{a})\cdot\mathbf{b} = \mathbf{a}\cdot(k\mathbf{b}) = k\,\mathbf{a}\cdot\mathbf{b}.$$
Figure 2.18. The projection of a onto b is a length found by Equation (2.5).
图 2.18. a到b的投影是通过公式 (2.5) 找到的长度。
If 2D vectors a and b are expressed in Cartesian coordinates, we can take advantage of x · x = y · y = 1 and x · y = 0 to derive that their dot product is
如果二维向量 a 和 b 以笛卡尔坐标表示,我们可以利用x · x = y · y = 1 和x · y = 0 来推导出它们的点积为
$$\begin{aligned}
\mathbf{a}\cdot\mathbf{b} &= (x_a\mathbf{x} + y_a\mathbf{y})\cdot(x_b\mathbf{x} + y_b\mathbf{y}) \\
&= x_a x_b(\mathbf{x}\cdot\mathbf{x}) + x_a y_b(\mathbf{x}\cdot\mathbf{y}) + x_b y_a(\mathbf{y}\cdot\mathbf{x}) + y_a y_b(\mathbf{y}\cdot\mathbf{y}) \\
&= x_a x_b + y_a y_b.
\end{aligned}$$
Similarly, in 3D we can find
同样在 3D 中我们可以找到
$$\mathbf{a}\cdot\mathbf{b} = x_a x_b + y_a y_b + z_a z_b.$$
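The coordinate form of the dot product makes the two uses mentioned above (the cosine of the angle between two vectors, and the projected length) one-liners. The following Python sketch uses plain tuples; the function names are our own choices for illustration.

```python
import math

def dot(a, b):
    """Dot product of two vectors given as equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))

def length(a):
    return math.sqrt(dot(a, a))

def cos_angle(a, b):
    """Cosine of the angle between a and b (both assumed nonzero)."""
    return dot(a, b) / (length(a) * length(b))

def projected_length(a, b):
    """Signed length of a projected at right angles onto b (b nonzero)."""
    return dot(a, b) / length(b)

c = cos_angle((1.0, 0.0, 0.0), (1.0, 1.0, 0.0))         # vectors 45 degrees apart
p = projected_length((3.0, 4.0, 0.0), (2.0, 0.0, 0.0))  # only the x part survives
```

Note that the projected length is negative when the angle between the vectors exceeds 90 degrees.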
The cross product a × b is usually used only for three-dimensional vectors; generalized cross products are discussed in references given in the chapter notes. The cross product returns a 3D vector that is perpendicular to the two arguments of the cross product. The length of the resulting vector is related to sin ϕ:
叉积a × b通常仅用于三维向量;广义叉积在章节注释中给出的参考文献中讨论。叉积返回一个垂直于叉积的两个参数的三维向量。结果向量的长度与 sin ϕ 有关:
$$\|\mathbf{a}\times\mathbf{b}\| = \|\mathbf{a}\|\,\|\mathbf{b}\|\sin\phi.$$
The magnitude ||a × b|| is equal to the area of the parallelogram formed by vectors a and b. In addition, a × b is perpendicular to both a and b (Figure 2.19). Note that there are only two possible directions for such a vector. By definition, the vectors in the direction of the x-, y- and z-axes are given by
幅值 || a × b || 等于由向量a和b构成的平行四边形的面积。此外, a × b与a和b都垂直(图 2.19 )。请注意,此类向量只有两个可能的方向。根据定义, x 轴、 y 轴和z轴方向上的向量由下式给出
$$\mathbf{x} = (1, 0, 0), \qquad \mathbf{y} = (0, 1, 0), \qquad \mathbf{z} = (0, 0, 1),$$
and we set as a convention that x × y must be in the plus or minus z direction. The choice is somewhat arbitrary, but it is standard to assume that
并且我们约定x × y必须位于正或负z方向。选择有点随意,但标准做法是假设
$$\mathbf{z} = \mathbf{x}\times\mathbf{y}.$$
Figure 2.19. The cross product a × b is a 3D vector perpendicular to both 3D vectors a and b, and its length is equal to the area of the parallelogram shown.
图 2.19。叉积a × b是与三维向量 a 和 b 都垂直的三维向量,其长度等于所示平行四边形的面积。
All possible permutations of the three Cartesian unit vectors are
三个笛卡尔单位向量的所有可能排列是
$$\begin{aligned}
\mathbf{x}\times\mathbf{y} &= +\mathbf{z}, & \mathbf{y}\times\mathbf{x} &= -\mathbf{z}, \\
\mathbf{y}\times\mathbf{z} &= +\mathbf{x}, & \mathbf{z}\times\mathbf{y} &= -\mathbf{x}, \\
\mathbf{z}\times\mathbf{x} &= +\mathbf{y}, & \mathbf{x}\times\mathbf{z} &= -\mathbf{y}.
\end{aligned}$$
Because of the sin ϕ property, we also know that the cross product of a vector with itself is the zero vector, so x × x = 0 and so on. Note that the cross product is not commutative, i.e., x × y ≠ y × x. The careful observer will note that the above discussion does not allow us to draw an unambiguous picture of how the Cartesian axes relate. More specifically, if we put x and y on a sidewalk, with x pointing east and y pointing north, then does z point up to the sky or into the ground? The usual convention is to have z point to the sky. This is known as a right-handed coordinate system. This name comes from the memory scheme of “grabbing” x with your right palm and fingers and rotating it toward y. The vector z should align with your thumb. This is illustrated in Figure 2.20.
由于 sin ϕ 性质,我们还知道向量与自身的叉积是零向量,因此 x × x = 0,依此类推。请注意,叉积不满足交换律,即 x × y ≠ y × x。细心的观察者会注意到,上述讨论并未让我们明确地描绘出笛卡尔坐标轴之间的关系。更具体地说,如果我们将 x 和 y 放在人行道上,x 指向东,y 指向北,那么 z 指向天空还是地面?通常的惯例是让 z 指向天空。这被称为右手坐标系。这个名称来自于用右手掌和手指“抓住” x 并将其向 y 旋转的记忆方案。向量 z 应该与拇指对齐。如图 2.20 所示。
Figure 2.20. The “righthand rule” for cross products. Imagine placing the base of your right palm where a and b join at their tails, and pushing the arrow of a toward b. Your extended right thumb should point toward a × b.
图 2.20。叉积的“右手定则”。想象一下,将右手掌根部放在a和b尾部连接处,并将a的箭头推向b 。伸出的右手拇指应指向a × b 。
The cross product has the nice property that
叉积具有良好的性质
$$\mathbf{a}\times(\mathbf{b} + \mathbf{c}) = \mathbf{a}\times\mathbf{b} + \mathbf{a}\times\mathbf{c},$$
and
和
$$\mathbf{a}\times(k\mathbf{b}) = k(\mathbf{a}\times\mathbf{b}).$$
However, a consequence of the right-hand rule is
然而,右手定则的结果是
$$\mathbf{a}\times\mathbf{b} = -(\mathbf{b}\times\mathbf{a}).$$
In Cartesian coordinates, we can use an explicit expansion to compute the cross product:
在笛卡尔坐标中,我们可以使用显式展开来计算叉积:
$$\begin{aligned}
\mathbf{a}\times\mathbf{b} &= (x_a\mathbf{x} + y_a\mathbf{y} + z_a\mathbf{z})\times(x_b\mathbf{x} + y_b\mathbf{y} + z_b\mathbf{z}) \\
&= x_a x_b\,\mathbf{x}\times\mathbf{x} + x_a y_b\,\mathbf{x}\times\mathbf{y} + x_a z_b\,\mathbf{x}\times\mathbf{z} \\
&\quad + y_a x_b\,\mathbf{y}\times\mathbf{x} + y_a y_b\,\mathbf{y}\times\mathbf{y} + y_a z_b\,\mathbf{y}\times\mathbf{z} \\
&\quad + z_a x_b\,\mathbf{z}\times\mathbf{x} + z_a y_b\,\mathbf{z}\times\mathbf{y} + z_a z_b\,\mathbf{z}\times\mathbf{z} \\
&= (y_a z_b - z_a y_b)\,\mathbf{x} + (z_a x_b - x_a z_b)\,\mathbf{y} + (x_a y_b - y_a x_b)\,\mathbf{z}.
\end{aligned}$$
So, in coordinate form,
因此,以坐标形式,
$$\mathbf{a}\times\mathbf{b} = (y_a z_b - z_a y_b,\; z_a x_b - x_a z_b,\; x_a y_b - y_a x_b).$$
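The coordinate form of the cross product is short enough to write out directly. This is a minimal Python sketch with vectors as 3-tuples; the function name `cross` is our own.

```python
def cross(a, b):
    """Cross product of two 3D vectors given as (x, y, z) tuples."""
    xa, ya, za = a
    xb, yb, zb = b
    return (ya * zb - za * yb,
            za * xb - xa * zb,
            xa * yb - ya * xb)

x_axis = (1, 0, 0)
y_axis = (0, 1, 0)

z_axis = cross(x_axis, y_axis)        # right-handed convention: x × y = z
flipped = cross(y_axis, x_axis)       # swapping the arguments negates the result
area = cross((2, 0, 0), (0, 3, 0))    # length 6 = area of the 2-by-3 parallelogram
```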
Managing coordinate systems is one of the core tasks of almost any graphics program; the key to this is managing orthonormal bases. Any set of two 2D vectors u and v form an orthonormal basis provided that they are orthogonal (at right angles) and are each of unit length. Thus,
管理坐标系是几乎所有图形程序的核心任务之一;其中的关键是管理正交基。任何两个二维向量u和v的集合构成一个正交基,前提是它们正交(成直角)且每个向量的长度为单位。因此,
$$\|\mathbf{u}\| = \|\mathbf{v}\| = 1,$$
and
和
$$\mathbf{u}\cdot\mathbf{v} = 0.$$
In 3D, three vectors u, v, and w form an orthonormal basis if
在三维空间中,三个向量u 、 v和w构成一个正交基,若
$$\|\mathbf{u}\| = \|\mathbf{v}\| = \|\mathbf{w}\| = 1,$$
and
和
$$\mathbf{u}\cdot\mathbf{v} = \mathbf{v}\cdot\mathbf{w} = \mathbf{w}\cdot\mathbf{u} = 0.$$
This orthonormal basis is right-handed provided
该正交基为右手系,条件是
$$\mathbf{w} = \mathbf{u}\times\mathbf{v},$$
and otherwise, it is left-handed.
否则,它就是左手系。
Note that the Cartesian canonical orthonormal basis is just one of infinitely many possible orthonormal bases. What makes it special is that it and its implicit origin location are used for low-level representation within a program. Thus, the vectors x, y, and z are never explicitly stored and neither is the canonical origin location o. The global model is typically stored in this canonical coordinate system, and it is thus often called the global coordinate system. However, if we want to use another coordinate system with origin p and orthonormal basis vectors u, v, and w, then we do store those vectors explicitly. Such a system is called a frame of reference or coordinate frame. For example, in a flight simulator, we might want to maintain a coordinate system with the origin at the nose of the plane, and the orthonormal basis aligned with the airplane. Simultaneously, we would have the master canonical coordinate system (Figure 2.21). The coordinate system associated with a particular object, such as the plane, is usually called a local coordinate system.
请注意,笛卡尔正则标准正交基只是无限多个可能的正交基之一。它的特殊之处在于,它及其隐式原点位置用于程序中的低级表示。因此,向量x 、 y和z从未明确存储,正则原点位置o也是如此。全局模型通常存储在这个正则坐标系中,因此它通常被称为全局坐标系。但是,如果我们想使用另一个以原点p和正交基向量u 、 v和w为原点的坐标系,那么我们确实会明确存储这些向量。这样的系统称为参考系或坐标系。例如,在飞行模拟器中,我们可能希望维护一个坐标系,其原点位于飞机机头,正交基与飞机对齐。同时,我们将拥有主标准坐标系(图 2.21 )。与特定对象(例如飞机)相关联的坐标系通常称为局部坐标系。
At a low level, the local frame is stored in canonical coordinates. For example, if u has coordinates $(x_u, y_u, z_u)$,
在低层次上,局部框架存储在标准坐标中。例如,如果u的坐标为 ( x u , y u , z u ),
$$\mathbf{u} = x_u\mathbf{x} + y_u\mathbf{y} + z_u\mathbf{z}.$$
A location implicitly includes an offset from the canonical origin:
位置隐式包含与规范原点的偏移量:
$$\mathbf{p} = \mathbf{o} + x_p\mathbf{x} + y_p\mathbf{y} + z_p\mathbf{z},$$
where $(x_p, y_p, z_p)$ are the coordinates of p.
其中( xp , yp , zp )是p的坐标。
Note that if we store a vector a with respect to the u-v-w frame, we store a triple $(u_a, v_a, w_a)$, which we can interpret geometrically as
请注意,如果我们存储相对于u - v - w框架的向量a ,我们存储一个三元组 ( u a ,v a ,w a ),我们可以将其几何解释为
$$\mathbf{a} = u_a\mathbf{u} + v_a\mathbf{v} + w_a\mathbf{w}.$$
To get the canonical coordinates of a vector a stored in the u-v-w coordinate system, simply recall that u, v, and w are themselves stored in terms of Cartesian coordinates, so the expression $u_a\mathbf{u} + v_a\mathbf{v} + w_a\mathbf{w}$ is already in Cartesian coordinates if evaluated explicitly. To get the u-v-w coordinates of a vector b stored in the canonical coordinate system, we can use dot products:
要获取存储在u - v - w坐标系中的向量a的标准坐标,只需记住u 、 v和w本身是以笛卡尔坐标形式存储的,因此如果明确求值,表达式u a u + v a v + w a w已经是笛卡尔坐标。要获取存储在标准坐标系中的向量b的u - v - w坐标,我们可以使用点积:
$$u_b = \mathbf{u}\cdot\mathbf{b}; \qquad v_b = \mathbf{v}\cdot\mathbf{b}; \qquad w_b = \mathbf{w}\cdot\mathbf{b}.$$
Figure 2.21. There is always a master or “canonical” coordinate system with origin o and orthonormal basis x, y, and z. This coordinate system is usually defined to be aligned to the global model and is thus often called the “global” or “world” coordinate system. This origin and basis vectors are never stored explicitly. All other vectors and locations are stored with coordinates that relate them to the global frame. The coordinate system associated with the plane is explicitly stored in terms of global coordinates.
图 2.21。始终存在一个主坐标系或“规范”坐标系,其原点为o ,正交基为x 、 y和z 。该坐标系通常定义为与全局模型对齐,因此通常称为“全局”或“世界”坐标系。此原点和基向量从未明确存储。所有其他向量和位置都与与全局框架相关的坐标一起存储。与平面关联的坐标系以全局坐标的形式明确存储。
This works because we know that for some $u_b$, $v_b$, and $w_b$,
这是可行的,因为我们知道对于某些u b 、 v b和w b ,
$$u_b\mathbf{u} + v_b\mathbf{v} + w_b\mathbf{w} = \mathbf{b},$$
and the dot product isolates the $u_b$ coordinate:
点积分离出u b坐标:
$$\mathbf{u}\cdot\mathbf{b} = u_b(\mathbf{u}\cdot\mathbf{u}) + v_b(\mathbf{u}\cdot\mathbf{v}) + w_b(\mathbf{u}\cdot\mathbf{w}) = u_b.$$
This works because u, v,and w are orthonormal.
这是可行的,因为u 、 v和w是正交的。
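The two conversions between canonical and u-v-w coordinates can be sketched in a few lines of Python. The function names `to_local` and `to_canonical` are hypothetical, and the sketch assumes that u, v, and w are orthonormal and are themselves given in canonical coordinates.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def to_local(b, u, v, w):
    """u-v-w coordinates of a vector b given in canonical coordinates."""
    return (dot(u, b), dot(v, b), dot(w, b))

def to_canonical(a_local, u, v, w):
    """Canonical coordinates of a vector stored as (u_a, v_a, w_a)."""
    ua, va, wa = a_local
    return tuple(ua * ui + va * vi + wa * wi for ui, vi, wi in zip(u, v, w))

# A right-handed frame rotated 90 degrees about the canonical z-axis.
u = (0, 1, 0)
v = (-1, 0, 0)
w = (0, 0, 1)

b = (2, 3, 4)
b_local = to_local(b, u, v, w)               # (3, -2, 4)
b_again = to_canonical(b_local, u, v, w)     # recovers (2, 3, 4)
```

The round trip only returns the original vector when the basis is orthonormal; for a general basis the dot products would not isolate the coordinates.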
Using matrices to manage changes of coordinate systems is discussed in Sections 7.2.1 and 7.5.
使用矩阵来管理坐标系的变化在第 7.2.1 和7.5节中讨论。
Often we need an orthonormal basis that is aligned with a given vector. That is, given a vector a, we want an orthonormal u, v, and w such that w points in the same direction as a (Hughes & Möller, 1999), but we don’t particularly care what u and v are. One vector isn’t enough to uniquely determine the answer; we just need a robust procedure that will find any one of the possible bases.
我们经常需要与给定向量对齐的正交基。也就是说,给定向量a ,我们需要正交u 、 v和w ,使得w指向与a相同的方向(Hughes & Möller,1999),但我们并不特别关心u和v是什么。一个向量不足以唯一地确定答案;我们只需要一个可以找到任何可能基的强大程序。
This can be done using cross products as follows. First, make w a unit vector in the direction of a:
这可以使用交叉积来实现,如下所示。首先,使w成为a方向上的单位向量:
$$\mathbf{w} = \frac{\mathbf{a}}{\|\mathbf{a}\|}.$$
Then, choose any vector t not collinear with w, and use the cross product to build a unit vector u perpendicular to w:
然后,选择任何与w不共线的向量t ,并使用叉积构建一个垂直于w 的单位向量u :
$$\mathbf{u} = \frac{\mathbf{t}\times\mathbf{w}}{\|\mathbf{t}\times\mathbf{w}\|}.$$
If t is collinear with w, the denominator will vanish, and if they are nearly collinear, the results will have low precision. A simple procedure to find a vector sufficiently different from w is to start with t equal to w and change the smallest magnitude component of t to 1. For example, if $\mathbf{w} = (1/\sqrt{2}, -1/\sqrt{2}, 0)$, then $\mathbf{t} = (1/\sqrt{2}, -1/\sqrt{2}, 1)$. Once w and u are in hand, completing the basis is simple:
如果 t 与 w 共线,分母将消失,如果它们几乎共线,结果的精度将很低。找到与 w 足够不同的向量的一个简单方法是从 t 等于 w 开始,然后将 t 的最小幅度分量更改为 1。例如,如果 $\mathbf{w} = (1/\sqrt{2}, -1/\sqrt{2}, 0)$,则 $\mathbf{t} = (1/\sqrt{2}, -1/\sqrt{2}, 1)$。一旦得到 w 和 u,完成这个基就很简单:
$$\mathbf{v} = \mathbf{w}\times\mathbf{u}.$$
An example of a situation where this construction is used is surface shading, where a basis aligned to the surface normal is needed but the rotation around the normal is often unimportant.
使用这种构造的情况的一个例子是表面着色,其中需要与表面法线对齐的基础,但围绕法线的旋转通常并不重要。
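The single-vector construction can be sketched as follows. This is only an illustration of the cross-product procedure described above, not the production method cited below; the name `basis_from_w` and the tuple representation are our own assumptions.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    if n == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / n for c in a)

def basis_from_w(a):
    """Return an orthonormal, right-handed (u, v, w) with w along a."""
    w = normalize(a)
    # Start with t = w and set its smallest-magnitude component to 1,
    # so that t is safely non-collinear with w.
    t = list(w)
    smallest = min(range(3), key=lambda i: abs(t[i]))
    t[smallest] = 1.0
    u = normalize(cross(t, w))
    v = cross(w, u)
    return u, v, w

u, v, w = basis_from_w((1.0, -1.0, 0.0))
```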
For serious production code, researchers at Pixar have developed a rather remarkable method for constructing an orthonormal basis from a single vector that is impressive in its compactness and efficiency (Duff et al., 2017). They provide battle-tested code, and readers are encouraged to use it, as no “gotchas” have emerged as it has been used throughout the industry.
对于严肃的生产代码,皮克斯的研究人员最近开发了一种相当出色的方法,用于从两个向量构建一个向量,其紧凑性和效率令人印象深刻(Duff 等人,2017 年)。他们提供了经过实战检验的代码,鼓励读者使用它,因为它在整个行业中使用时没有出现“陷阱”。
The procedure in the previous section can also be used in situations where the rotation of the basis around the given vector is important. A common example is building a basis for a camera: it’s important to have one vector aligned in the direction the camera is looking, but the orientation of the camera around that vector is not arbitrary, and it needs to be specified somehow. Once the orientation is pinned down, the basis is completely determined.
上一节中的过程还可用于基线围绕给定向量的旋转很重要的情况。一个常见的例子是为相机构建基线:重要的是让一个向量与相机所看的方向对齐,但相机围绕该向量的方向不是任意的,需要以某种方式指定。一旦确定了方向,基线就完全确定了。
A common way to fully specify a frame is by providing two vectors a (which specifies w) and b (which specifies v). If the two vectors are known to be perpendicular, it is a simple matter to construct the third vector by u = b × a.
完整指定框架的常用方法是提供两个向量a (指定w )和b (指定v )。如果已知这两个向量垂直,则通过u = b × a构造第三个向量是很简单的事情。
To be sure that the resulting basis really is orthonormal, even if the input vectors weren’t quite, a procedure much like the single-vector procedure is advisable:
为了确保最终得到的基础确实是正交的,即使输入向量不完全正交,建议采用类似于单向量过程的过程:
$$\mathbf{w} = \frac{\mathbf{a}}{\|\mathbf{a}\|}, \qquad \mathbf{u} = \frac{\mathbf{b}\times\mathbf{w}}{\|\mathbf{b}\times\mathbf{w}\|}, \qquad \mathbf{v} = \mathbf{w}\times\mathbf{u}.$$
In fact, this procedure works just fine when a and b are not perpendicular. In this case, w will be constructed exactly in the direction of a, and v is chosen to be the closest vector to b among all vectors perpendicular to w.
事实上,当a和b不垂直时,此过程也很好用。在这种情况下, w将精确地沿a的方向构造,并且v被选为所有垂直于w的向量中最接近b的向量。
This procedure won’t work if a and b are collinear. In this case, b is of no help in choosing which of the directions perpendicular to a we should use: it is perpendicular to all of them.
如果a和b共线,则此过程将不起作用。在这种情况下, b对于选择我们应该使用哪个垂直于a 的方向毫无帮助:它垂直于所有方向。
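The two-vector construction differs from the single-vector one only in using b in place of the arbitrary helper vector. Here is a minimal Python sketch; the name `basis_from_w_and_v` is hypothetical, and the collinear case is reported as an error rather than handled.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    if n == 0.0:
        raise ValueError("zero-length vector (are a and b collinear?)")
    return tuple(c / n for c in a)

def basis_from_w_and_v(a, b):
    """Orthonormal (u, v, w): w along a, v as close to b as possible."""
    w = normalize(a)
    u = normalize(cross(b, w))   # fails if b is collinear with a
    v = cross(w, u)
    return u, v, w

# b is deliberately not quite perpendicular to a.
u, v, w = basis_from_w_and_v((0.0, 0.0, 1.0), (0.0, 1.0, 0.2))
```

In this example v comes out as (0, 1, 0): the part of b along w has been discarded, which is the sense in which v is the closest perpendicular vector to b.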
In the example of specifying camera positions (Section 4.3), we want to construct a frame that has w parallel to the direction the camera is looking, and v should point out the top of the camera. To orient the camera upright, we build the basis around the view direction, using the straight-up direction as the reference vector to establish the camera’s orientation around the view direction. Setting v as close as possible to straight up exactly matches the intuitive notion of “holding the camera straight.”
在指定相机位置的示例中(第 4.3 节),我们希望构建一个框架,其中w与相机观察的方向平行,并且v应指向相机的顶部。为了使相机直立,我们围绕视线方向构建基础,使用直立方向作为参考向量来确定相机围绕视线方向的方向。将v设置为尽可能接近直立方向完全符合“保持相机直立”的直观概念。
Occasionally, you may find problems caused in your computations by a basis that is supposed to be orthonormal but where error has crept in—due to rounding error in computation, or to the basis having been stored in a file with low precision, for instance.
有时,您可能会发现计算中出现问题,问题是由本应是正交的基引起的,但其中却出现了错误 - 例如由于计算中的舍入误差,或者由于基存储在精度较低的文件中。
The procedure of the previous section can be used; simply constructing the basis anew using the existing w and v vectors will produce a new basis that is orthonormal and is close to the old one.
可以使用上一节的步骤;只需使用现有的w和v向量重新构建基,就会产生一个正交且接近旧基的新基。
This approach is good for many applications, but it is not the best available. It does produce accurately orthogonal vectors, and for nearly orthogonal starting bases, the result will not stray far from the starting point. However, it is asymmetric: it “favors” w over v and v over u (whose starting value is thrown away). It chooses a basis close to the starting basis but has no guarantee of choosing the closest orthonormal basis. When this is not good enough, the SVD (Section 6.4.1) can be used to compute an orthonormal basis that is guaranteed to be closest to the original basis.
这种方法适用于许多应用,但并非最佳方法。它确实能产生精确正交的向量,并且对于几乎正交的起始基,结果不会偏离起始点太远。但是,它是不对称的:它“偏爱” w而不是v ,偏爱 v而不是u (其起始值被丢弃)。它选择一个接近起始基的基,但不能保证选择最接近的正交基。当这还不够好时,可以使用 SVD(第 6.4.1 节)来计算保证最接近原始基的正交基。
A possibly misleading thing about graphics is that it is full of integrals and thus one might think one has to be good at algebraically solving integrals. This is most definitely not the case. Most of the integrals in graphics are not analytically solvable and are thus solved numerically. It is quite possible to have a great career in graphics and never algebraically solve a single integral.
图形学中可能存在一个误导性的事实,即它充满了积分,因此人们可能会认为必须擅长代数解积分。事实绝非如此。图形学中的大多数积分都不是解析解,因此需要用数值方法求解。在图形学领域拥有伟大事业的人,完全有可能从未用代数方法求解过一个积分。
While you do not need to be able to algebraically solve integrals, you do need to be able to read them so you can numerically solve them. In one dimension, integrals are usually pretty readable. For example, this integral
虽然你不需要能够用代数方法求解积分,但你需要能够读懂它们,这样你才能用数字方法求解它们。在一维空间中,积分通常非常易读。例如,这个积分
$$\int_{\pi}^{2\pi} \sin x \, dx$$
can be read as “compute the area under the function sin(x) between x = π and x = 2π.” A computer scientist might view this part:
可以理解为“计算函数 sin( x ) 在x = π和x = 2 π之间的面积”。计算机科学家可能会认为这部分内容是:
$$\int_{\pi}^{2\pi} \ldots \, dx$$
as a function call. We might call it “integrate().” It takes two objects: a function and a domain (interval). So the whole call might be
作为函数调用。我们可以称之为“integrate()”。它需要两个对象:一个函数和一个域(间隔)。因此整个调用可能是
float area = integrate(sin(), [pi,2pi]).
In more advanced calculus, we might start taking integrals over spheres, and the neat thing for graphics is we can still think of things that way:
在更高级的微积分中,我们可能会开始对球面进行积分,而图形的巧妙之处在于我们仍然可以这样思考事物:
float area = integrate(cos(), unit-sphere)
The machinery inside this function may be different, but all integrals have two things:
该函数内部的机制可能不同,但所有积分都有两件事:
1. The function being integrated
1. 被积函数
2. The domain over which it is integrated
2. 积分域
The trick, usually, is just carefully decoding what 1 and 2 are for a problem at hand. This is pretty similar in spirit to getting an API call right from sometimes confusing documentation.
通常,诀窍就是仔细解读 1 和 2 对应手头的问题。这在本质上与从有时令人困惑的文档中获取 API 调用非常相似。
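To make the "integral as a function call" view concrete, here is one possible numerical `integrate` for the one-dimensional case, written in Python. It is a sketch under our own assumptions: the domain is passed as two endpoints, and the midpoint rule with a fixed number of steps is just one simple choice of machinery.

```python
import math

def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# The example integral: sin(x) from pi to 2*pi.
area = integrate(math.sin, math.pi, 2.0 * math.pi)
```

The result is close to -2: the integral computes signed area, and sin(x) is negative on this interval.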
Integrals compute the total of things: lengths, areas, volumes, and so on. But they are also often used to compute averages. For example, we can compute the total volume of a region by integrating the elevation over that region (like a country).
积分计算事物的总和。长度、面积、体积等。但它们通常用于计算平均值。例如,我们可以通过对某个区域(如一个国家)的海拔进行积分来计算该区域的总体积。
float volume = integrate(elevation(), country)
But we could also compute the average elevation:
但我们也可以计算平均海拔:
float averageElevation = integrate(elevation(), country) / integrate(1, country)
This is basically “divide the volume by the area.” This can be abstracted as
这基本上就是“用体积除以面积”。这可以抽象为
float averageElevation = average(elevation, country)
We can also take a weighted average. Here, we add a weighting function to emphasize some points in the average more than others. For example, suppose we want to weight parts of the region by their temperature (this is pretty arbitrary, and we will see more graphics-relevant examples in the next section):
我们也可以取加权平均值。在这里,我们添加一个加权函数来强调平均值中的某些点。例如,如果我们想通过温度来强调该区域的某些部分(这非常随意,我们将在下一节中看到更多与图形相关的示例):
float weightedAverageElevation =
    integrate(temperature()*elevation(), country) / integrate(temperature(), country)
It’s a good idea to keep an eye out for this form; often integrals contain a weighted average without explicitly pointing that out and it can sometimes help intuition.
留意这种形式是个好主意;积分通常包含加权平均值而没有明确指出这一点,有时它可以帮助直觉。
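The average and weighted-average patterns can be tried out numerically. The Python sketch below is a one-dimensional stand-in: the "country" is just an interval, and `elevation` and `temperature` are made-up functions chosen only so the answers are easy to check by hand.

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def average(f, a, b):
    """Integral of f divided by the size of the domain."""
    return integrate(f, a, b) / integrate(lambda x: 1.0, a, b)

def weighted_average(f, weight, a, b):
    """Average of f where each point counts in proportion to weight."""
    return integrate(lambda x: weight(x) * f(x), a, b) / integrate(weight, a, b)

elevation = lambda x: x        # hypothetical: rises linearly across the region
temperature = lambda x: x      # hypothetical weight, larger at the high end

avg = average(elevation, 0.0, 2.0)                         # about 1
wavg = weighted_average(elevation, temperature, 0.0, 2.0)  # about 4/3
```

The weighted average is pulled above the plain average because the weight emphasizes the high-elevation end of the interval.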
One example of a type of integral we see a lot is one of these forms or something related:
我们经常看到的一种积分类型的一个例子是这些形式之一或相关形式:
float shade = integrate(cos()*f(), unit-hemisphere)
Note that since integrate(cos(), unit-hemisphere) = pi, the weighted average version is just
请注意,由于integrate(cos(), unit-hemisphere) = pi ,加权平均版本只是
float shade = integrate((1/pi)*cos()*f(), unit-hemisphere)
The more traditional form of this integral is
这个积分的更传统形式是
$$\text{shade} = \frac{1}{\pi}\int_{\text{unit hemisphere}} f(\omega)\cos\theta \, d\omega$$
Or with spherical coordinates as we might use to solve such integrals algebraically:
或者使用球坐标,我们可以用代数方法求解此类积分:
$$\text{shade} = \frac{1}{\pi}\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\pi/2} f(\theta, \phi)\cos\theta\,\sin\theta \, d\theta \, d\phi$$
The sine term is an area-correction factor for spherical coordinates. Note that in graphics, we will rarely need to write that all out and will use simpler forms without explicit coordinates as we numerically solve the integrals.
正弦项是球面坐标的面积校正因子。请注意,在图形中,我们很少需要将其全部写出,并且在我们以数字方式求解积分时,我们将使用没有明确坐标的更简单的形式
The particular integral above is the shade of a perfectly reflective matte (diffuse) surface, and it is also a weighted average of all incident colors. This structure can be great for intuition; the color of a surface is usually related to a weighted average of incident colors.
上面的特定积分是完全反射的哑光(漫反射)表面的色调,也是所有入射颜色的加权平均值。这种结构对于直觉来说非常有用;表面的颜色通常与入射颜色的加权平均值相关。
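The claim that the cosine integrates to pi over the unit hemisphere is easy to check numerically. The Python sketch below uses spherical coordinates with a simple midpoint rule; the function name `integrate_hemisphere` and the grid resolution are our own choices, and the sin(theta) factor is the area correction for spherical coordinates.

```python
import math

def integrate_hemisphere(f, n_theta=200, n_phi=200):
    """Midpoint-rule integral of f(theta, phi) over the unit hemisphere."""
    d_theta = (math.pi / 2.0) / n_theta
    d_phi = (2.0 * math.pi) / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            # sin(theta) converts the (theta, phi) cell into area on the sphere
            total += f(theta, phi) * math.sin(theta)
    return total * d_theta * d_phi

cos_integral = integrate_hemisphere(lambda theta, phi: math.cos(theta))

# With constant incident color f = 1, the weighted average should be 1.
shade = integrate_hemisphere(lambda theta, phi: (1.0 / math.pi) * math.cos(theta) * 1.0)

# Integrating the constant 1 gives the hemisphere's area, i.e., its solid angle.
hemisphere_solid_angle = integrate_hemisphere(lambda theta, phi: 1.0)
```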
The integrals over solid angle are almost always the same but use a wide variety of notations. The key is to recognize that these are just notations and to map the notation you see to the one you are most comfortable with. This is much like reading pseudocode!
立体角上的积分几乎总是相同的,但使用各种各样的符号。关键是要认识到这只是符号,并将您看到的符号映射到您最熟悉的符号。这很像阅读伪代码!
Density functions come up all the time in graphics (e.g., “probability density functions”) and they can be surprisingly confusing at times, but getting a handle on what precisely they are will help us use them and navigate out of confusion when it strikes us. We know what a function is, and a density function is just one that returns a density. So what is a density? Density is something that is a “per unit something,” or more formally an intensive quantity. For example, your weight is not a density, it is an extensive quantity, or just an amount of stuff, not an amount of stuff per unit something. The amount of weight a person might gain in a set period of time, say a year, is an amount of stuff, is measured in kilograms, and is thus an extensive quantity and not a density. The amount of weight the person was gaining “per day” or “per hour” is an intensive quantity, so is a density.
密度函数在图形中经常出现(例如“概率密度函数”),有时它们会让人感到非常困惑,但了解它们究竟是什么将有助于我们使用它们,并在困惑时摆脱困惑。我们知道什么是函数,密度函数就是返回密度的函数。那么什么是密度?密度是“每单位某物”的东西,或者更正式地说是强度量。例如,你的体重不是密度,而是一个广义量,或者只是物质的数量,而不是每单位某物物质的数量。一个人在一定时期内(比如一年)可能增加的体重是物质的数量,以公斤为单位,因此是一个广义量而不是密度。一个人“每天”或“每小时”增加的体重是一个强度量,密度也是。
As an example of a non-density function, consider the amount of energy that is produced by a solar panel on a given day, July 1, 2014, and let’s say it is 120 kilojoules. That is an amount of “stuff.” Well, that is fine, but is it enough to run my computer? My computer, if a desktop, needs a density of energy, or rate of energy, to keep working. So how do we take that day of energy and convert it into a rate of energy? We could divide it into segments of time. For example, we could do four-hour blocks, two-hour blocks, or one-hour blocks, and we would see that the rate changes during the day, but also that the boxes keep getting shorter, as shown in Figure 2.22.
举一个非密度函数的例子,考虑在某一天(2014 年 7 月 1 日)太阳能电池板产生的能量,假设为 120 千焦耳。这是一个“能量”的量。好吧,这没问题,但是足以运行我的电脑吗?我的电脑(如果是台式机)需要一定的能量密度或能量速率才能继续工作。那么我们如何将当天的能量转换为能量速率呢?我们可以将其划分为多个时间段。例如,我们可以划分为四小时时间段、两小时时间段或一小时时间段,我们会看到速率在一天中发生变化,但能量量也会不断缩短,如图 2.22所示。
Figure 2.22. As the time intervals in the histogram of energy produced per interval get narrower, the heights of the boxes go down (reaching zero height in the limit as the width goes to zero).
图 2.22.随着设定时间间隔内产生的能量直方图降低时间间隔,框的高度会下降(在极限情况下,宽度为零,高度为零)。
As we divide time finer and finer, we would eventually get down to minutes and seconds and we would get more information about time variation, but the box heights would get so small that we wouldn’t see anything. So what we could do is rescale the heights of the boxes based on their widths; for example, a box holding 30 kJ over half an hour gets height (30 kJ)/(0.5 h) = 60 kJ/h. If we use this new “kJ per hour” measure, the boxes no longer get shorter, as shown in Figure 2.23. If we take this process to the limit where the width of the box becomes infinitesimal, we get a smooth curve.
随着我们将时间划分得越来越细,最终我们会精确到分钟和秒,我们会得到更多关于时间变化的信息,但盒子的高度会变得太小,我们什么都看不到。所以我们可以根据盒子的宽度重新调整它们的高度,所以 (30kJ)/(0.5 h) = 60 kJ/h。如果我们使用这个新的“每小时千焦耳”测量方法,盒子就不会再变短了,如图 2.23所示。如果我们将这个过程推到盒子宽度无穷小的极限,我们会得到一条平滑的曲线。
This curve is an example of a density function. Some would call it an “energy density” function where the dimension the density is taken over is time, and in some contexts it would be called a “temporal energy density” function. Because this particular density is so useful and commonly talked about, it gets its own name, power, and instead of saying “joules per hour,” we say watts. Note that a watt is a joule per second rather than per hour by convention; the specific units, as opposed to the dimension, are chosen for convenience. For example, some physical units make more sense with meters, some with kilometers, and some with nanometers (and a few like spectral radiance for light use both meters and nanometers in the same quantity, so when you find yourself confused, it is not your fault).
该曲线是密度函数的一个例子。有人把它称为“能量密度”函数,其中密度所占的维度是时间,而有些情况下它被称为“时间能量密度”函数。因为这种特殊的密度非常有用且经常被谈论,所以它有自己的名字——功率,我们不说“焦耳每小时”,而是说瓦特。请注意,“瓦特”按照惯例是焦耳每秒而不是每小时;选择特定的单位而不是维度是为了方便。例如,有些物理单位用米更有意义,有些用公里,有些用纳米(还有一些像光的光谱辐射度一样,在同一数量中使用米和纳米,所以当你发现自己感到困惑时,这不是你的错)。
Putting this all together, (1) a density is always some kind of ratio where you say “so many X per unit Y” or “so many X per Y” like “so many kilometers per hour” (saying “so many kilometers per unit length” would be odd, but makes sense if everybody agrees what the unit of length is by default), and (2) a density function is a function that returns a density.
综上所述,(1)密度总是某种比率,你可以说“单位 Y 有多少个 X”或“每 Y 有多少个 X”,就像“每小时多少公里”(说“单位长度多少公里”会很奇怪,但如果每个人都同意长度的默认单位,那么这是有意义的),(2)密度函数是返回密度的函数。
Figure 2.23. If we divide the energy by the width of the box, it gets more detailed as we divide further.
图 2.23.如果我们用能量除以盒子的宽度,则除得越细,结果就越详细。
Density functions by themselves are useful for comparing relative concentrations at two different points. For example, with our energy density function defined over time (power), we can say “there is twice as much power at 2 pm as at 9 am.” But another way we can use them is to compute the total quantity in a region. For example, to compute how much energy is produced between 2 pm and 4 pm, we just integrate:
$$ \text{energy} = \int_{t = 2\,\text{pm}}^{4\,\text{pm}} \text{Power}(t)\, dt. $$
密度函数本身可用于比较两个不同点的相对浓度。例如,使用随时间(功率)定义的能量密度函数,我们可以说“下午 2 点的功率是上午 9 点的两倍”。但我们可以使用它们的另一种方法是计算某个区域的总量。例如,要计算下午 2 点到 4 点之间产生了多少能量,我们只需积分:
Many integrals are this sort of “integrate a density function” but that is not spelled out. It can sometimes make things more clear if you tease out whether an integral is processing the “mass” of a density function in some interval or region.
许多积分都是这种“积分密度函数”但并没有明确说明。如果你弄清楚积分是否在处理某个区间或区域中的密度函数的“质量”,有时事情会变得更清楚。
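As a rough numerical sketch of “integrating a density function,” the Python below sums a power curve over an interval to recover an amount of energy. The half-sine power curve, its 6 am to 6 pm daylight window, and the helper names `power_kj_per_hour` and `integrate` are assumptions made for illustration; only the 120 kJ daily total comes from the example above.

```python
import math

# Assumed peak power (kJ/h), chosen so the half-sine curve integrates to 120 kJ per day.
PEAK = 120.0 * math.pi / 24.0

def power_kj_per_hour(t_hours):
    """Hypothetical power curve: a half-sine from 6 am to 6 pm, zero at night."""
    if t_hours < 6.0 or t_hours > 18.0:
        return 0.0
    return PEAK * math.sin(math.pi * (t_hours - 6.0) / 12.0)

def integrate(density, a, b, n=1000):
    """Approximate the integral of a density over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return sum(density(a + (i + 0.5) * width) for i in range(n)) * width

# Energy (kJ) is the "mass" of the power density over a time interval.
energy_2pm_to_4pm = integrate(power_kj_per_hour, 14.0, 16.0)
energy_whole_day = integrate(power_kj_per_hour, 0.0, 24.0)
```

The density itself has units of kJ/h; multiplying by the interval width inside `integrate` is what turns it back into an amount in kJ.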
The geometry of curves, and especially surfaces, plays a central role in graphics, and here, we review the basics of curves and surfaces in 2D and 3D space.
曲线的几何形状,尤其是曲面的几何形状,在图形学中起着核心的作用,在这里,我们回顾二维和三维空间中曲线和曲面的基础知识。
Intuitively, a curve is a set of points that can be drawn on a piece of paper without lifting the pen. A common way to describe a curve is using an implicit equation. An implicit equation in two dimensions has the form
$$ f(x, y) = 0. $$
直观地讲,曲线是一组点,可以不用抬起笔就画在一张纸上。描述曲线的常用方法是使用隐式方程。二维隐式方程的形式为
The function f (x, y) returns a real value. Points (x, y) where this value is zero are on the curve, and points where the value is nonzero are not on the curve. For example, let’s say that f (x, y) is
$$ f(x, y) = (x - x_c)^2 + (y - y_c)^2 - r^2, \tag{2.9} $$
函数f ( x, y ) 返回一个实数值。该值为零的点 ( x, y ) 在曲线上,而该值非零的点不在曲线上。例如,假设f ( x, y ) 是
where (xc,yc) is a 2D point and r is a nonzero real number. If we take f (x, y) = 0, the points where this equality holds are on the circle with center (xc,yc) and radius r. The reason that this is called an “implicit” equation is that the points (x, y) on the curve cannot be immediately calculated from the equation and instead must be determined by solving the equation. Thus, the points on the curve are not generated by the equation explicitly, but they are buried somewhere implicitly in the equation.
其中 ( x c , y c ) 是二维点, r是非零实数。如果取f ( x, y ) = 0,则该等式成立的点位于以 ( x c , y c ) 为圆心、半径为r 的圆上。之所以将其称为“隐式”方程,是因为曲线上的点 ( x, y ) 不能立即从方程中计算出来,而是必须通过求解方程来确定。因此,曲线上的点不是由方程明确生成的,而是隐式地埋在方程的某个地方。
It is interesting to note that f does have values for all (x, y). We can think of f as a terrain, with sea level at f = 0 (Figure 2.24). The shore is the implicit curve. The value of f is the altitude. Another thing to note is that the curve partitions space into regions where f > 0, f < 0, and f = 0. So you evaluate f to decide whether a point is “inside” a curve. Note that f(x, y) = c is a curve for any constant c, and c = 0 is just used as a convention. For example, if f(x, y) = x² + y² – 1, varying c just gives a variety of circles centered at the origin (Figure 2.25).
值得注意的是, f对于所有的 ( x, y ) 都有值。我们可以把f想象成地形,在f = 0 处为海平面(图 2.24 )。海岸是隐式曲线。f 的值是海拔。另外要注意的是,曲线将空间划分为f > 0、 f < 0 和f = 0 的区域。因此,通过计算f可以判断某个点是否在曲线“内部”。请注意, f ( x, y ) = c是任意常数c 的曲线,而c = 0 只是作为惯例。例如,如果f ( x, y ) = x2 + y2 – 1,则改变c只会给出以原点为中心的各种圆(图 2.25 )。
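The sign test described above can be sketched in a few lines of Python. The function names `circle_implicit` and `classify` and the tolerance `eps` are illustrative assumptions; for this particular f, negative values happen to be the inside of the circle.

```python
def circle_implicit(x, y, xc=0.0, yc=0.0, r=1.0):
    """Implicit circle function: zero on the circle, negative inside, positive outside."""
    return (x - xc) ** 2 + (y - yc) ** 2 - r ** 2

def classify(x, y, eps=1e-9):
    """Classify a point against the unit circle by the sign of f (eps is an assumed tolerance)."""
    value = circle_implicit(x, y)
    if abs(value) <= eps:
        return "on"
    return "outside" if value > 0 else "inside"

# The level curve f(x, y) = c is itself a circle; for c = 3 it has radius 2.
level_value = circle_implicit(2.0, 0.0)
```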
We can compress our notation using vectors. If we have c = (xc,yc) and p = (x, y) , then our circle with center c and radius r is defined by those position vectors that satisfy
$$ (\mathbf{p} - \mathbf{c}) \cdot (\mathbf{p} - \mathbf{c}) - r^2 = 0. $$
我们可以使用向量来压缩符号。如果我们有c = ( x c , y c ) 和p = ( x, y ) ,那么以c为圆心、 r为半径的圆由满足以下条件的位置向量定义:
This equation, if expanded algebraically, will yield Equation (2.9), but it is easier to see that this is an equation for a circle by “reading” the equation geometrically. It reads, “points p on the circle have the following property: the vector from c to p when dotted with itself has value r².” Because a vector dotted with itself is just its own length squared, we could also read the equation as, “points p on the circle have the following property: the vector from c to p has squared length r².”
如果用代数方式展开这个方程,就会得到方程 (2.9),但通过用几何方式“解读”这个方程,更容易看出这是一个圆的方程。它是这样理解的:“圆上的点p具有以下属性:从c到p的向量加自身点后,其值为r 2 。”因为加自身点的向量就是其自身长度的平方,所以我们也可以将方程解读为:“圆上的点p具有以下属性:从c到p的向量的长度平方为r 2 。”
Figure 2.24. An implicit function f(x,y) = 0 can be thought of as a height field where f is the height (top). A path where the height is zero is the implicit curve (bottom).
图 2.24。隐函数f(x,y) = 0 可以看作是一个高度场,其中 f 是高度(顶部)。高度为零的路径是隐式曲线(底部)。
Figure 2.25. An implicit function defines a curve for any constant value, with zero being the usual convention.
图 2.25。隐函数为任意常数值定义一条曲线,通常为零。
Even better is to observe that the squared length is just the squared distance from c to p, which suggests the equivalent form
$$ \|\mathbf{p} - \mathbf{c}\|^2 - r^2 = 0, $$
更好的是,观察到长度的平方就是从c到p的距离的平方,这表明等效形式
and, of course, this suggests
$$ \|\mathbf{p} - \mathbf{c}\| - r = 0. $$
当然,这意味着
The above could be read “the points p on the circle are those a distance r from the center point c,” which is as good a definition of circle as any. This illustrates that the vector form of an equation often suggests more geometry and intuition than the equivalent full-blown Cartesian form with x and y. For this reason, it is usually advisable to use vector forms when possible. In addition, you can support a vector class in your code; the code is cleaner when vector forms are used. The vector-oriented equations are also less error prone in implementation: once you implement and debug vector types in your code, the cut-and-paste errors involving x, y, and z will go away. It takes a little while to get used to vectors in these equations, but once you get the hang of it, the payoff is large.
上述内容可以理解为“圆上的点p是与中心点c距离为r的点”,这与任何圆的定义一样好。这说明方程的矢量形式通常比等效的具有x和y 的完整的笛卡尔形式更能体现几何学和直觉。因此,通常建议尽可能使用矢量形式。此外,您可以在代码中支持矢量类;使用矢量形式时代码会更简洁。面向矢量的方程在实现时也更不容易出错:一旦您在代码中实现并调试矢量类型,涉及x 、 y和z 的剪切粘贴错误就会消失。需要一点时间来适应这些方程中的矢量,但一旦掌握了窍门,回报将是巨大的。
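To make the point about vector types concrete, here is a minimal sketch of a 2D vector class and the two vector forms of the circle equation. The class name `Vec2` and its methods are assumptions for illustration, not an interface defined in the text.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Vec2:
    """A minimal 2D vector type (hypothetical; just enough for the circle equations)."""
    x: float
    y: float

    def __sub__(self, other):
        return Vec2(self.x - other.x, self.y - other.y)

    def dot(self, other):
        return self.x * other.x + self.y * other.y

    def length(self):
        return math.sqrt(self.dot(self))

def circle_f(p, c, r):
    """(p - c) . (p - c) - r^2: zero when p is on the circle."""
    d = p - c
    return d.dot(d) - r * r

def circle_distance_form(p, c, r):
    """||p - c|| - r: the signed distance from p to the circle."""
    return (p - c).length() - r
```

Once `Vec2` is debugged, neither circle function mentions x or y at all, which is where the cut-and-paste errors tend to come from.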
If we think of the function f (x, y) as a height field with height = f (x, y) , the gradient vector points in the direction of maximum upslope, i.e., straight uphill. The gradient vector ∇f(x, y) is given by
$$ \nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right). $$
如果我们将函数f ( x, y ) 视为高度 = f ( x, y ) 的高度场,则梯度向量指向最大上坡方向,即直线上坡。梯度向量 ∇ f ( x, y ) 由下式给出
The gradient vector evaluated at a point on the implicit curve f (x, y) = 0 is perpendicular to the tangent vector of the curve at that point. This perpendicular vector is usually called the normal vector to the curve. In addition, since the gradient points uphill, it indicates the direction of the f (x, y) > 0 region.
在隐式曲线f ( x, y ) = 0 上某一点求得的梯度向量垂直于该点处曲线的切向量。这个垂直向量通常称为曲线的法向量。此外,由于梯度指向上坡,它指示了f ( x, y ) > 0 区域的方向。
Figure 2.26. A surface height = f (x,y) is locally planar near (x,y) = (a,b). The gradient is a projection of the uphill direction onto the height = 0 plane.
图 2.26。表面高度 = f ( x,y ) 在 ( x,y ) = ( a,b ) 附近局部为平面。梯度是上坡方向在高度 = 0 平面上的投影。
In the context of height fields, the geometric meaning of partial derivatives and gradients is more visible than usual. Suppose that near the point (a, b) , f (x, y) is a plane (Figure 2.26). There is a specific uphill and downhill direction. At right angles to this direction is a direction that is level with respect to the plane. Any intersection between the plane and the f (x, y) = 0 plane will be in the direction that is level. Thus, the uphill/downhill directions will be perpendicular to the line of intersection f (x, y) = 0. To see why the partial derivative has something to do with this, we need to visualize its geometric meaning. Recall that the conventional derivative of a 1D function y = g(x) is
$$ \frac{dy}{dx} \equiv \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \frac{g(x + \Delta x) - g(x)}{\Delta x}. \tag{2.10} $$
在高度场中,偏导数和梯度的几何意义比平常更加明显。假设在点 ( a, b ) 附近, f ( x, y ) 是一个平面(图 2.26 )。有一个特定的上坡和下坡方向。与此方向垂直的方向是与平面水平的方向。平面与f ( x, y ) = 0 平面之间的任何交点都将在水平方向上。因此,上坡/下坡方向将垂直于交线f ( x, y ) = 0。要了解偏导数为何与此有关,我们需要将其几何意义形象化。回想一下,一维函数y = g ( x ) 的传统导数为
This measures the slope of the tangent line to g (Figure 2.27).
测量结果为g的切线斜率(图 2.27 )。
Figure 2.27. The derivative of a 1D function measures the slope of the line tangent to the curve.
图 2.27。一维函数的导数测量曲线切线的斜率。
The partial derivative is a generalization of the 1D derivative. For a 2D function f (x, y) , we can’t take the same limit for x as in Equation (2.10), because f can change in many ways for a given change in x. However, if we hold y constant, we can define an analog of the derivative, called the partial derivative (Figure 2.28):
$$ \frac{\partial f}{\partial x} \equiv \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}. $$
偏导数是一维导数的推广。对于二维函数f ( x, y ) ,我们不能像公式 (2.10) 中那样对x取相同的极限,因为f会随着x的给定变化而发生多种变化。但是,如果我们保持y不变,我们可以定义导数的类似物,称为偏导数(图 2.28 ):
Why is it that the partial derivatives with respect to x and y are the components of the gradient vector? Again, there is more obvious insight in the geometry than in the algebra. In Figure 2.29, we see that the vector a travels along a path where f does not change. Note that this is again at a small enough scale that the surface height = f(x, y) can be considered locally planar. From the figure, we see that the vector a = (Δx, Δy).
为什么关于x和y 的偏导数是梯度向量的分量?同样,几何学比代数更能说明这个问题。在图 2.29中,我们看到向量a沿着f不变的路径行进。请注意,这又是在一个足够小的尺度上,表面高度 ( x, y ) = f ( x, y ) 可以被认为是局部平面的。从图中,我们看到向量a = (Δ x, Δ y )。
Because the uphill direction is perpendicular to a, we know the dot product is equal to zero:
$$ \nabla f \cdot \mathbf{a} \equiv (x_\nabla, y_\nabla) \cdot (x_a, y_a) = x_\nabla x_a + y_\nabla y_a = 0. \tag{2.11} $$
因为上坡方向垂直于a ,所以我们知道点积等于零:
Figure 2.28. The partial derivative of a function f with respect to x must hold y constant to have a unique value, as shown by the dark point. The hollow points show other values of f that do not hold y constant.
图 2.28。函数f关于x的偏导数必须保持y不变才能具有唯一值,如黑点所示。空心点表示f的其他值,这些值不保持y不变。
We also know that the change in f in the direction (xa,ya) equals zero:
$$ \Delta f = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y = \frac{\partial f}{\partial x} x_a + \frac{\partial f}{\partial y} y_a = 0. \tag{2.12} $$
我们还知道f在方向 ( x a , y a ) 上的变化等于零:
Given any vectors (x, y) and (x′, y′) that are perpendicular, we know that the angle between them is 90 degrees, and thus, their dot product equals zero (recall that the dot product is proportional to the cosine of the angle between the two vectors). Thus, we have xx′ + yy′ = 0. Given (x, y), it is easy to construct valid vectors whose dot product with (x, y) equals zero, the two most obvious being (y, –x) and (–y, x); you can verify that these vectors give the desired zero dot product with (x, y). A generalization of this observation is that (x, y) is perpendicular to k(y, –x) where k is any nonzero constant. This implies that
$$ (x_a, y_a) = k \left( \frac{\partial f}{\partial y},\; -\frac{\partial f}{\partial x} \right). $$
给定任何垂直向量 ( x, y ) 和 ( x,y ),我们知道它们之间的角度是 90 度,因此它们的点积等于零(回想一下,点积与两个向量之间角度的余弦成比例)。因此,我们有xx + yy = 0。给定 ( x, y ),很容易构造与 ( x, y ) 的点积等于零的有效向量,最明显的两个是 ( y,–x ) 和 (– y, x ) ;您可以验证这些向量与 ( x, y ) 的点积是否为零。这一观察的概括是 ( x, y ) 垂直于k ( y ,–x),其中k是任何非零常数。这意味着
Figure 2.29. The vector a points in a direction where f has no change and is thus perpendicular to the gradient vector ∇f.
图 2.29。向量a指向f没有变化的方向,因此垂直于梯度向量 ∇ f 。
Combining Equations (2.11) and (2.12) gives
$$ \nabla f = (x_\nabla, y_\nabla) = k \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right), $$
结合方程 (2.11) 和 (2.12) 可得
where k is any nonzero constant. By definition, “uphill” implies a positive change in f, so we would like k > 0, and k = 1 is a perfectly good convention.
其中k是任何非零常数。根据定义,“上坡”意味着f发生正变化,因此我们希望k > 0,和k = 1 是一个非常好的惯例。
As an example of the gradient, consider the implicit circle x² + y² – 1 = 0 with gradient vector (2x, 2y), indicating that the outside of the circle is the positive region for the function f(x, y) = x² + y² – 1. Note that the length of the gradient vector can be different depending on the multiplier in the implicit equation. For example, the unit circle can be described by Ax² + Ay² – A = 0 for any nonzero A. The gradient for this curve is (2Ax, 2Ay). This will be normal (perpendicular) to the circle, but will have a length determined by A. For A > 0, the normal will point outward from the circle, and for A < 0, it will point inward. This switch from outward to inward is as it should be, since the positive region switches to the inside of the circle. In terms of the height-field view, h = Ax² + Ay² – A, and the circle is at zero altitude. For A > 0, the circle encloses a depression, and for A < 0, the circle encloses a bump. As A becomes more negative, the bump increases in height, but the h = 0 circle doesn’t change. The direction of maximum uphill doesn’t change, but the slope increases. The length of the gradient reflects this change in degree of the slope. So intuitively, you can think of the gradient’s direction as pointing uphill and its magnitude as measuring how steep the slope is.
以梯度为例,考虑隐式圆x 2 + y 2 – 1 = 0,其梯度向量为 (2 x, 2 y ) ,表示圆外部是函数f ( x, y ) = x 2 + y 2 – 1 的正区域。请注意,梯度向量的长度可能因隐式方程中的乘数而不同。例如,对于任何非零的A ,单位圆可以用Ax 2 + Ay 2 – A = 0 来描述。此曲线的梯度为 (2 Ax, 2 Ay )。这将垂直于圆,但长度由A决定。对于A > 0,法线将指向圆外,而对于A < 0,它将指向圆内。这种从外到内的转变是理所当然的,因为正区域在圆内切换。从高度场角度看, h = Ax 2 + Ay 2 – A ,圆位于零高度。对于A > 0,圆包围一个凹陷,对于A < 0,圆包围一个凸起。随着A变得越来越负,凸起的高度会增加,但h = 0 圆不会改变。最大上坡方向不会改变,但坡度会增加。坡度的长度反映了坡度的变化。 因此,直观地讲,你可以将坡度的方向视为指向上坡,将其大小视为测量斜坡的上坡程度。
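The scaled-circle example can be checked numerically. The sketch below estimates the gradient with central differences, a standard finite-difference approximation rather than anything prescribed by the text; the names `gradient` and `scaled_circle` and the step size `h` are assumptions.

```python
def gradient(f, x, y, h=1e-6):
    """Estimate (df/dx, df/dy) at (x, y) by central differences with assumed step h."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)  # vary x, hold y constant
    dfdy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)  # vary y, hold x constant
    return (dfdx, dfdy)

def scaled_circle(A):
    """Implicit unit circle A*x^2 + A*y^2 - A for a nonzero multiplier A."""
    return lambda x, y: A * x * x + A * y * y - A

# (0.6, 0.8) lies on the unit circle; the analytic gradient there is (2*A*x, 2*A*y).
outward = gradient(scaled_circle(1.0), 0.6, 0.8)   # close to (1.2, 1.6)
inward = gradient(scaled_circle(-3.0), 0.6, 0.8)   # close to (-3.6, -4.8)

# The tangent direction at (x, y) on the circle is (-y, x); the gradient is perpendicular to it.
tangent = (-0.8, 0.6)
tangent_dot_gradient = outward[0] * tangent[0] + outward[1] * tangent[1]
```

Changing the sign of A flips the estimated normal from outward to inward, and its length scales with |A|, matching the discussion above.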
The familiar “slope-intercept” form of the line is
$$ y = mx + b. $$
熟悉的“斜率截距”直线形式是
This can be converted easily to implicit form (Figure 2.30):
$$ y - mx - b = 0. $$
这可以很容易地转换为隐式形式(图 2.30 ):
Here, m is the “slope” (ratio of rise to run), and b is the y value where the line crosses the y-axis, usually called the y-intercept. The line also partitions the 2D plane, but here “inside” and “outside” might be more intuitively called “over” and “under.”
这里, m是“斜率”(上升与下降的比率), b是直线与y轴相交处的y值,通常称为y 截距。该线也将二维平面进行划分,但这里的“内部”和“外部”可能更直观地称为“上方”和“下方”。
Figure 2.30. A 2D line can be described by the equation y - mx - b = 0.
图 2.30。二维线可以用方程y - mx - b = 0 来描述。
Because we can multiply an implicit equation by any constant without changing the points where it is zero, kf (x, y) = 0 is the same curve for any nonzero k. This allows several implicit forms for the same line, for example,
$$ 2y - 2mx - 2b = 0. $$
因为我们可以将隐式方程乘以任何常数而不改变其为零的点,所以对于任何非零k来说, kf ( x, y ) = 0 都是同一条曲线。这允许同一条直线有几种隐式形式,例如,
One reason the slope-intercept form is sometimes awkward is that it can’t represent some lines such as x = 0 because m would have to be infinite. For this reason, a more general form is often useful:
$$ Ax + By + C = 0, \tag{2.15} $$
斜率截距形式有时不方便的一个原因是它不能表示某些直线,例如x = 0,因为m必须是无限的。因此,更一般的形式通常很有用:
for real numbers A, B, C.
对于实数A 、 B 、 C 。
Suppose we know two points on the line, (x0, y0) and (x1,y1) . What A, B, and C describe the line through these two points? Because these points lie on the line, they must both satisfy Equation (2.15):
$$ \begin{aligned} Ax_0 + By_0 + C &= 0, \\ Ax_1 + By_1 + C &= 0. \end{aligned} $$
假设我们知道直线上的两个点, ( x0 , y0 )和( x1 ,y1 ) 。A 、 B和C分别描述过这两点的直线吗?因为这些点位于直线上,所以它们都必须满足公式(2.15 ) :
Unfortunately, we have two equations and three unknowns: A, B, and C. This problem arises because of the arbitrary multiplier we can have with an implicit equation. We could set C = 1 for convenience:
$$ Ax + By + 1 = 0, $$
不幸的是,我们有两个方程和三个未知数: A、B和C。这个问题的出现是因为我们可以用隐式方程得到任意乘数。为了方便起见,我们可以设置C = 1:
but we have a similar problem to the infinite slope case in slope-intercept form: lines through the origin would need to have A(0) + B(0) + 1 = 0, which is a contradiction. For example, the equation for a 45° line through the origin can be written x – y = 0, or equally well y – x = 0, or even 17y – 17x = 0, but it cannot be written in the form Ax + By + 1 = 0.
但在斜率截距形式中,我们有一个与无限斜率情况类似的问题:通过原点的直线需要有A (0) + B (0) + 1 = 0,这是一个矛盾。例如,通过原点的 45 °直线方程可以写成x – y = 0,或者同样可以写成y – x = 0,甚至可以写成 17 y – 17 x = 0,但不能写成Ax + By +1 = 0 的形式。
Whenever we have such pesky algebraic problems, we try to solve the problems using geometric intuition as a guide. One tool we have, as discussed in Section 2.7.2, is the gradient. For the line Ax + By + C = 0, the gradient vector is (A, B) . This vector is perpendicular to the line (Figure 2.31), and points to the side of the line where Ax + By + C is positive. Given two points on the line (x0,y0) and (x1,y1) , we know that the vector between them points in the same direction as the line. This vector is just (x1 – x0,y1 – y0), and because it is parallel to the line, it must also be perpendicular to the gradient vector (A, B) . Recall that there are an infinite number of (A, B, C) that describe the line because of the arbitrary scaling property of implicits. We want any one of the valid (A, B, C) .
每当我们遇到这类棘手的代数问题时,我们都会尝试以几何直觉为指导来解决这些问题。如第 2.7.2 节所述,我们拥有的一个工具是梯度。对于直线Ax + By + C = 0,梯度向量为( A,B )。该向量垂直于直线(图 2.31 ),并指向直线上Ax + By + C为正的一侧。给定直线上的两点( x0 ,y0 )和( x1 ,y1 ) ,我们知道它们之间的向量指向与直线相同的方向。这个向量就是( x1 - x0 ,y1 - y0 ) ,因为它平行于直线,所以它也必须垂直于梯度向量( A,B )。回想一下,由于隐式函数的任意缩放属性,描述直线的( A,B,C )有无数个。我们想要有效的( A,B,C )中的任意一个。
Figure 2.31. The gradient vector (A, B) is perpendicular to the implicit line Ax + By + C = 0.
图 2.31。梯度向量 ( A, B ) 垂直于隐式直线Ax + By + C = 0。
We can start with any (A, B) perpendicular to (x1–x0,y1–y0). Such a vector is just (A, B) = (y0 –y1, x1 – x0) by the same reasoning as in Section 2.7.2. This means that the equation of the line through (x0,y0) and (x1,y1) is
$$ (y_0 - y_1)x + (x_1 - x_0)y + C = 0. \tag{2.16} $$
我们可以从任何垂直于 ( x 1 – x 0 , y 1 – y 0 ) 的 ( A, B ) 开始。根据与第 2.7.2 节相同的推理,这样的向量就是 ( A, B ) = ( y 0 –y 1 , x 1 – x 0 )。这意味着通过 ( x 0 ,y 0 ) 和 ( x 1 ,y 1 ) 的直线方程为
Now we just need to find C. Because (x0,y0) and (x1,y1) are on the line, they must satisfy Equation (2.16). We can plug either value in and solve for C. Doing this for (x0,y0) yields C = x0y1 – x1y0, and thus, the full equation for the line is
$$ (y_0 - y_1)x + (x_1 - x_0)y + x_0 y_1 - x_1 y_0 = 0. \tag{2.17} $$
现在我们只需要找到C 。因为( x 0 ,y 0 )和( x 1 ,y 1 )在直线上,所以它们必须满足公式 (2.16)。我们可以代入任意一个值并求解C 。对( x 0 ,y 0 )执行此操作可得出C = x 0 y 1 – x 1 y 0 ,因此,该直线的完整方程为
Again, this is one of infinitely many valid implicit equations for the line through two points, but this form has no division operation and thus no numerically degenerate cases for points with finite Cartesian coordinates. A nice thing about Equation (2.17) is that we can always convert to the slope-intercept form (when it exists) by moving the non-y terms to the right-hand side of the equation and dividing by the multiplier of the y term:
$$ y = \frac{y_1 - y_0}{x_1 - x_0}\, x + \frac{x_1 y_0 - x_0 y_1}{x_1 - x_0}. $$
同样,这是通过两点的直线的无数有效隐式方程之一,但这种形式没有除法运算,因此对于具有有限笛卡尔坐标的点,没有数值退化的情况。方程 (2.17) 的一个好处是,我们总是可以通过将非y项移到等式的右侧并除以y项的乘数来转换为斜率截距形式(当它存在时):
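The two-point construction translates directly into code. The following sketch computes one valid (A, B, C) from the gradient argument above and, where possible, converts it to slope-intercept form; the function names `implicit_line` and `slope_intercept` are assumptions for illustration.

```python
def implicit_line(p0, p1):
    """Return (A, B, C) with A*x + B*y + C = 0 for the line through p0 and p1."""
    x0, y0 = p0
    x1, y1 = p1
    A = y0 - y1            # (A, B) is perpendicular to (x1 - x0, y1 - y0)
    B = x1 - x0
    C = x0 * y1 - x1 * y0  # chosen so that p0 (and hence p1) satisfies the equation
    return (A, B, C)

def slope_intercept(A, B, C):
    """Convert to (m, b) with y = m*x + b, or None for a vertical line (B == 0)."""
    if B == 0:
        return None
    return (-A / B, -C / B)

def evaluate(line, x, y):
    """Evaluate f(x, y) = A*x + B*y + C."""
    A, B, C = line
    return A * x + B * y + C
```

Note that `implicit_line` contains no division, so a vertical line such as the one through (2, 0) and (2, 5) is handled without any special case; only the optional conversion to slope-intercept form has to check for B = 0.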
An interesting property of the implicit line equation is that it can be used to find the signed distance from a point to the line. The value of Ax + By + C is proportional to the distance from the line (Figure 2.32). As shown in Figure 2.33, the distance from a point to the line is the length of the vector k(A, B), which is
$$ \text{distance} = k \sqrt{A^2 + B^2}. \tag{2.18} $$
隐式直线方程的一个有趣特性是,它可以用来求出从点到直线的有符号距离。Ax + By + C的值与到直线的距离成正比(图 2.32 )。如图 2.33所示,从点到直线的距离是矢量k ( A, B ) 的长度,即
Figure 2.32. The value of the implicit function f (x,y) = Ax + By + C is a constant times the signed distance from Ax + By + C = 0.
图 2.32。隐函数f ( x,y ) = Ax + By + C的值是一个常数乘以与Ax + By + C = 0 的有符号距离。
For the point (x, y) + k(A, B), the value of f(x, y) = Ax + By + C is
$$ f(x + kA, y + kB) = Ax + kA^2 + By + kB^2 + C = k(A^2 + B^2). \tag{2.19} $$
对于点 ( x, y )+ k ( A, B ) , f ( x, y ) = Ax + By + C 的值为
Figure 2.33. The vector k(A,B) connects a point (x,y) on the line closest to a point not on the line. The distance is proportional to k.
图 2.33。向量k ( A,B ) 连接直线上的点 ( x,y ) 和直线外的点之间的最近距离。距离与 k 成正比。
The simplification in that equation is a result of the fact that we know (x, y) is on the line, so Ax + By + C = 0. From Equations (2.18) and (2.19), we can see that the signed distance from line Ax + By + C = 0 to a point (a, b) is
$$ \text{distance} = \frac{f(a, b)}{\sqrt{A^2 + B^2}}. $$
该方程的简化是因为我们知道 ( x, y ) 在线上,所以Ax + By + C = 0。从方程 (2.18) 和 (2.19),我们可以看出从直线Ax + By + C = 0 到点 ( a, b ) 的有符号距离为
Here, “signed distance” means that its magnitude (absolute value) is the geometric distance, but on one side of the line, distances are positive and on the other, they are negative. You can choose between the equally valid representations f (x, y) = 0 and –f (x, y) = 0 if your problem has some reason to prefer a particular side being positive. Note that if (A, B) is a unit vector, then f (a, b) is the signed distance. We can multiply Equation (2.17) by a constant that ensures that (A, B) is a unit vector:
$$ f(x, y) = \frac{(y_0 - y_1)x + (x_1 - x_0)y + x_0 y_1 - x_1 y_0}{\sqrt{(x_1 - x_0)^2 + (y_0 - y_1)^2}} = 0. \tag{2.20} $$
此处,“有符号距离”表示其量级(绝对值)是几何距离,但在线的一侧,距离为正,而另一侧则为负。如果您的问题有某种原因需要某一侧为正,则可以在同样有效的表示f ( x, y ) = 0 和–f ( x, y ) = 0 之间进行选择。请注意,如果 ( A, B ) 是单位向量,则f ( a, b ) 是有符号距离。我们可以将公式 (2.17) 乘以一个常数,以确保 ( A, B ) 是单位向量:
Note that evaluating f (x, y) in Equation (2.20) directly gives the signed distance, but it does require a square root to set up the equation. Implicit lines will turn out to be very useful for triangle rasterization (Section 9.1.2). Other forms for 2D lines are discussed in Chapter 13.
请注意,在公式 (2.20) 中直接求f ( x, y ) 可得出有符号距离,但需要平方根才能建立公式。隐式线对于三角形光栅化非常有用(第 9.1.2 节)。第 13 章将讨论二维线的其他形式。
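A short sketch of the signed-distance computation follows, assuming the line is already given by its coefficients (A, B, C). The names `signed_distance` and `normalize_line` are illustrative, not from the text.

```python
import math

def signed_distance(A, B, C, a, b):
    """Signed distance from the point (a, b) to the line A*x + B*y + C = 0."""
    return (A * a + B * b + C) / math.hypot(A, B)

def normalize_line(A, B, C):
    """Scale (A, B, C) so that (A, B) is a unit vector; one square root, paid once."""
    length = math.hypot(A, B)
    return (A / length, B / length, C / length)

# With a normalized line, evaluating f directly gives the signed distance.
nA, nB, nC = normalize_line(3.0, 4.0, -10.0)
direct = nA * 6.0 + nB * 8.0 + nC
```

Normalizing once up front is the trade-off mentioned above: the square root is needed to set up the equation, but each later distance query is just two multiplies and two adds.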
In the previous section, we saw that a linear function f (x, y) gives rise to an implicit line f (x, y) = 0. If f is instead a quadratic function of x and y, with the general form
$$ Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0, $$
在上一节中,我们看到线性函数f ( x, y ) 会产生一条隐式直线f ( x, y ) = 0。如果f是x和y的二次函数,则一般形式为
the resulting implicit curve is called a quadric. Two-dimensional quadric curves include ellipses and hyperbolas, as well as the special cases of parabolas, circles, and lines.
由此产生的隐式曲线称为二次曲线。二维二次曲线包括椭圆和双曲线,以及特殊情况的抛物线、圆和直线。
Examples of quadric curves include the circle with center (xc, yc) and radius r,
$$ (x - x_c)^2 + (y - y_c)^2 - r^2 = 0, $$
二次曲线的例子包括以 ( xc , yc ) 为中心、半径为r 的圆,
and axis-aligned ellipses of the form
$$ \frac{(x - x_c)^2}{a^2} + \frac{(y - y_c)^2}{b^2} - 1 = 0, $$
以及轴对齐椭圆的形式
where (xc,yc) is the center of the ellipse, and a and b are the minor and major semi-axes (Figure 2.34).
其中( xc , yc )是椭圆的中心, a和b分别是短半轴和长半轴(图 2.34 )。
Figure 2.34. The ellipse with center (xc, yc) and semiaxes of length a and b.
图 2.34以 ( xc , yc ) 为中心、半轴长度为a和b 的椭圆。
Just as implicit equations can be used to define curves in 2D, they can be used to define surfaces in 3D. As in 2D, implicit equations implicitly define a set of points that are on the surface:
$$ f(x, y, z) = 0. $$
隐式方程可用于定义二维曲线,同样,它也可用于定义三维曲面。与二维一样,隐式方程隐式定义了曲面上的一组点:
Any point (x, y, z) that is on the surface results in zero when given as an argument to f . Any point not on the surface results in some number other than zero. You can check whether a point is on the surface by evaluating f , or you can check which side of the surface the point lies on by looking at the sign of f , but you cannot always explicitly construct points on the surface. Using vector notation, we will write such functions of p = (x, y, z) as
$$ f(\mathbf{p}) = 0. $$
任何位于表面上的点 ( x, y, z ) 作为f的参数给出时都会返回零。任何不在表面上的点都会返回除零以外的某个数字。您可以通过评估f来检查某个点是否位于表面上,或者您可以通过查看f的符号来检查该点位于表面的哪一侧,但您无法始终明确构造表面上的点。使用向量符号,我们将p = ( x, y, z ) 的函数写为
A surface normal (which is needed for lighting computations, among other things) is a vector perpendicular to the surface. Each point on the surface may have a different normal vector. In the same way that the gradient provides a normal to an implicit curve in 2D, the surface normal at a point p on an implicit surface is given by the gradient of the implicit function
$$ \mathbf{n} = \nabla f(\mathbf{p}) = \left( \frac{\partial f(\mathbf{p})}{\partial x}, \frac{\partial f(\mathbf{p})}{\partial y}, \frac{\partial f(\mathbf{p})}{\partial z} \right). $$
表面法线(除其他外,还用于照明计算)是垂直于表面的向量。表面上的每个点可能具有不同的法线向量。与梯度为二维中的隐式曲线提供法线的方式相同,隐式表面上点p处的表面法线由隐式函数的梯度给出
The reasoning is the same as for the 2D case: the gradient points in the direction of fastest increase in f , which is perpendicular to all directions tangent to the surface, in which f remains constant. The gradient vector points toward the side of the surface where f (p) > 0, which we may think of as “into” the surface or “out from” the surface in a given context. If the particular form of f creates inward-facing gradients, and outward-facing gradients are desired, the surface –f (p) = 0 is the same as surface f (p) = 0 but has directionally reversed gradients, i.e., –∇f (p) = ∇(–f (p)) .
其原因与二维情况相同:梯度指向f增长最快的方向,该方向垂直于与表面相切的所有方向,其中f保持不变。梯度向量指向f ( p ) > 0 所在的表面一侧,在特定情况下,我们可以将其视为“进入”表面或“离开”表面。如果f的特定形式会产生向内的梯度,而需要向外的梯度,则表面–f ( p ) = 0 与表面f ( p ) = 0 相同,但梯度方向相反,即 –∇ f ( p ) = ∇ (–f ( p ))。
As an example, consider the infinite plane through point a with surface normal n. The implicit equation to describe this plane is given by
$$ (\mathbf{p} - \mathbf{a}) \cdot \mathbf{n} = 0. \tag{2.21} $$
例如,考虑通过点a且表面法线为n的无限平面。描述该平面的隐式方程为
Note that a and n are known quantities. The point p is any unknown point that satisfies the equation. In geometric terms this equation says “the vector from a to p is perpendicular to the plane normal.” If p were not in the plane, then (p – a) would not make a right angle with n (Figure 2.35).
请注意, a和n是已知量。点p是满足该方程的任意未知点。从几何角度看,该方程表示“从a到p的矢量垂直于平面法线”。如果p不在平面上,则 ( p - a ) 不会与n成直角(图 2.35 )。
Figure 2.35. Any of the points p shown are in the plane with normal vector n that includes point a if Equation (2.21) is satisfied.
图 2.35.如果满足方程 (2.21),则所示的任何点p都位于包含点a的法向量为n的平面上。
Sometimes, we want the implicit equation for a plane through points a, b, and c. The normal to this plane can be found by taking the cross product of any two vectors in the plane. One such cross product is
$$ \mathbf{n} = (\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{a}). $$
有时,我们想要通过点a 、 b和c 的平面的隐式方程。可以通过对平面中任意两个向量进行叉积来找到该平面的法线。其中一个叉积是
This allows us to write the implicit plane equation:
$$ (\mathbf{p} - \mathbf{a}) \cdot \big( (\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{a}) \big) = 0. \tag{2.22} $$
这使得我们可以写出隐式平面方程:
A geometric way to read this equation is that the volume of the parallelepiped defined by p – a, b – a, and c – a is zero; i.e., they are coplanar. This can only be true if p is in the same plane as a, b, and c. The full-blown Cartesian representation for this is given by the determinant (this is discussed in more detail in Section 6.3):
$$ \begin{vmatrix} x - x_a & y - y_a & z - z_a \\ x_b - x_a & y_b - y_a & z_b - z_a \\ x_c - x_a & y_c - y_a & z_c - z_a \end{vmatrix} = 0. \tag{2.23} $$
用几何学的方法来理解这个方程,就是p – a 、 b – a和c – a定义的平行六面体的体积为零;也就是说,它们是共面的。只有当p与a 、 b和c位于同一平面时,这才是正确的。完整的笛卡尔表示由行列式给出(第 6.3 节将对此进行更详细的讨论):
The determinant can be expanded (see Section 6.3 for the mechanics of expanding determinants) to the bloated form with many terms.
行列式可以展开(有关展开行列式的机制见第 6.3 节)为具有多项式的膨胀形式。
Equations (2.22) and (2.23) are equivalent, and comparing them is instructive. Equation (2.22) is easy to interpret geometrically and will yield efficient code. In addition, it is relatively easy to avoid a typographic error that compiles into incorrect code if it takes advantage of debugged cross and dot product code. Equation (2.23) is also easy to interpret geometrically and will be efficient provided an efficient 3 × 3 determinant function is implemented. It is also easy to implement without a typo if a function determinant (a, b, c) is available. It will be especially easy for others to read your code if you rename the determinant function volume. So both Equations (2.22) and (2.23) map well into code. The full expansion of either equation into x-, y-, and z-components is likely to generate typos. Such typos are likely to compile and, thus, to be especially pesky. This is an excellent example of clean math generating clean code and bloated math generating bloated code.
方程 (2.22) 和 (2.23) 是等价的,对它们进行比较很有启发。方程 (2.22) 很容易从几何角度解释,并且可以产生高效的代码。此外,如果利用已调试的交叉积和点积代码,则可以相对容易地避免编译成错误代码的印刷错误。方程 (2.23) 也很容易从几何角度解释,并且只要实现了高效的 3 × 3 行列式函数,它就会很高效。如果有函数行列式( a , b , c ),它也很容易实现而没有打字错误。如果将行列式函数重命名为体积,其他人将特别容易阅读您的代码。因此,方程 (2.22) 和 (2.23) 都可以很好地映射到代码中。将任何一个方程完全展开为x -、 y - 和z -分量都可能产生打字错误。这种打字错误很可能通过编译,因此特别麻烦。这是干净的数学生成干净的代码和臃肿的数学生成臃肿的代码的绝佳例子。
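Both forms of the plane equation can be written on top of small, separately debugged helpers. The sketch below uses plain tuples and hypothetical helper names (`sub`, `dot`, `cross`, `volume`); `volume` is the 3 × 3 determinant with its three arguments as rows, following the renaming suggested above.

```python
def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def volume(u, v, w):
    """Signed volume of the parallelepiped with edges u, v, w (a 3x3 determinant)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def plane_cross_form(p, a, b, c):
    """(p - a) . ((b - a) x (c - a)): zero when p is in the plane through a, b, c."""
    return dot(sub(p, a), cross(sub(b, a), sub(c, a)))

def plane_det_form(p, a, b, c):
    """The same test written as a determinant."""
    return volume(sub(p, a), sub(b, a), sub(c, a))
```

The component-level arithmetic lives only inside `cross` and `volume`; the two plane functions read almost exactly like the vector equations they implement.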
Just as quadratic polynomials in two variables define quadric curves in 2D, quadratic polynomials in x, y, and z define quadric surfaces in 3D. For instance, a sphere with center c and radius r can be written as
$$ f(\mathbf{p}) = \|\mathbf{p} - \mathbf{c}\|^2 - r^2 = 0, $$
正如二元二次多项式定义二维中的二次曲线一样, x 、 y和z中的二次多项式定义三维中的二次曲面。例如,球面可以写成
and an axis-aligned ellipsoid may be written as
$$ f(\mathbf{p}) = \frac{(x - x_c)^2}{a^2} + \frac{(y - y_c)^2}{b^2} + \frac{(z - z_c)^2}{c^2} - 1 = 0. $$
轴对齐的椭圆体可以写成
One might hope that an implicit 3D curve could be created with the form f (p) = 0. However, all such curves are just degenerate surfaces and are rarely useful in practice. A 3D curve can be constructed from the intersection of two simultaneous implicit equations:
$$ \begin{aligned} f(\mathbf{p}) &= 0, \\ g(\mathbf{p}) &= 0. \end{aligned} $$
人们可能希望能够创建形式为f ( p ) = 0 的隐式三维曲线。然而,所有此类曲线都只是退化曲面,在实践中很少有用。可以通过两个同时隐式方程的交集构造三维曲线:
For example, a 3D line can be formed from the intersection of two implicit planes. Typically, it is more convenient to use parametric curves instead; they are discussed in the following sections.
例如,一条三维线可以由两个隐式平面的相交形成。通常,使用参数曲线更为方便;它们将在以下章节中讨论。
A parametric curve is controlled by a single parameter that can be considered a sort of index that moves continuously along the curve. Such curves have the form
$$ (x, y) = \big( g(t), h(t) \big). $$
参数曲线由单个参数控制,该参数可以看作是沿曲线连续移动的一种指数。此类曲线的形式为
Here, (x, y) is a point on the curve, and t is the parameter that influences the curve. For a given t, there will be some point determined by the functions g and h. For continuous g and h, a small change in t will yield a small change in x and y. Thus, as t continuously changes, points are swept out in a continuous curve. This is a nice feature because we can use the parameter t to explicitly construct points on the curve. Often, we can write a parametric curve in vector form,
$$ (x, y) = \mathbf{f}(t), $$
这里,( x,y )是曲线上的一个点, t是影响曲线的参数。对于给定的t ,将存在由函数g和h确定的某个点。对于连续的g和h , t的微小变化将导致x和y的微小变化。因此,随着t的连续变化,点会以连续曲线的形式扫过。这是一个很好的特性,因为我们可以使用参数t来明确构造曲线上的点。通常,我们可以以矢量形式写出参数曲线,
where f is a vector-valued function, f : ℝ → ℝ². Such vector functions can generate very clean code, so they should be used when possible.
其中f是矢量值函数, f : R → R 2 。此类向量函数可以生成非常干净的代码,因此应尽可能使用它们。
We can think of the curve as the position of a point as a function of time. The curve can go anywhere and could loop and cross itself. We can also think of the curve as having a velocity at any point. For example, the point p(t) is traveling slowly near t = –2 and quickly between t = 2 and t = 3. This type of “moving point” vocabulary is often used when discussing parametric curves even when the curve is not describing a moving point.
我们可以将具有位置的曲线视为时间函数。曲线可以到达任何地方,可以循环并交叉自身。我们还可以认为曲线在任何一点都有速度。例如,点p ( t ) 在t = -2附近缓慢移动,在t = 2 和t = 3 之间快速移动。即使曲线不是在描述移动点,在讨论参数曲线时也经常使用这种“移动点”词汇。
A parametric line in 2D that passes through points p0 = (x0, y0) and p1 = (x1,y1) can be written as
$$ (x, y) = \big( x_0 + t(x_1 - x_0),\; y_0 + t(y_1 - y_0) \big). $$
经过点p 0 = ( x 0 , y 0 ) 和p 1 = ( x 1 , y 1 ) 的二维参数线可以写成
Because the formulas for x and y have such similar structure, we can use the vector form for p = (x, y) (Figure 2.36):
$$ \mathbf{p}(t) = \mathbf{p}_0 + t(\mathbf{p}_1 - \mathbf{p}_0). $$
由于x和y的公式具有非常相似的结构,我们可以使用p = ( x, y ) 的向量形式(图 2.36 ):
Figure 2.36. A 2D parametric line through p0 and p1. The line segment defined by t ∈ [0,1] is shown in bold.
图 2.36过p 0和p 1 的二维参数线。t ∈ [0,1] 定义的线段以粗体显示。
You can read this in geometric form as “start at point p0 and go some distance toward p1 determined by the parameter t.” A nice feature of this form is that p(0) = p0 and p(1) = p1. Since the point changes linearly with t, the value of t between p0 and p1 measures the fractional distance between the points. Points with t < 0 are to the “far” side of p0, and points with t > 1 are to the “far” side of p1.
您可以用几何形式将其理解为“从点p 0开始,向p 1移动一段距离,该距离由参数t决定”。此形式的一个很好的特点是p (0) = p 0和p (1) = p 1 。由于点随t线性变化,因此p 0和p 1之间的t值测量了点之间的分数距离。t < 0 的点位于p 0的“远”侧, t > 1 的点位于p 1的“远”侧。
Parametric lines can also be described as just a point o and a vector d:
参数线也可以描述为一个点o和一个向量d :
$$\mathbf{p}(t) = \mathbf{o} + t\,\mathbf{d}.$$
When the vector d has unit length, the line is arc-length parameterized. This means t is an exact measure of distance along the line. Any parametric curve can be arc-length parameterized, which is obviously a very convenient form, but not all can be converted analytically.
当向量d具有单位长度时,直线是弧长参数化的。这意味着t是沿直线距离的精确测量值。任何参数曲线都可以是弧长参数化的,这显然是一种非常方便的形式,但并非所有曲线都可以进行解析转换。
A circle with center (xc,yc) and radius r has a parametric form:
以 ( xc , yc ) 为圆心,以r为半径的圆具有参数形式:
$$x = x_c + r\cos\phi, \qquad y = y_c + r\sin\phi.$$
To ensure that there is a unique parameter ϕ for every point on the curve, we can restrict its domain: ϕ ∈ [0, 2π) or ϕ ∈ (–π, π] or any other half-open interval of length 2π.
为了确保曲线上每个点都有一个唯一的参数 ϕ,我们可以限制它的定义域: ϕ ∈ [0, 2 π ) 或 ϕ ∈ (-π, π] 或任何其他长度为 2 π的半开区间。
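As a small illustration, the parametric form of the circle can be evaluated directly to generate points on it. This is only a sketch in Python; the function name `circle_point`, the sample count, and the example center and radius are assumptions made for the example, not part of the text.

```python
import math

def circle_point(xc, yc, r, phi):
    """Point on the circle with center (xc, yc) and radius r at parameter phi."""
    return (xc + r * math.cos(phi), yc + r * math.sin(phi))

# Sample n points with phi in the half-open interval [0, 2*pi),
# so that no point on the circle is generated twice.
n = 8
points = [circle_point(1.0, 2.0, 3.0, 2.0 * math.pi * k / n) for k in range(n)]
```

Because the interval is half-open, the last sample stops one step short of the first instead of repeating it.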
An axis-aligned ellipse can be constructed by scaling the x and y parametric equations separately:
可以通过分别缩放x和y参数方程来构建轴对齐椭圆:
$$x = x_c + a\cos\phi, \qquad y = y_c + b\sin\phi.$$
A 3D parametric curve operates much like a 2D parametric curve:
3D 参数曲线的操作与 2D 参数曲线非常相似:
$$x = f(t), \qquad y = g(t), \qquad z = h(t).$$
For example, a spiral around the z-axis is written as
例如,绕z轴的螺旋线写为
$$x = \cos t, \qquad y = \sin t, \qquad z = t.$$
As with 2D curves, the functions f , g, and h are defined on a domain D ⊂ R if we want to control where the curve starts and ends. In vector form, we can write
与二维曲线一样,如果我们想控制曲线的起点和终点,函数f 、 g和h定义在域D ⊂ R 上。以矢量形式,我们可以写成
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \mathbf{p}(t).$$
In this chapter, we only discuss 3D parametric lines in detail. General 3D parametric curves are discussed more extensively in Chapter 15.
在本章中,我们仅详细讨论 3D 参数线。第 15 章将更广泛地讨论一般 3D 参数曲线。
A 3D parametric line can be written as a straightforward extension of the 2D parametric line, e.g.,
三维参数线可以写成二维参数线的直接扩展,例如,
$$x = 2 + 7t, \qquad y = 1 + 2t, \qquad z = 3 - 5t.$$
This is cumbersome and does not translate well to code variables, so we will write it in vector form:
这很麻烦,并且不能很好地转换为代码变量,因此我们将其写为向量形式:
$$\mathbf{p} = \mathbf{o} + t\,\mathbf{d},$$
where, for this example, o and d are given by
其中,对于此示例, o和d由下式给出
$$\mathbf{o} = (2, 1, 3), \qquad \mathbf{d} = (7, 2, -5).$$
Note that this is very similar to the 2D case. The way to visualize this is to imagine that the line passes through o and is parallel to d. Given any value of t, you get some point p(t) on the line. For example, at t = 2, p(t) = (2, 1, 3) + 2(7, 2, –5) = (16, 5, –7) . This general concept is the same as for two dimensions (Figure 2.36).
请注意,这与二维的情况非常相似。可视化的方法是想象一条线经过o并且与d平行。给定任何t值,你都会得到线上的某个点p ( t )。例如,在t = 2 时, p ( t ) = (2, 1, 3) + 2(7, 2, – 5) = (16, 5, – 7) 。这个一般概念与二维相同(图 2.36 )。
As in 2D, a line segment can be described by a 3D parametric line and an interval t ∈ [ta,tb]. The line segment between two points a and b is given by p(t) = a + t(b – a) with t ∈ [0, 1]. Here, p(0) = a, p(1) = b, and p(0.5) = (a + b)/2, the midpoint between a and b.
与二维一样,线段可以用三维参数线和区间t ∈ [ t a , t b ] 来描述。两点a和b之间的线段由p ( t ) = a + t ( b – a ) 给出,其中t ∈ [0, 1]。这里, p (0) = a , p (1) = b ,以及p (0.5) = ( a + b ) / 2,即a和b之间的中点。
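The vector form translates almost directly into code. The following Python sketch evaluates a parametric line and a line segment; the helper names `line_point` and `segment_point` and the use of plain tuples for points are assumptions made for this example.

```python
def line_point(o, d, t):
    """Evaluate p(t) = o + t*d for a point o and direction d given as tuples."""
    return tuple(oi + t * di for oi, di in zip(o, d))

def segment_point(a, b, t):
    """Evaluate p(t) = a + t*(b - a); t in [0, 1] stays on the segment."""
    d = tuple(bi - ai for ai, bi in zip(a, b))
    return line_point(a, d, t)

# The worked example from the text: o = (2, 1, 3), d = (7, 2, -5), t = 2.
p = line_point((2, 1, 3), (7, 2, -5), 2)

# The midpoint of a segment is the point at t = 0.5.
mid = segment_point((0.0, 0.0, 0.0), (2.0, 4.0, 6.0), 0.5)
```

The same two functions work unchanged for 2D points, since they never assume a particular number of coordinates.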
A ray, or half-line, is a 3D parametric line with a half-open interval, usually [0, ∞) . From now on, we will refer to all lines, line segments, and rays as “rays.” This is sloppy, but corresponds to common usage and makes the discussion simpler.
射线,或半线,是具有半开区间的 3D 参数线,通常为 [0, ∞) 。从现在开始,我们将所有线、线段和射线都称为“射线”。这很草率,但符合常见用法,并使讨论更简单。
The parametric approach can be used to define surfaces in 3D space in much the same way we define curves, except that there are two parameters to address the two-dimensional area of the surface. These surfaces have the form
参数化方法可用于定义三维空间中的曲面,其方式与定义曲线的方式非常相似,只是有两个参数用于处理曲面的二维区域。这些曲面具有以下形式
$$x = f(u, v), \qquad y = g(u, v), \qquad z = h(u, v),$$
or, in vector form,
或者以矢量形式,
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \mathbf{p}(u, v).$$
With implicit surfaces, the derivative of the function f gave us the surface normal. With parametric surfaces, the derivatives of p also give information about the surface geometry.
对于隐式曲面,函数f的导数给出了曲面法线。对于参数曲面, p的导数还给出了曲面几何的信息。
Consider the function q(t) = p(t, v0). This function defines a parametric curve obtained by varying u while holding v fixed at the value v0. This curve, called an isoparametric curve (or sometimes “isoparm” for short), lies in the surface. The derivative q′ of q gives a vector tangent to the curve, and since the curve lies in the surface, the vector q′ is also tangent to the surface. Since it was obtained by varying one argument of p, the vector q′ is the partial derivative of p with respect to u, which we’ll denote pu. A similar argument shows that the partial derivative pv gives the tangent to the isoparametric curves for constant u, which is a second tangent vector to the surface.
考虑函数q ( t ) = p ( t, v 0 ) 。该函数定义了一条参数曲线,该曲线通过改变u获得,同时将v保持在v 0 的值不变。这条曲线称为等参曲线(有时简称为“等参线”)位于曲面上。q的导数给出与曲线相切的向量,由于曲线位于曲面上,向量q也位于曲面上。由于它是通过改变p的一个参数获得的,因此向量q是p对u的偏导数,我们将其表示为p u 。类似的论证表明,偏导数p v给出常数u 时等参曲线的切线,它是曲面的第二个切向量。
Figure 2.37. The geometry for spherical coordinates.
图 2.37.球坐标的几何形状。
The derivative of p, then, gives two tangent vectors at any point on the surface. The normal to the surface may be found by taking the cross product of these vectors: since both are tangent to the surface, their cross product, which is perpendicular to both tangents, is normal to the surface. The right-hand rule for cross products provides a way to decide which side is the front, or outside, of the surface; we will use the convention that the vector
然后, p的导数给出曲面上任意一点的两个切向量。曲面的法线可以通过取这些向量的叉积来找到:由于两者都与曲面相切,因此它们的叉积(垂直于两个切线)垂直于曲面。叉积的右手定则提供了一种确定哪一侧是曲面的前端或外侧的方法;我们将使用以下惯例:向量
$$\mathbf{n} = \mathbf{p}_u \times \mathbf{p}_v$$
points toward the outside of the surface.
指向表面的外侧。
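To make the cross-product construction concrete, here is a Python sketch that estimates the two partial derivatives numerically and crosses them. The sphere parameterization (u measured from the +z axis, v measured around it), the central-difference step `h`, and all function names are assumptions chosen for the illustration; with this ordering of the parameters the resulting normal points away from the sphere's center.

```python
import math

def sphere(u, v, r=1.0):
    """Assumed parameterization: u = angle from the +z axis, v = angle around z."""
    return (r * math.cos(v) * math.sin(u),
            r * math.sin(v) * math.sin(u),
            r * math.cos(u))

def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface_normal(p, u, v, h=1e-5):
    """Unit normal p_u x p_v, with the partials estimated by central differences."""
    pu = tuple((a - b) / (2 * h) for a, b in zip(p(u + h, v), p(u - h, v)))
    pv = tuple((a - b) / (2 * h) for a, b in zip(p(u, v + h), p(u, v - h)))
    n = cross(pu, pv)
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

n = surface_normal(sphere, 1.0, 0.5)
```

For a unit sphere centered at the origin, the outward unit normal at a point is the point itself, which gives an easy sanity check. At the poles (u = 0 or u = π) the tangent pv vanishes, so this sketch would divide by zero there.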
Implicit curves in 2D or surfaces in 3D are defined by scalar-valued functions of two or three variables, f : ℝ² → ℝ or f : ℝ³ → ℝ, and the curve or surface consists of all points where the function is zero:
二维中的隐式曲线或三维中的曲面由两个或三个变量的标量值函数定义, f : ℝ 2 → ℝ 或f : ℝ 3 → ℝ,曲面由函数为零的所有点组成:
$$S = \{\mathbf{p} \mid f(\mathbf{p}) = 0\}.$$
Parametric curves in 2D or 3D are defined by vector-valued functions of one variable, p : D ⊂ ℝ → ℝ² or p : D ⊂ ℝ → ℝ³, and the curve is swept out as t varies over all of D:
二维或三维中的参数曲线由一个变量的向量值函数定义, p : D ⊂ ℝ → ℝ 2或p : D ⊂ ℝ → ℝ 3 ,并且曲线随着t在整个D上的变化而变化:
$$S = \{\mathbf{p}(t) \mid t \in D\}.$$
Parametric surfaces in 3D are defined by vector-valued functions of two variables, p : D ⊂ ℝ² → ℝ³, and the surface consists of the images of all points (u, v) in the domain:
三维中的参数曲面由两个变量的向量值函数定义, p : D⊂ℝ2 → ℝ3 ,曲面由域内所有点( u,v )的图像组成:
$$S = \{\mathbf{p}(u, v) \mid (u, v) \in D\}.$$
For implicit curves and surfaces, the normal vector is given by the derivative of f (the gradient), and the tangent vector (for a curve) or vectors (for a surface) can be derived from the normal by constructing a basis.
对于隐式曲线和曲面,法向量由f的导数(梯度)给出,并且可以通过构建基从法向量中导出切向量(对于曲线)或向量(对于曲面)。
For parametric curves and surfaces, the derivative of p gives the tangent vector (for a curve) or vectors (for a surface), and the normal vector can be derived from the tangents by constructing a basis.
对于参数曲线和曲面, p的导数给出切向量(对于曲线)或向量(对于曲面),而法向量可以通过构建基从切线推导出来。
Perhaps the most common mathematical operation in graphics is linear interpolation. We have already seen an example of linear interpolation of position to form line segments in 2D and 3D, where two points a and b are associated with a parameter t to form the line p = (1 – t)a + tb. This is interpolation because p goes through a and b exactly at t = 0 and t = 1. It is linear interpolation because the weighting terms t and 1 – t are linear polynomials of t.
图形学中最常见的数学运算可能是线性插值。我们已经看到了位置线性插值的例子,它形成二维和三维中的线段,其中两个点a和b与参数t相关联,形成直线p = (1 – t ) a + t b 。这是插值,因为p在t = 0 和t = 1 时恰好经过a和b 。这是线性插值,因为权重项t和 1 – t是t的线性多项式。
Another common linear interpolation is among a set of positions on the x-axis: x_0, x_1, ..., x_n, and for each x_i, we have an associated height, y_i. We want to create a continuous function y = f(x) that interpolates these positions, so that f goes through every data point, i.e., f(x_i) = y_i. For linear interpolation, the points (x_i, y_i) are connected by straight line segments. It is natural to use parametric line equations for these segments. The parameter t is just the fractional distance between x_i and x_{i+1}:
另一种常见的线性插值是在x轴上的一组位置之间进行: x 0 , x 1 , ... , x n ,并且对于每个 x ,我们都有一个关联的高度 y 。我们要创建一个连续函数y = f ( x ) 来插值这些位置,以便f经过每个数据点,即f (x) = y 。对于线性插值,点 (x,y) 由直线段连接。对这些线段使用参数线方程是很自然的。参数t只是 x 和 x +1之间的分数距离:
$$f(x) = y_i + \frac{x - x_i}{x_{i+1} - x_i}\,(y_{i+1} - y_i). \tag{2.26}$$
Because the weighting functions are linear polynomials of x, this is linear interpolation.
因为加权函数是x的线性多项式,所以这是线性插值。
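A piecewise-linear interpolant of this kind can be sketched in a few lines of Python. The function name `interpolate`, the use of `bisect` to locate the segment, and the choice to reject values outside the data range are assumptions made for the example rather than anything prescribed by the text.

```python
from bisect import bisect_right

def interpolate(xs, ys, x):
    """Piecewise-linear f(x) through (xs[i], ys[i]); xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the data range")
    # Index i of the segment [xs[i], xs[i+1]] that contains x.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])   # fractional distance along the segment
    return (1 - t) * ys[i] + t * ys[i + 1]

xs = [0.0, 1.0, 3.0]
ys = [1.0, 3.0, 2.0]
y = interpolate(xs, ys, 2.0)   # halfway between (1, 3) and (3, 2)
```

The return expression is the same (1 – t)A + tB pattern used for line segments, applied here to heights instead of points.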
The two examples above have the common form of linear interpolation. We create a variable t that varies from 0 to 1 as we move from data item A to data item B. Intermediate values are just the function (1 – t)A + tB. Notice that Equation (2.26) has this form with
上述两个例子具有线性插值的共同形式。我们创建一个变量t ,随着我们从数据项A移动到数据项B ,该变量从 0 变为 1。中间值就是函数 (1 – t ) A + tB 。请注意,公式 (2.26) 具有这种形式
$$t = \frac{x - x_i}{x_{i+1} - x_i}.$$
Triangles in both 2D and 3D are the fundamental modeling primitive in many graphics programs. Often information such as color is tagged onto triangle vertices, and this information is interpolated across the triangle. The coordinate system that makes such interpolation straightforward is called barycentric coordinates; we will develop these from scratch. We will also discuss 2D triangles, which must be understood before we can draw their pictures on 2D screens.
二维和三维中的三角形是许多图形程序中的基本建模图元。通常,颜色等信息被标记到三角形顶点上,并且这些信息在三角形上进行插值。使这种插值变得简单的坐标系称为重心坐标;我们将从头开始开发这些坐标。我们还将讨论 2D 三角形,我们必须先理解这些三角形,然后才能在 2D 屏幕上绘制它们的图像。
If we have a 2D triangle defined by 2D points a, b, and c, we can first find its area:
如果我们有一个由二维点a 、 b和c定义的二维三角形,我们首先可以找到它的面积:
$$\text{area} = \frac{1}{2}\begin{vmatrix} x_b - x_a & x_c - x_a \\ y_b - y_a & y_c - y_a \end{vmatrix} = \frac{1}{2}\left(x_a y_b + x_b y_c + x_c y_a - x_a y_c - x_b y_a - x_c y_b\right).$$
The derivation of this formula can be found in Section 6.3. This area will have a positive sign if the points a, b, and c are in counterclockwise order and a negative sign otherwise.
这个公式的推导可以在6.3 节中找到。如果点a 、 b和c是按逆时针顺序排列的,则该区域为正号,否则为负号。
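The sign behavior is easy to check numerically. The Python sketch below computes the signed area as half of the 2×2 determinant of the two edge vectors leaving a; the function name `signed_area` and the example triangle are assumptions made for the illustration.

```python
def signed_area(a, b, c):
    """Signed area of the 2D triangle abc: positive if a, b, c are counterclockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

ccw = signed_area((0, 0), (4, 0), (0, 3))   # counterclockwise vertex order
cw = signed_area((0, 0), (0, 3), (4, 0))    # same triangle, clockwise order
```

Swapping any two vertices reverses the orientation and flips the sign, while the magnitude stays the same.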
Often in graphics, we wish to assign a property, such as color, at each triangle vertex and smoothly interpolate the value of that property across the triangle. There are a variety of ways to do this, but the simplest is to use barycentric coordinates. One way to think of barycentric coordinates is as a nonorthogonal coordinate system as was discussed briefly in Section 2.4.2. Such a coordinate system is shown in Figure 2.38, where the coordinate origin is a and the vectors from a to b and c are the basis vectors. With that origin and those basis vectors, any point p can be written as
在图形学中,我们经常希望为每个三角形顶点分配一个属性(例如颜色),并在整个三角形中平滑地插入该属性的值。有多种方法可以做到这一点,但最简单的方法是使用重心坐标。一种思考重心坐标的方法是将其视为非正交坐标系,如第 2.4.2 节中简要讨论的那样。这种坐标系如图 2.38所示,其中坐标原点为a ,从a到b和c的向量为基向量。有了该原点和这些基向量,任何点p都可以写成
$$\mathbf{p} = \mathbf{a} + \beta(\mathbf{b} - \mathbf{a}) + \gamma(\mathbf{c} - \mathbf{a}). \tag{2.28}$$
Figure 2.38. A 2D triangle with vertices a, b, c can be used to set up a nonorthogonal coordinate system with origin a and basis vectors (b – a) and (c – a). A point is then represented by an ordered pair (β, γ). For example, the point with coordinates (β, γ) = (2.0, 0.5) is p = a + 2.0(b – a) + 0.5(c – a).
图 2.38.顶点为a 、 b 、 c 的二维三角形可用于建立非正交坐标系,其原点为a ,基向量为 ( b - a ) 和 ( c - a )。然后,一个点由有序对 ( β , γ ) 表示。例如,点p = (2.0, 0.5),即p = a +2.0 ( b - a ) + 0.5 ( c - a )。
Note that we can reorder the terms in Equation (2.28) to get
请注意,我们可以重新排序公式 (2.28) 中的项,得到
$$\mathbf{p} = (1 - \beta - \gamma)\,\mathbf{a} + \beta\,\mathbf{b} + \gamma\,\mathbf{c}.$$
Often people define a new variable α to improve the symmetry of the equations:
人们常常定义一个新变量α来改善方程的对称性:
$$\alpha \equiv 1 - \beta - \gamma,$$
which yields the equation
得出以下等式
$$\mathbf{p}(\alpha, \beta, \gamma) = \alpha\,\mathbf{a} + \beta\,\mathbf{b} + \gamma\,\mathbf{c}, \tag{2.29}$$
with the constraint that
约束条件是
$$\alpha + \beta + \gamma = 1. \tag{2.30}$$
Barycentric coordinates seem like an abstract and unintuitive construct at first, but they turn out to be powerful and convenient. You may find it useful to think of how street addresses would work in a city where there are two sets of parallel streets, but where those sets are not at right angles. The natural system would essentially be barycentric coordinates, and you would quickly get used to them. Barycentric coordinates are defined for all points on the plane. A particularly nice feature of barycentric coordinates is that a point p is inside the triangle formed by a, b, and c if and only if
重心坐标乍一看似乎是一种抽象且不直观的结构,但事实证明它功能强大且方便。您可能会发现,想象一下在有两组平行街道但两组街道不成直角的城市中街道地址的工作原理很有用。自然系统本质上是重心坐标,您很快就会习惯它们。重心坐标适用于平面上的所有点。重心坐标的一个特别好的特征是,当且仅当点p位于由a 、 b和c形成的三角形内时
$$0 < \alpha < 1, \qquad 0 < \beta < 1, \qquad 0 < \gamma < 1.$$
If one of the coordinates is zero and the other two are between zero and one, then you are on an edge. If two of the coordinates are zero, then the other is one, and you are at a vertex. Another nice property of barycentric coordinates is that Equation (2.29) in effect mixes the coordinates of the three vertices in a smooth way. The same mixing coefficients (α, β, γ) can be used to mix other properties, such as color, as we will see in the next chapter.
如果其中一个坐标为零,而其他两个坐标介于零和一之间,则您位于边缘。如果两个坐标为零,则另一个坐标为一,则您位于顶点。重心坐标的另一个好特性是,方程 (2.29) 实际上以平滑的方式混合了三个顶点的坐标。相同的混合系数 ( α、β、γ ) 可用于混合其他属性,例如颜色,我们将在下一章中看到。
Given a point p, how do we compute its barycentric coordinates? One way is to write Equation (2.28) as a linear system with unknowns β and γ, solve, and set α = 1 – β – γ. That linear system is
给定一个点p ,我们如何计算它的重心坐标?一种方法是将方程 (2.28) 写成具有未知数β和γ的线性系统,求解并设置 α = 1 – β – γ 。该线性系统是
$$\begin{bmatrix} x_b - x_a & x_c - x_a \\ y_b - y_a & y_c - y_a \end{bmatrix} \begin{bmatrix} \beta \\ \gamma \end{bmatrix} = \begin{bmatrix} x_p - x_a \\ y_p - y_a \end{bmatrix}. \tag{2.31}$$
Although it is straightforward to solve Equation (2.31) algebraically, it is often fruitful to compute a direct geometric solution.
尽管用代数方法求解方程 (2.31) 很简单,但计算直接几何解通常会很有成效。
One geometric property of barycentric coordinates is that they are the signed scaled distance from the lines through the triangle sides, as is shown for β in Figure 2.39. Recall from Section 2.7.2 that evaluating the equation f (x, y) for the line f (x, y) = 0 returns the scaled signed distance from (x, y) to the line. Also recall that if f (x, y) = 0 is the equation for a particular line, so is kf (x, y) = 0 for any nonzero k. Changing k scales the distance and controls which side of the line has positive signed distance, and which negative. We would like to choose k such that, for example, kf (x, y) = β. Since k is only one unknown, we can force this with one constraint, namely, that at point b, we know β = 1. So if the line fac(x, y) = 0 goes through both a and c, then we can compute β for a point (x, y) as follows:
重心坐标的一个几何性质是,它们是从通过三角形边的直线到原点的带符号的缩放距离,如图 2.39中β所示。回想一下2.7.2 节,对直线f ( x, y ) = 0 求方程f ( x, y ) 的值将返回从 ( x, y ) 到该直线的带符号的缩放距离。还记得吗,如果f ( x, y ) = 0 是某条直线的方程,那么对于任何非零k , kf ( x, y ) = 0 也是这样的。改变k会缩放距离并控制直线的哪一侧具有正符号距离,哪一侧具有负符号距离。我们希望选择k使得,例如, kf ( x, y ) = β 。由于k只有一个未知数,我们可以用一个约束来强制实现这一点,即在点b处,我们知道β = 1。因此,如果直线f ac ( x, y ) = 0 同时经过a和c ,则我们可以按如下方式计算点 ( x, y ) 的β :
$$\beta = \frac{f_{ac}(x, y)}{f_{ac}(x_b, y_b)},$$
and we can compute γ and α in a similar fashion. For efficiency, it is usually wise to compute only two of the barycentric coordinates directly and to compute the third using Equation (2.30).
我们可以用类似的方式计算γ和 α。为了提高效率,通常明智的做法是直接计算两个重心坐标,然后使用公式 (2.30) 计算第三个坐标。
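The geometric approach can be sketched in Python as follows. The helper names `line_through`, `barycentric`, and `inside` are assumptions for this example, and the implicit line function uses the standard two-point form of a 2D line. The sketch assumes a triangle with nonzero area; a degenerate triangle would cause a division by zero.

```python
def line_through(p, q):
    """Return f(x, y) whose zero set is the line through the 2D points p and q."""
    (xp, yp), (xq, yq) = p, q
    return lambda x, y: (yp - yq) * x + (xq - xp) * y + xp * yq - xq * yp

def barycentric(a, b, c, p):
    """Barycentric coordinates (alpha, beta, gamma) of p with respect to triangle abc."""
    f_ac = line_through(a, c)
    f_ab = line_through(a, b)
    beta = f_ac(*p) / f_ac(*b)     # scaled so that beta = 1 at vertex b
    gamma = f_ab(*p) / f_ab(*c)    # scaled so that gamma = 1 at vertex c
    return (1.0 - beta - gamma, beta, gamma)

def inside(coords):
    """True when all three coordinates lie strictly between 0 and 1."""
    return all(0.0 < w < 1.0 for w in coords)

a, b, c = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
coords = barycentric(a, b, c, (1.0, 1.0))
```

Only two of the coordinates are computed from line functions; the third comes from the constraint that the three sum to one, as the text recommends.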
To find this “ideal” form for the line through p0 and p1, we can first use the technique of Section 2.7.2 to find some valid implicit lines through the vertices. Equation (2.17) gives us
为了找到通过p 0和p 1 的直线的“理想”形式,我们可以首先使用2.7.2 节中的技巧来找到一些通过顶点的有效隐式直线。公式 (2.17) 给出
$$f_{ab}(x, y) \equiv (y_a - y_b)x + (x_b - x_a)y + x_a y_b - x_b y_a = 0.$$
Note that f_ab(x_c, y_c) probably does not equal one, so it is probably not the ideal form we seek. By dividing through by f_ab(x_c, y_c), we get
请注意, f ab ( x c , y c ) 可能不等于 1,因此它可能不是我们寻求的理想形式。通过除以f ab ( x c , y c ),我们得到
$$\gamma = \frac{(y_a - y_b)x + (x_b - x_a)y + x_a y_b - x_b y_a}{(y_a - y_b)x_c + (x_b - x_a)y_c + x_a y_b - x_b y_a}.$$
Figure 2.39. The barycentric coordinate β is the signed scaled distance from the line through a and c.
图 2.39。重心坐标β是从a和c 点的直线开始的带符号的缩放距离。
The presence of the division might worry us because it introduces the possibility of divide-by-zero, but this cannot occur for triangles with areas that are not near zero. There are analogous formulas for α and β, but typically only one is needed:
除法的存在可能会让我们担心,因为它引入了除以零的可能性,但对于面积不接近零的三角形,这种情况不会发生。α 和β有类似的公式,但通常只需要一个:
$$\beta = \frac{(y_a - y_c)x + (x_c - x_a)y + x_a y_c - x_c y_a}{(y_a - y_c)x_b + (x_c - x_a)y_b + x_a y_c - x_c y_a}, \qquad \alpha = 1 - \beta - \gamma.$$
Another way to compute barycentric coordinates is to compute the areas Aa, Ab, and Ac, of subtriangles as shown in Figure 2.40. Barycentric coordinates obey
计算重心坐标的另一种方法是计算子三角形的面积A a 、 A b和A c ,如图 2.40所示。重心坐标遵循
$$\alpha = A_a / A, \qquad \beta = A_b / A, \qquad \gamma = A_c / A, \tag{2.33}$$
Figure 2.40. The barycentric coordinates are proportional to the areas of the three subtriangles shown.
图 2.40.重心坐标与所示三个子三角形的面积成比例。
where A is the area of the triangle. Note that A = A_a + A_b + A_c, so it can be computed with two additions rather than a full area formula. This rule still holds for points outside the triangle if the areas are allowed to be signed. The reason for this is shown in Figure 2.41. Note that these are signed areas and will be computed correctly as long as the same signed area computation is used for both A and the subtriangles A_a, A_b, and A_c.
其中A是三角形的面积。注意A = A a + A b + A c ,因此可以用两次加法来计算,而不必使用完整的面积公式。如果允许对面积进行符号表示,则此规则对于三角形外部的点仍然适用。原因如图 2.41所示。注意,这些是有符号的面积,只要对A以及子三角形A a 、 A b和A c使用相同的有符号面积计算,就可以正确计算。
Figure 2.41. The areas of the two triangles shown are each half base times height and are thus the same, as is the area of any triangle with a vertex on the β = 0.5 line. The height, and thus the area, is proportional to β.
图 2.41。所示两个三角形的面积等于底面的一半乘以高,因此相等,任何顶点位于 β = 0.5 线上的三角形也是如此。高与 β 成正比,因此面积也与 β 成正比。
One wonderful thing about barycentric coordinates is that they extend almost transparently to 3D. If we assume the points a, b, and c are 3D, then we can still use the representation
重心坐标的奇妙之处在于,它们几乎可以透明地扩展到三维空间。如果我们假设点a 、 b和c是三维的,那么我们仍然可以使用以下表示
$$\mathbf{p} = (1 - \beta - \gamma)\,\mathbf{a} + \beta\,\mathbf{b} + \gamma\,\mathbf{c}.$$
Now, as we vary β and γ, we sweep out a plane.
现在,随着我们改变β和γ ,我们扫出一个平面。
The normal vector to a triangle can be found by taking the cross product of any two vectors in the plane of the triangle (Figure 2.42). It is easiest to use two of the three edges as these vectors, for example,
三角形的法向量可以通过对三角形平面中的任意两个向量进行叉积来找到(图 2.42 )。最简单的方法是使用三条边中的两条作为这些向量,例如,
$$\mathbf{n} = (\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{a}). \tag{2.34}$$
Note that this normal vector is not necessarily of unit length, and it obeys the right-hand rule of cross products.
注意,这个法向量不一定是单位长度,而且它遵循叉积的右手定则。
Figure 2.42. The normal vector of the triangle is perpendicular to all vectors in the plane of the triangle, and thus perpendicular to the edges of the triangle.
图 2.42。三角形的法向量垂直于三角形平面内的所有向量,因而垂直于三角形的边。
The area of the triangle can be found by taking the length of the cross product:
三角形的面积可以通过取叉积的长度来求得:
$$\text{area} = \frac{1}{2}\,\lVert (\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{a}) \rVert. \tag{2.35}$$
Note that this is not a signed area, so it cannot be used directly to evaluate barycentric coordinates. However, we can observe that a triangle with a “clockwise” vertex order will have a normal vector that points in the opposite direction to the normal of a triangle in the same plane with a “counterclockwise” vertex order. Recall that
请注意,这不是一个有符号区域,因此不能直接用于评估重心坐标。但是,我们可以观察到,具有“顺时针”顶点顺序的三角形将具有指向与同一平面中具有“逆时针”顶点顺序的三角形的法线相反方向的法向量。回想一下
$$\mathbf{a} \cdot \mathbf{b} = \lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert \cos\phi,$$
where ϕ is the angle between the vectors. If a and b are parallel, then cos ϕ = ±1, and this gives a test of whether the vectors point in the same or opposite directions. This, along with Equations (2.33)–(2.35), suggests the formulas
其中 φ 是向量之间的角度。如果a和b平行,则 cos φ = ±1,这可以测试向量指向相同还是相反的方向。这与方程 (2.33)–(2.35) 一起,提出了以下公式
$$\alpha = \frac{\mathbf{n} \cdot \mathbf{n}_a}{\lVert \mathbf{n} \rVert^2}, \qquad \beta = \frac{\mathbf{n} \cdot \mathbf{n}_b}{\lVert \mathbf{n} \rVert^2}, \qquad \gamma = \frac{\mathbf{n} \cdot \mathbf{n}_c}{\lVert \mathbf{n} \rVert^2},$$
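where n is Equation (2.34) evaluated with vertices a, b, and c; n_a is Equation (2.34) evaluated with vertices b, c, and p, and so on, i.e.,
其中n是通过顶点a 、 b和c求得的方程(2.34); na是通过顶点b 、 c和p求得的方程(2.34),依此类推,即
$$\mathbf{n}_a = (\mathbf{c} - \mathbf{b}) \times (\mathbf{p} - \mathbf{b}), \qquad \mathbf{n}_b = (\mathbf{a} - \mathbf{c}) \times (\mathbf{p} - \mathbf{c}), \qquad \mathbf{n}_c = (\mathbf{b} - \mathbf{a}) \times (\mathbf{p} - \mathbf{a}).$$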
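The normal-based computation for 3D triangles can be sketched in Python as below. The helper names (`sub`, `dot`, `cross`, `barycentric_3d`) and the tuple representation of points are assumptions for this example, and the sketch assumes that p lies in the plane of a nondegenerate triangle.

```python
def sub(u, v):
    """Componentwise difference u - v of two 3D tuples."""
    return tuple(ui - vi for ui, vi in zip(u, v))

def dot(u, v):
    """Dot product of two 3D tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    """Cross product of two 3D tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def barycentric_3d(a, b, c, p):
    """Barycentric coordinates of p, assumed to lie in the plane of triangle abc."""
    n = cross(sub(b, a), sub(c, a))     # triangle normal, not unit length
    na = cross(sub(c, b), sub(p, b))    # normal of the subtriangle b, c, p
    nb = cross(sub(a, c), sub(p, c))    # normal of the subtriangle c, a, p
    nn = dot(n, n)
    alpha = dot(n, na) / nn
    beta = dot(n, nb) / nn
    return (alpha, beta, 1.0 - alpha - beta)

a, b, c = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
p = (0.2, 0.3, 0.5)
coords = barycentric_3d(a, b, c, p)
```

The dot product with n supplies the sign that the unsigned cross-product length lacks, so coordinates come out negative for points on the far side of an edge.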
Probability is the study of processes with random outcomes, and discrete probability covers the case where the number of possible outcomes is finite. A classic example is a six-sided die, which takes on a random value from {1, 2, 3, 4, 5, 6} when you roll it, each outcome with equal probability. The probability of a certain outcome is the fraction of the time that outcome happens. The fraction of the time that something happens at all is 1, so each face comes up one sixth of the time.
概率研究包括随机结果的事物,离散概率是指随机结果的数量有限。一个典型的例子是六面骰子,骰子会从{ 1, 2, 3, 4, 5, 6 }中随机取值,当你掷骰子时,每个结果出现的概率都相同。某个结果的概率是该结果发生的时间分数。某件事发生的分数是 1。每次掷骰子出现的概率是六分之一。
One of the more confusing things about randomness is distinguishing between a random outcome that either hasn’t happened (or happened and we don’t know the outcome) and a die after it has been rolled. A random variable is a single quantity that does not yet have a known value, but will take on one from a known set of possibilities with a known likelihood. The term “variable” here comes from math and is directly related to “variable” in programming. An example of a random variable is X, where “X = the eventual outcome of the die.” The variable could use any symbol; capital X is often used as a random variable symbol in math in the same way “i” and “j” are often used for loop variables in computer science. Computer programs have a pretty direct use of random variables:
关于随机性最容易让人混淆的事情之一是区分尚未发生(或者发生了但我们不知道结果)的随机结果和已经掷出的骰子。随机变量是一个没有已知值但会从一组已知可能性中以已知可能性出现的值。这里的“变量”一词来自数学,与编程中的“变量”直接相关。随机变量的一个例子是X ,其中“ X = 骰子的最终结果”。变量可以使用任何符号;大写X在数学中通常用作随机变量符号,就像在计算机科学中“i”和“j”通常用于循环变量一样。计算机程序对随机变量的使用非常直接:
int X = rand_from(1,6)
where X is a variable whose value we don’t know, but we do know that when we run the program, we will get one of six values, each with a probability of one sixth. This corresponds directly to the random variable “X = the eventual outcome of the die.” There are two properties of random variables that are used all the time: expected value and variance. The expected value, sometimes called the expectation, of a random variable X, often denoted EX or E(X), might better be called the “expected average value,” but that is not the standard term, so don’t say it or you will confuse people who know the standard terminology. It is just the average value that X takes on over all parallel universes where “the die is rolled.” It can be computed by multiplying each outcome by its probability and adding:
其中X是一个我们不知道其值的变量,但我们知道在运行程序时,我们将以六分之一的概率得到六个值中的一个,这直接对应于随机变量“ X = 骰子的最终结果”的情况。随机变量有两个经常使用的属性:期望值和方差。随机变量X的期望值,有时称为期望,通常记作EX或E ( X ),也许称为“期望平均值”更贴切,但这不是标准术语,所以不要这么说,否则会让了解标准术语的人感到困惑。它就是X在所有“掷骰子”的平行宇宙中所取的平均值。这可以通过将每个结果乘以其概率再相加来计算:
$$E(X) = \sum_i p_i x_i = \tfrac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5.$$
So if we averaged a lot of die rolls, we would “expect” a value around 3.5. Saying “I expect the die to come up 3.5” is not the nonsense it sounds like, even though the die can only come up a whole number, but the terminology is perhaps unfortunate. It is quite standard across fields, so despite its flaws, just try to internalize it and you will have no problems communicating with people from other fields about this topic.
因此,如果我们对很多次掷骰子进行平均,我们“预期”的数值在 3.5 左右。当你知道骰子只能掷出一个整数时,这句话“我预期掷出的点数为 3.5”听起来并不荒谬,但这个术语可能不太恰当。这个术语在各个领域都是相当标准的,因此尽管存在缺陷,但只要尝试将其内化,你就可以毫无问题地与其他领域的人交流这个话题。
Expected value tells us where a random variable will trend, but it doesn’t tell us how long that trend will take to occur nor how much it oscillates away from its average. For example, a die that had 3 ones and 3 sixes would still have an expected value of 3.5, but the “deviation from the mean” would be larger than on a conventional die. So how do we measure the magnitude of variation? One way would be to measure the average deviation from 3.5, but if we include signs, that average deviation is zero because the –2.5 of rolling 1 cancels out the +2.5 deviation of rolling 6. We could take the absolute difference, but that has practical problems (algebra including absolute values is challenging) and some theoretical issues. In practice people prefer average squared deviation and call it variance:
期望值告诉我们随机变量的趋势,但它没有告诉我们这种趋势需要多长时间才能出现,也没有告诉我们它与平均值的偏差有多大。例如,一个有 3 个 1 和 3 个 6 的骰子仍然有 3.5 的期望值,但“与平均值的偏差”会比传统骰子大。那么我们如何测量变化的幅度呢?一种方法是测量 3.5 的平均偏差,但如果我们包括平均偏差为零的符号,因为掷出 1 的-2.5抵消了掷出 6 的 +2.5 偏差。我们可以取绝对差,但这存在实际问题(包括绝对值的代数很有挑战性)和一些理论问题。在实践中,人们更喜欢平均平方偏差并将其称为方差:
Because it is statistical, that average is an expectation, so
因为它是统计数据,所以这个平均值是一个预期值,所以
$$V(X) = E\left((X - E(X))^2\right).$$
For the case of the die, E(X) = 3.5, and the values of X – E(X) are –2.5, –1.5, –0.5, 0.5, 1.5, 2.5. The values of (X – E(X))² are thus 6.25, 2.25, 0.25, 0.25, 2.25, 6.25, and so the variance of X, often denoted V(X), is 17.5/6.
对于骰子的情况, E ( X ) = 3.5,而X – E ( X ) 的值为 –2.5、–1.5、–0.5、0.5、1.5、2.5。因此 ( X – E ( X ))² 的值为 6.25、2.25、0.25、0.25、2.25、6.25,于是 X 的方差(通常表示为 V ( X ))为 17.5/6。
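These numbers are easy to reproduce. The Python sketch below computes the expected value and variance of a discrete random variable from a list of (value, probability) pairs, using exact fractions; the function names and the list-of-pairs representation are assumptions made for the example.

```python
from fractions import Fraction

def expected_value(outcomes):
    """E(X) for a list of (value, probability) pairs."""
    return sum(p * x for x, p in outcomes)

def variance(outcomes):
    """Average squared deviation from the mean for (value, probability) pairs."""
    mean = expected_value(outcomes)
    return sum(p * (x - mean) ** 2 for x, p in outcomes)

fair_die = [(x, Fraction(1, 6)) for x in range(1, 7)]
mean = expected_value(fair_die)   # 7/2, i.e., 3.5
var = variance(fair_die)          # 35/12, i.e., 17.5/6

# The die with three ones and three sixes mentioned earlier.
odd_die = [(1, Fraction(1, 2)), (6, Fraction(1, 2))]
```

The second die has the same expected value as the fair one but a larger variance, which is exactly the distinction the text draws.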
An algebraic manipulation of the variance formula yields a sometimes more convenient form:
对方差公式进行代数运算可以得到一种有时更方便的形式:
$$V(X) = E(X^2) - (E(X))^2.$$
There are some algebraic niceties to expectation and variance that get used a lot. For example, suppose we have two random variables X and Y and define a variable Z = X + Y . What is E(Z)? It turns out
期望和方差有一些代数细节,经常使用。例如,假设我们有两个随机变量 X 和 Y ,并定义一个变量Z = X + Y 。E( Z ) 是多少?结果是
$$E(X + Y) = E(X) + E(Y).$$
An amazing thing is that this holds even if X and Y are not “statistically independent” (so for the case of our dice, they might influence each other somehow). In an extreme example, we can look at the first die and just set the second die to the same value as the first. The formula would still apply! This is very powerful and is often used as an unstated property in programs.
令人惊奇的是,即使 X 和 Y 不是“统计上独立的”(所以对于我们的骰子来说,它们可能会以某种方式相互影响)。在一个极端的例子中,我们可以查看第一个骰子,然后将第二个骰子设置为与第一个相同。我们仍然可以应用公式!这非常强大,并且经常用作程序中未声明的属性。
The variance has the same behavior but only if X and Y are independent.
仅当 X 和 Y 独立时,方差才会有相同的行为。
$$V(X + Y) = V(X) + V(Y).$$
As a counterexample showing that this formula does not necessarily apply for dependent X and Y, suppose you roll X and then just set Y to be the opposite side of that die: for X = 1 choose Y = 6, for X = 2 choose Y = 5, for X = 3 choose Y = 4, etc. The value of Z = X + Y will always be 7, so the variance of Z is zero, clearly not the 2(17.5/6) that the sum of independent dice would yield.
作为反例说明此公式不一定适用于相关的 X 和 Y:假设您掷出 X,然后将 Y 设置为该骰子的对面,即对于 X = 1 选择 Y = 6,对于 X = 2 选择 Y = 5,对于 X = 3 选择 Y = 4,等等。Z = X + Y 的值将始终为 7,因此 Z 的方差为零,显然不是独立骰子之和所得出的 2(17.5 / 6)。
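Both claims can be checked by enumerating outcomes exactly. In the Python sketch below, the helper names `mean` and `variance` and the three lists of sums are assumptions for this illustration; each list holds the equally likely values of Z = X + Y under a different rule for choosing Y.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    """Average of a list of equally likely values, as an exact fraction."""
    return Fraction(sum(values), len(values))

def variance(values):
    """Average squared deviation from the mean of equally likely values."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])

faces = range(1, 7)
independent = [x + y for x, y in product(faces, faces)]  # 36 equally likely sums
opposite = [x + (7 - x) for x in faces]                  # Y is the opposite face
copied = [x + x for x in faces]                          # Y is set equal to X

v_die = variance(list(faces))                            # variance of one die
```

All three lists have mean 7, so expectations add in every case, but only the independent sums have variance equal to twice the variance of a single die.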
One disadvantage of variance is that it is not very intuitive because of the squaring. So people often use the square root of the variance, called the standard deviation, usually denoted σ(X). So
方差的一个缺点是,由于需要平方,所以它不太直观。因此人们经常使用方差的平方根,称为标准差,通常表示为 sigma(X)。所以
$$\sigma(X) = \sqrt{V(X)}.$$
There are no nice formulas for σ(X + Y), so the appeal of variance, where there are nice formulas, becomes more obvious. Note that for the die example, the standard deviation is σ(X) = √(17.5/6) ≈ 1.7. This is “around” the average distance from the mean of 3.5, but is slightly different, as the actual mean absolute distance is 1.5. So while in practice it is almost never dangerous to think of standard deviation as average absolute deviation, it is good to keep at least in the back of your mind that they are different.
对于σ ( X + Y ) 来说,没有很好的公式,因此,在有很好公式的情况下,方差的吸引力就变得更加明显。请注意,对于骰子示例,标准差为 σ ( X ) = √(17.5 / 6) ≈ 1.7 。这“大约”是与平均值 3.5 的平均距离,但略有不同,因为实际平均绝对距离为 1.5。因此,虽然在实践中将标准差视为平均绝对偏差几乎从来不会有危险,但至少在心里记住它们是不同的,这是有好处的。
In graphics, we often use random variables that can take on a continuous range of values. These are usually called continuous random variables. The good news is that almost everything about discrete random variables carries over: the terminology, the expected value definition and formulas, and the variance definition and formulas. There is, however, a big difference: the probability of a continuous random variable taking on any particular value is zero. Suppose you have a uniform random variable X that is between –2.3 and 10.9:
在图形中,我们经常使用可以取一系列连续值的随机变量。这些通常称为连续随机变量。好消息是,几乎所有关于离散随机变量的内容都延续了下来:术语、期望值定义和公式以及方差定义和公式。然而,有一个很大的区别:连续随机变量取任何特定值的概率为零。假设你有一个介于 -2.3 和 10.9 之间的均匀随机变量 X:
X = continuous_random_from(-2.3, 10.9).
The values 1.7, π, and e are all equally likely. The trouble is that getting exactly 1.7 has a probability of zero.
得到 1.7 或π或e的概率都是相等的。问题是得到 1.7 的概率为零。
The good news is that density functions solve this problem. Just like the case of joules per second, we can use probability per length for the 1D case. In the example above of X = continuous_random_from(-2.3, 10.9), the dimension over which we measure the probability is length. If the length is in some unspecified unit and we just know the range –2.3 to 10.9, then we would say the probability is measured “per unit length.”
好消息是密度函数解决了这个问题。就像每秒焦耳的情况一样,对于一维情况,我们可以用每单位长度的概率来表示。在上面 X = continuous_random_from(-2.3, 10.9) 的例子中,我们测量概率的维度是长度。如果长度是某个未指定的单位,而我们只知道 -2.3 到 10.9 的范围,那么我们会说概率是“每单位长度”测量的。
Section 2.5 discussed how to “read” an integral and abstracted it away as an “integrate()” function. But how do we actually implement that function? The most common way in graphics is to use Monte Carlo Integration. The algebra for Monte Carlo Integration is often ugly and intimidating. But if we look at this function:
第 2.5 节讨论了如何“读取”积分并将其抽象为“integrate()”函数。但我们如何实际实现该函数?图形学中最常用的方法是使用蒙特卡罗积分。蒙特卡罗积分的代数通常很丑陋且令人生畏。但如果我们看一下这个函数:
float shade = average(f(), hemisphere)
Our intuition would find the right answer. Pick a bunch of random points v_i on the hemisphere, evaluate f(v_i), and average them, for example:
我们的直觉会找到正确的答案。在半球上随机选取一些点 v,然后计算f (v) 并取平均值,例如:
float sum = 0.0
int N = 10000   // or some other big number the user sets
for (int i = 1 to N)
    vec3 v = random_point_on_hemisphere()
    sum = sum + f(v)
float average = sum / N
It really is that easy! Now you need a function to pick random points on the unit hemisphere. The simplest method is a “rejection method” that first picks points uniformly in the unit ball by repeatedly picking three random numbers uniformly in a unit cube:
真的就这么简单!现在你需要一个函数来在单位半球上选取随机点。最简单的方法是“拒绝方法”,它首先通过在单位立方体中均匀地重复选取三个随机数来均匀地在单位球中选取点:
do
    X = random_from(-1,1)
    Y = random_from(-1,1)
    Z = random_from(-1,1)
while (X*X + Y*Y + Z*Z > 1)
And then flip the Z if needed to be in the half-ball:
然后如果需要的话翻转Z以进入半球:
if (Z < 0) Z = -Z
Then, project the point onto the unit hemisphere:
然后,将该点投影到单位半球上
v = unit_vector(X, Y, Z).
That is a way to handle an average. But what about a general integral? Recall that
这是处理平均值的一种方法。但是一般积分呢?回想一下
average(f(), domain) = integrate(f(), domain) /
integrate(1, domain)
So
所以
integrate(f(), domain) = average(f(), domain) *
    integrate(1, domain)
In the case of a hemisphere, integrate(1, domain) is just the area, which is 2π.
对于半球来说,integral(1, domain) 就是面积,即 2 π 。
So Monte Carlo integration often is an average of random points times a constant (the size of the domain: length, area, etc.).
因此,蒙特卡洛积分通常是随机点的平均值乘以一个常数(域的大小——长度、面积等)。
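Putting the pieces together, here is a runnable Python sketch of the whole procedure: rejection sampling in the unit ball, flipping into the upper half, projecting onto the hemisphere, and scaling the average by the hemisphere's area. The function names, the fixed seed, the sample count, and the choice of integrand (the z-component of the direction, whose exact integral over the hemisphere is π) are assumptions made for the example.

```python
import math
import random

def random_point_on_hemisphere(rng):
    """Uniform point on the unit hemisphere z >= 0, by rejection sampling."""
    while True:
        x, y, z = (rng.uniform(-1, 1) for _ in range(3))
        d2 = x * x + y * y + z * z
        if 0.0 < d2 <= 1.0:          # keep only points inside the unit ball
            break
    length = math.sqrt(d2)
    # Flip z into the upper half, then project onto the unit sphere.
    return (x / length, y / length, abs(z) / length)

def integrate_over_hemisphere(f, n, rng):
    """Monte Carlo estimate: the average of f times the hemisphere's area 2*pi."""
    total = sum(f(random_point_on_hemisphere(rng)) for _ in range(n))
    return (total / n) * 2.0 * math.pi

rng = random.Random(1)               # fixed seed so runs are repeatable
estimate = integrate_over_hemisphere(lambda v: v[2], 100_000, rng)
```

With this many samples the estimate should land close to π. The small remaining error is the noise inherent in any Monte Carlo average, and it shrinks as the sample count grows.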
When a function we want to take a random average of has a wide variation in its high and low values, it can be to our advantage to concentrate samples in some areas and then correct for the nonuniformity with weights. The probability density functions give us the right tool for that: if we know the PDF of a sample, that is a direct measure of how “oversampled” that region is. If we use nonuniform samples, then we get
当我们想要随机取平均值的函数的高值和低值变化很大时,将样本集中在某些区域,然后用权重校正不均匀性对我们有利。概率密度函数为我们提供了正确的工具:如果我们知道样本的 PDF,那么它就是该区域“过采样”程度的直接度量。如果我们使用非均匀样本,那么我们可以得到
integrate = average_of_nonuniform_samples(f()/p(),
domain).
A neat thing about this formula is it also works for uniform random samples. In that case, the PDF p() = 1/ integrate(1, domain) so the “size” of the domain is encoded in the PDF.
这个公式的一个巧妙之处在于它也适用于均匀随机样本。在这种情况下,PDF p() = 1/integrate(1, domain) ,因此域的“大小”被编码在 PDF 中。
For any given Monte Carlo importance sampling problem, there is a pretty formulaic approach we follow, at least to get started:
对于任何给定的蒙特卡洛重要性采样问题,我们遵循一个非常公式化的方法,至少在开始时是这样:
Identify what is the function f () and the domain of integration (e.g., points on the unit sphere or points on a triangle).
确定函数f () 是什么以及积分域(例如,单位球面上的点或三角形上的点)。
Pick a method for generating random samples xi on that domain, and make sure there is a way to evaluate the PDF p(xi) for each sample.
选择一种在该域上生成随机样本 x 的方法,并确保有一种方法可以评估每个样本的 PDF p (x)。
Average the ratio f (xi)/p(xi) for many xi. This is our estimate of the integral.
对多个 x 计算f (x) /p (x) 比率的平均值。这是我们对积分的估计。
A neat thing is that any p() can be used and you will converge to the right answer (with the caveat that where f () is nonzero p() must be nonzero). Which p() you use merely influences how fast your estimate converges. So we usually start with a constant p() for debugging our code.
巧妙之处在于,可以使用任何p (),并且您将收敛到正确答案(但要注意,当f () 非零时, p () 必须非零)。您使用哪个p () 仅影响您的估计收敛速度。因此,我们通常从常数p () 开始调试代码。
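The three steps above can be sketched on a simple 1D problem. In the Python example below, the integrand f(x) = x² on the interval from 0 to 1 (exact integral 1/3), the two sampling densities, the seeds, and the function name `estimate_integral` are all assumptions chosen for the illustration. Samples with density p(x) = 2x are generated as the square root of a uniform number, and the uniform numbers are drawn from (0, 1] so that p is never zero at a sample.

```python
import math
import random

def estimate_integral(f, sample, pdf, n, rng):
    """Average f(x)/p(x) over n samples x drawn with density p."""
    total = 0.0
    for _ in range(n):
        x = sample(rng)
        total += f(x) / pdf(x)
    return total / n

f = lambda x: x * x                  # the integral over [0, 1] is 1/3

# Uniform samples on (0, 1]: the PDF is the constant p(x) = 1.
uniform = estimate_integral(f, lambda r: 1.0 - r.random(), lambda x: 1.0,
                            100_000, random.Random(1))

# Nonuniform samples with p(x) = 2x, drawn as x = sqrt(u) for uniform u in (0, 1].
weighted = estimate_integral(f, lambda r: math.sqrt(1.0 - r.random()),
                             lambda x: 2.0 * x, 100_000, random.Random(2))
```

Both estimates approach 1/3. The second density places more samples where f is large, so the ratio f/p varies less from sample to sample and the estimate tends to settle faster.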
Why isn’t there vector division?
为什么没有矢量除法?
It turns out that there is no “nice” analog of division for vectors. However, it is possible to motivate the quaternions by examining this question in detail (see Hoffmann’s book referenced in the chapter notes).
事实证明,向量除法没有“很好的”类比。但是,通过详细研究这个问题,可以激发四元数的灵感(请参阅本章注释中引用的霍夫曼的书)。
Is there something as clean as barycentric coordinates for polygons with more than three sides?
对于具有三条以上边的多边形,是否存在像重心坐标一样清晰的东西?
Unfortunately, there is not. Even convex quadrilaterals are much more complicated. This is one reason triangles are such a common geometric primitive in graphics.
不幸的是,没有。即使是凸四边形也要复杂得多。这就是三角形在图形中如此常见的几何基元的原因之一。
Is there an implicit form for 3D lines?
3D 线条有隐式形式吗?
No. However, the intersection of two 3D planes defines a 3D line, so a 3D line can be described by two simultaneous implicit 3D equations.
不是。但是,两个 3D 平面的交点定义了一条 3D 线,因此一条 3D 线可以用两个同时的隐式 3D 方程来描述。
How is quasi–Monte Carlo (QMC) or blue noise sampling related to Monte Carlo sampling?
准蒙特卡洛 (QMC) 或蓝噪声采样与蒙特卡洛采样有何关系?
The core idea of Monte Carlo is that you can average a bunch of “fair” samples to estimate a true average. Here, fair can be framed in a statistical sense. But some sample sets can also be shown to be “fair” even if they are not random. One such class is quasi–Monte Carlo sets, which have obvious deterministic structure and so are not random, but are uniform in a formal, nonstatistical sense; these sets often improve convergence over random ones. Blue noise sample sets add constraints on the samples to avoid clumping, and like QMC sets can improve convergence without being fully random. In practice, most techniques are developed using Monte Carlo formalisms because the math is more tractable, and then QMC or blue noise points are inserted in the code with the empirical confidence that uniformity is all that is needed in practice.
蒙特卡洛的核心思想是你可以通过对一堆“公平”样本取平均值来估算真实平均值。这里,公平可以从统计意义上来理解。但有些样本集即使不是随机的也可以证明是“公平的”。准蒙特卡洛就是这样一个集合,它具有明显的确定性结构,它不是随机的,但在形式意义上是均匀的,不是统计的,这些集合通常比随机集合的收敛性更好。蓝噪声样本集对样本添加了约束以避免聚集,并且像 QMC 集一样,可以在不完全随机的情况下提高收敛性。在实践中,大多数技术都是使用蒙特卡洛形式主义开发的,因为数学更容易处理,然后,将 QMC 或蓝噪声点插入代码中,并根据经验确信实践中只需要均匀性。
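To make the contrast concrete, here is a hedged sketch that averages the same integrand over random points and over a deterministic low-discrepancy set. The base-2 van der Corput sequence is used as one well-known example of a QMC point set; the integrand x² and the sample count are assumptions for illustration only.

```python
import random

def van_der_corput(i, base=2):
    """Return the i-th point of the van der Corput sequence in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x

def f(x):
    # Illustrative integrand; its exact integral over [0, 1] is 1/3.
    return x * x

n = 1024
rng = random.Random(7)

# Plain Monte Carlo: independent uniform random samples.
mc = sum(f(rng.random()) for _ in range(n)) / n

# Quasi-Monte Carlo: deterministic, evenly spread samples.
qmc = sum(f(van_der_corput(i)) for i in range(n)) / n

err_mc = abs(mc - 1.0 / 3.0)
err_qmc = abs(qmc - 1.0 / 3.0)
```

The estimator code is identical in both cases; only the source of the sample points changes, which is why such point sets can be dropped into code that was derived with Monte Carlo reasoning.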
The history of vector analysis is particularly interesting. It was largely invented by Grassmann in the mid-1800s but was ignored and reinvented later (Crowe, 1994). Grassmann now has a following in the graphics field of researchers who are developing Geometric Algebra based on some of his ideas (Doran & Lasenby, 2003). Readers interested in why the particular scalar and vector products are in some sense the right ones, and why we do not have a commonly used vector division, will find enlightenment in the concise About Vectors (Hoffmann, 1975). Another important geometric tool is the quaternion, invented by Hamilton in the mid-1800s. Quaternions are useful in many situations, but especially where orientations are concerned (Hanson, 2005).
向量分析的历史尤其有趣。它主要是由 Grassmann 在 19 世纪中期发明的,但后来被忽视并被重新发明(Crowe,1994 年)。Grassman 现在在图形领域拥有一批追随者,研究人员正在根据他的一些想法开发几何代数(Doran & Lasenby,2003 年)。如果读者对为什么特定标量和向量积在某种意义上是正确的,以及为什么我们没有常用的向量除法感兴趣,他们会在简明的《关于向量》 (Hoffmann,1975 年)中找到启发。另一个重要的几何工具是 Hamilton 在 19 世纪中期发明的四元数。四元数在许多情况下都很有用,但在涉及方向时尤其有用(Hanson,2005 年)。
1. The cardinality of a set is the number of elements it contains. Under IEEE floating-point representation (Section 1.5), what is the cardinality of the floats?
1.集合的基数是其所含元素的数量。在 IEEE 浮点表示法(第 1.5 节)下,浮点数的基数是多少?
2. Is it possible to implement a function that maps 32-bit integers to 64-bit integers that has a well-defined inverse? Do all functions from 32-bit integers to 64-bit integers have well-defined inverses?
2.是否可以实现一个将 32 位整数映射到 64 位整数的函数,该函数具有明确定义的逆吗?所有从 32 位整数到 64 位整数的函数都有明确定义的逆吗?
3. Specify the unit cube (x-, y-, and z-coordinates all between 0 and 1 inclusive) in terms of the Cartesian product of three intervals.
3.根据三个区间的笛卡尔积指定单位立方体( x -、 y - 和z -坐标均在 0 到 1 之间(含 0 和 1)。
4. If you have access to the natural log function ln(x) , specify how you could use it to implement a log(b, x) function where b is the base of the log. What should the function do for negative b values? Assume an IEEE floating-point implementation.
4.如果您可以访问自然对数函数 ln( x ) ,请说明如何使用它来实现 log( b, x ) 函数,其中b是对数的底数。对于负b值,该函数应该做什么?假设是 IEEE 浮点实现。
5. Solve the quadratic equation 2x² + 6x + 4 = 0.
5.解二次方程 2x² + 6x + 4 = 0。
6. Implement a function that takes in coefficients A, B, and C for the quadratic equation Ax² + Bx + C = 0 and computes the two solutions. Have the function return the number of valid (not NaN) solutions and fill in the return arguments so the smaller of the two solutions is first.
6.实现一个函数,该函数接受二次方程 Ax² + Bx + C = 0 的系数 A、B 和 C 并计算两个解。让函数返回有效(非 NaN)解的数量,并填写返回参数,以便两个解中较小的一个排在前面。
7. Show that the two forms of the quadratic formula on page 17 are equivalent (assuming exact arithmetic) and explain how to choose one for each root in order to avoid subtracting nearly equal floating-point numbers, which leads to loss of precision.
7.证明第 17 页二次公式的两种形式是等价的(假设精确算术),并解释如何为每个根选择一种形式,以避免减去几乎相等的浮点数,从而导致精度损失。
8. Show by counterexample that it is not always true that for 3D vectors a, b, and c, a × (b × c) = (a × b) × c.
8.通过反例证明,对于三维向量a 、 b和c , a × ( b × c ) = ( a × b ) × c并不总是正确的。
9. Given the nonparallel 3D vectors a and b, compute a right-handed orthonormal basis such that u is parallel to a and v is in the plane defined by a and b.
9.给定非平行的三维向量a和b ,计算右手正交基,使得u平行于a且v位于由a和b定义的平面内。
10. What is the gradient of f(x, y, z) = x² + y − 3z³?
10. f(x, y, z) = x² + y − 3z³ 的梯度是多少?
11. What is a parametric form for the axis-aligned 2D ellipse?
11.轴对齐二维椭圆的参数形式是什么?
12. What is the implicit equation of the plane through 3D points (1, 0, 0) , (0, 1, 0) , and (0, 0, 1) ? What is the parametric equation? What is the normal vector to this plane?
12.经过三维点 (1, 0, 0) 、 (0, 1, 0) 和 (0, 0, 1) 的平面的隐式方程是什么?参数方程是什么?该平面的法向量是什么?
13. Given four 2D points a0, a1, b0, and b1, design a robust procedure to determine whether the line segments a0a1 and b0b1 intersect.
13.给定四个二维点a 0 、 a 1 、 b 0和b 1 ,设计一个稳健的程序来确定线段a 0 a 1和b 0 b 1是否相交。
14. Design a robust procedure to compute the barycentric coordinates of a 2D point with respect to three 2D non-collinear points.
14.设计一个稳健的程序来计算二维点相对于三个二维非共线点的重心坐标。
15. Calculate various 1D integrals from introductory calculus, and vary the number of samples. How quickly does the answer converge as the number of samples is increased?
15.计算入门微积分中的各种一维积分,并改变样本数量。随着样本数量的增加,答案收敛的速度有多快?
Most computer graphics images are presented to the user on some kind of raster display. Raster displays show images as rectangular arrays of pixels. A common example is a flat-panel computer display or television, which has a rectangular array of small light-emitting pixels that can individually be set to different colors to create any desired image. Different colors are achieved by mixing varying intensities of red, green, and blue light. Most printers, such as laser printers and ink-jet printers, are also raster devices. They are based on scanning: there is no physical grid of pixels, but the image is laid down sequentially by depositing ink at selected points on a grid.
大多数计算机图形图像都是通过某种光栅显示器呈现给用户的。光栅显示器将图像显示为像素的矩形阵列。一个常见的例子是平板电脑显示器或电视,它具有小发光像素的矩形阵列,可以单独设置为不同的颜色以创建任何所需的图像。通过混合不同强度的红光、绿光和蓝光可以获得不同的颜色。大多数打印机(例如激光打印机和喷墨打印机)也是光栅设备。它们基于扫描:没有物理的像素网格,而是通过在网格上的选定点上沉积墨水来按顺序排列图像。
Pixel is short for “picture element.”
像素是“图像元素”的缩写。
Rasters are also prevalent in input devices for images. A digital camera contains an image sensor comprising a grid of light-sensitive pixels, each of which records the color and intensity of light falling on it. A desktop scanner contains a linear array of pixels that is swept across the page being scanned, making many measurements per second to produce a grid of pixels.
光栅在图像输入设备中也很常见。数码相机包含一个图像传感器,该传感器由光敏像素网格组成,每个像素记录落在其上的光的颜色和强度。台式扫描仪包含一个线性像素阵列,该阵列扫过被扫描的页面,每秒进行多次测量以产生像素网格。
Color in printers is more complicated, involving mixtures of at least four pigments.
打印机中的颜色更加复杂,涉及至少四种颜料的混合。
Because rasters are so prevalent in devices, raster images are the most common way to store and process images. A raster image is simply a 2D array that stores the pixel value for each pixel—usually a color stored as three numbers, for red, green, and blue. A raster image stored in memory can be displayed by using each pixel in the stored image to control the color of one pixel of the display.
由于光栅在设备中非常普遍,因此光栅图像是存储和处理图像的最常见方式。光栅图像只是一个存储像素的二维数组每个像素的值——通常以三个数字存储颜色,分别为红色、绿色和蓝色。存储在内存中的光栅图像可以通过使用存储图像中的每个像素来控制显示器上一个像素的颜色来显示。
Or, maybe it’s because raster images are so convenient that raster devices are prevalent.
或者,可能是因为光栅图像非常方便,所以光栅设备才如此流行。
But we don’t always want to display an image this way. We might want to change the size or orientation of the image, correct the colors, or even show the image pasted on a moving three-dimensional surface. Even in televisions, the display rarely has the same number of pixels as the image being displayed. Considerations like these break the direct link between image pixels and display pixels. It’s best to think of a raster image as a device-independent description of the image to be displayed, and the display device as a way of approximating that ideal image.
但我们并不总是希望以这种方式显示图像。我们可能希望改变图像的大小或方向、校正颜色,甚至显示粘贴在移动的三维表面上的图像。即使在电视中,显示器的像素数也很少与所显示的图像相同。这些考虑因素打破了图像像素和显示像素之间的直接联系。最好将光栅图像视为要显示的图像的与设备无关的描述,将显示设备视为近似理想图像的一种方式。
There are other ways of describing images besides using arrays of pixels. A vector image is described by storing descriptions of shapes—areas of color bounded by lines or curves—with no reference to any particular pixel grid. In essence, this amounts to storing the instructions for displaying the image rather than the pixels needed to display it. The main advantage of vector images is that they are resolution independent and can be displayed well on very-high-resolution devices. The corresponding disadvantage is that they must be rasterized before they can be displayed. Vector images are often used for text, diagrams, mechanical drawings, and other applications where crispness and precision are important and photographic images and complex shading aren’t needed.
除了使用像素数组之外,还有其他描述图像的方法。矢量图通过存储形状描述(由线条或曲线界定的颜色区域)来描述,而不引用任何特定的像素网格。本质上,这相当于存储显示图像的指令,而不是显示图像所需的像素。矢量图的主要优点是它们与分辨率无关,可以在非常高分辨率的设备上很好地显示。相应的缺点是必须先进行光栅化才能显示。矢量图通常用于文本、图表、机械图纸和其他应用程序,这些应用程序注重清晰度和精度,不需要摄影图像和复杂的阴影。
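The following sketch illustrates resolution independence under simple assumptions: a "vector image" consisting of a single disk, described in unit-square coordinates, is rasterized onto grids of two different sizes. The disk description and the function name are hypothetical, and testing only the pixel center is the crudest possible rasterization rule.

```python
def rasterize_disk(cx, cy, radius, nx, ny):
    """Rasterize a disk given in unit-square coordinates onto an nx-by-ny grid.

    Returns a list of ny rows; an entry is 1 if the pixel center is inside the disk.
    """
    rows = []
    for j in range(ny):
        row = []
        for i in range(nx):
            # Pixel center mapped into the unit square [0, 1] x [0, 1].
            x = (i + 0.5) / nx
            y = (j + 0.5) / ny
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            row.append(1 if inside else 0)
        rows.append(row)
    return rows

# One shape description, rendered at two resolutions.
disk = {"cx": 0.5, "cy": 0.5, "radius": 0.4}
coarse = rasterize_disk(nx=8, ny=8, **disk)
fine = rasterize_disk(nx=64, ny=64, **disk)
```

The stored description never mentions a pixel grid; the grid size is chosen only at display time, which is the step the text calls rasterization.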
In this chapter, we discuss the basics of raster images and displays, paying particular attention to the nonlinearities of standard displays. The details of how pixel values relate to light intensities are important to have in mind when we discuss computing images in later chapters.
在本章中,我们讨论光栅图像和显示器的基础知识,特别关注标准显示器的非线性。在后面的章节中讨论图像计算时,牢记像素值与光强度之间关系的细节非常重要。
Or: you have to know what those numbers in your image actually mean.
或者:您必须知道图像中的那些数字实际上意味着什么。
Before discussing raster images in the abstract, it is instructive to look at the basic operation of some specific devices that use these images. A few familiar raster devices can be categorized into a simple hierarchy:
在抽象地讨论光栅图像之前,先了解一下使用这些图像的一些特定设备的基本操作是有益的。一些常见的光栅设备可以分为一个简单的层次结构:
Output
输出
– Display
-展示
* Transmissive: liquid crystal display (LCD)
* 透射式:液晶显示屏(LCD)
* Emissive: light-emitting diode (LED) display
* 发射:发光二极管 (LED) 显示屏
– Hardcopy
– 硬拷贝
* Binary: ink-jet printer
* 二进制:喷墨打印机
* Continuous tone: dye sublimation printer
* 连续色调:染料热升华打印机
Input
输入
– 2D array sensor: digital camera
– 2D阵列传感器:数码相机
– 1D array sensor: flatbed scanner
– 1D 阵列传感器:平板扫描仪
Current displays, including televisions and digital cinematic projectors as well as displays and projectors for computers, are nearly universally based on fixed arrays of pixels. They can be separated into emissive displays, which use pixels that directly emit controllable amounts of light, and transmissive displays, in which the pixels themselves don’t emit light but instead vary the amount of light that they allow to pass through them. Transmissive displays require a light source to illuminate them: in a direct-viewed display, this is a backlight behind the array; in a projector, it is a lamp that emits light that is projected onto the screen after passing through the array. An emissive display is its own light source.
当前的显示器,包括电视和数字电影放映机以及计算机显示器和投影仪,几乎普遍基于固定的像素阵列。它们可以分为发射显示器,使用直接发射可控光量的像素,以及透射显示器,其中像素本身不发光,而是改变允许穿过它们的光量。透射显示器需要光源来照亮它们:在直视显示器中,这是阵列后面的背光;在投影仪中,它是一盏灯,发出的光穿过阵列后投射到屏幕上。发射显示器有自己的光源。
Light-emitting diode (LED) displays are an example of the emissive type. Each pixel is composed of one or more LEDs, which are semiconductor devices (based on inorganic or organic semiconductors) that emit light with intensity depending on the electrical current passing through them (see Figure 3.1).
发光二极管 (LED) 显示器是自发光类型的一个例子。每个像素由一个或多个 LED 组成,LED 是一种半导体器件(基于无机或有机半导体),其发光强度取决于通过它们的电流(见图3.1 )。
The pixels in a color display are divided into three independently controlled subpixels—one red, one green, and one blue—each with its own LED made using different materials so that they emit light of different colors (Figure 3.2). When the display is viewed from a distance, the eye can’t separate the individual subpixels, and the perceived color is a mixture of red, green, and blue.
彩色显示器中的像素分为三个独立控制的子像素(一个红色、一个绿色和一个蓝色)每个都有自己的 LED,这些 LED 使用不同的材料制成,因此它们会发出不同颜色的光(图 3.2 )。从远处观看显示器时,眼睛无法区分各个子像素,感知到的颜色是红色、绿色和蓝色的混合。
Liquid crystal displays (LCDs) are an example of the transmissive type. A liquid crystal is a material whose molecular structure enables it to rotate the polarization of light that passes through it, and the degree of rotation can be adjusted by an applied voltage. An LCD pixel (Figure 3.3) has a layer of polarizing film behind it, so that it is illuminated by polarized light—let’s assume it is polarized horizontally.
液晶显示器 (LCD) 是透射型显示器的一个例子。液晶是一种材料,其分子结构使其能够旋转穿过它的光的偏振,并且可以通过施加电压来调整旋转的程度。LCD 像素(图 3.3 )后面有一层偏振膜,因此它被偏振光照射——我们假设它是水平偏振的。
Figure 3.1. The operation of a light-emitting diode (LED) display.
图 3.1.发光二极管 (LED) 显示器的操作。
Figure 3.2. The red, green, and blue subpixels within a pixel of a flat-panel display.
图 3.2.平板显示器像素内的红色、绿色和蓝色子像素。
A second layer of polarizing film in front of the pixel is oriented to transmit only vertically polarized light. If the applied voltage is set so that the liquid crystal layer in between does not change the polarization, all light is blocked and the pixel is in the “off” (minimum intensity) state. If the voltage is set so that the liquid crystal rotates the polarization by 90°, then all the light that entered through the back of the pixel will escape through the front, and the pixel is fully “on”—it has its maximum intensity. Intermediate voltages will partly rotate the polarization so that the front polarizer partly blocks the light, resulting in intensities between the minimum and maximum (Figure 3.4). Like color LED displays, color LCDs have red, green, and blue subpixels within each pixel, which are three independent pixels with red, green, and blue color filters over them.
像素前面的第二层偏光膜仅透射垂直偏振光。如果设置施加的电压使得中间的液晶层不改变偏振,则所有光都将被阻挡,像素处于“关闭”(最小强度)状态。如果设置电压使液晶将偏振旋转 90°,则所有从像素后面进入的光都将从前面逸出,像素完全“开启” - 具有最大强度。中间电压将部分旋转偏振,使得前偏振器部分阻挡光,从而产生最小值和最大值之间的强度(图 3.4 )。与彩色 LED 显示器一样,彩色 LCD 每个像素内都有红、绿、蓝子像素,它们是三个独立的像素,上面有红、绿、蓝滤光片。
Any type of display with a fixed pixel grid, including these and other technologies, has a fundamentally fixed resolution determined by the size of the grid. For displays and images, resolution simply means the dimensions of the pixel grid: if a desktop monitor has a resolution of 1920 × 1200 pixels, this means that it has 2,304,000 pixels arranged in 1920 columns and 1200 rows.
任何具有固定像素网格的显示器(包括这些技术和其他技术)都具有由网格大小决定的基本固定分辨率。对于显示器和图像而言,分辨率只是指像素网格的尺寸:如果台式机显示器的分辨率为 1920 × 1200 像素,则意味着它有 2,304,000 个像素,排列在 1920 列和 1200 行中。
Figure 3.3. One pixel of an LCD display in the off state (bottom), in which the front polarizer blocks all the light that passes the back polarizer, and the on state (top), in which the liquid crystal cell rotates the polarization of the light so that it can pass through the front polarizer. Figure courtesy of Reinhard, Khan, Akyüz, and Johnson (2008).
图 3.3。LCD显示屏的一个像素处于关闭状态(底部),其中前偏光片阻挡了所有穿过后偏光片的光线,以及处于开启状态(顶部),其中液晶单元旋转光线的偏振,以便光线可以穿过前偏光片。图片由 Reinhard、Khan、Akyüz 和 Johnson (2008) 提供。
Figure 3.4. The operation of a liquid crystal display (LCD).
图 3.4.液晶显示器 (LCD) 的操作。
The resolution of a display is sometimes called its “native resolution” since most displays can handle images of other resolutions, via built-in conversion.
显示器的分辨率有时被称为“原始分辨率”,因为大多数显示器可以通过内置转换处理其他分辨率的图像。
To fill the screen, an image of a different resolution must be converted into a 1920 × 1200 image using the methods of Chapter 10.
为了填满屏幕,必须使用第 10 章的方法将不同分辨率的图像转换为 1920 × 1200 的图像。
The process of recording images permanently on paper has very different constraints from showing images transiently on a display. In printing, pigments are distributed on paper or another medium so that when light reflects from the paper it forms the desired image. Printers are raster devices like displays, but many printers can only print binary images—pigment is either deposited or not at each grid position, with no intermediate amounts possible.
在纸上永久记录图像的过程与在显示器上短暂显示图像的过程有着非常不同的限制。在印刷过程中,颜料分布在纸张或其他介质上,这样当光线从纸张上反射时,就会形成所需的图像。打印机是像显示器一样的光栅设备,但许多打印机只能打印二元图像——颜料在每个网格位置上要么沉积,要么不沉积,不存在中间的量。
An ink-jet printer (Figure 3.5) is an example of a device that forms a raster image by scanning. An ink-jet print head contains liquid ink carrying pigment, which can be sprayed in very small drops under electronic control. The head moves across the paper, and drops are emitted as it passes grid positions that should receive ink; no ink is emitted in areas intended to remain blank. After each sweep, the paper is advanced slightly, and then, the next row of the grid is laid down. Color prints are made by using several print heads, each spraying ink with a different pigment, so that each grid position can receive any combination of different colored drops. Because all drops are the same, an ink-jet printer prints binary images: at each grid point, there is a drop or no drop; there are no intermediate shades.
喷墨打印机(图 3.5 )是通过扫描形成光栅图像的设备的一个示例。喷墨打印头含有载有颜料的液态墨水,可以在电子控制下以非常小的液滴形式喷射。打印头在纸张上移动,当打印头经过应接收墨水的网格位置时,就会喷射液滴;在应保持空白的区域则不会喷射墨水。每次扫描之后,纸张都会稍微前进,然后放下下一行网格。彩色打印是使用多个打印头进行的,每个打印头喷射的墨水都带有不同的颜料,这样每个网格位置都可以接收不同颜色液滴的任意组合。由于所有液滴都是相同的,因此喷墨打印机打印的是二元图像:在每个网格点,要么有一滴液滴,要么没有液滴;没有中间色调。
An ink-jet printer has no physical array of pixels; the resolution is determined by how small the drops can be made and how far the paper is advanced after each sweep. Many ink-jet printers have multiple nozzles in the print head, enabling several sweeps to be made in one pass, but it is the paper advance, not the nozzle spacing, that ultimately determines the spacing of the rows.
喷墨打印机没有物理像素阵列;分辨率取决于墨滴的大小以及每次扫描后纸张前进的距离。许多喷墨打印机的打印头中有多个喷嘴,因此一次扫描可以进行多次扫描,但最终决定行间距的是纸张前进,而不是喷嘴间距。
The thermal dye transfer process is an example of a continuous tone printing process, meaning that varying amounts of dye can be deposited at each pixel—it is not all-or-nothing like an ink-jet printer (Figure 3.6). A donor ribbon containing colored dye is pressed between the paper, or dye receiver, and a print head containing a linear array of heating elements, one for each column of pixels in the image. As the paper and ribbon move past the head, the heating elements switch on and off to heat the ribbon in areas where dye is desired, causing the dye to diffuse from the ribbon to the paper. This process is repeated for each of several dye colors. Since higher temperatures cause more dye to be transferred, the amount of each dye deposited at each grid position can be controlled, allowing a continuous range of colors to be produced. The number of heating elements in the print head establishes a fixed resolution in the direction across the page, but the resolution along the page is determined by the rate of heating and cooling compared to the speed of the paper.
热染料转印过程是连续色调打印过程的一个例子,这意味着每个像素上可以沉积不同量的染料 - 它并不像喷墨打印机那样非此即彼(图 3.6 )。含有彩色染料的供体色带被压在纸张或染料接收器与打印头之间,打印头包含线性加热元件阵列,图像中的每一列像素都有一个加热元件。当纸张和色带经过打印头时,加热元件会打开和关闭,以加热需要染料的区域的色带,从而使染料从色带扩散到纸张上。对几种染料颜色中的每一种都重复此过程。由于较高的温度会导致更多的染料被转印,因此可以控制沉积在每个网格位置上的每种染料的量,从而允许产生连续的颜色范围。打印头中的加热元件数量决定了页面方向上的固定分辨率,但页面沿线的分辨率由加热和冷却速率与纸张速度的关系决定。
Figure 3.5. The operation of an ink-jet printer.
图 3.5.喷墨打印机的运行。
There are also continuous ink-jet printers that print in a continuous helical path on paper wrapped around a spinning drum, rather than moving the head back and forth.
还有连续喷墨打印机,这种打印机以连续螺旋路径在缠绕在旋转滚筒上的纸张上打印,而不是来回移动打印头。
Figure 3.6. The operation of a thermal dye transfer printer.
图 3.6.热染料转印打印机的运行。
Unlike displays, the resolution of printers is described in terms of the pixel density instead of the total count of pixels. So a thermal dye transfer printer that has elements spaced 300 per inch across its print head has a resolution of 300 pixels per inch (ppi) across the page. If the resolution along the page is chosen to be the same, we can simply say the printer’s resolution is 300 ppi. An ink-jet printer that places dots on a grid with 1200 grid points per inch is described as having a resolution of 1200 dots per inch (dpi). Because the ink-jet printer is a binary device, it requires a much finer grid for at least two reasons. First, because edges are abrupt black/white boundaries, very high resolution is required to keep stair-stepping, or aliasing, from appearing (see Section 9.3). Second, when continuous-tone images are printed, the high resolution is required to simulate intermediate colors by printing varying-density dot patterns called halftones.
与显示器不同,打印机的分辨率是用像素密度而不是像素总数来描述的。因此,如果热敏染料转印打印机的打印头上每英寸间隔 300 个像素,则整个页面的分辨率为每英寸 300 像素(ppi)。如果选择的页面分辨率相同,我们可以简单地说打印机的分辨率是 300 ppi。如果喷墨打印机将点放置在每英寸 1200 个网格点的网格上,则其分辨率为每英寸 1200 点(dpi)。由于喷墨打印机是二进制设备,因此至少出于两个原因,它需要更精细的网格。首先,由于边缘是突然的黑/白边界,因此需要非常高的分辨率以避免出现阶梯状或混叠(参见第 9.3 节)。其次,当打印连续色调图像时,需要高分辨率,通过打印称为半色调的不同密度点图案来模拟中间色。
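As an illustration of how a binary device can simulate an intermediate tone, the sketch below turns a constant gray level into a pattern of dots whose density matches the gray. It uses ordered dithering with a 4 × 4 threshold matrix, which is one simple halftoning technique swapped in here for illustration; it is not claimed to be what any particular printer does, and the function names are hypothetical.

```python
# 4x4 ordered-dither threshold matrix with entries 0..15.
THRESHOLDS_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def halftone(gray, nx, ny):
    """Return an ny-by-nx binary dot pattern for a constant gray in [0, 1].

    1 means a drop of ink is printed (dark); 0 means bare paper.
    """
    darkness = 1.0 - gray
    rows = []
    for j in range(ny):
        row = []
        for i in range(nx):
            threshold = (THRESHOLDS_4X4[j % 4][i % 4] + 0.5) / 16.0
            row.append(1 if darkness > threshold else 0)
        rows.append(row)
    return rows

def ink_coverage(pattern):
    """Fraction of grid positions that received a dot."""
    total = sum(len(row) for row in pattern)
    return sum(sum(row) for row in pattern) / total

light_gray = halftone(0.75, 8, 8)
mid_gray = halftone(0.5, 8, 8)
```

Every grid position is still all-or-nothing, but the fraction of positions that receive ink tracks the desired tone, which is why the grid has to be much finer than the tonal detail being reproduced.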
The term “dpi” is all too often used to mean “pixels per inch,” but dpi should be used in reference to binary devices and ppi in reference to continuous-tone devices.
术语“dpi”经常用来表示“每英寸像素数”,但 dpi 应该用于二进制设备,而 ppi 应该用于连续色调设备。
Raster images have to come from somewhere, and any image that wasn’t computed by some algorithm has to have been measured by some raster input device, most often a camera or scanner. Even in rendering images of 3D scenes, photographs are used constantly as texture maps (see Chapter 11). Raster input devices have to make a light measurement for each pixel, and (like output devices) they are usually based on arrays of sensors.
光栅图像必须来自某个地方,任何不是通过某种算法计算出来的图像都必须经过某种光栅输入设备(通常是照相机或扫描仪)的测量。即使在渲染 3D 场景的图像时,照片也经常用作纹理贴图(参见第 11 章)。光栅输入设备必须对每个像素进行光测量,并且(与输出设备一样)它们通常基于传感器阵列。
A digital camera is an example of a 2D array input device. The image sensor in a camera is a semiconductor device with a grid of light-sensitive pixels. Two common types of arrays are known as CCDs (charge-coupled devices) and CMOS (complementary metal–oxide–semiconductor) image sensors. The camera’s lens projects an image of the scene to be photographed onto the sensor, and then each pixel measures the light energy falling on it, ultimately resulting in a number that goes into the output image (Figure 3.7). In much the same way as color displays use red, green, and blue subpixels, most color cameras work by using a color-filter array or mosaic to allow each pixel to see only red, green, or blue light, leaving the image processing software to fill in the missing values in a process known as demosaicking (Figure 3.8).
数码相机是 2D 阵列输入设备的一个例子。相机中的图像传感器是一种具有感光像素网格的半导体器件。两种常见的阵列类型是 CCD(电荷耦合器件)和 CMOS(互补金属氧化物半导体)图像传感器。相机的镜头将要拍摄的场景的图像投射到传感器上,然后每个像素测量落在其上的光能,最终产生一个数字,该数字进入输出图像(图 3.7 )。与彩色显示器使用红色、绿色和蓝色子像素的方式非常相似,大多数彩色相机的工作原理是使用彩色滤光片阵列或马赛克,使每个像素仅看到红色、绿色或蓝色光,让图像处理软件填充缺失的值,这个过程称为去马赛克(图 3.8 )。
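The mosaic idea can be sketched in a few lines. The code below assumes an RGGB filter layout (one common arrangement; the actual layout varies by sensor) and fills in missing values by averaging whatever measurements of each channel fall within a 3 × 3 window. That averaging rule is a deliberately naive stand-in for real demosaicking algorithms, and all names are hypothetical. Images are assumed to be at least 2 × 2 pixels.

```python
def filter_channel(i, j):
    """Channel measured at column i, row j in an assumed RGGB layout (0=R, 1=G, 2=B)."""
    if j % 2 == 0:
        return 0 if i % 2 == 0 else 1
    return 1 if i % 2 == 0 else 2

def mosaic(rgb):
    """Keep only the single channel that each pixel's color filter lets through."""
    ny, nx = len(rgb), len(rgb[0])
    return [[rgb[j][i][filter_channel(i, j)] for i in range(nx)] for j in range(ny)]

def demosaic(raw):
    """Estimate (r, g, b) at each pixel by averaging nearby measurements per channel."""
    ny, nx = len(raw), len(raw[0])
    out = []
    for j in range(ny):
        row = []
        for i in range(nx):
            sums = [0.0, 0.0, 0.0]
            counts = [0, 0, 0]
            for dj in (-1, 0, 1):
                for di in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < nx and 0 <= jj < ny:
                        c = filter_channel(ii, jj)
                        sums[c] += raw[jj][ii]
                        counts[c] += 1
            row.append(tuple(s / n for s, n in zip(sums, counts)))
        out.append(row)
    return out

# A flat-colored test image survives the round trip unchanged.
flat = [[(0.2, 0.5, 0.8) for _ in range(4)] for _ in range(4)]
restored = demosaic(mosaic(flat))
```

Note that the raw mosaic stores one number per pixel while the restored image stores three, so two thirds of the output values are estimates rather than measurements.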
Figure 3.7. The operation of a digital camera.
图3.7.数码相机的操作。
Figure 3.8. Most color digital cameras use a color-filter array similar to the Bayer mosaic shown here. Each pixel measures either red, green, or blue light.
图 3.8。大多数彩色数码相机使用与此处所示的拜耳马赛克类似的彩色滤光片阵列。每个像素测量红光、绿光或蓝光。
Other cameras use three separate arrays, or three separate layers in the array, to measure independent red, green, and blue values at each pixel, producing a usable color image without further processing. The resolution of a camera is determined by the fixed number of pixels in the array and is usually quoted using the total count of pixels: a camera with an array of 3000 columns and 2000 rows produces an image of resolution 3000 × 2000, which has 6 million pixels, and is called a 6 megapixel (MP) camera. It’s important to remember that a mosaic sensor does not measure a complete color image, so a camera that measures the same number of pixels but with independent red, green, and blue measurements records more information about the image than one with a mosaic sensor.
其他相机使用三个独立的阵列或阵列中的三个独立层来测量每个像素的独立红、绿、蓝值,从而无需进一步处理即可生成可用的彩色图像。相机的分辨率由阵列中的固定像素数决定,通常以像素总数表示:具有 3000 列和 2000 行阵列的相机可生成分辨率为 3000 × 2000 的图像,该图像具有 600 万像素,因此称为 6 百万像素 (MP) 相机。请务必记住,马赛克传感器不会测量完整的彩色图像,因此,与使用马赛克传感器的相机相比,测量相同像素数但具有独立红、绿、蓝测量值的相机可以记录更多有关图像的信息。
People who are selling cameras use “mega” to mean 10⁶, not 2²⁰ as with megabytes.
销售相机的人使用“兆”来表示 10⁶,而不是像兆字节那样的 2²⁰。
A flatbed scanner also measures red, green, and blue values for each of a grid of pixels, but like a thermal dye transfer printer, it uses a 1D array that sweeps across the page being scanned, making many measurements per second (Figure 3.9). The resolution across the page is fixed by the size of the array, and the resolution along the page is determined by the frequency of measurements compared to the speed at which the scan head moves. A color scanner has a 3 × nx array, where nx is the number of pixels across the page, with the three rows covered by red, green, and blue filters. With an appropriate delay between the times at which the three colors are measured, this allows three independent color measurements at each grid point. As with continuous-tone printers, the resolution of scanners is reported in pixels per inch (ppi).
平板扫描仪也会测量每个像素网格的红色、绿色和蓝色值,但与热染料转印打印机类似,它使用一维阵列扫描整个被扫描的页面,每秒进行多次测量(图 3.9 )。整个页面的分辨率由阵列的大小决定,而沿着页面的分辨率由测量频率与扫描头移动速度之比决定。彩色扫描仪具有 3 × n x阵列,其中n x是整个页面上的像素数,三行分别由红色、绿色和蓝色滤光片覆盖。如果在测量三种颜色的时间之间有适当的延迟,则可以在每个网格点进行三次独立的颜色测量。与连续色调打印机一样,扫描仪的分辨率以每英寸像素数 (ppi) 为单位。
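One way to read the two resolution statements above is as simple ratios. The sketch below assumes the array spans the page width and that the scan head moves at constant speed; the function name, parameter names, and the example numbers are all hypothetical.

```python
def scanner_resolution_ppi(array_pixels, array_width_in, samples_per_sec, head_speed_in_per_sec):
    """Return (across, along) resolution in pixels per inch.

    across: fixed by how many sensor pixels span the page width.
    along:  measurements taken per inch of scan-head travel.
    """
    across = array_pixels / array_width_in
    along = samples_per_sec / head_speed_in_per_sec
    return across, along

# Hypothetical scanner: 2550 pixels over 8.5 inches, 600 measurements/s at 2 inches/s.
across_ppi, along_ppi = scanner_resolution_ppi(2550, 8.5, 600, 2.0)
```

Under these assumptions, halving the head speed would double the resolution along the page while leaving the resolution across the page unchanged.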
The resolution of a scanner is sometimes called its “optical resolution” since most scanners can produce images of other resolutions, via built-in conversion.
扫描仪的分辨率有时被称为“光学分辨率”,因为大多数扫描仪可以通过内置转换生成其他分辨率的图像。
With this concrete information about where our images come from and where they will go, we’ll now discuss images more abstractly, in the way we’ll use them in graphics algorithms.
有了关于图像来自哪里以及到哪里的具体信息,我们现在将以在图形算法中使用图像的方式更抽象地讨论图像。
Figure 3.9. The operation of a flatbed scanner.
图 3.9.平板扫描仪的操作。
“A pixel is not a little square!”—Alvy Ray Smith (1995)
“一个像素不是一个小方块!”——Alvy Ray Smith (1995)
We know that a raster image is a big array of pixels, each of which stores information about the color of the image at its grid point. We’ve seen what various output devices do with images we send to them and how input devices derive them from images formed by light in the physical world. But for computations in the computer, we need a convenient abstraction that is independent of the specifics of any device, that we can use to reason about how to produce or interpret the values stored in images.
我们知道,光栅图像是一大组像素,每个像素都存储了其网格点处图像颜色的信息。我们已经了解了各种输出设备如何处理我们发送给它们的图像,以及输入设备如何从物理世界中光形成的图像中获取图像。但对于计算机中的计算,我们需要一个方便的抽象,它独立于任何设备的细节,我们可以使用它来推断如何生成或解释存储在图像中的值。
When we measure or reproduce images, they take the form of two-dimensional distributions of light energy: the light emitted from the monitor as a function of position on the face of the display; the light falling on a camera’s image sensor as a function of position across the sensor’s plane; the reflectance, or fraction of light reflected (as opposed to absorbed) as a function of position on a piece of paper. So in the physical world, images are functions defined over two-dimensional areas, almost always rectangles. We can therefore abstract an image as a function I(x, y) : R → V,
当我们测量或重现图像时,它们会呈现光能的二维分布形式:显示器发出的光是显示屏表面位置的函数;落在相机图像传感器上的光是传感器平面位置的函数;反射率,即反射(而不是吸收)光的比例是纸张位置的函数。因此,在物理世界中,图像是定义在二维区域(几乎总是矩形)上的函数。因此,我们可以将图像抽象为一个函数 I(x, y) : R → V,
where R ⊂ ℝ² is a rectangular area and V is the set of possible pixel values. The simplest case is an idealized grayscale image where each point in the rectangle has just a brightness (no color), and we can say V = ℝ⁺ (the nonnegative reals). An idealized color image, with red, green, and blue values at each pixel, has V = (ℝ⁺)³. We’ll discuss other possibilities for V in the next section.
其中 R ⊂ ℝ² 是一个矩形区域,V 是可能像素值的集合。最简单的情况是理想化的灰度图像,其中矩形中的每个点都只有亮度(没有颜色),我们可以说 V = ℝ⁺(非负实数)。理想化的彩色图像,每个像素都有红色、绿色和蓝色值,V = (ℝ⁺)³。我们将在下一节讨论 V 的其他可能性。
Are there any raster devices that are not rectangular?
有没有非矩形的光栅设备?
How does a raster image relate to this abstract notion of a continuous image? Looking to the concrete examples, a pixel from a camera or scanner is a measurement of the average color of the image over some small area around the pixel. A display pixel, with its red, green, and blue subpixels, is designed so that the average color of the image over the face of the pixel is controlled by the corresponding pixel value in the raster image. In both cases, the pixel value is a local average of the color of the image, and it is called a point sample of the image. In other words, when we find the value x in a pixel, it means “the value of the image in the vicinity of this grid point is x.” The idea of images as sampled representations of functions is explored further in Chapter 10.
光栅图像与连续图像这一抽象概念有何关系?从具体的例子来看,照相机或扫描仪的像素是该像素周围某个小区域内图像平均颜色的量度。显示像素具有红、绿、蓝子像素,其设计使得像素表面图像平均颜色由光栅图像中对应的像素值控制。在这两种情况下,像素值都是图像颜色的局部平均值,称为图像的一个点样本。换句话说,当我们在某个像素中找到x值时,这意味着“该网格点附近的图像值为x ”。第 10 章将进一步探讨图像作为函数采样表示这一概念。
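A short sketch may help connect the continuous picture to the stored array. It assumes sample points sit at integer coordinates and models the continuous image as an ordinary Python function; the ramp image, the function names, and the choice of a k × k grid of sub-samples for the local average are all illustrative assumptions.

```python
def ramp(x, y):
    """Hypothetical continuous grayscale image: brightness increases from left to right."""
    return (x + 0.5) / 4.0

def point_sample(image_fn, nx, ny):
    """Build an ny-by-nx raster by evaluating image_fn at the integer grid points."""
    return [[image_fn(float(i), float(j)) for i in range(nx)] for j in range(ny)]

def area_sample(image_fn, nx, ny, k=4):
    """Build a raster by averaging k*k evaluations over the unit square around each grid point."""
    raster = []
    for j in range(ny):
        row = []
        for i in range(nx):
            total = 0.0
            for sj in range(k):
                for si in range(k):
                    x = i - 0.5 + (si + 0.5) / k
                    y = j - 0.5 + (sj + 0.5) / k
                    total += image_fn(x, y)
            row.append(total / (k * k))
        raster.append(row)
    return raster

points = point_sample(ramp, 4, 3)
areas = area_sample(ramp, 4, 3)
```

For a smoothly varying image like this ramp the two rasters agree, which is why a local average can reasonably be treated as the value of the image at the grid point.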
Figure 3.10. Coordinates of a four-pixel × three-pixel screen. Note that in some APIs the y-axis will point downward.
图 3.10。四像素 × 三像素屏幕的坐标。请注意,在某些 API 中, y轴将指向下方。
A mundane but important question is where the pixels are located in 2D space. This is only a matter of convention, but establishing a consistent convention is important! In this book, a raster image is indexed by the pair (i, j) indicating the column (i) and row (j) of the pixel, counting from the bottom left. If an image has nx columns and ny rows of pixels, the bottom-left pixel is (0, 0) and the top-right pixel is (nx − 1, ny − 1). We need 2D real screen coordinates to specify pixel positions. We will place the pixels’ sample points at integer coordinates, as shown by the 4 × 3 screen in Figure 3.10.
一个平凡但重要的问题是像素在二维空间中的位置。这只是一个惯例问题,但建立一致的惯例非常重要!在本书中,光栅图像由对 ( i,j ) 索引,表示像素的列 ( i ) 和行 ( j ),从左下方开始数。如果图像有n x列和n y行像素,则左下角像素为 (0, 0),右上角像素为 ( nx - 1 , ny - 1)。我们需要二维实屏幕坐标来指定像素位置。我们将像素的采样点放在整数坐标处,如图 3.10中的 4×3 屏幕所示。
In some APIs, and many file formats, the rows of an image are organized top-to-bottom, so that (0, 0) is at the top left. This is for historical reasons: the rows in analog television transmission started from the top.
在某些 API 和许多文件格式中,图像的行是从上到下排列的,因此 (0, 0) 位于左上角。这是出于历史原因:模拟电视传输中的行是从顶部开始的。
The rectangular domain of the image has width nx and height ny and is centered on this grid, meaning that it extends half a pixel beyond the last sample point on each side. So the rectangular domain of an nx × ny image is R = [−0.5, nx − 0.5] × [−0.5, ny − 0.5].
图像的矩形域宽度为 nx,高度为 ny,并以此网格为中心,这意味着它每边都比最后一个采样点延伸半个像素。因此,nx × ny 图像的矩形域为 R = [−0.5, nx − 0.5] × [−0.5, ny − 0.5]。
Some systems shift the coordinates by half a pixel to place the sample points halfway between the integers but place the edges of the image at integers.
有些系统将坐标移动半个像素,将采样点放置在整数中间,但将图像的边缘放置在整数处。
Again, these coordinates are simply conventions, but they will be important to remember later when implementing cameras and viewing transformations.
再次强调,这些坐标只是约定俗成而已,但在以后实现相机和查看变换时记住它们很重要。
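The conventions above can be captured in a few helper functions. This is a sketch assuming sample points at integer coordinates and a domain that extends half a pixel past the outermost samples, as described in the text; the function names are hypothetical.

```python
import math

def image_domain(nx, ny):
    """Return the x and y extents of the rectangle covered by an nx-by-ny image."""
    return (-0.5, nx - 0.5), (-0.5, ny - 0.5)

def pixel_containing(x, y, nx, ny):
    """Return (i, j) of the pixel whose unit square contains (x, y), or None if outside."""
    (x0, x1), (y0, y1) = image_domain(nx, ny)
    if not (x0 <= x <= x1 and y0 <= y <= y1):
        return None
    # Round to the nearest sample point; clamp so the far edge maps to the last pixel.
    i = min(int(math.floor(x + 0.5)), nx - 1)
    j = min(int(math.floor(y + 0.5)), ny - 1)
    return i, j

def to_top_down_row(j, ny):
    """Convert a bottom-up row index to the top-down order used by some APIs and file formats."""
    return ny - 1 - j
```

For the 4 × 3 screen, `image_domain(4, 3)` gives x from −0.5 to 3.5 and y from −0.5 to 2.5, and the bottom row j = 0 becomes row 2 in a top-down layout.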
So far we have described the values of pixels in terms of real numbers, representing intensity (possibly separately for red, green, and blue) at a point in the image. This suggests that images should be arrays of floating-point numbers, with either one (for grayscale, or black and white, images) or three (for RGB color images) 32-bit floating-point numbers stored per pixel. This format is sometimes used when its precision and range of values are needed, but images have a lot of pixels, and memory and bandwidth for storing and transmitting images are invariably scarce. Just one ten-megapixel photograph would consume about 115 MB of RAM in this format.
到目前为止,我们用实数描述了像素的值,表示图像中某一点的强度(可能分别表示红色、绿色和蓝色)。这表明图像应该是浮点数数组,每个像素存储一个(对于灰度或黑白图像)或三个(对于 RGB 彩色图像)32 位浮点数。当需要精度和值范围时,有时会使用此格式,但图像有很多像素,用于存储和传输图像的内存和带宽总是稀缺的。仅一张 10 兆像素的照片就会以这种格式消耗大约 115 MB 的 RAM。
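The storage figure can be checked with a line of arithmetic. The sketch below assumes ten million pixels, three channels, and four bytes per 32-bit float; the function name is hypothetical.

```python
def image_bytes(n_pixels, channels, bytes_per_channel):
    """Uncompressed storage needed for an image, in bytes."""
    return n_pixels * channels * bytes_per_channel

n_bytes = image_bytes(10_000_000, 3, 4)   # 120,000,000 bytes
decimal_mb = n_bytes / 10**6              # 120.0 when "mega" means 10^6
binary_mb = n_bytes / 2**20               # about 114.4 when "mega" means 2^20
```

The same byte count reads as 120 MB or as roughly 114 MB depending on which meaning of "mega" is used, which appears to be the source of the gap between 120 and the quoted figure of about 115.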
Less range is required for images that are meant to be displayed directly. While the range of possible light intensities is unbounded in principle, any given device has a decidedly finite maximum intensity, so in many contexts, it is perfectly sufficient for pixels to have a bounded range, usually taken to be [0, 1] for simplicity. For instance, the possible values in an 8-bit image are 0, 1/255, 2/255, ..., 254/255, 1. Images stored with floating-point numbers, allowing a wide range of values, are often called high dynamic range (HDR) images to distinguish them from fixed-range, or low dynamic range (LDR) images that are stored with integers. See Chapter 20 for an in-depth discussion of techniques and applications for high dynamic range images.
对于直接显示的图像,所需的范围较小。虽然光强度的范围在原则上是无界的,但任何给定的设备都具有绝对有限的最大强度,因此在许多情况下,像素具有有界的范围就足够了,通常为简单起见取为 [0, 1]。例如,8 位图像中的可能值为 0、1 / 255、2 / 255 ,..., 254 / 255,1。用浮点数存储的图像允许很宽的范围的值,通常称为高动态范围(HDR) 图像,以将它们与用整数存储的固定范围或低动态范围(LDR) 图像区分开来。有关高动态范围图像的技术和应用的深入讨论,请参见第 20 章。
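As a minimal sketch of the fixed-range idea, the functions below map a real-valued intensity to one of the 256 codes of an 8-bit image and back. Rounding to the nearest level and clamping out-of-range input are assumptions about one reasonable way to do the conversion; the function names are hypothetical.

```python
def encode_8bit(value):
    """Clamp a real value to [0, 1] and round it to the nearest level k/255."""
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * 255.0 + 0.5)

def decode_8bit(code):
    """Map an 8-bit code k in 0..255 back to the value k/255."""
    return code / 255.0

codes = [encode_8bit(v) for v in (0.0, 0.5, 1.0, 2.5)]
```

Dividing by 255 rather than 256 means code 0 decodes to exactly 0 and code 255 to exactly 1, and any input above 1 collapses to the same maximum code.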
Why 115 MB and not 120 MB?
为什么是 115 MB 而不是 120 MB?
The denominator of 255, rather than 256, is awkward, but being able to represent 0 and 1 exactly is important.
分母为 255 而不是 256,这很尴尬,但能够准确表示 0 和 1 很重要。
Here are some pixel formats with typical applications:
以下是一些具有典型应用的像素格式:
1-bit grayscale—text and other images where intermediate grays are not desired (high resolution required);
1 位灰度——不需要中间灰色的文本和其他图像(需要高分辨率);
8-bit RGB fixed-range color (24 bits total per pixel)—web and email applications, consumer photographs;
8 位 RGB 固定范围颜色(每像素总共 24 位)——网络和电子邮件应用程序、消费者照片;
8- or 10-bit fixed-range RGB (24–30 bits/pixel)—digital interfaces to computer displays;
8 位或 10 位固定范围 RGB(24-30 位/像素)——计算机显示器的数字接口;
12- to 14-bit fixed-range RGB (36–42 bits/pixel)—raw camera images for professional photography;
12 到 14 位固定范围 RGB(36-42 位/像素)——用于专业摄影的原始相机图像;
16-bit fixed-range RGB (48 bits/pixel)—professional photography and printing; intermediate format for image processing of fixed-range images;
16 位固定范围 RGB(48 位/像素)——专业摄影和印刷;固定范围图像的图像处理的中间格式;
16-bit fixed-range grayscale (16 bits/pixel)—radiology and medical imaging;
16 位固定范围灰度(16 位/像素)——放射学和医学成像;
16-bit “half-precision” floating-point RGB—HDR images; intermediate format for real-time rendering;
16 位“半精度”浮点 RGB—HDR 图像;实时渲染的中间格式;
32-bit floating-point RGB—general-purpose intermediate format for software rendering and processing of HDR images.
32 位浮点 RGB——用于软件渲染和处理 HDR 图像的通用中间格式。
Reducing the number of bits used to store each pixel leads to two distinctive types of artifacts, or artificially introduced flaws, in images. First, encoding images with fixed-range values produces clipping when pixels that would otherwise be brighter than the maximum value are set, or clipped, to the maximum representable value. For instance, a photograph of a sunny scene may include reflections that are much brighter than white surfaces; these will be clipped (even if they were measured by the camera) when the image is converted to a fixed range to be displayed. Second, encoding images with limited precision leads to quantization artifacts, or banding, when the need to round pixel values to the nearest representable value introduces visible jumps in intensity or color. Banding can be particularly insidious in animation and video, where the bands may not be objectionable in still images, but become very visible when they move back and forth.
减少用于存储每个像素的位数会导致两种不同类型的图像中的伪影,即人为引入的瑕疵。首先,使用固定范围值对图像进行编码会产生限幅是指将原本比最大值更亮的像素设置或限幅为最大可表示值。例如,阳光明媚的场景的照片中可能包含比白色表面亮得多的反射;当将图像转换为固定范围以进行显示时,这些反射将被限幅(即使它们是由相机测量的)。其次,以有限的精度对图像进行编码会导致量化伪影或条带,当需要将像素值四舍五入到最接近的可表示值时,会引入明显的强度或颜色跳跃。条带在动画和视频中尤其隐蔽,其中条带在静止图像中可能并不令人反感,但当它们来回移动时就会变得非常明显。
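Both artifacts can be reproduced in a few lines. The following is a minimal sketch, assuming pixel values are meant to lie in [0, 1] and are rounded to the nearest of 2^bits evenly spaced codes; the helper names `quantize` and `dequantize` are hypothetical.

```python
def quantize(value: float, bits: int = 8) -> int:
    """Map a real intensity to an integer code in [0, 2**bits - 1]."""
    top = (1 << bits) - 1
    clipped = min(max(value, 0.0), 1.0)   # clipping: out-of-range values saturate
    return int(clipped * top + 0.5)       # quantization: round to the nearest code


def dequantize(code: int, bits: int = 8) -> float:
    """Map an integer code back to a value in [0, 1]."""
    return code / ((1 << bits) - 1)


# Clipping: a highlight far brighter than "white" collapses to the top code.
highlight_code = quantize(3.7)

# Banding: a smooth 11-step ramp stored with only 2 bits keeps few distinct levels.
ramp = [i / 10 for i in range(11)]
coarse = [dequantize(quantize(x, bits=2), bits=2) for x in ramp]
print(highlight_code, sorted(set(coarse)))
```

With 2 bits only four output levels exist, so neighboring ramp values that were distinct become identical, and the remaining jumps between levels are what appear as bands.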
All modern monitors take digital input for the “value” of a pixel and convert this to an intensity level. Real monitors have some nonzero intensity when they are off because the screen reflects some light. For our purposes, we can consider this “black” and the monitor fully on as “white.” We assume a numeric description of pixel color that ranges from zero to one. Black is zero, white is one, and a gray halfway between black and white is 0.5. Note that here “halfway” refers to the physical amount of light coming from the pixel, rather than the appearance. The human perception of intensity is nonlinear and will not be part of the present discussion; see Chapter 19 for more.
所有现代显示器都以数字形式输入像素的“值”,并将其转换为强度级别。真实的显示器在关闭时会具有一些非零强度,因为屏幕会反射一些光。为了便于说明,我们可以将这种“黑色”视为“黑色”,将完全打开的显示器视为“白色”。我们假设像素颜色的数字描述范围从零到一。黑色为零,白色为一,介于黑色和白色之间的灰色为 0.5。请注意,此处的“一半”是指来自像素的物理光量,而不是外观。人类对强度的感知是非线性的,不是本讨论的一部分;有关详细信息,请参阅第 19 章。
There are two key issues that must be understood to produce correct images on monitors. The first is that monitors are nonlinear with respect to input. For example, if you give a monitor 0, 0.5, and 1.0 as inputs for three pixels, the intensities displayed might be 0, 0.25, and 1.0 (off, one-quarter fully on, and fully on). As an approximate characterization of this nonlinearity, monitors are commonly characterized by a γ (“gamma”) value. This value is the degree of freedom in the formula
$$\text{displayed intensity} = (\text{maximum intensity})\, a^{\gamma}, \tag{3.1}$$
必须理解两个关键问题才能在显示器上产生正确的图像。首先,显示器对于输入是非线性的。例如,如果您给显示器 0、0.5 和 1.0 作为三个像素的输入,则显示的强度可能是 0、0.25 和 1.0(关闭、四分之一完全打开和完全打开)。作为这种非线性的近似表征,显示器通常以γ (“伽马”)值表示。该值是公式中的自由度
where a is the input pixel value between zero and one. For example, if a monitor has a gamma of 2.0, and we input a value of a = 0.5, the displayed intensity will be one-fourth the maximum possible intensity because $0.5^{2} = 0.25$. Note that a = 0 maps to zero intensity and a = 1 maps to the maximum intensity regardless of the value of γ. Describing a display’s nonlinearity using γ is only an approximation; we do not need a great deal of accuracy in estimating the γ of a device. A nice visual way to gauge the nonlinearity is to find what value of a gives an intensity halfway between black and white. This a will be
$$0.5 = a^{\gamma}.$$
其中a是介于 0 和 1 之间的输入像素值。例如,如果显示器的伽马值为 2.0,并且我们输入a = 0.5 的值,则显示的强度将是最大可能强度的四分之一,因为 0.5 2 = 0.25。请注意,无论γ的值是多少, a = 0 都映射到零强度, a = 1 映射到最大强度。使用γ描述显示器的非线性只是一种近似值;我们不需要非常精确地估计设备的γ 。衡量非线性的一种很好的视觉方法是找到a的哪个值给出介于黑色和白色之间的强度。这个a将是
If we can find that a, we can deduce γ by taking logarithms on both sides:
$$\gamma = \frac{\ln 0.5}{\ln a}.$$
如果我们可以找到a ,我们可以通过对两边取对数来推导出γ :
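The estimate can be written as a one-line function. This is a sketch under the power-law model only: if the gray input a that matches half intensity satisfies a^γ = 0.5, then γ = ln 0.5 / ln a. The name `estimate_gamma` and the sample input are assumptions made for illustration.

```python
import math


def estimate_gamma(a_match: float) -> float:
    """Gamma implied by the input whose displayed intensity is half of maximum.

    Solves a_match**gamma = 0.5 for gamma.
    """
    if not 0.0 < a_match < 1.0:
        raise ValueError("a_match must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(a_match)


# Hypothetical slider reading: the user finds a match at a = sqrt(0.5).
gamma = estimate_gamma(math.sqrt(0.5))
print(gamma)
```

A match at a = 0.5 would mean the display is already linear (γ = 1); the larger the matching a, the larger the implied γ.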
We can find this a by a standard technique where we display a checkerboard pattern of black and white pixels next to a square of gray pixels with input a (Figure 3.11), then ask the user to adjust a (with a slider, for instance) until the two sides match in average brightness. When you look at this image from a distance (or without glasses if you are nearsighted), the two sides of the image will look about the same when a is producing an intensity halfway between black and white. This is because the blurred checkerboard is mixing even numbers of white and black pixels so the overall effect is a uniform color halfway between white and black.
我们可以通过一种标准技术找到这个a ,即在输入a的情况下,在一块灰色像素的正方形旁边显示一个黑白像素的棋盘格图案(图 3.11 ),然后让用户调整a (例如,使用滑块),直到两边的平均亮度匹配。当你从远处看这幅图像时(如果你是近视,则不戴眼镜),当a产生的强度介于黑色和白色之间时,图像的两侧看起来会大致相同。这是因为模糊的棋盘格混合了偶数个白色和黑色像素,因此整体效果是介于白色和黑色之间的均匀颜色。
Once we know γ, we can gamma correct our input so that a value of a = 0.5 is displayed with intensity halfway between black and white. This is done with the transformation
$$a' = a^{1/\gamma}.$$
一旦我们知道了γ ,我们就可以对我们的输入进行伽马校正,以便a = 0.5 的值以介于黑色和白色之间的强度显示。这是通过变换完成的
Figure 3.11. Alternating black and white pixels viewed from a distance are halfway between black and white. The gamma of a monitor can be inferred by finding a gray value that appears to have the same intensity as the black and white pattern.
图 3.11.从远处看,交替出现的黑白像素介于黑色和白色之间。可以通过找到看起来与黑白图案具有相同强度的灰度值来推断显示器的伽马值。
For monitors with analog interfaces, which have difficulty changing intensity rapidly along the horizontal direction, horizontal black and white stripes work better than a checkerboard.
对于具有模拟接口的显示器,由于其难以沿水平方向快速改变强度,因此水平黑白条纹比棋盘效果更好。
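When this formula ($a' = a^{1/\gamma}$) is plugged into Equation (3.1), we get
$$\text{displayed intensity} = (\text{maximum intensity})\,(a')^{\gamma} = (\text{maximum intensity})\,\left(a^{1/\gamma}\right)^{\gamma} = a\,(\text{maximum intensity}).$$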
将此公式代入公式 (3.1) 可得
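The cancellation can be checked numerically. The sketch below assumes the idealized power-law display model (intensity proportional to a^γ) and corrects the input by raising it to the power 1/γ; both function names are illustrative, and real displays only approximate this model.

```python
def display_response(a: float, gamma: float, max_intensity: float = 1.0) -> float:
    """Idealized monitor model: intensity = max_intensity * a**gamma."""
    return max_intensity * a ** gamma


def gamma_correct(a: float, gamma: float) -> float:
    """Pre-distort a desired linear value so the display's power law cancels."""
    return a ** (1.0 / gamma)


gamma = 2.2
uncorrected = display_response(0.5, gamma)                      # darker than half
corrected = display_response(gamma_correct(0.5, gamma), gamma)  # close to half
print(uncorrected, corrected)
```

Without correction the mid-gray input is displayed well below half intensity; with correction the displayed value returns to the requested 0.5, up to floating-point rounding.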
Another important characteristic of real displays is that they take quantized input values. So while we can manipulate intensities in the floating-point range [0, 1], the detailed input to a monitor is a fixed-size integer. The most common range for this integer is 0–255, which can be held in 8 bits of storage. This means that the possible values for a are not any number in [0, 1] but instead
$$a \in \left\{ \tfrac{0}{255}, \tfrac{1}{255}, \tfrac{2}{255}, \ldots, \tfrac{254}{255}, \tfrac{255}{255} \right\}.$$
真实显示器的另一个重要特征是它们采用量化的输入值。因此,虽然我们可以在浮点范围 [0, 1] 内操纵强度,但显示器的详细输入是固定大小的整数。此整数的最常见范围是 0-255,可以保存在 8 位存储空间中。这意味着a的可能值不是 [0, 1] 中的任何数字,而是
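This means the possible displayed intensity values are approximately
$$\left\{ M\left(\tfrac{0}{255}\right)^{\gamma}, M\left(\tfrac{1}{255}\right)^{\gamma}, M\left(\tfrac{2}{255}\right)^{\gamma}, \ldots, M\left(\tfrac{255}{255}\right)^{\gamma} \right\},$$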
这意味着可能显示的强度值大约为
where M is the maximum intensity. In applications where the exact intensities need to be controlled, we would have to actually measure the 256 possible intensities, and these intensities might be different at different points on the screen, especially for CRTs. They might also vary with viewing angle. Fortunately, few applications require such accurate calibration.
其中M是最大强度。在需要控制精确强度的应用中,我们必须实际测量 256 种可能的强度,并且这些强度在屏幕上的不同点可能不同,尤其是对于 CRT。它们也可能随着视角而变化。幸运的是,很少有应用需要如此精确的校准。
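To see what this set of intensities looks like, the sketch below tabulates M(k/255)^γ for every 8-bit input k. It assumes the idealized power-law model rather than measured data, and the function name is hypothetical.

```python
def displayable_intensities(gamma: float, max_intensity: float = 1.0, bits: int = 8):
    """Approximate intensities produced by each integer input under a power law."""
    top = (1 << bits) - 1
    return [max_intensity * (k / top) ** gamma for k in range(top + 1)]


levels = displayable_intensities(gamma=2.2)
step_near_black = levels[1] - levels[0]
step_near_white = levels[255] - levels[254]
print(len(levels), step_near_black, step_near_white)
```

For γ greater than 1 the steps are much finer near black than near white, so the 256 levels are not evenly spaced in physical intensity.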
Most computer graphics images are defined in terms of red-green-blue (RGB) color. RGB color is a simple space that allows straightforward conversion to the controls for most computer screens. In this section, RGB color is discussed from a user’s perspective, and operational facility is the goal. A more thorough discussion of color is given in Chapter 18, but the mechanics of RGB color space will allow us to write most graphics programs. The basic idea of RGB color space is that the color is displayed by mixing three primary lights: one red, one green, and one blue. The lights mix in an additive manner.
大多数计算机图形图像都是用红绿蓝 (RGB) 颜色定义的。RGB 颜色是一个简单的空间,可以直接转换为大多数计算机屏幕的控件。在本节中,将从用户的角度讨论 RGB 颜色,目标是实现操作便利。第 18 章将对颜色进行更深入的讨论,但 RGB 颜色空间的机制将使我们能够编写大多数图形程序。RGB 颜色空间的基本思想是通过混合三种原色光来显示颜色:一种红色、一种绿色和一种蓝色。这些光以加法方式混合。
In grade school, you probably learned that the primaries are red, yellow, and blue, and that, e.g., yellow + blue = green. This is subtractive color mixing, which is fundamentally different from the more familiar additive mixing that happens in displays.
在小学时,您可能学过三原色是红色、黄色和蓝色,例如黄色 + 蓝色 = 绿色。这是减色混合,与显示器中常见的加色混合有着根本的不同。
In RGB additive color mixing we have (Figure 3.12)
red + green = yellow,
green + blue = cyan,
blue + red = magenta,
red + green + blue = white.
在 RGB 加色混合中,我们有(图 3.12 )
The color “cyan” is a blue-green, and the color “magenta” is a purple.
“青色”是蓝绿色,“洋红色”是紫色。
Figure 3.12. The additive mixing rules for colors red/green/blue.
图 3.12。红/绿/蓝颜色的加色混合规则。
If we are allowed to dim the primary lights from fully off (indicated by pixel value 0) to fully on (indicated by 1), we can create all the colors that can be displayed on an RGB monitor. The red, green, and blue pixel values create a three-dimensional RGB color cube that has a red, a green, and a blue axis. Allowable coordinates for the axes range from zero to one. The color cube is shown graphically in Figure 3.13.
如果允许我们将原色光从完全关闭(像素值 0 表示)调暗到完全打开(像素值 1 表示),我们就可以创建 RGB 显示器上可以显示的所有颜色。红色、绿色和蓝色像素值创建一个三维RGB 颜色立方体,该立方体具有红色、绿色和蓝色轴。轴的允许坐标范围从零到一。颜色立方体以图形方式显示在图 3.13中。
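The colors at the corners of the cube are
black = (0, 0, 0),
red = (1, 0, 0),
green = (0, 1, 0),
blue = (0, 0, 1),
yellow = (1, 1, 0),
magenta = (1, 0, 1),
cyan = (0, 1, 1),
white = (1, 1, 1).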
立方体角落的颜色是
Figure 3.13. The RGB color cube in 3D and its faces unfolded. Any RGB color is a point in the cube.
图 3.13。3D中的 RGB 颜色立方体及其展开的面。任何 RGB 颜色都是立方体中的一个点。
Actual RGB levels are often given in quantized form, just like the grayscales discussed in Section 3.2.2. Each component is specified with an integer. The most common size for these integers is one byte each, so each of the three RGB components is an integer between 0 and 255. The three integers together take up three bytes, which is 24 bits. Thus, a system that has “24-bit color” has 256 possible levels for each of the three primary colors. Issues of gamma correction discussed in Section 3.2.2 also apply to each RGB component separately.
实际的 RGB 级别通常以量化形式给出,就像第 3.2.2 节中讨论的灰度一样。每个组件都用一个整数指定。这些整数最常见的大小是每个字节一个,因此三个 RGB 组件中的每一个都是 0 到 255 之间的整数。这三个整数加起来占用三个字节,即 24 位。因此,具有“24 位颜色”的系统对三原色中的每一个都有 256 个可能的级别。第 3.2.2 节中讨论的伽马校正问题也分别适用于每个 RGB 组件。
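As an illustration of the three-bytes-per-pixel layout, the sketch below converts floating-point components in [0, 1] to bytes and packs them into one 24-bit integer. Placing red in the most significant byte is an assumption made here for the example; actual systems differ in channel order, and the function names are hypothetical.

```python
def to_byte(x: float) -> int:
    """Clamp a component to [0, 1] and round it to an integer in 0..255."""
    return int(min(max(x, 0.0), 1.0) * 255 + 0.5)


def pack_rgb24(r: float, g: float, b: float) -> int:
    """Pack three components into 24 bits, red in the high byte (assumed order)."""
    return (to_byte(r) << 16) | (to_byte(g) << 8) | to_byte(b)


def unpack_rgb24(packed: int):
    """Recover the three 8-bit components from a packed 24-bit value."""
    return (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF


yellow = pack_rgb24(1.0, 1.0, 0.0)
print(hex(yellow), unpack_rgb24(yellow))
```

Each component has 256 possible levels, so the packed value ranges over 256^3 distinct colors.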
Often, we would like to only partially overwrite the contents of a pixel. A common example of this occurs in compositing, where we have a background and want to insert a foreground image over it. For opaque pixels in the foreground, we just replace the background pixel. For entirely transparent foreground pixels, we do not change the background pixel. For partially transparent pixels, some care must be taken. Partially transparent pixels can occur when the foreground object has partially transparent regions, such as glass. But, the most frequent case where foreground and background must be blended is when the foreground object only partly covers the pixel, either at the edge of the foreground object, or when there are subpixel holes such as between the leaves of a distant tree.
通常,我们只想部分覆盖像素的内容。一个常见的例子发生在合成中,我们有一个背景并想在其上插入前景图像。对于前景中的不透明像素,我们只需替换背景像素。对于完全透明的前景像素,我们不会更改背景像素。对于部分透明的像素,必须小心。当前景物体具有部分透明区域(例如玻璃)时,可能会出现部分透明像素。但是,必须混合前景和背景的最常见情况是当前景物体仅部分覆盖像素时(要么在前景物体的边缘,要么当存在子像素孔洞时,例如远处树木的叶子之间)。
The most important piece of information needed to blend a foreground object over a background object is the pixel coverage, which tells the fraction of the pixel covered by the foreground layer. We can call this fraction α. If we want to composite a foreground color $\mathbf{c}_f$ over background color $\mathbf{c}_b$, and the fraction of the pixel covered by the foreground is α, then we can use the formula
$$\mathbf{c} = \alpha\,\mathbf{c}_f + (1 - \alpha)\,\mathbf{c}_b. \tag{3.2}$$
将前景物体与背景物体融合所需的最重要的信息是像素覆盖率,表示前景层覆盖的像素比例。我们可以将这个比例称为 α。如果我们想将前景色c f合成到背景色c b上,并且前景覆盖的像素比例为 α,那么我们可以使用公式
For an opaque foreground layer, the interpretation is that the foreground object covers area α within the pixel’s rectangle and the background object covers the remaining area, which is (1 – α). For a transparent layer (think of an image painted on glass or on tracing paper, using translucent paint), the interpretation is that the foreground layer blocks the fraction α of the light coming through from the background, transmitting the remaining (1 – α), and contributes a fraction α of its own color to replace what was removed. An example of using Equation (3.2) is shown in Figure 3.14.
对于不透明的前景层,解释是前景对象覆盖像素矩形内的面积 α,而背景对象覆盖剩余面积,即 (1 - α)。对于透明层(想象用半透明颜料在玻璃或描图纸上绘制的图像),解释是前景层阻挡了从背景透过的光线的一部分 (1 - α),并贡献了其自身颜色的一部分 α 来替换被移除的部分。图 3.14显示了使用公式 (3.2) 的示例。
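The blend weights the foreground color by α and the background color by (1 − α). A minimal sketch for a single pixel follows, assuming colors are RGB tuples with components in [0, 1] and that α is a plain coverage fraction; the function name `composite` is illustrative.

```python
def composite(fg, bg, alpha: float):
    """Blend foreground over background: alpha * fg + (1 - alpha) * bg per channel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


red = (1.0, 0.0, 0.0)
blue = (0.0, 0.0, 1.0)

print(composite(red, blue, 1.0))    # opaque foreground: background is replaced
print(composite(red, blue, 0.0))    # fully transparent: background is unchanged
print(composite(red, blue, 0.25))   # edge pixel covered one quarter by the foreground
```

Because the two weights sum to 1, compositing a color over itself returns that same color for any α.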
The α values for all the pixels in an image might be stored in a separate grayscale image, which is then known as an alpha mask or transparency mask. Or the information can be stored as a fourth channel in an RGB image, in which case it is called the alpha channel, and the image can be called an RGBA image. With 8-bit images, each pixel then takes up 32 bits, which is a conveniently sized chunk in many computer architectures.
图像中所有像素的 α 值可能会存储在单独的灰度图像中,这被称为alpha mask或透明度蒙版。或者,该信息可以作为 RGB 图像中的第四个通道存储,在这种情况下,它被称为alpha 通道,图像可以称为 RGBA 图像。对于 8 位图像,每个像素占用 32 位,这在许多计算机架构中是一个方便的大小块。
Since the weights of the foreground and background layers add up to 1, the color won’t change if the foreground and background layers have the same color.
由于前景层和背景层的权重之和为 1,因此如果前景层和背景层具有相同的颜色,则颜色不会改变。
Although Equation (3.2) is what is usually used, there are a variety of situations where α is used differently (Porter & Duff, 1984).
尽管通常使用公式 (3.2),但在很多情况下 α 的使用方法有所不同 (Porter & Duff,1984)。
Figure 3.14. An example of compositing using Equation (3.2). The foreground image is in effect cropped by the α channel before being put on top of the background image. The resulting composite is shown on the bottom.
图 3.14。使用公式 (3.2) 进行合成的示例。前景图像实际上被 α 通道裁剪,然后放在背景图像之上。最终的合成显示在底部。
Most RGB image formats use eight bits for each of the red, green, and blue channels. This results in approximately three megabytes of raw information for a single million-pixel image. To reduce the storage requirement, most image formats allow for some kind of compression. At a high level, such compression is either lossless or lossy. No information is discarded in lossless compression, while some information is lost unrecoverably in a lossy system. Popular image storage formats include
大多数 RGB 图像格式对红、绿、蓝通道各使用 8 位。这样,一幅百万像素图像的原始信息量就约为 3 兆字节。为了减少存储需求,大多数图像格式都允许某种压缩。从高层次上讲,这种压缩要么是无损的,要么是有损的。无损压缩不会丢弃任何信息,而有损压缩则会不可挽回地丢失一些信息。流行的图像存储格式包括
jpeg. This lossy format compresses image blocks based on thresholds in the human visual system. This format works well for natural images.
jpeg。这种有损格式根据人类视觉系统中的阈值压缩图像块。这种格式非常适合自然图像。
tiff. This format is most commonly used to hold binary images or losslessly compressed 8- or 16-bit RGB although many other options exist.
tiff。此格式最常用于保存二进制图像或无损压缩的 8 位或 16 位 RGB,尽管还存在许多其他选项。
ppm. This very simple lossless, uncompressed format is most often used for 8-bit RGB images although many options exist.
ppm。这种非常简单的无损、未压缩格式最常用于 8 位 RGB 图像,尽管还存在许多选项。
png. This is a set of lossless formats with a good set of open source management tools.
png。这是一组无损格式,具有一套很好的开源管理工具。
Because of compression and variants, writing input/output routines for images can be involved. Fortunately, one can usually rely on library routines to read and write standard file formats. For quick-and-dirty applications, where simplicity is valued above efficiency, a simple choice is to use raw ppm files, which can often be written simply by dumping the array that stores the image in memory to a file, prepending the appropriate header.
由于压缩和变体,可能需要编写图像的输入/输出例程。幸运的是,通常可以依靠库例程来读取和写入标准文件格式。对于快速而粗糙的应用程序,简单性比效率更重要,一个简单的选择是使用原始 ppm 文件,通常只需将存储图像的数组转储到文件中,并在前面添加适当的标头即可编写该文件。
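The "dump the array with a header" approach might look like the following sketch. It assumes the binary PPM variant (magic number P6, maximum value 255) with 8-bit RGB samples stored row by row from the top; the function names are hypothetical and no error handling beyond a length check is attempted.

```python
def encode_ppm(pixels: bytes, width: int, height: int) -> bytes:
    """Return a binary (P6) PPM file image for interleaved 8-bit RGB data."""
    if len(pixels) != width * height * 3:
        raise ValueError("expected width * height * 3 bytes of RGB data")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixels)


def write_ppm(path: str, pixels: bytes, width: int, height: int) -> None:
    """Write the encoded image to disk."""
    with open(path, "wb") as f:
        f.write(encode_ppm(pixels, width, height))


# A 2x2 test image: red, green on the top row; blue, white on the bottom row.
tiny = bytes([255, 0, 0,   0, 255, 0,
              0, 0, 255,   255, 255, 255])
data = encode_ppm(tiny, 2, 2)
```

Calling `write_ppm("tiny.ppm", tiny, 2, 2)` would then produce a file that common image viewers and converters can usually open.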
Why don’t they just make monitors linear and avoid all this gamma business?
为什么他们不把显示器做成线性的,而避免所有这些伽马问题呢?
Ideally, the 256 possible intensities of a monitor should look evenly spaced as opposed to being linearly spaced in energy. Because human perception of intensity is itself nonlinear, a gamma between 1.5 and 3 (depending on viewing conditions) will make the intensities approximately uniform in a subjective sense. In this way, gamma is a feature. Otherwise, the manufacturers would make the monitors linear.
理想情况下,显示器的 256 种可能强度看起来应该是均匀分布的,而不是能量呈线性分布。由于人类对强度的感知本身是非线性的,因此伽马值在 1.5 到 3 之间(取决于观看条件)会使强度在主观意义上大致均匀。这样,伽马值就是一种特性。否则,制造商会将显示器做成线性的。
1. Simulate an image acquired from the Bayer mosaic by taking a natural image (preferably a scanned photo rather than a digital photo where the Bayer mosaic may already have been applied) and creating a grayscale image composed of interleaved red/green/blue channels. This simulates the raw output of a digital camera. Now create a true RGB image from that output and compare with the original.
1.通过拍摄自然图像(最好是扫描照片,而不是可能已经应用了拜耳马赛克的数码照片)并创建由交错的红/绿/蓝通道组成的灰度图像来模拟从拜耳马赛克获取的图像。这模拟了数码相机的原始输出。现在从该输出创建真正的 RGB 图像并与原始图像进行比较。
One of the basic tasks of computer graphics is rendering three-dimensional objects: taking a scene composed of many geometric objects arranged in 3D space and computing a 2D image that shows the objects as viewed from a particular viewpoint. It is the same operation that has been done for centuries by architects and engineers creating drawings to communicate their designs to others.
计算机图形学的基本任务之一是渲染三维物体:将一个由许多几何物体排列在三维空间中组成的场景,计算出一个二维图像,该图像显示从特定视角看到的物体。几个世纪以来,建筑师和工程师一直在做同样的操作,他们绘制图纸以向他人传达他们的设计。
Fundamentally, rendering is a process that takes as its input a set of objects and produces as its output an array of pixels. One way or another, rendering involves considering how each object contributes to each pixel, and it can be organized in two general ways. In object-order rendering, each object is considered in turn, and for each object, all the pixels that it influences are found and updated. In image-order rendering, each pixel is considered in turn, and for each pixel all the objects that influence it are found and the pixel value is computed. You can think of the difference in terms of the nesting of loops: in image-order rendering, the “for each pixel” loop is on the outside, whereas in object-order rendering, the “for each object” loop is on the outside.
从根本上讲,渲染是一个以一组对象作为输入,以像素数组作为输出的过程。无论如何,渲染都涉及考虑每个对象对每个像素的贡献,并且可以通过两种常规方式组织。在对象顺序渲染中,依次考虑每个对象,并且对于每个对象,找到并更新它影响的所有像素。在图像顺序渲染中,依次考虑每个像素,并且对于每个像素,找到影响它的所有对象并计算像素值。您可以从循环嵌套的角度来考虑差异:在图像顺序渲染中,“针对每个像素”循环位于外部,而在对象顺序渲染中,“针对每个对象”循环位于外部。
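The difference in loop nesting can be made concrete with a toy example. The sketch below is not a real renderer: it assumes a hypothetical scene of axis-aligned rectangles on a small character grid, each with a constant depth, and the object-order version keeps a per-pixel depth record (an assumption introduced here) so that the nearest object wins.

```python
WIDTH, HEIGHT = 8, 6
BACKGROUND = "."

# Hypothetical scene: rectangles (x0, y0, x1, y1, depth, label); smaller depth is closer.
OBJECTS = [
    (1, 1, 5, 4, 2.0, "A"),
    (3, 2, 7, 5, 1.0, "B"),
]


def covers(obj, x, y):
    x0, y0, x1, y1, _, _ = obj
    return x0 <= x < x1 and y0 <= y < y1


def render_image_order():
    image = []
    for y in range(HEIGHT):                  # "for each pixel" is the outer loop
        row = []
        for x in range(WIDTH):
            hits = [o for o in OBJECTS if covers(o, x, y)]   # inner loop over objects
            row.append(min(hits, key=lambda o: o[4])[5] if hits else BACKGROUND)
        image.append(row)
    return image


def render_object_order():
    image = [[BACKGROUND] * WIDTH for _ in range(HEIGHT)]
    depth = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    for x0, y0, x1, y1, z, label in OBJECTS:  # "for each object" is the outer loop
        for y in range(y0, y1):               # inner loops over the pixels it influences
            for x in range(x0, x1):
                if z < depth[y][x]:
                    depth[y][x] = z
                    image[y][x] = label
    return image


for row in render_image_order():
    print("".join(row))
```

Both functions produce the same grid; they differ only in which loop is on the outside and in the bookkeeping each ordering needs.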
If the output is a vector image rather than a raster image, rendering doesn’t have to involve pixels, but we’ll assume raster images in this book.
如果输出是矢量图像而不是光栅图像,则渲染不必涉及像素,但本书中我们假设是光栅图像。
Image-order and object-order renderers can compute exactly the same images, but they lend themselves to computing different kinds of effects and have quite different performance characteristics. We’ll explore the comparative strengths of the approaches in Chapter 9 after we have discussed them both, but, broadly speaking, image-order rendering is simpler to get working and more flexible in the effects that can be produced and usually (though not always) takes more execution time to produce a comparable image.
图像顺序和对象顺序渲染器可以计算完全相同的图像,但它们适合计算不同类型的效果,并且具有完全不同的性能特征。在讨论完这两种方法后,我们将在第 9 章中探讨这两种方法的比较优势,但从广义上讲,图像顺序渲染更易于操作,在可以产生的效果方面更灵活,并且通常(但并非总是)需要更多执行时间来生成可比图像。
In a ray tracer, it is easy to compute accurate shadows and reflections, which are awkward in the object-order framework.
在光线追踪器中,很容易计算出精确的阴影和反射,但这在对象顺序框架中却很困难。
Ray tracing is an image-order algorithm for making renderings of 3D scenes, and we’ll consider it first because it’s possible to get a ray tracer working without developing any of the mathematical machinery that’s used for object-order rendering.
光线追踪是一种用于渲染 3D 场景的图像顺序算法,我们首先考虑它,因为有可能使光线追踪器工作而无需开发用于对象顺序渲染的任何数学机制。
A ray tracer works by computing one pixel at a time, and for each pixel, the basic task is to find the object that is seen at that pixel’s position in the image. Each pixel “looks” in a different direction, and any object that is seen by a pixel must intersect the viewing ray, a line that emanates from the viewpoint in the direction that pixel is looking. The particular object we want is the one that intersects the viewing ray nearest the camera, since it blocks the view of any other objects behind it. Once that object is found, a shading computation uses the intersection point, surface normal, and other information (depending on the desired type of rendering) to determine the color of the pixel. This is shown in Figure 4.1, where the ray intersects two triangles, but only the first triangle hit, T2, is shaded.
光线追踪器的工作原理是每次计算一个像素,对于每个像素,基本任务是找到在图像中该像素位置处看到的物体。每个像素“看”向不同的方向,像素看到的任何物体都必须与视线相交,视线是从像素看的方向的视点发出的线。我们想要的特定物体是与最靠近相机的视线相交的物体,因为它挡住了它后面任何其他物体的视线。找到该物体后,着色计算将使用交点、表面法线和其他信息(取决于所需的渲染类型)来确定像素的颜色。如图 4.1所示,其中光线与两个三角形相交,但只有第一个三角形T2被着色。
A basic ray tracer therefore has three parts:
因此,基本光线追踪器有三个部分:
ray generation, which computes the origin and direction of each pixel’s viewing ray based on the camera geometry;
射线生成,根据相机几何形状计算每个像素的视线的原点和方向;
ray intersection, which finds the closest object intersecting the viewing ray;
射线相交,查找与视线相交的最近的物体;
shading, which computes the pixel color based on the results of ray intersection.
着色,根据光线相交的结果计算像素颜色。
Figure 4.1. The ray is “traced” into the scene and the first object hit is the one seen through the pixel. In this case, the triangle T2 is returned.
图 4.1。光线被“追踪”到场景中,第一个击中的物体就是通过像素看到的物体。在本例中,返回三角形T 2 。
The structure of the basic ray tracing program is
基本射线追踪程序的结构是
for each pixel do
    compute viewing ray
    find first object hit by ray and its surface normal n
    set pixel color to value computed from hit point, lights, and n
This chapter covers basic methods for ray generation, ray intersection, and shading that are sufficient for implementing a simple demonstration ray tracer. For a really useful system, more efficient ray intersection techniques from Chapter 12 need to be added, and the real potential of a ray tracer will be seen with the more advanced rendering techniques from Chapter 14.
本章介绍了射线生成、射线相交和着色的基本方法,这些方法足以实现一个简单的演示射线追踪器。对于真正有用的系统,需要添加第 12 章中更高效的射线相交技术,而射线追踪器的真正潜力将在第 14 章中更高级的渲染技术中得到体现。
The problem of representing a 3D object or scene with a 2D drawing or painting was studied by artists hundreds of years before computers. Photographs also represent 3D scenes with 2D images. While there are many unconventional ways to make images, from cubist painting to fisheye lenses (Figure 4.2) to peripheral cameras, the standard approach for both art and photography, as well as computer graphics, is linear perspective, in which 3D objects are projected onto an image plane in such a way that straight lines in the scene become straight lines in the image.
在计算机出现之前的数百年,艺术家们就研究过如何用二维绘图或绘画来表现三维物体或场景。照片也是用二维图像来表现三维场景。虽然有很多非常规的图像制作方法,从立体派绘画到鱼眼镜头(图 4.2 )再到外围相机,但艺术和摄影以及计算机图形学的标准方法是线性透视,其中三维物体以这样的方式投影到图像平面上,即场景中的直线变成图像中的直线。
Figure 4.2. An image taken with a fisheye lens is not a linear perspective image. Photo courtesy Philip Greenspun.
图 4.2.用鱼眼镜头拍摄的图像不是线性透视图像。照片由 Philip Greenspun 提供。
The simplest type of projection is parallel projection, in which 3D points are mapped to 2D by moving them along a projection direction until they hit the image plane (Figures 4.3–4.4). The view that is produced is determined by the choice of projection direction and image plane. If the image plane is perpendicular to the view direction, the projection is called orthographic; otherwise, it is called oblique.
最简单的投影类型是平行投影,其中 3D 点通过沿投影方向直到它们到达图像平面(图 4.3 – 4.4 )。产生的视图由投影方向和图像平面的选择决定。如果图像平面垂直于视图方向,则投影称为正交投影;否则,投影称为斜交投影。
Some books reserve “orthographic” for projection directions that are parallel to the coordinate axes.
有些书籍将“正交”保留为与坐标轴平行的投影方向。
Figure 4.3. When projection lines are parallel and perpendicular to the image plane, the resulting views are called orthographic.
图 4.3.当投影线平行且垂直于图像平面时,所得到的视图称为正交视图。
Parallel projections are often used for mechanical and architectural drawings because they keep parallel lines parallel and they preserve the size and shape of planar objects that are parallel to the image plane.
平行投影通常用于机械和建筑绘图,因为它们保持平行线平行,并保留与图像平面平行的平面物体的大小和形状。
Figure 4.4. A parallel projection that has the image plane at an angle to the projection direction is called oblique (right). In perspective projection, the projection lines all pass through the viewpoint, rather than being parallel (left). The illustrated perspective view is non-oblique because a projection line drawn through the center of the image would be perpendicular to the image plane.
图 4.4。平行投影的图像平面与投影方向成一定角度,这种投影称为斜投影(右)。在透视投影中,投影线全部通过视点,而不是平行(左)。图示的透视图是非斜的,因为通过图像中心绘制的投影线将垂直于图像平面。
The advantages of parallel projection are also its limitations. In our everyday experience (and even more so in photographs), objects look smaller as they get farther away, and as a result, parallel lines receding into the distance do not appear parallel. This is because eyes and cameras don’t collect light from a single viewing direction; they collect light that passes through a particular viewpoint. As has been recognized by artists since the Renaissance, we can produce natural-looking views using perspective projection: we simply project along lines that pass through a single point, the viewpoint, rather than along parallel lines (Figure 4.4). In this way, objects farther from the viewpoint naturally become smaller when they are projected. A perspective view is determined by the choice of viewpoint (rather than projection direction) and image plane. As with parallel views, there are oblique and non-oblique perspective views; the distinction is made based on the projection direction at the center of the image.
平行投影的优点也是它的局限性。在我们的日常生活中(在照片中更是如此),物体越远看起来就越小,因此,向远处延伸的平行线看起来并不平行。这是因为眼睛和相机不会从单一的观察方向收集光线;它们会收集通过特定视点的光线。正如文艺复兴时期艺术家们所认识到的那样,我们可以使用透视投影:我们只是沿着通过单个点(视点)的线进行投影,而不是沿着平行线进行投影(图 4.4 )。这样,距离视点较远的物体在投影时自然会变小。透视图由视点(而不是投影方向)和图像平面的选择决定。与平行视图一样,透视图也有斜透视图和非斜透视图;区别在于图像中心的投影方向。
You may have learned about the artistic conventions of three-point perspective, a system for manually constructing perspective views (Figure 4.5). A surprising fact about perspective is that all the rules of perspective drawing will be followed automatically if we follow the simple mathematical rule underlying perspective: objects are projected directly toward the eye, and they are drawn where they meet a view plane in front of the eye.
您可能已经了解了三点透视的艺术惯例,这是一种手动构建透视图的系统(图 4.5 )。关于透视的一个令人惊讶的事实是,如果我们遵循透视背后的简单数学规则,透视图的所有规则都会自动遵循:物体直接投射到眼睛上,并在它们与眼睛前方的视平面相交的位置绘制它们。
From the previous section, the basic tools of ray generation are the viewpoint (or view direction, for parallel views) and the image plane. There are many ways to work out the details of camera geometry; in this section, we explain one based on orthonormal bases that supports normal and oblique parallel and perspective views.
从上一节中我们了解到,射线生成的基本工具是视点(或平行视图的视线方向)和图像平面。有很多方法可以确定相机几何的细节;在本节中,我们将介绍一种基于正交基的方法,它支持正交和斜交平行和正交视图。
Figure 4.5. In three-point perspective, an artist picks “vanishing points” where parallel lines meet. Parallel horizontal lines will meet at a point on the horizon. Every set of parallel lines has its own vanishing points. These rules are followed automatically if we implement perspective based on the correct geometric principles.
图 4.5。在三点透视中,艺术家选择平行线相交的“消失点”。平行的水平线将在水平线上的某个点相交。每组平行线都有自己的消失点。如果我们根据正确的几何原理实现透视,这些规则就会自动遵循。
In order to generate rays, we first need a mathematical representation for a ray. A ray is really just an origin point and a propagation direction; a 3D parametric line is ideal for this. As discussed in Section 2.7.7, the 3D parametric line from the eye e through a point s on the image plane (Figure 4.6) is given by
$$\mathbf{p}(t) = \mathbf{e} + t(\mathbf{s} - \mathbf{e}).$$
为了生成射线,我们首先需要射线的数学表示。射线实际上只是一个原点和一个传播方向;3D 参数线是理想的选择。如第 2.7.7 节所述,从眼睛e到图像平面上的点s 的3D 参数线(图 4.6 )由下式给出
This should be interpreted as, “we advance from e along the vector (s – e) a fractional distance t to find the point p.” So given t, we can determine a point p. The point e is the ray’s origin, and s – e is the ray’s direction.
这应该被解释为,“我们从e沿着向量 ( s – e ) 前进一段距离t来找到点p 。”因此,给定t ,我们可以确定一个点p 。点e是射线的原点, s – e是射线的方向。
Note that p(0) = e, and p(1) = s, and more generally, if $0 < t_1 < t_2$, then $\mathbf{p}(t_1)$ is closer to the eye than $\mathbf{p}(t_2)$. Also, if t < 0, then p(t) is “behind” the eye. These facts will be useful when we search for the closest object hit by the ray that is not behind the eye.
请注意, p (0) = e ,且p (1) = s ,更一般地,如果 0 < t 1 < t 2 ,则p ( t 1 ) 比p ( t 2 ) 更靠近眼睛。此外,如果t < 0 ,则p ( t ) 位于眼睛“后面”。当我们寻找不在眼睛后面的射线击中的最近物体时,这些事实将很有用。
Caution: we are overloading the variable t, which is the ray parameter and also the v-coordinate of the top edge of the image.
注意:我们正在重载变量t ,它是射线参数,也是图像顶部边缘的v坐标。
Rays are invariably represented in code using some kind of structure or object that stores the position and direction. For instance, in an object-oriented program we might write:
射线总是用某种结构或对象来表示,这些结构或对象存储了射线的位置和方向。例如,在面向对象的程序中,我们可能会这样写:
Figure 4.6. The ray from the eye through a point on the image plane.
图 4.6.从眼睛发出的光线穿过图像平面上的某一点。
class Ray
    Vec3 o | ray origin
    Vec3 d | ray direction
    Vec3 evaluate(real t)
        return o + td
Figure 4.7. The sample points on the screen are mapped to a similar array on the 3D window. A viewing ray is sent to each of these locations.
图 4.7。屏幕上的采样点被映射到 3D 窗口上的类似阵列。查看射线被发送到每个位置。
We are assuming there is a class Vec3 that represents three-dimensional vectors and supports the usual arithmetic operations.
我们假设有一个类Vec3 ,它表示三维向量并支持通常的算术运算。
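For readers who want something runnable, here is one possible Python rendering of the ray structure. It is a sketch only: in place of a full Vec3 class it assumes plain 3-tuples of floats, and the helper `ray_through`, which builds the ray from the eye e through an image-plane point s, is a hypothetical convenience.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]   # stand-in for a real vector class


@dataclass(frozen=True)
class Ray:
    o: Vec3   # ray origin
    d: Vec3   # ray direction (not necessarily unit length)

    def evaluate(self, t: float) -> Vec3:
        """Return the point o + t*d on the ray."""
        return tuple(oi + t * di for oi, di in zip(self.o, self.d))


def ray_through(e: Vec3, s: Vec3) -> Ray:
    """Ray starting at e with direction s - e, so evaluate(1) lands on s."""
    return Ray(e, tuple(si - ei for si, ei in zip(s, e)))


ray = ray_through((0.0, 0.0, 0.0), (1.0, 2.0, -1.0))
print(ray.evaluate(0.0), ray.evaluate(1.0), ray.evaluate(2.0))
```

Evaluating at t = 0 gives the eye and at t = 1 gives the image-plane point, matching the parametric form of the ray.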
To compute a viewing ray, we need to know e (which is given) and s. Finding s may seem difficult, but it is actually straightforward if we look at the problem in the right coordinate system.
要计算视线,我们需要知道e (给定)和s 。找到s似乎很困难,但如果我们在正确的坐标系中看待问题,它实际上很简单。
Figure 4.8. The vectors of the camera frame, together with the view direction and up direction. The w vector is opposite the view direction, and the v vector is coplanar with w and the up vector.
图 4.8。相机框架的向量以及视线方向和向上方向。w向量与视线方向相反, v向量与w和向上向量共面。
All of our ray-generation methods start from an orthonormal coordinate frame known as the camera frame (Figure 4.7), which we’ll denote by e, for the eye point, or viewpoint, and u, v, and w for the three basis vectors, organized with u pointing rightward (from the camera’s view), v pointing upward, and w pointing backward, so that {u, v, w} forms a right-handed coordinate system. The most common way to construct the camera frame is from three inputs: the viewpoint, which becomes e; the view direction, which is –w; and the up vector, which is used to construct a basis that has v and w in the plane defined by the view direction and the up direction, using the process for constructing an orthonormal basis from two vectors described in Section 2.4.7 (Figure 4.8).
我们所有的射线生成方法都从一个正交坐标系开始,即相机坐标系(图 4.7 ),我们将其表示为e ,表示视点或视点, u 、 v和w表示三个基向量,其中u指向右(从相机的视角看), v指向上, w指向后,因此{ u , v , w }形成一个右手坐标系。构建相机坐标系最常见的方式是从视点(变为e )、视线方向(即-w )和向上向量(用于构建一个基,该基具有v和w,位于由视线方向和向上方向定义的平面中),使用第 2.4.7 节中描述的从两个向量构建正交基的过程(图 4.8 )。
Since v and w have to be perpendicular, the up vector and v are not generally the same. But setting the up vector to point straight upward in the scene will orient the camera in the way we would think of as “upright.”
由于v和w必须垂直,所以向上向量和v通常不相同。但是将向上向量设置为在场景中垂直向上将使相机朝向我们认为的“直上”的方向。
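One way to carry out this construction is with two cross products. The sketch below is an assumption about the details rather than a quotation of Section 2.4.7: it takes w opposite the view direction, u perpendicular to both the up vector and w, and v completing the right-handed basis. Vectors are plain 3-tuples and all function names are illustrative.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in a))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in a)


def camera_frame(view_dir, up):
    """Return (u, v, w) with w opposite view_dir and v in the plane of view_dir and up."""
    w = normalize(tuple(-c for c in view_dir))
    u = normalize(cross(up, w))   # fails if up is parallel to the view direction
    v = cross(w, u)               # already unit length since w and u are orthonormal
    return u, v, w


# Camera looking forward and downward, with "up" along the scene's y-axis.
u, v, w = camera_frame(view_dir=(0.0, -1.0, -1.0), up=(0.0, 1.0, 0.0))
print(u, v, w)
```

In this example v tilts away from the supplied up vector so that it stays perpendicular to w, which is why the up vector and v generally differ.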
For an orthographic view, all the rays will have the direction –w. Even though a parallel view doesn’t have a viewpoint per se, we can still use the origin of the camera frame to define the plane where the rays start, so that it’s possible for objects to be behind the camera.
对于正交视图,所有射线的方向都是 -w 。尽管平行视图本身没有视点,但我们仍然可以使用相机框架的原点来定义射线开始的平面,这样物体就有可能位于相机后面。
The viewing rays should start on the plane defined by the point e and the vectors u and v; the only remaining information required is where on the plane the image is supposed to be. We’ll define the image dimensions with four numbers, for the four sides of the image: l and r are the positions of the left and right edges of the image, as measured from e along the u direction; and b and t are the positions of the bottom and top edges of the image, as measured from e along the v direction. Usually, l < 0 < r and b < 0 < t. (See Figure 4.9a.)
视线应从点e和向量u和v定义的平面开始;唯一需要的剩余信息是图像应该位于平面的哪个位置。我们将用四个数字定义图像尺寸,表示图像的四个边: l和r是图像左边缘和右边缘的位置,从e沿u方向测量; b和t是图像下边缘和上边缘的位置,从e沿v方向测量。通常, l < 0 < r和b < 0 < t 。(见图 4.9a。)
Figure 4.9. Ray generation using the camera frame. (a) In an orthographic view, the rays start at the pixels’ locations on the image plane, and all share the same direction, which is equal to the view direction. (b) In a perspective view, the rays start at the viewpoint, and each ray’s direction is defined by the line through the viewpoint, e, and the pixel’s location on the image plane.
图 4.9.使用相机框架生成射线。 (a) 在正交视图中,射线从图像平面上的像素位置开始,并且所有射线都共享相同的方向,该方向等于视线方向。 (b) 在透视视图中,射线从视点开始,每条射线的方向由通过视点的线e和图像平面上的像素位置定义。
It might seem logical that orthographic viewing rays should start from infinitely far away, but then it would not be possible to make orthographic views of an object inside a room, for instance.
正交观察光线应该从无限远的地方开始,这看起来合乎逻辑,但是这样就不可能对房间内的物体进行正交视图。
In Section 3.2, we discussed pixel coordinates in an image. To fit an image with nx × ny pixels into a rectangle of size (r – l) × (t – b), the pixels are spaced a distance (r – l)/nx apart horizontally and (t – b)/ny apart vertically, with a half-pixel space around the edge to center the pixel grid within the image rectangle. This means that the pixel at position (i, j) in the raster image has the position
在第 3.2 节中,我们讨论了图像中的像素坐标。为了将具有n x × n y个像素的图像放入大小为 ( r – l ) × ( t – b ) 的矩形中,像素在水平方向上的间距为 ( r – l ) /n x ,在垂直方向上的间距为 ( t – b ) /n y ,并在边缘周围留出半个像素的空间,以使像素网格在图像矩形内居中。这意味着光栅图像中位置 ( i, j ) 处的像素具有位置
$$u = l + (r - l)(i + 0.5)/n_x, \qquad v = b + (t - b)(j + 0.5)/n_y, \tag{4.1}$$
Many systems assume that l = – r and b = – t so that a width and a height suffice.
许多系统假设l = – r和b = – t,因此宽度和高度就足够了。
where (u, v) are the coordinates of the pixel’s position on the image plane, measured with respect to the origin e and the basis {u, v}.
其中 ( u, v ) 是图像平面上像素位置的坐标,相对于原点e和基{ u , v }进行测量。
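As a small sketch of this mapping, using the pixel spacing and half-pixel offset described above (the function name `pixel_to_uv` is an assumption made for illustration):

```python
def pixel_to_uv(i, j, nx, ny, l, r, b, t):
    """Image-plane coordinates (u, v) of the center of pixel (i, j)."""
    u = l + (r - l) * (i + 0.5) / nx   # pixels are (r - l)/nx apart horizontally
    v = b + (t - b) * (j + 0.5) / ny   # pixels are (t - b)/ny apart vertically
    return u, v

# A 4 x 2 pixel image on the rectangle [-2, 2] x [-1, 1].
corner = pixel_to_uv(0, 0, 4, 2, -2.0, 2.0, -1.0, 1.0)
```

For this example the corner pixel lands half a pixel inside the rectangle, at (-1.5, -0.5).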
With l and r both specified, there is redundancy: moving the viewpoint a bit to the right and correspondingly decreasing l and r will not change the view (and similarly on the v-axis).
当l和r同时指定时,就会出现冗余:将视点稍微向右移动,并相应地减小l和r不会改变视图( v轴也是如此)。
In an orthographic view, we can simply use the pixel’s image-plane position as the ray’s starting point, and we already know the ray’s direction is the view direction. The procedure for generating orthographic viewing rays is then
在正交视图中,我们可以简单地使用像素的图像平面位置作为射线的起点,并且我们已经知道射线的方向是视线方向。生成正交视线的过程如下
compute u and v using (4.1)
使用(4.1)计算u和v
ray.o ← e + u u + v v | scalar coordinates u, v times the basis vectors u, v
ray.d ← –w
It’s very simple to make an oblique parallel view: just allow the image plane normal w to be specified separately from the view direction d. The procedure is then exactly the same, but with d substituted for –w. Of course, w is still used to construct u and v.
制作斜平行视图非常简单:只需允许将图像平面法线w与视图方向d分开指定即可。然后过程完全相同,但用d代替-w 。当然, w仍然用于构造u和 v 。
For a perspective view, all the rays have the same origin, at the viewpoint; it is the directions that are different for each pixel. The image plane is no longer positioned at e, but rather some distance d in front of e; this distance is the image plane distance, often loosely called the focal length, because choosing d plays the same role as choosing focal length in a real camera. The direction of each ray is defined by the viewpoint and the position of the pixel on the image plane. This situation is illustrated in Figure 4.9, and the resulting procedure is similar to the orthographic one:
对于透视图,所有射线的原点都相同,即视点;但每个像素的方向不同。图像平面不再位于e处,而是位于e前方一定距离d处;这个距离就是图像平面距离,通常被笼统地称为焦距,因为选择d与在真实相机中选择焦距的作用相同。每条射线的方向由视点和图像平面上像素的位置定义。这种情况如图 4.9所示,其结果过程与正交过程类似:
compute u and v using (4.1)
使用(4.1)计算u和v
ray.o ← e
ray.d ← –d w + u u + v v
As with parallel projection, oblique perspective views can be achieved by specifying the image plane normal separately from the projection direction.
与平行投影一样,可以通过分别指定图像平面法线和投影方向来实现斜透视视图。
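The two ray-generation procedures can be sketched side by side. This is an illustrative version only: rays are returned as (origin, direction) tuples, and the names `orthographic_ray` and `perspective_ray`, along with the `_vec` suffix used to distinguish the basis vectors from the scalar coordinates u and v, are assumptions made here.

```python
def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(s, a):
    return tuple(s * x for x in a)

def orthographic_ray(e, u_vec, v_vec, w_vec, u, v):
    """Ray starts at the pixel's image-plane position and points along -w."""
    origin = add(e, add(scale(u, u_vec), scale(v, v_vec)))
    direction = scale(-1.0, w_vec)
    return origin, direction

def perspective_ray(e, u_vec, v_vec, w_vec, u, v, d):
    """Ray starts at the viewpoint and passes through the pixel at distance d."""
    origin = e
    direction = add(scale(-d, w_vec), add(scale(u, u_vec), scale(v, v_vec)))
    return origin, direction

frame = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
e = (0.0, 0.0, 5.0)
ortho = orthographic_ray(e, *frame, u=0.5, v=-0.25)
persp = perspective_ray(e, *frame, u=0.5, v=-0.25, d=2.0)
```

Note that the perspective direction is not normalized here; whether to normalize is a design choice that affects how t relates to distance along the ray.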
Once we’ve generated a ray e + td, we next need to find the first intersection with any object where t > 0. In practice, it turns out to be useful to solve a slightly more general problem: find the first intersection between the ray and a surface that occurs at a t in the interval [t0,t1]. The basic ray intersection is then the case where t0 = 0 and t1 = +∞. We solve this problem for both spheres and triangles. In the next section, multiple objects are discussed.
一旦我们生成了射线e + t d ,接下来我们需要找到与任何物体的第一个交点,其中t > 0。实际上,解决一个稍微更一般的问题很有用:找到射线与在区间 [ t 0 ,t 1 ] 中t时刻出现的表面之间的第一个交点。那么基本射线交点就是t 0 = 0 和t 1 = +∞ 的情况。我们针对球体和三角形解决了这个问题。在下一节中,我们将讨论多个对象。
Given a ray p(t) = e + td and an implicit surface f (p) = 0 (see Section 2.7.3), we’d like to know where they intersect. Intersection points occur when points on the ray satisfy the implicit equation, so the values of t we seek are those that solve the equation
给定一条射线p ( t ) = e + t d和一个隐式曲面f ( p ) = 0(参见第 2.7.3 节),我们想知道它们在何处相交。当射线上的点满足隐式方程时,就会出现交点,因此我们寻找的t值就是那些可以解方程的值
$$f(\mathbf{p}(t)) = 0 \quad \text{or} \quad f(\mathbf{e} + t\mathbf{d}) = 0.$$
A sphere with center c = (xc ,yc ,zc) and radius R can be represented by the implicit equation
一个球体,其中心为c = ( x c , y c , z c ) ,半径为R ,可以用隐式方程表示
$$(x - x_c)^2 + (y - y_c)^2 + (z - z_c)^2 - R^2 = 0.$$
We can write this same equation in vector form:
我们可以将同样的方程写成矢量形式:
$$(\mathbf{p} - \mathbf{c}) \cdot (\mathbf{p} - \mathbf{c}) - R^2 = 0.$$
Any point p that satisfies this equation is on the sphere. If we plug points on the ray p(t) = e + td into this equation, we get an equation in terms of t that is satisfied by the values of t that yield points on the sphere:
满足该方程的任何点p都在球面上。如果我们将射线p ( t ) = e + t d上的点代入该方程,我们将得到一个关于t的方程,该方程由球面上的点的t值满足:
$$(\mathbf{e} + t\mathbf{d} - \mathbf{c}) \cdot (\mathbf{e} + t\mathbf{d} - \mathbf{c}) - R^2 = 0.$$
Rearranging terms yields
重新排列项可得出
$$(\mathbf{d} \cdot \mathbf{d})\,t^2 + 2\,\mathbf{d} \cdot (\mathbf{e} - \mathbf{c})\,t + (\mathbf{e} - \mathbf{c}) \cdot (\mathbf{e} - \mathbf{c}) - R^2 = 0.$$
Here, everything is known except the parameter t, so this is a classic quadratic equation in t, meaning it has the form
这里,除了参数t之外,其他都是已知的,所以这是一个经典的t二次方程,也就是说它的形式为
$$At^2 + Bt + C = 0.$$
The solution to this equation is discussed in Section 2.2. The term under the square root sign in the quadratic solution, B² – 4AC, is called the discriminant and tells us how many real solutions there are. If the discriminant is negative, its square root is imaginary and the line and sphere do not intersect. If the discriminant is positive, there are two solutions: one solution where the ray enters the sphere and one where it leaves. If the discriminant is zero, the ray grazes the sphere, touching it at exactly one point. Plugging in the actual terms for the sphere and canceling a factor of two, we get
2.2 节讨论了该方程的解。二次解B 2 – 4 AC中平方根符号下面的项称为判别式,它告诉我们有多少个实数解。如果判别式为负,则其平方根为虚数,线与球不相交。如果判别式为正,则有两个解:一个是射线进入球面的解,另一个是射线离开球面的解。如果判别式为零,则射线擦过球面,恰好在一个点接触球面。代入球面的实际项并取消两个因子,我们得到
$$t = \frac{-\mathbf{d} \cdot (\mathbf{e} - \mathbf{c}) \pm \sqrt{(\mathbf{d} \cdot (\mathbf{e} - \mathbf{c}))^2 - (\mathbf{d} \cdot \mathbf{d})\left((\mathbf{e} - \mathbf{c}) \cdot (\mathbf{e} - \mathbf{c}) - R^2\right)}}{\mathbf{d} \cdot \mathbf{d}}.$$
In an actual implementation, you should first check the value of the discriminant before computing other terms. To correctly find the closest intersection in the interval [t0,t1], there are three cases: if the smaller of the two solutions is in the interval, it is the first hit; otherwise, if the larger solution is in the interval, it is the first hit; otherwise, there is no hit.
在实际实现中,应该先检查判别式的值,然后再计算其他项。为了正确找到区间[ t 0 ,t 1 ] 中的最近交点,有三种情况:如果两个解中较小的一个在区间内,则它是第一个命中;否则,如果较大的解在区间内,则它是第一个命中;否则,没有命中。
As discussed in Section 2.7.4, the normal vector at point p is given by the gradient n = 2(p – c) . The unit normal is (p – c)/R.
如2.7.4 节所述,点p处的法向量由梯度n = 2( p - c ) 给出。单位法向量为 ( p - c ) / R 。
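A minimal Python sketch of the ray-sphere test follows. It solves the quadratic in t with A = d·d, B = 2d·(e – c), and C = (e – c)·(e – c) – R², after canceling the factor of two, checks the discriminant first, and then applies the three-case interval rule. The function name `hit_sphere` and the convention of returning `None` for a miss are assumptions made for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hit_sphere(e, d, c, R, t0=0.0, t1=math.inf):
    """Return the first t in [t0, t1] where e + t*d meets the sphere, or None."""
    ec = tuple(x - y for x, y in zip(e, c))
    dd = dot(d, d)
    d_ec = dot(d, ec)
    disc = d_ec * d_ec - dd * (dot(ec, ec) - R * R)
    if disc < 0:
        return None                      # the line misses the sphere entirely
    root = math.sqrt(disc)
    # Try the smaller solution first, then the larger one.
    for t in ((-d_ec - root) / dd, (-d_ec + root) / dd):
        if t0 <= t <= t1:
            return t
    return None

# The debugging case from the exercises: unit sphere at the origin.
t_first = hit_sphere((1.0, 1.0, 1.0), (-1.0, -1.0, -1.0), (0.0, 0.0, 0.0), 1.0)
```

For this ray the two roots are 1 ± 1/√3, so the first hit is at t ≈ 0.4226. The unit normal at the hit point p is then (p – c)/R.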
There are many algorithms for computing ray-triangle intersections. We will present the form that uses barycentric coordinates for the parametric plane containing the triangle, because it requires no long-term storage other than the vertices of the triangle (Snyder & Barr, 1987).
计算射线三角形交点的算法有很多种。我们将介绍一种使用重心坐标作为包含三角形的参数平面的形式,因为除了三角形的顶点之外,它不需要长期存储(Snyder & Barr,1987)。
To intersect a ray with a parametric surface, we set up a system of equations where the Cartesian coordinates all match:
为了使射线与参数曲面相交,我们建立了一个方程组,其中笛卡尔坐标全部匹配:
$$x_e + t x_d = f(u, v), \qquad y_e + t y_d = g(u, v), \qquad z_e + t z_d = h(u, v).$$
Here, we have three equations and three unknowns (t, u, and v). In the case where the surface is a parametric plane, the parametric equation is linear and can be written in vector form as discussed in Section 2.9.2. If the vertices of the triangle are a, b, and c, then the intersection will occur when
这里,我们有三个方程和三个未知数( t 、 u和v )。如果表面是参数平面,则参数方程是线性的,可以写成矢量形式,如第 2.9.2 节所述。如果三角形的顶点为a 、 b和c ,则当
$$\mathbf{e} + t\mathbf{d} = \mathbf{a} + \beta(\mathbf{b} - \mathbf{a}) + \gamma(\mathbf{c} - \mathbf{a}) \tag{4.2}$$
for some t, β, and γ. Solving this equation tells us both t, which locates the intersection point along the ray, and (β, γ), which locates the intersection point relative to the triangle. The intersection p will be at e + td as shown in Figure 4.10. Again from Section 2.9.2, we know the intersection is inside the triangle if and only if β > 0, γ > 0, and β + γ < 1. Otherwise, the ray has hit the plane outside the triangle, so it misses the triangle. If there are no solutions, either the triangle is degenerate or the ray is parallel to the plane containing the triangle.
其中t 、 β和γ为某个值。解此方程可得出t (它确定了射线上的交点位置)和( β, γ )(它确定了交点相对于三角形的位置)。交点p位于e + t d处,如图 4.10所示。同样从第 2.9.2 节可知,当且仅当β > 0、 γ > 0 和β + γ < 1 时,交点才位于三角形内部。否则,射线就会击中三角形外部的平面,因此会错过三角形。如果没有解,则要么三角形已退化,要么射线平行于包含三角形的平面。
Figure 4.10. The ray hits the plane containing the triangle at point p.
图 4.10射线在点 p 处与包含三角形的平面相交。
To solve for t, β, and γ in Equation (4.2), we expand it from its vector form into the three equations for the three coordinates:
为了求解方程 (4.2) 中的t 、 β和γ ,我们将其从矢量形式展开为三个坐标的三个方程:
$$\begin{aligned} x_e + t x_d &= x_a + \beta(x_b - x_a) + \gamma(x_c - x_a), \\ y_e + t y_d &= y_a + \beta(y_b - y_a) + \gamma(y_c - y_a), \\ z_e + t z_d &= z_a + \beta(z_b - z_a) + \gamma(z_c - z_a). \end{aligned}$$
This can be rewritten as a standard linear system:
这可以重写为标准线性系统:
$$\begin{bmatrix} x_a - x_b & x_a - x_c & x_d \\ y_a - y_b & y_a - y_c & y_d \\ z_a - z_b & z_a - z_c & z_d \end{bmatrix} \begin{bmatrix} \beta \\ \gamma \\ t \end{bmatrix} = \begin{bmatrix} x_a - x_e \\ y_a - y_e \\ z_a - z_e \end{bmatrix}.$$
The fastest classic method to solve this 3 × 3 linear system is Cramer’s rule. This gives us the solutions
解决这个 3 × 3 线性系统的最快经典方法是克莱姆法则。这给了我们解决方案
$$\beta = \frac{\begin{vmatrix} x_a - x_e & x_a - x_c & x_d \\ y_a - y_e & y_a - y_c & y_d \\ z_a - z_e & z_a - z_c & z_d \end{vmatrix}}{|\mathbf{A}|}, \qquad \gamma = \frac{\begin{vmatrix} x_a - x_b & x_a - x_e & x_d \\ y_a - y_b & y_a - y_e & y_d \\ z_a - z_b & z_a - z_e & z_d \end{vmatrix}}{|\mathbf{A}|}, \qquad t = \frac{\begin{vmatrix} x_a - x_b & x_a - x_c & x_a - x_e \\ y_a - y_b & y_a - y_c & y_a - y_e \\ z_a - z_b & z_a - z_c & z_a - z_e \end{vmatrix}}{|\mathbf{A}|},$$
where the matrix A is
其中矩阵A是
$$\mathbf{A} = \begin{bmatrix} x_a - x_b & x_a - x_c & x_d \\ y_a - y_b & y_a - y_c & y_d \\ z_a - z_b & z_a - z_c & z_d \end{bmatrix},$$
and |A| denotes the determinant of A. The 3 × 3 determinants have common subterms that can be exploited for efficiency in implementation. Looking at the linear systems with dummy variables
并且 | A | 表示A的行列式。3 × 3 行列式具有共同的子项,可以利用这些子项来提高实施效率。查看具有虚拟变量的线性系统
$$\begin{bmatrix} a & d & g \\ b & e & h \\ c & f & i \end{bmatrix} \begin{bmatrix} \beta \\ \gamma \\ t \end{bmatrix} = \begin{bmatrix} j \\ k \\ l \end{bmatrix},$$
Cramer’s rule gives us
克莱默规则告诉我们
$$\beta = \frac{j(ei - hf) + k(gf - di) + l(dh - eg)}{M}, \qquad \gamma = \frac{i(ak - jb) + h(jc - al) + g(bl - kc)}{M}, \qquad t = -\frac{f(ak - jb) + e(jc - al) + d(bl - kc)}{M},$$
where
在哪里
$$M = a(ei - hf) + b(gf - di) + c(dh - eg).$$
We can reduce the number of operations by reusing numbers such as “ei-minus-hf.”
我们可以通过重复使用诸如“ ei-minus-hf ”之类的数字来减少运算次数。
The algorithm for the ray-triangle intersection for which we need the linear solution can have some conditions for early termination. Thus, the function should look something like:
我们需要线性解决方案的射线三角形相交算法可以有一些提前终止的条件。因此,该函数应该看起来像这样:
boolean raytri(Ray r, vector3 a, vector3 b, vector3 c, interval [t0, t1])
    compute t
    if (t < t0) or (t > t1) then
        return false
    compute γ
    if (γ < 0) or (γ > 1) then
        return false
    compute β
    if (β < 0) or (β > 1 – γ) then
        return false
    return true
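The following Python sketch fills in the "compute" steps of the routine above using Cramer's rule on the 3 × 3 system whose columns are a – b, a – c, and d, with right-hand side a – e. The uppercase names `A` through `L` for the matrix entries, the reused subterms such as `ei_hf`, and the convention of returning `None` for a miss are assumptions made for this example.

```python
def hit_triangle(e, d, a, b, c, t0, t1):
    """Return (t, beta, gamma) if e + t*d hits triangle abc with t in [t0, t1], else None."""
    A, B, C = a[0] - b[0], a[1] - b[1], a[2] - b[2]   # first column: a - b
    D, E, F = a[0] - c[0], a[1] - c[1], a[2] - c[2]   # second column: a - c
    G, H, I = d[0], d[1], d[2]                        # third column: ray direction
    J, K, L = a[0] - e[0], a[1] - e[1], a[2] - e[2]   # right-hand side: a - e
    # Subterms shared between the determinants.
    ei_hf, gf_di, dh_eg = E * I - H * F, G * F - D * I, D * H - E * G
    ak_jb, jc_al, bl_kc = A * K - J * B, J * C - A * L, B * L - K * C
    M = A * ei_hf + B * gf_di + C * dh_eg
    if M == 0:
        return None        # degenerate triangle, or ray parallel to its plane
    t = -(F * ak_jb + E * jc_al + D * bl_kc) / M
    if t < t0 or t > t1:
        return None
    gamma = (I * ak_jb + H * jc_al + G * bl_kc) / M
    if gamma < 0 or gamma > 1:
        return None
    beta = (J * ei_hf + K * gf_di + L * dh_eg) / M
    if beta < 0 or beta > 1 - gamma:
        return None
    return t, beta, gamma

# The debugging case from the exercises.
hit = hit_triangle((1.0, 1.0, 1.0), (-1.0, -1.0, -1.0),
                   (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
                   0.0, float("inf"))
```

For this ray the hit is at the centroid of the triangle, so t = 2/3 and β = γ = 1/3. Like the pseudocode, the sketch accepts hits exactly on an edge of the triangle.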
In a ray tracing program, it is a good idea to use an object-oriented design that has a class called something like Surface with derived classes Triangle, Sphere, etc. Anything that a ray can intersect, including groups of surfaces or efficiency structures (Section 12.3) should be a subclass of Surface. The ray-tracing program would then have one reference to a Surface for the whole model, and new types of objects and efficiency structures can be added transparently.
在射线追踪程序中,使用面向对象设计是一个好主意,该设计具有一个名为Surface 的类,并派生出Triangle 、 Sphere等类。射线可以相交的任何东西,包括曲面组或效率结构(第 12.3 节)都应该是 Surface 的子类。这样,射线追踪程序将对整个模型拥有一个对 Surface 的引用,并且可以透明地添加新类型的对象和效率结构。
The key interface of the Surface class is a method to intersect a ray (Kirk & Arvo, 1988).
Surface 类的关键接口是与射线相交的方法(Kirk & Arvo,1988)。
class Surface
    HitRecord hit(Ray r, real t0, real t1)
Here, (t0,t1) is the interval on the ray where hits will be returned, and HitRecord is a class that contains all the data about the surface intersection that will be needed:
这里,( t0 ,t1 )是射线上返回命中的间隔,HitRecord 是一个包含所需的表面相交的所有数据的类:
class HitRecord
    Surface s | surface that was hit
    real t | coordinate of hit point along the ray
    Vec3 n | surface normal at the hit point
    ...
The surface that was hit, the t value, and the surface normal are the minimum required, but other data such as texture coordinates or tangent vectors may be stored as well. Depending on the language, the hit record might not be literally returned from the function but rather passed by reference and filled in. A miss can be indicated by a hit that has t = ∞.
命中的表面、 t值和表面法线是最低要求,但其他数据(如纹理坐标或切线向量)也可以存储。根据语言的不同,命中记录可能不会从函数中直接返回,而是通过引用传递并填写。如果命中t = ∞,则表示未命中。
Of course, most interesting scenes consist of more than one object, and when we intersect a ray with the scene, we must find only the closest intersection to the camera along the ray. A simple way to implement this is to think of a group of objects as itself being another type of object. To intersect a ray with a group, you simply intersect the ray with the objects in the group and return the intersection with the smallest t value. The following code tests for hits in the interval t ∈ [t0,t1]:
当然,大多数有趣的场景都由多个物体组成,当我们将光线与场景相交时,我们必须只找到沿光线距离相机最近的交点。实现这一点的一种简单方法是将一组物体视为另一种类型的物体。要将光线与一组物体相交,只需将光线与组中的物体相交并返回具有最小t值的交点。以下代码测试区间t ∈ [ t 0 ,t 1 ] 内的命中:
class Group, subclass of Surface
    list-of-Surface surfaces | list of all surfaces in the group
    HitRecord hit(Ray ray, real t0, real t1)
        HitRecord closest-hit(∞) | initialize to indicate miss
        for surf in surfaces do
            rec = surf.hit(ray, t0, t1)
            if rec.t < ∞ then
                closest-hit = rec
                t1 = rec.t
        return closest-hit
Note that this code shrinks the intersection interval [t0,t1] so that the call to surf.hit will only hit surfaces that are closer than the closest one seen so far.
请注意,此代码缩小了交点间隔 [ t 0 ,t 1 ],因此对 surf.hit 的调用只会击中比迄今为止看到的最近表面更近的表面。
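A runnable Python sketch of the same idea is shown below. To keep it self-contained, it uses a hypothetical stand-in surface, `PlaneZ` (the plane z = z0), and passes the ray as an origin and direction pair rather than a Ray object; both are assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class HitRecord:
    surface: object = None     # surface that was hit
    t: float = math.inf        # t = infinity indicates a miss
    n: tuple = None            # surface normal at the hit point

class PlaneZ:
    """Hypothetical minimal surface: the plane z = z0."""
    def __init__(self, z0):
        self.z0 = z0

    def hit(self, e, d, t0, t1):
        if d[2] == 0:
            return HitRecord()                 # ray parallel to the plane
        t = (self.z0 - e[2]) / d[2]
        if t0 <= t <= t1:
            return HitRecord(self, t, (0.0, 0.0, 1.0))
        return HitRecord()

class Group:
    def __init__(self, surfaces):
        self.surfaces = surfaces

    def hit(self, e, d, t0, t1):
        closest = HitRecord()                  # initialize to indicate a miss
        for surf in self.surfaces:
            rec = surf.hit(e, d, t0, t1)
            if rec.t < math.inf:
                closest = rec
                t1 = rec.t                     # shrink the interval to the closest hit
        return closest

scene = Group([PlaneZ(-5.0), PlaneZ(-2.0), PlaneZ(-8.0)])
rec = scene.hit((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), 0.0, math.inf)
```

Because the interval shrinks after each hit, the third plane (at t = 8) is rejected by its own interval test rather than by a comparison inside the group.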
Once ray-scene intersection works, we can render an image like Figure 4.11, but nicer results depend on including more visual cues, as we describe next.
一旦光线场景相交起作用,我们就可以渲染像图 4.11那样的图像,但更好的结果取决于包含更多的视觉提示,正如我们接下来所描述的。
Figure 4.11. A simple scene rendered with only ray generation and surface intersection, but no shading; each pixel is just set to a fixed color depending on which object it hit.
图 4.11.渲染的简单场景仅包含射线生成和表面相交,但没有阴影;每个像素仅根据其击中的物体设置为固定颜色。
Once the visible surface for a pixel is known, the pixel value is computed by evaluating a shading model. How this is done depends entirely on the application— methods range from simple heuristics to elaborate physics-based models. Exactly the same shading models can be used in ray tracing or in object-order rendering methods.
一旦知道像素的可见表面,就可以通过评估着色模型来计算像素值。如何做到这一点完全取决于应用程序——方法范围从简单的启发式到复杂的基于物理的模型。完全相同的着色模型可用于光线追踪或对象顺序渲染方法。
Chapter 5 describes a simple shading model that is suitable for a basic ray tracer and that is the one we used to make the renderings in this chapter. For more realism, you can upgrade to the models discussed in Chapter 14, which are much more true to the physics of real surfaces. Here, we will discuss how a ray tracer computes the inputs to shading.
第 5 章描述了一个适用于基本光线追踪器的简单着色模型,这也是我们在本章中用于制作渲染的模型。为了获得更逼真的效果,您可以升级到第 14 章中讨论的模型,这些模型更符合真实表面的物理特性。在这里,我们将讨论光线追踪器如何计算着色的输入。
To support shading, a ray tracing program always has a list of light sources. For the Chapter 5 shading model, we need three types of lights: point lights, which emit light from a point in space, directional lights, which illuminate the scene from a single direction, and ambient lights, which provide constant illumination to fill in the shadows. In fancier systems, other types of lights are supported, such as area lights (which are basically scene geometry that emits light) or environment lights (which use an image to represent light coming from far-away sources like the sky).
为了支持着色,光线追踪程序始终有一个光源列表。对于第 5 章的着色模型,我们需要三种类型的光源:点光源(从空间中的一点发射光)、定向光源(从单一方向照亮场景)和环境光源(提供恒定的照明以填充阴影)。在更高级的系统中,还支持其他类型的光源,例如区域光源(基本上是发射光的场景几何体)或环境光源(使用图像来表示来自远处光源(如天空)的光)。
Computing shading from a point or directional light source requires certain geometric information, and in a ray tracer, after a viewing ray has been determined to hit the surface, we have all we need to determine these four vectors:
计算点光源或定向光源的阴影需要一定的几何信息,在光线追踪器中,在确定视线照射到表面后,我们就可以确定以下四个向量:
The shading point x can be computed by evaluating the viewing ray at the t value of the intersection.
可以通过评估交点t值处的视线来计算着色点x 。
The surface normal n depends on the type of surface (sphere, triangle, etc.), and every surface needs to be able to compute its normal at the point where a ray intersects it.
表面法线n取决于表面的类型(球体、三角形等),并且每个表面都需要能够计算射线与其相交点处的法线。
The light direction l is computed from the light source position or direction as part of shading.
光线方向l是根据光源位置或方向计算得出的,作为阴影的一部分。
The viewing direction v is simply opposite the direction of the viewing ray (v = –d/ ||d||).
观察方向v与观察射线的方向正好相反( v = – d / ||d|| )。
The shading from an ambient source is much simpler: there is no l since light comes from everywhere; the shading does not depend on v; and for the simple models of Chapter 5, it doesn’t even depend on x or n.
环境源的阴影要简单得多:没有l ,因为光来自四面八方;阴影不依赖于v ;对于第 5 章的简单模型,它甚至不依赖于x或n 。
Computing shading in a scene containing several lights is simply a matter of adding up the contributions of the lights. In a basic ray tracer, you can simply loop over all the light sources, computing shading from each one, and accumulate the results into the pixel color.
在包含多个光源的场景中计算着色只需将光源的贡献相加即可。在基本的光线追踪器中,您可以简单地循环遍历所有光源,计算每个光源的着色,并将结果累积到像素颜色中。
A ray tracing program usually contains objects representing light sources and materials. Light sources can be instances of subclasses of a Light class, and they must include enough information to fully describe the light source. Since shading also requires parameters describing the material of the surface, another class that is useful is Material, which encapsulates everything needed to evaluate the shading model.
光线追踪程序通常包含表示光源和材质的对象。光源可以是Light类的子类的实例,并且它们必须包含足够的信息来完整描述光源。由于着色还需要描述表面材质的参数,因此另一个有用的类是Material ,它封装了评估着色模型所需的一切。
Different systems take different approaches to breaking up the shading calculations between lights and materials. An approach that aligns with the presentation in this chapter is to make lights responsible for the overall illumination computation and materials responsible for computing BRDF values. With this setup, the interfaces of these classes might look like:
不同的系统采用不同的方法来分解光源和材质之间的着色计算。与本章介绍的一致方法是让光源负责整体照明计算,让材质负责计算 BRDF 值。使用此设置,这些类的接口可能如下所示:
class Light
    Color illuminate(Ray ray, HitRecord hrec)
class Material
    Color evaluate(Vec3 l, Vec3 v, Vec3 n)
Each surface would then store a reference to its material, and in this way, point light illumination might be implemented as follows:
然后,每个表面都会存储对其材质的引用,这样,点光源照明可以按如下方式实现:
class PointLight, subclass of Light
    Color I
    Vec3 p
    Color illuminate(Ray ray, HitRecord hrec)
        Vec3 x = ray.evaluate(hrec.t)
        real r = ||p – x||
        Vec3 l = (p – x)/r
        Vec3 v = –ray.d / ||ray.d|| | viewing direction, opposite the viewing ray
        Vec3 n = hrec.normal
        Color E = max(0, n · l) I/r²
        Color k = hrec.surface.material.evaluate(l, v, n)
        return kE
These computations assume the class Color carries the RGB components of a color and supports componentwise multiplication. This arrangement is also amenable to treating ambient lighting as a light source, by making the ambient coefficient a property of the material:
这些计算假设Color类带有颜色的 RGB 分量并支持分量相乘。通过将环境系数设为材质的属性,此安排还适用于将环境光视为光源:
class AmbientLight, subclass of Light
    Color Ia
    Color illuminate(Ray ray, HitRecord hrec)
        Color ka = hrec.surface.material.ka
        return ka Ia
The complete calculation for shading a ray, including the intersection and handling several lights, can look like this:
光线着色的完整计算(包括相交和处理多个灯光)如下所示:
function shade-ray(Ray ray, real t0, real t1)
    HitRecord rec = scene.hit(ray, t0, t1)
    if rec.t < ∞ then
        Color c = 0
        for light in scene.lights do
            c = c + light.illuminate(ray, rec)
        return c
    else return background-color
This setup keeps materials and lights reasonably separate and allows you to later add new kinds of materials and lights transparently. Textures add some complexity to the architecture of a ray tracer; see Section 11.2.5.
此设置可将材质和灯光合理地分开,并允许您稍后透明地添加新类型的材质和灯光。纹理为光线追踪器的架构增加了一些复杂性;请参阅第 11.2.5 节。
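To make the per-light computation concrete, here is a minimal Python sketch of point-light shading summed over several lights. It is not the Chapter 5 model: the BRDF is passed in as a function, and the constant-valued `gray_brdf` is a hypothetical placeholder. The names `shade_point` and `lights` are likewise assumptions, and colors are RGB tuples multiplied componentwise.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def shade_point(x, n, ray_d, lights, brdf):
    """Sum the contributions of point lights, given as (position, intensity) pairs."""
    v = unit(tuple(-c for c in ray_d))            # viewing direction opposes the ray
    color = (0.0, 0.0, 0.0)
    for light_p, light_I in lights:
        to_light = sub(light_p, x)
        r = math.sqrt(dot(to_light, to_light))
        l = tuple(c / r for c in to_light)
        cos_term = max(0.0, dot(n, l))            # no light arrives from below the surface
        E = tuple(cos_term * I / (r * r) for I in light_I)
        k = brdf(l, v, n)
        color = tuple(c + ki * Ei for c, ki, Ei in zip(color, k, E))
    return color

def gray_brdf(l, v, n):
    """Hypothetical placeholder material: a constant BRDF value."""
    return (0.5, 0.5, 0.5)

lights = [((0.0, 0.0, 2.0), (4.0, 4.0, 4.0)),     # above the surface
          ((0.0, 0.0, -2.0), (4.0, 4.0, 4.0))]    # below the surface, contributes nothing
c = shade_point((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0), lights, gray_brdf)
```

In this example only the first light contributes, because the second lies behind the surface and the max(0, n · l) term clamps its contribution to zero.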
Figure 4.12. A simple scene rendered with shading from two point sources using the shading model of Chapter 5.
图 4.12.使用第 5 章的着色模型从两个点源渲染的简单场景。
By itself, shading makes images of 3D objects more realistic and understandable, but it doesn’t show their interactions with other objects. For instance, the spheres in Figure 4.12 appear to float above the floor they are resting on.
阴影本身可以使 3D 物体的图像更加逼真和易于理解,但它不会显示它们与其他物体的相互作用。例如,图 4.12中的球体似乎漂浮在它们所处的地板上方。
Once you have basic shading in your ray tracer, shadows for point and directional lights can be added very easily. If we imagine ourselves at a point x on a surface being shaded, the point is in shadow if we “look” towards the light source and see an object between us and the light source. If there are no objects in between, then the light is not blocked.
一旦光线追踪器中有了基本的着色功能,就可以非常轻松地为点光源和定向光源添加阴影。如果我们想象自己位于被阴影表面的x点,如果我们“看”向光源并看到我们和光源之间有一个物体,则该点处于阴影中。如果两者之间没有物体,则光线不会被阻挡。
Figure 4.13. The point p is not in shadow, while the point q is in shadow.
图 4.13。点p不在阴影中,而点q在阴影中。
This is shown in Figure 4.13, where the ray p + tl does not hit any objects and thus the point p is not in shadow. On the other hand, the point q is in shadow because the ray q + tl does hit an object. The rays that determine in or out of shadow are called shadow rays to distinguish them from viewing rays.
如图 4.13所示,射线p + t l没有击中任何物体,因此点p不在阴影中。另一方面,点q处于阴影中,因为射线q + t l确实击中了物体。确定是否在阴影内的射线称为阴影射线,以区别于观察射线。
Figure 4.14. By testing in the interval starting at ε, we avoid numerical imprecision causing the ray to hit the surface p is on.
图 4.14.通过在从 ε 开始的区间内进行测试,我们避免了数值不精确导致射线击中p所在的表面。
To add shadows to the shading algorithm, we add an if statement to the code that adds shading from a light source, so that it first determines whether the light is shadowed. In a naive implementation, the shadow ray will check for t ∈ [0, r], but because of numerical imprecision, this can result in an intersection with the surface on which p lies. Instead, the usual adjustment to avoid that problem is to test for t ∈ [ε, r], where ε is some small positive constant (Figure 4.14).
为了获得阴影算法,我们在代码中添加了一个 if 语句,该语句添加了来自光源的阴影,首先确定光是否被阴影覆盖。在一个简单的实现中,阴影射线将检查t ∈ [0 ,r ],但由于数值不精确,这可能导致与p所在的表面相交。相反,避免该问题的通常调整是测试t ∈ [ ε, r ],其中ε是一个小的正常数(图 4.14 )。
A shadow test can be added to the method PointLight.illuminate shown above by tracing a shadow ray and adding a conditional:
可以通过追踪阴影射线并添加条件,将阴影测试添加到上面显示的 PointLight.illuminate 方法中:
HitRecord srec = scene.hit(Ray(x, l), ε, r)
if srec.t < ∞ then
    return 0 | shading point is in shadow
else
    proceed with normal illumination calculation
The shadow test for directional lights is similar but uses t1 = ∞ rather than r. Note that the illumination computation for each light requires a separate shadow ray, and there is no shadow test in computing ambient shading.
平行光的阴影测试类似,但使用t 1 = ∞ 而不是r 。请注意,每个光源的照明计算都需要单独的阴影射线,并且在计算环境光阴影时没有阴影测试。
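The shadow test for a point light can be sketched in Python as follows. The occluders are spheres given as (center, radius) pairs to keep the example self-contained; the names `in_shadow` and `sphere_hit_in_interval` and the value of `EPSILON` are assumptions, and a suitable ε in practice depends on the scale of the scene.

```python
import math

EPSILON = 1e-6  # assumed value for the small positive constant

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sphere_hit_in_interval(e, d, c, R, t0, t1):
    """True if the ray e + t*d meets the sphere (c, R) for some t in [t0, t1]."""
    ec = tuple(x - y for x, y in zip(e, c))
    dd, d_ec = dot(d, d), dot(d, ec)
    disc = d_ec * d_ec - dd * (dot(ec, ec) - R * R)
    if disc < 0:
        return False
    root = math.sqrt(disc)
    return any(t0 <= t <= t1 for t in ((-d_ec - root) / dd, (-d_ec + root) / dd))

def in_shadow(x, light_pos, spheres):
    """Trace a shadow ray from x toward the light, testing t in [EPSILON, r]."""
    to_light = tuple(p - q for p, q in zip(light_pos, x))
    r = math.sqrt(dot(to_light, to_light))
    l = tuple(c / r for c in to_light)        # unit direction, so t measures distance
    return any(sphere_hit_in_interval(x, l, c, R, EPSILON, r) for c, R in spheres)

x = (0.0, 0.0, 0.0)
light = (0.0, 0.0, 10.0)
blocked = in_shadow(x, light, [((0.0, 0.0, 5.0), 1.0)])    # occluder between x and light
beyond = in_shadow(x, light, [((0.0, 0.0, 20.0), 1.0)])    # object behind the light
```

Because the shadow ray direction is a unit vector, the upper bound r stops the search at the light itself, so objects beyond the light do not cast a shadow on x.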
Shadows serve an important visual role in showing the relationships between nearby objects, as shown in Figure 4.15.
阴影在显示附近物体之间的关系方面起着重要的视觉作用,如图 4.15所示。
It is straightforward to add ideal specular reflection, or mirror reflection, to a ray-tracing program. The key observation is shown in Figure 4.16 where a viewer looking from direction e sees what is in direction r as seen from the surface. The vector r is the reflection of the vector –d across the surface normal n, which can be computed using the projection of d onto the direction of the surface normal:
向光线追踪程序添加理想镜面反射或镜面反射非常简单。关键观察结果如图 4.16所示,从方向e观察的观察者看到的是方向r上的东西,就像从表面看到的一样。向量r是向量-d在表面法线n上的反射,可以使用d在表面法线方向上的投影来计算:
$$\mathbf{r} = \mathbf{d} - 2(\mathbf{d} \cdot \mathbf{n})\mathbf{n}.$$
Figure 4.15. The same scene rendered with shading and shadows from two point sources.
图 4.15.使用两个点源的阴影渲染的同一场景。
In the real world, some energy is lost when the light reflects from the surface, and this loss can be different for different colors. For example, gold reflects yellow more efficiently than blue, so it shifts the colors of the objects it reflects. This can be implemented by adding a recursive call in shade-ray that adds one more contribution after all the lights are accounted for:
在现实世界中,当光线从表面反射时,会损失一些能量,并且这种损失对于不同的颜色可能不同。例如,金色反射黄色的效率比蓝色高,因此它会改变反射物体的颜色。这可以通过在shade-ray中添加递归调用来实现,在考虑所有光线后再添加一个贡献:
c = c + km shade-ray(Ray(p, r), ε, ∞) | p is the point hit by the viewing ray
Figure 4.16. When looking into a perfect mirror, the viewer looking in direction d will see whatever the viewer “below” the surface would see in direction r.
图 4.16.当观察一面完美的镜子时,朝方向d看的观察者将看到表面“下方”的观察者朝方向r看的任何东西。
where km (for “mirror reflection”) is the specular RGB color. We need to make sure to pass t0 = ε for the same reason as we did with shadow rays; we don’t want the reflection ray to hit the object that generates it.
其中k m (代表“镜面反射”)是镜面 RGB 颜色。我们需要确保传递t 0 = ε,原因与阴影光线相同;我们不希望反射光线击中产生它的物体。
The problem with the recursive call above is that it may never terminate. For example, if a ray starts inside a room, it will bounce forever. This can be fixed by adding a maximum recursion depth. The code will be more efficient if a reflection ray is generated only if km is not zero.
上述递归调用的问题是它可能永远不会终止。例如,如果一条射线从房间内开始,它将永远反弹。这可以通过添加最大递归深度来解决。如果仅当k m不为零时才生成反射射线,则代码将更高效。
Using a constant mirror reflection coefficient km gives a particular look characteristic of simple ray tracers (Figure 4.17); in the real world, this coefficient varies substantially depending on the incident angle. For better models, see Chapter 14.
使用恒定的镜面反射系数k m可呈现简单光线追踪器的特定外观特征(图 4.17 );在现实世界中,该系数会根据入射角而发生很大变化。有关更好的模型,请参阅第 14 章。
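Two pieces of the mirror-reflection logic can be sketched in Python. The first computes the reflection direction with the standard formula r = d – 2(d · n)n for a unit normal n. The second is a toy model of depth-limited recursion: `hall_of_mirrors` is a hypothetical scalar stand-in for shade-ray in a scene where every bounce sees the same local shading, and the value of `MAX_DEPTH` is an assumption.

```python
def reflect(d, n):
    """Mirror direction r = d - 2 (d . n) n, assuming n has unit length."""
    k = 2.0 * sum(di * ni for di, ni in zip(d, n))
    return tuple(di - k * ni for di, ni in zip(d, n))

MAX_DEPTH = 3  # assumed recursion limit

def hall_of_mirrors(local, km, depth=0):
    """Toy model: local shading plus km times the recursively reflected value."""
    c = local
    # Skip the reflection ray when km is zero, and stop at the depth limit.
    if km > 0 and depth < MAX_DEPTH:
        c = c + km * hall_of_mirrors(local, km, depth + 1)
    return c

r = reflect((1.0, -1.0, 0.0), (0.0, 1.0, 0.0))
value = hall_of_mirrors(local=1.0, km=0.5)
```

A ray heading down and to the right bounces off a floor with normal +y and continues up and to the right. In the toy model, the recursion stops after three bounces even though the rays would otherwise reflect forever.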
Figure 4.17. A simple scene rendered with shading, shadows, and mirror reflection. Both the floor and the blue sphere have nonzero mirror reflection coefficients.
图 4.17.使用阴影、明暗和镜面反射渲染的简单场景。地板和蓝色球体的镜面反射系数均为非零。
Ray tracing was developed early in the history of computer graphics (Appel, 1968) but was not used much until sufficient compute power was available (Kay & Greenberg, 1979; Whitted, 1980).
光线追踪是在计算机图形学历史的早期开发的(Appel,1968),但直到有足够的计算能力时才得到广泛应用(Kay & Greenberg,1979;Whitted,1980)。
Ray tracing has a lower asymptotic time complexity than basic object-order rendering (Snyder & Barr, 1987; Muuss, 1995; Parker et al., 1999; Wald, Slusallek, Benthin, & Wagner, 2001). Although it was traditionally thought of as an offline method, real-time ray tracing implementations are becoming more and more common.
光线追踪的渐近时间复杂度比基本对象顺序渲染更低(Snyder & Barr,1987;Muuss,1995;Parker 等,1999;Wald、Slusallek、Benthin & Wagner,2001)。尽管传统上认为它是一种离线方法,但实时光线追踪实现正变得越来越普遍。
Why is there no perspective matrix in ray tracing?
为什么光线追踪中没有透视矩阵?
The perspective matrix in a z-buffer exists so that we can turn the perspective projection into a parallel projection. This is not needed in ray tracing, because it is easy to do the perspective projection implicitly by fanning the rays out from the eye.
Z 缓冲区中的透视矩阵的存在,使得我们可以将透视投影转换为平行投影。这在光线追踪中是不需要的,因为通过将光线从眼睛散开,隐式地进行透视投影很容易。
Can ray tracing be made interactive?
光线追踪可以实现交互吗?
For sufficiently small models and images, any modern PC is sufficiently powerful for ray tracing to be interactive. In practice, multiple CPUs with a shared frame buffer are required for a full-screen implementation. Computer power is increasing much faster than screen resolution, and it is just a matter of time before conventional PCs can ray trace complex scenes at screen resolution.
对于足够小的模型和图像,任何现代 PC 都足够强大,可以实现光线追踪的交互。实际上,全屏实现需要多个具有共享帧缓冲区的 CPU。计算机能力的增长速度远远快于屏幕分辨率,传统 PC 能够以屏幕分辨率对复杂场景进行光线追踪只是时间问题。
Is ray tracing useful in a hardware graphics program?
光线追踪在硬件图形程序中有用吗?
Ray tracing is frequently used for picking. When the user clicks the mouse on a pixel in a 3D graphics program, the program needs to determine which object is visible within that pixel. Ray tracing is an ideal way to determine that.
光线追踪经常用于拾取。当用户在 3D 图形程序中单击某个像素时,程序需要确定该像素内可见的对象。光线追踪是确定该对象的理想方法。
1. What are the ray parameters of the intersection points between ray (1, 1, 1)+ t(–1, –1, –1) and the sphere centered at the origin with radius 1? Note: this is a good debugging case.
1 . 射线 (1, 1, 1)+ t ( – 1, – 1, – 1) 与以原点为中心、半径为 1 的球体的交点的射线参数是什么? 注意:这是一个很好的调试案例。
2. What are the barycentric coordinates and ray parameter where the ray (1, 1, 1) + t(–1, –1, –1) hits the triangle with vertices (1, 0, 0) , (0, 1, 0) , and (0, 0, 1) ? Note: this is a good debugging case.
2 . 当射线 (1, 1, 1) + t ( - 1, - 1, - 1) 与顶点为 (1, 0, 0) 、 (0, 1, 0) 和 (0, 0, 1) 的三角形相交时,重心坐标和射线参数是多少? 注意:这是一个很好的调试案例。
3. Do a back of the envelope computation of the approximate time complexity of ray tracing on “nice” (non-adversarial) models. Split your analysis into the cases of preprocessing and computing the image, so that you can predict the behavior of ray tracing multiple frames for a static model.
3.对“良好”(非对抗性)模型上光线追踪的近似时间复杂度进行粗略计算。将分析分为预处理和计算图像的情况,以便可以预测静态模型的光线追踪多帧的行为。
When we are rendering images of 3D scenes, whether by using ray tracing or rasterization, in real time or in batch processing, one of the key contributors to the visual impression of three-dimensionality is shading or coloring surfaces in the scene based on their shape and their relationship to other objects in the scene. In the physical world, most of the light we see is reflected light, and the physics of light reflection is strongly influenced by geometry, which produces a variety of cues that the human visual system makes very effective use of to understand shape.
当我们渲染 3D 场景的图像时,无论是使用光线追踪还是光栅化,无论是实时渲染还是批量处理,对三维视觉印象产生影响的关键因素之一就是根据场景中表面的形状及其与场景中其他物体的关系对其进行着色或着色。在物理世界中,我们看到的大部分光都是反射光,而光反射的物理性质受到几何学的强烈影响,这会产生各种线索,人类视觉系统可以非常有效地利用这些线索来理解形状。
In computer graphics, the purpose of shading is to provide these cues to the visual system, although the goals differ depending on the application. In computer-aided design or scientific visualization, the focus is on clarity: shading should be designed to provide the clearest, most accurate impression of 3D shape. On the other hand, in visual effects or advertising, the goal is to maximize the resemblance of renderings to the appearance of real objects. In animation, virtual environments, or games, the goals are somewhere in the middle: shading is meant to achieve artistic ends, which include depicting shape and material, but may not necessarily be intended to literally imitate reality.
在计算机图形学中,阴影的目的是为视觉系统提供这些提示,尽管目标因应用而异。在计算机辅助设计或科学可视化中,重点是清晰度:阴影的设计应提供最清晰、最准确的 3D 形状印象。另一方面,在视觉效果或广告中,目标是最大限度地提高渲染与真实物体外观的相似性。在动画、虚拟环境或游戏中,目标介于两者之间:阴影旨在实现艺术目的,包括描绘形状和材料,但不一定旨在完全模仿现实。
The equations used to compute shading are known as a shading model, and a range of different shading models have been developed for these different applications. Generally, they all begin with simple models that provide a useful approximation to the physics of light reflection. From this starting point, additional features can be added to achieve closer approximations to physics for realistic rendering, or some parts can be modified or left out to make models suitable for more abstract styles.
用于计算着色的方程称为着色模型,并且已经为这些不同的应用开发了一系列不同的着色模型。通常,它们都从简单的模型开始,这些模型为光反射的物理提供有用的近似值。从这个起点开始,可以添加其他功能以实现更接近物理的近似值以实现逼真的渲染,或者可以修改或省略某些部分以使模型适合更抽象的风格。
A shading model is quite independent of the rest of a rendering system, and the same models can be used in ray tracing and rasterization systems. This chapter describes a basic shading model for an opaque surface illuminated by a point light source. This model might be all we need for simple applications, and it forms the starting point for more advanced shading computations such as those discussed in Chapter 14.
着色模型与渲染系统的其余部分完全独立,相同的模型可用于光线追踪和光栅化系统。本章描述了由点光源照亮的不透明表面的基本着色模型。对于简单的应用程序,这个模型可能就是我们所需要的,它构成了更高级的着色计算(如第 14 章中讨论的那些)的起点。
In the real world, light falls on surfaces from all directions. But for modeling illumination, the simplest case is when light arrives from a single direction; this is always an idealization, but it makes a useful model for light sources that are small in proportion to their distance from the surface, either because they are indeed small (for example, an LED flashlight) or because they are very far away (for example, the sun). Point-like sources come in two flavors: a point source is small enough to be treated as a point, but is close to the scene and can illuminate different surfaces differently; and a directional source is both small enough (relative to its distance) to be treated as point-like and also so far away that it illuminates all surfaces the same and there is no need to keep track of its location, only its direction. The flashlight and the sun are canonical examples of these two types of light sources.
在现实世界中,光线会从各个方向照射到表面上。但对于照明建模,最简单的情况是光线从单一方向照射;这始终是一种理想化情况,但它可以为与表面距离成比例的光源提供一个有用的模型,这些光源要么确实很小(例如 LED 手电筒),要么距离很远(例如太阳)。点状光源有两种类型:点状光源足够小,可以被视为一个点,但靠近场景,可以以不同的方式照亮不同的表面;点状光源定向光源既足够小(相对于其距离而言),可以视为点状光源,又足够远,可以以相同的方式照亮所有表面,因此无需跟踪其位置,只需跟踪其方向即可。手电筒和太阳是这两种光源的典型例子。
A point light source is described by its position, which is a point in 3D space, and its intensity, which describes the amount of light it produces. A point source can be isotropic, meaning the intensity is the same in all directions; this is normally the default, but many systems provide “spot lights” that only send light in some directions, which can be handy for controlling light in a virtual scene in the same way that a real spot light is useful for controlling light on a stage.
点光源由其位置(三维空间中的点)和强度(描述其产生的光量)来描述。点光源可以是各向同性的,即强度在所有方向上都是相同的;这通常是默认的,但许多系统提供只向某些方向发射光的“聚光灯”,这对于控制虚拟场景中的光线非常有用,就像真实的聚光灯可用于控制舞台上的光线一样。
When in doubt, make light sources neutral in color, with equal red, green, and blue intensities.
如有疑问,请使光源颜色为中性色,且红、绿、蓝强度相等。
Figure 5.1. Irradiance from a point source decreases with the square of distance.
图 5.1.点源的辐照度随距离的平方而减小。
For an isotropic point source, it’s easy to reason about how much light falls on a surface a certain distance away. Suppose we have a point source that emits one watt of radiant power isotropically, and we place this source at the center of a hollow sphere with one meter radius (Figure 5.1). All the power from the light falls on the inside surface of the sphere, and it’s distributed uniformly over the whole surface area of $4\pi\ \mathrm{m}^2$, so the density of radiant power per unit area is $1/(4\pi)$ watts per square meter. This density is known as irradiance and is the right quantity to describe how much light is falling on a surface for the purposes of light reflection.
对于各向同性的点光源,很容易推断出有多少光照射到一定距离的表面上。假设我们有一个点光源,发射一瓦的辐射功率各向同性,我们将该光源放置在半径为 1 米的空心球体的中心(图 5.1 )。光的所有功率都落在球体的内表面上,并均匀分布在整个 4 π m 2的表面积上,因此每单位面积的辐射功率密度为 1 / (4 π ) 瓦特每平方米。该密度称为辐照度,是描述为了光反射而落在表面上的光量的正确量。
In the general case of a source that has power P and a receiving sphere of radius r, we find the irradiance E is
对于功率为P且接收球半径为r的光源,我们发现辐照度E为
$$E = \frac{P}{4\pi r^2} = \frac{I}{r^2}$$
The quantity $I = P/(4\pi)$ is the intensity of the source; it is a property of the source itself that is independent of what surface it’s illuminating. The $r^{-2}$ factor, often called the inverse square term, describes how irradiance depends on the distance $r$ between the source and the surface.
量I = P/ (4 π ) 是光源的强度;它是光源本身的属性,与其照射的表面无关。r –2因子通常称为平方反比项,描述了辐照度如何取决于光源和表面之间的距离r 。
Figure 5.2. (a) A beam of light falls on a surface, illuminating the whole top of the cube. (b) The cube is tilted 60°, and now only half the light falls on the top surface of the cube; the area stays the same so the power per unit area, or irradiance, is halved.
图 5.2。 (a) 一束光落在一个表面上,照亮了立方体的整个顶部。 (b) 立方体倾斜 60°,现在只有一半的光落在立方体的顶面上;面积保持不变,因此单位面积的功率或辐照度减半。
One other important consideration in computing irradiance is the angle of incidence—the angle between the surface normal and the direction the light is traveling. Consider a small surface that is illuminated by a point source that is far away compared to the size of the surface. The light that falls on the surface is all travelling approximately parallel. If we tilt the surface to an angle of 60° as shown in Figure 5.2, the surface intercepts only half the light that it did when it was facing the source. In general, when rotated by an angle θ it intercepts an amount of light (radiant power) proportional to cos θ, and since the area stays the same, the irradiance (which, remember, is radiant power per unit area) is proportional to the same factor. This rule, that the irradiance on a surface falls off as the cosine of the incident angle, is known as Lambert’s cosine law because it was described by Johann Heinrich Lambert in his 1760 book Photometria.
计算辐照度时另一个重要的考虑因素是入射角——表面法线与光线传播方向之间的角度。考虑一个小表面,它被一个相对于表面尺寸较远的点源照亮。落在表面上的光线几乎都是平行传播的。如果我们将表面倾斜 60° 角(如图 5.2所示),表面截取的光线只有它面对光源时的一半。一般来说,当旋转角度θ时,它会截取与 cos θ成比例的光量(辐射功率),由于面积保持不变,辐照度(记住,是单位面积的辐射功率)与同一因子成比例。表面上的辐照度随着入射角的余弦而下降,这条规则称为朗伯余弦定律,因为它是由约翰·海因里希·朗伯 (Johann Heinrich Lambert) 在其 1760 年出版的《光度测量学》一书中描述的。
Putting this together with the formula we just derived for irradiance on a surface facing exactly toward the source, we get the general formula for irradiance due to a point source,
将其与我们刚刚推导的公式结合起来,得到点源辐照度的一般公式,
$$E = \frac{I \cos\theta}{r^2} \tag{5.2}$$
The term $\cos\theta / r^2$ can be called the geometry factor for a point source; it depends on the geometric relationship between source and receiving surface, but not on the specific properties of either one.
项 cos θ/r 2可以称为点源的几何因子;它取决于源和接收表面之间的几何关系,但不取决于其中一个的具体属性。
In practice, the angle θ is not normally computed, because given a unit vector n that is normal to the surface and a unit vector l that points toward the light (Figure 5.3), the cosine factor can be computed using the dot product
实际上,角度θ通常不需要计算,因为给定一个垂直于表面的单位向量n和一个指向光的单位向量l (图 5.3 ),余弦因子可以通过点积计算
$$\cos\theta = \mathbf{n} \cdot \mathbf{l}$$
Figure 5.3. A surface in a beam of light intercepts an amount of light proportional to the cosine of the angle between the light direction and the surface normal.
图 5.3。光束中的表面会拦截一定量光,该量与光线方向和表面法线之间的角度的余弦成比例。
which is simpler and more efficient than computations with trigonometric functions.
比三角函数计算更简单、更高效。
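As a minimal sketch of the point-source formula, the following Python function computes irradiance from a dot product rather than from an angle. The function name, the single scalar channel, and the tuple representation of points and vectors are assumptions made for illustration.

```python
import math

def point_irradiance(intensity, light_pos, x, n):
    """Irradiance E = I * cos(theta) / r^2 at point x with unit normal n."""
    d = [p - q for p, q in zip(light_pos, x)]      # vector from x to the light
    r = math.sqrt(sum(c * c for c in d))           # distance to the light
    l = [c / r for c in d]                         # unit vector toward the light
    cos_theta = sum(a * b for a, b in zip(n, l))   # n . l, no trigonometry needed
    return intensity * cos_theta / (r * r)

# Light 2 m directly above a surface that faces it: E = I / r^2.
e_facing = point_irradiance(1.0, (0.0, 0.0, 2.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))

# Same light, surface normal tilted 60 degrees away: irradiance is halved.
tilted_n = (math.sin(math.radians(60)), 0.0, math.cos(math.radians(60)))
e_tilted = point_irradiance(1.0, (0.0, 0.0, 2.0), (0.0, 0.0, 0.0), tilted_n)
```

The tilted case reproduces the situation of Figure 5.2: the cosine factor alone accounts for the drop.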
A directional source is a limiting case of a very bright, far-away point source. As the source gets farther and farther away, the ratio $I/r^2$ in Equation 5.2 varies less and less over the scene, and for a directional source, we replace this with a constant, $H$:
定向源是极亮的远距离点源的极限情况。随着光源距离越来越远,公式 5.2 中的比率I/r 2在场景中的变化越来越小,对于定向源,我们用常数H代替它:
$$E = H \cos\theta$$
Note that the dot product formula for the cosine factor only holds when the two vectors, n and l, have unit length!
注意,这个公式只有当这两个向量具有单位长度时才成立!
This constant can be called the normal irradiance since it is equal to the irradiance when the light is positioned along the surface normal. A directional source is characterized by the direction toward the source (rather than by a position) and by the normal irradiance H (rather than by an intensity). The illumination from a directional source is uniform and does not fall off with distance in the way that point source illumination does.
该常数可以称为法向辐照度,因为它等于光沿表面法线定位时的辐照度。定向源的特征在于朝向源的方向(而不是位置)和法向辐照度H (而不是强度)。定向源的照明均匀,不会像点源照明那样随着距离而减弱。
Now that we have the ability to compute irradiance, which describes how much light falls on an object, we come to the question of how the object reflects that light. This depends on the material the object is made out of, and in this chapter, we develop a basic model for a colored material with an optional shiny surface. The idea behind this model is shown in Figure 5.4: the material can have a base layer that determines the object’s overall color, and it can have a surface that provides a shiny, mirror-like reflection, and we will look at the simplest model for each.
现在我们已经有能力计算辐照度,它描述了有多少光照射到物体上,接下来我们要考虑的是物体如何反射这些光。这取决于物体的材质,在本章中,我们将为具有可选光泽表面的有色材料开发一个基本模型。该模型背后的想法如图 5.4所示:材料可以有一个决定物体整体颜色的底层,也可以有一个提供光泽、镜面反射的表面,我们将研究每种模型最简单的模型。
Figure 5.4. Specular reflection (a) happens at the top surface and reflects near the mirror direction; diffuse reflection (b) happens in the material below the surface and emerges in all directions.
图 5.4镜面反射 (a) 发生在顶面,并在镜面方向附近反射;漫反射 (b) 发生在表面下方的材料中,并向各个方向发出。
The very simplest kind of reflection is a surface that reflects light equally to all directions, regardless of where it came from, so that the reflected light $L_r$ seen by the observer is simply a constant multiple of the irradiance:
最简单的反射是表面将光均匀地反射到所有方向,无论光来自何处,这样观察者看到的反射光L r只是辐照度的常数倍:
$$L_r = kE \tag{5.3}$$
A surface that behaves this way is known as an ideal diffuse surface and appears the same brightness from all directions; its color is view independent and is completely described by its reflectance, R, which is the fraction of the irradiance it reflects. The coefficient relating reflected to incident light is R/π (the reason for the factor of π will have to wait for Chapter 14):
具有这种行为的表面被称为理想漫射表面,从各个方向看,其亮度相同;其颜色为视图独立,并完全由其描述反射率R是它反射的辐照度的分数。与入射光相关的反射系数是R/π ( π因子的原因要等到第 14 章才能揭晓):
$$L_r = \frac{R}{\pi} E$$
The reflectance can be different for different colors of light, and for simple modeling of color, it suffices to just keep three different reflectances, one each for red, green, and blue, so this shading equation is carried out separately for the three color channels.
不同颜色的光的反射率可能不同,对于简单的颜色建模,只需保留三种不同的反射率(红、绿、蓝各一种)即可,因此该着色方程是针对三个颜色通道分别执行的。
Ideal diffuse shading, often called Lambertian shading because Lambert’s cosine law is the main effect it models, provides a flat, chalky appearance by itself. Physically, it models light that bounces around inside the material so that it “forgets” where it came from and emerges randomly in all directions. It is an effective model for paper, flat paint, dirt, tree bark, stone, and other rough materials that don’t have a distinct and smooth enough top surface to produce noticeable shiny reflections.
理想漫反射着色通常称为朗伯着色,因为朗伯余弦定律是其模拟的主要效果,它本身就提供了一种平坦的白垩外观。从物理上讲,它模拟了在材料内部反射的光线,因此光线“忘记”了它来自哪里,并随机地向各个方向射出。对于纸张、平面油漆、泥土、树皮、石头和其他粗糙材料,这是一种有效的模型,这些材料的顶面不够明显和光滑,无法产生明显的闪亮反射。
Precise prediction of color is a bit more complex; see Chapter 18.
精确预测颜色稍微复杂一些;参见第 18 章。
Many materials have some degree of shininess to them—for example, metals, plastics, gloss or semi-gloss paints, or many leaves of plants. When you look at these materials, you see reflections that move around when you move your viewpoint; you could describe their color as being view-dependent in contrast to the view-independent color of a Lambertian surface. The view-dependent part of the reflection generally happens at the top surface of the material and is known as specular reflection.
许多材料都具有一定程度的光泽,例如金属、塑料、光泽或半光泽的油漆,或许多植物的叶子。当您观察这些材料时,您会看到反射会随着视点的移动而移动;您可以将其颜色描述为与视图相关的,与 Lambertian 表面的与视图无关的颜色形成对比。反射中与视图相关的部分通常发生在材料的顶面上,称为镜面反射。
The simplest kind of specular reflection happens at perfectly smooth surfaces like a mirror or the surface of water: light reflects in a mirrorlike way so that light coming from a point source goes in exactly one direction. This is known as ideal specular reflection and generally needs to be handled as a special case. But many surfaces are not perfectly smooth, and they exhibit a more general kind of reflection known in computer graphics as glossy reflection. There are many models for glossy reflection, and better ones are discussed in Chapter 14, but a simple and well-known model was originally proposed by Phong (1975) and later updated by Blinn (1976) and others to the form most commonly used today, known as the Modified Blinn–Phong model.
最简单的镜面反射发生在镜子或水面等完全光滑的表面上:光以类似镜子的方式反射,因此来自点光源的光会朝一个方向传播。这被称为镜面反射是理想的,通常需要作为特殊情况处理。但许多表面并非完全光滑,它们表现出一种更普遍的反射,在计算机图形学中称为光泽反射。光泽反射有许多模型,第 14 章将讨论更好的模型,但最初由 Phong (1975) 提出了一个简单而著名的模型,后来由 Blinn (1976) 和其他人更新为今天最常用的形式,称为改进的Blinn-Phong模型。
Since specular reflection is view dependent, it is a function of the view vector v that points from the shading point toward the viewer, as well as the normal vector n and light direction l. The idea is to produce reflection that is at its brightest when v and l are symmetrically positioned across the surface normal, which is when mirror reflection would occur; the reflection then decreases smoothly as the vectors move away from a mirror configuration.
由于镜面反射与视角相关,因此它是从着色点指向观察者的视线向量v ,以及法线向量n和光线方向l 。这样做的目的是当v和l对称地位于表面法线上时,产生最亮的反射,此时会发生镜面反射;然后,随着向量远离镜面配置,反射会平稳地减弱。
We can tell how close we are to a mirror configuration using the idea of a half vector h, which is the unit vector halfway between the viewing and illumination directions; it is perpendicular to the surface exactly when l and v are in a mirror reflection configuration (Figure 5.5). If the half vector is near the surface normal, the specular component should be bright; if it is far away, it should be dim. We measure the nearness of h and n by computing their dot product (remember they are unit vectors, so n · h reaches its maximum of 1 when the vectors are equal) and then raise the result to a power p > 1 to make it decrease faster:
我们可以使用半向量的概念来判断我们与镜面结构的接近程度,半向量是视线和照明方向之间的中点向量,当l和v处于镜面反射结构时,该向量恰好垂直于表面(图 5.5 )。如果半向量接近表面法线,镜面反射分量应该是明亮的;如果距离较远,镜面反射分量应该是暗淡的。我们通过计算h和n的点积来测量它们的接近程度(记住它们是单位向量,因此当向量相等时, n · h达到最大值 1),然后将结果取p>1次幂以使其下降得更快:
Figure 5.5. Geometry for Blinn–Phong shading.
图 5.5. Blinn–Phong 着色的几何形状。
$$(\mathbf{n} \cdot \mathbf{h})^p$$
The Phong exponent, p, controls the apparent shininess of the surface: higher values make the reflection fall off faster away from the mirror direction, leading to a shinier appearance. The half vector itself is easy to compute: since v and l are the same length, their sum is a vector that bisects the angle between them, which only needs to be normalized to produce h:
这冯氏指数p控制表面的表观光泽度:值越高,反射在远离镜面方向的速度越快,从而产生更闪亮的外观。半向量本身很容易计算:由于v和l 的长度相同,它们的和是一个平分它们之间角度的向量,只需对其进行归一化即可得出h :
$$\mathbf{h} = \frac{\mathbf{v} + \mathbf{l}}{\|\mathbf{v} + \mathbf{l}\|}$$
Typical values of p:
p的典型值:
10—“eggshell”;
10——“蛋壳”;
100—mildly shiny;
100——略有光泽;
1000—really glossy;
1000——非常有光泽;
10,000—nearly mirror-like.
10,000——几乎像镜子一样。
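To make the effect of the exponent concrete, here is a small sketch that builds the half vector and evaluates the specular lobe for two view directions. The helper names and the sample vectors are assumptions chosen for illustration; the vectors are arranged so that the dot product is positive.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def half_vector(v, l):
    """Unit vector bisecting the unit vectors v and l."""
    return normalize(tuple(a + b for a, b in zip(v, l)))

def specular_lobe(n, v, l, p):
    """(n . h)^p for unit vectors n, v, l."""
    h = half_vector(v, l)
    return sum(a * b for a, b in zip(n, h)) ** p

n = (0.0, 0.0, 1.0)
l = normalize((1.0, 0.0, 1.0))
v_mirror = normalize((-1.0, 0.0, 1.0))  # mirror configuration: h equals n
v_off = normalize((-1.0, 0.0, 2.0))     # somewhat away from the mirror direction

peak = specular_lobe(n, v_mirror, l, 100)
eggshell = specular_lobe(n, v_off, l, 10)
glossy = specular_lobe(n, v_off, l, 1000)
```

In the mirror configuration the lobe is at its maximum of 1 for any exponent; away from it, the larger exponent gives a much smaller value, which is what produces a tighter highlight.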
To incorporate the Blinn–Phong idea into a shading computation, we add a specular component to Lambertian shading; the Lambertian part is then the diffuse component. We simply generalize the factor k from (5.3) that relates reflected light to incident irradiance to include not just the contribution of diffuse reflection but also a separate term that adds in specular reflection:
为了将 Blinn–Phong 的思想融入到着色计算中,我们在 Lambertian 着色中添加了一个镜面反射分量;然后 Lambertian 部分就是漫反射分量。我们简单地将 (5.3) 中的因子k推广到将反射光与入射辐照度联系起来的程度,使其不仅包括漫反射的贡献,还包括一个单独的项,即镜面反射:
$$L_r = \left( \frac{R}{\pi} + k_s \max(0, \mathbf{n} \cdot \mathbf{h})^p \right) E$$
where the scale factor $k_s$ is the specular coefficient (separate for red, green, and blue) and controls how bright the specular component is, and we have added a clamping operation to avoid surprises for corner cases in which n faces away from h.
其中比例因子k s是镜面反射系数(红色、绿色和蓝色分开),控制镜面反射分量的亮度,并且我们添加了一个限制操作,以避免在n背对h 的极端情况下出现意外。
When in doubt, for surfaces that also have a diffuse color, make the specular coefficient neutral in color, with equal red, green, and blue values.
如有疑问,对于也具有漫反射颜色的表面,请使镜面反射系数的颜色为中性,并具有相同的红色、绿色和蓝色值。
The expression that generalizes the factor k is called the bidirectional reflectance distribution function or BRDF, because it describes how the reflectance varies as a function of the two directions l and v. The BRDF of a Lambertian surface is constant, but the BRDF of a surface that has specular reflection is not. The shading calculation then boils down to computing the irradiance (describing how much light is available to reflect) and the BRDF (describing how the surface reflects it), and then multiplying them. The BRDF is discussed more completely in Chapter 14.
概括因子k 的表达式称为双向反射分布函数或 BRDF,因为它描述了反射率如何随两个方向l和v而变化。朗伯表面的 BRDF 是恒定的,但具有镜面反射的表面的 BRDF 不是。着色计算归结为计算辐照度(描述有多少光可供反射)和 BRDF(描述表面如何反射它),然后将它们相乘。第 14 章将更全面地讨论 BRDF。
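The difference between a constant and a view-dependent BRDF can be seen in a short sketch. The function below evaluates the diffuse-plus-specular factor for a single color channel; its name and the sample values are assumptions, and it presumes v and l are not exactly opposite, since their sum could not then be normalized.

```python
import math

def blinn_phong_brdf(n, v, l, R, ks, p):
    """R/pi + ks * max(0, n.h)^p for one color channel; n, v, l are unit vectors."""
    s = [a + b for a, b in zip(v, l)]             # v + l (assumes v != -l)
    length = math.sqrt(sum(c * c for c in s))
    h = [c / length for c in s]                   # half vector
    n_dot_h = max(0.0, sum(a * b for a, b in zip(n, h)))
    return R / math.pi + ks * n_dot_h ** p

n = (0.0, 0.0, 1.0)
l = (0.0, 0.0, 1.0)
v1 = (0.0, 0.0, 1.0)   # looking straight down the normal
v2 = (0.6, 0.0, 0.8)   # looking from the side

# Lambertian surface (ks = 0): the same value from every view direction.
lambert_1 = blinn_phong_brdf(n, v1, l, R=0.5, ks=0.0, p=50)
lambert_2 = blinn_phong_brdf(n, v2, l, R=0.5, ks=0.0, p=50)

# Glossy surface (ks > 0): the value depends on the view direction.
glossy_1 = blinn_phong_brdf(n, v1, l, R=0.5, ks=0.2, p=50)
glossy_2 = blinn_phong_brdf(n, v2, l, R=0.5, ks=0.2, p=50)
```

Multiplying any of these values by the irradiance gives the reflected light for that channel.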
When implementing surface shading, the code needs to have access to information about the light source, the surface, and the viewing direction. Writing clean code that supports both point and directional lights is easiest to do by separating the calculation of irradiance from the calculation of reflected light. Irradiance depends only on the light source and the surface geometry, and once it’s known, calculating the reflected light only depends on the surface properties and the viewing geometry.
在实现表面着色时,代码需要访问有关光源、表面和观察方向的信息。通过将辐照度计算与反射光计算分开,编写支持点光源和定向光源的简洁代码是最容易做到的。辐照度仅取决于光源和表面几何形状,一旦知道了辐照度,计算反射光仅取决于表面属性和观察几何形状。
Basic shading calculations can be done in exactly the same way in ray tracing and rasterization systems; it’s really only how the inputs are computed that varies. To compute irradiance, we need
在光线追踪和光栅化系统中,基本着色计算可以以完全相同的方式进行;实际上只是输入的计算方式有所不同。要计算辐照度,我们需要
The shading point x, a 3D point on a surface
着色点x ,表面上的 3D 点
The surface normal n perpendicular to the surface at x
垂直于x处表面的表面法线n
The light source position p for a point light or its direction l for a directional light
点光源的光源位置p或定向光的方向l
The light source intensity I for a point light or its normal irradiance H for a directional light (these are RGB colors).
点光源的光源强度I或其定向光的法向辐照度H (这些是 RGB 颜色)。
For a point light, we need to compute the distance and the light direction, which are both simple to get from the vector p – x:
对于点光源,我们需要计算距离和光线方向,这两个都可以从向量p – x轻松获取:
$$r = \|\mathbf{p} - \mathbf{x}\|, \qquad \mathbf{l} = \frac{\mathbf{p} - \mathbf{x}}{r}$$
and for both types of lights, the cosine factor is best computed using a dot product; as long as n and l are unit vectors,
对于这两种类型的光,最好使用点积来计算余弦因子;只要n和l是单位向量,
$$\cos\theta = \mathbf{n} \cdot \mathbf{l}$$
In practice, it’s a good idea when computing irradiance to clamp the dot product at zero to make sure that even if in some cases, you find the light direction is facing away from the surface normal, you won’t get negative shading. This leads to what we view as the official equations for computing irradiance:
在实践中,计算辐照度时将点积限制为零是一个好主意,以确保即使在某些情况下,你发现光线方向背离表面法线,你也不会得到负着色。这导致了我们所认为的计算辐照度的官方方程式:
$$E = \frac{\max(0, \mathbf{n} \cdot \mathbf{l})}{r^2}\, I \quad \text{(point source)}, \qquad E = \max(0, \mathbf{n} \cdot \mathbf{l})\, H \quad \text{(directional source)}$$
Once the irradiance is known, it needs to be multiplied by the BRDF value, and the ingredients for calculating that value are
一旦知道辐照度,就需要将其乘以 BDRF 值,计算该值的要素为
A light direction that faces away from the surface normal can occur with interpolated normals (Section 9.2.4).
这可能发生在插值法线上(第 9.2.4 节)
The light direction l, a unit vector pointing from x toward the light (already computed as part of the irradiance calculation)
光方向l ,从x指向光的单位向量(已作为辐照度计算的一部分计算)
The viewing direction v, a unit vector pointing from x toward the viewer
观察方向v是从x指向观察者的单位向量
The parameters describing the properties of the surface material. For this chapter’s model, this includes $R$, $k_s$, and $p$.
描述表面材料特性的参数。对于本章的模型,这包括R 、 k s和p 。
How you get these quantities differs substantially between ray tracing and rasterization systems, but the actual shading calculation itself is the same. Don’t forget that v, l, and n all must be unit vectors; failing to normalize these vectors is a very common error in shading computations.
在光线追踪和光栅化系统之间,获取这些量的方法有很大不同,但实际的着色计算本身是相同的。不要忘记v 、 l和n都必须是单位向量;无法将这些向量标准化是着色计算中非常常见的错误。
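One way to organize this separation in code is sketched below, assuming RGB colors are 3-tuples and that all direction vectors passed in are already unit length. The function names are hypothetical, not taken from any particular renderer.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)

def irradiance_point(I, light_pos, x, n):
    """Clamped RGB irradiance from a point light; also returns the unit light direction."""
    d = tuple(a - b for a, b in zip(light_pos, x))
    r = math.sqrt(dot(d, d))
    l = tuple(c / r for c in d)
    cos_term = max(0.0, dot(n, l))          # clamp so back-facing light gives zero
    return tuple(I_c * cos_term / (r * r) for I_c in I), l

def irradiance_directional(H, l, n):
    """Clamped RGB irradiance from a directional light with unit direction l."""
    cos_term = max(0.0, dot(n, l))
    return tuple(H_c * cos_term for H_c in H)

def reflected_light(E, n, l, v, R, ks, p):
    """Multiply irradiance by the diffuse-plus-specular factor, per channel."""
    h = normalize(tuple(a + b for a, b in zip(v, l)))
    lobe = max(0.0, dot(n, h)) ** p
    return tuple((R_c / math.pi + ks_c * lobe) * E_c
                 for R_c, ks_c, E_c in zip(R, ks, E))

n = (0.0, 0.0, 1.0)
v = (0.0, 0.0, 1.0)
E, l = irradiance_point((4.0, 4.0, 4.0), (0.0, 0.0, 2.0), (0.0, 0.0, 0.0), n)
color = reflected_light(E, n, l, v, R=(0.6, 0.3, 0.0), ks=(0.1, 0.1, 0.1), p=100)

# A directional light shining from behind the surface contributes nothing.
E_back = irradiance_directional((1.0, 1.0, 1.0), (0.0, 0.0, -1.0), n)
```

Because the reflection step takes only the irradiance and the light direction, it does not need to know which kind of light produced them.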
Figure 5.6. Ambient illumination adds some light that arrives from all directions equally.
图 5.6.环境照明添加了来自各个方向均匀照射的光。
Point-like sources are models for very localized sources that produce a lot of light near one direction. Other kinds of light sources are not so localized—for instance the sky, or the light reflected from the walls of a room. While such extended sources can be modeled in great detail, for basic shading we need a really simple approximation, so we make the assumption of ambient light that is exactly the same in all directions and at all locations in the scene (Figure 5.6). We further assume ambient light is only reflected diffusely (since there is no light direction and therefore no way to compute specular shading). This makes ambient shading very simple: it is a constant!
点光源是极局部光源的模型,它们在一个方向附近产生大量光。其他类型的光源没有那么局部化——例如天空,或者从房间墙壁反射的光。虽然这种扩展光源可以非常详细地建模,但对于基本着色,我们需要一个非常简单的近似值,因此我们假设环境光在场景的所有方向和所有位置都完全相同(图 5.6 )。我们进一步假设环境光仅被漫反射(因为没有光方向,因此无法计算镜面着色)。这使得环境着色非常简单:它是一个常数!
Normally, this constant is factored into the product of a material-related ambient reflection coefficient $k_a$ and a light-related ambient intensity $I_a$:
通常情况下,这个常数被计入与材料相关的产品中环境反射系数 k a和光相关的环境强度 I a :
$$L_a = k_a I_a$$
Really, $k_a$ ought to be called reflectance and $I_a$ ought to be called radiance, but this is not the usual nomenclature.
实际上, k a应该被称为反射率,而i a应该被称为辐射率,但这不是通常的命名法。
Both these quantities are colored, so they are multiplied componentwise (the ambient coefficient for red scales the red ambient intensity). This arrangement makes it convenient to tune ambient shading per object and in the scene as a whole.
这两个量都是彩色的,因此它们按分量相乘(红色的环境系数缩放红色环境强度)。这种安排使得调整每个物体和整个场景的环境阴影变得方便。
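The componentwise product is short enough to write out directly. In this sketch the function name and the sample colors are assumptions; colors are RGB 3-tuples.

```python
def ambient_shade(ka, Ia):
    """Componentwise product of ambient coefficient and ambient intensity (RGB)."""
    return tuple(k * i for k, i in zip(ka, Ia))

# An orange-ish material under a dim neutral ambient light.
ambient = ambient_shade((0.8, 0.4, 0.2), (0.1, 0.1, 0.1))
```

Changing the first argument tunes one object; changing the second tunes the whole scene.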
Ambient shading is a bit of a hack, since lighting from large extended light sources does still vary: it tends to be darker in corners and other concave areas. But it is an important part of simple shading setups because it prevents shadows from being completely black and allows an easy way to tweak overall scene contrast.
环境光着色有点儿像 hack,因为大型扩展光源的照明仍然会发生变化:角落和其他凹陷区域往往较暗。但它是简单着色设置的重要组成部分,因为它可以防止阴影完全变黑,并允许轻松调整整体场景对比度。
When in doubt, set the ambient color to be the same as the diffuse color and the ambient intensity to a neutral color.
如有疑问,请将环境光颜色设置为与漫反射颜色相同,并将环境光强度设置为中性色。
Many systems treat ambient light as a type of light source that appears in a list with point and directional lights; other systems make the ambient intensity a parameter of the scene so that there is no explicit light source for ambient, which is the same as assuming there is always exactly one ambient light.
许多系统将环境光视为一种光源,出现在点光源和定向光源的列表中;其他系统将环境光强度作为场景的一个参数,因此环境光没有明确的光源,这与假设始终只有一个环境光相同。
Phong shading seems like an enormous hack. Is that true?
Phong 着色看起来是个大难题。是真的吗?
Yes. It is not a very good model if you are trying to match measurements of real surfaces. However, it is simple and has proven to produce shading that is very useful in practice. Applications that are looking for realistic shading are moving away from Phong shading to more complex but much more accurate models based on microfacet theory (Walter, Marschner, Li, & Torrance, 2007). But realism also absolutely requires going beyond point-like light sources. All this is discussed in Chapter 14.
是的。如果您尝试匹配真实表面的测量值,那么这不是一个很好的模型。但是,它很简单,并且已被证明可以产生在实践中非常有用的阴影。寻求逼真阴影的应用程序正在从 Phong 阴影转向基于微面理论的更复杂但更准确的模型(Walter、Marschner、Li 和 Torrance,2007 年)。但现实主义也绝对需要超越点状光源。所有这些都在第 14 章中讨论。
I hate calling pow(). Is there a way to avoid it when doing Phong lighting?
我讨厌调用 pow()。有没有办法在进行 Phong 照明时避免调用它?
A simple way is to only have exponents that are themselves a power of two, i.e., 2, 4, 8, 16, .... In practice, this is not a problematic restriction for most applications. Many systems designed for fast graphics calculations have library functions for pow() that are much faster and slightly less accurate than the ones found in standard math libraries.
一种简单的方法是只使用本身是 2 的幂的指数,即 2、4、8、16……实际上,这对于大多数应用程序来说都不是一个有问题的限制。许多为快速图形计算而设计的系统都有 pow() 库函数,这些函数比标准数学库中的函数快得多,但准确度略低。
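When the exponent is a power of two, the power can be computed by repeated squaring, using only multiplications. The sketch below assumes the exponent is specified by its base-2 logarithm; the function name is hypothetical.

```python
def pow_by_squaring(base, log2_exponent):
    """Return base ** (2 ** log2_exponent) using log2_exponent multiplications."""
    result = base
    for _ in range(log2_exponent):
        result *= result   # each squaring doubles the exponent
    return result

x8 = pow_by_squaring(0.5, 3)     # 0.5 ** 8
x64 = pow_by_squaring(0.99, 6)   # 0.99 ** 64
```

An exponent of 64 therefore costs six multiplications.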
1. The moon is poorly approximated by both diffuse and Phong shading. What observations tell you that this is true?
1 . 月球的散射和 Phong 阴影效果都不太好。哪些观察结果告诉你这是真的?
2. Velvet is poorly approximated by both diffuse and Phong shading. What observations tell you that this is true?
2.天鹅绒在漫反射和 Phong 着色中都表现不佳。哪些观察结果告诉你这是真的?
3. Why do most highlights on plastic objects look white, while those on gold metal look gold?
3.为什么塑料物体上的高光大部分看起来是白色的,而金色金属上的高光却看起来是金色的?
Perhaps the most universal tools of graphics programs are the matrices that change or transform points and vectors. In the next chapter, we will see how a vector can be represented as a matrix with a single column, and how the vector can be represented in a different basis via multiplication with a square matrix. We will also describe how we can use such multiplications to accomplish changes in the vector such as scaling, rotation, and translation. In this chapter, we review basic linear algebra from a geometric perspective, focusing on intuition and algorithms that work well in the two- and three-dimensional case.
也许,图形程序最通用的工具是改变或变换点和向量的矩阵。在下一章中,我们将了解如何将向量表示为单列矩阵,以及如何通过与方阵相乘将向量表示为不同的基。我们还将描述如何使用此类乘法来实现向量的改变,例如缩放、旋转和平移。在本章中,我们将从几何角度回顾基本的线性代数,重点介绍在二维和三维情况下效果良好的直觉和算法。
This chapter can be skipped by readers comfortable with linear algebra. However, there may be some enlightening tidbits even for such readers, such as the development of determinants and the discussion of singular and eigenvalue decomposition.
熟悉线性代数的读者可以跳过本章。不过,即使对于这样的读者来说,本章也可能会有一些启发性的内容,例如行列式的发展以及对奇异值和特征值分解的讨论。
Figure 6.1. The signed area of the parallelogram is |ab|, and in this case the area is positive.
图 6.1。平行四边形的符号面积为 | ab |,在这种情况下面积为正。
We usually think of determinants as arising in the solution of linear equations. However, for our purposes, we will think of determinants as another way to multiply vectors. For 2D vectors a and b, the determinant |ab| is the area of the parallelogram formed by a and b (Figure 6.1). This is a signed area, and the sign is positive if a and b are right-handed and negative if they are left-handed. This means |ab| = –|ba|. In 2D, we can interpret “right-handed” as meaning we rotate the first vector counterclockwise to close the smallest angle to the second vector. In 3D, the determinant must be taken with three vectors at a time. For three 3D vectors, a, b, and c, the determinant |abc| is the signed volume of the parallelepiped (3D parallelogram; a sheared 3D box) formed by the three vectors (Figure 6.2). To compute a 2D determinant, we first need to establish a few of its properties. We note that scaling one side of a parallelogram scales its area by the same fraction (Figure 6.3):
我们通常认为行列式是在解线性方程时产生的。然而,为了我们的目的,我们将行列式看作是向量相乘的另一种方式。对于二维向量a和b ,行列式 | ab | 是a和b所构成的平行四边形的面积(图 6.1 )。这是一个有符号的面积,如果a和b是右旋的,则符号为正,如果是左旋的,则符号为负。这意味着 | ab | = –| ba |。在二维中,我们可以将“右旋”解释为将第一个向量逆时针旋转,以闭合与第二个向量的最小角度。在三维中,行列式必须同时用三个向量来求。对于三个三维向量a 、 b和c ,行列式 | abc | 是这三个向量所构成的平行六面体(三维平行四边形;剪切的三维盒子)的有符号体积(图 6.2 )。要计算二维行列式,我们首先需要确定它的一些属性。我们注意到,缩放平行四边形的一边会使其面积缩放相同的比例(图 6.3 ):
$$|(k\mathbf{a})\mathbf{b}| = |\mathbf{a}(k\mathbf{b})| = k|\mathbf{ab}|$$
Figure 6.2. The signed volume of the parallelepiped shown is denoted by the determinant |abc|, and in this case the volume is positive because the vectors form a right-handed basis.
图 6.2。所示平行六面体的有符号体积用行列式 |abc| 表示,在这种情况下体积为正,因为向量形成右手系基。
Also, we note that “shearing” a parallelogram does not change its area (Figure 6.4):
另外,我们注意到“剪切”平行四边形不会改变其面积(图 6.4 ):
$$|(\mathbf{a} + k\mathbf{b})\mathbf{b}| = |\mathbf{a}(\mathbf{b} + k\mathbf{a})| = |\mathbf{ab}|$$
Finally, we see that the determinant has the following property:
最后,我们看到行列式具有以下性质:
$$|\mathbf{a}(\mathbf{b} + \mathbf{c})| = |\mathbf{ab}| + |\mathbf{ac}| \tag{6.1}$$
because as shown in Figure 6.5, we can “slide” the edge between the two parallelograms over to form a single parallelogram without changing the area of either of the two original parallelograms.
因为如图 6.5所示,我们可以“滑动”两个平行四边形之间的边来形成一个平行四边形,而不会改变两个原始平行四边形的面积。
Now let’s assume a Cartesian representation for a and b:
现在让我们假设a和b 的笛卡尔表示:
$$\begin{aligned}
|\mathbf{ab}| &= |(x_a\mathbf{x} + y_a\mathbf{y})(x_b\mathbf{x} + y_b\mathbf{y})| \\
&= x_a x_b|\mathbf{xx}| + x_a y_b|\mathbf{xy}| + y_a x_b|\mathbf{yx}| + y_a y_b|\mathbf{yy}| \\
&= x_a x_b(0) + x_a y_b(+1) + y_a x_b(-1) + y_a y_b(0) \\
&= x_a y_b - y_a x_b.
\end{aligned}$$
This simplification uses the fact that |vv| = 0 for any vector v, because the parallelograms would all be collinear with v and thus without area.
此简化利用了以下事实:对于任何向量v , | vv | = 0,因为平行四边形都与v共线,因此没有面积。
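The resulting formula and the properties used to derive it are easy to check numerically. This is a minimal sketch; the function name and the tuple representation of 2D vectors are assumptions.

```python
def det2(a, b):
    """Signed area of the parallelogram formed by 2D vectors a and b."""
    return a[0] * b[1] - a[1] * b[0]

x_axis, y_axis = (1, 0), (0, 1)

right_handed = det2(x_axis, y_axis)          # counterclockwise from a to b
left_handed = det2(y_axis, x_axis)           # swapping the vectors flips the sign
degenerate = det2((2, 3), (2, 3))            # |vv| = 0
scaled = det2((2 * 1, 2 * 0), y_axis)        # scaling one side by 2 doubles the area
sheared = det2((1 + 3 * 0, 0 + 3 * 1), y_axis)  # a + 3b: shearing keeps the area
```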
Figure 6.3. Scaling a parallelogram along one direction changes the area in the same proportion.
图 6.3.沿一个方向缩放平行四边形会以相同比例改变面积。
In three dimensions, the determinant of three 3D vectors a, b, and c is denoted |abc|. With Cartesian representations for the vectors, there are analogous rules for parallelepipeds as there are for parallelograms, and we can do an analogous expansion as we did for 2D:
在三维空间中,三个三维向量a 、 b和c的行列式表示为|abc |。在向量的笛卡尔表示中,平行六面体有与平行四边形类似的规则,我们可以像在二维空间中一样进行类似的扩展:
$$|\mathbf{abc}| = x_a y_b z_c - x_a z_b y_c - y_a x_b z_c + y_a z_b x_c + z_a x_b y_c - z_a y_b x_c$$
As you can see, the computation of determinants in this fashion gets uglier as the dimension increases. We will discuss less error-prone ways to compute determinants in Section 6.3.
如您所见,随着维度的增加,以这种方式计算行列式会变得越来越丑陋。我们将在第 6.3 节中讨论计算行列式的不易出错的方法。
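Written out as code, the six-term expansion for three 3D vectors looks like the following sketch. The function name is an assumption; each vector is an (x, y, z) tuple.

```python
def det3(a, b, c):
    """Signed volume of the parallelepiped formed by 3D vectors a, b, c."""
    xa, ya, za = a
    xb, yb, zb = b
    xc, yc, zc = c
    return (xa * yb * zc - xa * zb * yc
            - ya * xb * zc + ya * zb * xc
            + za * xb * yc - za * yb * xc)

x_axis, y_axis, z_axis = (1, 0, 0), (0, 1, 0), (0, 0, 1)

unit_volume = det3(x_axis, y_axis, z_axis)   # right-handed basis
flipped = det3(y_axis, x_axis, z_axis)       # swapping two vectors flips the sign
general = det3((1, 2, 3), (4, 5, 6), (7, 8, 10))
```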
Figure 6.4. Shearing a parallelogram does not change its area. These four parallelograms have the same length base and thus the same area.
图 6.4.剪切平行四边形不会改变其面积。这四个平行四边形的底边长度相同,因此面积也相同。
Example 2 Determinants arise naturally when computing the expression for one vector as a linear combination of two others—for example, if we wish to express a vector c as a combination of vectors a and b:
例 2当计算一个向量的表达式作为另外两个向量的线性组合时,行列式自然出现——例如,如果我们希望将向量c表示为向量a和b的组合:
$$\mathbf{c} = a_c\mathbf{a} + b_c\mathbf{b}$$
Figure 6.5. The geometry behind Equation 6.1. Both of the parallelograms on the left can be sheared to cover the single parallelogram on the right.
图 6.5。公式 6.1 背后的几何形状。左侧的两个平行四边形都可以剪切以覆盖右侧的单个平行四边形。
Figure 6.6. On the left, the vector c can be represented using two basis vectors as $a_c\mathbf{a} + b_c\mathbf{b}$. On the right, we see that the parallelogram formed by a and c is a sheared version of the parallelogram formed by $b_c\mathbf{b}$ and a.
图 6.6。在左侧,向量 c 可以使用两个基向量表示为a c a + b c b 。在右侧,我们可以看到由a和c形成的平行四边形是b c b和a 形成的平行四边形的剪切版本。
We can see from Figure 6.6 that
从图 6.6中我们可以看出
$$|(b_c\mathbf{b})\mathbf{a}| = |\mathbf{ca}|$$
because these parallelograms are just sheared versions of each other. Solving for $b_c$ yields
因为这些平行四边形只是彼此的剪切版本。求解b c可得出
$$b_c = \frac{|\mathbf{ca}|}{|\mathbf{ba}|}$$
An analogous argument yields
类似的论证得出
$$a_c = \frac{|\mathbf{bc}|}{|\mathbf{ba}|}$$
This is the two-dimensional version of Cramer’s rule which we will revisit in Section 6.3.2.
这是二维版本的克莱姆规则,我们将在第 6.3.2 节中重新讨论。
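The two ratios of determinants translate directly into code. The sketch below assumes 2D vectors stored as tuples; the function names and the error raised for parallel basis vectors are illustrative choices.

```python
def det2(a, b):
    return a[0] * b[1] - a[1] * b[0]

def coordinates_in_basis(c, a, b):
    """Return (a_c, b_c) such that c = a_c * a + b_c * b."""
    denom = det2(b, a)
    if denom == 0:
        raise ValueError("a and b are parallel and do not form a basis")
    a_c = det2(b, c) / denom
    b_c = det2(c, a) / denom
    return a_c, b_c

a, b = (1, 0), (1, 1)
a_c, b_c = coordinates_in_basis((3, 2), a, b)   # (3, 2) = 1 * a + 2 * b
```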
A matrix is an array of numeric elements that follow certain arithmetic rules. An example of a matrix with two rows and three columns is
矩阵是遵循特定算术规则的数字元素数组。具有两行三列的矩阵的示例如下
Matrices are frequently used in computer graphics for a variety of purposes including representation of spatial transforms. For our discussion, we assume the elements of a matrix are all real numbers. This chapter describes both the mechanics of matrix arithmetic and the determinant of “square” matrices, i.e., matrices with the same number of rows as columns.
矩阵在计算机图形学中经常用于各种目的,包括表示空间变换。在我们的讨论中,我们假设矩阵的元素都是实数。本章描述了矩阵算法的机制和“方阵”的行列式,即行数与列数相同的矩阵。
A matrix times a constant results in a matrix where each element has been multiplied by that constant, e.g.,
矩阵乘以一个常数将得到一个矩阵,其中每个元素都乘以该常数,例如,
For matrix multiplication, we “multiply” rows of the first matrix with columns of the second matrix:
对于矩阵乘法,我们将第一个矩阵的行与第二个矩阵的列“相乘”:
So the element $p_{ij}$ of the resulting product is
因此所得乘积的元素p ij为
$$p_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj} \tag{6.2}$$
Taking a product of two matrices is only possible if the number of columns of the left matrix is the same as the number of rows of the right matrix. For example,
只有当左矩阵的列数与右矩阵的行数相同时,才有可能将两个矩阵相乘。例如,
Matrix multiplication is not commutative in most instances:
在大多数情况下,矩阵乘法不满足交换律:
$$\mathbf{AB} \neq \mathbf{BA}$$
Also, if AB = AC, it does not necessarily follow that B = C. Fortunately, matrix multiplication is associative and distributive:
另外,如果AB = AC ,也不一定B = C 。幸运的是,矩阵乘法是结合律和分配律:
$$(\mathbf{AB})\mathbf{C} = \mathbf{A}(\mathbf{BC}), \qquad \mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}, \qquad (\mathbf{A} + \mathbf{B})\mathbf{C} = \mathbf{AC} + \mathbf{BC}$$
We would like a matrix analog of the inverse of a real number. We know the inverse of a real number x is 1/x and that the product of x and its inverse is 1. We need a matrix I that we can think of as a “matrix one.” This exists only for square matrices and is known as the identity matrix; it consists of ones down the diagonal and zeroes elsewhere. For example, the four by four identity matrix is
我们想要一个实数逆矩阵的类似物。我们知道实数x的逆是 1 /x , x与其逆的乘积是 1。我们需要一个矩阵I ,我们可以将其视为“矩阵一”。这只存在于方阵中,称为单位矩阵;它由对角线上的 1 和其他地方的 0 组成。例如,四乘四的单位矩阵是
$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
The inverse matrix $\mathbf{A}^{-1}$ of a matrix $\mathbf{A}$ is the matrix that ensures $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$. For example,
这矩阵A的逆矩阵A –1是保证AA – 1 = I 的矩阵。例如,
Note that the inverse of $\mathbf{A}^{-1}$ is $\mathbf{A}$. So $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$. The inverse of a product of two matrices is the product of the inverses, but with the order reversed:
请注意, A -1的逆是A 。所以AA -1 = A -1 A = I 。两个矩阵乘积的逆是逆的乘积,但顺序相反:
$$(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1} \tag{6.4}$$
We will return to the question of computing inverses in Section 6.3.
我们将在第 6.3 节中回到计算逆的问题。
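The transpose $\mathbf{A}^T$ of a matrix $\mathbf{A}$ has the same numbers, but the rows are switched with the columns. If we label the entries of $\mathbf{A}^T$ as $a'_{ij}$, then
矩阵 $\mathbf{A}$ 的转置 $\mathbf{A}^T$ 具有相同的数字,但行与列交换。如果我们将 $\mathbf{A}^T$ 的元素标记为 $a'_{ij}$,则

$$a_{ij} = a'_{ji}$$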
For example,
例如,
The transpose of a product of two matrices obeys a rule similar to Equation (6.4):
两个矩阵乘积的转置遵循与公式 (6.4) 类似的规则:
$$(\mathbf{AB})^T = \mathbf{B}^T\mathbf{A}^T$$
The determinant of a square matrix is simply the determinant of the columns of the matrix, considered as a set of vectors. The determinant has several nice relationships to the matrix operations just discussed, which we list here for reference:
方阵的行列式只是矩阵列的行列式,被视为一组向量。行列式与刚刚讨论的矩阵运算有几种很好的关系,我们在此列出以供参考:
$$|\mathbf{AB}| = |\mathbf{A}|\,|\mathbf{B}|, \qquad |\mathbf{A}^{-1}| = \frac{1}{|\mathbf{A}|}, \qquad |\mathbf{A}^T| = |\mathbf{A}|$$
In graphics, we use a square matrix to transform a vector represented as a matrix. For example, if you have a 2D vector $\mathbf{a} = (x_a, y_a)$ and want to rotate it by 90 degrees about the origin to form vector $\mathbf{a}' = (-y_a, x_a)$, you can use a product of a 2 × 2 matrix and a 2 × 1 matrix, called a column vector. The operation in matrix form is
在图形学中,我们使用方阵来变换以矩阵表示的向量。例如,如果你有一个二维向量a = ( x a , y a ) ,并想将其绕原点旋转 90 度以形成向量a = (– y a , x a ) ,则可以使用 2 × 2 矩阵和 2 × 1 矩阵的乘积(称为列向量) 。矩阵形式的运算为
$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_a \\ y_a \end{bmatrix} = \begin{bmatrix} -y_a \\ x_a \end{bmatrix}$$
We can get the same result by using the transpose of this matrix and multiplying on the left (“premultiplying”) with a row vector:
我们可以使用该矩阵的转置并在左侧乘以(“预乘”)行向量来获得相同的结果:
$$\begin{bmatrix} x_a & y_a \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} -y_a & x_a \end{bmatrix}$$
These days, postmultiplication using column vectors is fairly standard, but in many older books and systems, you will run across row vectors and premultiplication. The only difference is that the transform matrix must be replaced with its transpose.
如今,使用列向量进行后乘法已相当标准,但在许多较旧的书籍和系统中,您会遇到行向量和预乘法。唯一的区别是必须用其转置矩阵替换变换矩阵。
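The equivalence of the two conventions can be checked with a few lines of code. In this sketch, matrices are lists of rows and the helper names are assumptions.

```python
def mat_vec(M, x):
    """Matrix times column vector: y_i = sum_j M[i][j] * x[j]."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def vec_mat(x, M):
    """Row vector times matrix: y_j = sum_i x[i] * M[i][j]."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

def transpose(M):
    return [list(row) for row in zip(*M)]

rotate_90 = [[0, -1],
             [1, 0]]
a = [3, 4]

column_result = mat_vec(rotate_90, a)            # column-vector convention
row_result = vec_mat(a, transpose(rotate_90))    # row-vector convention
```

Both conventions produce the same rotated vector, (–4, 3), as long as the matrix is transposed when switching between them.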
We also can use matrix formalism to encode operations on just vectors. If we consider the result of the dot product as a 1 × 1 matrix, it can be written
我们还可以使用矩阵形式来编码向量上的运算。如果我们将点积的结果视为 1 × 1 矩阵,则可以写成
$$\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T\mathbf{b}$$
For example, if we take two 3D vectors we get
例如,如果我们取两个三维向量,则得到
$$\begin{bmatrix} x_a & y_a & z_a \end{bmatrix} \begin{bmatrix} x_b \\ y_b \\ z_b \end{bmatrix} = \begin{bmatrix} x_a x_b + y_a y_b + z_a z_b \end{bmatrix}$$
A related vector product is the outer product between two vectors, which can be expressed as a matrix multiplication with a column vector on the left and a row vector on the right: $\mathbf{a}\mathbf{b}^T$. The result is a matrix consisting of products of all pairs of an entry of a with an entry of b. For 3D vectors, we have
相关的向量积是两个向量之间的外积,可以表示为矩阵乘法,左侧是列向量,右侧是行向量: ab T 。结果是一个矩阵,由a 元素与b元素的所有对的乘积组成。对于三维向量,我们有
$$\begin{bmatrix} x_a \\ y_a \\ z_a \end{bmatrix} \begin{bmatrix} x_b & y_b & z_b \end{bmatrix} = \begin{bmatrix} x_a x_b & x_a y_b & x_a z_b \\ y_a x_b & y_a y_b & y_a z_b \\ z_a x_b & z_a y_b & z_a z_b \end{bmatrix}$$
It is often useful to think of matrix multiplication in terms of vector operations. To illustrate using the three-dimensional case, we can think of a 3 × 3 matrix as a collection of three 3D vectors in two ways: either it is made up of three column vectors side-by-side or it is made up of three row vectors stacked up. For instance, the result of a matrix-vector multiplication y = Ax can be interpreted as a vector whose entries are the dot products of x with the rows of A. Naming these row vectors $\mathbf{r}_i$, we have
用向量运算来思考矩阵乘法通常很有用。为了使用三维情况进行说明,我们可以将 3 × 3 矩阵视为三个三维向量的集合,有两种方式:要么由并排的三个列向量组成,要么由堆叠的三个行向量组成。例如,矩阵向量乘法y = Ax的结果可以解释为一个向量,其元素是x与A的行的点积。将这些行向量命名为r ,我们有
Alternatively, we can think of the same product as a sum of the three columns ci of A, weighted by the entries of x:
或者,我们可以将相同的乘积视为 A 的三列c的总和,并由x的条目加权:
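Both readings of y = Ax can be put side by side. Here r_i are the rows of A, c_i its columns, and x_1, x_2, x_3 are assumed names for the entries of x:

```latex
\[
\mathbf{y} =
\begin{bmatrix} \mathbf{r}_1 \cdot \mathbf{x} \\ \mathbf{r}_2 \cdot \mathbf{x} \\ \mathbf{r}_3 \cdot \mathbf{x} \end{bmatrix}
\qquad\text{and}\qquad
\mathbf{y} = x_1 \mathbf{c}_1 + x_2 \mathbf{c}_2 + x_3 \mathbf{c}_3
\]
```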
Using the same ideas, one can understand a matrix–matrix product AB as an array containing the pairwise dot products of all rows of A with all columns of B (cf. (6.2)); as a collection of products of the matrix A with all the column vectors of B, arranged left to right; as a collection of products of all the row vectors of A with the matrix B, stacked top to bottom; or as the sum of the pairwise outer products of all columns of A with all rows of B. (See Exercise 8.)
使用相同的思想,我们可以将矩阵乘积AB理解为一个包含A的所有行与B的所有列的成对点积的数组(参见(6.2));包含矩阵A与B所有列向量的乘积的集合,从左到右排列;包含矩阵A与矩阵B所有行向量的乘积的集合,从上到下堆叠;或者包含A的所有列与B的所有行的成对外积之和。(参见练习 8。)
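The four readings of AB correspond to four orderings of the same triple loop. The sketch below is one way to illustrate that in plain Python; the function names are hypothetical, and matrices are assumed to be lists of rows.

```python
def zeros(m, n):
    return [[0] * n for _ in range(m)]


def dims(a, b):
    # A is m x p, B is p x n.
    return len(a), len(b), len(b[0])


def by_dot_products(a, b):
    # Loop order i, j, k: entry (i, j) is row i of A dotted with column j of B.
    m, p, n = dims(a, b)
    c = zeros(m, n)
    for i in range(m):
        for j in range(n):
            for k in range(p):
                c[i][j] += a[i][k] * b[k][j]
    return c


def by_columns(a, b):
    # Loop order j, k, i: column j of C is A times column j of B,
    # accumulated as a weighted sum of the columns of A.
    m, p, n = dims(a, b)
    c = zeros(m, n)
    for j in range(n):
        for k in range(p):
            for i in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def by_rows(a, b):
    # Loop order i, k, j: row i of C is row i of A times B,
    # accumulated as a weighted sum of the rows of B.
    m, p, n = dims(a, b)
    c = zeros(m, n)
    for i in range(m):
        for k in range(p):
            for j in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def by_outer_products(a, b):
    # Loop order k, i, j: C is the sum over k of
    # (column k of A) times (row k of B), each an outer product.
    m, p, n = dims(a, b)
    c = zeros(m, n)
    for k in range(p):
        for i in range(m):
            for j in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c
```

All four functions perform the same multiply-adds, so they return the same matrix; only the grouping of the work changes.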
These interpretations of matrix multiplication can often lead to valuable geometric interpretations of operations that may otherwise seem very abstract.
这些对矩阵乘法的解释通常可以导致对那些原本看起来非常抽象的运算的有价值的几何解释。
The identity matrix is an example of a diagonal matrix, where all nonzero elements occur along the diagonal. The diagonal consists of those elements whose column index equals the row index counting from the upper left.
单位矩阵是对角矩阵,其中所有非零元素都出现在对角线上。对角线由列索引等于从左上角开始数的行索引的元素组成。
The identity matrix also has the property that it is the same as its transpose. Such matrices are called symmetric.
单位矩阵还具有与其转置矩阵相同的性质。这样的矩阵称为对称矩阵。
The identity matrix is also an orthogonal matrix, because each of its columns considered as a vector has length 1 and the columns are orthogonal to one another. The same is true of the rows (see Exercise 2). The determinant of any orthogonal matrix is either +1 or –1.
单位矩阵也是一个正交矩阵,因为其每列被视为一个向量,长度为 1,并且列彼此正交。行也是如此(参见练习 2)。任何正交矩阵的行列式都是 +1 或-1 。
The idea of an orthogonal matrix corresponds to the idea of an orthonormal basis, not just a set of orthogonal vectors—an unfortunate glitch in terminology.
正交矩阵的概念对应于正交基的概念,而不仅仅是一组正交向量——这是一个术语上的错误。
A very useful property of orthogonal matrices is that they are nearly their own inverses. Multiplying an orthogonal matrix by its transpose results in the identity,
正交矩阵的一个非常有用的性质是它们几乎是自己的逆。将正交矩阵乘以其转置可得到恒等式,
This is easy to see because the entries of RᵀR are dot products between the columns of R. Off-diagonal entries are dot products between orthogonal vectors, and the diagonal entries are dot products of the (unit-length) columns with themselves.
这很容易看出,因为R T R的元素是R各列之间的点积。非对角线元素是正交向量之间的点积,对角线元素是(单位长度)列与其自身的点积。
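The identity RᵀR = I suggests a direct numerical test for orthogonality. A minimal sketch, assuming matrices are lists of rows; the helper names and the tolerance are illustrative choices:

```python
import math


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_orthogonal(r, tol=1e-12):
    # Entry (i, j) of R^T R is the dot product of columns i and j of R,
    # so R is orthogonal exactly when R^T R is the identity.
    rtr = matmul(transpose(r), r)
    n = len(r)
    return all(abs(rtr[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


c, s = math.cos(0.3), math.sin(0.3)
rotation = [[c, -s], [s, c]]          # orthonormal columns
stretch = [[2.0, 0.0], [0.0, 1.0]]    # orthogonal columns, not unit length
```

Here `is_orthogonal(rotation)` holds, while `is_orthogonal(stretch)` does not, because the first column of `stretch` has length 2.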
Example 3 The matrix
例 3矩阵
is diagonal, and therefore symmetric, but not orthogonal (the columns are orthogonal but they are not unit length).
是对角线的,因此是对称的,但不正交(列是正交的,但它们不是单位长度)。
The matrix
矩阵
is symmetric, but not diagonal or orthogonal.
是对称的,但不是对角线或正交的。
The matrix
矩阵
is orthogonal, but neither diagonal nor symmetric.
是正交的,但既不是对角线也不是对称的。
Recall from Section 6.1 that the determinant takes n n-dimensional vectors and combines them to get a signed n-dimensional volume of the n-dimensional parallelepiped defined by the vectors. For example, the determinant in 2D is the area of the parallelogram formed by the vectors. We can use matrices to handle the mechanics of computing determinants.
回想一下第 6.1 节,行列式取nn维向量并将它们组合起来,得到由向量定义的n维平行六面体的有符号n维体积。例如,二维中的行列式是向量形成的平行四边形的面积。我们可以使用矩阵来处理计算行列式的机制。
If we have 2D vectors r and s, we denote the determinant |rs|; this value is the signed area of the parallelogram formed by the vectors. Suppose we have two 2D vectors with Cartesian coordinates (a, b) and (A, B) (Figure 6.7). The determinant can be written in terms of column vectors or as a shorthand:
如果我们有二维向量r和s ,我们用 | rs | 表示行列式;这个值是向量形成的平行四边形的有符号面积。假设我们有两个二维向量,其笛卡尔坐标分别为 ( a,b ) 和 ( A,B )(图 6.7 )。行列式可以写成列向量的形式,也可以简写为:
Note that the determinant of a matrix is the same as the determinant of its transpose:
请注意,矩阵的行列式与其转置矩阵的行列式相同:
Figure 6.7. The 2D determinant in Equation 6.8 is the area of the parallelogram formed by the 2D vectors.
图 6.7.公式 6.8 中的二维行列式是二维向量形成的平行四边形的面积。
This means that for any parallelogram in 2D, there is a “sibling” parallelogram that has the same area but a different shape (Figure 6.8). For example, the parallelogram defined by vectors (3, 1) and (2, 4) has area 10, as does the parallelogram defined by vectors (3, 2) and (1, 4) .
这意味着,对于二维中的任何平行四边形,都有一个“兄弟”平行四边形,其面积相同,但形状不同(图 6.8 )。例如,由向量 (3, 1) 和 (2, 4) 定义的平行四边形的面积为 10,由向量 (3, 2) 和 (1, 4) 定义的平行四边形的面积也为 10。
Example 4 The geometric meaning of the 3D determinant is helpful in seeing why certain formulas make sense. For example, the equation of the plane through the points (xi, yi, zi) for i = 0, 1, 2 is
例 4三维行列式的几何意义有助于理解某些公式的意义。例如,对于i = 0, 1, 2,通过点 (x, y, z) 的平面方程为
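One way to write this plane equation, with each column holding the vector from (x_i, y_i, z_i) to (x, y, z), is:

```latex
\[
\begin{vmatrix}
x - x_0 & x - x_1 & x - x_2 \\
y - y_0 & y - y_1 & y - y_2 \\
z - z_0 & z - z_1 & z - z_2
\end{vmatrix}
= 0
\]
```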
Figure 6.8. The sibling parallelogram has the same area as the parallelogram in Figure 6.7.
图 6.8.兄弟平行四边形的面积与图 6.7中的平行四边形相同。
Each column is a vector from point (xi ,yi ,zi) to point (x, y, z) . The volume of the parallelepiped with those vectors as sides is zero only if (x, y, z) is coplanar with the three other points. Almost all equations involving determinants have similarly simple underlying geometry.
每列都是从点 (x ,y ,z) 到点 ( x, y, z ) 的一个向量。仅当 ( x, y, z ) 与其他三个点共面时,以这些向量为边的平行六面体的体积才为零。几乎所有涉及行列式的方程都有类似的简单基础几何。
As we saw earlier, we can compute determinants by a brute-force expansion in which most terms are zero and there is a great deal of bookkeeping on plus and minus signs. The standard way to manage the algebra of computing determinants is to use a form of Laplace’s expansion. The key part of computing the determinant this way is to find the cofactors of the matrix elements. Each element of a square matrix has a cofactor, which is the determinant of a matrix with one fewer row and one fewer column, possibly multiplied by minus one. The smaller matrix is obtained by eliminating the row and column that contain the element in question. For example, for a 10 × 10 matrix, the cofactor of a₈₂ (row 8, column 2) is the determinant of the 9 × 9 matrix with the 8th row and 2nd column eliminated. The sign of a cofactor is positive if the sum of the row and column indices is even and negative otherwise. This can be remembered by a checkerboard pattern:
正如我们之前所见,我们可以通过强力扩展来计算行列式,其中大多数项为零,并且需要大量记录正负号。管理计算行列式代数的标准方法是使用以下形式拉普拉斯展开式。用这种方法计算行列式的关键部分是找到各种矩阵元素的余因子。方阵的每个元素都有一个余因子,它是矩阵的行列式,行列式可以少一个,再乘以负一。通过消除相关元素所在的行和列,可以得到较小的矩阵。例如,对于 10 × 10 矩阵, 82的余因子是 9 × 9 矩阵的行列式,其中第 8 行和第 2 列被消除。如果行和列索引之和为偶数,则余因子的符号为正,否则为负。这可以通过棋盘格图案来记住:
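The sign rule is the factor (–1)^(i+j) for row i and column j, which gives a pattern of this shape:

```latex
\[
\begin{bmatrix}
+ & - & + & - & \cdots \\
- & + & - & + & \cdots \\
+ & - & + & - & \cdots \\
- & + & - & + & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\]
```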
So, for a 4 × 4 matrix,
因此,对于 4×4 矩阵,
The cofactors of the first row are
第一行的余因子是
The determinant of a matrix is found by taking the sum of products of the elements of any row or column with their cofactors. For example, the determinant of the 4 × 4 matrix above taken about its second column is
矩阵的行列式是通过将任意行或列的元素与其余因式的乘积相加而得出的。例如,上图 4 × 4 矩阵的第二列行列式为
We could do a similar expansion about any row or column and they would all yield the same result. Note the recursive nature of this expansion.
我们可以对任意行或列进行类似的扩展,它们都会产生相同的结果。请注意此扩展的递归性质。
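The recursion can be sketched directly in code. This is a minimal illustration of Laplace expansion, not an efficient method; the function names are hypothetical and indices are 0-based, which leaves the parity of row + column unchanged.

```python
def minor(m, row, col):
    # Matrix with the given row and column removed.
    return [r[:col] + r[col + 1:] for i, r in enumerate(m) if i != row]


def cofactor(m, row, col):
    # Signed determinant of the minor; the sign follows the checkerboard rule.
    sign = 1 if (row + col) % 2 == 0 else -1
    return sign * det(minor(m, row, col))


def det(m):
    # Expansion about the first row; the empty matrix has determinant 1.
    if not m:
        return 1
    return sum(m[0][j] * cofactor(m, 0, j) for j in range(len(m)))


def det_about_column(m, col):
    # The same expansion taken about an arbitrary column.
    return sum(m[i][col] * cofactor(m, i, col) for i in range(len(m)))
```

For instance, the illustrative matrix `[[0, 1, 2], [3, 4, 5], [6, 7, 8]]`, whose first and third rows sum to twice the second, has determinant zero under either expansion.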
Example 5 A concrete example for the determinant of a particular 3 × 3 matrix by expanding the cofactors of the first row is
例 5通过展开第一行的余子式,求出特定 3 × 3 矩阵的行列式的具体例子是
We can deduce that the volume of the parallelepiped formed by the vectors defined by the columns (or rows since the determinant of the transpose is the same) is zero. This is equivalent to saying that the columns (or rows) are not linearly independent. Note that the sum of the first and third rows is twice the second row, which implies linear dependence.
我们可以推断,由列(或行,因为转置的行列式相同)定义的向量形成的平行六面体的体积为零。这相当于说列(或行)不是线性独立的。请注意,第一行和第三行的总和是第二行的两倍,这意味着线性相关。
Determinants give us a tool to compute the inverse of a matrix. It is a very inefficient method for large matrices, but often in graphics, our matrices are small. A key to developing this method is that the determinant of a matrix with two identical rows is zero. This should be clear because the volume of the n-dimensional parallelepiped is zero if two of its sides are the same. Suppose we have a 4 × 4 matrix A and we wish to find its inverse A⁻¹. The inverse is
行列式为我们提供了一种计算矩阵逆的工具。对于大型矩阵来说,这是一种非常低效的方法,但在图形学中,我们的矩阵通常很小。开发此方法的关键是,具有两个相同行的矩阵的行列式为零。这应该很清楚,因为如果n维平行六面体的两条边相同,则其体积为零。假设我们有一个 4 × 4 的A ,我们希望找到它的逆A –1 。逆是
Note that this is just the transpose of the matrix where elements of A are replaced by their respective cofactors (each including its sign factor of +1 or –1). This matrix is called the adjoint of A. The adjoint is the transpose of the cofactor matrix of A. We can see why this is an inverse. Look at the product AA⁻¹, which we expect to be the identity. If we multiply the first row of A by the first column of the adjoint matrix, we need to get |A| (remember that the leading constant above divides by |A|):
请注意,这只是矩阵的转置,其中A的元素被其各自的余因子乘以首项常数(1 或-1 )所替换。这个矩阵称为A的伴生矩阵。伴生矩阵是A的余因子矩阵的转置。我们可以看出为什么它是逆矩阵。看看乘积AA -1 ,我们期望它是恒等式。如果我们将A的第一行乘以伴生矩阵的第一列,我们需要得到 | A |(记住上面的首项常数除以 | A |:
This is true because the elements in the first row of A are multiplied exactly by their cofactors in the first column of the adjoint matrix which is exactly the determinant. The other values along the diagonal of the resulting matrix are |A| for analogous reasons. The zeros follow a similar logic:
这是正确的,因为A的第一行中的元素恰好乘以伴随矩阵第一列中的余因子,而伴随矩阵的第一列恰好是行列式。出于类似原因,结果矩阵对角线上的其他值为 | A | 。零点遵循类似的逻辑:
Note that this product is a determinant of some matrix:
请注意,该乘积是某个矩阵的行列式:
The matrix in fact is
该矩阵实际上是
Because the first two rows are identical, the matrix is singular, and thus, its determinant is zero.
由于前两行相同,所以该矩阵是奇异的,因此其行列式为零。
The argument above does not apply just to four by four matrices; using that size just simplifies typography. For any matrix, the inverse is the adjoint matrix divided by the determinant of the matrix being inverted. The adjoint is the transpose of the cofactor matrix, which is just the matrix whose elements have been replaced by their cofactors.
上述论点不仅适用于四乘四矩阵;使用该大小只会简化排版。对于任何矩阵,逆矩阵都是伴随矩阵除以被逆矩阵的行列式。伴随矩阵是余因子矩阵的转置,即元素已被其余因子替换的矩阵。
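The adjoint-over-determinant recipe can be sketched for small matrices as follows. This assumes integer (or `Fraction`) entries so the arithmetic stays exact; the function names and the sample matrix are illustrative only.

```python
from fractions import Fraction


def minor(m, row, col):
    return [r[:col] + r[col + 1:] for i, r in enumerate(m) if i != row]


def det(m):
    # Laplace expansion about the first row; empty matrix has determinant 1.
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def inverse(m):
    n = len(m)
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    # Cofactor matrix: each element replaced by its signed minor determinant.
    cof = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
           for i in range(n)]
    # Adjoint = transpose of the cofactor matrix, then divide by |A|.
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


a = [[1, 1, 2], [1, 3, 4], [0, 2, 5]]   # hypothetical matrix with |A| = 6
a_inv = inverse(a)
```

Multiplying `a` by `a_inv` with `matmul` gives the identity, which is the check suggested for any inverse computed this way.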
Example 6 The inverse of one particular three-by-three matrix whose determinant is 6 is
例 6一个特定的三乘三矩阵,其行列式为 6,其逆矩阵为
You can check this yourself by multiplying the matrices and making sure you get the identity.
您可以通过将矩阵相乘并确保获得恒等式来自己检查这一点。
We often encounter linear systems in graphics with “n equations and n unknowns,” usually for n = 2 or n = 3. For example,
我们在图形学中经常会遇到具有“ n 个方程和n 个未知数”的线性系统,通常为n = 2 或n = 3。例如,
Here, x, y, and z are the “unknowns” for which we wish to solve. We can write this in matrix form:
这里, x 、 y和z是我们希望求解的“未知数”。我们可以将其写成矩阵形式:
A common shorthand for such systems is Ax = b, where it is assumed that A is a square matrix with known constants, x is an unknown column vector (with elements x, y, and z in our example), and b is a column matrix of known constants.
此类系统的常见简写是Ax = b,其中假设A是具有已知常数的方阵, x是未知列向量(在我们的示例中具有元素x 、 y和z ), b是已知常数的列矩阵。
There are many ways to solve such systems, and the appropriate method depends on the properties and dimensions of the matrix A. Because in graphics we so frequently work with systems of size n ≤ 4, we’ll discuss here a method appropriate for these systems, known as Cramer’s rule, which we saw earlier, from a 2D geometric viewpoint, in the example on page 108. Here, we show this algebraically. The solution to the above equation is
有许多方法可以解决此类系统,而适当的方法取决于矩阵A的属性和维度。由于在图形学中我们经常处理大小为n ≤ 4 的系统,因此我们将在此讨论一种适合这些系统的方法,称为我们之前在第 108 页的例子中从二维几何角度看到了克莱姆规则。这里我们用代数方法展示它。上述方程的解是
The rule here is to take a ratio of determinants, where the denominator is |A| and the numerator is the determinant of a matrix created by replacing a column of A with the column vector b. The column replaced corresponds to the position of the unknown in vector x. For example, y is the second unknown, so the second column is replaced. Note that if |A| = 0, the division is undefined and Cramer’s rule yields no solution. This is just another version of the rule that if A is singular (zero determinant), then there is no unique solution to the equations.
这里的规则是取行列式的比率,其中分母是 | A |,分子是通过将A的一列替换为列向量b而创建的矩阵的行列式。替换的列对应于向量x中未知数的位置。例如, y是第二个未知数,第二列被替换。请注意,如果 | A | = 0,则除法未定义且无解。这只是规则的另一个版本,即如果A是奇异的(零行列式),则方程没有唯一解。
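As a rough illustration of the column-replacement rule for n = 3, the sketch below uses exact rational arithmetic. The function names and the sample system are hypothetical and chosen only for the demonstration.

```python
from fractions import Fraction


def det3(a):
    # Determinant of a 3 x 3 matrix by expansion about the first row.
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def cramer3(a, b):
    d = det3(a)
    if d == 0:
        raise ValueError("|A| = 0: no unique solution")
    solution = []
    for col in range(3):
        # Replace column `col` of A with b; that determinant over |A|
        # is the unknown in position `col`.
        replaced = [row[:col] + [b[i]] + row[col + 1:]
                    for i, row in enumerate(a)]
        solution.append(Fraction(det3(replaced), d))
    return solution


# Hypothetical system:
#    2x +  y -  z =   8
#   -3x -  y + 2z = -11
#   -2x +  y + 2z =  -3
a = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
x, y, z = cramer3(a, b)
```

Substituting the returned values back into the three equations is a quick way to confirm the result.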
Square matrices have eigenvalues and eigenvectors associated with them. The eigenvectors are those nonzero vectors whose directions do not change when multiplied by the matrix. For example, suppose for a matrix A and vector a, we have
方阵具有特征值和与它们相关的特征向量。特征向量是那些与矩阵相乘时方向不变的非零向量。例如,假设矩阵A和向量a ,我们有
This means we have stretched or compressed a, but its direction has not changed. The scale factor λ is called the eigenvalue associated with eigenvector a. Knowing the eigenvalues and eigenvectors of matrices is helpful in a variety of practical applications. We will describe them to gain insight into geometric transformation matrices and as a step toward singular values and vectors described in the next section.
这意味着我们拉伸或压缩了a ,但其方向没有改变。比例因子 λ 称为与特征向量a相关的特征值。了解矩阵的特征值和特征向量对各种实际应用都很有帮助。我们将描述它们以深入了解几何变换矩阵,并作为下一节中描述的奇异值和向量的一步。
If we assume a matrix has at least one eigenvector, then we can do a standard manipulation to find it. First, we write both sides as the product of a square matrix with the vector a:
如果我们假设一个矩阵至少有一个特征向量,那么我们可以做一个标准操作来找到它。首先,我们将两边写成方阵与向量a的乘积:
where I is an identity matrix. This can be rewritten
其中I是单位矩阵。这可以改写为
Because matrix multiplication is distributive, we can group the matrices:
因为矩阵乘法是分配的,所以我们可以对矩阵进行分组:
This equation can only be true if the matrix (A –λI) is singular, and thus, its determinant is zero. The elements in this matrix are the numbers in A except along the diagonal. For example, for a 2 × 2 matrix the eigenvalues obey
此方程仅当矩阵 ( A –λ I ) 为奇异矩阵时才成立,因此其行列式为零。此矩阵中的元素是A中除对角线外的数字。例如,对于 2 × 2 矩阵,特征值遵循
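The chain of steps described above can be collected in one place. The entry names a_ij for the 2 × 2 case are assumed notation:

```latex
\[
A\mathbf{a} = \lambda\mathbf{a}
\;\Longrightarrow\;
A\mathbf{a} = \lambda I\mathbf{a}
\;\Longrightarrow\;
A\mathbf{a} - \lambda I\mathbf{a} = \mathbf{0}
\;\Longrightarrow\;
(A - \lambda I)\,\mathbf{a} = \mathbf{0}
\]
\[
\begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix}
= \lambda^{2} - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0
\]
```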
Because this is a quadratic equation, we know there are exactly two solutions for λ, counted with multiplicity. These solutions may or may not be distinct or real. A similar manipulation for an n × n matrix will yield an nth-degree polynomial in λ. Because it is not possible, in general, to find exact explicit solutions of polynomial equations of degree greater than four, we can only compute eigenvalues of matrices 4 × 4 or smaller by analytic methods. For larger matrices, numerical methods are the only option.
因为这是一个二次方程,我们知道 λ 恰好有两个解。这些解可能是也可能不是唯一的或实数。对n × n矩阵进行类似操作将得到 λ 的n次多项式。因为一般不可能找到四次以上多项式方程的精确显式解,所以我们只能通过分析方法计算 4 × 4 或更小矩阵的特征值。对于较大的矩阵,数值方法是唯一的选择。
An important special case where eigenvalues and eigenvectors are particularly simple is symmetric matrices (where A = Aᵀ). The eigenvalues of real symmetric matrices are always real numbers, and if they are also distinct, their eigenvectors are mutually orthogonal. Such matrices can be put into diagonal form:
特征值和特征向量特别简单的一个重要特殊情况是对称矩阵(其中A = A T )。实对称矩阵的特征值始终为实数,如果它们也不同,则它们的特征向量相互正交。此类矩阵可以表示为对角形式:
where Q is an orthogonal matrix and D is a diagonal matrix. The columns of Q are the eigenvectors of A and the diagonal elements of D are the eigenvalues of A. Putting A in this form is also called the eigenvalue decomposition, because it decomposes A into a product of simpler matrices that reveal its eigenvectors and eigenvalues.
其中Q是正交矩阵, D是对角矩阵。Q 的列是A的特征向量, D的对角元素是A的特征值。将A置于这种形式也称为特征值分解,因为它将A分解为更简单矩阵的乘积,从而揭示其特征向量和特征值。
Recall that an orthogonal matrix has orthonormal rows and orthonormal columns.
回想一下,正交矩阵具有正交行和正交列。
Example 7 Given the matrix
例 7给定矩阵
the eigenvalues of A are the solutions to
A的特征值是
We approximate the exact values for compactness of notation:
为了符号的紧凑性,我们近似计算精确值:
Now we can find the associated eigenvector. The first is the nontrivial (not x = y = 0) solution to the homogeneous equation,
现在我们可以找到相关的特征向量。第一个是齐次方程的非平凡(非x = y = 0)解,
This is approximately (x, y) = (0.8507, 0.5257). Note that there are infinitely many solutions parallel to that 2D vector, and we just picked the one of unit length. Similarly, the eigenvector associated with λ₂ is (x, y) = (–0.5257, 0.8507). This means the diagonal form of A is (within some precision due to our numeric approximation):
这近似为 ( x, y ) = (0.8507, 0.5257) 。请注意,与该二维向量平行的解有无数个,我们只选取了单位长度的解。类似地,与 λ 2相关的特征向量为 ( x, y ) = ( – 0.5257, 0.8507) 。这意味着A的对角线形式为(由于我们的数值近似,精度在一定范围内):
We will revisit the geometry of this matrix as a transform in the next chapter.
在下一章中,我们将重新讨论该矩阵作为变换的几何形状。
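The numbers in this example can be reproduced with a few lines of code. The sketch below assumes the matrix is [[2, 1], [1, 1]], which is consistent with the eigenvectors quoted above but is an assumption here; the function names are hypothetical, and the method only handles symmetric 2 × 2 matrices with a nonzero off-diagonal entry.

```python
import math


def symmetric_eigen_2x2(a11, a12, a22):
    # Eigenpairs of [[a11, a12], [a12, a22]], assuming a12 != 0.
    trace = a11 + a22
    determinant = a11 * a22 - a12 * a12
    disc = math.sqrt(trace * trace - 4.0 * determinant)
    pairs = []
    for lam in ((trace + disc) / 2.0, (trace - disc) / 2.0):
        # (a11 - lam) x + a12 y = 0, so (x, y) is parallel to (a12, lam - a11).
        x, y = a12, lam - a11
        norm = math.hypot(x, y)
        pairs.append((lam, (x / norm, y / norm)))
    return pairs


def rebuild(pairs):
    # Q D Q^T written as a sum of lam * v v^T over the eigenpairs.
    return [[sum(lam * v[i] * v[j] for lam, v in pairs) for j in range(2)]
            for i in range(2)]


pairs = symmetric_eigen_2x2(2.0, 1.0, 1.0)
(lam1, v1), (lam2, v2) = pairs
```

The second eigenvector may come out with the opposite sign to the one quoted in the text; eigenvectors are only defined up to scale, so either sign is valid. `rebuild(pairs)` recovers the assumed matrix up to rounding.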
We saw in the last section that any symmetric matrix can be diagonalized, or decomposed into a convenient product of orthogonal and diagonal matrices. However, most matrices we encounter in graphics are not symmetric, and the eigenvalue decomposition for nonsymmetric matrices is not nearly so convenient or illuminating, and in general involves complex-valued eigenvalues and eigenvectors even for real-valued inputs.
我们在上一节中看到,任何对称矩阵都可以对角化,或者分解为正交矩阵和对角矩阵的方便乘积。然而,我们在图形中遇到的大多数矩阵都不是对称的,非对称矩阵的特征值分解远没有那么方便或有启发性,而且即使对于实值输入,通常也涉及复值特征值和特征向量。
We would recommend learning in this order: symmetric eigenvalues/vectors, singular values/vectors, and then nonsymmetric eigenvalues, which are much trickier.
我们建议按此顺序学习:对称特征值/向量、奇异值/向量,然后是非对称特征值,这要棘手得多。
There is another generalization of the symmetric eigenvalue decomposition to nonsymmetric (and even non-square) matrices; it is the singular value decomposition (SVD). The main difference between the eigenvalue decomposition of a symmetric matrix and the SVD of a nonsymmetric matrix is that the orthogonal matrices on the left and right sides are not required to be the same in the SVD:
对称特征值分解还有另一种推广形式,即奇异值分解(SVD),适用于非对称(甚至非方阵)矩阵。对称矩阵的特征值分解和非对称矩阵的 SVD 之间的主要区别在于,SVD 中左右两侧的正交矩阵不必相同:
Here, U and V are two, potentially different, orthogonal matrices, whose columns are known as the left and right singular vectors of A, and S is a diagonal matrix whose entries are known as the singular values of A. When A is symmetric and has all nonnegative eigenvalues, the SVD and the eigenvalue decomposition are the same.
这里, U和V是两个可能不同的正交矩阵,其列称为A的左奇异向量和右奇异向量, S是一个对角矩阵,其元素称为A的奇异值。当A是对称的并且具有所有非负特征值时,SVD 和特征值分解是相同的。
There is another relationship between singular values and eigenvalues that can be used to compute the SVD (though this is not the way an industrial-strength SVD implementation works). First, we define M = AAᵀ. We assume that A has an SVD and substitute it into M:
奇异值和特征值之间还有另一种关系,可用于计算 SVD(尽管这不是工业强度 SVD 实现的工作方式)。首先,我们定义M = AA T 。我们假设我们可以对M执行 SVD:
The substitution is based on the fact that (BC)ᵀ = CᵀBᵀ, that the transpose of an orthogonal matrix is its inverse, and that the transpose of a diagonal matrix is the matrix itself. The beauty of this new form is that M is symmetric and US²Uᵀ is its eigenvalue decomposition, where S² contains the (all nonnegative) eigenvalues. Thus, we find that the singular values of a matrix are the square roots of the eigenvalues of the product of the matrix with its transpose, and the left singular vectors are the eigenvectors of that product. A similar argument allows V, the matrix of right singular vectors, to be computed from AᵀA.
这种代换基于以下事实:( BC ) T = CTBT ,正交矩阵的转置是其逆,对角矩阵的转置就是矩阵本身。这种新形式的妙处在于, M是对称的,且US2UT是其特征值分解,其中S2包含(所有非负)特征值。因此,我们发现矩阵的奇异值是矩阵与其转置乘积的特征值的平方根,而左奇异向量是该乘积的特征向量。类似的论证允许从A T A计算出右奇异向量矩阵V。
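Step by step, and assuming the SVD has the form A = USVᵀ, the substitution runs as follows:

```latex
\[
A = U S V^{T}
\]
\[
M = A A^{T}
  = (U S V^{T})(U S V^{T})^{T}
  = U S \,(V^{T} V)\, S^{T} U^{T}
  = U S^{2} U^{T}
\]
```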
Example 8 We now make this concrete with an example:
例 8现在我们用一个例子来具体说明:
We saw the eigenvalue decomposition for this matrix in the previous section. We observe immediately
我们在上一节中看到了这个矩阵的特征值分解。我们立即观察到
We can solve for V algebraically:
我们可以用代数方法求解V :
The inverse of S is a diagonal matrix with the reciprocals of the diagonal elements of S. This yields
S的逆是一个对角矩阵,其对角元素的倒数为S的对角线元素的倒数。这得出
This form uses the standard symbol σᵢ for the ith singular value. Again, for a symmetric matrix with nonnegative eigenvalues, the eigenvalues and the singular values are the same (σᵢ = λᵢ). We will examine the geometry of SVD further in Section 7.1.6.
此形式使用标准符号 σ 表示第 i个奇异值。同样,对于对称矩阵,特征值和奇异值相同(σ = λ)。我们将在第 7.1.6 节中进一步研究 SVD 的几何形状。
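The route through AAᵀ can be sketched numerically for a 2 × 2 matrix. The code below assumes the example matrix is the shear [[1, 1], [0, 1]], that A is invertible, and that AAᵀ has a nonzero off-diagonal entry; the function names are hypothetical, and this is not how a production SVD routine works.

```python
import math


def svd_2x2_via_eigen(a):
    # M = A A^T is symmetric; its eigenpairs give U and the squared
    # singular values.
    m11 = a[0][0] ** 2 + a[0][1] ** 2
    m12 = a[0][0] * a[1][0] + a[0][1] * a[1][1]
    m22 = a[1][0] ** 2 + a[1][1] ** 2
    trace, determinant = m11 + m22, m11 * m22 - m12 * m12
    disc = math.sqrt(trace * trace - 4.0 * determinant)
    u_cols, sigmas = [], []
    for lam in ((trace + disc) / 2.0, (trace - disc) / 2.0):
        x, y = m12, lam - m11              # eigenvector direction of M
        norm = math.hypot(x, y)
        u_cols.append((x / norm, y / norm))
        sigmas.append(math.sqrt(lam))      # singular value = sqrt(eigenvalue)
    # V = (S^-1 U^T A)^T, so column k of V is A^T u_k / sigma_k.
    v_cols = [((a[0][0] * u[0] + a[1][0] * u[1]) / s,
               (a[0][1] * u[0] + a[1][1] * u[1]) / s)
              for u, s in zip(u_cols, sigmas)]
    return u_cols, sigmas, v_cols


def rebuild(u_cols, sigmas, v_cols):
    # U S V^T as a sum of sigma_k * u_k v_k^T.
    return [[sum(sigmas[k] * u_cols[k][i] * v_cols[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]


shear = [[1.0, 1.0], [0.0, 1.0]]
u_cols, sigmas, v_cols = svd_2x2_via_eigen(shear)
```

For this shear the singular values come out near 1.618 and 0.618, and `rebuild(u_cols, sigmas, v_cols)` reproduces the input matrix up to rounding.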
Why is matrix multiplication defined the way it is rather than just element by element?
为什么矩阵乘法是这样定义的而不是逐个元素地定义?
Element by element multiplication is a perfectly good way to define matrix multiplication, and indeed, it has nice properties. However, in practice it is not very useful. Ultimately, most matrices are used to transform column vectors; e.g., in 3D you might have
逐元素乘法是定义矩阵乘法的一种非常好的方法,而且它确实具有很好的特性。然而,在实践中它并不是很有用。最终,大多数矩阵用于变换列向量;例如,在 3D 中,你可能有
where a and b are vectors and M is a 3×3 matrix. To allow geometric operations such as rotation, combinations of all three elements of a must go into each element of b. That requires us to go either row-by-row or column-by-column through M. That choice is made based on composition of matrices having the desired property,
其中a和b是向量, M是 3×3 矩阵。为了允许旋转等几何运算, a的所有三个元素的组合必须进入b的每个元素。这要求我们逐行或逐列地遍历M 。该选择基于具有所需属性的矩阵组合,
which allows us to use one composite matrix C = M₂M₁ to transform our vector. This is valuable when many vectors will be transformed by the same composite matrix. So, in summary, the somewhat weird rule for matrix multiplication is engineered to have these desired properties.
这使我们能够使用一个复合矩阵C = M 2 M 1来变换我们的向量。当许多向量将由同一个复合矩阵变换时,这很有用。因此,总而言之,矩阵乘法的这个有点奇怪的规则被设计成具有这些所需的属性。
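In symbols, the transform and the composition property referred to in this answer are:

```latex
\[
\mathbf{b} = M\mathbf{a},
\qquad
M_2\,(M_1\,\mathbf{a}) = (M_2 M_1)\,\mathbf{a}
\]
```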
Sometimes I hear that eigenvalues and singular values are the same thing and sometimes that one is the square of the other. Which is right?
有时我听说特征值和奇异值是同一个东西,有时又说一个是另一个的平方。到底哪个是对的?
If a real matrix A is symmetric, and its eigenvalues are nonnegative, then its eigenvalues and singular values are the same. If A is not symmetric, the matrix M = AAᵀ is symmetric and has nonnegative real eigenvalues. The singular values of A and Aᵀ are the same and are the square roots of the singular/eigenvalues of M. Thus, when the square root statement is made, it is because two different matrices (with a very particular relationship) are being talked about: M = AAᵀ.
如果实矩阵A是对称的,并且其特征值是非负的,则其特征值和奇异值相同。如果A不对称,则矩阵M = AA T是对称的,并且具有非负实特征值。A 和A T的奇异值相同,并且是M的奇异值/特征值的平方根。因此,当做出平方根陈述时,是因为正在讨论两个不同的矩阵(具有非常特殊的关系): M = AA T 。
The discussion of determinants as volumes is based on A Vector Space Approach to Geometry (Hausner, 1998). Hausner has an excellent discussion of vector analysis and the fundamentals of geometry as well. The geometric derivation of Cramer’s rule in 2D is taken from Practical Linear Algebra: A Geometry Toolbox (Farin & Hansford, 2004). That book also has geometric interpretations of other linear algebra operations such as Gaussian elimination. The discussion of eigenvalues and singular values is based primarily on Linear Algebra and Its Applications (Strang, 1988). The example of SVD of the shear matrix is based on a discussion in Computer Graphics and Geometric Modeling (Salomon, 1999).
关于行列式作为体积的讨论基于《几何的向量空间方法》 (Hausner,1998 年)。Hausner 对向量分析和几何学基础进行了出色的讨论。克莱姆法则在二维中的几何推导取自《实用线性代数:几何工具箱》 (Farin & Hansford,2004 年)。该书还对其他线性代数运算(如高斯消元法)进行了几何解释。关于特征值和奇异值的讨论主要基于《线性代数及其应用》 (Strang,1988 年)。剪切矩阵的 SVD 示例基于《计算机图形学和几何建模》 (Salomon,1999 年)中的讨论。
1. Write an implicit equation for the 2D line through points (x0,y0) and (x1,y1) using a 2D determinant.
1.使用二维行列式写出通过点 ( x 0 ,y 0 ) 和 ( x 1 ,y 1 ) 的二维直线的隐式方程。
2. Show that if the columns of a matrix are orthonormal, then so are the rows.
2.证明如果矩阵的列是正交的,那么行也是如此。
3. Prove the properties of matrix determinants stated in Equations (6.5)–(6.7).
3.证明公式(6.5)-(6.7)中矩阵行列式的性质。
4. Show that the eigenvalues of a diagonal matrix are its diagonal elements.
4.证明对角矩阵的特征值是其对角线元素。
5. Show that for a square matrix A, AAᵀ is a symmetric matrix.
5.证明对于方阵A , AA T是对称矩阵。
6. Show that for three 3D vectors a, b, c, the following identity holds: |abc| = (a × b) · c .
6.证明对于三个三维向量a 、 b 、 c ,以下恒等式成立: | abc | = ( a × b ) · c 。
7. Explain why the volume of the tetrahedron with side vectors a, b, c (see Figure 6.2) is given by |abc|/6.
7.解释为什么边矢为a 、 b 、 c的四面体(见图6.2 )的体积由| abc | / 6给出。
8. Demonstrate the four interpretations of matrix–matrix multiplication by taking the following matrix–matrix multiplication code, rearranging the nested loops, and interpreting the resulting code in terms of matrix and vector operations.
8.通过采用以下矩阵 - 矩阵乘法代码、重新排列嵌套循环并根据矩阵和向量运算解释生成的代码,演示矩阵 - 矩阵乘法的四种解释。
function mat-mult(in a[m][p], in b[p][n], out c[m][n]) {
    // the array c is initialized to zero
    for i = 1 to m
        for j = 1 to n
            for k = 1 to p
                c[i][j] += a[i][k] * b[k][j]
}
9. Prove that if A, Q, and D satisfy Equation (6.14), v is the ith column of Q, and λ is the ith entry on the diagonal of D, then v is an eigenvector of A with eigenvalue λ.
9.证明:若A 、 Q和D满足公式 (6.14), v是Q的第i列,λ 是D对角线上的第 i个元素,则v是A的一个特征向量,特征值为 λ。
10. Prove that if A, Q, and D satisfy Equation (6.14), the eigenvalues of A are all distinct, and v is an eigenvector of A with eigenvalue λ, then for some i, v is a scalar multiple of the ith column of Q and λ is the ith entry on the diagonal of D.
10.证明:若A、Q和D满足公式 (6.14),A的特征值各不相同,且v是A的一个特征向量,特征值为 λ,则对于某个i,v是Q的第i列的标量倍数,且 λ 是D对角线上的第i个元素。
11. Given the (x, y) coordinates of the three vertices of a 2D triangle, explain why the area is given by
11.给定二维三角形三个顶点的 ( x, y ) 坐标,解释为什么面积由下式给出
The machinery of linear algebra can be used to express many of the operations required to arrange objects in a 3D scene, view them with cameras, and get them onto the screen. Geometric transformations such as rotation, translation, scaling, and projection can be accomplished with matrix multiplication, and the transformation matrices used to do this are the subject of this chapter.
线性代数机制可用于表达在 3D 场景中排列物体、用相机查看它们并将它们显示在屏幕上所需的许多操作。旋转、平移、缩放、投影等几何变换可以通过矩阵乘法来实现,用于实现这一目的的变换矩阵是本章的主题。
We will show how a set of points transforms if the points are represented as offset vectors from the origin, and we will use the clock shown in Figure 7.1 as an example of a point set. So think of the clock as a bunch of points that are the ends of vectors whose tails are at the origin. We also discuss how these transforms operate differently on locations (points), displacement vectors, and surface normal vectors.
我们将展示如果将一组点表示为偏离原点的偏移向量,这些点将如何变换,并使用图 7.1中所示的时钟作为点集的示例。因此,将时钟视为一堆点,这些点是向量的末端,尾部位于原点。我们还讨论了这些变换在位置(点)、位移向量和表面法向量上的不同操作方式。
We can use a 2 × 2 matrix to change, or transform, a 2D vector:
我们可以使用 2×2 矩阵来改变或变换二维向量:
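Written out, with a_ij as assumed names for the matrix entries, the product is:

```latex
\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} a_{11}x + a_{12}y \\ a_{21}x + a_{22}y \end{bmatrix}
\]
```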
This kind of operation, which takes in a 2-vector and produces another 2-vector by a simple matrix multiplication, is a linear transformation.
这种运算以一个2向量为输入,通过简单的矩阵乘法生成另一个2向量,是一种线性变换。
By this simple formula, we can achieve a variety of useful transformations, depending on what we put in the entries of the matrix, as will be discussed in the following sections. For our purposes, treat movement along the x-axis as horizontal and movement along the y-axis as vertical.
通过这个简单的公式,我们可以实现各种有用的变换,具体取决于我们在矩阵中输入的内容,这将在以下部分中讨论。为了我们的目的,考虑沿x轴移动为水平移动,沿y轴移动为垂直移动。
The most basic transform is a scale along the coordinate axes. This transform can change length and possibly direction:
最基本的变换是沿坐标轴的缩放。此变换可以改变长度,也可能改变方向:
Note what this matrix does to a vector with Cartesian components (x, y) :
注意此矩阵对具有笛卡尔分量 ( x, y ) 的向量的作用:
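With s_x and s_y as assumed names for the two scale factors, the axis-aligned scale and its effect on (x, y) can be written:

```latex
\[
\mathrm{scale}(s_x, s_y) = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix},
\qquad
\begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} s_x x \\ s_y y \end{bmatrix}
\]
```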
So, just by looking at the matrix of an axis-aligned scale, we can read off the two scale factors.
因此,只需查看轴对齐比例的矩阵,我们就可以读出两个比例因子。
Example 9 The matrix that shrinks x and y uniformly by a factor of two is (Figure 7.1)
例 9将x和y均匀缩小二倍的矩阵是(图 7.1 )
A matrix which halves in the horizontal and increases by three-halves in the vertical is (see Figure 7.2)
一个矩阵在水平方向上减半,在垂直方向上增加三倍,(见图7.2 )
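Following the description in the text, the two scale matrices in this example would be the diagonal matrices with factors (0.5, 0.5) and (0.5, 1.5):

```latex
\[
\begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix}
\qquad\text{and}\qquad
\begin{bmatrix} 0.5 & 0 \\ 0 & 1.5 \end{bmatrix}
\]
```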
Figure 7.1. Scaling uniformly by half for each axis: The axis-aligned scale matrix has the proportion of change in each of the diagonal elements and zeroes in the off-diagonal elements.
图 7.1.对每个轴均匀缩放一半:轴对齐缩放矩阵在每个对角线元素中都有变化比例,而在非对角线元素中则为零。
Figure 7.2. Scaling nonuniformly in x and y: The scaling matrix is diagonal with non-equal elements. Note that the square outline of the clock becomes a rectangle and the circular face becomes an ellipse.
图 7.2。x和y方向上非均匀缩放:缩放矩阵为对角矩阵,元素不相等。请注意,时钟的方形轮廓变为矩形,圆形表面变为椭圆形。
A shear is something that pushes things sideways, producing something like a deck of cards across which you push your hand; the bottom card stays put and cards move more the closer they are to the top of the deck. The horizontal and vertical shear matrices are
剪切力是一种将物体向侧面推的力,产生类似于将手推过一副牌的力;最下面的牌保持不动,牌越靠近牌堆顶部,移动的幅度就越大。水平和垂直剪切矩阵是
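With s as an assumed name for the amount of shear, the horizontal shear adds a multiple of y to x and the vertical shear adds a multiple of x to y:

```latex
\[
\text{shear-}x(s) = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix},
\qquad
\text{shear-}y(s) = \begin{bmatrix} 1 & 0 \\ s & 1 \end{bmatrix}
\]
```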
Example 10 The transform that shears horizontally so that vertical lines become 45° lines leaning toward the right is (see Figure 7.3)
例 10水平剪切变换使垂直线变为向右倾斜的 45° 线(见图7.3 )
Figure 7.3. An x-shear matrix moves points to the right in proportion to their y-coordinate. Now the square outline of the clock becomes a parallelogram and, as with scaling, the circular face of the clock becomes an ellipse.
图 7.3。x剪切矩阵按y坐标的比例将点向右移动。现在时钟的方形轮廓变成了平行四边形,并且与缩放一样,时钟的圆形表面变成了椭圆形。
Figure 7.4. A y-shear matrix moves points up in proportion to their x-coordinate.
图 7.4。y剪切矩阵按x坐标的比例将点向上移动。
An analogous transform vertically is (see Figure 7.4)
垂直方向上的类似变换是(见图7.4 )
In both cases, the square outline of the sheared clock becomes a parallelogram, and the circular face of the sheared clock becomes an ellipse.
在这两种情况下,剪切时钟的方形轮廓变成了平行四边形,剪切时钟的圆形表面变成了椭圆形。
Another way to think of a shear is in terms of rotation of only the vertical (or horizontal) axis. The shear transform that takes a vertical axis and tilts it clockwise by an angle ϕ is
另一种理解剪切的方式是只旋转垂直轴(或水平轴)。剪切变换取垂直轴并将其顺时针倾斜一个角度 ϕ,其形式为
In fact, the image of a circle under any matrix transformation is an ellipse.
事实上,圆在任何矩阵变换下的图像都是椭圆。
Similarly, the shear matrix which rotates the horizontal axis counterclockwise by angle ϕ is
类似地,将水平轴逆时针旋转角度φ的剪切矩阵为
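The angle form of the shear amounts to using s = tan ϕ as the shear amount. A small sketch, with hypothetical helper names, that checks this reading numerically:

```python
import math


def shear_x(s):
    # Moves points horizontally in proportion to their y-coordinate.
    return [[1.0, s], [0.0, 1.0]]


def shear_y(s):
    # Moves points vertically in proportion to their x-coordinate.
    return [[1.0, 0.0], [s, 1.0]]


def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


phi = math.radians(30.0)

# The vertical axis direction (0, 1) tilts clockwise by phi.
tx, ty = apply(shear_x(math.tan(phi)), [0.0, 1.0])
tilt_from_vertical = math.atan2(tx, ty)

# The horizontal axis direction (1, 0) rotates counterclockwise by phi.
hx, hy = apply(shear_y(math.tan(phi)), [1.0, 0.0])
tilt_from_horizontal = math.atan2(hy, hx)
```

With s = 1 the horizontal shear sends (0, 1) to (1, 1), the 45° case, while points on the x-axis stay where they are, like the bottom card of the deck.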
Figure 7.5. The geometry for Equation (7.1).
图 7.5.方程 (7.1) 的几何形状。
Suppose we want to rotate a vector a by an angle ϕ counterclockwise to get vector b (Figure 7.5). If a makes an angle α with the x-axis, and its length is r = √(xa² + ya²), then we know that
假设我们要将向量a逆时针旋转一个角度 ϕ 得到向量b(图 7.5)。如果a与x轴的夹角为 α,其长度为 r = √(xa² + ya²),那么我们知道
Because b is a rotation of a, it also has length r. Because it is rotated an angle ϕ from a, b makes an angle (α + ϕ) with the x-axis. Using the trigonometric addition identities (Section 2.3.3):
因为b是a的旋转,所以它的长度也是r 。由于它从a旋转了一个角度 φ,所以b与x轴形成一个角度 (α + φ)。使用三角加法恒等式(第 2.3.3 节):
Substituting xa = r cos α and ya = r sin α gives
代入x a = r cos α 和y a = r sin α 可得
In matrix form, the transformation that takes a to b is then
以矩阵形式表示,从a到b 的变换为
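The steps of this derivation, from polar form through the addition identities to the matrix, can be laid out as follows (the name rotate(ϕ) for the matrix is assumed notation):

```latex
\[
x_a = r\cos\alpha, \qquad y_a = r\sin\alpha
\]
\[
x_b = r\cos(\alpha + \phi) = r\cos\alpha\cos\phi - r\sin\alpha\sin\phi,
\qquad
y_b = r\sin(\alpha + \phi) = r\sin\alpha\cos\phi + r\cos\alpha\sin\phi
\]
\[
x_b = x_a\cos\phi - y_a\sin\phi,
\qquad
y_b = y_a\cos\phi + x_a\sin\phi
\]
\[
\mathrm{rotate}(\phi) = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}
\]
```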
Example 11 A matrix that rotates vectors by π/4 radians (45°) is (see Figure 7.6)
例 11将向量旋转π/ 4 弧度(45°)的矩阵是(见图7.6 )
Figure 7.6. A rotation by 45°. Note that the rotation is counterclockwise and that cos(45°) = sin(45°) ≈ .707.
图 7.6。旋转 45°。请注意,旋转是逆时针的,并且 cos(45°) = sin(45°) ≈ .707。
A matrix that rotates by π/6 radians (30°) in the clockwise direction is a rotation by –π/6 radians in our framework (see Figure 7.7):
在我们的框架中,顺时针方向旋转π/ 6 弧度(30°)的矩阵相当于旋转-π/ 6 弧度(见图7.7 ):
Figure 7.7. A rotation by –30°. Note that the rotation is clockwise and that cos(–30°) ≈ .866 and sin(–30°) = –.5.
图 7.7。旋转-30 °。请注意,旋转是顺时针的,cos(-30°) ≈ .866,sin(-30°) = -.5。
Because the norm of each row of a rotation matrix is one (sin²ϕ + cos²ϕ = 1), and the rows are orthogonal (cos ϕ (–sin ϕ) + sin ϕ cos ϕ = 0), we see that rotation matrices are orthogonal matrices (Section 6.2.4). By looking at the matrix, we can read off two pairs of orthonormal vectors: the two columns, which are the vectors to which the transformation sends the canonical basis vectors (1, 0) and (0, 1); and the rows, which are the vectors that the transformation sends to the canonical basis vectors.
因为旋转矩阵每一行的范数都是 1(sin 2 ϕ+cos 2 ϕ = 1),且各行正交(cos ϕ( – sin ϕ)+sin ϕ cos ϕ = 0),所以我们知道旋转矩阵是正交矩阵(第 6.2.4 节)。通过查看矩阵,我们可以读出两对正交向量:两列,即变换将标准基向量 (1, 0) 和 (0, 1) 发送到的向量;行,即变换发送给标准基向量的向量。
Said briefly, Rei = ui and Rvi = ei, for a rotation with columns ui and rows vi.
简而言之,对于列为ui、行为vi的旋转,Rei = ui 且 Rvi = ei。
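These properties of rotation matrices are easy to check numerically. A minimal sketch with hypothetical helper names; it builds the counterclockwise rotation matrix and tests that the columns are the images of the canonical basis vectors and that the rows are sent to the canonical basis vectors.

```python
import math


def rotation(phi):
    # Counterclockwise rotation by phi radians.
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]


def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def close(u, v, tol=1e-12):
    return all(abs(a - b) <= tol for a, b in zip(u, v))


r45 = rotation(math.pi / 4)       # entries are about +/- 0.707
r_cw30 = rotation(-math.pi / 6)   # clockwise 30 degrees

columns = [[r45[0][0], r45[1][0]], [r45[0][1], r45[1][1]]]
rows = [r45[0], r45[1]]
```

Applying `r45` to (1, 0) and (0, 1) returns the two columns, and applying it to each row returns (1, 0) and (0, 1) respectively.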
We can reflect a vector across either of the coordinate axes by using a scale with one negative scale factor (see Figures 7.8 and 7.9):
我们可以使用具有一个负比例因子的比例来在任一坐标轴上反射一个矢量(见图7.8和7.9 ):
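As scale matrices with one negative factor, the two reflections can be written as follows (the names reflect-y and reflect-x are assumed labels for reflection about the y-axis and about the x-axis):

```latex
\[
\text{reflect-}y = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix},
\qquad
\text{reflect-}x = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
\]
```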
Figure 7.8. A reflection about the y-axis is achieved by multiplying all x-coordinates by –1.
图 7.8。通过将所有x坐标乘以 -1 来实现关于y轴的反射。
Figure 7.9. A reflection about the x-axis is achieved by multiplying all y-coordinates by –1.
图 7.9。通过将所有y坐标乘以 -1 来实现关于x轴的反射。
While one might expect that the matrix with –1 in both elements of the diagonal is also a reflection, in fact it is just a rotation by π radians.
尽管人们可能认为对角线两个元素均为-1的矩阵也是一个反射,但实际上它只是按π弧度旋转而已。
This rotation can also be called a “reflection through the origin.”
这种旋转也可以称为“通过原点的反射”。
It is common for graphics programs to apply more than one transformation to an object. For example, we might want to first apply a scale S and then a rotation R. This would be done in two steps on a 2D vector $\mathbf{v}_1$:
图形程序通常会对一个对象应用多个变换。例如,我们可能希望首先应用缩放 S,然后应用旋转 R。这将在 2D 向量 $\mathbf{v}_1$ 上分两步完成:
\[ \mathbf{v}_2 = S\mathbf{v}_1, \qquad \mathbf{v}_3 = R\mathbf{v}_2. \]
Another way to write this is
另一种写法是
\[ \mathbf{v}_3 = R(S\mathbf{v}_1). \]
Because matrix multiplication is associative, we can also write
因为矩阵乘法满足结合律,所以我们也可以写成
\[ \mathbf{v}_3 = (RS)\mathbf{v}_1. \]
In other words, we can represent the effects of transforming a vector by two matrices in sequence using a single matrix of the same size, which we can compute by multiplying the two matrices: M = RS (Figure 7.10).
换句话说,我们可以使用相同大小的单个矩阵来表示按顺序通过两个矩阵转换一个向量的效果,我们可以通过将两个矩阵相乘来计算: M = RS (图 7.10 )。
It is very important to remember that these transforms are applied from the right side first. So the matrix M = RS first applies S and then R.
记住这些变换首先从右侧应用,这一点非常重要。因此矩阵M = RS首先应用S ,然后应用R。
Figure 7.10. Applying the two transform matrices in sequence is the same as applying the product of those matrices once. This is a key concept that underlies most graphics hardware and software.
图 7.10。按顺序应用两个变换矩阵与应用一次这些矩阵的乘积相同。这是大多数图形硬件和软件的基础概念。
Example 12 Suppose we want to scale by one-half in the vertical direction and then rotate by π/4 radians (45°). The resulting matrix is
例 12假设我们要在垂直方向上缩放一半,然后旋转π/ 4 弧度(45°)。结果矩阵为
It is important to always remember that matrix multiplication is not commutative. So the order of transforms does matter. In this example, rotating first and then scaling result in a different matrix (see Figure 7.11):
重要的是要始终记住矩阵乘法不是可交换的。因此,变换的顺序确实很重要。在此示例中,先旋转然后缩放会产生不同的矩阵(参见图 7.11 ):
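The two orderings in Example 12 can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`matmul`, `apply`, `rotate`, `scale`) are illustrative choices rather than anything defined in the text, and matrices are stored as lists of rows.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, v):
    """Multiply matrix m by the column vector v (a plain list)."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def rotate(phi):
    """Counterclockwise 2D rotation by phi radians."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

def scale(sx, sy):
    """Axis-aligned 2D scale."""
    return [[sx, 0.0], [0.0, sy]]

R = rotate(math.pi / 4)
S = scale(1.0, 0.5)

scale_then_rotate = matmul(R, S)  # M = RS: S is applied first, then R
rotate_then_scale = matmul(S, R)  # M = SR: R is applied first, then S

v1 = [1.0, 1.0]
two_steps = apply(R, apply(S, v1))       # v3 = R(S v1)
one_step = apply(scale_then_rotate, v1)  # v3 = (RS) v1

for name, m in (("RS", scale_then_rotate), ("SR", rotate_then_scale)):
    print(name, [[round(x, 3) for x in row] for row in m])
```

Applying S and then R in two steps gives the same vector as applying the single product RS, while RS and SR differ, which is the non-commutativity the example describes.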
Example 13 Using the scale matrices we have presented, nonuniform scaling can only be done along the coordinate axes. If we want to stretch our clock by 50% along one of its diagonals, so that 8:00 through 1:00 move to the northwest and 2:00 through 7:00 move to the southeast, we can use rotation matrices in combination with an axis-aligned scaling matrix to get the result we want. The idea is to use a rotation to align the scaling axis with a coordinate axis, then scale along that axis, and then rotate back. In our example, the scaling axis is the “backslash” diagonal of the square, and we can make it parallel to the x-axis with a rotation by +45°. Putting these operations together, the full transformation is
示例 13 使用我们介绍的缩放矩阵,非均匀缩放只能沿坐标轴进行。如果我们想将时钟沿其一条对角线拉伸 50%,使 8:00 到 1:00 向西北移动,2:00 到 7:00 向东南移动,我们可以结合使用旋转矩阵和轴对齐缩放矩阵来获得所需的结果。思路是先用旋转将缩放轴与坐标轴对齐,然后沿该轴缩放,最后再旋转回来。在我们的例子中,缩放轴是正方形的“反斜杠”对角线,我们可以通过旋转 +45° 使其与 x 轴平行。将这些操作组合在一起,完整的变换就是
\[ \mathrm{rotate}(-45^\circ)\,\mathrm{scale}(1.5, 1)\,\mathrm{rotate}(45^\circ). \]
Figure 7.11. The order in which two transforms are applied is usually important. In this example, we do a scale by one-half in y and then rotate by 45°. Reversing the order in which these two transforms are applied yields a different result.
图 7.11。应用两个变换的顺序通常很重要。在此示例中,我们在y轴上缩放一半,然后旋转 45°。反转这两个变换的应用顺序会产生不同的结果。
Remember to read the transformations from right to left.
记住从右到左阅读变换。
In mathematical notation, this can be written $RSR^T$. The result of multiplying the three matrices together is
在数学符号中,这可以写成RSR T 。将三个矩阵相乘的结果是
It is no coincidence that this matrix is symmetric: try applying the transpose-of-product rule to the formula $RSR^T$.
这个矩阵是对称的,这并非巧合——尝试将转置乘积规则应用于公式RSR T 。
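The product in Example 13 can be evaluated directly. This is a small sketch under the assumption that R is the rotation by −45° (so that its transpose is the +45° rotation that aligns the “backslash” diagonal with the x-axis) and S = scale(1.5, 1); the helper names are illustrative.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def apply(m, v):
    """Multiply matrix m by the column vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def rotate(phi):
    """Counterclockwise 2D rotation by phi radians."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

R = rotate(-math.pi / 4)      # takes the x-axis onto the "backslash" diagonal
S = [[1.5, 0.0], [0.0, 1.0]]  # stretch by 50% along x

# R S R^T, read right to left: align the diagonal with x, scale, rotate back.
M = matmul(R, matmul(S, transpose(R)))

diagonal = [math.sqrt(0.5), -math.sqrt(0.5)]  # unit vector along the diagonal
stretched = apply(M, diagonal)                # should be 1.5 * diagonal
print([[round(x, 3) for x in row] for row in M])
```

The result is symmetric, and a vector along the diagonal is simply lengthened by a factor of 1.5, which is what a non–axis-aligned scale should do.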
Building up a transformation from rotation and scaling transformations actually works for any linear transformation, and this fact leads to a powerful way of thinking about these transformations, as explored in the next section.
从旋转和缩放变换构建变换实际上适用于任何线性变换,而这一事实导致了一种思考这些变换的强有力的方法,如下一节所探讨的。
Sometimes, it’s necessary to “undo” a composition of transformations, taking a transformation apart into simpler pieces. For instance, it’s often useful to present a transformation to the user for manipulation in terms of separate rotations and scale factors, but a transformation might be represented internally simply as a matrix, with the rotations and scales already mixed together. This kind of manipulation can be achieved if the matrix can be computationally disassembled into the desired pieces, the pieces adjusted, and the matrix reassembled by multiplying the pieces together again.
有时,需要“撤消”变换组合,将变换分解成更简单的部分。例如,将变换以单独的旋转和缩放因子的形式呈现给用户以供操作通常很有用,但变换可能在内部简单地表示为矩阵,其中旋转和缩放已经混合在一起。如果矩阵可以通过计算分解成所需的部分,调整这些部分,并通过将这些部分再次相乘来重新组装矩阵,则可以实现这种操作。
It turns out that this decomposition, or factorization, is possible, regardless of the entries in the matrix—and this fact provides a fruitful way of thinking about transformations and what they do to geometry that is transformed by them.
事实证明,无论矩阵中的条目是什么,这种分解或因式分解都是可能的——并且这一事实为思考变换以及它们对被变换的几何图形的影响提供了一种富有成效的方法。
Let’s start with symmetric matrices. Recall from Section 6.4 that a symmetric matrix can always be taken apart using the eigenvalue decomposition into a product of the form
让我们从对称矩阵开始。回想一下第 6.4 节,对称矩阵总是可以通过特征值分解写成以下形式的乘积
\[ A = RSR^T, \]
where R is an orthogonal matrix and S is a diagonal matrix; we will call the columns of R (the eigenvectors) by the names v1 and v2, and we’ll call the diagonal entries of S (the eigenvalues) by the names λ1 and λ2.
其中R是正交矩阵, S是对角矩阵;我们将R的列(特征向量)称为v 1和v 2 ,将S的对角线项(特征值)称为 λ 1和 λ 2 。
In geometric terms, we can now recognize R as a rotation and S as a scale, so this is just a multi-step geometric transformation (Figure 7.12):
从几何角度来说,我们现在可以将R视为旋转,将S视为缩放,因此这只是一个多步骤的几何变换(图 7.12 ):
Figure 7.12. What happens when the unit circle is transformed by an arbitrary symmetric matrix A, also known as a non–axis-aligned, nonuniform scale. The two perpendicular vectors v1 and v2, which are the eigenvectors of A, remain fixed in direction but get scaled. In terms of elementary transformations, this can be seen as first rotating the eigenvectors to the canonical basis, doing an axis-aligned scale, and then rotating the canonical basis back to the eigenvectors.
图 7.12。当单位圆被任意对称矩阵A变换时会发生什么情况,这也称为非轴对齐、非均匀缩放。两个垂直向量v 1和v 2是A的特征向量,它们的方向保持不变,但会被缩放。就初等变换而言,这可以看作是首先将特征向量旋转到标准基,进行轴对齐缩放,然后将标准基旋转回特征向量。
Rotate v1 and v2 to the x- and y-axes (the transform by RT).
将v 1和v 2旋转到x轴和y轴(由R T变换)。
Scale in x and y by (λ1,λ2) (the transform by S).
将x和y缩放至 (λ 1 ,λ 2 )(按S进行变换)。
Rotate the x- and y-axes back to v1 and v2 (the transform by R).
将x轴和y轴旋转回v 1和v 2 (由R变换)。
If you like to count dimensions: a symmetric 2 × 2 matrix has 3 degrees of freedom, and the eigenvalue decomposition rewrites them as a rotation angle and two scale factors.
如果你喜欢计算维度:对称 2 × 2 矩阵有 3 个自由度,特征值分解将它们改写为一个旋转角度和两个缩放因子。
Looking at the effect of these three transforms together, we can see that they have the effect of a nonuniform scale along a pair of axes. As with an axis-aligned scale, the axes are perpendicular, but they aren’t the coordinate axes; instead, they are the eigenvectors of A. This tells us something about what it means to be a symmetric matrix: symmetric matrices are just scaling operations—albeit potentially nonuniform and non–axis-aligned ones.
综合考虑这三种变换的效果,我们可以看出它们具有沿一对轴进行非均匀缩放的效果。与轴对齐缩放一样,轴是垂直的,但它们不是坐标轴;相反,它们是A的特征向量。这告诉我们对称矩阵的含义:对称矩阵只是缩放操作——尽管可能是非均匀和非轴对齐的。
Example 14 Recall the example from Section 6.4:
例 14回想一下第 6.4 节中的例子:
The matrix above, then, according to its eigenvalue decomposition, scales in a direction 31.7° counterclockwise from three o’clock (the x-axis). This is just before 2 o’clock on the clock face, as Figure 7.13 confirms.
那么,根据特征值分解,上述矩阵沿着从三点钟方向(x 轴)逆时针转 31.7° 的方向进行缩放。这在钟面上略早于 2 点的位置,如图 7.13 所示。
Figure 7.13. A symmetric matrix is always a scale along some axis. In this case, it is along the ϕ = 31.7° direction which means the real eigenvector for this matrix is in that direction.
图 7.13。对称矩阵总是沿某个轴缩放。在本例中,它沿 ϕ = 31.7° 方向,这意味着该矩阵的实特征向量在该方向上。
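The 31.7° figure can be reproduced with a closed-form eigenvalue decomposition for symmetric 2 × 2 matrices. The sketch below assumes the Section 6.4 example matrix is [[2, 1], [1, 1]], which is consistent with the quoted angle but is not shown in this passage; the function names are hypothetical.

```python
import math

def symmetric_eig_2x2(a, b, d):
    """Eigen-decompose the symmetric matrix [[a, b], [b, d]].

    Returns (lam1, lam2, theta) with lam1 >= lam2, where theta is the
    counterclockwise angle from the x-axis to the eigenvector of lam1.
    """
    mean = 0.5 * (a + d)
    half_diff = 0.5 * (a - d)
    radius = math.hypot(half_diff, b)
    theta = 0.5 * math.atan2(b, half_diff)
    return mean + radius, mean - radius, theta

def compose(lam1, lam2, theta):
    """Rebuild R S R^T from two scale factors and a rotation angle."""
    c, s = math.cos(theta), math.sin(theta)
    off_diagonal = (lam1 - lam2) * c * s
    return [[lam1 * c * c + lam2 * s * s, off_diagonal],
            [off_diagonal, lam1 * s * s + lam2 * c * c]]

# Assumed example matrix [[2, 1], [1, 1]].
lam1, lam2, theta = symmetric_eig_2x2(2.0, 1.0, 1.0)
A = compose(lam1, lam2, theta)  # multiplying the pieces back together
print(round(lam1, 3), round(lam2, 3), round(math.degrees(theta), 1))
```

Decomposing and then recomposing returns the original matrix, which illustrates that the three numbers (one angle, two scale factors) carry the same information as the three independent entries of the symmetric matrix.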
We can also reverse the diagonalization process; to scale by (λ1,λ2) with the first scaling direction an angle ϕ clockwise from the x-axis, we have
我们也可以反转对角化过程;按 (λ 1 ,λ 2 ) 缩放,第一个缩放方向与x轴顺时针成一个角度 ϕ,我们有
Reassuringly, this is a symmetric matrix, as it must be, since we constructed it from a symmetric eigenvalue decomposition.
令人放心的是,这是一个对称矩阵;这一点必然成立,因为我们是由对称特征值分解构造出它的。
A very similar kind of decomposition can be done with nonsymmetric matrices as well: it’s the singular value decomposition (SVD), also discussed in Section 6.4.1. The difference is that the matrices on either side of the diagonal matrix are no longer the same:
非对称矩阵也可以进行非常类似的分解:奇异值分解 (SVD),第 6.4.1 节也对此进行了讨论。不同之处在于对角矩阵两边的矩阵不再相同:
\[ A = USV^T. \]
The two orthogonal matrices that replace the single rotation R are called U and V, and their columns are called ui (the left singular vectors) and vi (the right singular vectors), respectively. In this context, the diagonal entries of S are called singular values rather than eigenvalues. The geometric interpretation is very similar to that of the symmetric eigenvalue decomposition (Figure 7.14):
代替单次旋转R 的两个正交矩阵称为U和V ,它们的列分别称为u (左奇异向量)和v (右奇异向量)。在这种情况下, S的对角元素称为奇异值,而不是特征值。几何解释与对称特征值分解非常相似(图 7.14 ):
Rotate v1 and v2 to the x- and y-axes (the transform by $V^T$).
将 v1 和 v2 旋转到 x 轴和 y 轴(由 $V^T$ 变换)。
Scale in x and y by (σ1, σ2) (the transform by S).
在 x 和 y 方向上按 (σ1, σ2) 缩放(由 S 变换)。
Rotate the x- and y-axes to u1 and u2 (the transform by U).
将 x 轴和 y 轴旋转至 u1 和 u2(由 U 变换)。
For those who count dimensions: a general 2 × 2 matrix has 4 degrees of freedom, and the SVD rewrites them as two rotation angles and two scale factors. One more bit is needed to keep track of reflections, but that doesn’t add a dimension.
对于喜欢计算维度的读者:一般的 2 × 2 矩阵有 4 个自由度,SVD 将它们改写为两个旋转角度和两个缩放因子。还需要额外一位来记录反射,但这不会增加维度。
Figure 7.14. What happens when the unit circle is transformed by an arbitrary matrix A. The two perpendicular vectors v1 and v2, which are the right singular vectors of A, get scaled and changed in direction to match the left singular vectors, u1 and u2. In terms of elementary transformations, this can be seen as first rotating the right singular vectors to the canonical basis, doing an axis-aligned scale, and then rotating the canonical basis to the left singular vectors.
图 7.14。当单位圆被任意矩阵A变换时会发生什么。两个垂直向量v 1和v 2是A的右奇异向量,它们被缩放并改变方向以匹配左奇异向量u 1和u 2 。就初等变换而言,这可以看作是首先将右奇异向量旋转到标准基,进行轴对齐缩放,然后将标准基旋转到左奇异向量。
The principal difference is between a single rotation and two different orthogonal matrices. This difference causes another, less important, difference. Because the SVD has different singular vectors on the two sides, there is no need for negative singular values: we can always flip the sign of a singular value, reverse the direction of one of the associated singular vectors, and end up with the same transformation again. For this reason, the SVD always produces a diagonal matrix with nonnegative entries, but the matrices U and V are not guaranteed to be rotations; they could include reflection as well. In geometric applications like graphics, this is an inconvenience, but a minor one: it is easy to differentiate rotations from reflections by checking the determinant, which is +1 for rotations and −1 for reflections. If rotations are desired, one of the singular values can be negated, resulting in a rotation–scale–rotation sequence where the reflection is rolled in with the scale, rather than with one of the rotations.
主要区别在于单次旋转和两个不同的正交矩阵之间。这种区别导致了另一个不太重要的区别。由于 SVD 在两侧具有不同的奇异向量,因此不需要负奇异值:我们总是可以翻转奇异值的符号,反转其中一个相关奇异向量的方向,并最终再次得到相同的变换。因此,SVD 总是产生一个包含所有正项的对角矩阵,但矩阵U和V不能保证是旋转——它们也可能包括反射。在图形等几何应用中,这会带来不便,但只是小问题:通过检查行列式很容易区分旋转和反射,行列式对于旋转为 +1,对于反射为-1 ,如果需要旋转,可以将其中一个奇异值取反,从而产生旋转-缩放-旋转序列,其中反射与缩放一起滚动,而不是与其中一个旋转一起滚动。
Example 15 The example used in Section 6.4.1 is in fact a shear matrix (Figure 7.15):
例 15 6.4.1 节中所使用的例子实际上是一个剪切矩阵(图 7.15 ):
An immediate consequence of the existence of SVD is that all the 2D transformation matrices we have seen can be made from rotation matrices and scale matrices. Shear matrices are a convenience, but they are not required for expressing transformations.
SVD 存在的直接结果是,我们所见过的所有二维变换矩阵都可以由旋转矩阵和缩放矩阵构成。剪切矩阵是一种便利,但它们并不是表达变换所必需的。
In summary, every matrix can be decomposed via SVD into a rotation times a scale times another rotation. Only symmetric matrices can be decomposed via eigenvalue diagonalization into a rotation times a scale times the inverse-rotation, and such matrices are a simple scale in an arbitrary direction. The SVD of a symmetric matrix will yield the same triple product as eigenvalue decomposition via a slightly more complex algebraic manipulation.
总之,每个矩阵都可以通过 SVD 分解为旋转乘以缩放乘以另一个旋转。只有对称矩阵可以通过特征值对角化分解为旋转乘以缩放乘以逆旋转,并且此类矩阵在任意方向上都是简单的缩放。对称矩阵的 SVD 将通过稍微复杂的代数运算产生与特征值分解相同的三重乘积。
Figure 7.15. Singular Value Decomposition (SVD) for a shear matrix. Any 2D matrix can be decomposed into a product of rotation, scale, rotation. Note that the circular face of the clock must become an ellipse because it is just a rotated and scaled circle.
图 7.15.剪切矩阵的奇异值分解 (SVD)。任何二维矩阵都可以分解为旋转、缩放、旋转的乘积。请注意,时钟的圆形表面必须变成椭圆形,因为它只是一个旋转和缩放的圆。
Another decomposition uses shears to represent nonzero rotations (Paeth, 1990). The following identity allows this:
另一种分解使用剪切来表示非零旋转(Paeth,1990)。以下恒等式允许这样做:
For example, a rotation by π/4 (45°) is (see Figure 7.16)
例如,旋转π/ 4(45°)是(见图7.16 )
This particular transform is useful for raster rotation because shearing is a very efficient raster operation for images; it introduces some jagginess, but will leave no holes. The key observation is that if we take a raster position (i, j) and apply a horizontal shear to it, we get
这种特殊的变换对于光栅旋转非常有用,因为剪切是一种非常高效的图像光栅操作;它会产生一些锯齿,但不会留下空洞。关键的观察是,如果我们取一个光栅位置 (i, j) 并对其应用水平剪切,我们会得到
\[ \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix} \begin{bmatrix} i \\ j \end{bmatrix} = \begin{bmatrix} i + sj \\ j \end{bmatrix}. \]
Figure 7.16. Any 2D rotation can be accomplished by three shears in sequence. In this case, a rotation by 45° is decomposed as shown in Equation 7.2.
图 7.16.任何二维旋转都可以通过连续三次剪切来完成。在这种情况下,45° 旋转分解为公式 7.2 所示。
If we round sj to the nearest integer, this amounts to taking each row in the image and moving it sideways by some amount—a different amount for each row. Because it is the same displacement within a row, this allows us to rotate with no gaps in the resulting image. A similar action works for a vertical shear. Thus, we can implement a simple raster rotation easily.
如果我们将sj四舍五入为最接近的整数,这相当于将图像中的每一行向侧面移动一定量——每行的移动量不同。由于每行内的位移相同,因此我们可以旋转而生成的图像中没有间隙。类似的操作适用于垂直剪切。因此,我们可以轻松实现简单的光栅旋转。
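The three-shear idea can be checked numerically. The sketch below uses the standard form of the identity attributed to Paeth (1990), with a horizontal shear by (cos ϕ − 1)/sin ϕ on either side of a vertical shear by sin ϕ; the function names are hypothetical, and the angle must not be a multiple of π, since the shear factor divides by sin ϕ.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def shear_x(s):
    """Horizontal shear: x' = x + s*y."""
    return [[1.0, s], [0.0, 1.0]]

def shear_y(s):
    """Vertical shear: y' = y + s*x."""
    return [[1.0, 0.0], [s, 1.0]]

def rotation_from_shears(phi):
    """Rotation by phi built as shear-x, shear-y, shear-x."""
    t = (math.cos(phi) - 1.0) / math.sin(phi)
    return matmul(shear_x(t), matmul(shear_y(math.sin(phi)), shear_x(t)))

def sheared_column(i, j, s):
    """Raster horizontal shear: pixel (i, j) moves to column i + round(s*j).

    Every pixel in row j gets the same offset round(s*j), so the row
    stays contiguous. Python's round() sends exact ties to the even integer.
    """
    return i + round(s * j)

phi = math.pi / 4
M = rotation_from_shears(phi)
print([[round(x, 3) for x in row] for row in M])
```

Because each shear only slides whole rows (or whole columns) by an integer amount, chaining three such passes rotates an image without opening gaps between pixels.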
The linear 3D transforms are an extension of the 2D transforms. For example, a scale along Cartesian axes is
线性三维变换是二维变换的扩展。例如,沿笛卡尔轴的缩放是
Rotation is considerably more complicated in 3D than in 2D, because there are more possible axes of rotation. However, if we simply want to rotate about the z-axis, which will only change x- and y-coordinates, we can use the 2D rotation matrix with no operation on z:
3D 中的旋转比 2D 中的旋转复杂得多,因为有更多可能的旋转轴。但是,如果我们只想绕z轴旋转,这只会改变x和y坐标,我们可以使用 2D 旋转矩阵,而无需对z进行任何操作:
Similarly we can construct matrices to rotate about the x-axis and the y-axis:
类似地,我们可以构造绕x轴和y轴旋转的矩阵:
To understand why the minus sign is in the lower left for the y-axis rotation, think of the three axes in a circular sequence: y after x; z after y; x after z.
要理解为什么y轴旋转的左下角有减号,可以将三个轴想象成一个圆形序列: y在x之后; z在y之后; x在z之后。
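The three axis rotations can be written out and checked against the circular x, y, z ordering. This is a minimal sketch with illustrative function names; each rotation is counterclockwise when viewed looking down the rotation axis toward the origin.

```python
import math

def apply(m, v):
    """Multiply matrix m by the column vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def rotate_z(phi):
    """Rotate about z: x turns toward y."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotate_x(phi):
    """Rotate about x: y turns toward z."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rotate_y(phi):
    """Rotate about y: z turns toward x (minus sign in the lower left)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

quarter = math.pi / 2
x_axis, y_axis, z_axis = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]

x_to_y = apply(rotate_z(quarter), x_axis)
y_to_z = apply(rotate_x(quarter), y_axis)
z_to_x = apply(rotate_y(quarter), z_axis)
```

A quarter turn about each axis carries one coordinate axis onto the next in the cycle, which is why the sign pattern of the y-axis matrix looks transposed relative to the other two.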
We will discuss rotations about arbitrary axes in the next section.
我们将在下一节讨论绕任意轴的旋转。
As in two dimensions, we can shear along a particular axis, for example,
就像在二维中一样,我们可以沿特定轴进行剪切,例如,
As with 2D transforms, any 3D transformation matrix can be decomposed using SVD into a rotation, scale, and another rotation. Any symmetric 3D matrix has an eigenvalue decomposition into rotation, scale, and inverse-rotation. Finally, a 3D rotation can be decomposed into a product of 3D shear matrices.
与二维变换一样,任何三维变换矩阵都可以使用 SVD 分解为旋转、缩放和另一个旋转。任何对称三维矩阵都有特征值分解为旋转、缩放和逆旋转。最后,三维旋转可以分解为三维剪切矩阵的乘积。
As in 2D, 3D rotations are orthogonal matrices. Geometrically, this means that the three rows of the matrix are the Cartesian coordinates of three mutually orthogonal unit vectors as discussed in Section 2.4.5. The columns are three, potentially different, mutually orthogonal unit vectors. There are an infinite number of such rotation matrices. Let’s write down such a matrix:
与二维一样,三维旋转也是正交矩阵。从几何学上讲,这意味着矩阵的三行是三个相互正交的单位向量的笛卡尔坐标,如第 2.4.5 节所述。列是三个可能不同的相互正交的单位向量。这样的旋转矩阵有无数个。让我们写下这样一个矩阵:
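Here, $\mathbf{u} = x_u\mathbf{x} + y_u\mathbf{y} + z_u\mathbf{z}$ and so on for v and w. Since the three vectors are orthonormal, we know that
这里,$\mathbf{u} = x_u\mathbf{x} + y_u\mathbf{y} + z_u\mathbf{z}$,对于 v 和 w 依此类推。由于这三个向量是标准正交的,因此我们知道
\[ \mathbf{u}\cdot\mathbf{u} = \mathbf{v}\cdot\mathbf{v} = \mathbf{w}\cdot\mathbf{w} = 1, \qquad \mathbf{u}\cdot\mathbf{v} = \mathbf{v}\cdot\mathbf{w} = \mathbf{w}\cdot\mathbf{u} = 0. \]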
We can infer some of the behavior of the rotation matrix by applying it to the vectors u, v and w. For example,
我们可以通过将旋转矩阵应用于向量u 、 v和w来推断其部分行为。例如,
Note that those three rows of $R_{uvw}\mathbf{u}$ are all dot products:
请注意,R uvw u 的这三行都是点积:
Similarly, $R_{uvw}\mathbf{v} = \mathbf{y}$, and $R_{uvw}\mathbf{w} = \mathbf{z}$. So $R_{uvw}$ takes the basis uvw to the corresponding Cartesian axes via rotation.
类似地, R uvw v = y ,且R uvw w = z 。因此R uvw通过旋转将基uvw移至相应的笛卡尔坐标轴。
If $R_{uvw}$ is a rotation matrix with orthonormal rows, then $R_{uvw}^T$ is also a rotation matrix with orthonormal columns and in fact is the inverse of $R_{uvw}$ (the inverse of an orthogonal matrix is always its transpose). An important point is that for transformation matrices, the algebraic inverse is also the geometric inverse. So if $R_{uvw}$ takes u to x, then $R_{uvw}^T$ takes x to u. The same should be true of v and y as we can confirm:
如果R uvw是具有正交行的旋转矩阵,则R T uvw也是具有正交列的旋转矩阵,并且实际上是R uvw的逆(正交矩阵的逆始终是其转置)。重要的一点是,对于变换矩阵,代数逆也是几何逆。因此,如果R uvw将u转换为x ,则R T uvw将x转换为u 。我们可以确认, v和y也应如此:
So we can always create rotation matrices from orthonormal bases.
所以我们总是可以从正交基创建旋转矩阵。
If we wish to rotate about an arbitrary vector a, we can form an orthonormal basis with w = a, rotate that basis to the canonical basis xyz, rotate about the z-axis, and then rotate the canonical basis back to the uvw basis. In matrix form, to rotate about the w-axis by an angle ϕ:
如果我们希望绕任意向量a旋转,我们可以用w = a形成一个正交基,将该基旋转到标准基xyz ,绕z轴旋转,然后将标准基旋转回uvw基。以矩阵形式表示,绕w轴旋转一个角度 ϕ:
Here, w is a unit vector in the direction of a (i.e., a divided by its own length). But what are u and v? A method to find reasonable u and v is given in Section 2.4.6.
这里,我们有w一个沿a方向的单位向量(即a除以其自身长度)。但是u和v是什么?第 2.4.6 节给出了找到合理u和v 的方法。
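The rotate-to-canonical, rotate-about-z, rotate-back recipe can be sketched as follows. The basis construction here (replace the smallest component of w by 1 to get a non-parallel vector, then take cross products) is one common choice and is an assumption of this sketch, not necessarily the exact method of Section 2.4.6; all function names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, v):
    """Multiply matrix m by the column vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def basis_from_axis(a):
    """Right-handed orthonormal basis (u, v, w) with w along a."""
    w = normalize(a)
    t = list(w)
    t[min(range(3), key=lambda k: abs(w[k]))] = 1.0  # not parallel to w
    u = normalize(cross(t, w))
    v = cross(w, u)
    return u, v, w

def rotate_about_axis(a, phi):
    """R_uvw^T * rotate-z(phi) * R_uvw, read right to left."""
    r_uvw = list(basis_from_axis(a))  # rows u, v, w: takes uvw to xyz
    c, s = math.cos(phi), math.sin(phi)
    rot_z = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    return matmul(transpose(r_uvw), matmul(rot_z, r_uvw))

M = rotate_about_axis([1.0, 1.0, 1.0], 2.0 * math.pi / 3.0)
x_image = apply(M, [1.0, 0.0, 0.0])     # a 120-degree turn about (1,1,1)
axis_image = apply(M, [1.0, 1.0, 1.0])  # the axis itself should not move
```

A 120° rotation about the (1, 1, 1) direction cycles the coordinate axes, so it makes a convenient check; the result does not depend on which valid u and v were chosen.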
If we have a rotation matrix and we wish to have the rotation in axis-angle form, we can compute the one real eigenvalue (which will be λ = 1), and the corresponding eigenvector is the axis of rotation. This is the one axis that is not changed by the rotation.
如果我们有一个旋转矩阵,并且希望以轴角形式进行旋转,我们可以计算一个实特征值(λ = 1),相应的特征向量就是旋转轴。这是旋转后不会改变的唯一轴。
See Section 16.2.2 for a comparison of the few most-used ways to represent rotations, besides rotation matrices.
除了旋转矩阵之外,请参见第 16.2.2 节以比较几种最常用的表示旋转的方式。
While most 3D vectors we use represent positions (offset vectors from the origin) or directions, such as where light comes from, some vectors represent surface normals. Surface normal vectors are perpendicular to the tangent plane of a surface. These normals do not transform the way we would like when the underlying surface is transformed. For example, if the points of a surface are transformed by a matrix M, a vector t that is tangent to the surface and is multiplied by M will be tangent to the transformed surface. However, a surface normal vector n that is transformed by M may not be normal to the transformed surface (Figure 7.17).
虽然我们使用的大多数 3D 矢量表示位置(相对于原点的偏移矢量)或方向(例如光线来自哪里),但有些矢量表示表面法线。表面法线矢量垂直于表面的切平面。当底层表面变换时,这些法线不会按照我们希望的方式变换。例如,如果表面的点由矩阵M变换,则与表面相切并与M相乘的矢量t将与变换后的表面相切。但是,由M变换的表面法线矢量n可能不垂直于变换后的表面(图 7.17 )。
We can derive a transform matrix N which does take n to a vector perpendicular to the transformed surface. One way to attack this issue is to note that a surface normal vector and a tangent vector are perpendicular, so their dot product is zero, which is expressed in matrix form as
我们可以导出一个变换矩阵N ,它将n转换为垂直于变换表面的向量。解决这个问题的一种方法是注意表面法向量和切向量是垂直的,因此它们的点积为零,以矩阵形式表示为
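If we denote the desired transformed vectors as $\mathbf{t}_M = M\mathbf{t}$ and $\mathbf{n}_N = N\mathbf{n}$, our goal is to find N such that $\mathbf{n}_N^T\mathbf{t}_M = 0$. We can find N by some algebraic tricks.
如果我们将所需的变换后向量记为 $\mathbf{t}_M = M\mathbf{t}$ 和 $\mathbf{n}_N = N\mathbf{n}$,我们的目标是找到 N,使得 $\mathbf{n}_N^T\mathbf{t}_M = 0$。我们可以通过一些代数技巧来找到 N。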
Figure 7.17. When a normal vector is transformed using the same matrix that transforms the points on an object, the resulting vector may not be perpendicular to the surface as is shown here for the sheared rectangle. The tangent vector, however, does transform to a vector tangent to the transformed surface.
图 7.17。当使用与变换对象上的点相同的矩阵来变换法向量时,得到的向量可能不垂直于表面,如这里显示的剪切矩形。但是,切向量会变换为与变换后的表面相切的向量。
First, we can sneak an identity matrix into the dot product and then take advantage of $M^{-1}M = I$:
首先,我们可以将一个单位矩阵偷偷放入点积中,然后利用M –1 M = I :
Although the manipulations above don’t obviously get us anywhere, note that we can add parentheses that make the above expression more obviously a dot product:
虽然上述操作显然没有什么结果,但请注意,我们可以添加括号,使上述表达式更明显地成为点积:
This means that the row vector that is perpendicular to $\mathbf{t}_M$ is the left part of the expression above. This expression holds for any of the tangent vectors in the tangent plane. Since there is only one direction in 3D (and its opposite) that is perpendicular to all such tangent vectors, we know that the left part of the expression above must be the row vector expression for $\mathbf{n}_N$; i.e., it is $\mathbf{n}_N^T$, so this allows us to infer N:
这意味着垂直于t M的行向量是上述表达式的左边部分。此表达式适用于切平面中的任何切向量。由于在三维空间中只有一个方向(及其对立方向)垂直于所有此类切向量,因此我们知道上述表达式的左边部分必须是n N 的行向量表达式;即n T N ,因此我们可以推断N :
so we can take the transpose of that to get
所以我们可以取它的转置得到
Therefore, we can see that the matrix that correctly transforms normal vectors so they remain normal is $N = (M^{-1})^T$, i.e., the transpose of the inverse matrix. Since this matrix may change the length of n, we can multiply it by an arbitrary scalar and it will still produce $\mathbf{n}_N$ with the right direction. Recall from Section 6.3 that the inverse of a matrix is the transpose of the cofactor matrix divided by the determinant. Because we don’t care about the length of a normal vector, we can skip the division and find that for a 3 × 3 matrix,
因此,我们可以看出,正确变换法向量使其保持正常的矩阵是N = ( M –1 ) T ,即逆矩阵的转置。由于此矩阵可能会改变n的长度,我们可以将其乘以任意标量,它仍将以正确的方向产生n N 。回想一下第 6.3 节,矩阵的逆是余因子矩阵的转置除以行列式。因为我们不关心法向量的长度,所以我们可以跳过除法,并发现对于 3 × 3 矩阵,
This assumes the element of M in row i and column j is mij. So the full expression for N is
假设M在第i行和第j列的元素为m ij 。因此N 的完整表达式为
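The effect is easy to see on a sheared rectangle like the one in Figure 7.17. The sketch below builds N as the cofactor matrix of M, which equals the determinant times $(M^{-1})^T$; the shear matrix, tangent, and normal are made-up test values, and the function names are illustrative.

```python
def apply(m, v):
    """Multiply matrix m by the column vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cofactor_matrix(m):
    """Cofactor matrix of a 3x3 matrix (cyclic-index form).

    This is det(M) * (M^-1)^T, so it transforms normals correctly up to
    a scale factor; if det(M) is negative the result points the opposite way.
    """
    return [[m[(i + 1) % 3][(j + 1) % 3] * m[(i + 2) % 3][(j + 2) % 3]
             - m[(i + 1) % 3][(j + 2) % 3] * m[(i + 2) % 3][(j + 1) % 3]
             for j in range(3)] for i in range(3)]

# Shear in x proportional to y (an illustrative choice).
M = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 1.0, 0.0]  # tangent along the right edge of a rectangle
n = [1.0, 0.0, 0.0]  # normal to that edge

N = cofactor_matrix(M)
t_M = apply(M, t)         # transformed tangent
naive_normal = apply(M, n)  # normal transformed (incorrectly) by M
n_N = apply(N, n)         # normal transformed by N

print(dot(naive_normal, t_M), dot(n_N, t_M))
```

Transforming the normal by M leaves it no longer perpendicular to the sheared edge, while transforming it by N restores the zero dot product.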
We have been looking at methods to change vectors using a matrix M. In two dimensions, these transforms have the form
我们一直在研究使用矩阵M来改变向量的方法。在二维中,这些变换的形式为
We cannot use such transforms to move objects, only to scale and rotate them. In particular, the origin (0, 0) always remains fixed under a linear transformation. To move, or translate, an object by shifting all its points the same amount, we need a transform of the form
我们不能使用这样的变换来移动物体,只能缩放和旋转它们。特别是,原点 (0, 0) 在线性变换下始终保持不变。要通过将物体的所有点移动相同的量来移动或平移物体,我们需要以下形式的变换
\[ x' = x + x_t, \qquad y' = y + y_t. \]
There is just no way to do that by multiplying (x, y) by a 2 × 2 matrix. One possibility for adding translation to our system of linear transformations is to simply associate a separate translation vector with each transformation matrix, letting the matrix take care of scaling and rotation and the vector take care of translation. This is perfectly feasible, but the bookkeeping is awkward and the rule for composing two transformations is not as simple and clean as with linear transformations.
通过将 ( x, y ) 乘以 2 × 2 矩阵根本无法实现这一点。将平移添加到我们的线性变换系统的一种可能性是简单地将单独的平移向量与每个变换矩阵关联起来,让矩阵负责缩放和旋转,让向量负责平移。这完全可行,但记账很麻烦,而且组合两个变换的规则不像线性变换那样简单明了。
Instead, we can use a clever trick to get a single matrix multiplication to do both operations together. The idea is simple: represent the point (x, y) by a 3D vector $[x\ y\ 1]^T$, and use 3 × 3 matrices of the form
相反,我们可以使用一个巧妙的技巧,通过一个矩阵乘法来同时执行两个运算。这个想法很简单:用 3D 向量 [ xy 1] T表示点 ( x, y ),并使用以下形式的 3 × 3 矩阵
The fixed third row serves to copy the 1 into the transformed vector, so that all vectors have a 1 in the last place, and the first two rows compute x′ and y′ as linear combinations of x, y, and 1:
固定的第三行用于将 1 复制到变换后的向量中,以便所有向量的最后一个位置都有一个 1,前两行计算x和y作为x 、 y和 1 的线性组合:
The single matrix implements a linear transformation followed by a translation! This kind of transformation is called an affine transformation, and this way of implementing affine transformations by adding an extra dimension is called homogeneous coordinates (Roberts, 1965; Riesenfeld, 1981; Penna & Patterson, 1986). Homogeneous coordinates not only clean up the code for transformations, but this scheme also makes it obvious how to compose two affine transformations: simply multiply the matrices.
单个矩阵实现了线性变换,然后进行了平移!这种变换称为仿射变换,而通过添加额外维度来实现仿射变换的方式称为齐次坐标(Roberts,1965;Riesenfeld,1981;Penna & Patterson,1986)。齐次坐标不仅简化了变换代码,而且该方案还明确了如何组合两个仿射变换:只需将矩阵相乘即可。
A problem with this new formalism arises when we need to transform vectors that are not supposed to be positions—they represent directions or offsets between positions. Vectors that represent directions or offsets should not change when we translate an object. Fortunately, we can arrange for this by setting the third coordinate to zero:
当我们需要转换不属于位置的向量时,这种新形式主义就会出现问题——它们表示位置之间的方向或偏移。当我们平移对象时,表示方向或偏移的向量不应该改变。幸运的是,我们可以通过将第三个坐标设置为零来实现这一点:
If there is a scaling/rotation transformation in the upper-left 2 × 2 entries of the matrix, it will apply to the vector, but the translation still multiplies with the zero and is ignored. Furthermore, the zero is copied into the transformed vector, so direction vectors remain direction vectors after they are transformed.
如果矩阵左上角的 2×2 项中有缩放/旋转变换,它将应用于向量,但平移仍会与零相乘并被忽略。此外,零会被复制到变换后的向量中,因此方向向量在变换后仍然是方向向量。
This gives an explanation for the name “homogeneous”: translation, rotation, and scaling of positions and directions all fit into a single system.
这解释了“同质”这一名称的含义:位置和方向的平移、旋转和缩放都适合单一系统。
This is exactly the behavior we want for vectors, so they fit smoothly into the system: the extra (third) coordinate will be either 1 or 0 depending on whether we are encoding a position or a direction. We actually do need to store the homogeneous coordinate so we can distinguish between locations and other vectors. For example,
这正是我们想要的向量行为,因此它们可以顺利地融入系统:额外的(第三个)坐标将是 1 或 0,具体取决于我们是在编码位置还是方向。我们实际上确实需要存储同质坐标,以便我们能够区分位置和其他向量。例如,
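The difference between a third coordinate of 1 and of 0 can be seen in a few lines. This sketch uses made-up values and illustrative helper names; it also composes a translation with a rotation by a single matrix product.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, v):
    """Multiply matrix m by the column vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def translate(xt, yt):
    """2D translation as a 3x3 homogeneous matrix."""
    return [[1.0, 0.0, xt], [0.0, 1.0, yt], [0.0, 0.0, 1.0]]

def rotate(phi):
    """2D rotation as a 3x3 homogeneous matrix (zeros in the translation slots)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

point = [1.0, 1.0, 1.0]      # a position: third coordinate 1
direction = [1.0, 1.0, 0.0]  # a direction: third coordinate 0

T = translate(2.0, 3.0)
moved_point = apply(T, point)          # the position is shifted
moved_direction = apply(T, direction)  # the direction is unchanged

# Translate by (1, 0) first, then rotate by 90 degrees: one 3x3 matrix.
M = matmul(rotate(math.pi / 2), translate(1.0, 0.0))
image = apply(M, [1.0, 0.0, 1.0])
```

The translation column is multiplied by the last coordinate, so it contributes to positions and drops out for directions, and the last coordinate itself is carried through unchanged.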
Later, when we do perspective viewing, we will see that it is useful to allow the homogeneous coordinate to take on values other than one or zero.
稍后,当我们进行透视观察时,我们将看到允许齐次坐标采用除一或零之外的其他值是有用的。
Homogeneous coordinates are used nearly universally to represent transformations in graphics systems. In particular, homogeneous coordinates underlie the design and operation of renderers implemented in graphics hardware. We will see in Chapter 8 that homogeneous coordinates also make it easy to draw scenes in perspective, another reason for their popularity.
齐次坐标几乎普遍用于表示图形系统中的变换。具体而言,齐次坐标是图形硬件中实现的渲染器的设计和操作的基础。我们将在第 8 章中看到,齐次坐标还使透视场景的绘制变得容易,这是其受欢迎的另一个原因。
Homogeneous coordinates are also ubiquitous in computer vision.
齐次坐标在计算机视觉中也普遍存在。
Homogeneous coordinates can be considered just a clever way to handle the bookkeeping for translation, but there is also a different, geometric interpretation. The key observation is that when we do a 3D shear based on the z-coordinate, we get this transform:
齐次坐标可以被认为是一种处理平移簿记的巧妙方法,但也存在不同的几何解释。关键的观察是,当我们基于z坐标进行 3D 剪切时,我们得到了以下变换:
Note that this almost has the form we want in x and y for a 2D translation, but has a z hanging around that doesn’t have a meaning in 2D. Now comes the key decision: we will add a coordinate z = 1 to all 2D locations. This gives us
请注意,这几乎具有我们想要的 2D 平移x和y 的形式,但有一个z在 2D 中没有意义。现在到了关键的决定:我们将在所有 2D 位置添加一个坐标 z = 1。这给了我们
By associating a (z = 1)-coordinate with all 2D points, we now can encode translations into matrix form. For example, to first translate in 2D by $(x_t, y_t)$ and then rotate by angle ϕ we would use the matrix
通过将 ( z = 1) 坐标与所有 2D 点关联起来,我们现在可以将平移编码为矩阵形式。例如,要先在 2D 中平移 ( xt , yt ),然后旋转角度 ϕ,我们将使用矩阵
Note that the 2D rotation matrix is now 3 × 3 with zeros in the “translation slots.” With this type of formalism, which uses shears along z = 1 to encode translations, we can represent any number of 2D shears, 2D rotations, and 2D translations as one composite 3D matrix. The bottom row of that matrix will always be (0, 0, 1) , so we don’t really have to store it. We just need to remember it is there when we multiply two matrices together.
请注意,二维旋转矩阵现在是 3 × 3,其中“平移槽”为零。通过这种使用沿z = 1 的剪切来编码平移的形式,我们可以将任意数量的二维剪切、二维旋转和二维平移表示为一个复合三维矩阵。该矩阵的底行始终为 (0, 0, 1) ,因此我们实际上不必存储它。我们只需记住在将两个矩阵相乘时它就在那里。
In 3D, the same technique works: we can add a fourth coordinate, a homogeneous coordinate, and then, we have translations:
在 3D 中,同样的技术也有效:我们可以添加第四个坐标,即齐次坐标,然后我们就有了平移:
Again, for a direction vector, the fourth coordinate is zero and the vector is thus unaffected by translations.
同样,对于方向向量,第四个坐标为零,因此该向量不受平移的影响。
Example 16 (Windowing transformations) Often in graphics, we need to create a transform matrix that takes points in the rectangle $[x_l, x_h] \times [y_l, y_h]$ to the rectangle $[x_l', x_h'] \times [y_l', y_h']$. This can be accomplished with a single scale and translate in sequence. However, it is more intuitive to create the transform from a sequence of three operations (Figure 7.18):
例 16(窗口变换) 在图形学中,我们经常需要创建一个变换矩阵,将矩形 $[x_l, x_h] \times [y_l, y_h]$ 中的点映射到矩形 $[x_l', x_h'] \times [y_l', y_h']$。这可以通过依次进行一次缩放和一次平移来实现。但是,用三个操作的序列来构造该变换更为直观(图 7.18):
Move the point $(x_l, y_l)$ to the origin.
将点 $(x_l, y_l)$ 移动到原点。
Scale the rectangle to be the same size as the target rectangle.
将矩形缩放到与目标矩形相同的大小。
Move the origin to the point $(x_l', y_l')$.
将原点移动到点 $(x_l', y_l')$。
Figure 7.18. To take one rectangle (window) to the other, we first shift the lower-left corner to the origin, then scale it to the new size, and then move the origin to the lower-left corner of the target rectangle.
图 7.18.要将一个矩形(窗口)移到另一个矩形(窗口),我们首先将左下角移到原点,然后将其缩放到新的大小,然后将原点移动到目标矩形的左下角。
Remembering that the right-hand matrix is applied first, we can write
记住,首先应用的是右侧矩阵,因此我们可以写出
It is perhaps not surprising to some readers that the resulting matrix has the form it does, but the constructive process with the three matrices leaves no doubt as to the correctness of the result.
对于一些读者来说,也许结果矩阵具有这样的形式并不奇怪,但这三个矩阵的构造过程毫无疑问是结果的正确性。
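The three-step construction of the windowing transform translates directly into code. In this sketch, rectangles are passed as hypothetical `(xl, yl, xh, yh)` tuples, and the source and target values are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, v):
    """Multiply matrix m by the column vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def translate(xt, yt):
    return [[1.0, 0.0, xt], [0.0, 1.0, yt], [0.0, 0.0, 1.0]]

def scale(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def window(src, dst):
    """Matrix taking rectangle src to rectangle dst, each (xl, yl, xh, yh)."""
    xl, yl, xh, yh = src
    xl2, yl2, xh2, yh2 = dst
    sx = (xh2 - xl2) / (xh - xl)
    sy = (yh2 - yl2) / (yh - yl)
    # Right-hand matrix is applied first: move (xl, yl) to the origin,
    # scale to the target size, then move the origin to (xl2, yl2).
    return matmul(translate(xl2, yl2),
                  matmul(scale(sx, sy), translate(-xl, -yl)))

W = window((-1.0, -1.0, 1.0, 1.0), (0.0, 0.0, 640.0, 480.0))
lower_left = apply(W, [-1.0, -1.0, 1.0])
upper_right = apply(W, [1.0, 1.0, 1.0])
center = apply(W, [0.0, 0.0, 1.0])
```

Checking that both corners of the source rectangle land on the corresponding corners of the target is usually enough to confirm the matrix, since an axis-aligned scale plus translation is determined by two such points.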
An exactly analogous construction can be used to define a 3D windowing transformation, which maps the box $[x_l, x_h] \times [y_l, y_h] \times [z_l, z_h]$ to the box $[x_l', x_h'] \times [y_l', y_h'] \times [z_l', z_h']$:
可以使用完全类似的构造来定义 3D 窗口变换,将框 [x ,xh] × [y ,yh] × [z ,zh] 映射到框
It is interesting to note that if we multiply an arbitrary matrix composed of scales, shears, and rotations with a simple translation (translation comes second), we get
有趣的是,如果我们将由缩放、剪切和旋转组成的任意矩阵与简单的平移(平移排在第二位)相乘,我们会得到
Thus, we can look at any matrix and think of it as a scaling/rotation part and a translation part because the components are nicely separated from each other.
因此,我们可以查看任何矩阵并将其视为缩放/旋转部分和平移部分,因为各个组件彼此之间很好地分离。
An important class of transforms is the rigid-body transforms. These are composed only of translations and rotations, so they do not stretch or shrink objects. Such transforms will have a pure rotation for the $a_{ij}$ above.
一类重要的变换是刚体变换。这些变换仅由平移和旋转组成,因此它们不会拉伸或收缩物体。此类变换对于上面的a ij具有纯旋转。
While we can always invert a matrix algebraically, we can use geometry if we know what the transform does. For example, the inverse of scale($s_x, s_y, s_z$) is scale($1/s_x, 1/s_y, 1/s_z$). The inverse of a rotation is the same rotation with the opposite sign on the angle. The inverse of a translation is a translation in the opposite direction. If we have a series of matrices $M = M_1 M_2 \cdots M_n$, then $M^{-1} = M_n^{-1} \cdots M_2^{-1} M_1^{-1}$.
虽然我们总是可以用代数方法求矩阵的逆,但如果我们知道变换的作用,也可以利用几何方法。例如,scale($s_x, s_y, s_z$) 的逆是 scale($1/s_x, 1/s_y, 1/s_z$)。旋转的逆是角度符号相反的同一旋转。平移的逆是沿相反方向的平移。如果我们有一系列矩阵 $M = M_1 M_2 \cdots M_n$,那么 $M^{-1} = M_n^{-1} \cdots M_2^{-1} M_1^{-1}$。
Also, certain types of transformation matrices are easy to invert. We’ve already mentioned scales, which are diagonal matrices; the second important example is rotations, which are orthogonal matrices. Recall (Section 6.2.4) that the inverse of an orthogonal matrix is its transpose. This makes it easy to invert rotations and rigid body transformations (see Exercise 6). Also, it’s useful to know that a matrix with [0 0 0 1] in the bottom row has an inverse that also has [0 0 0 1] in the bottom row (see Exercise 7).
此外,某些类型的变换矩阵很容易求逆。我们已经提到过尺度矩阵,它们是对角矩阵;第二个重要的例子是旋转矩阵,它们是正交矩阵。回想一下(第 6.2.4 节),正交矩阵的逆是其转置。这使得求逆旋转和刚体变换变得很容易(参见练习 6)。此外,了解底行包含 [0 0 0 1] 的矩阵的逆也包含底行包含 [0 0 0 1] 是很有用的(参见练习 7)。
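These inversion rules can be illustrated for 2D rigid-body transforms in homogeneous form. The sketch below inverts a matrix with a rotation block R and translation t as the matrix with block $R^T$ and translation $-R^T\mathbf{t}$, one standard way to exploit orthogonality; the function names and test values are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rigid(phi, tx, ty):
    """Rotate by phi, then translate by (tx, ty), as one 3x3 matrix."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def rigid_inverse(m):
    """Invert a rigid 3x3 matrix without general matrix inversion.

    The rotation block is transposed and the translation becomes -R^T t;
    the bottom row stays (0, 0, 1).
    """
    (a, b, tx), (c, d, ty) = m[0], m[1]
    return [[a, c, -(a * tx + c * ty)],
            [b, d, -(b * tx + d * ty)],
            [0.0, 0.0, 1.0]]

def close(p, q, eps=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < eps for r1, r2 in zip(p, q) for x, y in zip(r1, r2))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

M1 = rigid(0.3, 2.0, -1.0)
M2 = rigid(-1.1, 0.5, 4.0)

# The inverse of a product is the product of the inverses in reverse order.
product_inverse = rigid_inverse(matmul(M1, M2))
reversed_inverses = matmul(rigid_inverse(M2), rigid_inverse(M1))
```

Both checks (a matrix times its inverse giving the identity, and the reversed-order rule for products) hold to rounding error.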
Interestingly, we can use SVD to invert a matrix as well. Since we know that any matrix can be decomposed into a rotation times a scale times a rotation, inversion is straightforward. For example, in 3D we have
有趣的是,我们也可以使用 SVD 来反转矩阵。由于我们知道任何矩阵都可以分解为旋转乘以缩放乘以旋转,因此反转很简单。例如,在 3D 中我们有
and from the rules above, it follows easily that
根据上述规则,很容易得出
All of the previous discussion has been in terms of using transformation matrices to move points around. We can also think of them as simply changing the coordinate system in which the point is represented. For example, in Figure 7.19, we see two ways to visualize a movement. In different contexts, either interpretation may be more suitable.
前面的所有讨论都是关于使用变换矩阵来移动点的。我们也可以将它们视为简单地改变表示点的坐标系。例如,在图 7.19中,我们看到了两种可视化移动的方式。在不同的情况下,任何一种解释可能都更合适。
Figure 7.19. The point (2,1) has a transform “translate by (–1,0)” applied to it. On the top right is our mental image if we view this transformation as a physical movement, and on the bottom right is our mental image if we view it as a change of coordinates (a movement of the origin in this case). The artificial boundary is just an artifice, and the relative position of the axes and the point are the same in either case.
图 7.19。点 (2,1) 被应用了“平移 ( -1,0 )”变换。如果我们将此变换视为物理运动,则右上角是我们的心理图像;如果我们将其视为坐标的变化(在本例中为原点的移动),则右下角是我们的心理图像。人为的边界只是一种假象,无论哪种情况,轴和点的相对位置都是相同的。
For example, a driving game may have a model of a city and a model of a car. If the player is presented with a view out the windshield, objects inside the car are always drawn in the same place on the screen, while the streets and buildings appear to move backward as the player drives. On each frame, we apply a transformation to these objects that moves them farther back than on the previous frame. One way to think of this operation is simply that it moves the buildings backward; another way to think of it is that the buildings are staying put but the coordinate system in which we want to draw them—which is attached to the car—is moving. In the second interpretation, the transformation is changing the coordinates of the city geometry, expressing them as coordinates in the car’s coordinate system. Both ways will lead to exactly the same matrix that is applied to the geometry outside the car.
例如,赛车游戏可能有一个城市模型和一个汽车模型。如果向玩家展示挡风玻璃外的视图,汽车内的物体总是绘制在屏幕上的同一位置,而街道和建筑物似乎在玩家驾驶时向后移动。在每一帧上,我们都对这些物体应用变换,使它们比前一帧向后移动得更远。一种思考此操作的方式是简单地将建筑物向后移动;另一种思考方式是建筑物保持不变,但我们想要绘制它们的坐标系(与汽车相连)在移动。在第二种解释中,变换是改变城市几何的坐标,将它们表示为汽车坐标系中的坐标。这两种方式都将导致应用于汽车外部几何的完全相同的矩阵。
If the game also supports an overhead view to show where the car is in the city, the buildings and streets need to be drawn in fixed positions while the car needs to move from frame to frame. The same two interpretations apply: we can think of the changing transformation as moving the car from its canonical position to its current location in the world; or we can think of the transformation as simply changing the coordinates of the car’s geometry, which is originally expressed in terms of a coordinate system attached to the car, to express them instead in a coordinate system fixed relative to the city. The change-of-coordinates interpretation makes it clear that the matrices used in these two modes (city-to-car coordinate change vs. car-to-city coordinate change) are inverses of one another.
如果游戏还支持俯视图来显示汽车在城市中的位置,则建筑物和街道需要绘制在固定位置,而汽车则需要逐帧移动。同样的两种解释适用:我们可以将变换视为将汽车从其标准位置移动到其在世界上的当前位置;或者我们可以将变换视为简单地改变汽车几何的坐标,该坐标最初以附在汽车上的坐标系表示,而是以相对于城市固定的坐标系表示。坐标变换的解释清楚地表明,这两种模式(城市到汽车的坐标变化与汽车到城市的坐标变化)中使用的矩阵是彼此的逆。
The idea of changing coordinate systems is much like the idea of type conversions in programming. Before we can add a floating-point number to an integer, we need to convert the integer to floating point or the floating-point number to an integer, depending on our needs, so that the types match. And before we can draw the city and the car together, we need to convert the city to car coordinates or the car to city coordinates, depending on our needs, so that the coordinates match.
改变坐标系的思想与编程中的类型转换思想非常相似。在将浮点数添加到整数之前,我们需要根据需要将整数转换为浮点数或将浮点数转换为整数,以使类型匹配。在将城市和汽车绘制在一起之前,我们需要根据需要将城市坐标转换为汽车坐标或将汽车坐标转换为城市坐标,以使坐标匹配。
When managing multiple coordinate systems, it’s easy to get confused and wind up with objects in the wrong coordinates, causing them to show up in unexpected places. But with systematic thinking about transformations between coordinate systems, you can reliably get the transformations right.
管理多个坐标系时,很容易混淆,导致对象处于错误的坐标中,从而导致它们出现在意想不到的地方。但是,通过系统地思考坐标系之间的转换,您可以可靠地进行正确的转换。
Geometrically, a coordinate system, or coordinate frame, consists of an origin and a basis—a set of three vectors. Orthonormal bases are so convenient that we’ll normally assume frames are orthonormal unless otherwise specified. In a frame with origin p and basis {u, v, w}, the coordinates (u, v, w) describe the point
从几何学上讲,坐标系或坐标框架由原点和基(一组三个向量)组成。正交基非常方便,因此除非另有说明,否则我们通常会假设框架是正交的。在原点为p 、基为{ u , v , w }的框架中,坐标 ( u, v, w ) 描述点
$$\mathbf{p} + u\mathbf{u} + v\mathbf{v} + w\mathbf{w}.$$
In 2D, of course, there are two basis vectors.
当然,在二维中有两个基向量。
When we store these vectors in the computer, they need to be represented in terms of some coordinate system. To get things started, we have to designate some canonical coordinate system, often called “global” or “world” coordinates, which is used to describe all other systems. In the city example, we might adopt the street grid and use the convention that the x-axis points along Main Street, the y-axis points up, and the z-axis points along Central Avenue. Then, when we write the origin and basis of the car frame in terms of these coordinates, it is clear what we mean.
当我们将这些向量存储在计算机中时,需要用某种坐标系来表示它们。首先,我们必须指定某种规范坐标系,通常称为“全局”或“世界”坐标,用于描述所有其他系统。在城市示例中,我们可能采用街道网格,并使用x轴指向主街、 y轴指向上方、 z轴指向中央大道的惯例。然后,当我们用这些坐标系写出汽车框架的原点和基准时,我们的意思就很清楚了。
In 2D, our convention is to use the point o for the origin, and x and y for the right-handed orthonormal basis vectors (Figure 7.20).
在二维中,我们的惯例是使用点o作为原点,x和y表示右手系正交基向量(图 7.20)。
In 2D, right-handed means y is counterclockwise from x.
在二维中,右手系表示y从x开始逆时针旋转。
Figure 7.20. The point p can be represented in terms of either coordinate system.
图 7.20.点p可以用任一坐标系来表示。
Another coordinate system might have an origin e and right-handed orthonormal basis vectors u and v. Note that typically the canonical data o, x, and y are never stored explicitly. They are the frame-of-reference for all other coordinate systems. In that coordinate system, we often write down the location of p as an ordered pair, which is shorthand for a full vector expression:
另一个坐标系可能有一个原点e和右手正交基向量u和v 。请注意,通常不会显式存储规范数据o 、 x和y 。它们是所有其他坐标系的参考系。在该坐标系中,我们经常将p的位置写成有序对,这是完整向量表达式的简写:
$$\mathbf{p} = (x_p, y_p) \equiv \mathbf{o} + x_p\mathbf{x} + y_p\mathbf{y}.$$
For example, in Figure 7.20, (xp, yp) = (2.5, 0.9) . Note that the pair (xp, yp) implicitly assumes the origin o. Similarly, we can express p in terms of another equation:
例如,在图 7.20中,( x p , y p ) = (2.5, 0.9)。请注意,对 ( x p , y p ) 隐式假设了原点o 。类似地,我们可以用另一个方程来表示p :
$$\mathbf{p} = (u_p, v_p) \equiv \mathbf{e} + u_p\mathbf{u} + v_p\mathbf{v}.$$
In Figure 7.20, this has (up, vp) = (0.5, –0.7). Again, the origin e is left as an implicit part of the coordinate system associated with u and v.
在图 7.20中, ( u p , v p ) = (0.5, – 0.7)。同样,原点e保留为与u和v关联的坐标系的隐式部分。
We can express this same relationship using matrix machinery, like this:
我们可以使用矩阵机制来表达同样的关系,如下所示:
$$\begin{bmatrix} x_p \\ y_p \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & x_e \\ 0 & 1 & y_e \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_u & x_v & 0 \\ y_u & y_v & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u_p \\ v_p \\ 1 \end{bmatrix} = \begin{bmatrix} x_u & x_v & x_e \\ y_u & y_v & y_e \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u_p \\ v_p \\ 1 \end{bmatrix}.$$
Note that this assumes we have the point e and vectors u and v stored in canonical coordinates; the (x, y)-coordinate system is the first among equals. In terms of the basic types of transformations we’ve discussed in this chapter, this is a rotation (involving u and v) followed by a translation (involving e). Looking at the matrix for the rotation and translation together, you can see it’s very easy to write down: we just put u, v, and e into the columns of a matrix, with the usual [0 0 1] in the third row. To make this even clearer, we can write the matrix like this:
请注意,这假设我们将点e和向量u和v存储在标准坐标中;( x, y ) 坐标系是同类坐标系中的第一个。就本章讨论的基本变换类型而言,这是一个旋转(涉及u和v ),然后是一个平移(涉及e )。一起查看旋转和平移的矩阵,您会发现它很容易写下来:我们只需将u 、 v和e放入矩阵的列中,第三行通常为 [0 0 1]。为了更清楚起见,我们可以像这样写矩阵:
$$\mathbf{p}_{xy} = \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{e} \\ 0 & 0 & 1 \end{bmatrix} \mathbf{p}_{uv}.$$
The name “frame-to-canonical” is based on thinking about changing the coordinates of a vector from one system to another. Thinking in terms of moving vectors around, the frame-to-canonical matrix maps the canonical frame to the (u,v) frame.
“框架到规范”这个名称是基于将向量的坐标从一个系统更改为另一个系统的思考。从移动向量的角度思考,框架到规范矩阵将规范框架映射到 ( u , v ) 框架。
We call this matrix the frame-to-canonical matrix for the (u, v) frame. It takes points expressed in the (u, v) frame and converts them to the same points expressed in the canonical frame.
我们将这个矩阵称为 ( u , v ) 框架的框架到规范矩阵。它将 ( u, v ) 框架中表达的点转换为规范框架中表达的相同点。
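The frame-to-canonical matrix, and the matrix that undoes it, can be sketched as follows. The function names and the sample frame are hypothetical, and the inverse shown is only valid under the assumption that u and v are orthonormal, so that the rotation part can be inverted by transposing it.

```python
def frame_to_canonical(e, u, v):
    """3x3 matrix with u, v, e as columns and [0 0 1] as the last row."""
    return [[u[0], v[0], e[0]],
            [u[1], v[1], e[1]],
            [0.0, 0.0, 1.0]]

def canonical_to_frame(e, u, v):
    """Inverse of frame_to_canonical for orthonormal u, v:
    translate by -e, then apply the transposed rotation."""
    return [[u[0], u[1], -(u[0] * e[0] + u[1] * e[1])],
            [v[0], v[1], -(v[0] * e[0] + v[1] * e[1])],
            [0.0, 0.0, 1.0]]

def apply(m, point):
    """Apply a 3x3 homogeneous matrix to a 2D point."""
    h = [point[0], point[1], 1.0]
    x, y, _ = (sum(row[k] * h[k] for k in range(3)) for row in m)
    return (x, y)

# Sample frame (an assumption): origin e, with u and v rotated 90 degrees
# counterclockwise from the canonical x and y.
e, u, v = (2.0, 1.0), (0.0, 1.0), (-1.0, 0.0)

p_xy = apply(frame_to_canonical(e, u, v), (3.0, 1.0))   # frame -> canonical
p_uv = apply(canonical_to_frame(e, u, v), p_xy)         # and back again
```

Here `p_xy` equals e + 3u + 1v, and converting it back recovers the original frame coordinates (3, 1).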
To go in the other direction, we have
为了实现另一个方向,我们
$$\begin{bmatrix} u_p \\ v_p \\ 1 \end{bmatrix} = \begin{bmatrix} x_u & y_u & 0 \\ x_v & y_v & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -x_e \\ 0 & 1 & -y_e \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_p \\ y_p \\ 1 \end{bmatrix}.$$
This is a translation followed by a rotation; they are the inverses of the rotation and translation we used to build the frame-to-canonical matrix, and when multiplied together, they produce the inverse of the frame-to-canonical matrix, which is (not surprisingly) called the canonical-to-frame matrix:
这是平移后跟旋转;它们是我们用来构建框架到规范矩阵的旋转和平移的逆,当将它们相乘时,它们会产生框架到规范矩阵的逆,该矩阵(毫不奇怪地)被称为规范到框架矩阵:
$$\mathbf{p}_{uv} = \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{e} \\ 0 & 0 & 1 \end{bmatrix}^{-1} \mathbf{p}_{xy}.$$
The canonical-to-frame matrix takes points expressed in the canonical frame and converts them to the same points expressed in the (u,v) frame. We have written this matrix as the inverse of the frame-to-canonical matrix because it can’t immediately be written down using the canonical coordinates of e, u, and v. But remember that all coordinate systems are equivalent; it’s only our convention of storing vectors in terms of x- and y-coordinates that creates this seeming asymmetry. The canonical-to-frame matrix can be expressed simply in terms of the (u, v) coordinates of o, x,and y:
标准到框架矩阵取标准框架中表示的点,并将它们转换为 ( u , v ) 框架中表示的相同点。我们将此矩阵写为框架到标准矩阵的逆,因为它不能立即用e 、 u和v的标准坐标写下来。但请记住,所有坐标系都是等价的;只是我们用x和y坐标存储向量的惯例造成了这种表面上的不对称。标准到框架矩阵可以简单地用o 、 x和y的 ( u , v ) 坐标来表示:
$$\mathbf{p}_{uv} = \begin{bmatrix} \mathbf{x}_{uv} & \mathbf{y}_{uv} & \mathbf{o}_{uv} \\ 0 & 0 & 1 \end{bmatrix} \mathbf{p}_{xy}.$$
All these ideas work strictly analogously in 3D, where we have
所有这些想法在三维空间中都是严格类似的,我们有
$$\mathbf{p}_{xyz} = \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{w} & \mathbf{e} \\ 0 & 0 & 0 & 1 \end{bmatrix} \mathbf{p}_{uvw},$$
and
和
$$\mathbf{p}_{uvw} = \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{w} & \mathbf{e} \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1} \mathbf{p}_{xyz}.$$
Can’t I just hardcode transforms rather than use the matrix formalisms?
我不能只硬编码转换而不是使用矩阵形式吗?
Yes, but in practice it is harder to derive, harder to debug, and not any more efficient. Also, all current graphics APIs use this matrix formalism so it must be understood even to use graphics libraries.
是的,但实际上它更难推导,更难调试,而且效率也不高。此外,所有当前图形 API 都使用这种矩阵形式,因此即使使用图形库也必须理解它。
The bottom row of the matrix is always (0,0,0,1). Do I have to store it?
矩阵的底行始终为 (0,0,0,1)。我必须存储它吗?
You do not have to store it unless you include perspective transforms (Chapter 8).
除非包含透视变换(第 8 章),否则您不必存储它。
The derivation of the transformation properties of normals is based on Properties of Surface Normal Transformations (Turkowski, 1990). In many treatments through the mid-1990s, vectors were represented as row vectors and premultiplied, e.g., $\mathbf{b} = \mathbf{a}\mathbf{M}$. In our notation, this would be $\mathbf{b}^T = \mathbf{a}^T\mathbf{M}^T$. If you want to find a rotation matrix $\mathbf{R}$ that takes one vector $\mathbf{a}$ to a vector $\mathbf{b}$ of the same length, $\mathbf{b} = \mathbf{R}\mathbf{a}$, you could use two rotations constructed from orthonormal bases. A more efficient method is given in Efficiently Building a Matrix to Rotate One Vector to Another (Akenine-Möller, Haines, & Hoffman, 2008).
法线变换性质的推导基于《表面法线变换性质》 (Turkowski,1990 年)。在 20 世纪 90 年代中期的许多处理中,向量被表示为行向量并预乘,例如b = aM 。在我们的符号中,这将是b T = a T M T 。如果您想找到一个旋转矩阵R ,将一个向量a转换为长度相同的向量b : b = Ra ,您可以使用由正交基构造的两个旋转。在《有效地构建一个矩阵以将一个向量旋转到另一个向量》 (Akenine-Möller、Haines 和 Hoffman,2008 年)中给出了一种更有效的方法。
1. Write down the 4 × 4 3D matrix to move by (xm, ym, zm).
1.写下要移动的 4×4 三维矩阵( xm , ym , zm )。
2. Write down the 4 × 4 3D matrix to rotate by an angle θ about the y-axis.
2.写下 4 × 4 三维矩阵,绕y轴旋转角度θ 。
3. Write down the 4 × 4 3D matrix to scale an object by 50% in all directions.
3.写下 4×4 3D 矩阵,将对象在所有方向上缩放 50%。
4. Write the 2D rotation matrix that rotates by 90° clockwise.
4.写出顺时针旋转90°的二维旋转矩阵。
5. Write the matrix from Exercise 4 as a product of three shear matrices.
5.将练习 4 中的矩阵写为三个剪切矩阵的乘积。
6. Find the inverse of the rigid body transformation:
6.求出刚体变换的逆:
where R is a 3 × 3 rotation matrix and t is a 3-vector.
其中R是 3×3 旋转矩阵, t是 3 向量。
7. Show that the inverse of the matrix for an affine transformation (one that has all zeros in the bottom row except for a one in the lower right entry) also has the same form.
7.证明仿射变换矩阵的逆(除了右下方的元素为 1 以外,底行其他元素均为 0)也具有相同的形式。
8. Describe in words what this 2D transform matrix does:
8.用文字描述这个二维变换矩阵的作用:
9. Write down the 3 × 3 matrix that rotates a 2D point by angle θ about a point p = (xp, yp) .
9.写下 3 × 3 矩阵,该矩阵围绕点p = ( x p , y p ) 旋转角度θ 。
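10. Write down the 4 × 4 rotation matrix M that takes the orthonormal 3D vectors $\mathbf{u} = (x_u, y_u, z_u)$, $\mathbf{v} = (x_v, y_v, z_v)$, and $\mathbf{w} = (x_w, y_w, z_w)$ to the orthonormal 3D vectors $\mathbf{a} = (x_a, y_a, z_a)$, $\mathbf{b} = (x_b, y_b, z_b)$, and $\mathbf{c} = (x_c, y_c, z_c)$, so that $\mathbf{M}\mathbf{u} = \mathbf{a}$, $\mathbf{M}\mathbf{v} = \mathbf{b}$, and $\mathbf{M}\mathbf{w} = \mathbf{c}$.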
10.写下 4 × 4 旋转矩阵,将正交三维向量u = ( x u , y u , z u )、 v = ( x v , y v , z v ) 和w = ( x w , y w , z w ) 转换为正交三维向量a = ( x a , y a , z a )、 b = ( x b , y b , z b ) 和c = ( x c , y c , z c ),因此M u = a 、 M v = b和M w = c 。
11. What is the inverse matrix for the answer to the previous problem?
11.上一个问题答案的逆矩阵是什么?
In the previous chapter, we saw how to use matrix transformations as a tool for arranging geometric objects in 2D or 3D space. A second important use of geometric transformations is in moving objects between their 3D locations and their positions in a 2D view of the 3D world. This 3D to 2D mapping is called a viewing transformation, and it plays an important role in object-order rendering, in which we need to rapidly find the image-space location of each object in the scene.
在上一章中,我们了解了如何使用矩阵变换作为在二维或三维空间中排列几何对象的工具。几何变换的第二个重要用途是在三维位置和三维世界的二维视图中移动对象。这种三维到二维的映射称为视点变换,它在对象顺序渲染中起着重要作用,在渲染中我们需要快速找到场景中每个对象的图像空间位置。
When we studied ray tracing in Chapter 4, we covered the different types of perspective and orthographic views and how to generate viewing rays according to any given view. This chapter is about the inverse of that process. Here, we explain how to use matrix transformations to express any parallel or perspective view. The transformations in this chapter project 3D points in the scene (world space) to 2D points in the image (image space), and they will project any point on a given pixel’s viewing ray back to that pixel’s position in image space.
当我们在第 4 章学习光线追踪时,我们介绍了不同类型的透视和正交视图以及如何根据任何给定的视图生成视线。本章介绍该过程的逆过程。在这里,我们解释如何使用矩阵变换来表达任何平行或透视视图。本章中的变换将场景(世界空间)中的 3D 点投影到图像(图像空间)中的 2D 点,并且它们会将给定像素的视线上的任何点投影回该像素在图像空间中的位置。
If you have not looked at it recently, it is advisable to review the discussion of perspective and ray generation in Chapter 4 before reading this chapter.
如果您最近没有看过,建议您在阅读本章之前先回顾一下第 4 章中有关透视和射线生成的讨论。
By itself, the ability to project points from the world to the image is only good for producing wireframe renderings—renderings in which only the edges of objects are drawn, and closer surfaces do not occlude more distant surfaces (Figure 8.1). Just as a ray tracer needs to find the closest surface intersection along each viewing ray, an object-order renderer displaying solid-looking objects has to work out which of the (possibly many) surfaces drawn at any given point on the screen is closest and display only that one. In this chapter, we assume we are drawing a model consisting only of 3D line segments that are specified by the (x, y, z) coordinates of their two endpoints. Later chapters will discuss the machinery needed to produce renderings of solid surfaces.
就其本身而言,将点从现实世界投影到图像的能力仅适用于生成线框渲染 - 在这种渲染中,只绘制物体的边缘,并且较近的表面不会遮挡较远的表面(图 8.1 )。就像光线追踪器需要找到沿每条视线的最近表面交点一样,显示立体物体的对象顺序渲染器必须确定在屏幕上任何给定点绘制的(可能有很多)表面中哪一个最近,并只显示那个。在本章中,我们假设我们正在绘制一个仅由 3D 线段组成的模型,这些线段由其两个端点的( x,y,z )坐标指定。后面的章节将讨论生成立体表面渲染所需的机制。
Figure 8.1. (a) Wireframe cube in orthographic projection. (b) Wireframe cube in perspective projection. (c) Perspective projection with hidden lines removed.
图 8.1。 (a) 正交投影中的线框立方体。 (b) 透视投影中的线框立方体。 (c) 删除隐藏线的透视投影。
The viewing transformation has the job of mapping 3D locations, represented as (x, y, z) coordinates in the canonical coordinate system, to coordinates in the image, expressed in units of pixels. It is a complicated beast that depends on many different things, including the camera position and orientation, the type of projection, the field of view, and the resolution of the image. As with all complicated transformations, it is best approached by breaking it up into a product of several simpler transformations. Most graphics systems do this by using a sequence of three transformations:
观察变换的任务是将标准坐标系中表示为 ( x, y, z ) 坐标的 3D 位置映射到图像中以像素为单位的坐标。这是一个复杂的过程,取决于许多不同的因素,包括相机的位置和方向、投影类型、视野和图像分辨率。与所有复杂的变换一样,最好将其分解为几个更简单的变换的乘积。大多数图形系统通过使用三个变换序列来实现这一点:
A camera transformation or eye transformation, which is a rigid body transformation that places the camera at the origin in a convenient orientation. It depends only on the position and orientation, or pose, of the camera.
相机变换或眼睛变换,这是一种刚体变换,将相机以方便的方向放置在原点。它仅取决于相机的位置和方向或姿势。
A projection transformation, which projects points from camera space so that all visible points fall in the range –1 to 1 in x and y. It depends only on the type of projection desired.
投影变换,从相机空间投影点,使得所有可见点在x和y方向上落在-1到 1 的范围内。它仅取决于所需的投影类型。
A viewport transformation or windowing transformation, which maps this unit image rectangle to the desired rectangle in pixel coordinates. It depends only on the size and position of the output image.
一个视口变换或窗口变换,将单位图像矩形映射到像素坐标中的所需矩形。它仅取决于输出图像的大小和位置。
Some APIs use “viewing transformation” for just the piece of our viewing transformation that we call the camera transformation.
一些 API 使用“视图变换”仅表示我们称之为相机变换的视图变换部分。
To make it easy to describe the stages of the process (Figure 8.2), we give names to the coordinate systems that are the inputs and output of these transformations.
为了便于描述该过程的各个阶段(图 8.2 ),我们为这些变换的输入和输出的坐标系命名。
Figure 8.2. The sequence of spaces and transformations that gets objects from their original coordinates into screen space.
图 8.2.将对象从其原始坐标移到屏幕空间的空间和变换序列。
The camera transformation converts points in canonical coordinates (or world space) to camera coordinates or places them in camera space. The projection transformation moves points from camera space to the canonical view volume. Finally, the viewport transformation maps the canonical view volume to screen space.
相机变换将标准坐标(或世界空间)中的点转换为相机坐标或将它们放置在相机空间中。投影变换将点从相机空间移动到规范视体积。最后,视口变换将规范视体积映射到屏幕空间。
Each of these transformations is individually quite simple. We’ll discuss them in detail for the orthographic case beginning with the viewport transformation and then cover the changes required to support perspective projection.
这些变换中的每一个都非常简单。我们将从视口变换开始,详细讨论正交情况,然后介绍支持透视投影所需的更改。
Other names: camera space is also “eye space,” and the camera transformation is sometimes the “viewing transformation;” the canonical view volume is also “clip space” or “normalized device coordinates;” screen space is also “pixel coordinates.”
其他名称:相机空间也称为“眼睛空间”,相机变换有时也称为“观看变换”;规范视图体积也称为“剪辑空间”或“标准化设备坐标”;屏幕空间也称为“像素坐标”。
We begin with a problem whose solution will be reused for any viewing condition. We assume that the geometry we want to view is in the canonical view volume, and we wish to view it with an orthographic camera looking in the –z direction. The canonical view volume is the cube containing all 3D points whose Cartesian coordinates are between –1 and +1, that is, $(x, y, z) \in [-1, 1]^3$ (Figure 8.3). We project x = –1 to the left side of the screen, x = +1 to the right side of the screen, y = –1 to the bottom of the screen, and y = +1 to the top of the screen.
我们从一个问题开始,该问题的解决方案可在任何查看条件下重复使用。我们假设要查看的几何体位于规范视图体积中,并且我们希望使用朝 -z方向观察的正交相机来查看它。规范视图体积是一个立方体,其中包含所有笛卡尔坐标介于-1和 +1 之间的 3D 点,即 ( x, y, z ) ∈ [ - 1, 1] 3 (图 8.3 )。我们将x = -1投影到屏幕左侧, x = +1 投影到屏幕右侧, y = -1 投影到屏幕底部, y = +1 投影到屏幕顶部。
The word “canonical” crops up again—it means something arbitrarily chosen for convenience. For instance, the unit circle could be called the “canonical circle.”
“正则”这个词又出现了——它意味着为了方便而任意选择的事物。例如,单位圆可以称为“正则圆”。
Recall the conventions for pixel coordinates from Chapter 3: each pixel “owns” a unit square centered at integer coordinates; the image boundaries have a half-unit overshoot from the pixel centers; and the smallest pixel center coordinates are (0, 0). If we are drawing into an image (or window on the screen) that has $n_x$ by $n_y$ pixels, we need to map the square $[-1, 1]^2$ to the rectangle $[-0.5, n_x - 0.5] \times [-0.5, n_y - 0.5]$.
回想一下第 3 章中关于像素坐标的约定:每个像素“拥有”一个以整数坐标为中心的单位正方形;图像边界与像素中心有半个单位的偏差;最小像素中心坐标为 (0, 0)。如果我们要绘制一个有n x x n y像素的图像(或屏幕上的窗口),我们需要将正方形 [ – 1, 1] 2映射到矩形 [ – 0.5 ,n x – 0.5] × [ – 0.5 ,n y – 0.5]。
Mapping a square to a potentially non-square rectangle is not a problem; x and y just end up with different scale factors going from canonical to pixel coordinates.
将正方形映射到可能非正方形的矩形不是问题; x和y只是从标准坐标到像素坐标有不同的比例因子。
For now, we will assume that all line segments to be drawn are completely inside the canonical view volume. Later, we will relax that assumption when we discuss clipping.
现在,我们假设所有要绘制的线段都完全在标准视体积内。稍后,当我们讨论剪裁时,我们将放宽这一假设。
Since the viewport transformation maps one axis-aligned rectangle to another, it is a case of the windowing transform given by Equation (7.6):
由于视口变换将一个轴对齐矩形映射到另一个轴对齐矩形,因此它是公式 (7.6) 给出的窗口变换的一种情况:
$$\begin{bmatrix} x_\text{screen} \\ y_\text{screen} \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{n_x}{2} & 0 & \frac{n_x - 1}{2} \\ 0 & \frac{n_y}{2} & \frac{n_y - 1}{2} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_\text{canonical} \\ y_\text{canonical} \\ 1 \end{bmatrix}.$$
Note that this matrix ignores the z-coordinate of the points in the canonical view volume, because a point’s distance along the projection direction doesn’t affect where that point projects in the image. But before we officially call this the viewport matrix, we add a row and column to carry along the z-coordinate without changing it. We don’t need it in this chapter, but eventually, we will need the z values because they can be used to make closer surfaces hide more distant surfaces (see Section 9.2.3).
请注意,此矩阵忽略了标准视点体积中点的z坐标,因为点沿投影方向的距离不会影响该点在图像中的投影位置。但在正式将其称为视口矩阵之前,我们添加了一行和一列来承载z坐标而不改变它。我们在本章中不需要它,但最终我们将需要z值,因为它们可用于使较近的表面隐藏较远的表面(参见第 9.2.3 节)。
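A minimal sketch of the resulting 4 × 4 viewport matrix, assuming the pixel conventions stated above (the square from –1 to 1 maps to the rectangle from –0.5 to n_x – 0.5 and from –0.5 to n_y – 0.5, with z carried along unchanged). The function names `viewport_matrix` and `transform` are illustrative assumptions.

```python
def viewport_matrix(nx, ny):
    """Map canonical x, y in [-1, 1] to pixel coordinates; leave z unchanged."""
    return [[nx / 2.0, 0.0, 0.0, (nx - 1) / 2.0],
            [0.0, ny / 2.0, 0.0, (ny - 1) / 2.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform(m, p):
    """Apply a 4x4 affine matrix to the 3D point p (w is taken to be 1)."""
    h = [p[0], p[1], p[2], 1.0]
    return tuple(sum(row[k] * h[k] for k in range(4)) for row in m)[:3]

M_vp = viewport_matrix(640, 480)
lower_left = transform(M_vp, (-1.0, -1.0, 0.25))
upper_right = transform(M_vp, (1.0, 1.0, 0.25))
```

For a 640 × 480 image, the two canonical corners land on the outer edges of the corner pixels, half a unit beyond the extreme pixel centers.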
Figure 8.3. The canonical view volume is a cube with side of length two centered at the origin.
图 8.3。标准视体积是一个以原点为中心,边长为 2 的立方体。
Of course, we usually want to render geometry in some region of space other than the canonical view volume. Our first step in generalizing the view will keep the view direction and orientation fixed looking along – z with +y up, but will allow arbitrary rectangles to be viewed. Rather than replacing the viewport matrix, we’ll augment it by multiplying it with another matrix on the right.
当然,我们通常希望在除标准视区之外的某个空间区域渲染几何图形。我们概括视图的第一步是保持视图方向和方向固定,即沿-z 方向看,+ y方向向上,但允许查看任意矩形。我们不会替换视口矩阵,而是通过将其与右侧的另一个矩阵相乘来增强它。
Figure 8.4. The orthographic view volume.
图 8.4.正交视图体积。
Under these constraints, the view volume is an axis-aligned box, and we’ll name the coordinates of its sides so that the view volume is [l, r] × [b, t] × [f, n] shown in Figure 8.4. We call this box the orthographic view volume and refer to the bounding planes as follows:
在这些约束下,视景体是一个轴对齐的盒子,我们将其边的坐标命名为 [ l, r ] × [ b, t ] × [ f, n ],如图 8.4所示。我们称这个盒子为正交视图体积并参考边界平面如下:
x = l ≡ left plane,
x = r ≡ right plane,
y = b ≡ bottom plane,
y = t ≡ top plane,
z = n ≡ near plane,
z = f ≡ far plane.
Figure 8.5. The orthographic view volume is along the negative z-axis, so f is a more negative number than n; thus, n > f.
图 8.5。正交视图体积沿着负z轴,因此f比n更负;因此, n > f 。
That vocabulary assumes a viewer who is looking along the minus z-axis with his head pointing in the y-direction.1 This implies that n > f, which may be unintuitive, but if you assume the entire orthographic view volume has negative z values, then the z = n “near” plane is closer to the viewer if and only if n > f ; here, f is a smaller number than n, i.e., a negative number of larger absolute value than n.
该词汇表假设观看者沿着负 z轴观看,其头部指向y方向。1这意味着n > f ,这可能不直观,但如果你假设整个正交视图体积具有负z值,则当且仅当n > f时, z = n “近”平面更靠近观看者;这里, f是一个比n更小的数,即,一个绝对值大于n的负数。
This concept is shown in Figure 8.5. The transform from orthographic view volume to the canonical view volume is another windowing transform, so we can simply substitute the bounds of the orthographic and canonical view volumes into Equation (7.7) to obtain the matrix for this transformation:
这个概念如图 8.5所示。从正交视点体积到标准视点体积的变换是另一个窗口变换,因此我们可以简单地将正交视点体积和标准视点体积的边界代入公式 (7.7) 中,以获得此变换的矩阵:
$$\mathbf{M}_\text{orth} = \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\ 0 & 0 & \frac{2}{n-f} & -\frac{n+f}{n-f} \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
This matrix is very close to the one used traditionally in OpenGL, except that n, f, and zcanonical all have the opposite sign.
这个矩阵与 OpenGL 中传统使用的矩阵非常接近,只是n 、 f和z 的标准符号都相反。
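The windowing transform from the orthographic view volume to the canonical view volume can be sketched as below. It assumes the convention used here, in which the box is [l, r] × [b, t] × [f, n] with n > f, so that z = n maps to +1 and z = f maps to –1. The function names and sample bounds are illustrative assumptions.

```python
def ortho_matrix(l, r, b, t, n, f):
    """Scale and translate the box [l, r] x [b, t] x [f, n] onto [-1, 1]^3."""
    return [[2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
            [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
            [0.0, 0.0, 2.0 / (n - f), -(n + f) / (n - f)],
            [0.0, 0.0, 0.0, 1.0]]

def transform(m, p):
    """Apply a 4x4 affine matrix to the 3D point p (w is taken to be 1)."""
    h = [p[0], p[1], p[2], 1.0]
    return tuple(sum(row[k] * h[k] for k in range(4)) for row in m)[:3]

# Sample view volume on the negative z-axis, so n > f.
M_orth = ortho_matrix(l=-2.0, r=2.0, b=-1.0, t=1.0, n=-1.0, f=-5.0)
near_corner = transform(M_orth, (-2.0, -1.0, -1.0))
far_corner = transform(M_orth, (2.0, 1.0, -5.0))
```

The left-bottom-near corner of the box maps to (–1, –1, +1) and the right-top-far corner to (+1, +1, –1).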
1 Most programmers find it intuitive to have the x-axis pointing right and the y-axis pointing up. In a right-handed coordinate system, this implies that we are looking in the –z direction. Some systems use a left-handed coordinate system for viewing so that the gaze direction is along +z. Which is best is a matter of taste, and this text assumes a right-handed coordinate system. A reference that argues for the left-handed system instead is given in the notes at the end of this chapter.
1大多数程序员都认为x轴指向右, y轴指向上是直观的。在右手坐标系中,这意味着我们正朝着-z方向看。有些系统使用左手坐标系进行查看,因此注视方向是沿着 + z方向。哪种方式最好取决于个人喜好,本文假设采用右手坐标系。本章末尾的注释中给出了一个支持左手坐标系的参考资料。
To draw 3D line segments in the orthographic view volume, we project them into screen x-and y-coordinates and ignore z-coordinates. We do this by combining Equations (8.2) and (8.3). Note that in a program, we multiply the matrices together to form one matrix and then manipulate points as follows:
为了在正交视图体积中绘制 3D 线段,我们将它们投影到屏幕x和y坐标中并忽略z坐标。我们通过组合方程 (8.2) 和 (8.3) 来实现这一点。请注意,在程序中,我们将矩阵相乘以形成一个矩阵,然后按如下方式操作点:
$$\begin{bmatrix} x_\text{pixel} \\ y_\text{pixel} \\ z_\text{canonical} \\ 1 \end{bmatrix} = (\mathbf{M}_\text{vp}\mathbf{M}_\text{orth}) \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}.$$
The z-coordinate will now be in [–1, 1]. We don’t take advantage of this now, but it will be useful when we examine z-buffer algorithms.
z坐标现在位于 [ - 1, 1] 中。我们现在不利用这一点,但在检查 z 缓冲区算法时它将很有用。
The code to draw many 3D lines with endpoints ai and bi thus becomes both simple and efficient:
这样,绘制多条端点为a和b 的3D 线的代码就变得简单而高效:
This is a first example of how matrix transformation machinery makes graphics programs clean and efficient.
这是矩阵变换机制如何使图形程序清晰、高效的第一个例子。
construct M_vp
construct M_orth
M = M_vp M_orth
for each line segment (a_i, b_i) do
    p = M a_i
    q = M b_i
    drawline(x_p, y_p, x_q, y_q)
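The pseudocode above could be made runnable along the following lines. This is only a sketch: the matrix helpers are assumed implementations of the viewport and orthographic windowing transforms, and since no real `drawline` is available here, the projected endpoints are collected in a list instead of being drawn.

```python
def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def viewport_matrix(nx, ny):
    return [[nx / 2.0, 0.0, 0.0, (nx - 1) / 2.0],
            [0.0, ny / 2.0, 0.0, (ny - 1) / 2.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def ortho_matrix(l, r, b, t, n, f):
    return [[2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
            [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
            [0.0, 0.0, 2.0 / (n - f), -(n + f) / (n - f)],
            [0.0, 0.0, 0.0, 1.0]]

def transform(m, p):
    """Apply a 4x4 affine matrix to the 3D point p (w is taken to be 1)."""
    h = [p[0], p[1], p[2], 1.0]
    return tuple(sum(row[k] * h[k] for k in range(4)) for row in m)[:3]

def project_segments(segments, nx, ny, bounds):
    """Return (x_p, y_p, x_q, y_q) in pixel coordinates for each segment."""
    # The composite matrix is built once, outside the loop.
    M = matmul(viewport_matrix(nx, ny), ortho_matrix(*bounds))
    lines = []
    for a, b in segments:
        xp, yp, _ = transform(M, a)
        xq, yq, _ = transform(M, b)
        lines.append((xp, yp, xq, yq))   # stand-in for drawline(...)
    return lines

# One diagonal segment spanning the whole view volume (sample values).
bounds = (-2.0, 2.0, -1.0, 1.0, -1.0, -5.0)   # l, r, b, t, n, f
lines = project_segments([((-2.0, -1.0, -1.0), (2.0, 1.0, -5.0))], 4, 2, bounds)
```

Building the composite matrix once means each endpoint costs a single matrix-vector product, however many transformations are chained.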
We’d like to be able to change the viewpoint in 3D and look in any direction. There are a multitude of conventions for specifying viewer position and orientation. We will use the following one (see Figure 8.6):
我们希望能够在 3D 中改变视点并朝任何方向看。有许多用于指定观察者位置和方向的约定。我们将使用以下约定(参见图 8.6 ):
Figure 8.6. The user specifies viewing as an eye position e, a gaze direction g, and an up vector t. We construct a right-handed basis with w pointing opposite to the gaze and v being in the same plane as g and t.
图 8.6。用户将视线指定为眼位e 、注视方向g和向上向量t 。我们构建一个右手系基,其中w指向注视方向的反方向, v与g和t位于同一平面。
the eye position e,
眼位e ,
the gaze direction g,
注视方向g ,
the view-up vector t.
视线向上向量t 。
The eye position is a location that the eye “sees from.” If you think of graphics as a photographic process, it is the center of the lens. The gaze direction is any vector in the direction that the viewer is looking. The view-up vector is any vector in the plane that both bisects the viewer’s head into right and left halves and points “to the sky” for a person standing on the ground. These vectors provide us with enough information to set up a coordinate system with origin e and a uvw basis, using the construction of Section 2.4.7:
眼位是眼睛“看见”的位置。如果你将图形视为摄影过程,那么它就是镜头的中心。注视方向是观察者注视方向上的任意向量。视线向上向量是平面上的任意向量,它既将观察者的头部一分为二,又指向站在地面上的人的“天空”。这些向量为我们提供了足够的信息,可以使用第 2.4.7 节的构造,建立一个以原点e和uvw为基的坐标系:
$$\mathbf{w} = -\frac{\mathbf{g}}{\|\mathbf{g}\|}, \qquad \mathbf{u} = \frac{\mathbf{t} \times \mathbf{w}}{\|\mathbf{t} \times \mathbf{w}\|}, \qquad \mathbf{v} = \mathbf{w} \times \mathbf{u}.$$
Figure 8.7. For arbitrary viewing, we need to change the points to be stored in the “appropriate” coordinate system. In this case, it has origin e and offset coordinates in terms of uvw.
图 8.7。为了任意查看,我们需要将点更改为存储在“适当”的坐标系中。在本例中,它具有原点e和uvw的偏移坐标。
Our job would be done if all points we wished to transform were stored in coordinates with origin e and basis vectors u, v, and w. But as shown in Figure 8.7, the coordinates of the model are stored in terms of the canonical (or world) origin o and the x-, y-, and z-axes. To use the machinery we have already developed, we just need to convert the coordinates of the line segment endpoints we wish to draw from xyz-coordinates into uvw-coordinates. This kind of transformation was discussed in Section 7.5, and the matrix that enacts this transformation is the canonical-to-frame matrix of the camera’s coordinate frame:
如果所有需要变换的点都存储在原点为e 、基向量为u 、 v和w的坐标系中,那么我们的工作就完成了。但是如图 8.7所示,模型的坐标是以标准(或世界)原点o和x 、 y和z轴为单位存储的。要使用我们已经开发的机制,我们只需将要绘制的线段端点的坐标从xyz坐标系转换为uvw坐标系即可。这种变换已在7.5 节中讨论过,实现这种变换的矩阵是相机坐标系的标准到基矩阵:
$$\mathbf{M}_\text{cam} = \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{w} & \mathbf{e} \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} x_u & y_u & z_u & 0 \\ x_v & y_v & z_v & 0 \\ x_w & y_w & z_w & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -x_e \\ 0 & 1 & 0 & -y_e \\ 0 & 0 & 1 & -z_e \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Alternatively, we can think of this same transformation as first moving e to the origin, then aligning u, v, w to x, y, z.
或者,我们可以将同样的变换想象为首先将e移动到原点,然后将u , v , w与x , y , z对齐。
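As a sketch of this construction, the code below builds an orthonormal camera basis from e, g, and t, assuming w points opposite the gaze, u is perpendicular to t and w, and v completes the right-handed basis. It then forms the camera matrix as "move e to the origin, then align u, v, w with x, y, z". The function names and sample camera are assumptions, and t is assumed not to be parallel to g.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def camera_matrix(e, g, t):
    """World-to-camera matrix for eye e, gaze g, and view-up vector t."""
    w = normalize(tuple(-c for c in g))   # w points opposite to the gaze
    u = normalize(cross(t, w))            # u is perpendicular to t and w
    v = cross(w, u)                       # v completes the right-handed basis
    # Rows hold the basis vectors (the transposed rotation); the last column
    # is that rotation applied to -e, i.e. the translation folded in.
    rows = [list(axis) + [-dot(axis, e)] for axis in (u, v, w)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]

def transform(m, p):
    """Apply a 4x4 affine matrix to the 3D point p (w is taken to be 1)."""
    h = [p[0], p[1], p[2], 1.0]
    return tuple(sum(row[k] * h[k] for k in range(4)) for row in m)[:3]

# Sample camera: at (1, 2, 3), looking along world +x, with world +y up.
e, g, t = (1.0, 2.0, 3.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
M_cam = camera_matrix(e, g, t)
eye_in_camera = transform(M_cam, e)                   # the eye itself
ahead_in_camera = transform(M_cam, (2.0, 2.0, 3.0))   # one unit along the gaze
```

In camera coordinates the eye sits at the origin, and a point one unit along the gaze direction lands at (0, 0, –1), consistent with a camera that looks down the negative z-axis.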
To make our previously z-axis-only viewing algorithm work for cameras with any location and orientation, we just need to add this camera transformation to the product of the viewport and projection transformations, so that it converts the incoming points from world to camera coordinates before they are projected:
为了使我们之前的仅z轴的查看算法适用于任何位置和方向的相机,我们只需要将此相机变换添加到视口和投影变换的乘积中,以便它在投影之前将传入的点从世界坐标转换为相机坐标:
construct M_vp
construct M_orth
construct M_cam
M = M_vp M_orth M_cam
for each line segment (a_i, b_i) do
    p = M a_i
    q = M b_i
    drawline(x_p, y_p, x_q, y_q)
Again, almost no code is needed once the matrix infrastructure is in place.
再次,一旦矩阵基础设施到位,几乎不需要任何代码。
We have left perspective for last because it takes a little bit of cleverness to make it fit into the system of vectors and matrix transformations that has served us so well up to now. To see what we need to do, let’s look at what the perspective projection transformation needs to do with points in camera space. Recall that the viewpoint is positioned at the origin and the camera is looking along the z-axis.
我们把透视放在最后,因为要让它适应迄今为止一直为我们服务的向量和矩阵变换系统,需要一点小技巧。要了解我们需要做什么,让我们看看透视投影变换需要对相机空间中的点做什么。回想一下,视点位于原点,相机沿着z轴观察。
For the moment, we will ignore the sign of z to keep the equations simpler, but it will return on page 168.
目前,我们将忽略z的符号以使方程更简单,但它将在第 168 页返回。
The key property of perspective is that the size of an object on the screen is proportional to 1/z for an eye at the origin looking up the negative z-axis. This can be expressed more precisely in an equation for the geometry in Figure 8.8:
透视的关键特性是,对于位于原点、朝负 z 轴方向看的眼睛,屏幕上物体的大小与 1 /z成正比。这可以用图 8.8中的几何方程更精确地表达:
$$y_s = \frac{d}{z}\,y, \tag{8.5}$$
Figure 8.8. The geometry for Equation (8.5). The viewer’s eye is at e, and the gaze direction is g (the minus z-axis). The view plane is a distance d from the eye. A point is projected toward e and where it intersects the view plane is where it is drawn.
图 8.8。方程 (8.5) 的几何形状。观察者的眼睛位于e处,注视方向为g (负z轴)。视平面与眼睛的距离为d 。一个点被投影到e处,它与视平面的交点就是绘制该点的位置。
where y is the distance of the point along the y-axis, and ys is where the point should be drawn on the screen.
其中y是该点沿y轴的距离, y s是该点应在屏幕上绘制的位置。
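A small numerical sketch of this relationship, y_s = d·y/z, shows the 1/z falloff directly. The function name is hypothetical, and z is taken as a positive distance because the sign of z is being ignored at this point.

```python
def project_to_view_plane(y, z, d):
    """Height y_s on a view plane at distance d for a point at height y, depth z."""
    if z == 0:
        raise ValueError("point lies in the plane of the eye; projection undefined")
    return d * y / z

# The same object height seen at two depths: doubling z halves y_s.
near = project_to_view_plane(1.0, 2.0, 1.0)
far = project_to_view_plane(1.0, 4.0, 1.0)
```

The division by z is what a single affine matrix cannot express, which motivates the generalization that follows.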
We would really like to use the matrix machinery we developed for orthographic projection to draw perspective images; we could then just multiply another matrix into our composite matrix and use the algorithm we already have. However, this type of transformation, in which one of the coordinates of the input vector appears in the denominator, can’t be achieved using affine transformations.
我们非常希望使用我们为正交投影开发的矩阵机制来绘制透视图像;然后我们可以将另一个矩阵乘以我们的复合矩阵并使用我们已有的算法。但是,这种类型的变换(其中输入向量的其中一个坐标出现在分母中)无法使用仿射变换实现。
We can allow for division with a simple generalization of the mechanism of homogeneous coordinates that we have been using for affine transformations. We have agreed to represent the point (x, y, z) using the homogeneous vector $[x\ y\ z\ 1]^T$; the extra coordinate, w, is always equal to 1, and this is ensured by always using [0 0 0 1] as the fourth row of an affine transformation matrix.
我们可以用一个用于仿射变换的齐次坐标机制的简单概括来实现除法。我们同意使用齐次向量 [ xyz 1] T来表示点 ( x, y, z );额外的坐标w始终等于 1,这可以通过始终使用 [0 0 0 1] T作为仿射变换矩阵的第四行来确保。
Rather than just thinking of the 1 as an extra piece bolted on to coerce matrix multiplication to implement translation, we now define it to be the denominator of the x-, y-, and z-coordinates: the homogeneous vector [x y z w]T represents the point (x/w, y/w, z/w) . This makes no difference when w = 1, but it allows a broader range of transformations to be implemented if we allow any values in the bottom row of a transformation matrix, causing w to take on values other than 1.
我们现在不再将 1 视为强制矩阵乘法实现平移的附加部分,而是将其定义为x 、 y和z坐标的分母:齐次向量 [ xyzw ] T表示点 ( x/w, y/w, z/w )。当w = 1 时,这没有区别,但如果我们允许变换矩阵底行中的任何值,则它允许实现更广泛的变换,从而使w取 1 以外的值。
Concretely, linear transformations allow us to compute expressions like
具体来说,线性变换使我们能够计算如下表达式
$$x' = ax + by + cz,$$
and affine transformations extend this to
仿射变换将其扩展为
$$x' = ax + by + cz + d.$$
Treating w as the denominator further expands the possibilities, allowing us to compute functions like
将w作为分母进一步扩展了可能性,使我们能够计算如下函数
$$x' = \frac{ax + by + cz + d}{ex + fy + gz + h};$$
this could be called a “linear rational function” of x, y,and z. But there is an extra constraint—the denominators are the same for all coordinates of the transformed point:
这可以称为x 、 y和z的“线性有理函数”。但有一个额外的限制——变换点的所有坐标的分母都是相同的:
$$x' = \frac{a_1x + b_1y + c_1z + d_1}{ex + fy + gz + h}, \qquad y' = \frac{a_2x + b_2y + c_2z + d_2}{ex + fy + gz + h}, \qquad z' = \frac{a_3x + b_3y + c_3z + d_3}{ex + fy + gz + h}.$$
Expressed as a matrix transformation,
用矩阵变换来表示,
$$\begin{bmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{z} \\ \tilde{w} \end{bmatrix} = \begin{bmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ a_3 & b_3 & c_3 & d_3 \\ e & f & g & h \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix},$$
and
和
$$(x', y', z') = (\tilde{x}/\tilde{w},\ \tilde{y}/\tilde{w},\ \tilde{z}/\tilde{w}).$$
A transformation like this is known as a projective transformation or a homography.
像这样的变换被称为射影变换或单应性变换。
Example 17 The matrix
\[ M = \begin{bmatrix} 2 & 0 & -1 \\ 0 & 3 & 0 \\ 0 & \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix} \]
represents a 2D projective transformation that transforms the unit square ($[0, 1] \times [0, 1]$) to the quadrilateral shown in Figure 8.9.
For instance, the lower-right corner of the square at (1, 0) is represented by the homogeneous vector $[1\ 0\ 1]^T$ and transforms as follows:
\[ \begin{bmatrix} 2 & 0 & -1 \\ 0 & 3 & 0 \\ 0 & \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \tfrac{1}{3} \end{bmatrix}, \]
which represents the point $(1/\tfrac{1}{3},\ 0/\tfrac{1}{3})$, or (3, 0). Note that if we use the matrix
\[ 3M = \begin{bmatrix} 6 & 0 & -3 \\ 0 & 9 & 0 \\ 0 & 2 & 1 \end{bmatrix} \]
instead, the result is $[3\ 0\ 1]^T$, which also represents (3, 0). In fact, any nonzero scalar multiple $cM$ is equivalent: the numerator and denominator are both scaled by $c$, which does not change the result.
Figure 8.9. A projective transformation maps a square to a quadrilateral, preserving straight lines but not parallel lines.
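The arithmetic in Example 17 can be checked with a few lines of code. The following is a minimal sketch in plain Python; the helper name `apply_homography` is an assumption made for this illustration and is not part of any library or of the text.

```python
def apply_homography(M, point):
    """Apply a 3x3 projective matrix M to a 2D point, then divide by w."""
    x, y = point
    h = [x, y, 1.0]  # homogeneous vector [x y 1]^T
    xt, yt, wt = (sum(M[i][j] * h[j] for j in range(3)) for i in range(3))
    return (xt / wt, yt / wt)  # homogenize: divide through by w


# The matrix M from Example 17 and its scalar multiple 3M.
M = [[2.0, 0.0, -1.0],
     [0.0, 3.0, 0.0],
     [0.0, 2.0 / 3.0, 1.0 / 3.0]]
M3 = [[3.0 * entry for entry in row] for row in M]

corner = apply_homography(M, (1.0, 0.0))          # lower-right corner of the square
corner_scaled = apply_homography(M3, (1.0, 0.0))  # same corner, using 3M
print(corner, corner_scaled)
```

Both calls give the point (3, 0) up to floating-point rounding, which illustrates that scaling the whole matrix does not change the transformation it represents.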
There is a more elegant way of expressing the same idea, which avoids treating the w-coordinate specially. In this view, a 3D projective transformation is simply a 4D linear transformation, with the extra stipulation that all scalar multiples of a vector refer to the same point:
\[ \mathbf{x} \sim \alpha\mathbf{x} \quad \text{for all } \alpha \neq 0. \]
The symbol ~ is read as “is equivalent to” and means that the two homogeneous vectors both describe the same point in space.
Figure 8.10. The point x = 1.5 is represented by any point on the line x = 1.5h, such as points at the hollow circles. However, before we interpret x as a conventional Cartesian coordinate, we first divide by h to get (x, h) = (1.5, 1) as shown by the black point.
Example 18 In 1D homogeneous coordinates, in which we use 2-vectors to represent points on the real line, we could represent the point (1.5) using the homogeneous vector $[1.5\ 1]^T$, or any other point on the line x = 1.5h in homogeneous space. (See Figure 8.10.)
In 2D homogeneous coordinates, in which we use 3-vectors to represent points in the plane, we could represent the point (−1, −0.5) using the homogeneous vector $[-2\ \ -1\ \ 2]^T$, or any other point on the line $\mathbf{x} = \alpha[-1\ \ -0.5\ \ 1]^T$. Any homogeneous vector on the line can be mapped to the line’s intersection with the plane w = 1 to obtain its Cartesian coordinates. (See Figure 8.11.)
It’s fine to transform homogeneous vectors as many times as needed, without worrying about the value of the w-coordinate; in fact, it is fine if the w-coordinate is zero at some intermediate phase. It is only when we want the ordinary Cartesian coordinates of a point that we need to normalize to an equivalent point that has w = 1, which amounts to dividing all the coordinates by w. Once we’ve done this, we are allowed to read off the (x, y, z)-coordinates from the first three components of the homogeneous vector.
Figure 8.11. A point in homogeneous coordinates is equivalent to any other point on the line through it and the origin, and normalizing the point amounts to intersecting this line with the plane w = 1.
The mechanism of projective transformations makes it simple to implement the division by z required to implement perspective. In the 2D example shown in Figure 8.8, we can implement the perspective projection with a matrix transformation as follows:
\[ \begin{bmatrix} y_s \\ 1 \end{bmatrix} \sim \begin{bmatrix} d & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} y \\ z \\ 1 \end{bmatrix}. \]
This transforms the 2D homogeneous vector $[y\ z\ 1]^T$ to the 1D homogeneous vector $[dy\ \ z]^T$, which represents the 1D point $(dy/z)$ (because it is equivalent to the 1D homogeneous vector $[dy/z\ \ 1]^T$). This matches Equation (8.5).
For the “official” perspective projection matrix in 3D, we’ll adopt our usual convention of a camera at the origin facing in the −z direction, so the distance of the point (x, y, z) is −z. As with orthographic projection, we also adopt the notion of near and far planes that limit the range of distances to be seen. In this context, we will use the near plane as the projection plane, so the image plane distance is −n.
The desired mapping is then $y_s = (n/z)y$, and similarly for $x$. This transformation can be implemented by the perspective matrix:
\[ P = \begin{bmatrix} n & 0 & 0 & 0 \\ 0 & n & 0 & 0 \\ 0 & 0 & n + f & -fn \\ 0 & 0 & 1 & 0 \end{bmatrix}. \]
Remember, $n < 0$.
The first, second, and fourth rows simply implement the perspective equation. The third row, as in the orthographic and viewport matrices, is designed to bring the z-coordinate “along for the ride” so that we can use it later for hidden surface removal. In the perspective projection, though, the addition of a non-constant denominator prevents us from actually preserving the value of z—it’s actually impossible to keep z from changing while getting x and y to do what we need them to do. Instead, we’ve opted to keep z unchanged for points on the near or far planes.
More on this later.
There are many matrices that could function as perspective matrices, and all of them nonlinearly distort the z-coordinate. This specific matrix has the nice properties shown in Figures 8.12 and 8.13; it leaves points on the (z = n)-plane entirely alone, and it keeps points on the (z = f)-plane on that plane while “squishing” them in x and y by the appropriate amount. The effect of the matrix on a point (x, y, z) is
\[ P \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} nx \\ ny \\ (n + f)z - fn \\ z \end{bmatrix} \sim \begin{bmatrix} nx/z \\ ny/z \\ n + f - fn/z \\ 1 \end{bmatrix}. \]
Figure 8.12. The perspective projection leaves points on the z = n plane unchanged and maps the large z = f rectangle at the back of the perspective volume to the small z = f rectangle at the back of the orthographic volume.
Figure 8.13. The perspective projection maps any line through the origin/eye to a line parallel to the z-axis, without moving the point on the line at z = n.
As you can see, x and y are scaled and, more importantly, divided by z. Because both n and z (inside the view volume) are negative, there are no “flips” in x and y. Although it is not obvious (see the exercise at the end of this chapter), the transform also preserves the relative order of z values between z = n and z = f, allowing us to do depth ordering after this matrix is applied. This will be important later when we do hidden surface elimination.
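These properties are easy to check numerically. The sketch below assumes the perspective matrix with rows (n, 0, 0, 0), (0, n, 0, 0), (0, 0, n + f, −fn), and (0, 0, 1, 0), which is the form consistent with the description above; the function names and the sample values n = −1, f = −5 are choices made for this illustration only.

```python
def perspective_matrix(n, f):
    """Perspective matrix for near plane z = n and far plane z = f (both negative)."""
    return [[n, 0.0, 0.0, 0.0],
            [0.0, n, 0.0, 0.0],
            [0.0, 0.0, n + f, -f * n],
            [0.0, 0.0, 1.0, 0.0]]


def transform_point(M, p):
    """Multiply the 4x4 matrix M by [x y z 1]^T, then divide by w."""
    h = [p[0], p[1], p[2], 1.0]
    x, y, z, w = (sum(M[i][j] * h[j] for j in range(4)) for i in range(4))
    return (x / w, y / w, z / w)


n, f = -1.0, -5.0
P = perspective_matrix(n, f)

near_pt = transform_point(P, (0.5, -0.25, n))  # a point on the near plane
far_pt = transform_point(P, (2.0, 1.0, f))     # a point on the far plane
# Transformed depths for z running from the near plane to the far plane.
depths = [transform_point(P, (0.0, 0.0, z))[2] for z in (-1.0, -2.0, -3.0, -4.0, -5.0)]
print(near_pt, far_pt, depths)
```

The near-plane point is unchanged, the far-plane point keeps z = f while x and y shrink by the factor n/f, and the transformed depths stay in the same order as the inputs even though their spacing is no longer uniform.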
Sometimes, we will want to take the inverse of P, for example, to bring a screen coordinate plus z back to the original space, as we might want to do for picking. The inverse is
\[ P^{-1} = \begin{bmatrix} \frac{1}{n} & 0 & 0 & 0 \\ 0 & \frac{1}{n} & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -\frac{1}{fn} & \frac{n + f}{fn} \end{bmatrix}. \]
Since multiplying a homogeneous vector by a scalar does not change its meaning, the same is true of matrices that operate on homogeneous vectors. So we can write the inverse matrix in a prettier form by multiplying through by nf:
\[ P^{-1} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 0 & fn \\ 0 & 0 & -1 & n + f \end{bmatrix}. \]
This matrix is not literally the inverse of the matrix P, but the transformation it describes is the inverse of the transformation described by P.
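One way to convince yourself of this is to multiply the two matrices: the product should be a scalar multiple of the identity, which acts as the identity on homogeneous vectors. The sketch below assumes the perspective matrix with third row (0, 0, n + f, −fn) and bottom row (0, 0, 1, 0), and the scaled inverse with rows (f, 0, 0, 0), (0, f, 0, 0), (0, 0, 0, fn), and (0, 0, −1, n + f); the helper `matmul` and the sample values are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


n, f = -1.0, -5.0

# Perspective matrix P and the "prettier" inverse (the true inverse scaled by n*f).
P = [[n, 0.0, 0.0, 0.0],
     [0.0, n, 0.0, 0.0],
     [0.0, 0.0, n + f, -f * n],
     [0.0, 0.0, 1.0, 0.0]]
P_inv_scaled = [[f, 0.0, 0.0, 0.0],
                [0.0, f, 0.0, 0.0],
                [0.0, 0.0, 0.0, f * n],
                [0.0, 0.0, -1.0, n + f]]

product = matmul(P, P_inv_scaled)  # expected: (n*f) times the identity
for row in product:
    print(row)
```

Every diagonal entry of the product equals nf and every off-diagonal entry is zero, so the two transformations undo each other once the result is homogenized.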
Taken in the context of the orthographic projection matrix Morth in Equation (8.3), the perspective matrix simply maps the perspective view volume (which is shaped like a slice, or frustum, of a pyramid) to the orthographic view volume (which is an axis-aligned box). The beauty of the perspective matrix is that once we apply it, we can use an orthographic transform to get to the canonical view volume. Thus, all of the orthographic machinery applies, and all that we have added is one matrix and the division by w. It is also heartening that we are not “wasting” the bottom row of our four by four matrices!
Concatenating P with $M_{\text{orth}}$ results in the perspective projection matrix,
\[ M_{\text{per}} = M_{\text{orth}} P. \]
One issue, however, is: How are l, r, b, t determined for perspective? They identify the “window” through which we look. Since the perspective matrix does not change the values of x and y on the (z = n)-plane, we can specify (l, r, b, t) on that plane.
To integrate the perspective matrix into our orthographic infrastructure, we simply replace $M_{\text{orth}}$ with $M_{\text{per}}$, which inserts the perspective matrix P after the camera matrix $M_{\text{cam}}$ has been applied but before the orthographic projection. So the full set of matrices for perspective viewing is
\[ M = M_{\text{vp}} M_{\text{orth}} P M_{\text{cam}}. \]
The resulting algorithm is
compute M_vp
compute M_per
compute M_cam
M = M_vp M_per M_cam
for each line segment (a_i, b_i) do
    p = M a_i
    q = M b_i
    drawline(x_p/w_p, y_p/w_p, x_q/w_q, y_q/w_q)
Note that the only change other than the additional matrix is the divide by the homogeneous coordinate w.
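Multiplied out, the matrix $M_{\text{per}}$ looks like this:
\[ M_{\text{per}} = \begin{bmatrix} \frac{2n}{r - l} & 0 & \frac{l + r}{l - r} & 0 \\ 0 & \frac{2n}{t - b} & \frac{b + t}{b - t} & 0 \\ 0 & 0 & \frac{f + n}{n - f} & \frac{2fn}{f - n} \\ 0 & 0 & 1 & 0 \end{bmatrix}. \]
This or similar matrices often appear in documentation, and they are less mysterious when one realizes that they are usually the product of a few simple matrices.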
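As a sanity check on the multiplied-out form, the sketch below builds the matrix with rows (2n/(r − l), 0, (l + r)/(l − r), 0), (0, 2n/(t − b), (b + t)/(b − t), 0), (0, 0, (f + n)/(n − f), 2fn/(f − n)), and (0, 0, 1, 0), and maps two corners of the view frustum. The function names and the sample window values are assumptions chosen for illustration.

```python
def perspective_projection(l, r, b, t, n, f):
    """Multiplied-out M_per for the window (l, r, b, t) on the near plane z = n."""
    return [[2 * n / (r - l), 0.0, (l + r) / (l - r), 0.0],
            [0.0, 2 * n / (t - b), (b + t) / (b - t), 0.0],
            [0.0, 0.0, (f + n) / (n - f), 2 * f * n / (f - n)],
            [0.0, 0.0, 1.0, 0.0]]


def project(M, p):
    """Apply M to [x y z 1]^T and divide by the homogeneous coordinate w."""
    h = [p[0], p[1], p[2], 1.0]
    x, y, z, w = (sum(M[i][j] * h[j] for j in range(4)) for i in range(4))
    return (x / w, y / w, z / w)


l, r, b, t, n, f = -2.0, 3.0, -1.0, 1.5, -1.0, -10.0
M_per = perspective_projection(l, r, b, t, n, f)

# Upper-right corner of the window on the near plane.
near_corner = project(M_per, (r, t, n))
# Lower-left corner of the frustum on the far plane: the window corner scaled by f/n.
far_corner = project(M_per, (l * f / n, b * f / n, f))
print(near_corner, far_corner)
```

The near corner lands on (1, 1, 1) and the opposite far corner on (−1, −1, −1), which are opposite corners of the canonical view volume; an off-center window is used deliberately so that the third-column terms are exercised.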
Example 19 Many APIs such as OpenGL (Shreiner, Neider, Woo, & Davis, 2004) use the same canonical view volume as presented here. They also usually have the user specify the absolute values of n and f. The projection matrix for OpenGL is
\[ M_{\text{OpenGL}} = \begin{bmatrix} \frac{2|n|}{r - l} & 0 & \frac{r + l}{r - l} & 0 \\ 0 & \frac{2|n|}{t - b} & \frac{t + b}{t - b} & 0 \\ 0 & 0 & \frac{|n| + |f|}{|n| - |f|} & \frac{2|f||n|}{|n| - |f|} \\ 0 & 0 & -1 & 0 \end{bmatrix}. \]
Other APIs send n and f to 0 and 1, respectively. Blinn (1996) recommends making the canonical view volume $[0, 1]^3$ for efficiency. All such decisions will change the projection matrix slightly.
An important property of the perspective transform is that it takes lines to lines and planes to planes. In addition, it takes line segments in the view volume to line segments in the canonical volume. To see this, consider the line segment
\[ \mathbf{q} + t(\mathbf{Q} - \mathbf{q}). \]
When transformed by a 4 × 4 matrix M, it is a point with possibly varying homogeneous coordinate:
\[ M\mathbf{q} + t(M\mathbf{Q} - M\mathbf{q}) \equiv \mathbf{r} + t(\mathbf{R} - \mathbf{r}). \]
The homogenized 3D line segment is
\[ \frac{\mathbf{r} + t(\mathbf{R} - \mathbf{r})}{w_r + t(w_R - w_r)}. \tag{8.6} \]
If Equation (8.6) can be rewritten in a form
\[ \frac{\mathbf{r}}{w_r} + f(t)\left(\frac{\mathbf{R}}{w_R} - \frac{\mathbf{r}}{w_r}\right), \tag{8.7} \]
then all the homogenized points lie on a 3D line. Brute force manipulation of Equation (8.6) yields such a form with
\[ f(t) = \frac{w_R\, t}{w_r + t(w_R - w_r)}. \tag{8.8} \]
It also turns out that the line segments do map to line segments preserving the ordering of the points (Exercise 11); i.e., they do not get reordered or “torn.”
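The “brute force manipulation” can be spelled out in a few lines. The following is one possible route, offered as a sketch: it takes the homogenized segment to be $(\mathbf{r} + t(\mathbf{R} - \mathbf{r}))/(w_r + t(w_R - w_r))$ and the candidate function to be $f(t) = w_R t/(w_r + t(w_R - w_r))$, and it introduces the shorthand $D(t) = w_r + t(w_R - w_r)$ purely for convenience.

```latex
\begin{align*}
1 - f(t) &= \frac{w_r + t(w_R - w_r) - w_R\,t}{D(t)} = \frac{w_r\,(1 - t)}{D(t)}, \\[4pt]
\frac{\mathbf{r}}{w_r} + f(t)\left(\frac{\mathbf{R}}{w_R} - \frac{\mathbf{r}}{w_r}\right)
  &= \bigl(1 - f(t)\bigr)\,\frac{\mathbf{r}}{w_r} + f(t)\,\frac{\mathbf{R}}{w_R} \\
  &= \frac{(1 - t)\,\mathbf{r}}{D(t)} + \frac{t\,\mathbf{R}}{D(t)}
   = \frac{\mathbf{r} + t(\mathbf{R} - \mathbf{r})}{w_r + t(w_R - w_r)}.
\end{align*}
```

The last expression is exactly the homogenized segment, so every homogenized point lies on the 3D line through $\mathbf{r}/w_r$ and $\mathbf{R}/w_R$. Note also that $f(0) = 0$ and $f(1) = 1$, so the endpoints map to the endpoints.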
A byproduct of the transform taking line segments to line segments is that it takes the edges and vertices of a triangle to the edges and vertices of another triangle. Thus, it takes triangles to triangles and planes to planes.
While we can specify any window using the (l, r, b, t) and n values, sometimes we would like to have a simpler system where we look through the center of the window. This implies the constraint that
\[ l = -r, \qquad b = -t. \]
If we also add the constraint that the pixels are square, i.e., there is no distortion of shape in the image, then the ratio of r to t must be the same as the ratio of the number of horizontal pixels to the number of vertical pixels:
\[ \frac{n_x}{n_y} = \frac{r}{t}. \]
Figure 8.14. The field-of-view θ is the angle from the bottom of the screen to the top of the screen as measured from the eye.
Once $n_x$ and $n_y$ are specified, this leaves only one degree of freedom. That is often set using the field-of-view shown as θ in Figure 8.14. This is sometimes called the vertical field-of-view to distinguish it from the angle between left and right sides or from the angle between diagonal corners. From the figure, we can see that
\[ \tan\frac{\theta}{2} = \frac{t}{|n|}. \]
If n and θ are specified, then we can derive t and use code for the more general viewing system. In some systems, the value of n is hard-coded to some reasonable value, and thus, we have one fewer degree of freedom.
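Putting the three constraints together gives a small recipe for the window. The sketch below assumes a centered window (l = −r, b = −t), square pixels (n_x/n_y = r/t), and a vertical field-of-view θ with tan(θ/2) = t/|n|; the function name `window_from_fov` and its argument order are hypothetical.

```python
import math


def window_from_fov(theta, n, nx, ny):
    """Return (l, r, b, t) on the near plane for a centered window.

    theta is the vertical field-of-view in radians, n is the (negative)
    near-plane z value, and nx, ny are the image dimensions in pixels.
    """
    t = abs(n) * math.tan(theta / 2.0)  # from tan(theta/2) = t/|n|
    r = t * nx / ny                     # square pixels: nx/ny = r/t
    return (-r, r, -t, t)               # centered window: l = -r, b = -t


l, r, b, t = window_from_fov(math.radians(90.0), -1.0, 640, 480)
print(l, r, b, t)
```

With a 90° vertical field-of-view and n = −1, the top edge sits at t ≈ 1 and the right edge at r ≈ 4/3 for a 640 × 480 image. The resulting values can be passed to whatever code builds the general perspective matrix.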
Is orthographic projection ever useful in practice?
It is useful in applications where relative length judgments are important. It can also yield simplifications where perspective would be too expensive as occurs in some medical visualization applications.
The tessellated spheres I draw in perspective look like ovals. Is this a bug?
No. It is correct behavior. If you place your eye in the same relative position to the screen as the virtual viewer has with respect to the viewport, then these ovals will look like circles because they themselves are viewed at an angle.
Does the perspective matrix take negative z values to positive z values with a reversed ordering? Doesn’t that cause trouble?
Yes. The equation for transformed z is
\[ z' = n + f - \frac{fn}{z}. \]
So $z = +\epsilon$ is transformed to $z' = -\infty$ and $z = -\epsilon$ is transformed to $z' = \infty$ (in the limit as $\epsilon$ shrinks to zero). So any line segments that span z = 0 will be “torn” although all points will be projected to an appropriate screen location. This tearing is not relevant when all objects are contained in the viewing volume. This is usually assured by clipping to the view volume. However, clipping itself is made more complicated by the tearing phenomenon as is discussed in Chapter 9.
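The perspective matrix changes the value of the homogeneous coordinate. Doesn’t that make the move and scale transformations no longer work properly?
Applying a translation to a homogeneous point, we have
\[ \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} hx \\ hy \\ hz \\ h \end{bmatrix} = \begin{bmatrix} hx + h t_x \\ hy + h t_y \\ hz + h t_z \\ h \end{bmatrix} \sim \begin{bmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{bmatrix}. \]
Similar effects are true for other transforms (see Exercise 5).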
Most of the discussion of viewing matrices is based on information in Real-Time Rendering (Akenine-Möller, Haines, & Hoffman, 2008), the OpenGL Programming Guide (Shreiner et al., 2004), Computer Graphics (Hearn & Baker, 1986), and 3D Game Engine Design (Eberly, 2000).
1. Construct the viewport matrix required for a system in which pixel coordinates count down from the top of the image, rather than up from the bottom.
2. Multiply the viewport and orthographic projection matrices, and show that the result can also be obtained by a single application of Equation (7.7).
3. Derive the third row of Equation (8.3) from the constraint that z is preserved for points on the near and far planes.
4. Show algebraically that the perspective matrix preserves order of z values within the view volume.
5. For a 4×4 matrix whose top three rows are arbitrary and whose bottom row is (0, 0, 0, 1), show that the points (x, y, z, 1) and (hx, hy, hz, h) transform to the same point after homogenization.
6. Verify that the form of $M_p^{-1}$, the inverse of the perspective matrix, given in the text is correct.
7. Verify that the full perspective to canonical matrix $M_{\text{per}}$ takes (r, t, n) to (1, 1, 1).
8. Write down a perspective matrix for n = 1, f = 2.
9. For the point p = (x, y, z, 1), what are the homogenized and unhomogenized results for that point transformed by the perspective matrix in Exercise 6?
10. For the eye position e = (0, 1, 0), a gaze vector g = (0, −1, 0), and a view-up vector t = (1, 1, 0), what is the resulting orthonormal uvw basis used for coordinate rotations?
11. Show that, for a perspective transform, line segments that start in the view volume do map to line segments in the canonical volume after homogenization. Furthermore, show that the relative ordering of points on the two segments is the same. Hint: Show that the f(t) in Equation (8.8) has the properties f(0) = 0, f(1) = 1, the derivative of f is positive for all t ∈ [0, 1], and the homogeneous coordinate does not change sign.
The previous several chapters have established the mathematical scaffolding we need to look at the second major approach to rendering: drawing objects one by one onto the screen or object-order rendering. Unlike in ray tracing, where we consider each pixel in turn and find the objects that influence its color, we’ll now instead consider each geometric object in turn and find the pixels that it could have an effect on. The process of finding all the pixels in an image that are occupied by a geometric primitive is called rasterization, so object-order rendering can also be called rendering by rasterization. The sequence of operations that is required, starting with objects and ending by updating pixels in the image, is known as the graphics pipeline.
Any graphics system has one or more types of “primitive object” that it can handle directly, and more complex objects are converted into these “primitives.” Triangles are the most often used primitive.
Rasterization-based systems are also called scanline renderers.
Object-order rendering has enjoyed great success because of its efficiency. For large scenes, management of data access patterns is crucial to performance, and making a single pass over the scene visiting each bit of geometry once has significant advantages over repeatedly searching the scene to retrieve the objects required to shade each pixel.
The title of this chapter suggests that there is only one way to do object-order rendering. Of course, this isn’t true—two quite different examples of graphics pipelines with very different goals are the hardware pipelines used to support interactive rendering via APIs like OpenGL and Direct3D and the software pipelines used in film production, supporting APIs like RenderMan. Hardware pipelines must run fast enough to react in real time for games, visualizations, and user interfaces. Production pipelines must render the highest quality animation and visual effects possible and scale to enormous scenes, but may take much more time to do so. Despite the different design decisions resulting from these divergent goals, a remarkable amount is shared among most, if not all, pipelines, and this chapter attempts to focus on these common fundamentals, erring on the side of following the hardware pipelines more closely.
The work that needs to be done in object-order rendering can be organized into the task of rasterization itself, the operations that are done to geometry before rasterization, and the operations that are done to pixels after rasterization. The most common geometric operation is applying matrix transformations, as discussed in the previous two chapters, to map the points that define the geometry from object space to screen space, so that the input to the rasterizer is expressed in pixel coordinates, or screen space. The most common pixelwise operation is hidden surface removal which arranges for surfaces closer to the viewer to appear in front of surfaces farther from the viewer. Many other operations also can be included at each stage, thereby achieving a wide range of different rendering effects using the same general process.
For the purposes of this chapter, we’ll discuss the graphics pipeline in terms of four stages (Figure 9.1). Geometric objects are fed into the pipeline from an interactive application or from a scene description file, and they are always described by sets of vertices. The vertices are operated on in the vertex-processing stage, then the primitives using those vertices are sent to the rasterization stage. The rasterizer breaks each primitive into a number of fragments, one for each pixel covered by the primitive. The fragments are processed in the fragment processing stage, and then, the various fragments corresponding to each pixel are combined in the fragment blending stage.
Figure 9.1. The stages of a graphics pipeline.
We’ll begin by discussing rasterization and then illustrate the purpose of the geometric and pixel-wise stages by a series of examples.
Rasterization is the central operation in object-order graphics, and the rasterizer is central to any graphics pipeline. For each primitive that comes in, the rasterizer has two jobs: it enumerates the pixels that are covered by the primitive and it interpolates values, called attributes, across the primitive—the purpose for these attributes will be clear with later examples. The output of the rasterizer is a set of fragments, one for each pixel covered by the primitive. Each fragment “lives” at a particular pixel and carries its own set of attribute values.
In this chapter, we will present rasterization with a view toward using it to render three-dimensional scenes. The same rasterization methods are used to draw lines and shapes in 2D as well—although it is becoming more and more common to use the 3D graphics system “under the covers” to do all 2D drawing.
Most graphics packages contain a line drawing command that takes two endpoints in screen coordinates (see Figure 3.10) and draws a line between them. For example, the call for endpoints (1, 1) and (3, 2) would turn on pixels (1, 1) and (3, 2) and fill in one pixel between them. For general screen coordinate endpoints $(x_0, y_0)$ and $(x_1, y_1)$, the routine should draw some “reasonable” set of pixels that approximates a line between them. Drawing such lines is based on line equations, and we have two types of equations to choose from: implicit and parametric. This section describes the approach using implicit lines.
Even though we often use integer-valued endpoints for examples, it’s important to properly support arbitrary endpoints.
The most common way to draw lines using implicit equations is the midpoint algorithm (Pitteway, 1967; van Aken & Novak, 1985). The midpoint algorithm ends up drawing the same lines as the Bresenham algorithm (Bresenham, 1965), but it is somewhat more straightforward.
The first thing to do is find the implicit equation for the line as discussed in Section 2.7.2:
\[ f(x, y) \equiv (y_0 - y_1)x + (x_1 - x_0)y + x_0 y_1 - x_1 y_0 = 0. \tag{9.1} \]
We assume that $x_0 \le x_1$. If that is not true, we swap the points so that it is true. The slope $m$ of the line is given by
\[ m = \frac{y_1 - y_0}{x_1 - x_0}. \]
The following discussion assumes $m \in (0, 1]$. Analogous discussions can be derived for $m \in (-\infty, -1]$, $m \in (-1, 0]$, and $m \in (1, \infty)$. The four cases cover all possibilities.
For the case $m \in (0, 1]$, there is more “run” than “rise”; i.e., the line is moving faster in x than in y. If we have an API where the y-axis points downward, we might have a concern about whether this makes the process harder, but, in fact, we can ignore that detail. We can ignore the geometric notions of “up” and “down,” because the algebra is exactly the same for the two cases. Cautious readers can confirm that the resulting algorithm works for the y-axis downward case. The key assumption of the midpoint algorithm is that we draw the thinnest line possible that has no gaps. A diagonal connection between two pixels is not considered a gap.
As the line progresses from the left endpoint to the right, there are only two possibilities: draw a pixel at the same height as the pixel drawn to its left, or draw a pixel one higher. There will always be exactly one pixel in each column of pixels between the endpoints. Zero would imply a gap, and two would be too thick a line. There may be two pixels in the same row for the case we are considering; the line is more horizontal than vertical, so sometimes it will go right and sometimes up. This concept is shown in Figure 9.2, where three “reasonable” lines are shown, each advancing more in the horizontal direction than in the vertical direction.
Figure 9.2. Three “reasonable” lines that go seven pixels horizontally and three pixels vertically.
The midpoint algorithm for $m \in (0, 1]$ first establishes the leftmost pixel and the column number (x-value) of the rightmost pixel and then loops horizontally establishing the row (y-value) of each pixel. The basic form of the algorithm is
y = y0
for x = x0 to x1 do
    draw(x, y)
    if (some condition) then
        y = y + 1
Note that x and y are integers. In words this says, “keep drawing pixels from left to right and sometimes move upward in the y-direction while doing so.” The key is to establish efficient ways to make the decision in the if statement.
An effective way to make the choice is to look at the midpoint of the line between the two potential pixel centers. More specifically, the pixel just drawn is pixel (x, y) whose center in real screen coordinates is at (x, y). The candidate pixels to be drawn to the right are pixels (x + 1, y) and (x + 1, y + 1). The midpoint between the centers of the two candidate pixels is (x + 1, y + 0.5). If the line passes below this midpoint, we draw the bottom pixel, and otherwise, we draw the top pixel (Figure 9.3).
Figure 9.3. Top: the line goes above the midpoint, so the top pixel is drawn. Bottom: the line goes below the midpoint, so the bottom pixel is drawn.
To decide whether the line passes above or below (x + 1, y + 0.5), we evaluate f(x + 1, y + 0.5) in Equation (9.1). Recall from Section 2.7.1 that f(x, y) = 0 for points (x, y) on the line, f(x, y) > 0 for points on one side of the line, and f(x, y) < 0 for points on the other side of the line. Because −f(x, y) = 0 and f(x, y) = 0 are both perfectly good equations for the line, it is not immediately clear whether f(x, y) being positive indicates that (x, y) is above the line or whether it is below. However, we can figure it out; the key term in Equation (9.1) is the y term $(x_1 - x_0)y$. Note that $(x_1 - x_0)$ is definitely positive because $x_1 > x_0$. This means that as y increases, the term $(x_1 - x_0)y$ gets larger (i.e., more positive or less negative). Thus, f(x, +∞) is definitely positive, and that point is definitely above the line, implying that f is positive at all points above the line.
Another way to look at it is that the y component of the gradient vector is positive. So above the line, where y can increase arbitrarily, f(x, y) must be positive. This means we can make our code more specific by filling in the if statement:
if f(x + 1, y + 0.5) < 0 then
    y = y + 1
The above code will work nicely for lines of the appropriate slope (i.e., between zero and one). The reader can work out the other three cases, which differ only in small details.
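For readers who want to run the algorithm, here is a minimal Python sketch of the basic (non-incremental) version. It assumes integer endpoints with x0 ≤ x1 and slope in (0, 1], and it returns the list of pixels rather than calling a `draw` routine; the function name `midpoint_line` is an illustrative choice, and f is taken to be the implicit line function (y0 − y1)x + (x1 − x0)y + x0·y1 − x1·y0.

```python
def midpoint_line(x0, y0, x1, y1):
    """Pixels approximating the line from (x0, y0) to (x1, y1), slope in (0, 1]."""

    def f(x, y):
        # Implicit line function; positive above the line when x1 > x0.
        return (y0 - y1) * x + (x1 - x0) * y + x0 * y1 - x1 * y0

    pixels = []
    y = y0
    for x in range(x0, x1 + 1):
        pixels.append((x, y))
        # If the midpoint between the two candidates is below the line
        # (f < 0 there), the line passes above it, so step up one row.
        if f(x + 1, y + 0.5) < 0:
            y += 1
    return pixels


print(midpoint_line(1, 1, 3, 2))  # [(1, 1), (2, 1), (3, 2)]
print(midpoint_line(0, 0, 7, 3))
```

The first call reproduces the earlier example with endpoints (1, 1) and (3, 2): both endpoints are turned on and exactly one pixel is filled in between them. The second call draws one pixel in each of the eight columns from x = 0 to x = 7.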
If greater efficiency is desired, using an incremental method can help. An incremental method tries to make a loop more efficient by reusing computation from the previous step. In the midpoint algorithm as presented, the main computation is the evaluation of f(x + 1, y + 0.5). Note that inside the loop, after the first iteration, the previous step already evaluated f one column to the left of the current decision point, at either f(x, y + 0.5) or f(x, y − 0.5) (Figure 9.4). Note also this relationship:
\[ f(x + 1, y) = f(x, y) + (y_0 - y_1), \]
\[ f(x + 1, y + 1) = f(x, y) + (y_0 - y_1) + (x_1 - x_0). \]
Figure 9.4. When using the decision point shown between the two orange pixels, we just drew the blue pixel, so we evaluated f at one of the two left points shown.
This allows us to write an incremental version of the code:
y = y0
d = f(x0 + 1, y0 + 0.5)
for x = x0 to x1 do
    draw(x, y)
    if d < 0 then
        y = y + 1
        d = d + (x1 – x0) + (y0 – y1)
    else
        d = d + (y0 – y1)
This code should run faster since it has little extra setup cost compared to the non-incremental version (that is not always true for incremental algorithms), but it may accumulate more numeric error because the evaluation of f (x, y + 0.5) may be composed of many adds for long lines. However, given that lines are rarely longer than a few thousand pixels, such an error is unlikely to be critical. Slightly longer setup cost, but faster loop execution, can be achieved by storing (x1 – x0)+(y0 – y1) and (y0 – y1) as variables. We might hope a good compiler would do that for us, but if the code is critical, it would be wise to examine the results of compilation to make sure.
此代码运行速度应该更快,因为与非增量版本相比,它几乎没有额外的设置成本(对于增量算法来说并非总是如此),但它可能会积累更多的数字错误,因为f ( x, y + 0.5) 的评估可能由长行的许多加法组成。 但是,考虑到线条很少超过几千个像素,这种错误不太可能很严重。 通过将 ( x 1 – x 0 )+( y 0 – y 1 ) 和 ( y 0 – y 1 ) 存储为变量,可以实现稍长的设置成本,但更快的循环执行。 我们可能希望一个好的编译器可以为我们做到这一点,但如果代码至关重要,那么检查编译结果以确保万无一失是不明智的。
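As a rough check on the incremental version, the sketch below implements both loops in Python and compares their output. It assumes integer endpoints with x0 < x1 and a slope between zero and one; the function names (line_f, midpoint_line, midpoint_line_incremental) are illustrative and not part of the text, and the pixels are returned as a list instead of being drawn.

```python
def line_f(x0, y0, x1, y1):
    """Return the implicit line function f(x, y) in the form of Equation (9.1)."""
    def f(x, y):
        return (y0 - y1) * x + (x1 - x0) * y + x0 * y1 - x1 * y0
    return f


def midpoint_line(x0, y0, x1, y1):
    """Direct version: evaluate f at the midpoint in every iteration."""
    f = line_f(x0, y0, x1, y1)
    pixels, y = [], y0
    for x in range(x0, x1 + 1):
        pixels.append((x, y))
        if f(x + 1, y + 0.5) < 0:
            y += 1
    return pixels


def midpoint_line_incremental(x0, y0, x1, y1):
    """Incremental version: update the decision value d by constant steps."""
    f = line_f(x0, y0, x1, y1)
    step_same_row = y0 - y1                  # f(x + 1, y) - f(x, y)
    step_next_row = (x1 - x0) + (y0 - y1)    # f(x + 1, y + 1) - f(x, y)
    pixels, y = [], y0
    d = f(x0 + 1, y0 + 0.5)                  # the only full evaluation of f
    for x in range(x0, x1 + 1):
        pixels.append((x, y))
        if d < 0:
            y += 1
            d += step_next_row
        else:
            d += step_same_row
    return pixels
```

Storing the two steps in variables before the loop is the small extra setup cost mentioned above. With integer endpoints every value of d is a multiple of 0.5, so in this setting the two versions agree exactly.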
We often want to draw a 2D triangle with 2D points p0 = (x0, y0), p1 = (x1, y1), and p2 = (x2, y2) in screen coordinates. This is similar to the line drawing problem, but it has some of its own subtleties. As with line drawing, we may wish to interpolate color or other properties from values at the vertices. This is straightforward if we have the barycentric coordinates (Section 2.9). For example, if the vertices have colors c0, c1, and c2, the color at a point in the triangle with barycentric coordinates (α, β, γ) is c = αc0 + βc1 + γc2.
我们经常想在屏幕坐标系中绘制一个二维三角形,其中包含二维点p 0 = ( x 0 , y 0 )、 p 1 = ( x 1 , y 1 ) 和p 2 = ( x 2 , y 2 )。这类似于线条绘制问题,但它有一些自己的微妙之处。与线条绘制一样,我们可能希望从顶点处的值插入颜色或其他属性。如果我们有重心坐标(第 2.9 节),这很简单。例如,如果顶点的颜色为c 0 、 c 1和c 2 ,则重心坐标为 (α, β, γ) 的三角形中某一点的颜色为
This type of interpolation of color is known in graphics as Gouraud interpolation after its inventor (Gouraud, 1971).
这种颜色插值类型在图形学中被称为Gouraud插值,以其发明者的名字命名(Gouraud,1971)。
Another subtlety of rasterizing triangles is that we are usually rasterizing triangles that share vertices and edges. This means we would like to rasterize adjacent triangles so that there are no holes. We could do this by using the midpoint algorithm to draw the outline of each triangle and then fill in the interior pixels. This would mean adjacent triangles both draw the same pixels along each edge. If the adjacent triangles have different colors, the image will depend on the order in which the two triangles are drawn. The most common way to rasterize triangles that avoids the order problem and eliminates holes is to use the convention that pixels are drawn if and only if their centers are inside the triangle; i.e., the barycentric coordinates of the pixel center are all in the interval (0, 1). This raises the issue of what to do if the center is exactly on the edge of the triangle. There are several ways to handle this, as will be discussed later in this section. The key observation is that barycentric coordinates allow us to decide whether to draw a pixel and what color that pixel should be if we are interpolating colors from the vertices. So our problem of rasterizing the triangle boils down to efficiently finding the barycentric coordinates of pixel centers (Pineda, 1988). The brute force rasterization algorithm is
光栅化三角形的另一个微妙之处在于,我们通常光栅化共享顶点和边的三角形。这意味着我们希望光栅化相邻的三角形,这样就不会出现空洞。我们可以通过使用中点算法绘制每个三角形的轮廓,然后填充内部像素来实现这一点。这意味着相邻的三角形都沿着每条边绘制相同的像素。如果相邻的三角形具有不同的颜色,则图像将取决于绘制这两个三角形的顺序。避免顺序问题并消除空洞的最常见光栅化三角形方法是使用以下惯例:当且仅当像素的中心位于三角形内部时,才会绘制像素;即,像素中心的重心坐标都在区间 (0, 1) 内。这引发了一个问题,即如果中心恰好位于三角形的边缘,该怎么办。有几种方法可以解决这个问题,我们将在本节后面讨论。关键的观察是,重心坐标允许我们决定是否绘制像素以及如果我们从顶点插入颜色,该像素应该是什么颜色。因此,我们对三角形进行光栅化的问题归结为有效地找到像素中心的重心坐标(Pineda,1988)。强力光栅化算法是
for all x do
    for all y do
        compute (α, β, γ) for (x, y)
        if (α ∈ [0, 1] and β ∈ [0, 1] and γ ∈ [0, 1]) then
            c = αc0 + βc1 + γc2
            drawpixel (x, y) with color c
The rest of the algorithm limits the outer loops to a smaller set of candidate pixels and makes the barycentric computation efficient.
该算法的其余部分将外循环限制到较小的候选像素集,并使重心计算高效。
We can add a simple efficiency by finding the bounding rectangle of the three vertices and only looping over this rectangle for candidate pixels to draw. We can compute barycentric coordinates using Equation (2.32). This yields the algorithm:
我们可以通过找到三个顶点的边界矩形并仅循环遍历此矩形以绘制候选像素来提高效率。我们可以使用公式 (2.32) 计算重心坐标。这产生了以下算法:
xmin = floor(min(x0, x1, x2))
xmax = ceiling(max(x0, x1, x2))
ymin = floor(min(y0, y1, y2))
ymax = ceiling(max(y0, y1, y2))
for y = ymin to ymax do
    for x = xmin to xmax do
        α = f12(x, y)/f12(x0, y0)
        β = f20(x, y)/f20(x1, y1)
        γ = f01(x, y)/f01(x2, y2)
        if (α > 0 and β > 0 and γ > 0) then
            c = αc0 + βc1 + γc2
            drawpixel (x, y) with color c
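Here, fij is the line given by Equation (9.1) with the appropriate vertices:
f01(x, y) = (y0 – y1)x + (x1 – x0)y + x0y1 – x1y0,
f12(x, y) = (y1 – y2)x + (x2 – x1)y + x1y2 – x2y1,
f20(x, y) = (y2 – y0)x + (x0 – x2)y + x2y0 – x0y2.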
这里, f ij是由公式 (9.1) 给出的具有适当顶点的直线:
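The bounding-rectangle algorithm can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the helper names edge and rasterize are our own, pixels are collected in a dictionary instead of being drawn, colors are plain tuples, and a zero-area triangle is simply skipped.

```python
import math


def edge(xa, ya, xb, yb):
    """Implicit line through (xa, ya) and (xb, yb), in the form of Equation (9.1)."""
    return lambda x, y: (ya - yb) * x + (xb - xa) * y + xa * yb - xb * ya


def rasterize(p0, p1, p2, c0, c1, c2):
    """Return {(x, y): color} for pixel centers strictly inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    f12 = edge(x1, y1, x2, y2)
    f20 = edge(x2, y2, x0, y0)
    f01 = edge(x0, y0, x1, y1)
    f_alpha, f_beta, f_gamma = f12(x0, y0), f20(x1, y1), f01(x2, y2)
    if f_alpha == 0 or f_beta == 0 or f_gamma == 0:
        return {}  # degenerate triangle: avoid dividing by zero

    # Bounding rectangle of the three vertices.
    xmin, xmax = math.floor(min(x0, x1, x2)), math.ceil(max(x0, x1, x2))
    ymin, ymax = math.floor(min(y0, y1, y2)), math.ceil(max(y0, y1, y2))

    pixels = {}
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            alpha = f12(x, y) / f_alpha
            beta = f20(x, y) / f_beta
            gamma = f01(x, y) / f_gamma
            if alpha > 0 and beta > 0 and gamma > 0:
                # Interpolate each color channel with the barycentric weights.
                pixels[(x, y)] = tuple(
                    alpha * a + beta * b + gamma * c
                    for a, b, c in zip(c0, c1, c2)
                )
    return pixels
```

For the triangle (0, 0), (4, 0), (0, 4) with red, green, and blue vertices, only the three pixel centers (1, 1), (2, 1), and (1, 2) are strictly inside, and the pixel (1, 1) receives the color (0.5, 0.25, 0.25).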
Note that we have replaced the test α ∈ (0, 1) with α > 0, and likewise for β and γ, because if all of α, β, γ are positive, then we know they are all less than one because α + β + γ = 1. We could also compute only two of the three barycentric variables and get the third from that relation, but it is not clear that this saves computation once the algorithm is made incremental, which is possible as in the line drawing algorithms; each of the computations of α, β, and γ does an evaluation of the form f(x, y) = Ax + By + C. In the inner loop, only x changes, and it changes by one. Note that f(x + 1, y) = f(x, y) + A. This is the basis of the incremental algorithm. In the outer loop, the evaluation changes from f(x, y) to f(x, y + 1), so a similar efficiency can be achieved. Because α, β, and γ change by constant increments in the loop, so does the color c. So this can be made incremental as well. For example, the red value for pixel (x + 1, y) differs from the red value for pixel (x, y) by a constant amount that can be precomputed. An example of a triangle with color interpolation is shown in Figure 9.5.
请注意,我们已经将测试 α ∈ (0, 1) 与 α > 0 等交换,因为如果 α、β、γ 全部为正,那么我们知道它们都小于一,因为 α + β + γ = 1。我们也可以只计算三个重心变量中的两个,并从该关系中得到第三个,但不清楚一旦算法变为增量式,这是否会节省计算量,这在线绘制算法中是可能的;α、β 和 γ 的每次计算都会执行形式为f ( x, y ) = Ax + By + C的评估。在内循环中,只有x变化,并且变化一。请注意f ( x +1, y ) = f ( x, y ) + A 。这是增量算法的基础。在外循环中, f ( x, y ) 的评估变为f ( x, y + 1),因此可以实现类似的效率。由于 α、β 和 γ 在循环中以恒定增量变化,颜色c也是如此。因此,这也可以设为增量。例如,像素 ( x + 1, y ) 的红色值与像素 ( x, y ) 的红色值相差一个可以预先计算的常数量。图 9.5显示了具有颜色插值的三角形示例。
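The incremental idea can be illustrated with a small Python sketch that walks a rectangle of pixels and produces f(x, y) = Ax + By + C using only additions inside the loops. The names edge_coefficients and scan_values are hypothetical, chosen only for this example.

```python
def edge_coefficients(xa, ya, xb, yb):
    """Coefficients (A, B, C) of f(x, y) = A*x + B*y + C through two points."""
    return ya - yb, xb - xa, xa * yb - xb * ya


def scan_values(coeffs, xmin, xmax, ymin, ymax):
    """Yield (x, y, f(x, y)) over a rectangle using only additions in the loops."""
    A, B, C = coeffs
    row_start = A * xmin + B * ymin + C    # one full evaluation at the corner
    for y in range(ymin, ymax + 1):
        value = row_start
        for x in range(xmin, xmax + 1):
            yield x, y, value
            value += A                     # f(x + 1, y) = f(x, y) + A
        row_start += B                     # f(x, y + 1) = f(x, y) + B
```

Dividing A and B once by the normalizing value, such as f12(x0, y0), gives the constant per-pixel increments of α, β, and γ, and the same trick applies to each color channel. With floating-point inputs the accumulated additions may drift slightly from a direct evaluation, as noted for the line drawing case.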
Figure 9.5. A colored triangle with barycentric interpolation. Note that the changes in color components are linear in each row and column as well as along each edge. In fact, the rate of change is constant along every line, such as the diagonals, as well.
图 9.5.带重心插值的彩色三角形。请注意,颜色分量的变化在每一行和每一列以及每条边上都是线性的。事实上,它沿着每条线(例如对角线)也是恒定的。
We have still not discussed what to do for pixels whose centers are exactly on the edge of a triangle. If a pixel is exactly on the edge of a triangle, then it is also on the edge of the adjacent triangle if there is one. There is no obvious way to award the pixel to one triangle or the other. The worst decision would be to not draw the pixel because a hole would result between the two triangles. Better, but still not good, would be to have both triangles draw the pixel. If the triangles are transparent, this will result in a double-coloring. We would really like to award the pixel to exactly one of the triangles, and we would like this process to be simple; which triangle is chosen does not matter as long as the choice is well defined.
我们还没有讨论如何处理中心正好位于三角形边缘的像素。如果一个像素正好位于三角形的边缘,那么它也位于相邻三角形的边缘(如果有的话)。没有明显的方法将像素授予一个三角形或另一个三角形。最糟糕的决定是不绘制像素,因为两个三角形之间会出现一个洞。更好的(但仍然不是最好的)方法是让两个三角形都绘制像素。如果三角形是透明的,这将导致双重着色。我们真的希望将像素授予其中一个三角形,我们希望这个过程简单;选择哪个三角形并不重要,只要选择明确即可。
One approach is to note that any off-screen point is definitely on exactly one side of the shared edge and that is the edge we will draw. For two non-overlapping triangles, the vertices not on the edge are on opposite sides of the edge from each other. Exactly one of these vertices will be on the same side of the edge as the off-screen point (Figure 9.6). This is the basis of the test. The test if numbers p and q have the same sign can be implemented as the test pq > 0, which is very efficient in most environments.
一种方法是注意任何屏幕外的点肯定位于共享边的恰好一侧,而这正是我们要绘制的边。对于两个不重叠的三角形,不在边上的顶点彼此位于边的相对侧。这些顶点中恰好有一个与屏幕外的点位于边的同一侧(图 9.6 )。这是测试的基础。测试数字p和q是否具有相同的符号可以实现为测试pq > 0,这在大多数环境中都非常有效。
Figure 9.6. The off-screen point will be on one side of the triangle edge or the other. Exactly one of the non-shared vertices a and b will be on the same side.
图 9.6。屏幕外点将位于三角形边的一侧或另一侧。非共享顶点a和b中恰好有一个位于同一侧。
Note that the test is not perfect because the line through the edge may also go through the off-screen point, but we have at least greatly reduced the number of problematic cases. Which off-screen point is used is arbitrary, and (x, y) = (–1, –1) is as good a choice as any. We will need to add a check for the case of a point exactly on an edge. We would like this check not to be reached for common cases, which are the completely inside or outside tests. This suggests
请注意,该测试并不完美,因为通过边缘的线也可能经过屏幕外的点,但我们至少大大减少了有问题的情况。使用哪个屏幕外的点是任意的,并且 ( x, y ) = (-1, -1) 是任何选择都一样好。我们需要添加一个检查,以检查点恰好位于边缘的情况。我们希望这种检查不会在常见情况下达到,即完全在内部或外部的测试。这表明
fα = f12(x0, y0)
fβ = f20(x1, y1)
fγ = f01(x2, y2)
for y = ymin to ymax do
    for x = xmin to xmax do
        α = f12(x, y)/fα
        β = f20(x, y)/fβ
        γ = f01(x, y)/fγ
        if (α ≥ 0 and β ≥ 0 and γ ≥ 0) then
            if (α > 0 or fα · f12(–1, –1) > 0) and
               (β > 0 or fβ · f20(–1, –1) > 0) and
               (γ > 0 or fγ · f01(–1, –1) > 0) then
                c = αc0 + βc1 + γc2
                drawpixel (x, y) with color c
We might expect that the above code would work to eliminate holes and double-draws only if we use exactly the same line equation for both triangles. In fact, the line equation is the same only if the two shared vertices have the same order in the draw call for each triangle. Otherwise, the equation might flip in sign. This could be a problem depending on whether the compiler changes the order of operations. So if a robust implementation is needed, the details of the compiler and arithmetic unit may need to be examined. The first four lines in the pseudocode above must be coded carefully to handle cases where the edge exactly hits the pixel center.
我们可能认为,只有对两个三角形使用完全相同的线方程,上述代码才能消除孔洞和重复绘制。事实上,只有当两个共享顶点在每个三角形的绘制调用中具有相同的顺序时,线方程才是相同的。否则,方程的符号可能会翻转。这可能是一个问题,具体取决于编译器是否更改了操作顺序。因此,如果需要一个健壮的实现,则可能需要检查编译器和算术单元的细节。必须仔细编码上述伪代码中的前四行,以处理边缘恰好击中像素中心的情况。
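To see the tie-breaking rule at work, the following Python sketch rasterizes two triangles that share an edge and returns the set of pixels each one claims. It is an illustration under simplifying assumptions: integer vertex coordinates (so the arithmetic is exact), the off-screen point (–1, –1), and hypothetical helper names edge and covered_pixels.

```python
import math

OFF_SCREEN = (-1, -1)  # arbitrary off-screen reference point


def edge(xa, ya, xb, yb):
    """Implicit line through (xa, ya) and (xb, yb), in the form of Equation (9.1)."""
    return lambda x, y: (ya - yb) * x + (xb - xa) * y + xa * yb - xb * ya


def covered_pixels(p0, p1, p2):
    """Pixels owned by the triangle, with edge ties broken by the off-screen point."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    f12 = edge(x1, y1, x2, y2)
    f20 = edge(x2, y2, x0, y0)
    f01 = edge(x0, y0, x1, y1)
    fa, fb, fg = f12(x0, y0), f20(x1, y1), f01(x2, y2)
    if fa == 0 or fb == 0 or fg == 0:
        return set()  # degenerate triangle

    ox, oy = OFF_SCREEN
    # An edge is drawn only if the opposite vertex and the off-screen
    # point lie on the same side of it (same sign, so the product is positive).
    keep_a = fa * f12(ox, oy) > 0
    keep_b = fb * f20(ox, oy) > 0
    keep_g = fg * f01(ox, oy) > 0

    xmin, xmax = math.floor(min(x0, x1, x2)), math.ceil(max(x0, x1, x2))
    ymin, ymax = math.floor(min(y0, y1, y2)), math.ceil(max(y0, y1, y2))
    pixels = set()
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            a, b, g = f12(x, y) / fa, f20(x, y) / fb, f01(x, y) / fg
            if a >= 0 and b >= 0 and g >= 0:
                if (a > 0 or keep_a) and (b > 0 or keep_b) and (g > 0 or keep_g):
                    pixels.add((x, y))
    return pixels
```

For the two triangles (0, 0), (4, 0), (4, 2) and (0, 0), (4, 2), (0, 2), the pixel center (2, 1) lies exactly on the shared edge; in this sketch it is awarded to the first triangle only, and the two pixel sets are disjoint.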
In addition to being amenable to an incremental implementation, the algorithm has several potential early exit points. For example, if α is negative, there is no need to compute β or γ. While this may well result in a speed improvement, profiling is always a good idea; the extra branches could reduce pipelining or concurrency and might slow down the code. So as always, test any attractive-looking optimizations if the code is a critical section.
除了适合增量实现之外,还有几个潜在的早期退出点。例如,如果 α 为负,则无需计算 β 或 γ。虽然这很可能导致速度提高,但分析始终是一个好主意;额外的分支可能会减少流水线或并发性,并可能减慢代码速度。因此,如果代码是关键部分,请像往常一样测试任何有吸引力的优化。
Another detail of the above code is that the divisions could be divisions by zero for degenerate triangles, i.e., if fγ = 0. Either the floating point error conditions should be accounted for properly, or another test will be needed.
上述代码的另一个细节是,对于退化三角形,除法可能为零,即,如果f γ = 0。要么应该正确考虑浮点错误条件,要么需要进行另一个测试。
There are some subtleties in achieving correct-looking perspective when interpolating quantities, such as texture coordinates or 3D positions, that need to vary linearly across the 3D triangles. We’ll use texture coordinates as an example of a quantity where perspective correction is important, but the same considerations apply to any attribute where linearity in 3D space is important.
在对纹理坐标或 3D 位置等需要在 3D 三角形上线性变化的量进行插值时,实现正确透视需要一些微妙之处。我们将使用纹理坐标作为透视校正很重要的量的示例,但同样的考虑也适用于 3D 空间中的线性很重要的任何属性。
The reason things are not straightforward is that just interpolating texture coordinates in screen space results in incorrect images, as shown for the grid texture in Figure 9.7. Because things in perspective get smaller as the distance to the viewer increases, the lines that are evenly spaced in 3D should compress in 2D image space. More careful interpolation of texture coordinates is needed to accomplish this.
事情之所以不那么简单,是因为仅仅在屏幕空间中插入纹理坐标会导致图像不正确,如图 9.7中的网格纹理所示。由于透视图中的物体随着与观察者的距离增加而变小,因此在 3D 中均匀分布的线条应该在 2D 图像空间中压缩。要实现这一点,需要更仔细地插入纹理坐标。
Figure 9.7. Left: correct perspective. Right: interpolation in screen space.
图 9.7。左:正确的透视。右:屏幕空间中的插值。
We can implement texture mapping on triangles by interpolating the (u, v) coordinates, modifying the rasterization method of Section 9.1.2, but this results in the problem shown at the right of Figure 9.7. A similar problem occurs for triangles if screen space barycentric coordinates are used as in the following rasterization code:
我们可以通过插值( u,v )坐标来实现三角形上的纹理映射,修改第 9.1.2 节的光栅化方法,但这会导致图 9.7右侧所示的问题。如果使用屏幕空间重心坐标,三角形也会出现类似的问题,如以下光栅化代码所示:
for all x do
    for all y do
        compute (α, β, γ) for (x, y)
        if α ∈ (0, 1) and β ∈ (0, 1) and γ ∈ (0, 1) then
            t = αt0 + βt1 + γt2
            drawpixel (x, y) with color texture(t) for a solid texture
                or with texture(β, γ) for a 2D texture.
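This code will generate images, but there is a problem. To unravel the basic problem, let’s consider the progression from world space q to homogeneous point r to homogenized point s: the point q = (xq, yq, zq, 1) is transformed to r = (xr, yr, zr, wr), which is then homogenized to s = (xr/wr, yr/wr, zr/wr, 1).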
此代码将生成图像,但存在一个问题。为了解决基本问题,让我们考虑从世界空间q到齐次点r再到齐次点s的进展:
The simplest form of the texture coordinate interpolation problem is when we have texture coordinates (u, v) associated with two points, q and Q, and we need to generate texture coordinates in the image along the line between s and S. If the world-space point q′ that is on the line between q and Q projects to the screen-space point s′ on the line between s and S, then the two points should have the same texture coordinates.
纹理坐标插值问题最简单的形式是,当我们有与两个点q和Q相关的纹理坐标 ( u, v ) 时,我们需要沿s和S之间的连线在图像中生成纹理坐标。如果q和Q之间的连线上的世界空间点q ′ 投影到s和S之间的连线上的屏幕空间点s ′ ,那么这两个点应该具有相同的纹理坐标。
The naïve screen-space approach, embodied by the algorithm above, says that at the point s′ = s + α(S – s), we should use texture coordinates us + α(uS – us) and vs + α(vS – vs) . This doesn’t work correctly because the world-space point q′ that transforms to s′ is not q + α(Q – q).
上述算法体现了简单的屏幕空间方法,即在点s ′ = s + α( S – s ) 处,我们应该使用纹理坐标u s + α( u S – u s ) 和v s + α( v S – v s )。这无法正常工作,因为转换为s ′ 的世界空间点q ′不是q + α( Q – q )。
However, we know from Section 8.4 that the points on the line segment between q and Q do end up somewhere on the line segment between s and S; in fact, in that section we showed that the point q + t(Q – q) maps to the point s + α(S – s).
然而,我们从第 8.4 节知道, q和Q之间的线段上的点最终会落在s和S之间的线段上的某个地方;事实上,在那一节中我们证明了
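The interpolation parameters t and α are not the same, but we can compute one from the other (see footnote 1):
t(α) = wr α / (wR + α(wr – wR)) and α(t) = wR t / (wr + t(wR – wr)), (9.2)
where wr and wR are the w coordinates of the homogeneous points that homogenize to s and S, respectively.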
插值参数t和 α 并不相同,但我们可以通过一个参数计算出另一个参数: 1
These equations provide one possible fix to the screen-space interpolation idea. To get texture coordinates for the screen-space point s′ = s + α(S – s), compute u′s = us + t(α)(uS – us) and v′s = vs + t(α)(vS – vs). These are the coordinates of the point q′ that maps to s′, so this will work. However, it is slow to evaluate t(α) for each fragment, and there is a simpler way.
这些方程为屏幕空间插值思想提供了一种可能的解决方法。要获取屏幕空间点s ′ = s + α( S – s ) 的纹理坐标,请计算u ′ s = u s + t (α)( u S – u s ) 和v ′ s = v s + t (α)( v S – v s )。这些是映射到s ′ 的点q ′ 的坐标,因此这种方法可行。但是,为每个片段评估t (α) 的速度很慢,还有一种更简单的方法。
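A small numeric sketch in Python shows how different t and α can be. It assumes the relation t(α) = wr α / (wR + α(wr – wR)) between the two parameters, where w_r and w_R stand for the homogeneous w coordinates of the two endpoints before the divide; the function names are illustrative only.

```python
def t_of_alpha(alpha, w_r, w_R):
    """World-space parameter t for a given screen-space parameter alpha."""
    return w_r * alpha / (w_R + alpha * (w_r - w_R))


def alpha_of_t(t, w_r, w_R):
    """Screen-space parameter alpha for a given world-space parameter t."""
    return w_R * t / (w_r + t * (w_R - w_r))


def texture_u(alpha, u_s, u_S, w_r, w_R):
    """Texture coordinate at the screen-space point s + alpha (S - s)."""
    t = t_of_alpha(alpha, w_r, w_R)
    return u_s + t * (u_S - u_s)
```

With w_r = 1 and w_R = 3, the screen-space midpoint α = 0.5 corresponds to t = 0.25, so a texture coordinate running from 0 to 1 takes the value 0.25 there instead of the 0.5 that naïve screen-space interpolation would give.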
The key observation is that because, as we know, the perspective transform preserves lines and planes, it is safe to linearly interpolate any attributes we want across triangles, but only as long as they go through the perspective transformation along with the points (Figure 9.8). To get a geometric intuition for this, reduce the dimension so that we have homogeneous points (xr, yr, wr) and a single attribute u being interpolated. The attribute u is supposed to be a linear function of xr and yr, so if we plot u as a height field over (xr, yr), the result is a plane. Now, if we think of u as a third spatial coordinate (call it ur to emphasize that it’s treated the same as the others) and send the whole 3D homogeneous point (xr, yr, ur, wr) through the perspective transformation, the result (xs, ys, us) still generates points that lie on a plane. There will be some warping within the plane, but the plane stays flat. This means that us is a linear function of (xs, ys); that is, we can compute us anywhere by using linear interpolation based on the coordinates (xs, ys).
关键的观察是,因为我们知道透视变换保留了线和平面,所以可以安全地在三角形之间线性插值任何我们想要的属性,但前提是它们与点一起经过透视变换(图 9.8 )。为了获得对此的几何直觉,请降低维度,以便我们得到同质点( xr , yr , wr )和一个要插值的属性u 。属性u应该是xr和yr的线性函数,因此如果我们将u绘制为( xr , yr )上的高度场,结果就是一个平面。现在,如果我们将u视为第三个空间坐标(将其称为u r以强调其与其他坐标相同),并将整个 3D 同质点 ( x r , y r , u r , w r ) 发送通过透视变换,则结果 ( x s , y s , u s ) 仍会生成位于平面上的点。平面内会有一些扭曲,但平面保持平坦。 这意味着u s是 ( x s , y s ) 的线性函数——也就是说,我们可以根据坐标 ( x s , y s ) 使用线性插值来计算任意位置的u s 。
Figure 9.8. Geometric reasoning for screen-space interpolation. Top: ur is to be interpolated as a linear function of (xr,yr). Bottom: after a perspective transformation from (xr,yr,ur,wr) to (xs,ys,us, 1), us is a linear function of (xs,ys).
图 9.8。屏幕空间插值的几何推理。顶部: u r将作为 ( x r , y r ) 的线性函数进行插值。底部:在从 ( x r , y r , u r , w r ) 到 ( x s , y s , u s , 1 ) 的透视变换之后, u s是 ( x s , y s ) 的线性函数。
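Returning to the full problem, we need to interpolate texture coordinates (u, v) that are linear functions of the world space coordinates (xq, yq, zq). After transforming the points to screen space, and adding the texture coordinates as if they were additional coordinates, we have the point (u, v, 1, xr, yr, zr, wr), which homogenizes to
(u/wr, v/wr, 1/wr, xr/wr, yr/wr, zr/wr, 1) = (u/wr, v/wr, 1/wr, xs, ys, zs, 1). (9.3)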
回到完整的问题,我们需要插入纹理坐标 ( u, v ),它们是世界空间坐标 ( x q , y q , z q ) 的线性函数。将点转换为屏幕空间,并将纹理坐标添加为附加坐标后,我们得到
1 It is worthwhile to derive these functions yourself from Equation (7.6); in that chapter’s notation, α = f(t).
1值得自己根据公式 (7.6) 推导出这些函数;在该章的符号中,α = f ( t )。
The practical implication of the previous paragraph is that we can go ahead and interpolate all of these quantities based on the values of (xs, ys), including the value zs used in the z-buffer. The problem with the naïve approach is simply that we are interpolating components selected inconsistently; as long as the quantities involved are all from before or all from after the perspective divide, all will be well.
上一段的实际含义是,我们可以继续根据( xs , ys )的值(包括 z 缓冲区中使用的zs值)插入所有这些量。这种简单方法的问题在于,我们插入的组件选择不一致——只要所涉及的量来自透视除法之前或之后,一切都会很好。
The one remaining problem is that (u/wr, v/wr) is not directly useful for looking up texture data; we need (u, v). This explains the purpose of the extra parameter we slipped into (9.3), whose value is always 1: once we have u/wr, v/wr, and 1/wr, we can easily recover (u, v) by dividing.
剩下的一个问题是( u/w r , v/w r )对于查找纹理数据没有直接用处;我们需要( u, v )。这解释了我们在(9.3)中插入的额外参数的目的,其值始终为 1:一旦我们有了u/w r 、 v/w r和 1 /w r ,我们就可以通过除法轻松恢复( u, v )。
To verify that this is all correct, let’s check that interpolating the quantity 1/wr in screen space indeed produces the reciprocal of the interpolated wr in world space. To see this is true, confirm (Exercise 2):
1/wr + α(t)(1/wR – 1/wr) = 1/(wr + t(wR – wr)),
为了验证这一切是否正确,让我们检查一下在屏幕空间中插入数量 1 /w r是否确实会产生在世界空间中插入的w r的倒数。要验证这一点,请确认(练习 2):
remembering that α(t) and t are related by Equation 9.2.
记住,α( t )和t通过公式9.2关联起来。
This ability to interpolate 1/wr linearly with no error in the transformed space allows us to correctly texture triangles. We can use these facts to modify our scan-conversion code for three points pi = (xi, yi, zi, wi) that have been passed through the viewing matrices, but have not been homogenized, complete with texture coordinates ti = (ui, vi):
这种在变换空间中无误差地线性插值 1 /w r的能力使我们能够正确地纹理三角形。我们可以利用这些事实来修改我们的扫描转换代码,用于三个点t = (x,y,z,w),这些点已经通过了观察矩阵,但还没有被均匀化,并带有纹理坐标t = (u,v):
for all xs do
    for all ys do
        compute (α, β, γ) for (xs, ys)
        if (α ∈ [0, 1] and β ∈ [0, 1] and γ ∈ [0, 1]) then
            us = α(u0/w0) + β(u1/w1) + γ(u2/w2)
            vs = α(v0/w0) + β(v1/w1) + γ(v2/w2)
            1s = α(1/w0) + β(1/w1) + γ(1/w2)
            u = us/1s
            v = vs/1s
            drawpixel (xs, ys) with color texture(u, v)
Of course, many of the expressions appearing in this pseudocode would be precomputed outside the loop for speed.
当然,为了提高速度,这个伪代码中出现的许多表达式都会在循环外进行预先计算。
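The per-pixel part of this pseudocode can be written as a small Python function. This is a sketch under the assumption that the screen-space barycentric coordinates have already been computed and that all w values are nonzero; the name perspective_correct_uv and its argument layout are our own.

```python
def perspective_correct_uv(bary, uvs, ws):
    """Interpolate (u, v) at screen-space barycentric coordinates `bary`.

    uvs holds the three vertex texture coordinates (u_i, v_i) and ws the three
    homogeneous w_i values from before the perspective divide.
    """
    alpha, beta, gamma = bary
    (u0, v0), (u1, v1), (u2, v2) = uvs
    w0, w1, w2 = ws
    # Interpolate u/w, v/w, and 1/w linearly in screen space ...
    u_s = alpha * (u0 / w0) + beta * (u1 / w1) + gamma * (u2 / w2)
    v_s = alpha * (v0 / w0) + beta * (v1 / w1) + gamma * (v2 / w2)
    one_s = alpha * (1 / w0) + beta * (1 / w1) + gamma * (1 / w2)
    # ... then divide to recover the true texture coordinates.
    return u_s / one_s, v_s / one_s
```

Halfway along the screen-space edge between a vertex with u = 0, w = 1 and a vertex with u = 1, w = 3, this returns u close to 0.25, not 0.5. When all three w values are equal, the result reduces to ordinary screen-space interpolation.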
In practice, modern systems interpolate all attributes in a perspective-correct way, unless some other method is specifically requested.
实际上,现代系统会以透视正确的方式插入所有属性,除非特别要求使用其他方法。
Simply transforming primitives into screen space and rasterizing them does not quite work by itself. This is because primitives that are outside the view volume—particularly, primitives that are behind the eye—can end up being rasterized, leading to incorrect results. For instance, consider the triangle shown in Figure 9.9. Two vertices are in the view volume, but the third is behind the eye. The projection transformation maps this vertex to a nonsensical location behind the far plane, and if this is allowed to happen, the triangle will be rasterized incorrectly. For this reason, rasterization has to be preceded by a clipping operation that removes parts of primitives that could extend behind the eye.
简单地将图元变换到屏幕空间并对其进行栅格化本身并不能完全发挥作用。这是因为视空间之外的图元(尤其是位于眼睛后面的图元)最终可能会被栅格化,从而导致不正确的结果。例如,考虑图 9.9所示的三角形。两个顶点在视空间中,但第三个顶点在眼睛后面。投影变换会将此顶点映射到远平面后面一个毫无意义的位置,如果允许这种情况发生,三角形将被错误地栅格化。因此,在栅格化之前必须进行裁剪操作,以移除可能延伸到眼睛后面的图元部分。
Figure 9.9. The depth z is transformed to the depth z′ by the perspective transform. Note that when z moves from positive to negative, z′ switches from negative to positive. Thus vertices behind the eye are moved in front of the eye beyond z′ = n + f. This will lead to wrong results, which is why the triangle is first clipped to ensure all vertices are in front of the eye.
图 9.9.深度z转换为深度z通过透视变换。注意当z从正移到负时, z从负变为正。因此,眼睛后面的顶点被移动到眼睛前面,超出z = n + f 。这将导致错误的结果,这就是为什么首先要剪裁三角形以确保所有顶点都在眼前。
Clipping is a common operation in graphics, needed whenever one geometric entity “cuts” another. For example, if you clip a triangle against the plane x = 0, the plane cuts the triangle into two parts if the signs of the x-coordinates of the vertices are not all the same. In most applications of clipping, the portion of the triangle on the “wrong” side of the plane is discarded. This operation for a single plane is shown in Figure 9.10.
裁剪是图形学中的一种常见操作,当一个几何实体“裁剪”另一个几何实体时,就需要裁剪。例如,如果将三角形裁剪到平面x = 0,如果顶点x坐标的符号不完全相同,平面会将三角形裁剪成两部分。在大多数裁剪应用中,三角形在平面“错误”一侧的部分会被丢弃。图 9.10显示了单个平面的此操作。
Figure 9.10. A polygon is clipped against a clipping plane. The portion “inside” the plane is retained.
图 9.10.多边形被裁剪到裁剪平面上。裁剪平面“内部”的部分被保留。
In clipping to prepare for rasterization, the “wrong” side is the side outside the view volume. It is always safe to clip away all geometry outside the view volume—that is, clipping against all six faces of the volume—but many systems manage to get away with only clipping against the near plane.
在为光栅化做准备的裁剪过程中,“错误”的一侧是视图体积之外的一侧。裁剪掉视图体积之外的所有几何体(即裁剪体积的所有六个面)始终是安全的,但许多系统只裁剪近平面。
This section discusses the basic implementation of a clipping module. Those interested in implementing an industrial-speed clipper should see the book by Blinn mentioned in the notes at the end of this chapter.
本节讨论裁剪模块的基本实现。对实现工业级速度裁剪器感兴趣的人可以参阅本章末尾注释中提到的 Blinn 所著书籍。
The two most common approaches for implementing clipping are
实现剪辑的两种最常见方法是
1. In world coordinates using the six planes that bound the truncated viewing pyramid,
在世界坐标系中,使用包围截断视点金字塔的六个平面,
2. In the 4D transformed space before the homogeneous divide.
在同质分裂之前的四维变换空间中。
Either possibility can be effectively implemented (J. Blinn, 1996) using the following approach for each triangle:
对于每个三角形,可以使用以下方法有效实现任一可能性(J. Blinn,1996):
for each of six planes do
    if (triangle entirely outside of plane) then
        break (triangle is not visible)
    else if triangle spans plane then
        clip triangle
        if (quadrilateral is left) then
            break into two triangles
Option 1 has a straightforward implementation. The only question is, “What are the six plane equations?” Because these equations are the same for all triangles rendered in the single image, we do not need to compute them very efficiently. For this reason, we can just invert the transform shown in Figure 7.12 and apply it to the eight vertices of the transformed view volume:
(x, y, z) = (l, b, n), (r, b, n), (l, t, n), (r, t, n), (l, b, f), (r, b, f), (l, t, f), (r, t, f).
选项 1 的实现很简单。唯一的问题是,“这六个平面方程是什么?”因为这些方程对于单个图像中渲染的所有三角形都是相同的,所以我们不需要非常高效地计算它们。因此,我们可以反转图 7.12中所示的变换并将其应用于变换后的视图体积的八个顶点:
The plane equations can be inferred from here. Alternatively, we can use vector geometry to get the planes directly from the viewing parameters.
平面方程可以从这里推导出来。或者,我们也可以使用矢量几何直接从观察参数中获取平面。
Surprisingly, the option usually implemented is that of clipping in homogeneous coordinates before the divide. Here, the view volume is 4D, and it is bounded by 3D volumes (hyperplanes). These are
–x + lw = 0, x – rw = 0, –y + bw = 0, y – tw = 0, –z + nw = 0, z – fw = 0.
令人惊讶的是,通常实施的选项是在划分之前在齐次坐标中进行裁剪。在这里,视图体积是 4D,并且由 3D 体积(超平面)界定。这些是
These planes are quite simple, so the efficiency is better than for Option 1. They still can be improved by transforming the view volume [l, r] × [b, t] × [f, n] to [0, 1]^3. It turns out that the clipping of the triangles is not much more complicated than in 3D.
这些平面非常简单,因此效率比方案 1 要高。它们仍然可以通过将视图体积 [ l, r ] × [ b, t ] × [ f, n ] 转换为 [0, 1] 3来改进。事实证明,三角形的裁剪并不比 3D 复杂多少。
No matter which option we choose, we must clip against a plane. Recall from Section 2.7.5 that the implicit equation for a plane through point q with normal n is
f(p) = n · (p – q) = 0. (9.5)
无论我们选择哪种方式,我们都必须根据平面进行裁剪。回想一下2.7.5 节,通过点q且法线为n 的平面的隐式方程为
Interestingly, this equation not only describes a 3D plane, but also describes a line in 2D and the volume analog of a plane in 4D. All of these entities are usually called planes in their appropriate dimension.
有趣的是,这个方程不仅描述了三维平面,还描述了二维中的一条线和四维中平面的体积类似物。所有这些实体通常在其相应的维度上被称为平面。
If we have a line segment between points a and b, we can “clip” it against a plane using the techniques for cutting the edges of 3D triangles in BSP tree programs described in Section 12.4.3. Here, the points a and b are tested to determine whether they are on opposite sides of the plane f(p) = 0 by checking whether f(a) and f(b) have different signs. Typically, f(p) < 0 is defined to be “inside” the plane, and f(p) > 0 is “outside” the plane. If the plane does split the line, then we can solve for the intersection point by substituting the equation for the parametric line, p = a + t(b – a),
如果点a和b之间有一条线段,我们可以使用12.4.3 节中描述的 BSP 树程序中切割 3D 三角形边缘的技术,将其“裁剪”到平面上。这里,通过检查f ( a ) 和f ( b ) 是否具有不同的符号,测试点a和b以确定它们是否位于平面f ( p ) = 0 的相对侧。通常, f ( p ) < 0 定义为“在平面内”,而f ( p ) > 0 定义为“在平面外”。如果平面确实分割了这条线,那么我们可以通过将方程代入参数线来求解交点,
into the f(p) = 0 plane of Equation (9.5). This yields n · (a + t(b – a) – q) = 0.
进入方程 (9.5) 的f ( p ) = 0 平面。这得到
Solving for t gives t = n · (q – a) / (n · (b – a)).
求解t可得
We can then find the intersection point and “shorten” the line.
然后我们可以找到交点并“缩短”这条线。
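The segment case can be sketched in a few lines of Python. The sketch assumes the plane is given by a point q and a normal n, treats f(p) ≤ 0 as inside, and uses illustrative names (dot, clip_segment). Because it only uses dot products, the same code applies to a line in 2D, a plane in 3D, or a hyperplane in 4D.

```python
def dot(a, b):
    """Dot product of two equal-length coordinate sequences."""
    return sum(x * y for x, y in zip(a, b))


def clip_segment(a, b, n, q):
    """Clip segment ab to the side f(p) = n . (p - q) <= 0 of a plane.

    Returns the retained (start, end) pair, or None if nothing is left.
    """
    fa = dot(n, [ai - qi for ai, qi in zip(a, q)])
    fb = dot(n, [bi - qi for bi, qi in zip(b, q)])
    if fa > 0 and fb > 0:
        return None                       # entirely outside
    if fa <= 0 and fb <= 0:
        return a, b                       # entirely inside
    t = fa / (fa - fb)                    # same as n . (q - a) / (n . (b - a))
    hit = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
    return (a, hit) if fa <= 0 else (hit, b)
```

Clipping the segment from (–1, 0, 0) to (3, 4, 0) against the plane x = 0 with normal (1, 0, 0) gives t = 0.25 and shortens the segment so that it ends at (0, 1, 0).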
To clip a triangle, we again can follow Section 12.4.3 to produce one or two triangles.
要裁剪一个三角形,我们可以再次按照第 12.4.3 节来生成一个或两个三角形。
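One way to clip a triangle against a single plane is sketched below in Python. Note that this is not the BSP-tree routine of Section 12.4.3, which is not reproduced here; it is a Sutherland–Hodgman style walk around the three edges, followed by a fan split of the resulting polygon. The names dot and clip_triangle are hypothetical, and f(p) ≤ 0 is taken as inside.

```python
def dot(a, b):
    """Dot product of two equal-length coordinate sequences."""
    return sum(x * y for x, y in zip(a, b))


def clip_triangle(tri, n, q):
    """Clip a triangle to the side n . (p - q) <= 0; return 0, 1, or 2 triangles."""
    def f(p):
        return dot(n, [pi - qi for pi, qi in zip(p, q)])

    polygon = []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        fa, fb = f(a), f(b)
        if fa <= 0:
            polygon.append(tuple(a))          # keep vertices on the inside
        if (fa < 0 < fb) or (fb < 0 < fa):
            t = fa / (fa - fb)                # edge crosses the plane
            polygon.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))

    if len(polygon) < 3:
        return []                             # nothing visible remains
    # A quadrilateral is broken into two triangles by fanning from vertex 0.
    return [(polygon[0], polygon[i], polygon[i + 1])
            for i in range(1, len(polygon) - 1)]
```

Clipping the 2D triangle (–2, 0), (2, 0), (–2, 4) against the line x = 0 leaves a quadrilateral, which the sketch returns as two triangles; a triangle entirely on the outside yields an empty list.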
Before a primitive can be rasterized, the vertices that define it must be in screen coordinates, and the colors or other attributes that are supposed to be interpolated across the primitive must be known. Preparing this data is the job of the vertex-processing stage of the pipeline. In this stage, incoming vertices are transformed by the modeling, viewing, and projection transformations, mapping them from their original coordinates into screen space (where, recall, position is measured in terms of pixels). At the same time, other information, such as colors, surface normals, or texture coordinates, is transformed as needed; we’ll discuss these additional attributes in the examples below.
在将图元光栅化之前,定义它的顶点必须位于屏幕坐标中,并且必须知道应该插入到图元中的颜色或其他属性。准备这些数据是管道的顶点处理阶段的工作。在此阶段,传入的顶点通过建模、查看和投影变换进行变换,将它们从原始坐标映射到屏幕空间(回想一下,其中位置以像素为单位)。同时,其他信息(如颜色、表面法线或纹理坐标)也会根据需要进行变换;我们将在下面的示例中讨论这些附加属性。
After rasterization, further processing is done to compute a color and depth for each fragment. This processing can be as simple as just passing through an interpolated color and using the depth computed by the rasterizer; or it can involve complex shading operations. Finally, the blending phase combines the fragments generated by the (possibly several) primitives that overlapped each pixel to compute the final color. The most common blending approach is to choose the color of the fragment with the smallest depth (closest to the eye).
光栅化后,将进行进一步处理以计算每个片段的颜色和深度。此处理可以简单到只需传递插值颜色并使用光栅化器计算出的深度;或者它可能涉及复杂的着色操作。最后,混合阶段将由重叠每个像素的(可能是多个)图元生成的片段组合起来以计算最终颜色。最常见的混合方法是选择深度最小(最接近眼睛)的片段的颜色。
The purposes of the different stages are best illustrated by examples.
最好通过示例来说明不同阶段的目的。
The simplest possible pipeline does nothing in the vertex or fragment stages, and in the blending stage, the color of each fragment simply overwrites the value of the previous one. The application supplies primitives directly in pixel coordinates, and the rasterizer does all the work. This basic arrangement is the essence of many simple, older APIs for drawing user interfaces, plots, graphs, and other 2D content. Solid color shapes can be drawn by specifying the same color for all vertices of each primitive, and our model pipeline also supports smoothly varying color using interpolation.
最简单的管道在顶点或片段阶段不执行任何操作,而在混合阶段,每个片段的颜色只会覆盖前一个片段的值。应用程序直接在像素坐标中提供基元,光栅化器完成所有工作。这种基本安排是许多用于绘制用户界面、绘图、图形和其他 2D 内容的简单、较旧的 API 的本质。可以通过为每个基元的所有顶点指定相同的颜色来绘制纯色形状,我们的模型管道还支持使用插值平滑地改变颜色。
To draw objects in 3D, the only change needed to the 2D drawing pipeline is a single matrix transformation: the vertex-processing stage multiplies the incoming vertex positions by the product of the modeling, camera, projection, and viewport matrices, resulting in screen-space triangles that are then drawn in the same way as if they’d been specified directly in 2D.
为了在 3D 中绘制对象,2D 绘制管道所需要做的唯一改变就是进行一次矩阵变换:顶点处理阶段将传入的顶点位置乘以建模、相机、投影和视口矩阵的乘积,从而生成屏幕空间三角形,然后以与直接在 2D 中指定的方式相同的方式绘制这些三角形。
One problem with the minimal 3D pipeline is that in order to get occlusion relationships correct—to get nearer objects in front of farther away objects—primitives must be drawn in back-to-front order. This is known as the painter’s algorithm for hidden surface removal, by analogy to painting the background of a painting first, and then painting the foreground over it. The painter’s algorithm is a perfectly valid way to remove hidden surfaces, but it has several drawbacks. It cannot handle triangles that intersect one another, because there is no correct order in which to draw them. Similarly, several triangles, even if they don’t intersect, can still be arranged in an occlusion cycle, as shown in Figure 9.11, another case in which the back-to-front order does not exist. And most importantly, sorting the primitives by depth is slow, especially for large scenes, and disturbs the efficient flow of data that makes object-order rendering so fast. Figure 9.12 shows the result of this process when the objects are not sorted by depth.
最小 3D 管道的一个问题是,为了获得正确的遮挡关系(使较近的物体位于较远的物体前面),必须按从后到前的顺序绘制图元。这被称为画家算法,用于隐藏表面去除,类似于先绘制绘画的背景,然后在其上绘制前景。画家算法是一种去除隐藏表面的完美方法,但它有几个缺点。它无法处理相互交叉的三角形,因为没有正确的绘制顺序。同样,即使几个三角形不相交,它们仍然可以排列成遮挡循环,如图 9.11所示,这是另一种不存在从后到前的顺序的情况。最重要的是,按深度对图元进行排序很慢,尤其是对于大型场景,并且会干扰使对象顺序渲染如此快速的有效数据流。图 9.12显示了当对象未按深度排序时此过程的结果。
Figure 9.11. Two occlusion cycles, which cannot be drawn in back-to-front order.
图 9.11。两个遮挡循环,无法按从后到前的顺序绘制。
Figure 9.12. The result of drawing two spheres of identical size using the minimal pipeline. The sphere that appears smaller is farther away but is drawn last, so it incorrectly overwrites the nearer one.
图 9.12。使用最小管道绘制两个相同大小的球体的结果。看起来较小的球体距离较远,但最后绘制,因此它错误地覆盖了较近的球体。
In practice, the painter’s algorithm is rarely used; instead, a simple and effective hidden surface removal algorithm known as the z-buffer algorithm is used. The method is very simple: at each pixel, we keep track of the distance to the closest surface that has been drawn so far, and we throw away fragments that are farther away than that distance. The closest distance is stored by allocating an extra value for each pixel, in addition to the red, green, and blue color values, which is known as the depth, or z-value. The depth buffer, or z-buffer, is the name for the grid of depth values.
实际上,画家算法很少使用;相反,人们使用一种简单有效的隐藏表面移除算法,即z 缓冲算法。该方法非常简单:在每个像素处,我们跟踪到迄今为止绘制的最近表面的距离,并丢弃距离比该距离更远的片段。除了红色、绿色和蓝色值之外,通过为每个像素分配一个额外的值来存储最近距离,这被称为深度或 z 值。深度缓冲区或 z 缓冲区是深度值网格的名称。
The z-buffer algorithm is implemented in the fragment blending phase, by comparing the depth of each fragment with the current value stored in the z-buffer. If the fragment’s depth is closer, both its color and its depth value overwrite the values currently in the color and depth buffers. If the fragment’s depth is farther away, it is discarded. To ensure that the first fragment will pass the depth test, the z-buffer is initialized to the maximum depth (the depth of the far plane). Irrespective of the order in which surfaces are drawn, the same fragment will win the depth test, and the image will be the same.
z 缓冲区算法在片段混合阶段实现,通过将每个片段的深度与 z 缓冲区中存储的当前值进行比较。如果片段的深度较近,则其颜色和深度值都会覆盖颜色和深度缓冲区中的当前值。如果片段的深度较远,则将其丢弃。为了确保第一个片段通过深度测试, z缓冲区被初始化为最大深度(远平面的深度)。无论表面的绘制顺序如何,同一个片段都会赢得深度测试,并且图像将是相同的。
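The depth test described above can be sketched in a few lines. This is a minimal illustration, not a hardware pipeline: the fragment layout `(x, y, depth, color)`, the function name `draw_fragments`, and the convention that a smaller depth means closer are all assumptions made for the example.

```python
def draw_fragments(fragments, width, height, far_depth=1.0, background=(0, 0, 0)):
    """Resolve visibility with a z-buffer; a smaller depth is assumed to be closer."""
    color_buffer = [[background] * width for _ in range(height)]
    # Start every pixel at the far-plane depth so any nearer fragment passes.
    depth_buffer = [[far_depth] * width for _ in range(height)]
    for x, y, depth, color in fragments:
        # Strict comparison: on a tie the earlier fragment is kept (an assumption).
        if depth < depth_buffer[y][x]:
            depth_buffer[y][x] = depth
            color_buffer[y][x] = color
        # Otherwise the fragment is farther away and is discarded.
    return color_buffer, depth_buffer


near_red = (0, 0, 0.25, (255, 0, 0))
far_blue = (0, 0, 0.75, (0, 0, 255))

# Drawing in either order leaves the nearer (red) fragment in the image.
image_a, _ = draw_fragments([near_red, far_blue], width=1, height=1)
image_b, _ = draw_fragments([far_blue, near_red], width=1, height=1)
```

With a strict comparison, two fragments at exactly the same depth are resolved in favor of whichever arrives first, which is one way the drawing order can still matter.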
Of course there can be ties in the depth test, in which case the order may well matter.
当然,深度测试中可能会存在平局,在这种情况下顺序可能很重要。
The z-buffer algorithm requires each fragment to carry a depth. This is done simply by interpolating the z-coordinate as a vertex attribute, in the same way that color or other attributes are interpolated.
Z 缓冲区算法要求每个片段都带有深度。这可以通过将z坐标插入为顶点属性来实现,就像颜色或其他属性的插入方式一样。
The z-buffer is such a simple and practical way to deal with hidden surfaces in object-order rendering that it is by far the dominant approach. It is much simpler than geometric methods that cut surfaces into pieces that can be sorted by depth, because it avoids solving any problems that don’t need to be solved. The depth order only needs to be determined at the locations of the pixels, and that is all that the z-buffer does. It is universally supported by hardware graphics pipelines and is also the most commonly used method for software pipelines. Figures 9.13 and 9.14 show example results.
Z 缓冲区是一种简单实用的方法来处理对象顺序渲染中的隐藏表面,因此它是目前为止占主导地位的方法。它比将表面切割成可以按深度排序的碎片的几何方法简单得多,因为它避免了解决任何不需要解决的问题。深度顺序只需要在像素的位置确定,这就是 Z 缓冲区所做的一切。它受到硬件图形管道的普遍支持,也是软件管道最常用的方法。图 9.13和9.14显示了示例结果。
Figure 9.13. The result of drawing the same two spheres using the z-buffer.
图 9.13.使用 z 缓冲区绘制相同两个球体的结果。
Figure 9.14. A z-buffer rasterizing two triangles in each of two possible orders. The first triangle is fully rasterized. The second triangle has every pixel computed, but for three of the pixels, the depth-contest is lost, and those pixels are not drawn. The final image is the same regardless.
图 9.14。Z缓冲区以两种可能的顺序对两个三角形进行光栅化。第一个三角形已完全光栅化。第二个三角形已计算每个像素,但对于其中三个像素,深度竞争失败,并且不会绘制这些像素。无论如何,最终图像都是相同的。
In practice, the z-values stored in the buffer are nonnegative integers. This is preferable to true floats because the fast memory needed for the z-buffer is somewhat expensive and is worth keeping to a minimum.
实际上,存储在缓冲区中的z值是非负整数。这比真正的浮点数更可取,因为 z 缓冲区所需的快速内存有点昂贵,值得保持在最低限度。
The use of integers can cause some precision problems. If we use an integer range having B values {0, 1, ..., B − 1}, we can map 0 to the near clipping plane z = n and B − 1 to the far clipping plane z = f. Note that for this discussion, we assume z, n, and f are positive. This gives the same results as the negative case, but the details of the argument are easier to follow. We send each z-value to a “bucket” with depth Δz = (f − n)/B. We would not use the integer z-buffer if memory were not at a premium, so it is useful to make B as small as possible.
使用整数可能会导致一些精度问题。如果我们使用具有B个值{ 0, 1, ...,B – 1 }的整数范围,我们可以将 0 映射到近裁剪平面z = n ,将B –1 映射到远裁剪平面z = f 。请注意,对于本讨论,我们假设z 、 n和f为正数。这将导致与负数情况相同的结果,但论证的细节更容易理解。我们将每个z值发送到深度为 Δ z = ( f – n ) /B 的“存储桶”。如果内存不是很重要,我们不会使用整数 z 缓冲区,因此让B尽可能小是有用的。
If we allocate b bits to store the z-value, then B = 2^b. We need enough bits to make sure that any triangle in front of another triangle will have its depth mapped to a distinct depth bin.
如果我们分配b位来存储z值,则B = 2^b。我们需要足够的位来确保位于另一个三角形前面的任何三角形的深度都映射到不同的深度箱中。
For example, if you are rendering a scene where triangles have a separation of at least one meter, then Δz < 1 should yield images without artifacts. There are two ways to make Δz smaller: move n and f closer together or increase b. If b is fixed, as it may be in APIs or on particular hardware platforms, adjusting n and f is the only option.
例如,如果您渲染的场景中三角形之间的距离至少为 1 米,则 Δ z < 1 应能生成没有伪影的图像。有两种方法可以减小 Δ z :将n和f移近或增加b 。如果b是固定的(如 API 或特定硬件平台上的 b 一样),则调整n和f是唯一的选择。
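As a rough illustration of this bucket arithmetic, the sketch below finds the smallest b for which Δz = (f − n)/2^b falls below a given minimum separation. It assumes the linear depth mapping of this paragraph (before any perspective effects are considered); the function names and the example distances are hypothetical.

```python
def bucket_depth(near, far, bits):
    """Depth of one bucket, delta_z = (f - n) / B with B = 2**bits."""
    return (far - near) / 2 ** bits


def bits_for_linear_depth(near, far, min_separation):
    """Smallest bit count b such that bucket_depth(near, far, b) < min_separation."""
    buckets_needed = (far - near) / min_separation
    bits = 0
    # We need B strictly greater than (f - n) / separation.
    while 2 ** bits <= buckets_needed:
        bits += 1
    return bits


# Surfaces at least 1 unit apart, with the depth range running from 1 to 1001.
bits = bits_for_linear_depth(near=1.0, far=1001.0, min_separation=1.0)
delta_z = bucket_depth(1.0, 1001.0, bits)
```

Halving the distance between n and f saves one bit, which is why tightening the clipping planes is the only remedy when b is fixed.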
The precision of z-buffers must be handled with great care when perspective images are created. The value Δz above is used after the perspective divide. Recall from Section 8.3 that the result of the perspective divide is
创建透视图像时,必须非常小心地处理 z 缓冲区的精度。上面的 Δ z值在透视除法之后使用。回想一下第 8.3 节,透视除法的结果是
$$z = n + f - \frac{fn}{z_w}.$$
The actual bin depth is related to z_w, the world depth, rather than z, the post-perspective divide depth. We can approximate the bin size by differentiating both sides:
实际的 bin 深度与z_w(世界深度)有关,而不是与z(透视分割后的深度)有关。我们可以通过微分两边来近似 bin 大小:
$$\Delta z \approx \frac{fn\,\Delta z_w}{z_w^2}.$$
Bin sizes vary in depth. The bin size in world space is
箱体大小随深度不同而变化。世界空间中的箱体大小为
$$\Delta z_w \approx \frac{z_w^2\,\Delta z}{fn}.$$
Note that the quantity Δz is as previously discussed. The biggest bin will be for z = f, where
请注意,数量 Δ z与前面讨论的一样。最大的 bin 是z = f ,其中
$$\Delta z_w^{\max} \approx \frac{f\,\Delta z}{n}.$$
Note that choosing n = 0, a natural choice if we don’t want to lose objects right in front of the eye, will result in an infinitely large bin, which is a very bad condition. To make Δz_w^max as small as possible, we want to minimize f and maximize n. Thus, it is always important to choose n and f carefully.
请注意,如果我们不想丢失眼前的物体,那么选择n = 0 是一个自然选择,但这会导致无限大的容器,这是一种非常糟糕的情况。为了使 Δz_w^max 尽可能小,我们希望最小化f并最大化n 。因此,谨慎选择n和f始终很重要。
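To get a feel for how strongly n affects precision, the sketch below evaluates the world-space bin size Δz_w ≈ z_w² Δz / (f n), which follows from differentiating the perspective depth z = n + f − f n / z_w, with Δz = (f − n)/2^b. The function name and the example plane distances are assumptions chosen only for illustration.

```python
def world_bin_size(z_world, near, far, bits):
    """Approximate world-space depth covered by one z-buffer bin at depth z_world."""
    delta_z = (far - near) / 2 ** bits          # bin size after the perspective divide
    return z_world ** 2 * delta_z / (far * near)


far = 1000.0
# Largest bin (at the far plane) for a 16-bit buffer and two choices of near plane.
largest_bin_near_1 = world_bin_size(far, near=1.0, far=far, bits=16)
largest_bin_near_10 = world_bin_size(far, near=10.0, far=far, bits=16)
# Bins close to the eye are far smaller than bins close to the far plane.
smallest_bin_near_1 = world_bin_size(1.0, near=1.0, far=far, bits=16)
```

Under these assumptions the largest bin is roughly 15 world units deep with n = 1 and roughly 1.5 units deep with n = 10, so pushing the near plane out by a factor of ten buys about a factor of ten in far-field precision.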
So far the application sending triangles into the pipeline is responsible for setting the color; the rasterizer just interpolates the colors and they are written directly into the output image. For some applications, this is sufficient, but in many cases, we want 3D objects to be drawn with shading, using the same illumination equations that we used for image-order rendering in Chapter 4. Recall that these equations require a light direction, an eye direction, and a surface normal to compute the color of a surface.
到目前为止,将三角形发送到管道的应用程序负责设置颜色;光栅化器只是对颜色进行插值,并将它们直接写入输出图像。对于某些应用程序来说,这已经足够了,但在许多情况下,我们希望使用与第 4 章中用于图像顺序渲染的相同照明方程来绘制带有阴影的 3D 对象。回想一下,这些方程需要光方向、视线方向和表面法线来计算表面的颜色。
One way to handle shading computations is to perform them in the vertex stage. The application provides normal vectors at the vertices, and the positions and colors of the lights are provided separately (they don’t vary across the surface, so they don’t need to be specified for each vertex). For each vertex, the direction to the viewer and the direction to each light are computed based on the positions of the camera, the lights, and the vertex. The desired shading equation is evaluated to compute a color, which is then passed to the rasterizer as the vertex color. Per-vertex shading is sometimes called Gouraud shading.
处理着色计算的一种方法是将其在顶点阶段执行。应用程序在顶点处提供法线向量,并单独提供灯光的位置和颜色(它们不会在整个表面上变化,因此不需要为每个顶点指定它们)。对于每个顶点,根据相机、灯光和顶点的位置计算到观察者的方向和到每个灯光的方向。评估所需的着色方程以计算颜色,然后将其作为顶点颜色传递给光栅化器。每个顶点着色有时称为高洛德阴影。
One decision to be made is the coordinate system in which shading computations are done. World space and eye space are both good choices. It is important to choose a coordinate system that is orthonormal when viewed in world space, because shading equations depend on angles between vectors. Those angles are not preserved by operations such as the nonuniform scales often used in the modeling transformation or the perspective projection often used in mapping to the canonical view volume. Shading in eye space has the advantage that we don’t need to keep track of the camera position: in perspective projection the camera is always at the origin in eye space, and in orthographic projection the view direction is always +z.
需要做出的一个决定是进行着色计算的坐标系。世界空间或眼空间都是不错的选择。选择一个在世界空间中观察时正交的坐标系很重要,因为着色方程取决于向量之间的角度,而这些角度不会通过建模变换中经常使用的非均匀缩放等操作来保留,或者透视投影通常用于投影到规范视图体积。在眼空间中进行着色的优点是我们不需要跟踪相机的位置,因为相机在眼空间中、在透视投影中始终位于原点,或者在正交投影中视图方向始终为 + z 。
Per-vertex shading has the disadvantage that it cannot produce any details in the shading that are smaller than the primitives used to draw the surface, because it only computes shading once for each vertex and never in between vertices. For instance, in a room with a floor that is drawn using two large triangles and illuminated by a light source in the middle of the room, shading will be evaluated only at the corners of the room, and the interpolated value will likely be much too dark in the center. Also, curved surfaces that are shaded with specular highlights must be drawn using primitives small enough that the highlights can be resolved.
逐顶点着色的缺点是,它无法在着色中产生任何小于用于绘制表面的图元的细节,因为它只为每个顶点计算一次着色,而从不在顶点之间计算着色。例如,在一个使用两个大三角形绘制地板并由房间中间的光源照亮的房间中,着色将仅在房间的角落处进行评估,并且插值在中心可能太暗。此外,使用镜面高光着色的曲面必须使用足够小的图元来绘制,以便可以解析高光。
Figure 9.15 shows our two spheres drawn with per-vertex shading.
图 9.15展示了我们用逐顶点着色绘制的两个球体。
Figure 9.15. Two spheres drawn using per-vertex (Gouraud) shading. Because the triangles are large, interpolation artifacts are visible.
图 9.15.使用逐顶点 (Gouraud) 着色绘制的两个球体。由于三角形很大,因此可以看到插值伪影。
To avoid the interpolation artifacts associated with per-vertex shading, we can avoid interpolating colors by performing the shading computations after the interpolation, in the fragment stage. In per-fragment shading, the same shading equations are evaluated, but they are evaluated for each fragment using interpolated vectors, rather than for each vertex using the vectors from the application.
为了避免与逐顶点着色相关的插值伪影,我们可以在片段阶段的插值之后执行着色计算,从而避免插值颜色。在逐片段着色中,会评估相同的着色方程,但它们是使用插值向量针对每个片段进行评估的,而不是使用应用程序中的向量针对每个顶点进行评估的。
In per-fragment shading, the geometric information needed for shading is passed through the rasterizer as attributes, so the vertex stage must coordinate with the fragment stage to prepare the data appropriately. One approach is to interpolate the eye-space surface normal and the eye-space vertex position, which then can be used just as they would in per-vertex shading.
在逐片段着色中,着色所需的几何信息作为属性通过光栅化器传递,因此顶点阶段必须与片段阶段协调以适当地准备数据。一种方法是插入眼空间表面法线和眼空间顶点位置,然后可以像在逐顶点着色中一样使用它们。
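The difference between the two shading frequencies can be seen on the floor example mentioned earlier. The sketch below is a simplified illustration: it uses only a Lambertian (diffuse) term with no distance falloff, and the room dimensions, light position, and helper names are all assumptions. It compares the value obtained by interpolating per-vertex results with the value obtained by evaluating the same equation at the fragment itself.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(position, normal, light_position):
    """Diffuse term max(0, n . l) for a point light; no falloff (a simplification)."""
    to_light = normalize(tuple(l - p for l, p in zip(light_position, position)))
    return max(0.0, sum(n * l for n, l in zip(normal, to_light)))


light = (0.0, 2.0, 0.0)                    # light hanging over the middle of the floor
up = (0.0, 1.0, 0.0)                       # floor normal
corners = [(-5.0, 0.0, -5.0), (5.0, 0.0, -5.0), (5.0, 0.0, 5.0), (-5.0, 0.0, 5.0)]
center = (0.0, 0.0, 0.0)

# Per-vertex shading: evaluate at the corners, then interpolate. All four corner
# values are equal here, so any interpolation at the center gives that same value.
corner_shades = [lambert(c, up, light) for c in corners]
per_vertex_center = sum(corner_shades) / len(corner_shades)

# Per-fragment shading: interpolate position and normal, then evaluate at the fragment.
per_fragment_center = lambert(center, up, light)
```

In this setup the interpolated per-vertex value at the center is about 0.27, while the per-fragment value is 1.0, which is the "much too dark in the center" effect described for per-vertex shading.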
Figure 9.16 shows our two spheres drawn with per-fragment shading.
图 9.16展示了我们用逐片段着色绘制的两个球体。
Figure 9.16. Two spheres drawn using per-fragment shading. Because the triangles are large, interpolation artifacts are visible.
图 9.16.使用逐片段着色绘制的两个球体。由于三角形很大,因此可以看到插值伪影。
Per-fragment shading is sometimes called Phong shading, which is confusing because the same name is attached to the Phong illumination model.
每片段着色有时被称为 Phong 着色,这容易引起混淆,因为 Phong 照明模型也有相同的名称。
Textures (discussed in Chapter 11) are images that are used to add extra detail to the shading of surfaces that would otherwise look too homogeneous and artificial. The idea is simple: each time shading is computed, we read one of the values used in the shading computation—the diffuse color, for instance—from a texture instead of using the attribute values that are attached to the geometry being rendered. This operation is known as a texture lookup: the shading code specifies a texture coordinate, a point in the domain of the texture, and the texture-mapping system finds the value at that point in the texture image and returns it. The texture value is then used in the shading computation.
纹理(第 11 章中讨论)是用于为表面着色添加额外细节的图像,否则表面着色会看起来过于同质和不自然。 这个想法很简单:每次计算着色时,我们从纹理中读取着色计算中使用的值之一(例如漫反射颜色),而不是使用附加到要渲染的几何体的属性值。 此操作称为纹理查找:着色代码指定纹理坐标,即纹理域中的点,然后纹理映射系统在纹理图像中的该点处找到该值并返回它。 然后在着色计算中使用纹理值。
The most common way to define texture coordinates is simply to make the texture coordinate another vertex attribute. Each primitive then knows where it lives in the texture.
定义纹理坐标的最常见方式就是将纹理坐标作为另一个顶点属性。这样每个图元就知道它在纹理中的位置。
The decision about where to place shading computations depends on how fast the color changes—the scale of the details being computed. Shading with large-scale features, such as diffuse shading on curved surfaces, can be evaluated fairly infrequently and then interpolated: it can be computed with a low shading frequency. Shading that produces small-scale features, such as sharp highlights or detailed textures, needs to be evaluated at a high shading frequency. For details that need to look sharp and crisp in the image, the shading frequency needs to be at least one shading sample per pixel.
决定将着色计算放在何处取决于颜色变化的速度——正在计算的细节的规模。具有大规模特征的着色(例如曲面上的漫反射着色)可以相当不频繁地进行评估,然后进行插值:可以用低着色频率进行计算。产生小规模特征的着色(例如清晰的高光或详细的纹理)需要以高着色频率进行评估。对于需要在图像中看起来清晰明快的细节,着色频率需要至少为每像素一个着色样本。
So large-scale effects can safely be computed in the vertex stage, even when the vertices defining the primitives are many pixels apart. Effects that require a high shading frequency can also be computed at the vertex stage, as long as the vertices are close together in the image; alternatively, they can be computed at the fragment stage when primitives are larger than a pixel.
因此,即使定义图元的顶点相隔许多像素,也可以在顶点阶段安全地计算大规模效果。需要高着色频率的效果也可以在顶点阶段计算,只要图像中的顶点靠得很近;或者,当图元大于一个像素时,可以在片段阶段计算它们。
For example, a hardware pipeline as used in a computer game, generally using primitives that cover several pixels to ensure high efficiency, normally does most shading computations per fragment. On the other hand, the PhotoRealistic RenderMan system does all shading computations per vertex, after first subdividing, or dicing, all surfaces into small quadrilaterals called micropolygons that are about the size of pixels. Since the primitives are small, per-vertex shading in this system achieves a high shading frequency that is suitable for detailed shading.
例如,计算机游戏中使用的硬件管道通常使用覆盖多个像素的图元来确保高效率,通常每个片段执行大多数着色计算。另一方面,PhotoRealistic RenderMan 系统首先将所有表面细分或切割成称为微多边形的小四边形(其大小与像素相当),然后对每个顶点执行所有着色计算。由于图元较小,因此该系统中的每个顶点着色可实现适合详细着色的高着色频率。
Just as with ray tracing, rasterization will produce jagged lines and triangle edges if we make an all-or-nothing determination of whether each pixel is inside the primitive or not. In fact, the set of fragments generated by the simple triangle rasterization algorithms described in this chapter, sometimes called standard or aliased rasterization, is exactly the same as the set of pixels that would be mapped to that triangle by a ray tracer that sends one ray through the center of each pixel. Also as in ray tracing, the solution is to allow pixels to be partly covered by a primitive (Crow, 1978). In practice, this form of blurring helps visual quality, especially in animations. This is shown as the top line of Figure 9.17.
与光线追踪一样,如果我们非此即彼地判断每个像素是否在图元内,那么光栅化将产生锯齿状线条和三角形边缘。事实上,本章描述的简单三角形光栅化算法(有时称为标准光栅化或别名光栅化)生成的片段集与光线追踪器(从每个像素的中心发出一条光线)映射到该三角形的像素集完全相同。同样与光线追踪一样,解决方案是允许像素被图元部分覆盖(Crow,1978)。实际上,这种形式的模糊有助于提高视觉质量,尤其是在动画中。如图 9.17的顶行所示。
Figure 9.17. An antialiased and a jaggy line viewed at close range so individual pixels are visible.
图 9.17.近距离观看抗锯齿线和锯齿线,因此可以看到各个像素。
There are a number of different approaches to antialiasing in rasterization applications. Just as with a ray tracer, we can produce an antialiased image by setting each pixel value to the average color of the image over the square area belonging to the pixel, an approach known as box filtering. This means we have to think of all drawable entities as having well-defined areas. For example, the line in Figure 9.17 can be thought of as approximating a one-pixel-wide rectangle.
在光栅化应用中,有许多不同的抗锯齿方法。就像光线追踪器一样,我们可以通过将每个像素值设置为属于该像素的正方形区域上图像的平均颜色来生成抗锯齿图像,这种方法称为框过滤。这意味着我们必须将所有可绘制实体视为具有明确定义的区域。例如,图 9.17中的线可以被认为是近似于一个像素宽的矩形。
There are better filters than the box, but a box filter will suffice for all but the most demanding applications.
有比盒子更好的过滤器,但是除了最苛刻的应用之外,盒子过滤器就足以满足所有应用的需求。
The easiest way to implement box-filter antialiasing is by supersampling: create images at very high resolutions and then downsample. For example, if our goal is a 256 × 256 pixel image of a line with width 1.2 pixels, we could rasterize a rectangle version of the line with width 4.8 pixels on a 1024 × 1024 screen, and then average 4 × 4 groups of pixels to get the colors for each of the 256 × 256 pixels in the “shrunken” image. This is an approximation of the actual box-filtered image, but works well when objects are not extremely small relative to the distance between pixels.
实现盒式滤波器抗锯齿的最简单方法是超采样:以非常高的分辨率创建图像,然后进行下采样。例如,如果我们的目标是一条线的 256 × 256 像素图像,其宽度为 1.2 像素,那么我们可以在 1024 × 1024 屏幕上光栅化该线的矩形版本,其宽度为 4.8 像素,然后平均 4 × 4 组像素以获得“缩小”图像中每个 256 × 256 像素的颜色。这是实际的盒式滤波图像的近似值,但当物体相对于像素之间的距离不是非常小时,效果很好。
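The downsampling step of supersampling is just a block average. The sketch below assumes a grayscale image stored as a list of rows and a hypothetical function name `downsample`; it averages each factor × factor group of high-resolution pixels into one output pixel.

```python
def downsample(image, factor):
    """Average factor x factor blocks of a grayscale image (list of rows of floats)."""
    height = len(image) // factor
    width = len(image[0]) // factor
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            for dy in range(factor):
                for dx in range(factor):
                    total += image[y * factor + dy][x * factor + dx]
            row.append(total / (factor * factor))
        result.append(row)
    return result


# An 8 x 8 aliased rendering of a vertical edge: columns 0..2 are covered (1.0).
high_res = [[1.0 if x < 3 else 0.0 for x in range(8)] for _ in range(8)]

# Averaging 4 x 4 groups gives a 2 x 2 image with a partially covered left column.
low_res = downsample(high_res, 4)
```

The left output pixels come out as 0.75 rather than 0 or 1, reflecting that three quarters of each pixel's area is covered by the shape.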
Supersampling is quite expensive, however. Because the very sharp edges that cause aliasing are normally caused by the edges of primitives, rather than sudden variations in shading within a primitive, a widely used optimization is to sample visibility at a higher rate than shading. If information about coverage and depth is stored for several points within each pixel, very good antialiasing can be achieved even if only one color is computed. In systems like RenderMan that use per-vertex shading, this is achieved by rasterizing at high resolution: it is inexpensive to do so because shading is simply interpolated to produce colors for the many fragments, or visibility samples. In systems with per-fragment shading, such as hardware pipelines, multisample antialiasing is achieved by storing for each fragment a single color plus a coverage mask and a set of depth values.
然而,超级采样非常昂贵。由于导致锯齿的非常尖锐的边缘通常是由图元的边缘引起的,而不是图元内着色的突然变化,因此广泛使用的优化是以高于着色的速率对可见性进行采样。如果为每个像素内的多个点存储有关覆盖和深度的信息,即使只计算一种颜色也可以实现非常好的抗锯齿。在使用每个顶点着色的 RenderMan 等系统中,这是通过高分辨率光栅化实现的:这样做成本低廉,因为着色只是插值以产生许多片段或可见性样本的颜色。在具有每个片段着色的系统(例如硬件管道)中,通过为每个片段存储单一颜色加上覆盖掩码和一组深度值来实现多重采样抗锯齿。
The strength of object-order rendering, that it requires a single pass over all the geometry in the scene, is also a weakness for complex scenes. For instance, in a model of an entire city, only a few buildings are likely to be visible at any given time. A correct image can be obtained by drawing all the primitives in the scene, but a great deal of effort will be wasted processing geometry that is behind the visible buildings, or behind the viewer, and therefore doesn’t contribute to the final image.
对象顺序渲染的优点在于它需要对场景中的所有几何图形进行一次遍历,但这对于复杂场景来说也是一个弱点。例如,在整个城市的模型中,任何给定时间可能只有少数建筑物可见。通过绘制场景中的所有图元可以获得正确的图像,但大量的精力将浪费在处理可见建筑物后面或观察者后面的几何图形上,因此对最终图像没有贡献。
Identifying and throwing away invisible geometry to save the time that would be spent processing it is known as culling. Three commonly implemented culling strategies (often used in tandem) are
识别并丢弃不可见的几何体以节省处理时间的过程称为剔除。三种常用的剔除策略(通常串联使用)是
view volume culling—the removal of geometry that is outside the view volume;
视图体积剔除——移除视图体积之外的几何体;
occlusion culling—the removal of geometry that may be within the view volume but is obscured, or occluded, by other geometry closer to the camera;
遮挡剔除——移除可能位于视图体内但被靠近相机的其他几何体遮挡或遮挡的几何体;
backface culling—the removal of primitives facing away from the camera.
背面剔除——移除背对相机的图元。
We will briefly discuss view volume culling and backface culling, but culling in high performance systems is a complex topic; see Akenine-Möller, Haines, and Hoffman (2008) for a complete discussion and for information about occlusion culling.
我们将简要讨论视图体积剔除和背面剔除,但高性能系统中的剔除是一个复杂的主题;请参阅 Akenine-Möller、Haines 和 Hoffman (2008) 的完整讨论和有关遮挡剔除的信息。
When an entire primitive lies outside the view volume, it can be culled, since it will produce no fragments when rasterized. If we can cull many primitives with a quick test, we may be able to speed up drawing significantly. On the other hand, testing primitives individually to decide exactly which ones need to be drawn may cost more than just letting the rasterizer eliminate them.
当整个图元位于视图体积之外时,它可以被剔除,因为它在光栅化时不会产生任何碎片。如果我们可以通过快速测试剔除许多图元,我们可能能够显著加快绘制速度。另一方面,单独测试图元以准确确定需要绘制哪些图元可能比让光栅化器消除它们花费更多。
View volume culling, also known as view frustum culling, is especially helpful when many triangles are grouped into an object with an associated bounding volume. If the bounding volume lies outside the view volume, then so do all the triangles that make up the object. For example, if we have 1000 triangles bounded by a single sphere with center c and radius r, we can check whether the sphere lies outside the clipping plane,
视图体积剔除,也称为视锥体剔除,在将许多三角形分组为具有相关边界体积的对象时特别有用。如果边界体积位于视图体积之外,则构成该对象的所有三角形也位于视图体积之外。例如,如果我们有 1000 个三角形,它们被一个球体包围,球体中心为c ,半径为r ,我们可以检查该球体是否位于裁剪平面之外,
$$(\mathbf{p} - \mathbf{a}) \cdot \mathbf{n} = 0,$$
where a is a point on the plane, n is the normal to the plane, and p is a variable. This is equivalent to checking whether the signed distance from the center of the sphere c to the plane is greater than +r. This amounts to the check that
其中a是平面上的一个点, n是平面的法线, p是变量。这相当于检查球心c到平面的有符号距离是否大于 + r 。这相当于检查
$$\frac{(\mathbf{c} - \mathbf{a}) \cdot \mathbf{n}}{\|\mathbf{n}\|} > r.$$
Note that the sphere may overlap the plane even in a case where all the triangles do lie outside the plane. Thus, this is a conservative test. How conservative the test is depends on how well the sphere bounds the object.
请注意,即使所有三角形都位于平面之外,球体也可能与平面重叠。因此,这是一个保守的测试。测试的保守程度取决于球体与物体的边界有多好。
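The bounding-sphere test can be written directly from the signed-distance condition (c − a) · n / ‖n‖ > r. The sketch below assumes the plane normal n points away from the view volume, so a positive signed distance means "outside"; the function name and the example plane are hypothetical.

```python
import math


def sphere_outside_plane(center, radius, plane_point, plane_normal):
    """True if the whole sphere lies on the outside of the plane.

    The normal is assumed to point away from the view volume and need not be unit length.
    """
    offset = [c - a for c, a in zip(center, plane_point)]
    normal_length = math.sqrt(sum(n * n for n in plane_normal))
    signed_distance = sum(o * n for o, n in zip(offset, plane_normal)) / normal_length
    return signed_distance > radius


# Clipping plane x = 1 with an outward normal along +x.
plane_point = (1.0, 0.0, 0.0)
plane_normal = (2.0, 0.0, 0.0)

culled = sphere_outside_plane((3.0, 0.0, 0.0), 1.0, plane_point, plane_normal)
kept = sphere_outside_plane((1.5, 0.0, 0.0), 1.0, plane_point, plane_normal)
```

The second sphere straddles the plane, so it is kept even if every triangle inside it happens to be outside; this is the conservative behavior noted above.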
The same idea can be applied hierarchically if the scene is organized in one of the spatial data structures described in Chapter 12.
如果场景按照第 12 章中描述的空间数据结构之一进行组织,则可以分层地应用相同的想法。
When polygonal models are closed, i.e., they bound a closed space with no holes, they are often assumed to have outward facing normal vectors as discussed in Chapter 5. For such models, the polygons that face away from the eye are certain to be overdrawn by polygons that face the eye. Thus, those polygons can be culled before the pipeline even starts.
当多边形模型是封闭的,即它们包围一个没有洞的封闭空间时,它们通常被假定具有向外的法向量,如第 5 章所述。对于这样的模型,背对眼睛的多边形肯定会被面向眼睛的多边形覆盖。因此,这些多边形甚至可以在管道启动之前就被剔除。
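One common way to decide whether a triangle faces away from the eye is to compare its geometric normal with the direction toward the eye. The text does not prescribe a particular test, so the sketch below is an assumption: it takes vertices listed counterclockwise when seen from outside the model, a perspective camera at a known eye point, and hypothetical helper names.

```python
def subtract(u, v):
    return tuple(a - b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_backface(a, b, c, eye):
    """True if triangle (a, b, c), counterclockwise from outside, faces away from eye."""
    outward_normal = cross(subtract(b, a), subtract(c, a))
    to_eye = subtract(eye, a)
    # Edge-on triangles (dot product of zero) are treated as back-facing here.
    return dot(outward_normal, to_eye) <= 0.0


triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))   # normal is +z
seen_from_front = is_backface(*triangle, eye=(0.0, 0.0, 5.0))
seen_from_behind = is_backface(*triangle, eye=(0.0, 0.0, -5.0))
```

This test is only safe for closed models with consistently oriented faces; for open surfaces, culling back faces can remove geometry that should be visible.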
I’ve often seen clipping discussed at length, and it is a much more involved process than that described in this chapter. What is going on here?
我经常看到对剪辑的详细讨论,而剪辑的过程比本章中描述的过程要复杂得多。这里发生了什么?
The clipping described in this chapter works, but lacks optimizations that an industrial-strength clipper would have. These optimizations are discussed in detail in Blinn’s definitive work listed in the chapter notes.
本章中描述的裁剪是可行的,但缺少工业级裁剪器应有的优化。这些优化在章节注释中列出的 Blinn 权威著作中进行了详细讨论。
How are polygons that are not triangles rasterized?
非三角形的多边形如何栅格化?
These can either be done directly scan-line by scan-line, or they can be broken down into triangles. The latter appears to be the more popular technique.
这些可以直接逐行扫描完成,也可以分解为三角形。后者似乎是更受欢迎的技术。
Is it always better to antialias?
抗锯齿总是更好吗?
No. Some images look crisper without antialiasing. Many programs use unantialiased “screen fonts” because they are easier to read.
不会。有些图像不经过抗锯齿处理看起来更清晰。许多程序使用未经过抗锯齿处理的“屏幕字体”,因为它们更易于阅读。
The documentation for my API talks about “scene graphs” and “matrix stacks.” Are these part of the graphics pipeline?
我的 API 文档讨论了“场景图”和“矩阵堆栈”。这些是图形管道的一部分吗?
The graphics pipeline is certainly designed with these in mind, and whether we define them as part of the pipeline is a matter of taste. This book delays their discussion until Chapter 12.
图形管道的设计当然考虑到了这些因素,至于我们是否将它们定义为管道的一部分,则取决于个人喜好。本书将对它们的讨论推迟到第 12 章。
Is a uniform distance z-buffer better than the standard one that includes perspective matrix nonlinearities?
均匀距离 z 缓冲区是否比包含透视矩阵非线性的标准缓冲区更好?
It depends. One “feature” of the nonlinearities is that the z-buffer has more resolution near the eye and less in the distance. If a level-of-detail system is used, then geometry in the distance is coarser and the “unfairness” of the z-buffer can be a good thing.
视情况而定。非线性的一个“特征”是 z 缓冲区在眼睛附近的分辨率更高,而在远处的分辨率更低。如果使用细节层次系统,那么远处的几何形状会更粗糙,z 缓冲区的“不公平”可能是一件好事。
Is a software z-buffer ever useful?
软件 z 缓冲区有用吗?
Yes. Most of the movies that use 3D computer graphics have used a variant of the software z-buffer developed by Pixar (Cook, Carpenter, & Catmull, 1987).
是的。大多数使用 3D 计算机图形的电影都使用了 Pixar 开发的软件 z-buffer 的变体(Cook、Carpenter 和 Catmull,1987 年)。
A wonderful book about designing a graphics pipeline is Jim Blinn’s Corner: A Trip Down the Graphics Pipeline (J. Blinn, 1996). Many nice details of the pipeline and culling are in 3D Game Engine Design (Eberly, 2000) and Real-Time Rendering (Akenine-Möller et al., 2008).
一本关于图形管道设计的精彩书籍是Jim Blinn 的 Corner:图形管道之旅(J. Blinn,1996)。管道和剔除的许多精彩细节都包含在3D 游戏引擎设计(Eberly,2000) 和实时渲染(Akenine-Möller 等,2008) 中。
1. Suppose that in the perspective transform, we have n = 1 and f = 2. Under what circumstances will we have a “reversal” where a vertex before and after the perspective transform flips from in front of to behind the eye or vice versa?
1.假设在透视变换中,我们有n = 1 和f = 2。在什么情况下会出现“逆转”,即透视变换前后的顶点从眼睛前面翻转到眼睛后面或反之亦然?
2. Is there any reason not to clip in x and y after the perspective divide (see Figure 11.2, stage 3)?
2.透视分割之后,有什么理由不剪辑x和y吗(参见图 11.2 ,第 3 阶段)?
3. Derive the incremental form of the midpoint line-drawing algorithm with colors at endpoints for 0 < m ≤ 1.
3.当 0 < m ≤ 1 时,推导中点画线算法的增量形式,其中端点带有颜色。
4. Modify the triangle-drawing algorithm so that it will draw exactly one pixel for points on a triangle edge which goes through (x, y) = (–1, –1).
4.修改三角形绘制算法,使得它在经过 ( x, y ) = (-1, -1) 的三角形边上的点上精确绘制一个像素。
5. Suppose you are designing an integer z-buffer for flight simulation where all of the objects are at least one meter thick, are never closer to the viewer than 4 m, and may be as far away as 100 km. How many bits are needed in the z-buffer to ensure there are no visibility errors? Suppose that visibility errors only matter near the viewer, i.e., for distances less than 100 m. How many bits are needed in that case?
5.假设您正在设计一个用于飞行模拟的整数 z 缓冲区,其中所有物体的厚度至少为 1 米,距离观察者的距离绝不会少于 4 米,并且可能远至 100 公里。z 缓冲区中需要多少位才能确保没有可见性错误?假设可见性错误仅在观察者附近(即距离小于 100 米)才重要。在这种情况下需要多少位?
In graphics, we often deal with functions of a continuous variable: an image is the first example you have seen, but you will encounter many more as you continue your exploration of graphics. By their nature, continuous functions can’t be directly represented in a computer; we have to somehow represent them using a finite number of bits. One of the most useful approaches to representing continuous functions is to use samples of the function: just store the values of the function at many different points and reconstruct the values in between when and if they are needed.
在图形学中,我们经常处理连续变量的函数:图像是你见过的第一个例子,但随着你继续探索图形,你会遇到更多的例子。就其性质而言,连续函数不能直接在计算机中表示;我们必须以某种方式使用有限数量的位来表示它们。表示连续函数最有用的方法之一是使用函数的样本:只需将函数的值存储在许多不同的点,并在需要时重建这些值。
You are by now familiar with the idea of representing an image using a two-dimensional grid of pixels, so you have already seen a sampled representation! Think of an image captured by a digital camera: the actual image of the scene that was formed by the camera’s lens is a continuous function of the position on the image plane, and the camera converted that function into a two-dimensional grid of samples. Mathematically, the camera converted a function of type ℝ² → C (where C is the set of colors) to a two-dimensional array of color samples, or a function of type ℤ² → C.
现在,您已经熟悉了使用二维像素网格表示图像的概念,因此您已经看到了采样表示!想象一下数码相机拍摄的图像:相机镜头形成的实际场景图像是图像平面位置的连续函数,相机将该函数转换为二维样本网格。从数学上讲,相机将类型 ℝ² → C 的函数(其中C是颜色集)转换为二维颜色样本数组,或类型 ℤ² → C 的函数。
Another example of a sampled representation is a 2D digitizing tablet, such as the screen of a tablet computer or a separate pen tablet used by an artist. In this case, the original function is the motion of the stylus, which is a time-varying 2D position, or a function of type ℝ → ℝ². The digitizer measures the position of the stylus at many points in time, resulting in a sequence of 2D coordinates, or a function of type ℤ → ℝ². A motion capture system does exactly the same thing for a special marker attached to an actor’s body: it takes the 3D position of the marker over time (ℝ → ℝ³) and makes it into a series of instantaneous position measurements (ℤ → ℝ³).
采样表示的另一个示例是 2D 数字化板,例如平板电脑的屏幕或艺术家使用的单独手写板。在这种情况下,原始函数是手写笔的运动,它是随时间变化的 2D 位置,或 ℝ → ℝ² 类型的函数。数字化仪会在许多时间点测量手写笔的位置,从而得到一系列 2D 坐标,或 ℤ → ℝ² 类型的函数。运动捕捉系统对于附在演员身体上的特殊标记执行完全相同的操作:它会随时间获取标记的 3D 位置(ℝ → ℝ³),并将其转化为一系列瞬时位置测量值(ℤ → ℝ³)。
Going up in dimension, a medical CT scanner, used to non-invasively examine the interior of a person’s body, measures density as a function of position inside the body. The output of the scanner is a 3D grid of density values: it converts the density of the body (ℝ³ → ℝ) to a 3D array of real numbers (ℤ³ → ℝ).
再往上看,医用 CT 扫描仪用于无创检查人体内部,它测量密度与体内位置的关系。扫描仪的输出是密度值的 3D 网格:它将人体密度 (ℝ³ → ℝ) 转换为实数的 3D 数组 (ℤ³ → ℝ)。
These examples seem different, but in fact they can all be handled using exactly the same mathematics. In all cases, a function is being sampled at the points of a lattice in one or more dimensions, and in all cases, we need to be able to reconstruct that original continuous function from the array of samples.
这些例子看起来不同,但实际上它们都可以用完全相同的数学来处理。在所有情况下,函数都是在一个或多个维度的格点处被采样,并且在所有情况下,我们都需要能够从样本数组中重建原始连续函数。
From the example of a 2D image, it may seem that the pixels are enough, and we never need to think about continuous functions again once the camera has discretized the image. But what if we want to make the image larger or smaller on the screen, particularly by non-integer scale factors? It turns out that the simplest algorithms to do this perform badly, introducing obvious visual artifacts known as aliasing. Explaining why aliasing happens and understanding how to prevent it require the mathematics of sampling theory. The resulting algorithms are rather simple, but the reasoning behind them, and the details of making them perform well, can be subtle.
从 2D 图像的示例来看,像素似乎足够了,一旦相机将图像离散化,我们就再也不需要考虑连续函数了。但是,如果我们想让屏幕上的图像变大或变小,特别是通过非整数比例因子,该怎么办?事实证明,最简单的算法表现不佳,会引入明显的视觉伪影,即混叠。解释混叠发生的原因以及了解如何防止混叠需要采样理论的数学知识。由此产生的算法相当简单,但其背后的原因以及使它们表现良好的细节却可能很微妙。
Representing continuous functions in a computer is, of course, not unique to graphics; nor is the idea of sampling and reconstruction. Sampled representations are used in applications from digital audio to computational physics, and graphics is just one (and by no means the first) user of the related algorithms and mathematics. The fundamental facts about how to do sampling and reconstruction have been known in the field of communications since the 1920s and were stated in exactly the form we use them by the 1940s (Shannon & Weaver, 1964).
当然,在计算机中表示连续函数并不是图形学所独有的;采样和重构的概念也不是。采样表示法在从数字音频到计算物理的应用中都有使用,图形学只是相关算法和数学的一个(绝不是第一个)用户。关于如何进行采样和重构的基本事实自 20 世纪 20 年代以来在通信领域就已为人所知,并在 20 世纪 40 年代以与我们所使用的形式完全相同的形式表达出来(Shannon & Weaver,1964 年)。
This chapter starts by summarizing sampling and reconstruction using the concrete one-dimensional example of digital audio. Then, we go on to present the basic mathematics and algorithms that underlie sampling and reconstruction in one and two dimensions. Finally, we go into the details of the frequency-domain viewpoint, which provides many insights into the behavior of these algorithms.
本章首先使用具体的一维数字音频示例总结采样和重构。然后,我们继续介绍一维和二维采样和重构的基本数学和算法。最后,我们详细介绍了频域观点,这为了解这些算法的行为提供了许多见解。
Although sampled representations had already been in use for years in telecommunications, the introduction of the compact disc in 1982, following the increased use of digital recording for audio in the previous decade, was the first highly visible consumer application of sampling.
尽管采样表示法已在电信领域使用多年,但随着前十年音频数字录音的广泛使用,1982 年光盘的推出是采样的第一个备受瞩目的消费者应用。
In audio recording, a microphone converts sound, which exists as pressure waves in the air, into a time-varying voltage that amounts to a measurement of the changing air pressure at the point where the microphone is located. This electrical signal needs to be stored somehow so that it may be played back at a later time and sent to a loudspeaker that converts the voltage back into pressure waves by moving a diaphragm in synchronization with the voltage.
在录音过程中,麦克风将声音(以空气中的压力波形式存在)转换为随时间变化的电压,相当于麦克风所在位置气压变化的测量值。需要以某种方式存储此电信号,以便稍后播放并发送到扬声器,扬声器通过与电压同步移动振膜将电压转换回压力波。
The digital approach to recording the audio signal (Figure 10.1) uses sampling: an analog-to-digital converter (A/D converter, or ADC) measures the voltage many thousands of times per second, generating a stream of integers that can easily be stored on any number of media, say a disk on a computer in the recording studio, or transmitted to another location, say the memory in a portable audio player. At playback time, the data are read out at the appropriate rate and sent to a digital-to-analog converter (D/A converter, or DAC). The DAC produces a voltage according to the numbers it receives, and, provided we take enough samples to fairly represent the variation in voltage, the resulting electrical signal is, for all practical purposes, identical to the input.
录制音频信号的数字方法(图 10.1 )采用采样:模数转换器(A/D 转换器,又称 ADC)每秒测量电压数千次,生成整数流,这些整数流可以轻松存储在任意数量的介质上,例如录音室计算机上的磁盘,或传输到另一个位置,例如便携式音频播放器中的内存。在播放时,数据以适当的速率读出并发送到数模转换器(D/A 转换器,又称 DAC)。DAC 根据接收到的数字产生电压,并且,只要我们采集足够的样本来公平地表示电压的变化,那么所产生的电信号在所有实际用途上都与输入相同。
Figure 10.1. Sampling and reconstruction in digital audio.
图 10.1.数字音频中的采样和重建。
It turns out that the number of samples per second required to end up with a good reproduction depends on how high-pitched the sounds are that we are trying to record. A sample rate that works fine for reproducing a string bass or a kick drum produces bizarre-sounding results if we try to record a piccolo or a cymbal; but those sounds are reproduced just fine with a higher sample rate. To avoid these undersampling artifacts, the digital audio recorder filters the input to the ADC to remove high frequencies that can cause problems.
事实证明,要获得良好的再现效果,每秒所需的采样数取决于我们试图录制的声音的高音调。如果我们尝试录制短笛或钹,那么用于再现弦乐贝司或底鼓的采样率会产生奇怪的声音;但这些声音在更高的采样率下可以很好地再现。为了避免这些欠采样伪影,数字录音机会对 ADC 的输入进行滤波,以消除可能导致问题的高频。
Another kind of problem arises on the output side. The DAC produces a voltage that changes whenever a new sample comes in, but stays constant until the next sample, producing a stair-step-shaped graph. These stair-steps act like noise, adding a high-frequency, signal-dependent buzzing sound. To remove this reconstruction artifact, the digital audio player filters the output from the DAC to smooth out the waveform.
另一种问题出现在输出端。DAC 产生的电压在每次新样本输入时都会发生变化,但直到下一个样本输入时才会发生变化,从而产生阶梯状的图形。这些阶梯状图形就像噪音一样,增加了高频、信号相关的嗡嗡声。为了消除这种重建伪影,数字音频播放器会过滤 DAC 的输出以平滑波形。
The digital audio recording chain can serve as a concrete model for the sampling and reconstruction processes that happen in graphics. The same kinds of undersampling and reconstruction artifacts also happen with images or other sampled signals in graphics, and the solution is the same: filtering before sampling and filtering again during reconstruction.
数字音频记录链可以作为图形中采样和重建过程的具体模型。同样的欠采样和重建伪影也发生在图形中的图像或其他采样信号中,解决方案是相同的:在采样之前进行过滤,在重建期间再次进行过滤。
A concrete example of the kind of artifacts that can arise from too-low sample frequencies is shown in Figure 10.2. Here, we are sampling a simple sine wave using two different sample frequencies: 10.8 samples per cycle on the top and 1.2 samples per cycle on the bottom. The higher rate produces a set of samples that obviously capture the signal well, but the samples resulting from the lower sample rate are indistinguishable from samples of a low-frequency sine wave—in fact, faced with this set of samples the low-frequency sinusoid seems the more likely interpretation.
图 10.2显示了采样频率过低可能产生的伪影的具体示例。在这里,我们使用两个不同的采样频率对一个简单的正弦波进行采样:顶部为每周期 10.8 个样本,底部为每周期 1.2 个样本。较高的采样率产生的一组样本显然可以很好地捕捉信号,但较低采样率产生的样本与低频正弦波的样本难以区分——事实上,面对这组样本,低频正弦波似乎更有可能被解释为正弦波。
Figure 10.2. A sine wave (blue curve) sampled at two different rates. (a) At a high sample rate, the resulting samples (black dots) represent the signal well. (b) A lower sample rate produces an ambiguous result: the samples are exactly the same as would result from sampling a wave of much lower frequency (dashed curve).
图 10.2.以两种不同速率采样的正弦波(蓝色曲线)。(a)在高采样率下,所得样本(黑点)很好地代表了信号。(b)较低的采样率会产生模棱两可的结果:样本与采样频率低得多的波(虚线)的结果完全相同。
Once the sampling has been done, it is impossible to know which of the two signals—the fast or the slow sine wave—was the original, and therefore, there is no single method that can properly reconstruct the signal in both cases. Because the high-frequency signal is “pretending to be” a low-frequency signal, this phenomenon is known as aliasing.
一旦采样完成,就不可能知道两个信号(快速正弦波或慢速正弦波)中的哪一个是原始信号,因此,没有一种方法可以正确地重建这两种情况下的信号。由于高频信号“假装”是低频信号,这种现象称为混叠。
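The ambiguity in Figure 10.2 can be checked numerically. The following is a minimal sketch, assuming samples are taken at the integers and using a cosine so that the two sets of samples agree exactly (with a sine, the alias comes out with its sign flipped). The helper name `sample_cosine` is ours, not part of any library.

```python
import math

def sample_cosine(cycles_per_sample, count):
    """Return `count` samples of cos(2*pi*f*n) taken at n = 0, 1, 2, ..."""
    return [math.cos(2 * math.pi * cycles_per_sample * n) for n in range(count)]

# 1.2 samples per cycle is the same as 1/1.2 cycles per sample.
fast = sample_cosine(1 / 1.2, 12)
# Folding the frequency about one cycle per sample gives the slow alias,
# here 1/6 cycle per sample (6 samples per cycle).
slow = sample_cosine(1 - 1 / 1.2, 12)

# Largest disagreement between the two sets of samples.
max_difference = max(abs(u - v) for u, v in zip(fast, slow))
```

Up to rounding error, `max_difference` is zero: once only the samples are kept, nothing distinguishes the fast wave from the slow one.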
Aliasing shows up whenever flaws in sampling and reconstruction lead to artifacts at surprising frequencies. In audio, aliasing takes the form of odd-sounding extra tones: a bell ringing at 10 kHz, after being sampled at 8 kHz, turns into a 2 kHz tone (the difference between the signal frequency and the sample rate). In images, aliasing often takes the form of moiré patterns that result from the interaction of the sample grid with regular features in an image, for instance, the window blinds in Figure 10.34.
每当采样和重构过程中的缺陷导致意外频率出现伪影时,就会出现混叠。在音频中,混叠表现为听起来奇怪的额外音调——以 10 KHz 响起的铃声,在以 8 KHz 采样后变为 6 KHz 音调。在图像中,混叠通常表现为莫尔条纹,这是采样网格与图像中的常规特征相互作用的结果,例如图 10.34中的百叶窗。
Another example of aliasing in a synthetic image is the familiar stair-stepping on straight lines that are rendered with only black and white pixels (Figure 10.34). This is an example of small-scale features (the sharp edges of the lines) creating artifacts at a different scale (for shallow-slope lines, the stair steps are very long).
合成图像中混叠的另一个例子是常见的直线阶梯状,这些直线仅用黑白像素渲染(图 10.34 )。这是小尺度特征(线条的锐利边缘)在不同尺度上产生伪影的一个例子(对于坡度较小的线条,阶梯状非常长)。
The basic issues of sampling and reconstruction can be understood simply based on features being too small or too large, but some more quantitative questions are harder to answer:
采样和重建的基本问题可以简单地根据特征太小或太大来理解,但一些更定量的问题更难回答:
What sample rate is high enough to ensure good results?
什么样的采样率足够高才能确保良好的结果?
What kinds of filters are appropriate for sampling and reconstruction?
什么类型的过滤器适合采样和重建?
What degree of smoothing is required to avoid aliasing?
需要什么程度的平滑才能避免混叠?
Solid answers to these questions will have to wait until we have developed the theory fully in Section 10.5.
这些问题的确切答案必须等到我们在第 10.5 节中充分阐述该理论之后才能给出。
Before we discuss algorithms for sampling and reconstruction, we’ll first examine the mathematical concept on which they are based—convolution. Convolution is a simple mathematical concept that underlies the algorithms that are used for sampling, filtering, and reconstruction. It also is the basis of how we will analyze these algorithms throughout this chapter.
在讨论采样和重构算法之前,我们先来了解一下它们所基于的数学概念——卷积。卷积是一个简单的数学概念,是采样、滤波和重构算法的基础。它也是本章分析这些算法的基础。
Convolution is an operation on functions: it takes two functions and combines them to produce a new function. In this book, the convolution operator is denoted by a star: the result of applying convolution to the functions f and g is f ⋆ g. We say that f is convolved with g, and f ⋆ g is the convolution of f and g.
卷积是函数的一种运算:它取两个函数并将它们组合起来产生一个新函数。在本书中,卷积运算符用星号表示:对函数f和g进行卷积的结果是f * g 。我们称f与g卷积,而f * g是f与g的卷积。
Convolution can be applied either to continuous functions (functions f (x) that are defined for any real argument x) or to discrete sequences (functions a[i] that are defined only for integer arguments i). It can also be applied to functions defined on one-dimensional, two-dimensional, or higher-dimensional domains (i.e., functions of one, two, or more arguments). We will start with the discrete, one-dimensional case first and then continue to continuous functions and two- and three-dimensional functions.
卷积既可以应用于连续函数(针对任何实数参数x定义的函数f ( x )),也可以应用于离散序列(仅针对整数参数i定义的函数a [ i ])。它还可以应用于在一维、二维或更高维域上定义的函数(即具有一个、两个或更多个参数的函数)。我们将首先从离散的一维情况开始,然后继续讨论连续函数以及二维和三维函数。
For convenience in the definitions, we generally assume that the functions’ domains go on forever, although of course in practice they will have to stop somewhere, and we have to handle the endpoints in a special way.
为了定义方便,我们通常假设函数的定义域永远存在,尽管在实践中它们必须在某个地方停止,而且我们必须以特殊的方式处理端点。
To get a basic picture of convolution, consider the example of smoothing a 1D function using a moving average (Figure 10.3). To get a smoothed value at any point, we compute the average of the function over a range extending a distance r in each direction. The distance r, called the radius of the smoothing operation, is a parameter that controls how much smoothing happens.
为了对卷积有一个基本的了解,我们来看看使用移动平均线平滑一维函数的例子(图 10.3 )。为了得到任意一点的平滑值,我们计算函数在每个方向上延伸距离r 的范围内的平均值。距离r称为平滑操作的半径,是控制平滑程度的参数。
We can state this idea mathematically for discrete or continuous functions. If we’re smoothing a continuous function g(x), averaging means integrating g over an interval and then dividing by the length of the interval:
我们可以用数学的方式将这个想法表达为离散函数或连续函数。如果我们要平滑连续函数g ( x ),则平均意味着在区间内对g进行积分,然后除以区间的长度:
$$h(x) = \frac{1}{2r} \int_{x-r}^{x+r} g(t)\,dt.$$
Figure 10.3. Smoothing using a moving average.
图 10.3.使用移动平均线进行平滑。
On the other hand, if we’re smoothing a discrete function a[i], averaging means summing a for a range of indices and dividing by the number of values:
另一方面,如果我们要平滑离散函数a [ i ],则平均值意味着对一系列索引求和并除以值的数量:
$$c[i] = \frac{1}{2r+1} \sum_{j=i-r}^{i+r} a[j]. \tag{10.1}$$
In each case, the normalization constant is chosen so that if we smooth a constant function, the result will be the same function.
在每种情况下,都会选择标准化常数,以便如果我们平滑一个常数函数,结果将是相同的函数。
This idea of a moving average is the essence of convolution; the only difference is that in convolution, the moving average is a weighted average.
移动平均的思想是卷积的本质;唯一的区别是在卷积中,移动平均是加权平均。
We will start with the most concrete case of convolution: convolving a discrete sequence a[i] with another discrete sequence b[i]. The result is a discrete sequence (a ⋆ b)[i]. The process is just like smoothing a with a moving average, but this time instead of equally weighting all samples within a distance r, we use a second sequence b to give a weight to each sample (Figure 10.4). The value b[i – j] gives the weight for the sample at position j, which is at a distance i – j from the index i where we are evaluating the convolution. Here is the definition of (a ⋆ b), expressed as a formula:
我们将从卷积的最具体情况开始:将离散序列a [ i ] 与另一个离散序列b [ i ] 卷积。结果是一个离散序列 ( a*b )[ i ]。该过程就像使用移动平均值平滑a一样,但这次我们不是对距离r内的所有样本赋予相同权重,而是使用第二个序列b为每个样本赋予权重(图 10.4 )。值b [ i - j ] 给出位置j处样本的权重,该样本与我们正在评估卷积的索引i的距离为i - j 。以下是 (a⋆b )的定义,表示为公式:
$$(a \star b)[i] = \sum_{j} a[j]\, b[i-j]. \tag{10.2}$$
Figure 10.4. Computing one value in the discrete convolution of a sequence a with a filter b that has support five samples wide. Each sample in a ⋆ b is an average of nearby samples in a, weighted by the values of b.
图 10.4.计算序列a与滤波器 b 的离散卷积中的一个值,滤波器b具有五个样本宽度的支持。 a ⋆ b中的每个样本都是a中附近样本的平均值,由b的值加权。
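By omitting bounds on j, we indicate that this sum runs over all integers (i.e., from –∞ to +∞). Figure 10.4 illustrates how one output sample is computed, using the example of $b = \frac{1}{16}[\ldots, 0, 1, 4, 6, 4, 1, 0, \ldots]$; that is, $b[0] = \frac{6}{16}$, $b[\pm 1] = \frac{4}{16}$, etc.
通过省略j的边界,我们表明此和遍历所有整数(即从 -∞ 到 +∞)。图 10.4说明了如何计算一个输出样本,使用 $b = \frac{1}{16}[\ldots, 0, 1, 4, 6, 4, 1, 0, \ldots]$ 的示例,即 $b[0] = \frac{6}{16}$,$b[\pm 1] = \frac{4}{16}$,等等。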
In graphics, one of the two functions will usually have finite support (as does the example in Figure 10.4), which means that it is nonzero only over a finite interval of argument values. If we assume that b has finite support, there is some radius r such that b[k] = 0 whenever |k| > r. In that case, we can write the sum above as
在图形学中,这两个函数中的一个通常具有有限支撑(如图 10.4中的示例所示),这意味着它仅在参数值的有限区间内为非零值。如果我们假设b具有有限支撑,则存在某个半径r ,使得每当|k| > r时, b [ k ] = 0。在这种情况下,我们可以将上面的和写成
$$(a \star b)[i] = \sum_{j=i-r}^{i+r} a[j]\, b[i-j],$$
and we can express the definition in code as
我们可以在代码中表达如下定义
function convolve(sequence a, filter b, int i)
    s = 0
    r = b.radius
    for j = i - r to i + r do
        s = s + a[j] * b[i - j]
    return s
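A runnable version of this definition might look like the following Python sketch. It makes two assumptions the text leaves open: the filter is stored as a list of 2r + 1 weights, so that `b[k + radius]` holds the weight for offset k, and samples that fall outside the finite list `a` are treated as zero, which is only one of several ways to handle the endpoints.

```python
def convolve(a, b, radius, i):
    """Return (a * b)[i] for a finite list `a` and a filter `b` of the given radius.

    b[k + radius] holds the filter weight for offset k (assumed storage layout).
    Samples outside `a` are treated as zero (assumed boundary rule).
    """
    s = 0.0
    for j in range(i - radius, i + radius + 1):
        if 0 <= j < len(a):
            s += a[j] * b[(i - j) + radius]
    return s

# A single spike, smoothed by a five-tap filter with weights 1, 4, 6, 4, 1 over 16.
signal = [0, 0, 16, 0, 0, 0]
weights = [w / 16 for w in (1, 4, 6, 4, 1)]
smoothed = [convolve(signal, weights, 2, i) for i in range(len(signal))]
```

Because the input is a single spike, the output is just a scaled copy of the filter centered on the spike.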
Convolution is important because we can use it to perform filtering. Looking back at our first example of filtering, the moving average, we can now reinterpret that smoothing operation as convolution with a particular sequence. When we compute an average over some limited range of indices, that is the same as weighting the points in the range all identically and weighting the rest of the points with zeros. This kind of filter, which has a constant value over the interval where it is nonzero, is known as a box filter (because it looks like a rectangle if you draw its graph—see Figure 10.5). For a box filter of radius r, the weight is 1/(2r +1):
卷积很重要,因为我们可以用它来执行过滤。回顾我们的第一个过滤示例,即移动平均线,我们现在可以将平滑操作重新解释为与特定序列的卷积。当我们计算某个有限范围内的指标的平均值时,这相当于对范围内的点赋予相同的权重,对其余点赋予零权重。这种在非零区间具有恒定值的过滤器称为盒子滤波器(因为如果画出它的图形,它看起来像一个矩形——见图 10.5 )。对于半径为r的盒子滤波器,权重为 1 / (2 r +1):
$$b[k] = \begin{cases} \dfrac{1}{2r+1}, & -r \le k \le r, \\ 0, & \text{otherwise.} \end{cases}$$
Figure 10.5. A discrete box filter.
图 10.5.离散盒式滤波器。
If you substitute this filter into Equation (10.2), you will find that it reduces to the moving average in Equation (10.1).
如果将此滤波器代入公式(10.2),您会发现它简化为公式(10.1)中的移动平均值。
As in this example, convolution filters are usually designed so that they sum to 1. That way, they don’t affect the overall level of the signal.
如本例所示,卷积滤波器通常设计为总和为 1。这样,它们就不会影响信号的整体水平。
Example 20 (Convolution of a box and a step)
示例 20(盒子和台阶的卷积)
For a simple example of filtering, let the signal be the step function
举一个简单的过滤示例,让信号成为阶跃函数
$$a[i] = \begin{cases} 1, & i \ge 0, \\ 0, & i < 0, \end{cases}$$
and the filter be the five-point box filter centered at zero,
过滤器是以零为中心的五点盒过滤器,
$$b[k] = \begin{cases} \dfrac{1}{5}, & -2 \le k \le 2, \\ 0, & \text{otherwise.} \end{cases}$$
What is the result of convolving a and b? At a particular index i, as shown in Figure 10.6, the result is the average of the step function over the range from i – 2 to i + 2. If i < –2, we are averaging all zeros and the result is zero. If i ≥ 2, we are averaging all ones and the result is one. In between, there are i + 3 ones, resulting in the value $\frac{i+3}{5}$. The output is a linear ramp that goes from 0 to 1 over five samples: $\frac{1}{5}[\ldots, 0, 0, 1, 2, 3, 4, 5, 5, \ldots]$.
卷积a和b的结果是什么?如图 10.6所示,在特定索引i处,结果是从i – 2 到i + 2 范围内阶跃函数的平均值。如果i < –2,则对所有零求平均值,结果为零。如果i ≥ 2,则对所有一求平均值,结果为一。在这两者之间,有i + 3 个一,结果为 $\frac{i+3}{5}$。输出是一个线性斜坡,在五个样本中从 0 变为 1:$\frac{1}{5}[\ldots, 0, 0, 1, 2, 3, 4, 5, 5, \ldots]$。
Figure 10.6. Discrete convolution of a box function with a step function.
图 10.6.盒函数与阶跃函数的离散卷积。
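The ramp in this example is easy to confirm by direct computation. The sketch below uses exact fractions and evaluates the step function on demand, so no boundary handling is needed; the function names are ours.

```python
from fractions import Fraction

def step(j):
    """Unit step: 1 for j >= 0, 0 otherwise."""
    return 1 if j >= 0 else 0

def box_smoothed_step(i, radius=2):
    """Average of the step over the indices i - radius .. i + radius."""
    weight = Fraction(1, 2 * radius + 1)
    return sum(step(j) * weight for j in range(i - radius, i + radius + 1))

# Evaluate the convolution for i = -4 .. 4.
ramp = [box_smoothed_step(i) for i in range(-4, 5)]
```

The values in `ramp` are 0, 0, 1/5, 2/5, 3/5, 4/5, 1, 1, 1, which matches the value (i + 3)/5 for indices between –2 and 2.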
The way we’ve written it so far, convolution seems like an asymmetric operation: a is the sequence we’re smoothing, and b provides the weights. But one of the nice properties of convolution is that it actually doesn’t make any difference which is which: the filter and the signal are interchangeable. To see this, just rethink the sum in Equation (10.2) with the indices counting from the origin of the filter b, rather than from the origin of a. That is, we replace j with i – k. The result of this change of variable is
到目前为止,我们编写的卷积似乎是一种不对称运算: a是我们要平滑的序列, b提供权重。但卷积的一个优点是它实际上没有任何区别:滤波器和信号是可以互换的。要了解这一点,只需重新考虑等式 (10.2) 中的和,其中索引从滤波器b的原点开始计算,而不是从a的原点开始计算。也就是说,我们用i – k替换j 。这种变量变化的结果是
$$(a \star b)[i] = \sum_{k} a[i-k]\, b[k].$$
This is exactly the same as Equation (10.2) but with a acting as the filter and b acting as the signal. So for any sequences a and b, (a * b) = (b * a), and we say that convolution is a commutative operation.1
这与公式 (10.2) 完全相同,但a充当过滤器, b充当信号。因此,对于任何序列a和b ,( a * b ) = ( b * a ),我们称卷积为交换运算。1
More generally, convolution is a “multiplication-like” operation. Like multiplication or addition of numbers or functions, neither the order of the arguments nor the placement of parentheses affects the result. Also, convolution relates to addition in the same way that multiplication does. To be precise, convolution is commutative and associative, and it is distributive over addition.
更一般地讲,卷积是一种“类似乘法”的运算。与数字或函数的乘法或加法一样,参数的顺序和括号的位置都不会影响结果。此外,卷积与加法的关系与乘法的关系相同。确切地说,卷积是交换律和结合律,并且对加法具有分配律。
These properties are very natural if we think of convolution as being like multiplication, and they are very handy to know about because they can help us save work by simplifying convolutions before we actually compute them. For instance, suppose we want to take a sequence a and convolve it with three filters, b1, b2, and b3—that is, we want ((a * b1) * b2) * b3. If the sequence is long and the filters are short (that is, they have small radii), it is much faster to first convolve the three filters together (computing b1 * b2 * b3) and finally to convolve the result with the signal, computing a * (b1 * b2 * b3), which we know from associativity gives the same result.
如果我们将卷积视为乘法,那么这些属性非常自然,并且了解它们非常方便,因为它们可以在我们实际计算卷积之前简化卷积,从而帮助我们节省工作。 例如,假设我们要取一个序列a并将其与三个滤波器b 1 、 b 2和b 3进行卷积,也就是说,我们需要 (( a * b 1 ) * b 2 ) * b 3 。 如果序列很长而滤波器很短(即它们的半径很小),那么先将三个滤波器卷积在一起(计算b 1 * b 2 * b 3 ),最后将结果与信号进行卷积计算a * ( b 1 * b 2 * b 3 ) 会快得多,我们从结合律知道这会给出相同的结果。
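To see associativity at work, here is a small sketch that convolves finite sequences stored as lists starting at index 0 (everything outside the lists is taken to be zero). The three short filters are arbitrary illustrative choices with integer weights, not normalized filters from the text.

```python
def convolve_full(a, b):
    """Convolve two finite sequences given as lists that start at index 0.

    The result has len(a) + len(b) - 1 entries; values outside the lists
    are treated as zero.
    """
    out = [0] * (len(a) + len(b) - 1)
    for j, a_j in enumerate(a):
        for k, b_k in enumerate(b):
            out[j + k] += a_j * b_k
    return out

signal = [3, 1, 4, 1, 5, 9, 2, 6]
b1, b2, b3 = [1, 2, 1], [1, 1], [1, 0, -1]

# Filter the signal three times in a row ...
one_at_a_time = convolve_full(convolve_full(convolve_full(signal, b1), b2), b3)
# ... or combine the three short filters once and filter the signal once.
combined_filter = convolve_full(convolve_full(b1, b2), b3)
all_at_once = convolve_full(signal, combined_filter)
```

Both routes give the same sequence, but the second touches the long signal only once, which is where the savings come from when the signal is much longer than the filters.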
A very simple filter serves as an identity for discrete convolution: it is the discrete filter of radius zero, or the sequence d[i] = ..., 0, 0, 1, 0, 0,... (Figure 10.7). If we convolve d with a signal a, there will be only one nonzero term in the sum:
一个非常简单的滤波器可用作离散卷积的恒等式:它是半径为零的离散滤波器,或序列d [ i ] = ..., 0, 0, 1, 0, 0, ... (图 10.7 )。如果我们将d与信号a卷积,则和中只有一个非零项:
$$(d \star a)[i] = \sum_{j} d[j]\, a[i-j] = d[0]\, a[i] = a[i].$$
Figure 10.7. The discrete identity filter.
图 10.7.离散身份过滤器。
1 You may have noticed that one of the functions in the convolution sum seems to be flipped over—that is, b[k] gives the weight for the sample k units earlier in the sequence, while b[–k] gives the weight for the sample k units later in the sequence. The reason for this has to do with ensuring associativity; see Exercise 4. Most of the filters we use are symmetric, so you hardly ever need to worry about this.
1您可能已经注意到,卷积和中的一个函数似乎被翻转了——也就是说, b [ k ] 给出序列中前k个单位的样本的权重,而b [- k ] 给出序列中后k个单位的样本的权重。这样做的原因是为了确保结合性;参见练习 4。我们使用的大多数过滤器都是对称的,因此您几乎不需要担心这个问题。
So clearly, convolving a with d just gives back a again. The sequence d is known as the discrete impulse. It is occasionally useful in expressing a filter: for instance, the process of smoothing a signal a with a filter b and then subtracting that from the original could be expressed as a single convolution with the filter d – b:
显然,将a与d卷积只会返回a 。序列d称为离散脉冲。它有时在表达滤波器时很有用:例如,使用滤波器b平滑信号a ,然后从原始信号中减去该信号的过程可以表示为与滤波器d – b 的单次卷积:
$$c = a - a \star b = a \star d - a \star b = a \star (d - b).$$
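The following sketch checks this identity on a short signal. It assumes a three-point box as the smoothing filter b and treats samples outside the signal as zero; filters are stored as dictionaries from offset to weight, which is purely an illustrative representation.

```python
from fractions import Fraction

def convolve_at(a, weights, i):
    """Return (a * w)[i]; `weights` maps offset k to w[k], zero elsewhere."""
    return sum(a[i - k] * w for k, w in weights.items() if 0 <= i - k < len(a))

third = Fraction(1, 3)
box = {-1: third, 0: third, 1: third}            # smoothing filter b
impulse = {0: Fraction(1)}                        # discrete impulse d
detail = {k: impulse.get(k, 0) - box.get(k, 0)    # the filter d - b
          for k in (-1, 0, 1)}

a = [2, 2, 8, 2, 2]
# Smooth, then subtract the smoothed signal from the original ...
two_steps = [a[i] - convolve_at(a, box, i) for i in range(len(a))]
# ... or convolve once with d - b.
one_step = [convolve_at(a, detail, i) for i in range(len(a))]
```

The two lists agree entry by entry, as distributivity over addition predicts.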
There is a second, entirely equivalent, way of interpreting Equation (10.2). Looking at the samples of a⋆b one at a time leads to the weighted-average interpretation that we have already seen. But if we omit the [i], we can instead think of the sum as adding together entire sequences. One piece of notation is required to make this work: if b is a sequence, then the same sequence shifted to the right by j places is called b→j (Figure 10.8):
还有第二种完全等价的解释公式 (10.2) 的方法。逐个查看a ⋆ b的样本,可以得到我们已经见过的加权平均解释。但如果我们省略 [ i ],我们可以将和看作将整个序列相加。要做到这一点,需要一种符号:如果b是一个序列,则将相同的序列向右移动j位,称为b →j (图 10.8 ):
$$b_{\to j}[i] = b[i-j].$$
Figure 10.8. Shifting a sequence b to get b→j.
图 10.8.移位序列b得到b →j 。
Then, we can write Equation (10.2) as a statement about the whole sequence (a * b) rather than element-by-element:
然后,我们可以将公式 (10.2) 写成关于整个序列 ( a * b ) 的陈述,而不是逐个元素的陈述:
$$(a \star b) = \sum_{j} a[j]\, b_{\to j}.$$
Looking at it this way, the convolution is a sum of shifted copies of b, weighted by the entries of a (Figure 10.9). Because of commutativity, we can pick either a or b as the filter; if we choose b, then we are adding up one copy of the filter for every sample in the input.
从这个角度来看,卷积是b的移位副本之和,由a的条目加权(图 10.9 )。由于交换性,我们可以选择a或b作为过滤器;如果我们选择b ,那么我们将为输入中的每个样本添加一个过滤器副本。
Figure 10.9. Discrete convolution as a sum of shifted copies of the filter.
图 10.9.离散卷积作为滤波器移位副本的总和。
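The two interpretations correspond to two loop orders. In the sketch below, one function gathers a weighted average for each output sample, and the other scatters a scaled, shifted copy of the filter for each input sample. The output is truncated to the extent of the input, and the asymmetric filter is an arbitrary choice made so that any indexing mistake would be visible.

```python
def convolve_by_averaging(a, b, radius):
    """Gather view: each output is a weighted average of nearby inputs."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            out[i] += a[j] * b[(i - j) + radius]
    return out

def convolve_by_shifted_copies(a, b, radius):
    """Scatter view: add one copy of b, shifted to j and scaled by a[j]."""
    n = len(a)
    out = [0] * n
    for j in range(n):
        for k in range(-radius, radius + 1):
            if 0 <= j + k < n:
                out[j + k] += a[j] * b[k + radius]
    return out

a = [0, 2, 0, 0, 1, 0, 0]
b = [1, 2, 3]   # b[k + 1] is the weight at offset k; deliberately asymmetric

gathered = convolve_by_averaging(a, b, 1)
scattered = convolve_by_shifted_copies(a, b, 1)
```

Both functions return 2, 4, 6, 1, 2, 3, 0: one copy of the filter scaled by 2 around index 1, and one unscaled copy around index 4.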
While it is true that discrete sequences are what we actually work with in a computer program, these sampled sequences are supposed to represent continuous functions, and often we need to reason mathematically about the continuous functions in order to figure out what to do. For this reason, it is useful to define convolution between continuous functions and also between continuous and discrete functions.
虽然我们在计算机程序中实际处理的是离散序列,但这些采样序列应该表示连续函数,并且我们经常需要对连续函数进行数学推理,以便弄清楚该怎么做。因此,定义连续函数之间以及连续函数和离散函数之间的卷积很有用。
The convolution of two continuous functions is the obvious generalization of Equation (10.2), with an integral replacing the sum:
两个连续函数的卷积是方程 (10.2) 的明显推广,用积分代替了和:
$$(f \star g)(x) = \int_{-\infty}^{+\infty} f(t)\, g(x-t)\,dt. \tag{10.3}$$
One way of interpreting this definition is that the convolution of f and g, evaluated at the argument x, is the area under the curve of the product of the two functions after we shift g so that g(0) lines up with f(x). Just like in the discrete case, the convolution is a moving average, with the filter providing the weights for the average (see Figure 10.10).
解释此定义的一种方式是,在参数x处求值的f和g的卷积是将g移位以使g (0) 与f ( t ) 对齐后,两个函数乘积曲线下的面积。与离散情况一样,卷积是移动平均值,滤波器为平均值提供权重(见图10.10 )。
Figure 10.10. Continuous convolution.
图 10.10。连续卷积。
Like discrete convolution, convolution of continuous functions is commutative and associative, and it is distributive over addition. Also as with the discrete case, the continuous convolution can be seen as a sum of copies of the filter rather than the computation of weighted averages. Except, in this case, there are infinitely many copies of the filter g:
与离散卷积一样,连续函数的卷积具有交换性和结合性,并且对加法具有分配性。与离散情况一样,连续卷积可以看作是滤波器副本的总和,而不是加权平均值的计算。不过,在这种情况下,滤波器g有无数个副本:
$$(f \star g) = \int_{-\infty}^{+\infty} f(t)\, g_{\to t}\,dt.$$
Example 21 (Convolution of two box functions)
示例21(两个框函数的卷积)
Let f be a box function:
令f为框函数:
$$f(x) = \begin{cases} 1, & -\frac{1}{2} \le x < \frac{1}{2}, \\ 0, & \text{otherwise.} \end{cases}$$
Then what is f * f ? The definition (Equation 10.3) gives
那么f * f是多少?定义(公式 10.3)给出
$$(f \star f)(x) = \int_{-\infty}^{\infty} f(t)\, f(x-t)\,dt = \int_{-1/2}^{1/2} f(x-t)\,dt.$$
Figure 10.11 shows the two cases of this integral. The two boxes might have zero overlap, which happens when x ≤ –1 or x ≥ 1; in this case, the result is zero. When –1 < x < 1, the overlap depends on the separation between the two boxes, which is |x|; the result is 1 –|x|. So
图 10.11显示了此积分的两种情况。两个盒子可能没有重叠,这种情况发生在x ≤ –1 或x ≥ 1 时;在这种情况下,结果为零。当 –1 < x < 1 时,重叠取决于两个盒子之间的间隔,即|x| ;结果为 1 – |x| 。所以
$$(f \star f)(x) = \begin{cases} 1 - |x|, & -1 < x < 1, \\ 0, & \text{otherwise.} \end{cases}$$
Figure 10.11. Convolving two boxes yields a tent function.
图 10.11.卷积两个盒子产生一个帐篷函数。
This function, known as the tent function, is another common filter (see Section 10.3.1).
该函数称为tent 函数,是另一种常见的过滤器(参见第 10.3.1 节)。
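The result of Example 21 can be checked numerically by approximating the convolution integral with a midpoint Riemann sum. This is only a sketch: the integration range and step count are arbitrary choices, and the answer is accurate to roughly the step size rather than exactly.

```python
def box(x):
    """Continuous box of radius 1/2: 1 on [-1/2, 1/2), 0 elsewhere."""
    return 1.0 if -0.5 <= x < 0.5 else 0.0

def tent(x):
    """Tent function: 1 - |x| on (-1, 1), 0 elsewhere."""
    return max(0.0, 1.0 - abs(x))

def convolve_continuous(f, g, x, lo=-2.0, hi=2.0, steps=4000):
    """Approximate the integral of f(t) g(x - t) dt by a midpoint sum."""
    dt = (hi - lo) / steps
    total = 0.0
    for n in range(steps):
        t = lo + (n + 0.5) * dt
        total += f(t) * g(x - t)
    return total * dt

# Compare the numerical convolution of two boxes against the tent.
errors = [abs(convolve_continuous(box, box, x) - tent(x))
          for x in (-1.5, -0.75, 0.0, 0.3, 1.0)]
```

Every entry of `errors` is small, on the order of the step size or less, consistent with the two boxes convolving to the tent.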
In discrete convolution, we saw that the discrete impulse d acted as an identity: d * a = a. In the continuous case, there is also an identity function, called the Dirac impulse or Dirac delta function, denoted δ(x).
在离散卷积中,我们看到离散脉冲d充当恒等式: d * a = a 。在连续情况下,也有一个恒等函数,称为狄拉克脉冲或狄拉克德尔塔函数,表示为δ( x )。
Intuitively, the delta function is a very narrow, very tall spike that has infinitesimal width but still has area equal to 1 (Figure 10.12). The key defining property of the delta function is that multiplying it by a function selects out the value exactly at zero:
直观地看,delta 函数是一个非常窄、非常高的尖峰,其宽度无穷小,但面积仍等于 1(图 10.12 )。delta 函数的关键定义属性是,将其乘以一个函数会选出恰好在零处的值:
$$\int_{-\infty}^{\infty} \delta(x)\, f(x)\,dx = f(0).$$
Figure 10.12. The Dirac delta function δ(x).
图 10.12。狄拉克函数 δ( x )。
The delta function does not have a well-defined value at 0 (you can think of its value loosely as +∞), but it does have the value δ(x) = 0 for all x ≠ 0.
该 delta 函数在 0 处没有明确定义的值(您可以将其值大致视为 +∞),但对于所有x ≠ 0,它都有值 δ( x ) = 0。
From this property of selecting out single values, it follows that the delta function is the identity for continuous convolution (Figure 10.13), because convolving δ with any function f yields
从这个选择单个值的性质可以看出,delta函数是连续卷积的恒等式(图10.13 ),因为用任何函数f卷积δ都会得到
$$(\delta \star f)(x) = \int_{-\infty}^{\infty} \delta(t)\, f(x-t)\,dt = f(x).$$
Figure 10.13. Convolving a function with δ(x) returns a copy of the same function.
图 10.13.将函数与 δ( x ) 卷积会返回同一函数的副本。
So δ * f = f (and because of commutativity f * δ = f also).
所以 δ * f = f (并且由于交换性, f * δ = f )。
There are two ways to connect the discrete and continuous worlds. One is sampling: we convert a continuous function into a discrete one by writing down the function’s value at all integer arguments and forgetting about the rest. Given a continuous function f (x), we can sample it to convert to a discrete sequence a[i]:
有两种方法可以连接离散世界和连续世界。一种是采样:我们将连续函数转换为离散函数,方法是记下函数在所有整数参数处的值,而忽略其余部分。给定一个连续函数f ( x ),我们可以对其进行采样以转换为离散序列a [ i ]:
$$a[i] = f(i).$$
Going the other way, from a discrete function, or sequence, to a continuous function, is called reconstruction. This is accomplished using yet another form of convolution, the discrete-continuous form. In this case, we are filtering a discrete sequence a[i] with a continuous filter f (x):
反过来,从离散函数或序列到连续函数,称为重建。这是使用另一种形式的卷积,即离散-连续形式来实现的。在本例中,我们使用连续滤波器f ( x ) 来过滤离散序列a [ i ]:
$$(a \star f)(x) = \sum_{i} a[i]\, f(x-i).$$
The value of the reconstructed function a * f at x is a weighted sum of the samples a[i] for values of i near x (Figure 10.14). The weights come from the filter f , which is evaluated at a set of points spaced one unit apart. For example, if x = 5.3 and f has radius 2, f is evaluated at 1.3, 0.3, –0.7, and –1.7. Note that for discrete-continuous convolution, we generally write the sequence first and the filter second, so that the sum is over integers.
重构函数a * f在x处的值是样本a [ i ] 在x附近的i值的加权和(图 10.14 )。权重来自滤波器f ,它在相隔一个单位的一组点处求值。例如,如果x = 5.3 且f的半径为 2,则f在 1.3、0.3、-0.7 和 -1.7 处求值。请注意,对于离散-连续卷积,我们通常先写出序列,然后再写出滤波器,这样和就超出了整数范围。
Figure 10.14. Discrete-continuous convolution.
图 10.14.离散-连续卷积。
As with discrete convolution, we can put bounds on the sum if we know the filter’s radius, r, eliminating all points where the difference between x and i is at least r:
与离散卷积一样,如果我们知道滤波器的半径r ,我们可以对总和设置界限,消除x和i之间的差异至少为r 的所有点:
$$(a \star f)(x) = \sum_{i=\lceil x-r \rceil}^{\lfloor x+r \rfloor} a[i]\, f(x-i).$$
Note that if a point falls exactly at distance r from x (i.e., if x – r turns out to be an integer), it will be left out of the sum. This is in contrast to the discrete case, where we included the point at i – r.
请注意,如果某个点与x的距离恰好为r (即,如果x – r为整数),则该点将不包含在和中。这与离散情况相反,在离散情况下,我们将i – r处的点包括在内。
Expressed in code, this is
用代码来表达就是
function reconstruct(sequence a, filter f, real x)
    s = 0
    r = f.radius
    for i = ⌈x - r⌉ to ⌊x + r⌋ do
        s = s + a[i] * f(x - i)
    return s
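As a concrete sketch of reconstruction, the Python below evaluates the discrete-continuous convolution with a tent filter of radius 1, which amounts to linear interpolation between neighboring samples. It assumes the loop runs over the integers from the ceiling of x – r to the floor of x + r, and that samples outside the list are zero; both are assumptions made for illustration.

```python
import math

def tent(x):
    """Tent filter of radius 1."""
    return max(0.0, 1.0 - abs(x))

def reconstruct(a, f, radius, x):
    """Return (a * f)(x) for samples a[0..n-1] located at the integers.

    Samples outside the list are treated as zero (assumed boundary rule).
    """
    s = 0.0
    for i in range(math.ceil(x - radius), math.floor(x + radius) + 1):
        if 0 <= i < len(a):
            s += a[i] * f(x - i)
    return s

samples = [0.0, 10.0, 20.0, 10.0]
between = reconstruct(samples, tent, 1, 1.25)   # 0.75 * 10 + 0.25 * 20
on_sample = reconstruct(samples, tent, 1, 2.0)  # lands exactly on a[2]
```

At x = 1.25, the filter is evaluated at 0.25 and –0.75, giving weights 0.75 and 0.25 for the samples at 1 and 2. At x = 2, the neighbors at distance exactly 1 receive weight zero, so only the sample at 2 contributes.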
As with the other forms of convolution, discrete-continuous convolution may be seen as summing shifted copies of the filter (Figure 10.15):
与其他形式的卷积一样,离散-连续卷积可以看作是滤波器移位副本的总和(图 10.15 ):
$$(a \star f) = \sum_{i} a[i]\, f_{\to i}.$$
Figure 10.15. Reconstruction (discrete-continuous convolution) as a sum of shifted copies of the filter.
图 10.15.重建(离散-连续卷积)作为滤波器移位副本的总和。
Discrete-continuous convolution is closely related to splines. For uniform splines (a uniform B-spline, for instance), the parameterized curve for the spline is exactly the convolution of the spline’s basis function with the control point sequence (see Section 15.6.2).
离散-连续卷积与样条函数密切相关。对于均匀样条函数(例如均匀 B 样条函数),样条函数的参数化曲线恰好是样条函数基函数与控制点序列的卷积(参见第 15.6.2 节)。
So far, everything we have said about sampling and reconstruction has been one-dimensional: there has been a single variable x or a single sequence index i. Many of the important applications of sampling and reconstruction in graphics, though, are applied to two-dimensional functions—in particular, to 2D images. Fortunately, the generalization of sampling algorithms and theory from 1D to 2D, 3D, and beyond is conceptually very simple.
到目前为止,我们讨论的关于采样和重构的所有内容都是一维的:只有一个变量x或一个序列索引i 。然而,采样和重构在图形学中的许多重要应用都适用于二维函数——特别是二维图像。幸运的是,从一维到二维、三维及更高维的采样算法和理论的推广在概念上非常简单。
Beginning with the definition of discrete convolution, we can generalize it to two dimensions by making the sum into a double sum:
从离散卷积的定义开始,我们可以将其推广到二维,将和变成二重和:
$$(a \star b)[i, j] = \sum_{i'} \sum_{j'} a[i', j']\, b[i - i', j - j'].$$
If b is a finitely supported filter of radius r (i.e., it has $(2r+1)^2$ values), then we can write this sum with bounds (Figure 10.16):
如果b是半径为r的有限支持过滤器(即,它有 (2 r +1) 2 个值),那么我们可以写出有界限的和(图 10.16 ):
$$(a \star b)[i, j] = \sum_{i'=i-r}^{i+r} \; \sum_{j'=j-r}^{j+r} a[i', j']\, b[i - i', j - j']$$
Figure 10.16. The weights for the nine input samples that contribute to the discrete convolution at point (i, j ) with a filter b of radius 1.
图 10.16在点 ( i , j ) 处对离散卷积有贡献的 9 个输入样本的权重,其中滤波器b的半径为 1。
2 Note that the term “Fourier transform” is used both for the function $\hat{f}$ and for the operation that computes $\hat{f}$ from f. Unfortunately, this rather ambiguous usage is standard.
2 请注意,“傅里叶变换”一词既用于表示函数 $\hat{f}$,也用于表示由f计算 $\hat{f}$ 的运算。不幸的是,这种相当模糊的用法是标准的。
and express it in code:
并用代码表达:
function convolve2d(sequence2d a, filter2d b, int i, int j)
    s = 0
    r = b.radius
    for i′ = i - r to i + r do
        for j′ = j - r to j + r do
            s = s + a[i′][j′] * b[i - i′][j - j′]
    return s
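A direct Python transcription of the two-dimensional case is sketched below. As in the one-dimensional sketches, the storage layout of the filter (a square list of lists indexed by offset plus radius) and the zero treatment of samples outside the image are assumptions made for illustration.

```python
def convolve2d(a, b, radius, i, j):
    """Return (a * b)[i, j] for a 2D list `a` and a square filter `b`.

    b[di + radius][dj + radius] stores the weight for offset (di, dj).
    Samples outside `a` are treated as zero (assumed boundary rule).
    """
    s = 0.0
    for i2 in range(i - radius, i + radius + 1):
        for j2 in range(j - radius, j + radius + 1):
            if 0 <= i2 < len(a) and 0 <= j2 < len(a[0]):
                s += a[i2][j2] * b[(i - i2) + radius][(j - j2) + radius]
    return s

image = [[9.0] * 4 for _ in range(4)]        # constant 4 x 4 image
box3 = [[1 / 9] * 3 for _ in range(3)]       # 3 x 3 box filter, sums to 1
interior = convolve2d(image, box3, 1, 1, 1)  # all nine samples available
corner = convolve2d(image, box3, 1, 0, 0)    # only four samples inside
```

In the interior, the constant image is reproduced (about 9). At the corner, only four of the nine weights land on real samples, so the result drops to about 4, which shows why the endpoints need special handling in practice.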
This definition can be interpreted in the same way as in the 1D case: each output sample is a weighted average of an area in the input, using the 2D filter as a “mask” to determine the weight of each sample in the average.
这个定义可以按照与一维情况相同的方式解释:每个输出样本是输入中某个区域的加权平均值,使用二维过滤器作为“掩码”来确定平均值中每个样本的权重。
Continuing the generalization, we can write continuous-continuous (Figure 10.17) and discrete-continuous (Figure 10.18) convolutions in 2D as well:
继续概括,我们也可以以二维形式写出连续-连续(图 10.17 )和离散-连续(图 10.18 )卷积:
$$(f \star g)(x, y) = \iint f(x', y')\, g(x - x', y - y')\,dx'\,dy',$$
$$(a \star f)(x, y) = \sum_{i} \sum_{j} a[i, j]\, f(x - i, y - j).$$
Figure 10.17. The weight for an infinitesimal area in the input signal resulting from continuous convolution at (x, y).
图 10.17.在 ( x, y ) 处进行连续卷积后输入信号中无穷小区域的权重。
Figure 10.18. The weights for the 16 input samples that contribute to the discrete-continuous convolution at point (x, y) for a reconstruction filter of radius 2.
图 10.18.对于半径为 2 的重建滤波器,对点 ( x, y ) 处的离散-连续卷积有贡献的 16 个输入样本的权重。
In each case, the result at a particular point is a weighted average of the input near that point. For the continuous-continuous case, it is a weighted integral over a region centered at that point, and in the discrete-continuous case, it is a weighted average of all the samples that fall near the point.
在每种情况下,特定点的结果都是该点附近输入的加权平均值。对于连续-连续情况,它是以该点为中心的区域上的加权积分;在离散-连续情况下,它是该点附近所有样本的加权平均值。
Once we have gone from 1D to 2D, it should be fairly clear how to generalize further to 3D or even to higher dimensions.
一旦我们从一维转到二维,就应该很清楚如何进一步推广到三维甚至更高的维度。
Now that we have the machinery of convolution, let’s examine some of the particular filters commonly used in graphics.
现在我们已经了解了卷积机制,让我们来研究一下图形中常用的一些特殊过滤器。
Each of the following filters has a natural radius, which is the default size to be used for sampling or reconstruction when samples are spaced one unit apart. In this section, filters are defined at this natural size: for instance, the box filter has a natural radius of $\frac{1}{2}$, and the cubic filters have a natural radius of 2. We also arrange for each filter to integrate to 1: $\int_{-\infty}^{\infty} f(x)\,dx = 1$, as required for sampling and reconstruction without changing a signal’s average value.
以下每个滤波器都有一个自然半径,这是当样本间隔一个单位时用于采样或重建的默认大小。在本节中,滤波器以此自然尺寸定义:例如,盒式滤波器的自然半径为 $\frac{1}{2}$,三次滤波器的自然半径为 2。我们还安排每个滤波器积分为 1:$\int_{-\infty}^{\infty} f(x)\,dx = 1$,这是在不改变信号平均值的情况下进行采样和重建所必需的。
As we will see in Section 10.4.3, some applications require filters of different sizes, which can be obtained by scaling the basic filter. For a filter f (x), we can define a version of scale s:
正如我们将在10.4.3 节中看到的,一些应用需要不同大小的滤波器,这可以通过缩放基本滤波器来获得。对于滤波器f ( x ),我们可以定义一个版本的缩放s :
$$f_s(x) = \frac{f(x/s)}{s}.$$
The filter is stretched horizontally by a factor of s and then squashed vertically by a factor of $\frac{1}{s}$ so that its area is unchanged. A filter that has a natural radius of r and is used at scale s has a radius of support sr (see Figure 10.20).
滤波器在水平方向上被拉伸s倍,然后在垂直方向上被压缩 $\frac{1}{s}$ 倍,使得其面积不变。具有自然半径r且在尺度s下使用的滤波器,其支撑半径为sr(参见图 10.20 )。
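The effect of scaling can be sketched in a few lines of Python. The tent function of natural radius 1 stands in for the basic filter here, and the integral is approximated with a midpoint rule; both the helper names and the numerical settings are illustrative assumptions.

```python
def tent(x):
    """Tent filter with natural radius 1."""
    return max(0.0, 1.0 - abs(x))

def scaled(f, s):
    """Return the filter f_s(x) = f(x / s) / s."""
    return lambda x: f(x / s) / s

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(f(lo + (n + 0.5) * dx) for n in range(steps)) * dx

wide_tent = scaled(tent, 3.0)          # radius of support becomes 3 * 1 = 3
area = integrate(wide_tent, -5.0, 5.0)
```

The scaled tent has a peak one third as high and a support three times as wide, so `area` stays at 1 up to the accuracy of the numerical integration.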
The box filter (Figure 10.19) is a piecewise constant function whose integral is equal to one. As a discrete filter, it can be written as
盒式滤波器(图 10.19 )是一个分段常数函数,其积分等于一。作为离散滤波器,它可以写成
$$a_{\text{box},r}[i] = \begin{cases} \dfrac{1}{2r+1}, & |i| \le r, \\ 0, & \text{otherwise.} \end{cases}$$
Figure 10.19. The discrete and continuous box filters.
图 10.19.离散和连续盒式滤波器。
Note that for symmetry, we include both endpoints.
请注意,为了对称,我们包括两个端点。
As a continuous filter, we write
作为连续滤波器,我们写
$$f_{\text{box},r}(x) = \begin{cases} \dfrac{1}{2r}, & -r \le x < r, \\ 0, & \text{otherwise.} \end{cases}$$
In this case, we exclude one endpoint, which makes the box of radius 0.5 usable as a reconstruction filter. It is because the box filter is discontinuous that these boundary cases are important, and so for this particular filter, we need to pay attention to them. We write just $f_{\text{box}}$ for the natural radius of $r = \frac{1}{2}$.
在这种情况下,我们排除了一个端点,这使得半径为 0.5 的盒式滤波器可用作重建滤波器。正因为盒式滤波器是不连续的,这些边界情况才很重要,因此对于这个特定的滤波器,我们需要注意它们。对于自然半径 $r = \frac{1}{2}$,我们只写 $f_{\text{box}}$。
The tent, or linear filter (Figure 10.20), is a continuous, piecewise linear function:
帐篷滤波器或线性滤波器(图 10.20 )是一个连续的分段线性函数:
$$f_{\text{tent}}(x) = \begin{cases} 1 - |x|, & |x| < 1, \\ 0, & \text{otherwise.} \end{cases}$$
Figure 10.20. The tent filter and two scaled versions.
图 10.20.帐篷过滤器和两个缩放版本。
Its natural radius is 1. For filters, such as this one, that are at least $C^0$ (i.e., there are no sudden jumps in the value, as there are with the box), we no longer need to separate the definitions of the discrete and continuous filters: the discrete filter is just the continuous filter sampled at the integers.
它的自然半径是 1。对于像这样的过滤器,如果其至少为C 0 (即,值不会像盒子那样出现突然的跳跃),我们不再需要分离离散过滤器和连续过滤器的定义:离散过滤器就是在整数处采样的连续过滤器。
The Gaussian function (Figure 10.21), also known as the normal distribution, is an important filter theoretically and practically. We’ll see more of its special properties as this chapter goes on:
高斯函数(图 10.21 ),又称正态分布,在理论和实践上都是一个重要的滤波器。本章后面我们将会看到更多它的特殊性质:
$$f_{g,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2 / 2\sigma^2}.$$
Figure 10.21. The Gaussian filter.
图 10.21.高斯滤波器。
The parameter σ is called the standard deviation. The Gaussian makes a good sampling filter because it is very smooth; we’ll make this statement more precise in Section 10.5.
参数 σ 称为标准差。高斯滤波器是一种很好的采样滤波器,因为它非常平滑;我们将在第 10.5 节中更精确地阐述这一表述。
The Gaussian filter does not have any particular natural radius; it is a useful sampling filter for a range of σ. The Gaussian also does not have a finite radius of support, although because of the exponential decay, its values rapidly become small enough to ignore. When necessary, then, we can trim the tails from the function by setting it to zero outside some radius r, resulting in a trimmed Gaussian. This means that the filter’s width and natural radius can vary depending on the application, and a trimmed Gaussian scaled by s is the same as an unscaled trimmed Gaussian with standard deviation sσ and radius sr. The best way to handle this in practice is to let σ and r be set as properties of the filter, fixed when the filter is specified, and then scale the filter just like any other when it is applied.
高斯滤波器没有任何特定的自然半径;它是 σ 范围的有用采样滤波器。高斯也没有有限的支撑半径,尽管由于指数衰减,其值会迅速变得足够小以至可以忽略。然后,在必要时,我们可以通过将函数的尾部设置为某个半径r之外的零来修剪函数的尾部,从而得到修剪高斯。这意味着滤波器的宽度和自然半径可以根据应用而变化,并且按s缩放的修剪高斯与标准差为sσ 、半径为sr 的未缩放修剪高斯相同。在实践中处理此问题的最佳方法是让 σ 和r设置为滤波器的属性,在指定滤波器时固定,然后在应用滤波器时像其他滤波器一样缩放滤波器。
Good starting points are σ = 1 and r = 3.
好的起点是 σ = 1 和r = 3。
Many filters are defined as piecewise polynomials, and cubic filters with four pieces (natural radius of 2) are often used as reconstruction filters. One such filter is known as the B-spline filter (Figure 10.22) because of its origins as a blending function for spline curves (see Chapter 15):
许多滤波器被定义为分段多项式,而具有四段(自然半径为 2)的三次滤波器通常用作重构滤波器。其中一种滤波器被称为 B 样条滤波器(图 10.22 ),因为它起源于样条曲线的混合函数(参见第 15 章):
$$f_B(x) = \frac{1}{6} \begin{cases} -3(1-|x|)^3 + 3(1-|x|)^2 + 3(1-|x|) + 1, & -1 \le x \le 1, \\ (2-|x|)^3, & 1 \le |x| \le 2, \\ 0, & \text{otherwise.} \end{cases}$$
Figure 10.22. The B-spline filter.
图 10.22. B 样条滤波器。
Among piecewise cubics, the B-spline is special because it has continuous first and second derivatives; that is, it is $C^2$. A more concise way of defining this filter is $f_B = f_{\text{box}} \star f_{\text{box}} \star f_{\text{box}} \star f_{\text{box}}$; proving that the longer form above is equivalent is a nice exercise in convolution (see Exercise 3).
在分段三次函数中,B 样条函数比较特殊,因为它具有连续的一阶和二阶导数,即C 2 。定义此滤波器的更简洁的方法是f B = f box * f box * f box * f box ;证明上述较长形式是等效的是一个很好的卷积练习(参见练习 3)。
Another piecewise cubic filter named for a spline, the Catmull–Rom filter (Figure 10.23), has the value zero at x = –2, –1, 1, and 2, which means it will interpolate the samples when used as a reconstruction filter (Section 10.3.2):
\[
f_C(x) = \frac{1}{2}
\begin{cases}
-3(1-|x|)^3 + 4(1-|x|)^2 + (1-|x|), & -1 \le x \le 1,\\
(2-|x|)^3 - (2-|x|)^2, & 1 \le |x| \le 2,\\
0, & \text{otherwise.}
\end{cases}
\]
Figure 10.23. The Catmull–Rom filter.
For the all-important application of resampling images, Mitchell and Netravali (1988) made a study of cubic filters and recommended one partway between the previous two filters as the best all-around choice (Figure 10.24). It is simply a weighted combination of the previous two filters:
\[
f_M(x) = \frac{1}{3} f_B(x) + \frac{2}{3} f_C(x) = \frac{1}{18}
\begin{cases}
-21(1-|x|)^3 + 27(1-|x|)^2 + 9(1-|x|) + 1, & -1 \le x \le 1,\\
7(2-|x|)^3 - 6(2-|x|)^2, & 1 \le |x| \le 2,\\
0, & \text{otherwise.}
\end{cases}
\]
Figure 10.24. The Mitchell–Netravali filter.
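The three piecewise cubic filters can be written down directly. The sketch below assumes the standard piecewise forms of the B-spline and Catmull–Rom filters (in the variable 1 − |x| on the inner pieces and 2 − |x| on the outer pieces) and the 1/3 : 2/3 weighting for the Mitchell–Netravali filter; the function names are hypothetical.

```python
def f_bspline(x):
    """Cubic B-spline filter, natural radius 2."""
    x = abs(x)
    if x <= 1.0:
        t = 1.0 - x
        return (-3.0 * t**3 + 3.0 * t**2 + 3.0 * t + 1.0) / 6.0
    if x <= 2.0:
        u = 2.0 - x
        return u**3 / 6.0
    return 0.0

def f_catmull_rom(x):
    """Catmull-Rom filter, natural radius 2; zero at x = -2, -1, 1, 2."""
    x = abs(x)
    if x <= 1.0:
        t = 1.0 - x
        return (-3.0 * t**3 + 4.0 * t**2 + t) / 2.0
    if x <= 2.0:
        u = 2.0 - x
        return (u**3 - u**2) / 2.0
    return 0.0

def f_mitchell(x):
    """Mitchell-Netravali filter as a weighted combination of the two above."""
    return f_bspline(x) / 3.0 + 2.0 * f_catmull_rom(x) / 3.0
```

Evaluating `f_catmull_rom` at the integers shows why it interpolates (1 at zero, 0 at every other integer), while `f_bspline` does not, since its central value is below 1.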
Filters have some traditional terminology that goes with them, which we use to describe the filters and compare them to one another.
The impulse response of a filter is just another name for the function: it is the response of the filter to a signal that just contains an impulse (and recall that convolving with an impulse just gives back the filter).
A continuous filter is interpolating if, when it is used to reconstruct a continuous function from a discrete sequence, the resulting function takes on exactly the values of the samples at the sample points; that is, it “connects the dots” rather than producing a function that only goes near the dots. Interpolating filters are exactly those filters f for which f(0) = 1 and f(i) = 0 for all nonzero integers i (Figure 10.25).
Figure 10.25. An interpolating filter reconstructs the sample points exactly because it has the value zero at all nonzero integer offsets from the center.
A filter that takes on negative values has ringing or overshoot: it will produce extra oscillations in the value around sharp changes in the value of the function being filtered.
For instance, the Catmull–Rom filter has negative lobes on either side, and if you filter a step function with it, it will exaggerate the step a bit, resulting in function values that undershoot 0 and overshoot 1 (Figure 10.26).
Figure 10.26. A filter with negative lobes will always produce some overshoot when filtering or reconstructing a sharp discontinuity.
A continuous filter is ripple free if, when used as a reconstruction filter, it will reconstruct a constant sequence as a constant function (Figure 10.27). This is equivalent to the requirement that the filter sum to one on any integer-spaced grid:
\[
\sum_i f(x + i) = 1 \quad \text{for all } x.
\]
Figure 10.27. The tent filter of radius 1 is a ripple-free reconstruction filter; the Gaussian filter with standard deviation 1/2 is not.
All the filters in Section 10.3.1 are ripple-free at their natural radii, except the Gaussian, but none of them are necessarily ripple-free when they are used at a non-integer scale. If it is necessary to eliminate ripple in discrete-continuous convolution, it is easy to do so: divide each computed sample by the sum of the weights used to compute it:
\[
(a \,\bar{\star}\, f)(x) = \frac{\sum_i a[i]\, f(x - i)}{\sum_i f(x - i)}.
\]
This expression can still be interpreted as convolution between a and a filter $\bar{f}$ (see Exercise 6).
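The effect of dividing by the sum of the weights can be checked numerically. The following is a sketch under stated assumptions: `weights`, `reconstruct`, and `gaussian_half` are hypothetical names, the Gaussian uses standard deviation 1/2 as in Figure 10.27, and x is assumed to lie far enough from the ends of the sequence that every index is valid.

```python
import math

def gaussian_half(x):
    """Gaussian with standard deviation 1/2 (not ripple-free at this width)."""
    sigma = 0.5
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def weights(f, x, r):
    """Pairs (i, f(x - i)) for every integer i within radius r of x."""
    return [(i, f(x - i)) for i in range(math.ceil(x - r), math.floor(x + r) + 1)]

def reconstruct(a, f, x, r, normalize=False):
    """Discrete-continuous convolution at x, optionally divided by the weight sum.

    Assumes x is at least r away from both ends of a, so no index is out of range.
    """
    ws = weights(f, x, r)
    total = sum(a[i] * w for i, w in ws)
    if normalize:
        total /= sum(w for _, w in ws)
    return total

constant = [1.0] * 20
plain_on_sample = reconstruct(constant, gaussian_half, 10.0, 3.0)
plain_midway = reconstruct(constant, gaussian_half, 10.5, 3.0)
fixed_midway = reconstruct(constant, gaussian_half, 10.5, 3.0, normalize=True)
```

Without normalization the reconstructed value of a constant sequence differs between a sample point and a midpoint, which is the ripple; with normalization it is constant.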
A continuous filter has a degree of continuity, which is the highest-order derivative that is defined everywhere. A filter, like the box filter, that has sudden jumps in its value is not continuous at all. A filter that is continuous but has sharp corners (discontinuities in the first derivative), such as the tent filter, has order of continuity zero, and we say it is $C^0$. A filter that has a continuous derivative (no sharp corners), such as the piecewise cubic filters in the previous section, is $C^1$; if its second derivative is also continuous, as is true of the B-spline filter, it is $C^2$. The order of continuity of a filter is particularly important for a reconstruction filter because the reconstructed function inherits the continuity of the filter.
So far we have only discussed filters for 1D convolution, but for images and other multidimensional signals, we need filters too. In general, any 2D function could be a 2D filter, and occasionally it is useful to define them this way. But, in most cases, we can build suitable 2D (or higher-dimensional) filters from the 1D filters we have already seen.
The most useful way of doing this is by using a separable filter. The value of a separable filter f2(x, y) at a particular x and y is simply the product of f1 (the 1D filter) evaluated at x and at y:
\[
f_2(x, y) = f_1(x)\, f_1(y).
\]
Similarly, for discrete filters,
\[
a_2[i, j] = a_1[i]\, a_1[j].
\]
Any horizontal or vertical slice through f2 is a scaled copy of f1. The integral of f2 is the square of the integral of f1, so in particular, if f1 is normalized, then so is f2.
Example 22 (The separable tent filter)
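If we choose the tent function for f1, the resulting piecewise bilinear function (Figure 10.28) is
\[
f_{2,\text{tent}}(x, y) = f_{\text{tent}}(x)\, f_{\text{tent}}(y) =
\begin{cases}
(1-|x|)(1-|y|), & |x| < 1 \text{ and } |y| < 1,\\
0, & \text{otherwise.}
\end{cases}
\]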
Figure 10.28. The separable 2D tent filter.
The profiles along the coordinate axes are tent functions, but the profiles along the diagonals are quadratics (for instance, along the line x = y in the positive quadrant, we see the quadratic function $(1 - x)^2$).
Example 23 (The 2D Gaussian filter)
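If we choose the Gaussian function for f1, the resulting 2D function (Figure 10.29) is
\[
f_{2,g}(x, y) = \frac{1}{2\pi}\, e^{-x^2/2}\, e^{-y^2/2} = \frac{1}{2\pi}\, e^{-(x^2+y^2)/2} = \frac{1}{2\pi}\, e^{-r^2/2}, \quad r^2 = x^2 + y^2.
\]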
Figure 10.29. The 2D Gaussian filter, which is both separable and radially symmetric.
Notice that this is (up to a scale factor) the same function we would get if we revolved the 1D Gaussian around the origin to produce a circularly symmetric function. The property of being both circularly symmetric and separable at the same time is unique to the Gaussian function. The profiles along the coordinate axes are Gaussians, but so are the profiles along any direction at any offset from the center.
The key advantage of separable filters over other 2D filters has to do with efficiency in implementation. Let’s substitute the separable definition, applied to a filter b2[i, j] = b1[i] b1[j] of radius r, into the definition of discrete convolution:
\[
(a \star b_2)[i, j] = \sum_{i'=i-r}^{i+r} \sum_{j'=j-r}^{j+r} a[i', j']\, b_1[i - i']\, b_1[j - j'].
\]
Note that b1[i–i′] does not depend on j′ and can be factored out of the inner sum:
\[
(a \star b_2)[i, j] = \sum_{i'=i-r}^{i+r} b_1[i - i'] \sum_{j'=j-r}^{j+r} a[i', j']\, b_1[j - j'].
\]
Let’s abbreviate the inner sum as S[i′]:
\[
S[i'] = \sum_{j'=j-r}^{j+r} a[i', j']\, b_1[j - j'], \qquad (a \star b_2)[i, j] = \sum_{i'=i-r}^{i+r} b_1[i - i']\, S[i']. \tag{10.5}
\]
With the equation in this form, we can first compute and store S[i′] for each value of i′ and then compute the outer sum using these stored values. At first glance, this does not seem remarkable, since we still had to do work proportional to $(2r+1)^2$ to compute all the inner sums. However, it’s quite different if we want to compute the value at many points [i, j].
Suppose we need to compute a ⋆ b2 at [2, 2] and [3, 2], and b1 has a radius of 2. Examining Equation 10.5, we can see that we will need S[0],...,S[4] to compute the result at [2, 2], and we will need S[1],...,S[5] to compute the result at [3, 2]. So, in the separable formulation, we can just compute all six values of S and share S[1],...,S[4] (Figure 10.30).
Figure 10.30. Computing two output points using separate 2D arrays of 25 samples (a) vs. filtering once along the columns and then using separate 1D arrays of five samples (b).
This savings has great significance for large filters. Filtering an image with a filter of radius r in the general case requires computation of $(2r+1)^2$ products per pixel, while filtering the image with a separable filter of the same size requires $2(2r+1)$ products (at the expense of some intermediate storage). This change in asymptotic complexity from $O(r^2)$ to $O(r)$ enables the use of much larger filters.
The algorithm is
function filterImage(image I, filter b)
    r = b.radius
    nx = I.width
    ny = I.height
    allocate storage array S[0 ... nx – 1]
    allocate image Iout[r ... nx – r – 1, r ... ny – r – 1]
    initialize S and Iout to all zero
    for j = r to ny – r – 1 do
        for i′ = 0 to nx – 1 do
            S[i′] = 0
            for j′ = j – r to j + r do
                S[i′] = S[i′] + I[i′, j′] b[j – j′]
        for i = r to nx – r – 1 do
            for i′ = i – r to i + r do
                Iout[i, j] = Iout[i, j] + S[i′] b[i – i′]
    return Iout
For simplicity, this function avoids all questions of boundaries by trimming r pixels off all four sides of the output image. In practice, there are various ways to handle the boundaries; see Section 10.4.3.
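As a concrete illustration of the separable algorithm, here is a minimal Python sketch. The representation is an assumption made for the example: the image is a list of rows indexed as `I[j][i]`, the 1D filter is a list `b` of 2r + 1 taps with the tap for offset k stored at `b[k + r]`, and the output is trimmed by r pixels on every side, as in the pseudocode. The name `filter_image` is hypothetical.

```python
def filter_image(I, b, r):
    """Separable 2D filtering; returns an image trimmed by r pixels on each side."""
    ny = len(I)
    nx = len(I[0])
    out = [[0.0] * (nx - 2 * r) for _ in range(ny - 2 * r)]
    for j in range(r, ny - r):
        # Vertical pass: S[i'] = sum over j' of I[i', j'] * b[j - j'].
        S = [0.0] * nx
        for ip in range(nx):
            for jp in range(j - r, j + r + 1):
                S[ip] += I[jp][ip] * b[(j - jp) + r]
        # Horizontal pass: reuse the stored column sums for every output pixel in row j.
        for i in range(r, nx - r):
            acc = 0.0
            for ip in range(i - r, i + r + 1):
                acc += S[ip] * b[(i - ip) + r]
            out[j - r][i - r] = acc
    return out
```

Each row of column sums `S` is computed once and shared by all output pixels in that row, which is where the reduction from (2r + 1)² to 2(2r + 1) products per pixel comes from.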
We have discussed sampling, filtering, and reconstruction in the abstract so far, using mostly 1D signals for examples. But as we observed at the beginning of this chapter, the most important and most common application of signal processing in graphics is for sampled images. Let us look carefully at how all this applies to images.
Perhaps the simplest application of convolution is processing images using discrete convolution. Some of the most widely used features of image manipulation programs are simple convolution filters. Blurring of images can be achieved by convolving with many common low-pass filters, ranging from the box to the Gaussian (Figure 10.31). A Gaussian filter creates a very smooth-looking blur and is commonly used for this purpose.
Figure 10.31. Blurring an image by convolution with each of three different filters.
The opposite of blurring is sharpening, and one way to do this is by using the “unsharp mask” procedure: subtract a fraction α of a blurred image from the original. With a rescaling to avoid changing the overall brightness, we have
\[
I_{\text{sharp}} = (1 + \alpha)\, I - \alpha\, (f_{g,\sigma} \star I) = \big((1 + \alpha)\, d - \alpha\, f_{g,\sigma}\big) \star I = f_{\text{sharp}}(\sigma, \alpha) \star I,
\]
where $f_{g,\sigma}$ is the Gaussian filter of width σ. Using the discrete impulse d and the distributive property of convolution, we are able to write this whole process as a single filter that depends on both the width of the blur and the degree of sharpening (Figure 10.32).
Figure 10.32. Sharpening an image using a convolution filter.
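The single sharpening filter can be built explicitly. The sketch below works in 1D for brevity and assumes the filter has the form (1 + α) d − α g, where g is a discrete Gaussian normalized to sum to 1 and d is the discrete impulse; `discrete_gaussian` and `sharpen_filter` are hypothetical names.

```python
import math

def discrete_gaussian(sigma, r):
    """Gaussian taps at offsets -r..r, normalized so that they sum to 1."""
    taps = [math.exp(-k * k / (2.0 * sigma * sigma)) for k in range(-r, r + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def sharpen_filter(sigma, alpha, r):
    """Taps of (1 + alpha) * d - alpha * gaussian, with d the discrete impulse."""
    f = [-alpha * t for t in discrete_gaussian(sigma, r)]
    f[r] += 1.0 + alpha  # the impulse contributes only at offset 0
    return f

taps = sharpen_filter(sigma=1.0, alpha=0.5, r=3)
```

Because the Gaussian taps sum to 1, the sharpening taps also sum to 1, which is the rescaling that keeps overall brightness unchanged; the negative side taps are what produce the overshoot at edges.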
Another example of combining two discrete filters is a drop shadow. It’s common to take a blurred, shifted copy of an object’s outline to create a soft drop shadow (Figure 10.33). We can express the shifting operation as convolution with an off-center impulse:
\[
d_{m,n}(i, j) =
\begin{cases}
1, & i = m \text{ and } j = n,\\
0, & \text{otherwise.}
\end{cases}
\]
Shifting, then blurring, is achieved by convolving with both filters:
\[
I_{\text{shadow}} = (f_{g,\sigma} \star d_{m,n}) \star I = f_{\text{shadow}}(m, n, \sigma) \star I.
\]
Here, we have used associativity to group the two operations into a single filter with three parameters.
In image synthesis, we often have the task of producing a sampled representation of an image for which we have a continuous mathematical formula (or at least a procedure we can use to compute the color at any point, not just at integer pixel positions). Ray tracing is a common example; more about ray tracing and the specific methods for antialiasing is in Chapter 4. In the language of signal processing, we have a continuous 2D signal (the image) that we need to sample on a regular 2D lattice. If we go ahead and sample the image without any special measures, the result will exhibit various aliasing artifacts (Figure 10.34). At sharp edges in the image, we see stair-step artifacts known as “jaggies.” In areas where there are repeating patterns, we see wide bands known as moiré patterns.
Figure 10.34. Two artifacts of aliasing in images: moiré patterns in periodic textures (a), and “jaggies” on straight lines (b).
The problem here is that the image contains too many small-scale features; we need to smooth it out by filtering it before sampling. Looking back at the definition of continuous convolution in Equation (10.3), we need to average the image over an area around the pixel location, rather than just taking the value at a single point. The specific methods for doing this are discussed in Chapter 4. A simple filter like a box will improve the appearance of sharp edges, but it still produces some moiré patterns (Figure 10.35). The Gaussian filter, which is very smooth, is much more effective against the moiré patterns, at the expense of overall somewhat more blurring. These two examples illustrate the tradeoff between sharpness and aliasing that is fundamental to choosing antialiasing filters.
Figure 10.35. A comparison of three different sampling filters being used to antialias a difficult test image that contains circles that are spaced closer and closer as they get larger.
One of the most common image operations where careful filtering is crucial is resampling—changing the sample rate, or changing the image size.
Suppose we have taken an image with a digital camera that is 3000 by 2000 pixels in size, and we want to display it on a monitor that has only 1280 by 1024 pixels. In order to make it fit, while maintaining the 3:2 aspect ratio, we need to resample it to 1278 by 852 pixels. How should we go about this?
One way to approach this problem is to think of the process as dropping pixels: the size ratio is between 2 and 3, so we’ll have to drop out one or two pixels between pixels that we keep. It’s possible to shrink an image in this way, but the quality of the result is low—the images in Figure 10.34 were made using pixel dropping. Pixel dropping is very fast, however, and it is a reasonable choice to make a preview of the resized image during an interactive manipulation.
The way to think about resizing images is as a resampling operation: we want a set of samples of the image on a particular grid that is defined by the new image dimensions, and we get them by sampling a continuous function that is reconstructed from the input samples (Figure 10.36). Looking at it this way, it’s just a sequence of standard image processing operations: first, we reconstruct a continuous function from the input samples, and then, we sample that function just as we would sample any other continuous image. To avoid aliasing artifacts, appropriate filters need to be used at each stage.
Figure 10.36. Resampling an image consists of two logical steps that are combined into a single operation in code. First, we use a reconstruction filter to define a smooth, continuous function from the input samples. Then, we sample that function on a new grid to get the output samples.
A small example is shown in Figure 10.37: if the original image is 12 × 9 pixels and the new one is 8 × 6 pixels, there are 2/3 as many output pixels as input pixels in each dimension, so their spacing across the image is 3/2 the spacing of the original samples.
Figure 10.37. The sample locations for the input and output grids in resampling a 12 by 9 image to make an 8 by 6 one.
In order to come up with a value for each of the output samples, we need to somehow compute values for the image in between the samples. The pixel-dropping algorithm gives us one way to do this: just take the value of the closest sample in the input image and make that the output value. This is exactly equivalent to reconstructing the image with a 1-pixel-wide (radius one-half) box filter and then point sampling.
Of course, if the main reason for choosing pixel dropping or other very simple filtering is performance, one would never implement that method as a special case of the general reconstruction-and-resampling procedure. In fact, because of the discontinuities, it’s difficult to make box filters work in a general framework. But, for high-quality resampling, the reconstruction/sampling framework provides valuable flexibility.
To work out the algorithmic details, it’s simplest to drop down to 1D and discuss resampling a sequence. The simplest way to write an implementation is in terms of the reconstruct function we defined in Section 10.2.5.
function resample(sequence a, float x0, float Δx, int n, filter f)
    create sequence b of length n
    for i = 0 to n – 1 do
        b[i] = reconstruct(a, f, x0 + iΔx)
    return b
The parameter x0 gives the position of the first sample of the new sequence in terms of the samples of the old sequence. That is, if the first output sample falls midway between samples 3 and 4 in the input sequence, x0 is 3.5.
This procedure reconstructs a continuous image by convolving the input sequence with a continuous filter and then point samples it. That’s not to say that these two operations happen sequentially—the continuous function exists only in principle, and its values are computed only at the sample points. But mathematically, this function computes a set of point samples of the function a * f.
This point sampling seems wrong, though, because we just finished saying that a signal should be sampled with an appropriate smoothing filter to avoid aliasing. We should be convolving the reconstructed function with a sampling filter g and point sampling g * (f * a). But since this is the same as (g * f) * a, we can roll the sampling filter together with the reconstruction filter; one convolution operation is all we need (Figure 10.38). This combined reconstruction and sampling filter is known as a resampling filter.
Figure 10.38. Resampling involves filtering for reconstruction and for sampling. Since two convolution filters applied in sequence can be replaced with a single filter, we only need one resampling filter, which serves the roles of reconstruction and sampling.
When resampling images, we usually specify a source rectangle in the units of the old image that specifies the part we want to keep in the new image. For example, using the pixel sample positioning convention from Chapter 3, the rectangle we’d use to resample the entire image is $(-0.5,\ n_x^{\text{old}} - 0.5) \times (-0.5,\ n_y^{\text{old}} - 0.5)$. Given a source rectangle $(x_l, x_h) \times (y_l, y_h)$, the sample spacing for the new image is $\Delta x = (x_h - x_l)/n_x^{\text{new}}$ in x and $\Delta y = (y_h - y_l)/n_y^{\text{new}}$ in y. The lower-left sample is positioned at $(x_l + \Delta x/2,\ y_l + \Delta y/2)$.
Modifying the 1D pseudocode to use this convention and expanding the call to the reconstruct function into the double loop that is implied, we arrive at
function resample(sequence a, float xl, float xh, int n, filter f)
    create sequence b of length n
    r = f.radius
    Δx = (xh – xl)/n
    x0 = xl + Δx/2
    for i = 0 to n – 1 do
        s = 0
        x = x0 + iΔx
        for j = ⌈x – r⌉ to ⌊x + r⌋ do
            s = s + a[j] f(x – j)
        b[i] = s
    return b
This routine contains all the basics of resampling an image. One last issue that remains to be addressed is what to do at the edges of the image, where the simple version here will access beyond the bounds of the input sequence. There are several things we might do:
Just stop the loop at the ends of the sequence. This is equivalent to padding the image with zeros on all sides.
Clip all array accesses to the end of the sequence—that is, return a[0] when we would want to access a[–1]. This is equivalent to padding the edges of the image by extending the last row or column.
Modify the filter as we approach the edge so that it does not extend beyond the bounds of the sequence.
The first option leads to dim edges when we resample the whole image, which is not really satisfactory. The second option is easy to implement; the third is probably the best performing. The simplest way to modify the filter near the edge of the image is to renormalize it: divide the filter by the sum of the part of the filter that falls within the image. This way, the filter always adds up to 1 over the actual image samples, so it preserves image intensity. For performance, it is desirable to handle the band of pixels within a filter radius of the edge (which require this renormalization) separately from the center (which contains many more pixels and does not require renormalization).
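The 1D resampling routine with the third edge-handling option (renormalizing the filter over the samples that actually exist) might look like the sketch below. It is an illustration under assumptions: the filter is passed as a Python function `f` together with its radius `r`, and for simplicity every output sample is renormalized, not only those near the edges, so the performance split described above is not implemented. The names `resample` and `tent` are hypothetical.

```python
import math

def tent(x):
    """Tent filter of radius 1."""
    return max(0.0, 1.0 - abs(x))

def resample(a, xl, xh, n, f, r):
    """Resample sequence a over the source interval (xl, xh) into n output samples."""
    dx = (xh - xl) / n
    x0 = xl + dx / 2.0
    b = []
    for i in range(n):
        x = x0 + i * dx
        # Clamp the filter support to the valid index range of a.
        lo = max(0, math.ceil(x - r))
        hi = min(len(a) - 1, math.floor(x + r))
        s = 0.0
        w_sum = 0.0
        for j in range(lo, hi + 1):
            w = f(x - j)
            s += a[j] * w
            w_sum += w
        # Dividing by the weights inside the sequence preserves intensity at the edges.
        b.append(s / w_sum if w_sum != 0.0 else 0.0)
    return b

halved = resample([2.0] * 8, -0.5, 7.5, 4, lambda x: tent(x / 2.0) / 2.0, 2.0)
```

Here `halved` resamples a constant sequence of 8 samples to 4 with a tent filter scaled by 2; because of the renormalization, the edge samples are not dimmed.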
The choice of filter for resampling is important. There are two separate issues: the shape of the filter and the size (radius). Because the filter serves both as a reconstruction filter and a sampling filter, the requirements of both roles affect the choice of filter. For reconstruction, we would like a filter smooth enough to avoid aliasing artifacts when we enlarge the image, and the filter should be ripple-free. For sampling, the filter should be large enough to avoid undersampling and smooth enough to avoid moiré artifacts. Figure 10.39 illustrates these two different needs.
Figure 10.39. The effects of using different sizes of a filter for upsampling (enlarging) or down-sampling (reducing) an image.
Generally, we will choose one filter shape and scale it according to the relative resolutions of the input and output. The lower of the two resolutions determines the size of the filter: when the output is more coarsely sampled than the input (downsampling, or shrinking the image), the smoothing required for proper sampling is greater than the smoothing required for reconstruction, so we size the filter according to the output sample spacing (radius 3 in Figure 10.39). On the other hand, when the output is more finely sampled (upsampling, or enlarging the image), the smoothing required for reconstruction dominates (the reconstructed function is already smooth enough to sample at a higher rate than it started), so the size of the filter is determined by the input sample spacing (radius 1 in Figure 10.39).
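The sizing rule can be stated in a few lines. In this sketch, `dx` is assumed to be the output sample spacing measured in input samples (greater than 1 when downsampling, less than 1 when upsampling), and scaling is assumed to follow the convention f_s(x) = f(x/s)/s; the helper name `resampling_filter` is hypothetical.

```python
def resampling_filter(f, natural_radius, dx):
    """Return (scaled filter, scaled radius) for output spacing dx in input-sample units."""
    # The coarser of the two grids sets the filter size: the input grid has
    # spacing 1, the output grid has spacing dx.
    s = max(1.0, dx)
    return (lambda x: f(x / s) / s), natural_radius * s

def tent(x):
    """Tent filter of natural radius 1."""
    return max(0.0, 1.0 - abs(x))

down_f, down_r = resampling_filter(tent, 1.0, 3.0)   # shrinking by a factor of 3
up_f, up_r = resampling_filter(tent, 1.0, 0.25)      # enlarging by a factor of 4
```

When shrinking, the filter widens with the output spacing; when enlarging, it stays at its natural size relative to the input samples.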
Choosing the filter itself is a tradeoff between speed and quality. Common choices are the box filter (when speed is paramount), the tent filter (moderate quality), or a piecewise cubic (excellent quality). In the piecewise cubic case, the degree of smoothing can be adjusted by interpolating between fB and fC; the Mitchell–Netravali filter is a good choice.
Just as with image filtering, separable filters can provide a significant speedup. The basic idea is to resample all the rows first, producing an image with changed width but not height, then to resample the columns of that image to produce the final result (Figure 10.40). Modifying the pseudocode given earlier so that it takes advantage of this optimization is reasonably straightforward.
Figure 10.40. Resampling an image using a separable approach.
If you are only interested in implementation, you can stop reading here; the algorithms and recommendations in the previous sections will let you implement programs that perform sampling and reconstruction and achieve excellent results. However, there is a deeper mathematical theory of sampling with a history reaching back to the first uses of sampled representations in telecommunications. Sampling theory answers many questions that are difficult to answer with reasoning based strictly on scale arguments.
But most important, sampling theory gives valuable insight into the workings of sampling and reconstruction. It gives the student who learns it an extra set of intellectual tools for reasoning about how to achieve the best results with the most efficient code.
The Fourier transform, along with convolution, is the main mathematical concept that underlies sampling theory. You can read about the Fourier transform in many math books on analysis, as well as in books on signal processing.
The basic idea behind the Fourier transform is to express any function by adding together sine waves (sinusoids) of all frequencies. By using the appropriate weights for the different frequencies, we can arrange for the sinusoids to add up to any (reasonable) function we want.
As an example, the square wave in Figure 10.41 can be expressed by a sequence of sine waves:
\[
\sum_{n = 1, 3, 5, \ldots}^{\infty} \frac{4}{\pi n} \sin 2\pi n x.
\]
Figure 10.41. Approximating a square wave with finite sums of sines.
This Fourier series starts with a sine wave (sin 2πx) that has frequency 1.0—same as the square wave—and the remaining terms add smaller and smaller corrections to reduce the ripples and, in the limit, reproduce the square wave exactly. Note that all the terms in the sum have frequencies that are integer multiples of the frequency of the square wave. This is because other frequencies would produce results that don’t have the same period as the square wave.
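The convergence of the partial sums can be observed numerically. This sketch assumes the series has terms (4/(πn)) sin(2πnx) over odd n, which describes a square wave of period 1 that equals +1 on (0, 1/2) and −1 on (1/2, 1); the function name is hypothetical.

```python
import math

def square_wave_partial_sum(x, n_max):
    """Sum of (4 / (pi n)) * sin(2 pi n x) over odd n up to n_max."""
    return sum(4.0 / (math.pi * n) * math.sin(2.0 * math.pi * n * x)
               for n in range(1, n_max + 1, 2))

first_term_only = square_wave_partial_sum(0.25, 1)     # just (4 / pi) sin(2 pi x)
many_terms = square_wave_partial_sum(0.25, 2001)       # close to the square wave value
```

With a single term the value at x = 0.25 overshoots the square wave; adding higher odd multiples of the base frequency brings it toward 1.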
A surprising fact is that a signal does not have to be periodic in order to be expressed as a sum of sinusoids in this way: a non-periodic signal just requires more sinusoids. Rather than summing over a discrete sequence of sinusoids, we will instead integrate over a continuous family of sinusoids. For instance, a box function can be written as the integral of a family of cosine waves:
\[
\int_{-\infty}^{\infty} \frac{\sin \pi u}{\pi u} \cos 2\pi u x \; du. \tag{10.6}
\]
This integral in Equation (10.6) is adding up infinitely many cosines, weighting the cosine of frequency u by the weight (sin πu)/πu. The result, as we include higher and higher frequencies, converges to the box function (see Figure 10.42). When a function f is expressed in this way, this weight, which is a function of the frequency u, is called the Fourier transform of f, denoted $\hat{f}$. The function $\hat{f}$ tells us how to build f by integrating over a family of sinusoids:
\[
f(x) = \int_{-\infty}^{\infty} \hat{f}(u)\, e^{2\pi i u x} \; du. \tag{10.7}
\]
Figure 10.42. Approximating a box function with integrals of cosines up to each of four cutoff frequencies.
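The convergence toward the box function can be checked by truncating the integral at a cutoff frequency, in the spirit of Figure 10.42. The sketch below uses a plain midpoint rule, which is an arbitrary choice for illustration; it integrates the weight (sin πu)/(πu) times cos(2πux) over u in [−cutoff, cutoff]. The function name and the step count are assumptions.

```python
import math

def box_from_cosines(x, cutoff, steps=20000):
    """Midpoint-rule estimate of the cosine integral truncated to |u| <= cutoff."""
    du = 2.0 * cutoff / steps
    total = 0.0
    for k in range(steps):
        u = -cutoff + (k + 0.5) * du
        # The weight tends to 1 as u approaches 0; guard the removable singularity.
        weight = 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)
        total += weight * math.cos(2.0 * math.pi * u * x) * du
    return total

inside = box_from_cosines(0.25, 50.0)   # x inside the box of radius 1/2
outside = box_from_cosines(1.0, 50.0)   # x outside the box
```

With a finite cutoff the values are only near 1 inside the box and near 0 outside; the residual error is the ripple visible in the figure and shrinks as the cutoff grows.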
Equation (10.7) is known as the inverse Fourier transform (IFT) because it starts with the Fourier transform of f and ends up with f.
Note that in Equation (10.7), the complex exponential $e^{2\pi i u x}$ has been substituted for the cosine in the previous equation. Also, $\hat{f}$ is a complex-valued function. The machinery of complex numbers is required to allow the phase, as well as the frequency, of the sinusoids to be controlled; this is necessary to represent any functions that are not symmetric across zero. The magnitude of $\hat{f}$ is known as the Fourier spectrum, and, for our purposes, this is sufficient: we won’t need to worry about phase or use any complex numbers directly.
请注意,在方程 (10.7) 中,复指数e 2 π iux已取代前一个方程中的余弦。另外, f ^是一个复值函数。复数机制需要允许控制正弦波的相位和频率;这对于表示任何不对称于零的函数都是必要的。 f ^被称为傅里叶谱,对于我们的目的来说,这已经足够了——我们不需要担心相位或直接使用任何复数。
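It turns out that computing $\hat{f}$ from f looks very much like computing f from $\hat{f}$: $\hat{f}(u) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i u x}\, dx$ (10.8)
事实证明,由 f 计算 $\hat{f}$ 看起来很像由 $\hat{f}$ 计算 f: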
Equation (10.8) is known as the (forward) Fourier transform (FT). The sign in the exponential is the only difference between the forward and inverse Fourier transforms, and it is really just a technical detail. For our purposes, we can think of the FT and IFT as the same operation.
方程 (10.8) 称为(正向)傅里叶变换(FT)。指数中的符号是正向和逆傅里叶变换之间的唯一区别,这实际上只是一个技术细节。就我们的目的而言,我们可以将 FT 和 IFT 视为相同的操作。
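Sometimes the $f$–$\hat{f}$ notation is inconvenient; in that case, we denote the Fourier transform of f by $\mathcal{F}\{f\}$ and the inverse Fourier transform of $\hat{f}$ by $\mathcal{F}^{-1}\{\hat{f}\}$.
有时,$f$–$\hat{f}$ 符号不方便,这时我们将 f 的傅里叶变换记为 $\mathcal{F}\{f\}$,将 $\hat{f}$ 的逆傅里叶变换记为 $\mathcal{F}^{-1}\{\hat{f}\}$。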
A function and its Fourier transform are related in many useful ways. A few facts (most of them easy to verify) that we will use later in this chapter are
函数和其傅里叶变换在许多有用的方面相关。我们将在本章后面使用的一些事实(其中大多数很容易验证)是
A function and its Fourier transform have the same squared integral: $\int (f(x))^2\, dx = \int (\hat{f}(u))^2\, du$.
一个函数和它的傅里叶变换具有相同的平方积分:
The physical interpretation is that the two have the same energy (Figure 10.43).
物理解释是两者具有相同的能量(图10.43 )。
Figure 10.43. The Fourier transform preserves the squared integral of the signal.
图 10.43.傅里叶变换保留了信号的平方积分。
In particular, scaling a function up by a also scales its Fourier transform by a. That is, $\mathcal{F}\{af\} = a\mathcal{F}\{f\}$.
具体来说,将函数放大 a 倍也会将其傅里叶变换放大 a 倍。也就是说,$\mathcal{F}\{af\} = a\mathcal{F}\{f\}$。
Stretching a function along the x-axis squashes its Fourier transform along the u-axis by the same factor (Figure 10.44): $\mathcal{F}\{f(x/b)\} = b\hat{f}(bu)$.
沿x轴拉伸函数会以相同的倍数挤压其沿u轴的傅里叶变换(图 10.44 ):
Figure 10.44. Scaling a signal along the x-axis in the space domain causes an inverse scale along the u-axis in the frequency domain.
图 10.44.在空间域中沿x轴缩放信号会导致在频域中沿u轴进行反向缩放。
(The renormalization by b is needed to keep the energy the same.)
(需要通过b进行重正化才能保持能量相同。)
This means that if we are interested in a family of functions of different width and height (say all box functions centered at zero), then we only need to know the Fourier transform of one canonical function (say the box function with width and height equal to one), and we can easily know the Fourier transforms of all the scaled and dilated versions of that function. For example, we can instantly generalize Equation (10.6) to give the Fourier transform of a box of width b and height a: $ab\,\frac{\sin \pi b u}{\pi b u}$.
这意味着,如果我们对宽度和高度不同的函数系列感兴趣(比如所有以零为中心的盒子函数),那么我们只需要知道一个标准函数的傅里叶变换(比如宽度和高度等于一的盒子函数),我们就可以很容易地知道该函数所有缩放和扩张版本的傅里叶变换。例如,我们可以立即推广公式 (10.6) 以给出宽度为b和高度为 a的盒子的傅里叶变换:
The average value of f is equal to $\hat{f}(0)$. This makes sense since $\hat{f}(0)$ is supposed to be the zero-frequency component of the signal (the DC component if we are thinking of an electrical voltage).
f的平均值等于f ^ (0)。这是有道理的,因为f ^ (0) 被认为是信号的零频率分量(如果我们考虑电压,则是直流分量)。
If f is real (which it always is for us), $\hat{f}$ is an even function; that is, $\hat{f}(u) = \hat{f}(-u)$. Likewise, if f is an even function, then $\hat{f}$ will be real (this is not usually the case in our domain, but remember that we really are only going to care about the magnitude of $\hat{f}$).
如果 f 是实函数(对我们来说它总是实函数),$\hat{f}$ 是偶函数,即 $\hat{f}(u) = \hat{f}(-u)$。同样,如果 f 是偶函数,则 $\hat{f}$ 将是实函数(在我们的领域中通常不是这种情况,但请记住,我们真正关心的只是 $\hat{f}$ 的幅值)。
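Several of the facts listed above are easy to verify numerically. The sketch below is one way to do so under stated assumptions: it uses the Gaussian transform pair given later in this section (unit standard deviation in space, amplitude 1 in frequency) for the energy check, and the scaled-box transform $ab \sin(\pi b u)/(\pi b u)$ for the dilation check. All function names are hypothetical.

```python
import math

def integrate(fn, lo, hi, steps=20000):
    """Midpoint-rule integral of fn over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(fn(lo + (k + 0.5) * h) for k in range(steps))

def gaussian(x):
    """Unit-area Gaussian with standard deviation 1."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def gaussian_ft(u):
    """Assumed transform of the Gaussian above: exp(-(2 pi u)^2 / 2)."""
    return math.exp(-((2.0 * math.pi * u) ** 2) / 2.0)

def box_ft(u, a=1.0, b=1.0):
    """Closed-form transform of a box of height a and width b centered at zero."""
    t = math.pi * b * u
    return a * b if t == 0.0 else a * b * math.sin(t) / t

def numeric_box_ft(u, a, b):
    """Direct integral of a * cos(2 pi u x) over the box support [-b/2, b/2]."""
    return integrate(lambda x: a * math.cos(2.0 * math.pi * u * x), -b / 2.0, b / 2.0)

# Energy (squared integral) in space and in frequency.
energy_space = integrate(lambda x: gaussian(x) ** 2, -10.0, 10.0)
energy_freq = integrate(lambda u: gaussian_ft(u) ** 2, -3.0, 3.0)
print(energy_space, energy_freq)

# Scaled and dilated box: numeric transform versus closed form.
print(numeric_box_ft(0.3, 2.0, 1.5), box_ft(0.3, 2.0, 1.5))
```

The value `box_ft(0.0, a, b)` is simply the area `a * b` of the box, which matches the zero-frequency (DC) interpretation of $\hat{f}(0)$.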
One final property of the Fourier transform that deserves special mention is its relationship to convolution (Figure 10.45). Briefly, $\mathcal{F}\{f \star g\} = \hat{f}\hat{g}$.
傅里叶变换值得特别提及的最后一个特性是它与卷积的关系(图 10.45 )。简而言之,
Figure 10.45. A commutative diagram to show visually the relationship between convolution and multiplication. If we multiply f and g in space, then transform to frequency, we end up in the same place as if we transformed f and g to frequency and then convolved them. Likewise, if we convolve f and g in space and then transform into frequency, we end up in the same place as if we transformed f and g to frequency, then multiplied them.
图 10.45。交换图直观地显示了卷积和乘法之间的关系。如果我们在空间中将f和g相乘,然后将其转换为频率,我们最终得到的结果与我们将f和g转换为频率然后对其进行卷积的结果相同。同样,如果我们在空间中将f和g卷积,然后将其转换为频率,我们最终得到的结果与我们将f和g转换为频率然后对其进行乘法的结果相同。
The Fourier transform of the convolution of two functions is the product of the Fourier transforms. Following the by now familiar symmetry, $\hat{f} \star \hat{g} = \mathcal{F}\{fg\}$.
两个函数卷积的傅里叶变换是傅里叶变换的乘积。根据现在熟悉的对称性,
The convolution of two Fourier transforms is the Fourier transform of the product of the two functions. These facts are fairly straightforward to derive from the definitions.
两个傅里叶变换的卷积是两个函数乘积的傅里叶变换。从定义中可以相当直接地推导出这些事实。
This relationship is the main reason Fourier transforms are useful in studying the effects of sampling and reconstruction. We’ve seen how sampling, filtering, and reconstruction can be seen in terms of convolution; now the Fourier transform gives us a new domain—the frequency domain—in which these operations are simply products.
这种关系是傅里叶变换在研究采样和重构效果方面非常有用的主要原因。我们已经了解了如何从卷积的角度看待采样、滤波和重构;现在傅里叶变换为我们提供了一个新的域——频域——在这些域中,这些操作只是乘积。
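The convolution property can be illustrated with a small discrete experiment. Note the swap: the sketch below uses the discrete Fourier transform and circular convolution on short sequences, a discrete analogue of the continuous statement rather than the continuous transform itself. The sequences `a` and `b` are arbitrary illustrative data.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

a = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 3.0]
b = [0.25, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]

# Transform of the convolution versus product of the transforms.
lhs = dft(circular_convolve(a, b))
rhs = [x * y for x, y in zip(dft(a), dft(b))]
max_error = max(abs(p - q) for p, q in zip(lhs, rhs))
print(max_error)
```

The two sides agree up to floating-point rounding, so filtering by convolution in space corresponds to a pointwise product in frequency.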
Now that we have some facts about Fourier transforms, let’s look at some examples of individual functions. In particular, we’ll look at some filters from Section 10.3.1, which are shown with their Fourier transforms in Figure 10.46. We have already seen the box function: $\mathcal{F}\{f_{\text{box}}\} = \frac{\sin \pi u}{\pi u} = \operatorname{sinc} \pi u$.
现在我们已经了解了一些有关傅里叶变换的事实,让我们看一些单个函数的例子。特别是,我们将查看第 10.3.1 节中的一些过滤器,它们及其傅里叶变换如图 10.46所示。我们已经看到了 box 函数:
Figure 10.46. The Fourier transforms of the box, tent, B-spline, and Gaussian filters.
图 10.46.盒子、帐篷、B 样条和高斯滤波器的傅里叶变换。
The function³ sin x/x is important enough to have its own name, sinc x.
函数3 sin x/x非常重要,有自己的名字,sinc x 。
The tent function is the convolution of the box with itself, so its Fourier transform is just the square of the Fourier transform of the box function: $\mathcal{F}\{f_{\text{tent}}\} = \frac{\sin^2 \pi u}{\pi^2 u^2} = \operatorname{sinc}^2 \pi u$.
帐篷函数是盒子与自身的卷积,因此它的傅里叶变换只是盒子函数的傅里叶变换的平方:
3 You may notice that sin πu/πu is undefined for u = 0. It is, however, continuous across zero, and we take it as understood that we use the limiting value of this ratio, 1, at u = 0.
3您可能会注意到,当u = 0 时,sin π u /π u没有定义。然而,它在零点处是连续的,并且我们认为在u = 0 时我们使用这个比率的极限值 1。
We can continue this process to get the Fourier transform of the B-spline filter (see Exercise 3): $\mathcal{F}\{f_B\} = \frac{\sin^4 \pi u}{\pi^4 u^4} = \operatorname{sinc}^4 \pi u$.
我们可以继续这个过程来得到 B 样条滤波器的傅里叶变换(参见练习 3):
The Gaussian has a particularly nice Fourier transform: $\mathcal{F}\{f_G\} = e^{-(2\pi u)^2/2}$.
高斯有一个特别好的傅里叶变换:
It is another Gaussian! The Gaussian with standard deviation 1.0 becomes a Gaussian with standard deviation 1/(2π).
这是另一个高斯!标准差为 1.0 的高斯变为标准差为 1 / 2π 的高斯。
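The closed forms for the tent and Gaussian filters can be compared against a direct numerical transform. This is a sketch under stated assumptions: the tent is taken as 1 − |x| on [−1, 1], the Gaussian has unit area and standard deviation 1, and, because both filters are even, only the cosine part of the transform is integrated. Names such as `cosine_transform` are hypothetical.

```python
import math

def cosine_transform(fn, u, lo, hi, steps=20000):
    """Midpoint-rule Fourier transform of an even, real function fn at frequency u."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += fn(x) * math.cos(2.0 * math.pi * u * x)
    return total * h

def sinc_pi(u):
    """sin(pi u) / (pi u), with the limiting value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

def tent(x):
    return max(0.0, 1.0 - abs(x))

def gaussian(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def gaussian_ft(u):
    return math.exp(-((2.0 * math.pi * u) ** 2) / 2.0)

for u in (0.0, 0.25, 0.5, 1.0):
    tent_numeric = cosine_transform(tent, u, -1.0, 1.0)
    gauss_numeric = cosine_transform(gaussian, u, -10.0, 10.0)
    print(u, tent_numeric, sinc_pi(u) ** 2, gauss_numeric, gaussian_ft(u))
```

At u = 1 the tent transform is zero while the Gaussian transform is merely tiny, which hints at the differing behaviour of these filters at multiples of the sample frequency.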
The reason impulses are useful in sampling theory is that we can use them to talk about samples in the context of continuous functions and Fourier transforms. We represent a sample, which has a position and a value, by an impulse translated to that position and scaled by that value. A sample at position a with value b is represented by bδ(x – a). This way we can express the operation of sampling the function f (x) at a as multiplying f by δ(x – a). The result is f (a)δ(x – a).
脉冲在采样理论中很有用,因为我们可以用它们在连续函数和傅里叶变换的背景下讨论样本。我们用一个脉冲来表示一个具有位置和值的样本,该脉冲平移到该位置并按该值缩放。位置a处值为b 的样本用bδ ( x – a ) 表示。这样,我们可以将在a处对函数f ( x ) 进行采样的操作表示为将f乘以 δ( x – a )。结果是f ( a )δ( x – a )。
Sampling a function at a series of equally spaced points is therefore expressed as multiplying the function by the sum of a series of equally spaced impulses, called an impulse train (Figure 10.47). An impulse train with period T, meaning that the impulses are spaced a distance T apart, is $s_T(x) = \sum_{i=-\infty}^{\infty} \delta(x - Ti)$.
因此,在一系列等距点上对函数进行采样可以表示为将函数乘以一系列等距脉冲的总和,这称为脉冲序列(图 10.47 )。周期为T 的脉冲序列(即脉冲间隔距离为T )是
Figure 10.47. Impulse trains. The Fourier transform of an impulse train is another impulse train. Changing the period of the impulse train in space causes an inverse change in the period in frequency.
图 10.47。脉冲序列。脉冲序列的傅里叶变换是另一个脉冲序列。改变空间中脉冲序列的周期会导致频率周期的反向变化。
The Fourier transform of $s_1$ is the same as $s_1$: a sequence of impulses at all integer frequencies. You can see why this should be true by thinking about what happens when we multiply the impulse train by a sinusoid and integrate. We wind up adding up the values of the sinusoid at all the integers. This sum will exactly cancel to zero for non-integer frequencies, and it will diverge to +∞ for integer frequencies.
s 1的傅里叶变换与s 1相同:所有整数频率的脉冲序列。通过思考当我们将脉冲序列乘以正弦波并积分时会发生什么,您可以明白为什么这应该是正确的。我们最终将所有整数的正弦波值相加。对于非整数频率,此和将恰好抵消为零,对于整数频率,它将发散到 +∞。
Because of the dilation property of the Fourier transform, we can guess that the Fourier transform of an impulse train with period T (which is like a dilation of $s_1$) is an impulse train with period 1/T. Making the sampling finer in the space domain makes the impulses farther apart in the frequency domain.
由于傅里叶变换的膨胀特性,我们可以猜测周期为T 的脉冲序列的傅里叶变换(类似于s 1的膨胀)是周期为 1 /T的脉冲序列。在空间域中使采样更精细会使脉冲在频域中相距更远。
Now that we have built the mathematical machinery, we need to understand the sampling and reconstruction process from the viewpoint of the frequency domain. The key advantage of introducing Fourier transforms is that it makes the effects of convolution filtering on the signal much clearer, and it provides more precise explanations of why we need to filter when sampling and reconstructing.
现在我们已经建立了数学机制,我们需要从频域的角度来理解采样和重构过程。引入傅里叶变换的关键优势在于它使卷积滤波对信号的影响更加清晰,并更精确地解释了为什么我们在采样和重构时需要滤波。
We start the process with the original, continuous signal. In general, its Fourier transform could include components at any frequency, although for most kinds of signals (especially images), we expect the content to decrease as the frequency gets higher. Images also tend to have a large component at zero frequency—remember that the zero-frequency, or DC, component is the integral of the whole image, and since images are all positive values this tends to be a large number.
我们从原始的连续信号开始这个过程。一般来说,它的傅里叶变换可以包含任何频率的分量,尽管对于大多数类型的信号(尤其是图像),我们预计内容会随着频率的升高而减少。图像在零频率处也往往有一个很大的分量——请记住,零频率或 DC 分量是整个图像的积分,由于图像都是正值,所以这个数字往往很大。
Let’s see what happens to the Fourier transform if we sample and reconstruct without doing any special filtering (Figure 10.48). When we sample the signal, we model the operation as multiplication with an impulse train; the sampled signal is $f s_T$. Because of the multiplication-convolution property, the FT of the sampled signal is $\hat{f} \star \hat{s_T} = \hat{f} \star s_{1/T}$.
让我们看看如果我们在不进行任何特殊滤波的情况下进行采样和重构,傅里叶变换会发生什么情况(图 10.48)。当我们对信号进行采样时,我们将运算建模为与脉冲序列相乘;采样信号为 $f s_T$。由于乘法卷积特性,采样信号的 FT 为 $\hat{f} \star \hat{s_T} = \hat{f} \star s_{1/T}$。
Figure 10.48. Sampling and reconstruction with no filtering. Sampling produces alias spectra that overlap and mix with the base spectrum. Reconstruction with a box filter collects even more information from the alias spectra. The result is a signal that has serious aliasing artifacts.
图 10.48。无滤波的采样和重构。采样会产生与基谱重叠和混合的混叠谱。使用盒式滤波器进行重构可以从混叠谱中收集更多信息。结果是信号具有严重的混叠伪影。
Recall that δ is the identity for convolution. This means that $(\hat{f} \star s_{1/T})(u) = \sum_{i=-\infty}^{\infty} \hat{f}(u - i/T)$;
回想一下,δ 是卷积的恒等式。这意味着
that is, convolving with the impulse train makes a whole series of equally spaced copies of the spectrum of f . A good intuitive interpretation of this seemingly odd result is that all those copies just express the fact (as we saw back in Section 10.1.1) that frequencies that differ by an integer multiple of the sampling frequency are indistinguishable once we have sampled—they will produce exactly the same set of samples. The original spectrum is called the base spectrum, and the copies are known as alias spectra.
也就是说,与脉冲序列进行卷积会产生一系列等间距的 f 频谱副本。对这个看似奇怪的结果,一个很好的直观解释是,所有这些副本都只是表达了一个事实(正如我们在 10.1.1 节中看到的那样),即一旦我们采样,相差采样频率整数倍的频率是无法区分的:它们将产生完全相同的样本集。原始频谱称为基频谱,其副本称为混叠频谱。
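The claim that frequencies differing by a multiple of the sampling frequency produce identical samples can be checked directly. The sketch below samples two sinusoids whose frequencies differ by exactly one sample rate; the specific frequencies and sample rate are arbitrary illustrative values.

```python
import math

def sample_sinusoid(frequency, sample_rate, count):
    """Sample sin(2 pi frequency x) at x = i / sample_rate for i = 0 .. count - 1."""
    return [math.sin(2.0 * math.pi * frequency * i / sample_rate) for i in range(count)]

sample_rate = 8.0
low = sample_sinusoid(1.0, sample_rate, 16)
high = sample_sinusoid(1.0 + sample_rate, sample_rate, 16)

# The two sample sets agree up to floating-point rounding.
max_difference = max(abs(p - q) for p, q in zip(low, high))
print(max_difference)
```

Once only the samples are available, nothing distinguishes the frequency-9 sinusoid from the frequency-1 sinusoid, which is exactly what the alias spectra express.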
The trouble begins if these copies of the signal’s spectrum overlap, which will happen if the signal contains any significant content beyond half the sample frequency. When this happens, the spectra add, and the information about different frequencies is irreversibly mixed up. This is the first place aliasing can occur, and if it happens here, it’s due to undersampling—using too low a sample frequency for the signal.
如果信号频谱的这些副本重叠,问题就开始了。如果信号包含任何超过采样频率一半的重要内容,就会发生这种情况。当这种情况发生时,频谱会叠加,不同频率的信息会不可逆地混合在一起。这是混叠可能发生的第一个地方,如果混叠发生在这里,那是由于采样不足——对信号使用的采样频率太低。
Suppose we reconstruct the signal using the nearest-neighbor technique. This is equivalent to convolving with a box of width 1. (The discrete-continuous convolution used to do this is the same as a continuous convolution with the series of impulses that represent the samples.) The convolution-multiplication property means that the spectrum of the reconstructed signal will be the product of the spectrum of the sampled signal and the spectrum of the box. The resulting reconstructed Fourier transform contains the base spectrum (though somewhat attenuated at higher frequencies), plus attenuated copies of all the alias spectra. Because the box has a fairly broad Fourier transform, these attenuated bits of alias spectra are significant, and they are the second form of aliasing, due to an inadequate reconstruction filter. These alias components manifest themselves in the image as the pattern of squares that is characteristic of nearest-neighbor reconstruction.
假设我们使用最近邻技术重建信号。这相当于与宽度为 1 的盒子进行卷积。(用于执行此操作的离散-连续卷积与与代表样本的一系列脉冲进行连续卷积相同。)卷积乘法属性意味着重建信号的频谱将是采样信号的频谱与盒子频谱的乘积。由此产生的重建傅里叶变换包含基本频谱(尽管在较高频率下有所衰减),以及所有混叠频谱的衰减副本。由于盒子具有相当宽的傅里叶变换,这些混叠频谱的衰减位很重要,它们是混叠的第二种形式,这是由于重建滤波器不足造成的。这些混叠分量在图像中表现为正方形图案,这是最近邻重建的特征。
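Nearest-neighbor reconstruction as convolution with a unit-width box can be written out in a few lines. This is a minimal sketch assuming samples at integer positions 0, 1, 2, ...; the half-open box (so that each point selects exactly one sample) and the function names are choices made here, not taken from the text. A tent filter is included for comparison.

```python
def box(x):
    """Unit-width box filter; half-open so each point picks exactly one sample."""
    return 1.0 if -0.5 <= x < 0.5 else 0.0

def tent(x):
    """Tent filter of radius 1, which yields linear interpolation."""
    return max(0.0, 1.0 - abs(x))

def reconstruct(samples, filt, x):
    """Discrete-continuous convolution: sum of samples[i] * filt(x - i)."""
    return sum(value * filt(x - i) for i, value in enumerate(samples))

samples = [0.0, 1.0, 4.0, 9.0]
print(reconstruct(samples, box, 1.3))   # nearest sample is index 1
print(reconstruct(samples, box, 1.6))   # nearest sample is index 2
print(reconstruct(samples, tent, 1.5))  # halfway between samples 1 and 2
```

Swapping the filter argument is all it takes to move from the blocky nearest-neighbor result to linear interpolation.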
To do high-quality sampling and reconstruction, we have seen that we need to choose sampling and reconstruction filters appropriately. From the standpoint of the frequency domain, the purpose of low-pass filtering when sampling is to limit the frequency range of the signal so that the alias spectra do not overlap the base spectrum. Figure 10.49 shows the effect of sample rate on the Fourier transform of the sampled signal. Higher sample rates move the alias spectra farther apart, and eventually, whatever overlap is left does not matter.
为了进行高质量的采样和重构,我们已经看到,我们需要适当地选择采样和重构滤波器。从频域的角度来看,采样时低通滤波的目的是限制信号的频率范围,使混叠频谱不与基频谱重叠。图 10.49显示了采样率对采样信号的傅里叶变换的影响。较高的采样率会使混叠频谱相距更远,最终,无论剩下什么重叠都无关紧要。
Figure 10.49. The effect of sample rate on the frequency spectrum of the sampled signal. Higher sample rates push the copies of the spectrum apart, reducing problems caused by overlap.
图 10.49.采样率对采样信号频谱的影响。较高的采样率会将频谱副本拉开,从而减少重叠引起的问题。
The key criterion is that the width of the spectrum must be less than the distance between the copies—that is, the highest frequency present in the signal must be less than half the sample frequency. This is known as the Nyquist criterion, and the highest allowable frequency is known as the Nyquist frequency or Nyquist limit. The Nyquist–Shannon sampling theorem states that a signal whose frequencies do not exceed the Nyquist limit (or, said another way, a signal that is bandlimited to the Nyquist frequency) can, in principle, be reconstructed exactly from samples.
关键标准是频谱宽度必须小于副本之间的距离,也就是说,信号中存在的最高频率必须小于采样频率的一半。这被称为奈奎斯特标准,最高允许频率称为奈奎斯特频率或奈奎斯特极限。奈奎斯特-香农采样定理指出,频率不超过奈奎斯特极限的信号(或者换句话说,带宽限制为奈奎斯特频率的信号)原则上可以从样本中精确重建。
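As a small worked example of the Nyquist limit, the sketch below computes the limit for a given sample rate and the frequency at which an arbitrary input sinusoid appears after sampling. The folding rule used here is the standard consequence of the spectrum copies described above; the function names are hypothetical.

```python
def nyquist_frequency(sample_rate):
    """Half the sample frequency."""
    return sample_rate / 2.0

def apparent_frequency(frequency, sample_rate):
    """Frequency in [0, sample_rate / 2] whose samples match those of the
    input sinusoid (up to sign or phase)."""
    folded = frequency % sample_rate
    return min(folded, sample_rate - folded)

print(nyquist_frequency(8.0))         # 4.0
print(apparent_frequency(3.0, 8.0))   # below the limit: unchanged
print(apparent_frequency(9.0, 8.0))   # above the limit: aliases to 1.0
```

Frequencies at or below the limit come back unchanged; anything above it is folded onto a lower frequency and becomes indistinguishable from it.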
With a high enough sample rate for a particular signal, we don’t need to use a sampling filter. But if we are stuck with a signal that contains a wide range of frequencies (such as an image with sharp edges in it), we must use a sampling filter to bandlimit the signal before we can sample it. Figure 10.50 shows the effects of three low-pass (smoothing) filters in the frequency domain, and Figure 10.51 shows the effect of using these same filters when sampling. Even if the spectra overlap without filtering, convolving the signal with a low-pass filter can narrow the spectrum enough to eliminate overlap and produce a well-sampled representation of the filtered signal. Of course, we have lost the high frequencies, but that’s better than having them get scrambled with the signal and turn into artifacts.
如果特定信号的采样率足够高,我们就不需要使用采样滤波器。但是,如果我们遇到包含很宽频率范围的信号(例如,其中有锐利边缘的图像),则必须使用采样滤波器对信号进行带宽限制,然后才能对其进行采样。图 10.50显示了频域中三个低通(平滑)滤波器的效果,图 10.51显示了在采样时使用这些相同滤波器的效果。即使频谱在没有滤波的情况下重叠,使用低通滤波器对信号进行卷积也可以将频谱缩小到足以消除重叠并产生滤波信号的良好采样表示。当然,我们失去了高频,但这比让它们与信号混杂在一起并变成伪影要好。
Figure 10.50. Applying low-pass (smoothing) filters narrows the frequency spectrum of a signal.
图 10.50.应用低通(平滑)滤波器可缩小信号的频谱。
Figure 10.51. How the low-pass filters from Figure 10.50 prevent aliasing during sampling. Low-pass filtering narrows the spectrum so that the copies overlap less, and the high frequencies from the alias spectra interfere less with the base spectrum.
图 10.51。图 10.50中的低通滤波器如何防止采样期间出现混叠。低通滤波使频谱变窄,从而使副本重叠更少,混叠频谱中的高频对基频谱的干扰更少。
From the frequency domain perspective, the job of a reconstruction filter is to remove the alias spectra while preserving the base spectrum. In Figure 10.48, we can see that the crudest reconstruction filter, the box, does attenuate the alias spectra. Most importantly, it completely blocks the DC spike for all the alias spectra. This is a characteristic of all reasonable reconstruction filters: they have zeros in frequency space at all multiples of the sample frequency. This turns out to be equivalent to the ripple-free property in the space domain.
从频域角度来看,重构滤波器的作用是去除混叠频谱,同时保留基频。在图 10.48中,我们可以看到最粗糙的重构滤波器(盒子)确实会衰减混叠频谱。最重要的是,它完全阻止了所有混叠频谱的直流尖峰。这是所有合理重构滤波器的一个特性:它们在频率空间中在采样频率的所有倍数处都有零。这实际上等同于空间域中的无波纹特性。
So a good reconstruction filter needs to be a good low-pass filter, with the added requirement of completely blocking all multiples of the sample frequency. The purpose of using a reconstruction filter different from the box filter is to more completely eliminate the alias spectra, reducing the leakage of high-frequency artifacts into the reconstructed signal, while disturbing the base spectrum as little as possible. Figure 10.52 illustrates the effects of different filters when used during reconstruction. As we have seen, the box filter is quite “leaky” and results in plenty of artifacts even if the sample rate is high enough. The tent filter, resulting in linear interpolation, attenuates high frequencies more, resulting in milder artifacts, and the B-spline filter is very smooth, controlling the alias spectra very effectively. It also smooths the base spectrum some—this is the tradeoff between smoothing and aliasing that we saw earlier.
因此,一个好的重建滤波器需要是一个好的低通滤波器,另外还要求完全阻止所有采样频率的倍数。使用不同于盒式滤波器的重建滤波器的目的是更彻底地消除混叠频谱,减少高频伪影泄漏到重建信号中,同时尽可能少地干扰基频。图 10.52说明了在重建过程中使用不同滤波器的效果。正如我们所见,盒式滤波器非常“泄漏”,即使采样率足够高也会产生大量伪影。帐篷滤波器导致线性插值,对高频的衰减更大,导致伪影更温和,而 B 样条滤波器非常平滑,可以非常有效地控制混叠频谱。它还会平滑一些基频 - 这就是我们之前看到的平滑和混叠之间的权衡。
Figure 10.52. The effects of different reconstruction filters in the frequency domain. A good reconstruction filter attenuates the alias spectra effectively while preserving the base spectrum.
图 10.52.频域中不同重构滤波器的效果。良好的重构滤波器可有效衰减混叠频谱,同时保留基本频谱。
When the operations of reconstruction and sampling are combined in resampling, the same principles apply, but with one filter doing the work of both reconstruction and sampling. Figure 10.53 illustrates how a resampling filter must remove the alias spectra and leave the spectrum narrow enough to be sampled at the new sample rate.
当重建和采样操作在重采样中结合在一起时,同样的原理也适用,但一个滤波器同时完成重建和采样的工作。图 10.53说明了重采样滤波器必须如何去除混叠频谱并使频谱足够窄以便以新的采样率进行采样。
Figure 10.53. Resampling viewed in the frequency domain. The resampling filter both reconstructs the signal (removes the alias spectra) and bandlimits it (reduces its width) for sampling at the new rate.
图 10.53.在频域中查看重采样。重采样滤波器既重建信号(删除混叠频谱),又对其进行带宽限制(减小其宽度),以便以新的速率进行采样。
Following the frequency domain analysis to its logical conclusion, a filter that is exactly a box in the frequency domain is ideal for both sampling and reconstruction. Such a filter would prevent aliasing at both stages without diminishing the frequencies below the Nyquist frequency at all.
根据频域分析得出的逻辑结论,频域中恰好是盒子的滤波器对于采样和重构都是理想的。这样的滤波器可以防止两个阶段的混叠,而不会降低奈奎斯特频率以下的频率。
Recall that the inverse and forward Fourier transforms are essentially identical, so the spatial domain filter that has a box as its Fourier transform is the function sin πx/πx = sinc πx.
回想一下,逆傅里叶变换和正傅里叶变换本质上是相同的,因此以框作为傅里叶变换的空间域滤波器是函数 sin π x /π x = sinc π x 。
However, the sinc filter is not generally used in practice, either for sampling or for reconstruction, because it is impractical and because, even though it is optimal according to the frequency domain criteria, it doesn’t produce the best results for many applications.
然而,sinc 滤波器在实践中通常不用于采样或重建,因为它不切实际,并且即使根据频域标准它是最佳的,但对于许多应用来说它并不能产生最佳结果。
For sampling, the infinite extent of the sinc filter, and its relatively slow rate of decrease with distance from the center, is a liability. Also, for some kinds of sampling, the negative lobes are problematic. A Gaussian filter makes an excellent sampling filter even for difficult cases where high-frequency patterns must be removed from the input signal, because its Fourier transform falls off exponentially, with no bumps that tend to let aliases leak through. For less difficult cases, a tent filter generally suffices.
对于采样,sinc 滤波器的无限范围及其随着远离中心而相对较慢的下降率是一个缺点。此外,对于某些类型的采样,负叶是个问题。高斯滤波器即使在必须从输入信号中去除高频模式的困难情况下也可以成为出色的采样滤波器,因为它的傅立叶变换呈指数下降,没有容易让混叠泄漏的凸起。对于不太困难的情况,帐篷滤波器通常就足够了。
For reconstruction, the size of the sinc function again creates problems, but even more importantly, the many ripples create “ringing” artifacts in reconstructed signals.
对于重建,sinc 函数的大小再次产生了问题,但更重要的是,许多波纹会在重建信号中产生“振铃”伪影。
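The ringing produced by sinc reconstruction shows up clearly on a sampled step edge. The sketch below is an illustration under stated assumptions: the step is truncated to 401 samples at integer positions, the sinc is evaluated without any windowing, and a tent filter is used as the non-ringing comparison. All names are hypothetical.

```python
import math

def sinc_pi(x):
    """sin(pi x) / (pi x), with the limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def tent(x):
    return max(0.0, 1.0 - abs(x))

def reconstruct(samples, filt, x):
    """Sum of value * filt(x - i) over (position, value) pairs."""
    return sum(value * filt(x - i) for i, value in samples)

# Step edge: 0 for i < 0 and 1 for 0 <= i <= 200 (a truncated step).
step = [(i, 0.0 if i < 0 else 1.0) for i in range(-200, 201)]

sinc_value = reconstruct(step, sinc_pi, 0.5)
tent_value = reconstruct(step, tent, 0.5)
print(sinc_value, tent_value)
```

Halfway between the first two nonzero samples, the sinc reconstruction overshoots the sample values by more than ten percent, whereas the tent reconstruction stays at 1.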
1. Show that discrete convolution is commutative and associative. Do the same for continuous convolution.
1.证明离散卷积是交换律和结合律。对连续卷积也做同样的证明。
2. Discrete-continuous convolution can’t be commutative, because its arguments have two different types. Show that it is associative, though.
2.离散-连续卷积不能交换,因为其参数有两种不同类型。但请证明它是结合律。
3. Prove that the B-spline is the convolution of four box functions.
3.证明B样条是四个盒函数的卷积。
4. Show that the “flipped” definition of convolution is necessary by trying to show that convolution is commutative and associative using this (incorrect) definition (see the footnote on page 214): $(a \star b)[i] = \sum_j a[j]\, b[i + j]$
4.尝试用这个(错误的)定义证明卷积是可交换的、可结合的,从而证明卷积的“翻转”定义是必要的(参见第 214 页的脚注):
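5. Prove that $\mathcal{F}\{f \star g\} = \hat{f}\hat{g}$ and $\hat{f} \star \hat{g} = \mathcal{F}\{fg\}$.
5. 证明 $\mathcal{F}\{f \star g\} = \hat{f}\hat{g}$ 和 $\hat{f} \star \hat{g} = \mathcal{F}\{fg\}$。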
6. Equation (10.4) can be interpreted as the convolution of a with a filter $\bar{f}$. Write a mathematical expression for the “de-rippled” filter $\bar{f}$. Plot the filter that results from de-rippling the box, tent, and B-spline filters scaled to s = 1.25.
6.公式 10.4 可以解释为与滤波器的卷积f ¯ . 写出“去波纹”滤波器的数学表达式f ¯ . 绘制由箱式、帐篷式和 B 样条滤波器去波纹得到的滤波器,缩放至s = 1.25。
When trying to replicate the look of the real world, one quickly realizes that hardly any surfaces are featureless. Wood grows with grain; skin grows with wrinkles; cloth shows its woven structure; and paint shows the marks of the brush or roller that laid it down. Even smooth plastic is made with bumps molded into it, and smooth metal shows the marks of the machining process that made it. Materials that were once featureless quickly become covered with marks, dents, stains, scratches, fingerprints, and dirt.
当试图复制现实世界的外观时,人们很快就会意识到几乎没有任何表面是毫无特征的。木材随着纹理生长;皮肤随着皱纹生长;布料显示出其编织结构;油漆显示出刷子或滚筒涂上的痕迹。即使是光滑的塑料也是由模压而成的凸起,光滑的金属显示出制造它的加工过程的痕迹。曾经毫无特征的材料很快就会布满痕迹、凹痕、污渍、划痕、指纹和污垢。
In computer graphics, we lump all these phenomena under the heading of “spatially varying surface properties”—attributes of surfaces that vary from place to place but don’t really change the shape of the surface in a meaningful way. To allow for these effects, all kinds of modeling and rendering systems provide some means for texture mapping: using an image, called a texture map, texture image, or just a texture, to store the details that you want to go on a surface and then mathematically “mapping” the image onto the surface.
在计算机图形学中,我们将所有这些现象归结为“空间变化的表面属性”——表面的属性随位置而变化,但实际上不会以有意义的方式改变表面的形状。为了实现这些效果,各种建模和渲染系统都提供了一些纹理映射方法:使用图像(称为纹理图、纹理图像或纹理)来存储您想要在表面上显示的细节,然后以数学方式将图像“映射”到表面上。
This is mapping in the sense of Section 2.1.
这就是2.1节意义上的映射。
As it turns out, once the mechanism to map images onto surfaces exists, there are many less obvious ways it can be used that go beyond the basic purpose of introducing surface detail. Textures can be used to make shadows and reflections, to provide illumination, even to define surface shape. In sophisticated interactive programs, textures are used to store all kinds of data that doesn’t even have anything to do with pictures!
事实证明,一旦将图像映射到表面上的机制存在,它就可以用在很多不太明显的地方,超出引入表面细节的基本目的。纹理可用于制作阴影和反射、提供照明,甚至定义表面形状。在复杂的交互式程序中,纹理用于存储各种与图片无关的数据!
This chapter discusses the use of textures for representing surface detail, shadows, and reflections. While the basic ideas are simple, several practical problems complicate the use of textures. First of all, textures easily become distorted, and designing the functions that map textures onto surfaces is challenging. Also, texture mapping is a resampling process, just like rescaling an image, and as we saw in Chapter 10, resampling can very easily introduce aliasing artifacts. The use of texture mapping and animation together readily produces truly dramatic aliasing, and much of the complexity of texture mapping systems is created by the antialiasing measures that are used to tame these artifacts.
本章讨论了如何使用纹理来表示表面细节、阴影和反射。虽然基本思想很简单,但一些实际问题使纹理的使用变得复杂。首先,纹理很容易变形,设计将纹理映射到表面上的函数具有挑战性。此外,纹理映射是一个重新采样过程,就像重新缩放图像一样,正如我们在第 10 章中看到的那样,重新采样很容易引入混叠伪影。纹理映射和动画一起使用很容易产生真正戏剧性的混叠,纹理映射系统的大部分复杂性来自用于抑制这些伪影的抗锯齿措施。
To start off, let’s consider a simple application of texture mapping. We have a scene with a wood floor, and we would like the diffuse color of the floor to be controlled by an image showing floorboards with wood grain. Regardless of whether we are using ray tracing or rasterization, the shading code that computes the color for a ray–surface intersection point or for a fragment generated by the rasterizer needs to know the color of the texture at the shading point, in order to use it as the diffuse color in the Lambertian shading model from Chapter 5.
首先,让我们考虑一下纹理映射的一个简单应用。我们有一个木地板的场景,我们希望地板的漫反射颜色由显示带有木纹的地板的图像控制。无论我们使用光线追踪还是光栅化,计算光线与表面交点颜色或光栅化器生成的片段颜色的着色代码都需要知道着色点处纹理的颜色,以便将其用作第 5 章中 Lambertian 着色模型中的漫反射颜色。
To get this color, the shader performs a texture lookup: it figures out the location, in the coordinate system of the texture image, that corresponds to the shading point, and it reads out the color at that point in the image, resulting in the texture sample. That color is then used in shading, and since the texture lookup happens at a different place in the texture for every pixel that sees the floor, a pattern of different colors shows up in the image. The code might look like this:
为了获得这种颜色,着色器执行纹理查找:它在纹理图像的坐标系中找出与着色点相对应的位置,并读出图像中该点的颜色,从而得到纹理样本。然后该颜色用于着色,并且由于纹理查找发生在纹理中每个看到地板的像素的不同位置,因此图像中会显示不同颜色的图案。代码可能如下所示:
    Color texture_lookup(Texture t, float u, float v) {
        int i = round(u * t.width() - 0.5)
        int j = round(v * t.height() - 0.5)
        return t.get_pixel(i,j)
    }

    Color shade_surface_point(Surface s, Point p, Texture t) {
        Vector normal = s.get_normal(p)
        (u,v) = s.get_texcoord(p)
        Color diffuse_color = texture_lookup(t,u,v)
        // compute shading using diffuse_color and normal
        // return shading result
    }
In this code, the shader asks the surface where to look in the texture, and somehow every surface that we want to shade using a texture needs to be able to answer this query. This brings us to the first key ingredient of texture mapping: we need a function that maps from the surface to the texture that we can easily compute for every pixel. This is the texture coordinate function (Figure 11.1), and we say that it assigns texture coordinates to every point on the surface. Mathematically, it is a mapping from the surface S to the domain of the texture, T: $\phi : S \to T$, $(x, y, z) \mapsto (u, v)$.
在此代码中,着色器询问表面在纹理中查看的位置,并且我们想要使用纹理着色的每个表面都需要能够以某种方式回答此查询。这给我们带来了纹理映射的第一个关键要素:我们需要一个从表面映射到纹理的函数,我们可以轻松地为每个像素计算该函数。这是纹理坐标函数(图 11.1 ),我们称它为表面上的每个点分配纹理坐标。从数学上讲,它是从表面S到纹理域T的映射:
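For readers who want to run the lookup, here is a small Python sketch of the same nearest-texel idea. It is not the book's code: the `Texture` class is a hypothetical stand-in for an image type, and the index is computed with `floor` plus clamping (equivalent to rounding `u * width - 0.5` except at exact ties) so that u = 1 or v = 1 cannot index past the last texel.

```python
import math

class Texture:
    """Minimal row-major image; pixels[j][i] is the texel in column i, row j."""
    def __init__(self, pixels):
        self.pixels = pixels
        self.height = len(pixels)
        self.width = len(pixels[0])

    def get_pixel(self, i, j):
        return self.pixels[j][i]

def texture_lookup(texture, u, v):
    """Nearest-texel lookup for (u, v) in the unit square."""
    i = min(max(int(math.floor(u * texture.width)), 0), texture.width - 1)
    j = min(max(int(math.floor(v * texture.height)), 0), texture.height - 1)
    return texture.get_pixel(i, j)

# A 2x2 checker pattern as a tiny test texture.
checker = Texture([[0, 1], [1, 0]])
print(texture_lookup(checker, 0.25, 0.25))
print(texture_lookup(checker, 0.75, 0.25))
print(texture_lookup(checker, 1.0, 1.0))
```

Clamping is only one possible policy for coordinates outside the unit square; a renderer could equally choose to wrap them.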
Figure 11.1. Just like the viewing projection π maps every point on an object’s surface, S, to a point in the image, the texture coordinate function ϕ maps every point on the object’s surface to a point in the texture map, T. Appropriately defining this function ϕ is fundamental to all applications of texture mapping.
图 11.1。就像观察投影 π 将物体表面的每个点S映射到图像中的某个点一样,纹理坐标函数φ将物体表面的每个点映射到纹理图T中的某个点。适当地定义此函数φ是纹理映射所有应用的基础。
The set T, often called “texture space,” is usually just a rectangle that contains the image; it is common to use the unit square $(u, v) \in [0,1]^2$ (in this book, we’ll use the names u and v for the two texture coordinates). In many ways, the texture coordinate function is similar to the viewing projection discussed in Chapter 8, called π in this chapter, which maps points on surfaces in the scene to points in the image; both are 3D-to-2D mappings, and both are needed for rendering: one to know where to get the texture value from, and one to know where to put the shading result in the image. But there are some important differences, too: π is almost always a perspective or orthographic projection, whereas ϕ can take on many forms; and there is only one viewing projection for an image, whereas each object in the scene is likely to have a completely separate texture coordinate function.
集合T ,通常称为“纹理空间”,通常只是包含图像的一个矩形;通常使用单位正方形 ( u, v ) ∈ [0,1] 2 (在本书中,我们将使用u和v表示两个纹理坐标)。在许多方面,它类似于第 8 章讨论的观看投影,在本章中称为 π ,它将场景中表面上的点映射到图像中的点;两者都是 3D 到 2D 的映射,并且两者都是渲染所必需的——一个知道从哪里获取纹理值,另一个知道将着色结果放在图像中的什么位置。但也存在一些重要的区别:π 几乎总是透视或正交投影,而ϕ可以采用多种形式;并且一幅图像只有一个观看投影,而场景中的每个物体可能都有一个完全独立的纹理坐标函数。
It may seem surprising that ϕ is a mapping from the surface to the texture, when our goal is to put the texture onto the surface, but this is the function we need.
当我们的目标是将纹理放到表面上时, φ是从表面到纹理的映射,这似乎令人惊讶,但这是我们需要的函数。
So … the first thing you have to learn is how to think backwards?
那么...你要学会的第一件事就是如何逆向思考?
For the case of the wood floor, if the floor happens to be at constant z and aligned to the x and y axes, we could just use the mapping $u = ax$; $v = by$,
对于木地板的情况,如果地板恰好位于恒定的z轴上并与x 轴和y轴对齐,我们可以使用映射
for some suitably chosen scale factors a and b, to assign texture coordinates (u,v) to the point $(x,y,z)_{\text{floor}}$, and then use the value of the texture pixel, or texel, closest to (u,v) as the texture value at (x,y). In this way we rendered the image in Figure 11.2.
对于某些适当选择的比例因子a和b ,将纹理坐标 ( u,v ) 分配给点 ( x,y,z ) floor ,然后使用最接近 ( u,v ) 的纹理像素或纹素的值作为 ( x,y ) 处的纹理值。通过这种方式,我们渲染了图 11.2中的图像。
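The floor mapping can be sketched in a couple of lines, assuming it takes the form u = ax, v = by suggested by the scale factors a and b. Wrapping the result into the unit square so that the image tiles across the floor is an extra assumption made here for illustration; the text does not specify what happens outside [0, 1].

```python
def floor_texcoord(x, y, a, b):
    """Planar mapping u = a * x, v = b * y, wrapped into [0, 1) so the image tiles."""
    return (a * x) % 1.0, (b * y) % 1.0

# With a = 0.25 the texture repeats every 4 units in x; with b = 0.5, every 2 units in y.
print(floor_texcoord(2.0, 0.5, 0.25, 0.5))
print(floor_texcoord(5.0, -1.0, 0.25, 0.5))
```

The z coordinate is ignored entirely, which is why this mapping only works for a floor at constant z aligned with the axes.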
Figure 11.2. A wood floor, textured using a texture coordinate function that simply uses the x and y coordinates of points directly.
图 11.2.木地板,使用纹理坐标函数进行纹理处理,该函数直接使用点的x和y坐标。
This is pretty limiting, though: what if the room is modeled at an angle to the x and y axes, or what if we want the wood texture on the curved back of a chair? We will need some better way to compute texture coordinates for points on the surface.
但这非常有限:如果房间是与x轴和y轴成一定角度建模的,或者如果我们想要椅子弯曲的背面有木质纹理怎么办?我们需要一些更好的方法来计算表面上点的纹理坐标。
Another problem that arises from the simplest form of texture mapping is illustrated dramatically by rendering a high-contrast texture from a very grazing angle into a low-resolution image. Figure 11.3 shows a larger plane textured using the same approach but with a high-contrast grid pattern and a view toward the horizon. You can see it contains aliasing artifacts (stairsteps in the foreground, wavy and glittery patterns in the distance) similar to the ones that arise in image resampling (Chapter 10) when appropriate filters are not used. Although it takes an extreme case to make these artifacts so obvious in a tiny still image printed in a book, in animations these patterns move around and are very distracting even when they are much more subtle.
最简单的纹理映射形式还会产生另一个问题,这一点可以通过将高对比度的纹理从非常倾斜的角度渲染到低分辨率图像中来明显地体现出来。图 11.3显示了使用相同方法进行纹理化的较大平面,但其具有高对比度的网格图案并且视图朝向地平线。您可以看到它包含混叠伪影(前景中的阶梯状、远处的波浪和闪光图案),类似于在未使用适当的过滤器的情况下在图像重采样(第 10 章)中出现的伪影。虽然只有在极端情况下才能使这些伪影在书中印刷的微小静止图像中如此明显,但在动画中,这些图案会移动,即使它们更加微妙,也会非常分散注意力。
Figure 11.3. A large horizontal plane, textured in the same way as in Figure 11.2 and displaying severe aliasing artifacts.
图 11.3.一个大的水平面,其纹理与图 11.2相同,并显示出严重的混叠伪影。
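The source of these artifacts can be reproduced in one dimension. The sketch below is a simplified illustration, not the renderer behind Figure 11.3: it assumes a hypothetical stripe texture with period 0.1 and distant pixels whose footprint in texture space (0.37) spans several stripes, then compares a single point lookup per pixel with an average over the footprint.

```python
def stripes(u, period=0.1):
    """High-contrast stripe pattern: 1.0 on the first half of each period, else 0.0."""
    return 1.0 if (u / period) % 1.0 < 0.5 else 0.0

def point_sample(u_center):
    """One texture lookup at the pixel center."""
    return stripes(u_center)

def area_sample(u_center, footprint, taps=64):
    """Average the pattern over the pixel footprint (a box filter)."""
    start = u_center - footprint / 2.0
    return sum(stripes(start + (k + 0.5) * footprint / taps) for k in range(taps)) / taps

footprint = 0.37  # one distant pixel covers several stripe periods
centers = [footprint * (n + 0.5) for n in range(8)]
point_values = [point_sample(c) for c in centers]
area_values = [area_sample(c, footprint) for c in centers]
print(point_values)
print(area_values)
```

Point sampling returns only pure black or pure white per pixel, in a pattern unrelated to the stripes, while averaging over the footprint gives values near the mid-gray that such distant stripes should blend to.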
We have now seen the two primary issues in basic texture mapping:
现在我们已经看到了基本纹理映射中的两个主要问题:
Defining texture coordinate functions, and
定义纹理坐标函数,以及
Looking up texture values without introducing too much aliasing.
查找纹理值而不引入太多混叠。
These two concerns are fundamental to all kinds of applications of texture mapping and are discussed in Sections 11.2 and 11.3. Once you understand them and some of the solutions to them, you understand texture mapping. The rest is just how to apply the basic texturing machinery for a variety of different purposes, which is discussed in Section 11.4.
这两个问题是所有纹理映射应用的基础,将在11.2和11.3节中讨论。一旦您理解了它们以及它们的一些解决方案,您就理解了纹理映射。剩下的只是如何将基本的纹理机制应用于各种不同的目的,这将在11.4 节中讨论。
Designing the texture coordinate function ϕ well is a key requirement for getting good results with texture mapping. You can think of this as deciding how you are going to deform a flat, rectangular image so that it conforms to the 3D surface you want to draw. Or alternatively, you are taking the surface and gently flattening it, without letting it wrinkle, tear, or fold, so that it lies flat on the image. Sometimes, this is easy: maybe the 3D surface is already a flat rectangle! In other cases, it’s very tricky: the 3D shape might be very complicated, like the surface of a character’s body.
设计好纹理坐标函数 ϕ 是获得良好纹理映射效果的关键要求。您可以将其视为决定如何变形平面矩形图像,使其符合要绘制的 3D 表面。或者,您要轻轻地将表面压平,不要让其起皱、撕裂或折叠,以便它平放在图像上。有时,这很容易:也许 3D 表面已经是一个平面矩形!在其他情况下,这非常棘手:3D 形状可能非常复杂,例如角色身体的表面。
The problem of defining texture coordinate functions is not new to computer graphics. Exactly the same problem is faced by cartographers when designing maps that cover large areas of the Earth’s surface: the mapping from the curved globe to the flat map inevitably causes distortion of areas, angles, and/or distances that can easily make maps very misleading. Many map projections have been proposed over the centuries, all balancing the same competing concerns that are faced in texture mapping: minimizing various kinds of distortion while covering a large area in one contiguous piece.
定义纹理坐标函数的问题对计算机图形学来说并不陌生。制图师在设计覆盖地球表面大面积区域的地图时面临着完全相同的问题:从曲面地球到平面地图的映射不可避免地会导致面积、角度和/或距离的扭曲,这很容易使地图非常具有误导性。几个世纪以来,人们提出了许多地图投影,它们都在平衡纹理映射中同样面临的相互竞争的问题:在用一块连续的区域覆盖大面积的同时,尽量减少各种扭曲。
In some applications (some examples are in Section 11.2.1), there’s a clear reason to use a particular map. But in most cases, designing the texture coordinate map is a delicate task of balancing competing concerns, which skilled modelers put considerable effort into.
在某些应用中(第 11.2.1 节中有一些示例),使用特定映射是有明确理由的。但在大多数情况下,设计纹理坐标映射是一项平衡相互冲突的关注点的微妙任务,熟练的建模者需要付出相当大的努力。
“UV mapping” or “surface parameterization” are other names you may encounter for the texture coordinate function.
您可能会遇到的纹理坐标函数的其他名称是“UV 映射”或“表面参数化”。
You can define ϕ in just about any way you can dream up. But there are several competing goals to consider:
你可以用任何你能想到的方式来定义 ϕ。但有几个相互竞争的目标需要考虑:
Bijectivity. In most cases, you’d like ϕ to be bijective (see Section 2.1.1), so that each point on the surface maps to a different point in texture space. If several points map to the same texture space point, the value at one point in the texture will affect several points on the surface. In cases where you want a texture to repeat over a surface (think of wallpaper or carpet with their repeating patterns), it makes sense to deliberately introduce a many-to-one mapping from surface points to texture points, but you don’t want this to happen by accident.
双射性。在大多数情况下,您希望ϕ是双射的(参见第 2.1.1 节),以便表面上的每个点都映射到纹理空间中的不同点。如果多个点映射到同一个纹理空间点,则纹理中一个点的值将影响表面上的多个点。如果您希望纹理在表面上重复出现(想想具有重复图案的墙纸或地毯),那么故意引入从表面点到纹理点的多对一映射是有意义的,但您不希望这种情况意外发生。
Size distortion. The scale of the texture should be approximately constant across the surface. That is, close-together points anywhere on the surface that are about the same distance apart should map to points about the same distance apart in the texture. In terms of the function ϕ, the magnitude of the derivatives of ϕ should not vary too much.
尺寸失真。纹理的比例在整个表面上应大致恒定。也就是说,表面上任何位置相距大致相同的点应映射到纹理中相距大致相同的点。就函数φ而言, φ的导数的幅度不应变化太大。
Shape distortion. The texture should not be very distorted. That is, a small circle drawn on the surface should map to a reasonably circular shape in texture space, rather than an extremely squashed or elongated shape. In terms of ϕ, the derivative of ϕ should not be too different in different directions.
形状扭曲。纹理不应扭曲太多。也就是说,在表面上绘制的小圆圈应映射到纹理空间中合理的圆形,而不是极度挤压或拉长的形状。就φ而言, φ的导数在不同方向上不应有太大差异。
Continuity. There should not be too many seams: neighboring points on the surface should map to neighboring points in the texture. That is, ϕ should be continuous or have as few discontinuities as possible. In most cases, some discontinuities are inevitable, and we’d like to put them in inconspicuous locations.
连续性。接缝不应太多:表面上的相邻点应映射到纹理中的相邻点。也就是说, φ应该是连续的,或者不连续性尽可能少。在大多数情况下,某些不连续性是不可避免的,我们希望将它们放在不显眼的位置。
Surfaces that are defined by parametric equations (Section 2.7.8) come with a built-in choice for the texture coordinate function: simply invert the function that defines the surface, and use the two parameters of the surface as texture coordinates. These texture coordinates may or may not have desirable properties, depending on the surface, but they do provide a mapping.
由参数方程定义的表面(第 2.7.8 节)带有内置的纹理坐标函数选择:只需反转定义表面的函数,并使用表面的两个参数作为纹理坐标。这些纹理坐标可能具有或不具有所需的属性,具体取决于表面,但它们确实提供了映射。
But for surfaces that are defined implicitly, or are just defined by a triangle mesh, we need some other way to define the texture coordinates, without relying on an existing parameterization. Broadly speaking, the two ways to define texture coordinates are to compute them geometrically, from the spatial coordinates of the surface point, or, for mesh surfaces, to store values of the texture coordinates at vertices and interpolate them across the surface. Let’s look at these options one at a time.
但是对于隐式定义或仅由三角网格定义的表面,我们需要其他方法来定义纹理坐标,而不依赖于现有的参数化。广义上讲,定义纹理坐标的两种方法是从表面点的空间坐标以几何方式计算它们,或者对于网格表面,将纹理坐标的值存储在顶点并在整个表面上进行插值。让我们一次看一下这些选项。
Geometrically determined texture coordinates are used for simple shapes or special situations, as a quick solution, or as a starting point for designing a hand-tweaked texture coordinate map.
几何确定的纹理坐标用于简单形状或特殊情况,作为快速解决方案,或作为设计手动调整纹理坐标图的起点。
We will illustrate the various texture coordinate functions by mapping the test image in Figure 11.4 onto the surface. The numbers in the image let you read the approximate (u,v) coordinates out of the rendered image, and the grid lets you see how distorted the mapping is.
我们将通过将图 11.4中的测试图像映射到表面上来说明各种纹理坐标函数。图像中的数字可让您从渲染图像中读取近似的 ( u,v ) 坐标,网格可让您看到映射的扭曲程度。
Figure 11.4. Test image.
图 11.4.测试图像。
Probably the simplest mapping from 3D to 2D is a parallel projection, the same mapping as used for orthographic viewing (Figure 11.5). The machinery we developed already for viewing (Section 8.1) can be reused directly for defining texture coordinates: just as orthographic viewing boils down to multiplying by a matrix and discarding the z component, generating texture coordinates by planar projection can be done with a simple matrix multiply:
可能最简单的从 3D 到 2D 的映射就是平行投影——与正交视图使用的映射相同(图 11.5 )。我们已经为视图开发的机制(第 8.1 节)可以直接用于定义纹理坐标:就像正交视图归结为乘以矩阵并丢弃 z 分量一样,通过平面投影生成纹理坐标可以通过简单的矩阵乘法来完成:
Figure 11.5. Planar projection makes a useful parameterization for objects or parts of objects that are nearly flat to start with, if the projection direction is chosen roughly along the overall normal.
图 11.5.如果投影方向大致沿着整体法线选择,平面投影可以对几乎平坦的物体或物体的一部分进行有用的参数化。
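Written out with the symbols used in the description that follows (a reconstruction from that description, with $\phi$ denoting the texture coordinate function), the planar projection can be stated as:

```latex
\begin{bmatrix} u \\ v \\ * \\ 1 \end{bmatrix}
= M_t \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix},
\qquad
\phi(x, y, z) = (u, v)
```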
where the texturing matrix Mt represents an affine transformation, and the asterisk indicates that we don’t care what ends up in the third coordinate.
其中纹理矩阵M t表示仿射变换,星号表示我们不关心最终的第三个坐标是什么。
This works quite well for surfaces that are mostly flat, without too much variation in surface normal, and a good projection direction can be found by taking the average normal. For any kind of closed shape, though, a planar projection will not be injective: points on the front and back will map to the same point in texture space (Figure 11.6).
这种方法对于表面法线变化不大、基本平坦的表面非常有效,而且可以通过取平均法线找到一个好的投影方向。但是,对于任何类型的封闭形状,平面投影都不是单射的:正面和背面的点将映射到纹理空间中的同一点(图 11.6 )。
Figure 11.6. Using planar projection on a closed object will always result in a non-injective, many-to-one mapping from surface points to texture points, and extreme distortion near points where the projection direction is tangent to the surface.
图 11.6.在封闭物体上使用平面投影总是会导致从表面点到纹理点的非单射的多对一映射,并在投影方向与表面相切的点附近产生极端扭曲。
By simply substituting perspective projection for orthographic, we get projective texture coordinates (Figure 11.7):
通过简单地用透视投影代替正交投影,我们得到投影纹理坐标(图 11.7 ):
Figure 11.7. A projective texture transformation uses a viewing-like transformation that projects toward a point.
图 11.7。投影纹理变换使用类似于观看的变换,向某个点进行投影。
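In the same homogeneous notation, and again reconstructed from the description that follows rather than quoted, the projective version carries a fourth coordinate $w$ that is divided out:

```latex
\begin{bmatrix} \tilde{u} \\ \tilde{v} \\ * \\ w \end{bmatrix}
= P_t \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix},
\qquad
\phi(x, y, z) = (\tilde{u}/w,\; \tilde{v}/w)
```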
Now the 4×4 matrix Pt represents a projective (not necessarily affine) transformation—that is, the last row may not be [0,0,0,1].
现在,4×4 矩阵P t表示射影(不一定是仿射)变换,也就是说,最后一行可能不是 [0,0,0,1]。
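The two projections differ only in the final division, which a short sketch can make concrete. The function names and both example matrices below are hypothetical choices made for illustration; matrices are stored as plain lists of rows.

```python
def mat_vec(M, p):
    """Multiply a 4x4 matrix (list of rows) by the homogeneous point (x, y, z, 1)."""
    h = (p[0], p[1], p[2], 1.0)
    return [sum(m * c for m, c in zip(row, h)) for row in M]

def planar_uv(M_t, p):
    """Planar projection: affine multiply, keep u and v, ignore the third coordinate."""
    u, v, _, _ = mat_vec(M_t, p)
    return (u, v)

def projective_uv(P_t, p):
    """Projective texture coordinates: same multiply, then divide by w."""
    u, v, _, w = mat_vec(P_t, p)
    return (u / w, v / w)

# Hypothetical affine matrix: project along z, remap x and y from [-1, 1] to [0, 1].
M_t = [[0.5, 0.0, 0.0, 0.5],
       [0.0, 0.5, 0.0, 0.5],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

# Hypothetical projective matrix: project toward the origin onto the plane z = 1,
# so the last row is [0, 0, 1, 0] rather than [0, 0, 0, 1].
P_t = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0, 0.0]]
```

With `M_t`, points that differ only in z receive the same (u, v), which is the front/back collision described for closed shapes.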
Projective texture coordinates are important in the technique of shadow mapping, discussed in Section 11.4.4.
投影纹理坐标在阴影映射技术中非常重要,第 11.4.4 节将对此进行讨论。
For spheres, the latitude/longitude parameterization is familiar and widely used. It has a lot of distortion near the poles, which can lead to difficulties, but it does cover the whole sphere with discontinuities only along one line of longitude (a meridian).
对于球体,纬度/经度参数化很常见且应用广泛。它在极点附近有很多扭曲,这可能会导致困难,但它确实覆盖了整个球体,并且只沿一条经线(子午线)存在不连续。
Surfaces that are roughly spherical in shape can be parameterized using a texture coordinate function that maps a point on the surface to a point on a sphere using radial projection: take a line from the center of the sphere through the point on the surface, and find the intersection with the sphere. The spherical coordinates of this intersection point are the texture coordinates of the point you started with on the surface.
形状大致为球形的表面可以使用纹理坐标函数进行参数化,该函数使用径向投影将表面上的点映射到球体上的点:从球体中心通过表面上的点画一条线,并找到与球体的交点。此交点的球面坐标就是表面上起始点的纹理坐标。
Another way to say this is that you express the surface point in spherical coordinates (ρ,θ,ϕ) and then discard the ρ coordinate and map θ and ϕ each to the range [0,1]. The formula depends on the spherical coordinates convention; using the convention of Section 2.7.8,
另一种说法是,用球面坐标 ( ρ , θ , ϕ ) 表示表面点,然后丢弃ρ坐标,将θ和ϕ分别映射到 [0,1] 范围内。该公式取决于球面坐标约定;使用第 2.7.8 节的约定,
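A sketch of one such formula follows. It assumes that z is the polar axis, that the azimuth comes from atan2(y, x), and that the polar angle comes from acos(z/ρ); this is an assumption about the convention, and `spherical_uv` is a name chosen here for illustration.

```python
import math

def spherical_uv(x, y, z):
    """Radial projection to a sphere, with both angles remapped to [0, 1].

    Assumes z is the polar axis and (x, y, z) is not the origin.
    """
    rho = math.sqrt(x * x + y * y + z * z)        # radius, discarded below
    cos_polar = max(-1.0, min(1.0, z / rho))      # guard acos against rounding
    u = (math.pi + math.atan2(y, x)) / (2.0 * math.pi)   # longitude -> [0, 1]
    v = (math.pi - math.acos(cos_polar)) / math.pi       # latitude  -> [0, 1]
    return (u, v)
```

Because only the direction of the point matters, scaling a point away from the center leaves its texture coordinates unchanged.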
This and other texture coordinate functions in this chapter are for objects that are in the box [−1, 1]³ and centered at the origin.
本章中的该函数和其他纹理坐标函数适用于位于范围 [−1, 1]³ 内并以原点为中心的对象。
A spherical coordinates map will be bijective everywhere except at the poles if the whole surface is visible from the center point. It inherits the same distortion near the poles as the latitude–longitude map on the sphere. Figure 11.8 shows an object for which spherical coordinates provide a suitable texture coordinate function.
如果整个表面从中心点可见,则球面坐标图在除极点之外的任何地方都是双射的。它继承了极点附近的相同扭曲,就像球面上的纬度-经度图一样。图 11.8显示了球面坐标为其提供合适纹理坐标函数的对象。
Figure 11.8. For this vaguely sphere-like object, projecting each point onto a sphere centered at the center of the object provides an injective mapping, which here is used to place the same map texture as was used for the globe images. Note that areas become magnified (surface points are crowded together in texture space) where the surface is far from the center, and areas shrink where the surface is closer to the center.
图 11.8。对于这个模糊的球形物体,将每个点投影到以物体中心为中心的球体上可形成一个单射映射,这里使用该映射放置与地球图像相同的地图纹理。请注意,当表面远离中心时,区域会放大(表面点在纹理空间中挤在一起),而当表面靠近中心时,区域会缩小。
For objects that are more columnar than spherical, projection outward from an axis onto a cylinder may work better than projection from a point onto a sphere (Figure 11.9). Analogously to spherical projection, this amounts to converting to cylindrical coordinates and discarding the radius:
对于柱状而非球形的物体,从轴向外投影到圆柱体可能比从点投影到球体效果更好(图 11.9 )。与球面投影类似,这相当于转换为圆柱坐标并丢弃半径:
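A minimal sketch of this map, assuming the cylinder axis is z and the object lies in the box [−1, 1]³ so that z can be remapped linearly to [0, 1] (both are assumptions, as is the name `cylindrical_uv`):

```python
import math

def cylindrical_uv(x, y, z):
    """Convert to cylindrical coordinates about the z axis and drop the radius."""
    u = (math.pi + math.atan2(y, x)) / (2.0 * math.pi)   # angle around the axis
    v = 0.5 * (1.0 + z)                                  # height, from [-1, 1] to [0, 1]
    return (u, v)
```

The distance from the axis never enters the result, so every point on a ray leaving the axis at a given height gets the same texture coordinates.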
Figure 11.9. A far-from-spherical vase for which spherical projection produces a lot of distortion (left) and cylindrical projection produces a very good result on the outer surface (right).
图 11.9.远非球形的花瓶,球面投影会产生很大的扭曲(左),而圆柱投影在外表面会产生非常好的效果(右)。
Using spherical coordinates to parameterize a spherical or sphere-like shape leads to high distortion of shape and area near the poles, which often leads to visible artifacts that reveal that there are two special points where something is going wrong with the texture. A popular alternative is much more uniform at the cost of having more discontinuities. The idea is to project onto a cube, rather than a sphere, and then use six separate square textures for the six faces of the cube. The collection of six square textures is called a cubemap. This introduces discontinuities along all the cube edges, but it keeps distortion of shape and area low.
使用球面坐标来参数化球形或类球形会导致极点附近的形状和面积严重失真,这通常会导致可见的瑕疵,表明有两个特殊点的纹理出了问题。一种流行的替代方法是更加统一,但代价是有更多的不连续性。这个想法是投影到立方体而不是球体上,然后对立方体的六个面使用六个独立的方形纹理。六个方形纹理的集合称为立方体贴图。这会在所有立方体边缘引入不连续性,但它可以保持形状和面积的低失真。
Computing cubemap texture coordinates is also cheaper than for spherical coordinates, because projecting onto a plane just requires a division, essentially the same as perspective projection for viewing. For instance, for a point that projects onto the +z face of the cube:
计算立方体贴图纹理坐标也比计算球面坐标便宜,因为投影到平面上只需要除法——本质上与透视投影相同。例如,对于投影到立方体 + z面上的点:
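For that face the projection onto the plane z = 1 is a single division by $z$ (a reconstruction of the elided formula, stated before any per-face sign convention or remapping to $[0,1]$ is applied):

```latex
(x, y, z) \mapsto \left( \frac{x}{z},\; \frac{y}{z} \right)
```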
A confusing aspect of cubemaps is establishing the convention for how the u and v directions are defined on the six faces. Any convention is fine, but the convention chosen affects the contents of textures, so standardization is important. Because cubemaps are very often used for textures that are viewed from the inside of the cube (see environment mapping in Section 11.4.5), the usual conventions have the u and v axes oriented so that u is clockwise from v as viewed from inside. The convention used by OpenGL is
立方体贴图的一个令人困惑的方面是建立在六个面上如何定义 u 和 v 方向的约定。任何约定都可以,但所选的约定会影响纹理的内容,因此标准化很重要。由于立方体贴图通常用于从立方体内部查看的纹理(请参阅第 11.4.5 节中的环境映射),因此通常的约定是将 u 和 v 轴定向为从内部查看时 u 相对于 v 顺时针方向。OpenGL 使用的约定是
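The per-face maps below follow the cube map face-selection table in the OpenGL specification, rewritten in this chapter's $\phi$ notation; the typesetting is a reconstruction and may differ from the original layout:

```latex
\begin{aligned}
\phi_{-x}(x,y,z) &= \tfrac{1}{2}\left[1 + (+z,\,-y)/|x|\right], &
\phi_{+x}(x,y,z) &= \tfrac{1}{2}\left[1 + (-z,\,-y)/|x|\right], \\
\phi_{-y}(x,y,z) &= \tfrac{1}{2}\left[1 + (+x,\,-z)/|y|\right], &
\phi_{+y}(x,y,z) &= \tfrac{1}{2}\left[1 + (+x,\,+z)/|y|\right], \\
\phi_{-z}(x,y,z) &= \tfrac{1}{2}\left[1 + (-x,\,-y)/|z|\right], &
\phi_{+z}(x,y,z) &= \tfrac{1}{2}\left[1 + (+x,\,-y)/|z|\right].
\end{aligned}
```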
The subscripts indicate which face of the cube each projection corresponds to. For example, ϕ−x is used for points that project to the face of the cube at x = −1. You can tell which face a point projects to by looking at the coordinate with the largest absolute value: for example, if |x| > |y| and |x| > |z|, the point projects to the +x or −x face, depending on the sign of x.
下标表示每个投影对应立方体的哪个面。例如,φ−x 用于投影到立方体 x = −1 处的面的点。您可以通过查看绝对值最大的坐标来判断点投影到哪个面:例如,如果 |x| > |y| 且 |x| > |z|,则该点投影到 +x 或 −x 面,具体取决于 x 的符号。
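Face selection and the per-face division can be sketched together. The sign choices follow the cube map face-selection table in the OpenGL specification; the function name, the string face labels, and the tie-breaking order on cube edges are assumptions made for this sketch.

```python
def cubemap_uv(x, y, z):
    """Return (face, u, v) for a nonzero point (x, y, z), with u and v in [0, 1]."""
    ax, ay, az = abs(x), abs(y), abs(z)
    # The coordinate with the largest magnitude selects the face; its sign
    # picks between the positive and negative face on that axis.
    if ax >= ay and ax >= az:
        face, sc, tc, ma = ("+x", -z, -y, ax) if x > 0 else ("-x", z, -y, ax)
    elif ay >= az:
        face, sc, tc, ma = ("+y", x, z, ay) if y > 0 else ("-y", x, -z, ay)
    else:
        face, sc, tc, ma = ("+z", x, -y, az) if z > 0 else ("-z", -x, -y, az)
    # Dividing by the major-axis magnitude projects onto the face in [-1, 1];
    # the final remap moves that range to [0, 1].
    return face, 0.5 * (1.0 + sc / ma), 0.5 * (1.0 + tc / ma)
```

Only comparisons and one division per coordinate are needed, with no trigonometric functions.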
A texture to be used with a cube map has six square pieces. (See Figure 11.10.) Often they are packed together in a single image for storage, arranged as if the cube was unwrapped.
与立方体贴图一起使用的纹理有六个方形部分。(见图11.10 。)通常,它们被打包在一起存储在单个图像中,排列方式就像立方体被展开一样。
Figure 11.10. A surface being projected into a cubemap. Points on the surface project outward from the center, each mapping to a point on one of the six faces.
图 11.10。投影到立方体贴图中的表面。表面上的点从中心向外投影,每个点都映射到六个面之一上的一个点。
For more fine-grained control over the texture coordinate function on a triangle mesh surface, you can explicitly store the texture coordinates at each vertex, and interpolate them across the triangles using barycentric interpolation (Section 9.1.2). It works in exactly the same way as any other smoothly varying quantities you might define over a mesh: colors, normals, even the 3D position itself.
为了对三角形网格表面上的纹理坐标函数进行更细粒度的控制,您可以明确存储每个顶点的纹理坐标,并使用重心插值在三角形之间进行插值(第 9.1.2 节)。它的工作方式与您在网格上定义的任何其他平滑变化的量完全相同:颜色、法线,甚至 3D 位置本身。
The idea of interpolated texture coordinates is very simple—but it can be a bit confusing at first.
插值纹理坐标的概念非常简单——但一开始可能会有点令人困惑。
Let’s look at an example with a single triangle. Figure 11.11 shows a triangle texture-mapped with part of the by-now-familiar test pattern. By looking at the pattern that appears on the rendered triangle, you can deduce that the texture coordinates of the three vertices are (0.2, 0.2), (0.8, 0.2), and (0.2, 0.8), because those are the points in the texture that appear at the three corners of the triangle. Just as with the geometrically determined mappings in the previous section, we control where the texture goes on the surface by giving the mapping from the surface to the texture domain, in this case by specifying where each vertex should go in texture space. Once you position the vertices, linear (barycentric) interpolation across triangles takes care of the rest.
让我们看一个只有一个三角形的例子。图 11.11显示了使用现在熟悉的测试图案的一部分映射的三角形纹理。通过查看渲染三角形上出现的图案,您可以推断出三个顶点的纹理坐标为 (0.2, 0.2)、(0.8, 0.2) 和 (0.2, 0.8),因为这些是纹理中出现在三角形三个角上的点。就像上一节中几何确定的映射一样,我们通过提供从表面到纹理域的映射来控制纹理在表面上的位置,在本例中通过指定每个顶点在纹理空间中的位置。一旦您定位了顶点,三角形之间的线性(重心)插值将处理其余部分。
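The interpolation step for this triangle can be sketched in a few lines. The function name and the tuple representation are illustrative assumptions; the vertex texture coordinates are the ones read off the figure in the text.

```python
def interpolate_uv(bary, uv0, uv1, uv2):
    """Barycentric interpolation of per-vertex texture coordinates.

    bary holds the three barycentric weights of the surface point; they are
    expected to sum to 1.
    """
    a, b, c = bary
    return (a * uv0[0] + b * uv1[0] + c * uv2[0],
            a * uv0[1] + b * uv1[1] + c * uv2[1])

# Texture coordinates of the three vertices in the single-triangle example.
UV0, UV1, UV2 = (0.2, 0.2), (0.8, 0.2), (0.2, 0.8)
```

At a vertex the weights are (1, 0, 0), so the lookup lands exactly on that vertex's texture coordinates; at the midpoint of the edge opposite the first vertex the weights are (0, 0.5, 0.5) and the lookup lands near (0.5, 0.5).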
Figure 11.11. A single triangle using linearly interpolated texture coordinates. (a) The triangle drawn in texture space; (b) the triangle rendered in a 3D scene.
图 11.11。使用线性插值纹理坐标的单个三角形。(a)在纹理空间中绘制的三角形;(b)在 3D 场景中渲染的三角形。
In Figure 11.12, we show a common way to visualize texture coordinates on a whole mesh: simply draw triangles in texture space with the vertices positioned at their texture coordinates. This visualization shows you what parts of the texture are being used by which triangles, and it is a handy tool for evaluating texture coordinates and for debugging all sorts of texture-mapping code.
在图 11.12中,我们展示了一种可视化整个网格上的纹理坐标的常用方法:只需在纹理空间中绘制三角形,并将顶点定位在其纹理坐标上。此可视化显示了哪些三角形使用了纹理的哪些部分,它是评估纹理坐标和调试各种纹理映射代码的便捷工具。
Figure 11.12. An icosahedron with its triangles laid out in texture space to provide zero distortion but with many seams.
图 11.12.二十面体及其三角形在纹理空间中的布局可实现零失真,但存在许多接缝。
The quality of a texture coordinate mapping that is defined by vertex texture coordinates depends on what coordinates are assigned to the vertices—that is, how the mesh is laid out in texture space. No matter what coordinates are assigned, as long as the triangles in the mesh share vertices (Section 12.1), the texture coordinate mapping is always continuous, because neighboring triangles will agree on the texture coordinate at points on their shared edge. But the other desirable qualities described above are not so automatic. Injectivity means the triangles don’t overlap in texture space—if they do, it means there’s some point in the texture that will show up at more than one place on the surface.
由顶点纹理坐标定义的纹理坐标映射的质量取决于分配给顶点的坐标 - 即网格在纹理空间中的布局方式。无论分配了什么坐标,只要网格中的三角形共享顶点(第 12.1 节),纹理坐标映射始终是连续的,因为相邻三角形将在其共享边缘上的点上就纹理坐标达成一致。但上面描述的其他理想品质并不是那么自动的。单射性意味着三角形在纹理空间中不重叠 - 如果它们重叠,则意味着纹理中的某个点将出现在表面上的多个位置。
Size distortion is low when the areas of triangles in texture space are in proportion to their areas in 3D. For instance, if a character’s face is mapped with a continuous texture coordinate function, one often ends up with the nose squeezed into a relatively small area in texture space, as shown in Figure 11.13. Although triangles on the nose are smaller than on the cheek, the ratio of sizes is more extreme in texture space. The result is that the texture is enlarged on the nose, because a small area of texture has to cover a large area of surface. Similarly, comparing the forehead to the temple, the triangles are similar in size in 3D, but the triangles around the temple are larger in texture space, causing the texture to appear smaller there.
当纹理空间中三角形的面积与其在 3D 中的面积成比例时,尺寸失真较小。例如,如果使用连续纹理坐标函数映射角色的脸部,则通常会将鼻子挤进纹理空间中相对较小的区域中,如图 11.13所示。尽管鼻子上的三角形比脸颊上的小,但纹理空间中的尺寸比更为极端。结果是纹理在鼻子上被放大,因为小面积的纹理必须覆盖大面积的表面。类似地,将前额与太阳穴进行比较,三角形在 3D 中的大小相似,但太阳穴周围的三角形在纹理空间中较大,导致那里的纹理看起来较小。
Figure 11.13. A face model, with texture coordinates assigned so as to achieve reasonably low shape distortion, but still showing moderate area distortion.
图 11.13.脸部模型,其纹理坐标已分配,从而实现了相当低的形状失真,但仍然显示出中等的区域失真。
Similarly, shape distortion is low when the shapes of triangles are similar in 3D and in texture space. The face example has fairly low shape distortion, but, for example, the sphere in Figure 11.15 has very large shape distortion near the poles.
类似地,当三角形的形状在 3D 和纹理空间中相似时,形状失真也很小。脸部示例的形状失真相当小,但是,例如,图 11.15中的球体在极点附近具有非常大的形状失真。
It’s often useful to allow texture coordinates to go outside the bounds of the texture image. Sometimes, this is a detail: rounding error in a texture coordinate calculation might cause a vertex that lands exactly on the texture boundary to be slightly outside, and the texture mapping machinery should not fail in that case. But it can also be a modeling tool.
允许纹理坐标超出纹理图像的边界通常很有用。有时,这是一个细节:纹理坐标计算中的舍入误差可能会导致恰好位于纹理边界上的顶点略微超出范围,并且纹理映射机制在这种情况下不应失败。但它也可以是一种建模工具。
If a texture is only supposed to cover part of the surface, but texture coordinates are already set up to map the whole surface to the unit square, one option is to prepare a texture image that is mostly blank with the content in a small area. But that might require a very high resolution texture image to get enough detail in the relevant area. Another alternative is to scale up all the texture coordinates so that they cover a larger range—[−4.5,5.5] × [−4.5,5.5] for instance, to position the unit square at one-tenth size in the center of the surface.
如果纹理仅应覆盖部分表面,但已设置纹理坐标以将整个表面映射到单位正方形,则一种选择是准备一个大部分空白且内容位于一小块区域中的纹理图像。但这可能需要非常高分辨率的纹理图像才能在相关区域获得足够的细节。另一种选择是放大所有纹理坐标,使其覆盖更大的范围 — 例如 [−4.5,5.5] × [−4.5,5.5],将单位正方形定位在表面中心十分之一大小的位置。
For a case like this, texture lookups outside the unit-square area that’s covered by the texture image should return a constant background color. One way to do this is to set a background color to be returned by texture lookups outside the unit square. If the texture image already has a constant background color (for instance, a logo on a white background), another way to extend this background automatically over the plane is to arrange for lookups outside the unit square to return the color of the texture image at the closest point on the edge, achieved by clamping the u and v coordinates to the range from the first pixel to the last pixel in the image.
对于这种情况,纹理图像覆盖的单位正方形区域之外的纹理查找应返回恒定的背景颜色。实现此目的的一种方法是设置单位正方形之外的纹理查找返回的背景颜色。如果纹理图像已经具有恒定的背景颜色(例如,白色背景上的徽标),则另一种将此背景自动扩展到平面上的方法是安排单位正方形之外的查找返回边缘上最近点的纹理图像颜色,通过将u 和 v 坐标限制在图像中从第一个像素到最后一个像素的范围内来实现。
Sometimes, we want a repeating pattern, such as a checkerboard, a tile floor, or a brick wall. If the pattern repeats on a rectangular grid, it would be wasteful to create an image with many copies of the same data. Instead, we can handle texture lookups outside the texture image using wraparound indexing—when the lookup point exits the right edge of the texture image, it wraps around to the left edge. This is handled very simply using the integer remainder operation on the pixel coordinates.
有时,我们需要一个重复的图案,例如棋盘、瓷砖地板或砖墙。如果图案在矩形网格上重复,那么使用相同数据的许多副本创建图像会很浪费。相反,我们可以使用环绕索引来处理纹理图像外部的纹理查找 - 当查找点离开纹理图像的右边缘时,它会环绕到左边缘。使用像素坐标上的整数余数运算可以非常简单地处理此问题。
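// Tiled lookup: indices outside the image wrap around to the opposite edge.
Color texture_lookup_wrap(Texture t, float u, float v) {
    int i = round(u * t.width() - 0.5);
    int j = round(v * t.height() - 0.5);
    // Adding the size before the second remainder keeps negative indices
    // in [0, size - 1] in languages where % can return a negative value.
    int wi = (i % t.width() + t.width()) % t.width();
    int wj = (j % t.height() + t.height()) % t.height();
    return t.get_pixel(wi, wj);
}

// Clamped lookup: indices outside the image use the nearest edge texel.
Color texture_lookup_clamp(Texture t, float u, float v) {
    int i = round(u * t.width() - 0.5);
    int j = round(v * t.height() - 0.5);
    return t.get_pixel(max(0, min(i, t.width() - 1)),
                       max(0, min(j, t.height() - 1)));
}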
The choice between these two ways of handling out-of-bounds lookups is specified by selecting a wrapping mode from a list that includes tiling, clamping, and often combinations or variants of the two. With wrapping modes, we can freely think of a texture as a function that returns a color for any point in the infinite 2D plane (Figure 11.14). When we specify a texture using an image, these modes describe how the finite image data are supposed to be used to define this function. In Section 11.5, we’ll see that procedural textures can naturally extend across an infinite plane, since they are not limited by finite image data. Since both are logically infinite in extent, the two types of textures are interchangeable.
处理越界查找的这两种方式之间的选择是通过从包括平铺、夹紧以及两者的组合或变体的列表中选择包装模式来指定的。使用包装模式,我们可以自由地将纹理视为一个函数,它返回无限二维平面中任意点的颜色(图 11.14 )。当我们使用图像指定纹理时,这些模式描述了有限图像数据应该如何使用来定义此函数。在第 11.5 节中,我们将看到程序纹理可以自然地延伸到无限平面,因为它们不受有限图像数据的限制。由于两者在逻辑上都是无限的,因此这两种类型的纹理可以互换。
When adjusting the scale and placement of textures, it’s convenient to avoid actually changing the functions that generate texture coordinates, or the texture coordinate values stored at vertices of meshes, by instead applying a matrix transformation to the texture coordinates before using them to sample the texture:
当调整纹理的比例和位置时,可以很方便地避免实际改变生成纹理坐标的函数或存储在网格顶点的纹理坐标值,而是在使用它们对纹理坐标进行采样之前对其进行矩阵变换:
Figure 11.14. A wood floor texture tiled over texture space by wrapping texel coordinates.
图 11.14.通过包裹纹素坐标,将木地板纹理平铺在纹理空间上。
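In symbols, using the names defined in the text that follows (a reconstruction of the elided equation, with $\mathbf{x}$ standing for the surface point):

```latex
\phi(\mathbf{x}) = M_T \, \phi_{\text{model}}(\mathbf{x})
```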
where ϕmodel is the texture coordinate function provided with the model, and MT is a 3 by 3 matrix representing an affine or projective transformation of the 2D texture coordinates using homogeneous coordinates. Such a transformation, sometimes limited just to scaling and/or translation, is supported by most renderers that use texture mapping.
其中φ model是随模型提供的纹理坐标函数, M T是一个 3×3 矩阵,表示使用齐次坐标对 2D 纹理坐标进行仿射或投影变换。大多数使用纹理映射的渲染器都支持这种变换,有时仅限于缩放和/或平移。
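As a small sketch of such a transform, the code below applies a 3×3 homogeneous matrix to a pair of texture coordinates. The function name and the example matrix are hypothetical; the matrix is chosen to reproduce the earlier example of stretching the unit square to [−4.5, 5.5] × [−4.5, 5.5].

```python
def transform_uv(M_T, uv):
    """Apply a 3x3 homogeneous transform (list of rows) to 2D texture coordinates."""
    h = (uv[0], uv[1], 1.0)
    u, v, w = (sum(m * c for m, c in zip(row, h)) for row in M_T)
    return (u / w, v / w)      # w is 1 for affine matrices

# Hypothetical example: scale by 10 about the center (0.5, 0.5).
M_T = [[10.0, 0.0, -4.5],
       [0.0, 10.0, -4.5],
       [0.0, 0.0, 1.0]]
```

The model's stored coordinates stay untouched; only the matrix changes when the texture is rescaled or repositioned.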
Although low distortion and continuity are nice properties to have in a texture coordinate function, discontinuities are often unavoidable. For any closed 3D surface, it’s a basic result of topology that there is no continuous, bijective function that maps the whole surface into a texture image. Something has to give, and by introducing seams (curves on the surface where the texture coordinates change suddenly) we can have low distortion everywhere else. Many of the geometrically determined mappings discussed above already contain seams: in spherical and cylindrical coordinates, the seams are where the angle computed by atan2 wraps around from π to −π, and in the cubemap, the seams are along the cube edges, where the mapping switches between the six square textures.
虽然低失真和连续性是纹理坐标函数的良好特性,但不连续性往往是不可避免的。对于任何封闭的 3D 表面,拓扑的基本结果是没有连续的双射函数将整个表面映射到纹理图像中。必须有所作为,通过引入接缝(纹理坐标突然改变的表面曲线),我们可以在其他地方实现低失真。上面讨论的许多几何确定的映射已经包含接缝:在球面和圆柱坐标中,接缝是 atan2 计算的角度从 π 绕到 -π 的地方,而在立方体贴图中,接缝沿着立方体边缘,映射在六个方形纹理之间切换。
With interpolated texture coordinates, seams require special consideration, because they don’t happen naturally. We observed earlier that interpolated texture coordinates are automatically continuous on shared-vertex meshes—the sharing of texture coordinates guarantees it. But this means that if a triangle spans a seam, with some vertices on one side and some on the other, the interpolation machinery will cheerfully provide a continuous mapping, but it will likely be highly distorted or fold over so that it’s not injective. Figure 11.15 illustrates this problem on a globe mapped with spherical coordinates. For example, there is a triangle near the bottom of the globe that has one vertex at the tip of New Zealand’s South Island, and another vertex in the Pacific about 400 km northeast of the North Island. A sensible pilot flying between these points would fly over New Zealand, but the path starts at longitude 167° E (+167) and ends at 179° W (i.e., longitude −179), so linear interpolation chooses a route that crosses South America on the way. This causes a backward copy of the entire map to be compressed into the strip of triangles that crosses the 180th meridian! The solution is to label the second vertex with the equivalent longitude of 181° E, but this just pushes the problem to the next triangle.
对于插值纹理坐标,需要特别考虑接缝,因为它们不会自然发生。我们之前观察到,插值纹理坐标在共享顶点网格上自动连续 — 纹理坐标的共享保证了这一点。但这意味着,如果一个三角形跨越接缝,一些顶点在一侧,一些顶点在另一侧,插值机制将愉快地提供连续映射,但它可能会高度扭曲或折叠,因此它不是单射的。图 11.15在使用球面坐标映射的地球上说明了这个问题。例如,在地球底部附近有一个三角形,它的一个顶点位于新西兰南岛的顶端,另一个顶点位于北岛东北约 400 公里的太平洋。明智的飞行员在这两个点之间飞行时会飞越新西兰,但路径始于东经 167° (+167),止于西经 179° (即经度 -179),因此线性插值会选择一条途经南美洲的路线。这会导致整个地图的反向副本被压缩成穿过 180 度子午线的三角形带!解决方案是将第二个顶点标记为等效经度 181° E,但这只会将问题推到下一个三角形。
Figure 11.15. Polygonal globes: on the left, with all shared vertices, the texture coordinate function is continuous, but necessarily has problems with triangles that cross the 180th meridian, because texture coordinates are interpolated from longitudes near 180 to longitudes near −180. On the right, some vertices are duplicated, with identical 3D positions but texture coordinates differing by exactly 360° in longitude, so that texture coordinates are interpolated across the meridian rather than all the way across the map.
图 11.15。多边形地球:左侧所有顶点都共享,纹理坐标函数是连续的,但对于跨越 180 度子午线的三角形,纹理坐标函数必然存在问题,因为纹理坐标是从 180 度附近的经度到 -180 度附近的经度进行插值的。右侧的一些顶点是重复的,3D 位置相同,但纹理坐标在经度上相差 360°,因此纹理坐标是跨子午线进行插值的,而不是跨整个地图。
The only way to create a clean transition is to avoid sharing texture coordinates at the seam: the triangle crossing New Zealand needs to interpolate to longitude +181, and the next triangle in the Pacific needs to continue starting from longitude −179. To do this, we duplicate the vertices at the seam: for each vertex, we add a second vertex with an equivalent longitude, differing by 360°, and the triangles on opposite sides of the seam use different vertices. This solution is shown in the right half of Figure 11.15, in which the vertices at the far left and right of the texture space are duplicates, with the same 3D positions.
实现干净过渡的唯一方法是避免在接缝处共享纹理坐标:横跨新西兰的三角形需要插值到经度 +181,而太平洋的下一个三角形需要从经度 −179 开始继续。为此,我们在接缝处复制顶点:对于每个顶点,我们添加第二个具有等效经度的顶点,相差 360°,并且接缝两侧的三角形使用不同的顶点。该解决方案如图 11.15的右半部分所示,其中纹理空间最左侧和最右侧的顶点是重复的,具有相同的 3D 位置。
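The duplication step can be sketched as follows. This is a simplified illustration, not a complete mesh tool: it tracks only longitudes, it treats any triangle whose longitudes span more than 180° as crossing the seam (a heuristic that ignores triangles touching the poles), and the function name and data layout are assumptions.

```python
def split_seam(longitudes, triangles):
    """Duplicate vertices so that no triangle interpolates across the seam.

    longitudes: per-vertex longitude in degrees, in [-180, 180].
    triangles:  list of (i, j, k) vertex-index triples.
    Returns (longitudes, triangles, source); source[n] is the original vertex
    whose 3D position the new vertex n shares.
    """
    lon = list(longitudes)
    source = list(range(len(lon)))
    duplicate = {}                    # original index -> index of its +360 copy
    out = []
    for tri in triangles:
        tri_lon = [lon[i] for i in tri]
        if max(tri_lon) - min(tri_lon) > 180.0:      # triangle spans the seam
            fixed = []
            for i in tri:
                if lon[i] < 0.0:                     # relabel western vertices
                    if i not in duplicate:
                        duplicate[i] = len(lon)
                        lon.append(lon[i] + 360.0)   # e.g. -179 becomes 181
                        source.append(i)
                    i = duplicate[i]
                fixed.append(i)
            tri = tuple(fixed)
        out.append(tuple(tri))
    return lon, out, source
```

Triangles that do not cross the seam keep their original vertices, so a vertex at longitude −179 is still used as −179 on the far side while its copy at +181 serves the triangles that straddle the meridian.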
Textures are used in all kinds of rendering systems, and although the fundamentals are the same, the details are different for ray tracing and rasterization systems.
纹理用于各种渲染系统,虽然基本原理相同,但光线追踪和光栅化系统的细节不同。
Texture coordinates are part of the model being rendered, and the scene description needs to include enough information to define what they are. Mostly, this means storing texture coordinates as per-vertex attributes of all triangle meshes that will be used with textures. If the rendering system directly supports geometric primitives other than meshes, these primitives usually have pre-defined texture coordinates (e.g., latitude–longitude coordinates on spheres), possibly with a choice of mapping schemes for each primitive type.
纹理坐标是正在渲染的模型的一部分,场景描述需要包含足够的信息来定义它们是什么。大多数情况下,这意味着将纹理坐标存储为将与纹理一起使用的所有三角形网格的每个顶点属性。如果渲染系统直接支持网格以外的几何图元,这些图元通常具有预定义的纹理坐标(例如球体上的纬度-经度坐标),可能为每种图元类型选择映射方案。
In a ray tracing renderer, each type of surface that supports ray intersection must be able to compute not just the intersection point and surface normal, but also the texture coordinates of the intersection point. Like the other information about the intersection, texture coordinates can be stored in a hit record (see Section 4.4.3). In the common case of geometry represented by triangle meshes, the ray–triangle intersection code will compute texture coordinates by barycentric interpolation from the texture coordinates stored at the vertices, and for other types of geometry, the intersection code must compute the texture coordinates directly.
在光线追踪渲染器中,支持光线相交的每种类型的表面不仅必须能够计算相交点和表面法线,还必须能够计算相交点的纹理坐标。与有关相交的其他信息一样,纹理坐标可以存储在命中记录中(参见第 4.4.3 节)。在由三角形网格表示的几何体的常见情况下,光线三角形相交代码将通过重心插值从存储在顶点的纹理坐标计算纹理坐标,对于其他类型的几何体,相交代码必须直接计算纹理坐标。
In a rasterization-based system, triangles will normally be the only supported type of geometry, so all surfaces must be converted to this form. Texture coordinates can be read in with the model (the common case), or for triangle meshes that are generated in code, they can be computed and stored at the time the mesh is created. Alternatively, for texture coordinates that can be computed from other vertex data (for instance, where texture coordinates are computed from the 3D position), texture coordinates can also be computed in a vertex shader and passed on to the rasterizer. Texture coordinates are then interpolated by the rasterizer, so that every invocation of the fragment shader has the appropriate texture coordinates for its fragment.
在基于光栅化的系统中,三角形通常是唯一受支持的几何体类型,因此所有表面都必须转换为这种形式。纹理坐标可以随模型一起读入(常见情况),或者对于在代码中生成的三角形网格,可以在创建网格时计算并存储纹理坐标。或者,对于可以从其他顶点数据计算出的纹理坐标(例如,从 3D 位置计算出纹理坐标),也可以在顶点着色器中计算纹理坐标并将其传递给光栅化器。然后,光栅化器对纹理坐标进行插值,以便每次调用片段着色器时都有适合其片段的纹理坐标。
The second fundamental problem of texture mapping is antialiasing. Rendering a texture mapped image is a sampling process: mapping the texture onto the surface and then projecting the surface into the image produce a 2D function across the image plane, and we are sampling it at pixels. As we saw in Chapter 10, doing this using point samples will produce aliasing artifacts when the image contains detail or sharp edges—and since the whole point of textures is to introduce detail, they become a prime source of aliasing problems like the ones we saw in Figure 11.3.
纹理映射的第二个基本问题是抗锯齿。渲染纹理映射图像是一个采样过程:将纹理映射到表面上,然后将表面投影到图像中,在图像平面上产生一个 2D 函数,然后我们以像素为单位对其进行采样。正如我们在第 10 章中看到的那样,使用点采样执行此操作会在图像包含细节或锐利边缘时产生锯齿伪影——而且由于纹理的全部目的是引入细节,因此它们成为锯齿问题的主要来源,就像我们在图 11.3中看到的那样。
It’s a good idea to review the first half of Chapter 10 now.
现在回顾一下第 10 章的前半部分是个好主意。
Just as with antialiased rasterization of lines or triangles (Section 9.3), antialiased ray tracing, or downsampling images (Section 10.4), the solution is to make each pixel not a point sample but an area average of the image, over an area similar in size to the pixel. Using the same supersampling approach used for antialiased rasterization and ray tracing, with enough samples, excellent results can be obtained with no changes to the texture mapping machinery: many samples within a pixel’s area will land at different places in the texture map, and averaging the shading results computed using the different texture lookups is an accurate way to approximate the average color of the image over the pixel. However, with detailed textures it takes very many samples to get good results, which is slow. Computing this area average efficiently in the presence of textures on the surface is the first key topic in texture antialiasing.
就像线条或三角形的抗锯齿光栅化(第 9.3 节)、抗锯齿光线追踪或图像下采样(第 10.4 节)一样,解决方案是使每个像素不再是一个点样本,而是图像的区域平均值,覆盖与像素大小相似的区域。使用与抗锯齿光栅化和光线追踪相同的过采样方法,如果有足够多的样本,则可以在不改变纹理映射机制的情况下获得出色的结果:像素区域内的许多样本将落在纹理贴图中的不同位置,并且对使用不同纹理查找计算出的着色结果求平均值是近似像素上图像平均颜色的准确方法。但是,对于详细的纹理,需要非常多的样本才能获得良好的结果,这会很慢。在表面存在纹理的情况下有效地计算这个面积平均值是纹理抗锯齿的第一个关键主题。
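To make the cost of supersampling concrete, here is a small sketch under simplifying assumptions: the pixel's footprint is taken to be the whole unit square of a procedural checkerboard texture, and samples are placed on a regular n × n grid. The names `checker`, `point_sample`, and `supersample` are hypothetical.

```python
import math


def checker(u, v, squares=8):
    """Procedural checkerboard texture: returns 0.0 or 1.0."""
    return float((math.floor(u * squares) + math.floor(v * squares)) % 2)


def point_sample(texture):
    """One sample at the center of the pixel's footprint."""
    return texture(0.5, 0.5)


def supersample(texture, n):
    """Average n*n samples on a regular grid over the footprint."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += texture((i + 0.5) / n, (j + 0.5) / n)
    return total / (n * n)


aliased = point_sample(checker)      # sees a single checker square
too_few = supersample(checker, 2)    # 4 samples still miss half the squares
averaged = supersample(checker, 8)   # 64 samples recover the true average
```

With 64 checker squares inside one pixel, the point sample and the 2 × 2 grid both return a single square's color, and only the 8 × 8 grid reaches the true average of 0.5, which illustrates why detailed textures need so many samples.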
Texture images are usually defined by raster images, so there is also a reconstruction problem to be considered, just as with upsampling images (Section 10.4). The solution is the same for textures: use a reconstruction filter to interpolate between texels.
纹理图像通常由光栅图像定义,因此也需要考虑重建问题,就像上采样图像一样(第 10.4 节)。纹理的解决方案是相同的:使用重建过滤器在纹素之间进行插值。
We expand on each of these topics in the following sections.
我们将在以下章节中详细讨论每个主题。
What makes antialiasing textures more complex than other kinds of antialiasing is that the relationship between the rendered image and the texture is constantly changing. Every pixel value should be computed as an average color over the area belonging to the pixel in the image, and in the common case that the pixel is looking at a single surface, this corresponds to averaging over an area on the surface. If the surface color comes from a texture, this in turn amounts to averaging over a corresponding part of the texture, known as the texture space footprint of the pixel. Figure 11.16 illustrates how the footprints of square areas (which could be pixel areas in a lower-resolution image) map to very different sized and shaped areas in the floor’s texture space.
抗锯齿纹理比其他类型的抗锯齿更复杂,因为渲染图像和纹理之间的关系在不断变化。每个像素值都应计算为图像中属于该像素的区域的平均颜色,在像素查看单个表面的常见情况下,这相当于对表面上的区域进行平均。如果表面颜色来自纹理,则这又相当于对纹理的相应部分进行平均,称为像素的纹理空间足迹。图 11.16说明了方形区域(可能是低分辨率图像中的像素区域)的足迹如何映射到地板纹理空间中大小和形状非常不同的区域。
Figure 11.16. The footprints in texture space of identically sized square areas in the image vary in size and shape across the image.
图 11.16.图像中相同大小的方形区域在纹理空间中的足迹在整个图像中的大小和形状各不相同。
Recall the three spaces involved in rendering with textures and the two mappings between them: the projection π, which maps 3D points into the image, and the texture coordinate function ϕ, which maps 3D points into texture space. To work with pixel footprints, we need to understand the composition of these two mappings: first follow π backwards to get from the image to the surface and then follow ϕ forwards. This composition $\psi = \phi \circ \pi^{-1}$ is what determines pixel footprints: the footprint of a pixel is the image of that pixel’s square area under the mapping ψ.
回想一下纹理渲染涉及的三个空间:将 3D 点映射到图像中的投影 π 和将 3D 点映射到纹理空间中的纹理坐标函数ϕ 。要处理像素足迹,我们需要了解这两个映射的组成:首先沿 π 向后从图像到达表面,然后沿ϕ向前。此组合ψ = ϕ ∘ π -1决定了像素足迹:像素的足迹是该像素在映射ψ下的图像的方形区域的图像。
The core problem in texture antialiasing is computing an average value of the texture over the footprint of a pixel. To do this exactly in general could be a pretty complicated job: for a faraway object with a complicated surface shape, the footprint could be a complicated shape covering a large area, or possibly several disconnected areas, in texture space. But in the typical case, a pixel lands in a smooth area of surface that is mapped to a single area in the texture.
纹理抗锯齿的核心问题是计算纹理在像素覆盖范围内的平均值。通常来说,要准确做到这一点可能是一项相当复杂的工作:对于具有复杂表面形状的远距离物体,覆盖范围可能是覆盖纹理空间中大面积或可能是多个不连续区域的复杂形状。但在典型情况下,像素落在表面的平滑区域,该区域映射到纹理中的单个区域。
Because ψ contains both the mapping from image to surface and the mapping from surface to texture, the size and shape of the footprint depend on both the viewing situation and the texture coordinate function. When a surface is closer to the camera, pixel footprints will be smaller; when the same surface moves farther away, the footprint gets bigger. When surfaces are viewed at an oblique angle, the footprint of a pixel on the surface is elongated, which usually means it will be elongated in texture space also. Even with a fixed view, the texture coordinate function can cause variations in the footprint: if it distorts area, the size of footprints will vary, and if it distorts shape, they can be elongated even for head-on views of the surface.
因为ψ既包含从图像到表面的映射,也包含从表面到纹理的映射,所以足迹的大小和形状取决于观察情况和纹理坐标函数。当表面离相机较近时,像素足迹会变小;当同一表面移得较远时,足迹会变大。当以斜角观察表面时,表面上像素的足迹会被拉长,这通常意味着它在纹理空间中也会被拉长。即使在固定视图下,纹理坐标函数也会导致足迹发生变化:如果它扭曲了面积,足迹的大小就会发生变化,如果它扭曲了形状,即使是正面观察表面,它们也会被拉长。
However, to find an efficient algorithm for computing antialiased lookups, some substantial approximations will be needed. When a function is smooth, a linear approximation is often useful. In the case of texture antialiasing, this means approximating the mapping ψ from image space to texture space as a linear mapping from 2D to 2D:

$$\psi(\mathbf{x}) \approx \psi(\mathbf{x}_0) + J(\mathbf{x} - \mathbf{x}_0),$$
然而,要找到一种计算抗锯齿查找的有效算法,需要进行一些实质性的近似。当函数平滑时,线性近似通常很有用。在纹理抗锯齿的情况下,这意味着将从图像空间到纹理空间的映射ψ近似为从 2D 到 2D 的线性映射:
In mathematicians’ terms, we have made a first-order Taylor series approximation to the function ψ.
用数学家的话来说,我们对函数ψ做了一项泰勒级数近似。
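where the two-by-two matrix J is some approximation to the derivative of ψ. It has four entries, and if we denote the image-space position as $\mathbf{x} = (x, y)$ and the texture-space position as $\mathbf{u} = (u, v)$, then

$$J = \begin{bmatrix} \dfrac{du}{dx} & \dfrac{du}{dy} \\[4pt] \dfrac{dv}{dx} & \dfrac{dv}{dy} \end{bmatrix},$$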
其中,二乘二矩阵J是ψ导数的近似值。它有四个元素,如果我们将图像空间位置表示为x = ( x,y ),将纹理空间位置表示为u = ( u,v ),则
where the four derivatives describe how the texture point (u,v) that is seen at a point (x,y) in the image changes when we change x and y.
其中四个导数描述了当我们改变x和y时,在图像中的点 ( x,y ) 处看到的纹理点 ( u,v ) 如何变化。
A geometric interpretation of this approximation (Figure 11.17) is that it says a unit-sized square pixel area centered at x in the image will map approximately to a parallelogram in texture space, centered at ψ(x) and with its edges parallel to the vectors $\mathbf{u}_x = (du/dx, dv/dx)$ and $\mathbf{u}_y = (du/dy, dv/dy)$.
这种近似的几何解释(图 11.17 )是,图像中以x为中心的单位大小的方形像素区域将近似映射到纹理空间中的平行四边形,以ψ ( x ) 为中心,其边缘平行于向量u x = ( du/dx,dv/dx ) 和u y = ( du/dy,dv/dy )。
Figure 11.17. An approximation of the texture-space footprint of a pixel can be made using the derivative of the mapping from (x,y) to (u,v). The partial derivatives with respect to x and y are parallel to the images of the x and y isolines (blue) and span a parallelogram (shaded in orange) that approximates the curved shape of the exact footprint (outlined in black).
图 11.17.可以使用从 ( x,y ) 到 ( u,v ) 的映射导数来近似像素的纹理空间足迹。关于 x 和 y 的偏导数与x和y等值线的图像平行(蓝色),并跨越平行四边形(橙色阴影),近似于精确足迹的弯曲形状(黑色轮廓)。
The derivative matrix J is useful because it tells the whole story of variation in the (approximated) texture-space footprint across the image. Derivatives that are larger in magnitude indicate larger texture-space footprints, and the relationship between the derivative vectors $\mathbf{u}_x$ and $\mathbf{u}_y$ indicates the shape. When they are orthogonal and the same length, the footprint is square, and as they become skewed and/or very different in length, the footprint becomes elongated.
导数矩阵J很有用,因为它可以说明整个图像中(近似的)纹理空间足迹的变化情况。导数幅度越大,纹理空间足迹越大,导数向量u x和u y之间的关系表示形状。当它们正交且长度相同时,足迹为正方形,当它们变得倾斜和/或长度差异很大时,足迹就会变长。
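As an illustration of how the derivative matrix and the parallelogram footprint fit together, the following sketch estimates J by central differences for an arbitrary mapping and builds the four footprint corners from its columns. The function names, the finite-difference step `h`, and the example mapping are assumptions for this example; a real renderer would typically obtain these derivatives analytically or from neighboring pixels.

```python
def jacobian(psi, x, y, h=1e-3):
    """Estimate J = [[du/dx, du/dy], [dv/dx, dv/dy]] by central differences."""
    u_xp, v_xp = psi(x + h, y)
    u_xm, v_xm = psi(x - h, y)
    u_yp, v_yp = psi(x, y + h)
    u_ym, v_ym = psi(x, y - h)
    return [[(u_xp - u_xm) / (2 * h), (u_yp - u_ym) / (2 * h)],
            [(v_xp - v_xm) / (2 * h), (v_yp - v_ym) / (2 * h)]]


def footprint_corners(psi, x, y):
    """Corners of the parallelogram approximating a unit pixel's footprint."""
    J = jacobian(psi, x, y)
    u_x = (J[0][0], J[1][0])  # first column: change in (u, v) per unit x
    u_y = (J[0][1], J[1][1])  # second column: change in (u, v) per unit y
    cu, cv = psi(x, y)        # footprint is centered at psi(x)
    corners = []
    for sx, sy in ((-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)):
        corners.append((cu + sx * u_x[0] + sy * u_y[0],
                        cv + sx * u_x[1] + sy * u_y[1]))
    return corners


# Hypothetical mapping that stretches and shears the image plane.
def example_psi(x, y):
    return (2.0 * x + 0.5 * y, 3.0 * y)


J = jacobian(example_psi, 10.0, 20.0)
corners = footprint_corners(example_psi, 10.0, 20.0)
```

For this linear example the parallelogram is exact; for a curved mapping it is only the local approximation described above.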
The approach here uses a box filter to sample the image. Some systems instead use a Gaussian pixel filter, which becomes an elliptical Gaussian in texture space; this is elliptical weighted averaging (EWA).
此处的方法是使用盒式过滤器对图像进行采样。有些系统则使用高斯像素过滤器,该过滤器在纹理空间中变为椭圆高斯;这就是椭圆加权平均 (EWA)。
We’ve now reached the form of the problem that’s usually thought of as the “right answer”: a filtered texture sample at a particular image-space position should be the average value of the texture map over the parallelogram-shaped footprint defined by the texture coordinate derivatives at that point. This already has some assumptions baked into it—namely, that the mapping from image to texture is smooth—but it is sufficiently accurate for excellent image quality. However, this parallelogram area average is already too expensive to compute exactly, so various approximations are used. Approaches to texture antialiasing differ in the speed/quality tradeoffs they make in approximating this lookup. We discuss these in the following sections.
现在,我们已经找到了通常被认为是“正确答案”的问题形式:特定图像空间位置处的滤波纹理样本应为纹理图在该点处由纹理坐标导数定义的平行四边形覆盖面积上的平均值。这已经包含了一些假设 - 即从图像到纹理的映射是平滑的 - 但它足够准确,可以获得出色的图像质量。但是,这个平行四边形面积平均值的计算成本已经太高,因此使用了各种近似值。纹理抗锯齿方法在近似此查找时所做的速度/质量权衡方面有所不同。我们将在以下部分中讨论这些问题。
When the footprint is smaller than a texel, we are magnifying the texture as it is mapped into the image. This case is analogous to upsampling an image, and the main consideration is interpolating between texels to produce a smooth image in which the texel grid is not obvious. Just as in image upsampling, this smoothing process is defined by a reconstruction filter that is used to compute texture samples at arbitrary locations in texture space. (See Figure 11.18.)
当覆盖面积小于纹理像素时,我们会在将纹理映射到图像中时将其放大。这种情况类似于对图像进行上采样,主要考虑的是在纹理像素之间进行插值,以生成纹理像素网格不明显的平滑图像。与图像上采样一样,此平滑过程由重构滤波器定义,该滤波器用于计算纹理空间中任意位置的纹理样本。(见图11.18 。)
Figure 11.18. The dominant issues in texture filtering change with the footprint size. For small footprints (left), interpolating between texels is needed to avoid blocky artifacts; for large footprints, the challenge is to efficiently find the average of many texels.
图 11.18。纹理过滤中的主要问题随覆盖面积大小而变化。对于小覆盖面积(左),需要在像素之间进行插值以避免出现块状伪影;对于大覆盖面积,挑战在于高效地找到许多像素的平均值。
The considerations are pretty much the same as in image resampling, with one important difference. In image resampling, the task is to compute output samples on a regular grid and that regularity enabled an important optimization in the case of a separable reconstruction filter. In texture filtering, the pattern of lookups is not regular, and the samples have to be computed separately. This means large, high-quality reconstruction filters are very expensive to use, and for this reason the highest-quality filter normally used for textures is bilinear interpolation.
这些考虑与图像重采样中的考虑基本相同,但有一个重要的区别。在图像重采样中,任务是在规则网格上计算输出样本,并且这种规则性在可分离重建滤波器的情况下实现了重要的优化。在纹理过滤中,查找模式不规则,并且必须单独计算样本。这意味着使用大型、高质量的重建滤波器非常昂贵,因此通常用于纹理的最高质量的滤波器是双线性插值。
The calculation of a bilinearly interpolated texture sample is the same as computing one pixel in an image being upsampled with bilinear interpolation. First, we express the texture-space sample point in terms of (real-valued) texel coordinates, then we read the values of the four neighboring texels and take a weighted average of them. Textures are usually parameterized over the unit square, and the texels are located in the same way as pixels in any image, spaced a distance $1/n_u$ apart in the u direction and $1/n_v$ in v, with texel (0,0) positioned half a texel in from the edge for symmetry. (See Chapter 10 for the full explanation.)
双线性插值纹理样本的计算与计算使用双线性插值进行上采样的图像中的一个像素相同。首先,我们用(实值)纹素坐标表示纹理空间采样点,然后读取四个相邻纹素的值并取平均值。纹理通常在单位正方形上参数化,纹素的位置与任何图像中的像素相同,在 u 方向上相距 1/ n u ,在 v 方向上相距 1/ n v ,纹素 (0,0) 位于距边缘半个纹素的位置,以实现对称性。(有关完整解释,请参阅第 10 章。)
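Color tex_sample_bilinear(Texture t, float u, float v) {
    // convert to real-valued texel coordinates (texel centers at integers)
    u_p = u * t.width - 0.5
    v_p = v * t.height - 0.5
    // indices of the four surrounding texels
    iu0 = floor(u_p); iu1 = iu0 + 1
    iv0 = floor(v_p); iv1 = iv0 + 1
    // interpolation weights: a_* for the lower texel, b_* for the upper
    a_u = (iu1 - u_p); b_u = 1 - a_u
    a_v = (iv1 - v_p); b_v = 1 - a_v
    // indices outside the texture must be clamped or wrapped before lookup
    return a_u * a_v * t[iu0][iv0] + a_u * b_v * t[iu0][iv1] +
           b_u * a_v * t[iu1][iv0] + b_u * b_v * t[iu1][iv1]
}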
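For readers who want to run the pseudocode above, here is one possible Python rendering. It is a sketch under stated assumptions: the texture is a list of lists indexed as `t[iu][iv]` holding scalar values, and out-of-range indices are clamped to the edge, which is one common choice (wrapping is another) that the pseudocode leaves open.

```python
import math


def tex_sample_bilinear(t, u, v):
    """Bilinearly interpolated lookup at (u, v) in [0, 1] x [0, 1]."""
    nu, nv = len(t), len(t[0])
    # Real-valued texel coordinates; texel centers sit at integers.
    u_p = u * nu - 0.5
    v_p = v * nv - 0.5
    iu0, iv0 = math.floor(u_p), math.floor(v_p)
    # Fractional position between the lower and upper texel.
    b_u, b_v = u_p - iu0, v_p - iv0
    a_u, a_v = 1.0 - b_u, 1.0 - b_v

    def texel(iu, iv):
        # Assumption: clamp indices that fall outside the texture.
        return t[min(max(iu, 0), nu - 1)][min(max(iv, 0), nv - 1)]

    return (a_u * a_v * texel(iu0, iv0) + a_u * b_v * texel(iu0, iv0 + 1)
            + b_u * a_v * texel(iu0 + 1, iv0) + b_u * b_v * texel(iu0 + 1, iv0 + 1))


# A 2 x 2 texture that is 0 on the low-u side and 1 on the high-u side.
tex = [[0.0, 0.0], [1.0, 1.0]]
at_low_center = tex_sample_bilinear(tex, 0.25, 0.5)   # exactly on a texel center
halfway = tex_sample_bilinear(tex, 0.5, 0.5)          # midway between texels
at_high_center = tex_sample_bilinear(tex, 0.75, 0.5)
```

Sampling at a texel center reproduces that texel, and sampling halfway between two texels returns their mean.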
In many systems, this operation becomes an important performance bottleneck, mainly because of the memory latency involved in fetching the four texel values from the texture data. The pattern of sample points for textures is irregular, because the mapping from image to texture space is arbitrary, but often coherent, since nearby image points tend to map to nearby texture points that may read the same texels. For this reason, high-performance systems have special hardware devoted to texture sampling that handles interpolation and manages caches of recently used texture data to minimize the number of slow data fetches from the memory where the texture data are stored.
在许多系统中,此操作会成为重要的性能瓶颈,主要是因为从纹理数据中提取四个纹素值会产生内存延迟。纹理的采样点模式是不规则的,因为从图像到纹理空间的映射是任意的,但通常是连贯的,因为附近的图像点往往会映射到可能读取相同纹素的附近纹理点。出于这个原因,高性能系统具有专门用于纹理采样的特殊硬件,用于处理插值并管理最近使用的纹理数据的缓存,以最大限度地减少从存储纹理数据的内存中缓慢提取数据的次数。
After reading Chapter 10, you may complain that linear interpolation may not be a smooth enough reconstruction for some demanding applications. However, it can always be made good enough by resampling the texture to a somewhat higher resolution using a better filter, so that the texture is smooth enough that bilinear interpolation works well.
读完第 10 章后,您可能会抱怨线性插值对于某些要求苛刻的应用来说可能不是足够平滑的重建。但是,通过使用更好的过滤器将纹理重新采样到稍高的分辨率,始终可以使其变得足够好,这样纹理就足够平滑,双线性插值就可以很好地工作。
Doing a good job of interpolation only suffices in situations where the texture is being magnified: where the pixel footprint is small compared to the spacing of texels. When a pixel footprint covers many texels, good antialiasing requires computing the average of many texels to smooth out the signal so that it can be sampled safely.
只有在纹理被放大的情况下,插值才能发挥良好的作用:像素覆盖面积与纹素间距相比较小。当像素覆盖面积覆盖许多纹素时,良好的抗锯齿效果需要计算许多纹素的平均值来平滑信号,以便可以安全地对其进行采样。
One very accurate way to compute the average texture value over the footprint would be to find all the texels within the footprint and average them. However, this is potentially very expensive when the footprint is large: it could require reading many thousands of texels just for a single lookup. A better approach is to precompute and store the averages of the texture over various areas of different size and position.
计算覆盖范围内的平均纹理值的一种非常准确的方法是找到覆盖范围内的所有纹素并将它们相加。但是,当覆盖范围很大时,这种方法可能非常昂贵——一次查找就可能需要读取数千个纹素。更好的方法是预先计算并存储不同大小和位置的各个区域的纹理平均值。
The name “mip” stands for the Latin phrase multum in parvo meaning “much in a small space.”
“mip”这个名字代表拉丁语短语multum in parvo,意为“小空间里的大东西”。
A very popular version of this idea is known as “MIP mapping” or just mipmapping. A mipmap is a sequence of textures that all contain the same image but at lower and lower resolution. The original, full-resolution texture image is called the base level, or level 0, of the mipmap, and level 1 is generated by taking that image and downsampling it by a factor of 2 in each dimension, resulting in an image with one-fourth as many texels. The texels in this image are, roughly speaking, averages of square areas 2 by 2 texels in size in the level-0 image.
这个想法的一个非常流行的版本被称为“MIP 映射”或简称为 mipmapping。Mipmap 是一系列纹理,它们都包含相同的图像,但分辨率越来越低。原始的全分辨率纹理图像称为基础级别,即 mipmap 的 0 级,1 级是通过将该图像在每个维度上以 2 的倍数进行下采样而生成的,从而得到一个包含四分之一纹素的图像。粗略地说,此图像中的纹素是 0 级图像中大小为 2 x 2 纹素的方形区域的平均值。
This process can be continued to define as many mipmap levels as desired: the image at level k is computed by downsampling the image at level k − 1 by a factor of two. A texel at level k corresponds to a square area measuring $2^k$ by $2^k$ texels in the original texture. For instance, starting with a 1024 × 1024 texture image, we could generate a mipmap with 11 levels: level 0 is 1024 × 1024; level 1 is 512 × 512, and so on until level 10, which has just a single texel. This kind of structure, with images that represent the same content at a series of lower and lower sampling rates, is called an image pyramid, based on the visual metaphor of stacking all the smaller images on top of the original.
这个过程可以持续下去,以定义所需数量的 mipmap 级别:级别 k 的图像是通过将级别k - 1 的图像向下采样 2 来计算的。级别 k 的纹素对应于原始纹理中 2 k x 2 k纹素的方形区域。例如,从 1024 × 1024 的纹理图像开始,我们可以生成具有 11 个级别的 mipmap:级别 0 为 1024 × 1024;级别 1 为 512 × 512,依此类推,直到级别 10,它只有一个纹素。这种结构称为图像金字塔,其图像以一系列越来越低的采样率表示相同的内容,其视觉比喻是将所有较小的图像堆叠在原始图像之上。
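The pyramid construction can be sketched as repeated 2 × 2 box averaging. This is only one simple downsampling filter, chosen here for brevity; the text says only that each level is downsampled by two. The sketch assumes a square, power-of-two texture of scalar values stored as a list of lists, and the function names are hypothetical.

```python
import math


def downsample_by_two(img):
    """Average each 2 x 2 block of texels into one texel (box filter)."""
    n_u, n_v = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * i][2 * j] + img[2 * i + 1][2 * j]
              + img[2 * i][2 * j + 1] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(n_v)]
            for i in range(n_u)]


def build_mipmap(base):
    """Return [level 0, level 1, ...] down to a single texel."""
    levels = [base]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(downsample_by_two(levels[-1]))
    return levels


# A small 16 x 16 example; the same code applies to 1024 x 1024.
base = [[float(i + j) for j in range(16)] for i in range(16)]
pyramid = build_mipmap(base)
sizes = [len(level) for level in pyramid]

# Number of levels for a 1024 x 1024 texture: log2(1024) + 1.
levels_for_1024 = int(math.log2(1024)) + 1
```

The single texel at the top of the pyramid holds the average of the entire base image, and each intermediate texel at level k holds the average of a $2^k$ by $2^k$ block.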
With the mipmap, or image pyramid, in hand, texture filtering can be done much more efficiently than by accessing many texels individually. When we need a texture value averaged over a large area, we simply use values from higher levels of the mipmap, which are already averages over large areas of the image. The simplest and fastest way to do this is to look up a single value from the mipmap, choosing the level so that the size covered by the texels at that level is roughly the same as the overall size of the pixel footprint. Of course, the pixel footprint might be quite different in shape from the (always square) area represented by the texel, and we can expect that to produce some artifacts.
有了 mipmap 或图像金字塔,纹理过滤的效率就会比单独访问多个纹理像素高得多。当我们需要一个纹理值在大面积上的平均值时,我们只需使用 mipmap 更高级别的值,这些值已经是图像大面积的平均值。最简单、最快的方法是从 mipmap 中查找单个值,选择级别,使该级别的纹理像素覆盖的大小与像素覆盖面积的整体大小大致相同。当然,像素覆盖面积的形状可能与纹理像素所代表的(总是正方形)区域完全不同,我们可以预料到这会产生一些瑕疵。
Setting aside for a moment the question of what to do when the pixel footprint has an elongated shape, suppose the footprint is a square of width D, measured in terms of texels in the full-resolution texture. What level of the mipmap is it appropriate to sample? Since the texels at level k cover squares of width $2^k$, it seems appropriate to choose k so that

$$2^k \approx D,$$
暂且不论像素覆盖区域具有细长形状时该怎么办,假设覆盖区域是宽度为D的正方形,以全分辨率纹理中的纹素来衡量。采样 mipmap 的哪个级别合适?由于级别k的纹素覆盖宽度为2k的正方形,因此选择 k 似乎合适,以便
so we let $k = \log_2 D$. Of course, this will give non-integer values of k most of the time, and we only have stored mipmap images for integer levels. Two possible solutions are to look up a value only for the integer nearest to k (efficient but produces seams at the abrupt transitions between levels) or to look up values for the two nearest integers to k and linearly interpolate the values (twice the work, but smoother).
因此我们让k = log 2 D 。当然,这在大多数情况下会给出非整数的 k 值,并且我们只存储了整数级别的 mipmap 图像。两种可能的解决方案是仅查找最接近 k 的整数的值(高效但在级别之间的突然转换处产生接缝)或查找两个最接近 k 的整数的值并线性插入这些值(工作量增加一倍,但更平滑)。
Before we can actually write down the algorithm for sampling a mipmap, we have to decide how we will choose the “width” D when footprints are not square. Some possibilities might be to use the square root of the area or to find the longest axis of the footprint and call that the width. A practical compromise that is easy to compute is to use the length of the longest edge:

$$D = \max\left(\lVert \mathbf{u}_x \rVert, \lVert \mathbf{u}_y \rVert\right).$$
在我们真正写下采样 mipmap 的算法之前,我们必须决定当足迹不是正方形时如何选择“宽度”D。一些可能性可能是使用面积的平方根,或者找到足迹的最长轴并将其称为宽度。一个易于计算的实际折衷方法是使用最长边的长度:
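Color mipmap_sample_trilinear(Texture mip[], float u, float v, matrix J) {
    // footprint "width": length of the longest edge (longest column of J)
    D = max_column_norm(J)
    k = log2(D)
    // k must be clamped to the range of stored levels before indexing
    k0 = floor(k); k1 = k0 + 1
    // blend weights between the two nearest levels
    a = k1 - k; b = 1 - a
    c0 = tex_sample_bilinear(mip[k0], u, v)
    c1 = tex_sample_bilinear(mip[k1], u, v)
    return a * c0 + b * c1
}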
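A runnable Python version of the trilinear lookup might look like the following. It makes several assumptions that the pseudocode does not state: J is expressed in level-0 texel units (so D is a width in texels), the level k is clamped to the range of stored levels, textures are lists of lists of scalars indexed as `t[iu][iv]`, and edges are clamped in the bilinear lookup.

```python
import math


def tex_sample_bilinear(t, u, v):
    """Bilinear lookup in t[iu][iv] with edge clamping, (u, v) in [0, 1]."""
    nu, nv = len(t), len(t[0])
    u_p, v_p = u * nu - 0.5, v * nv - 0.5
    iu0, iv0 = math.floor(u_p), math.floor(v_p)
    b_u, b_v = u_p - iu0, v_p - iv0

    def texel(iu, iv):
        return t[min(max(iu, 0), nu - 1)][min(max(iv, 0), nv - 1)]

    return ((1 - b_u) * (1 - b_v) * texel(iu0, iv0)
            + (1 - b_u) * b_v * texel(iu0, iv0 + 1)
            + b_u * (1 - b_v) * texel(iu0 + 1, iv0)
            + b_u * b_v * texel(iu0 + 1, iv0 + 1))


def max_column_norm(J):
    """Length of the longer of the two footprint edges u_x and u_y."""
    return max(math.hypot(J[0][0], J[1][0]), math.hypot(J[0][1], J[1][1]))


def mipmap_sample_trilinear(mip, u, v, J):
    D = max_column_norm(J)
    last = len(mip) - 1
    # Clamp k so magnified (D < 1) and tiny-in-image cases stay in range.
    k = min(max(math.log2(D), 0.0), float(last)) if D > 0 else 0.0
    k0 = int(math.floor(k))
    k1 = min(k0 + 1, last)
    b = k - k0  # weight of the coarser level
    c0 = tex_sample_bilinear(mip[k0], u, v)
    c1 = tex_sample_bilinear(mip[k1], u, v)
    return (1 - b) * c0 + b * c1


# Test pyramid whose texel values equal the level index, so the result
# of a lookup reveals the fractional level that was chosen.
mip = [[[0.0] * 4 for _ in range(4)], [[1.0] * 2 for _ in range(2)], [[2.0]]]
value = mipmap_sample_trilinear(mip, 0.3, 0.7, [[3.0, 0.0], [0.0, 1.0]])
```

Here the longest footprint edge is 3 texels, so the lookup blends levels 1 and 2 and returns approximately $\log_2 3$.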
Basic mipmapping does a good job of removing aliasing, but because it’s unable to handle elongated, or anisotropic pixel footprints, it doesn’t perform well when surfaces are viewed at grazing angles. This is most commonly seen on large planes that represent a surface the viewer is standing on. Points on the floor that are far away are viewed at very steep angles, resulting in very anisotropic footprints that mipmapping approximates with much larger square areas. The resulting image will appear blurred in the horizontal direction.
基本 mipmapping 可以很好地消除混叠,但由于它无法处理细长或各向异性的像素覆盖区域,因此在以掠射角查看表面时效果不佳。这在表示观看者所站表面的大平面上最常见。地板上远处的点以非常陡峭的角度观看,导致非常各向异性的覆盖区域,而 mipmapping 会用更大的方形区域来近似。生成的图像在水平方向上会显得模糊。
A mipmap can be used with multiple lookups to approximate an elongated footprint better. The idea is to select the mipmap level based on the shortest axis of the footprint rather than the longest and then average together several lookups spaced along the long axis. (See Figure 11.19.)
mipmap 可以与多个查找一起使用,以更好地近似细长的覆盖范围。其思路是根据覆盖范围的最短轴(而不是最大轴)选择 mipmap 级别,然后对沿长轴间隔的多个查找进行平均。(参见图 11.19 。)
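The following sketch shows one way the lookups for an elongated footprint could be planned. It is a simplified illustration, not the algorithm used by any particular hardware: it treats the two footprint edges as the axes, assumes all lengths are in level-0 texel units, and introduces a hypothetical `max_samples` cap. It returns the mipmap level and the sample positions; each position would then be sampled at that level and the results averaged.

```python
import math


def anisotropic_lookups(center, u_x, u_y, max_samples=16):
    """Choose a mipmap level from the short edge and spread samples
    along the long edge of the footprint parallelogram."""
    len_x = math.hypot(u_x[0], u_x[1])
    len_y = math.hypot(u_y[0], u_y[1])
    if len_x >= len_y:
        long_axis, long_len, short_len = u_x, len_x, len_y
    else:
        long_axis, long_len, short_len = u_y, len_y, len_x
    # Level from the short edge; never go below the base level.
    short_len = max(short_len, 1.0)
    k = math.log2(short_len)
    # Roughly one sample per short-edge width along the long edge.
    n = min(max_samples, max(1, math.ceil(long_len / short_len)))
    points = []
    for i in range(n):
        t = (i + 0.5) / n - 0.5  # evenly spaced in (-0.5, 0.5)
        points.append((center[0] + t * long_axis[0],
                       center[1] + t * long_axis[1]))
    return k, points


# A footprint 8 texels long and 2 texels wide, centered at texel (100, 50).
level, positions = anisotropic_lookups((100.0, 50.0), (8.0, 0.0), (0.0, 2.0))
```

For this footprint the sketch picks level 1 (texels 2 wide) and four positions along the long axis, instead of the single level-3 lookup that basic mipmapping would use.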
Figure 11.19. The results of antialiasing a challenging test scene (reference images showing detailed structure, at left) using three different strategies: simply taking a single point sample with nearest-neighbor interpolation; using a mipmap pyramid to average a square area in the texture for each pixel; using several samples from a mipmap to average an anisotropic region in the texture.
图 11.19.使用三种不同策略对具有挑战性的测试场景进行抗锯齿处理的结果(左侧为显示详细结构的参考图像):简单地使用最近邻插值进行单点采样;使用 mipmap 金字塔对纹理中每个像素的方形区域进行平均;使用 mipmap 中的多个样本对纹理中的各向异性区域进行平均。
Once you understand the idea of defining texture coordinates for a surface and the machinery of looking up texture values, this machinery has many uses. In this section, we survey a few of the most important techniques in texture mapping, but textures are a very general tool with applications limited only by what the programmer can think up.
一旦你理解了为表面定义纹理坐标的概念和查找纹理值的机制,这个机制就会有很多用途。在本节中,我们将介绍纹理映射中一些最重要的技术,但纹理是一种非常通用的工具,其应用仅限于程序员能想到的东西。
The most basic use of texture mapping is to introduce variation in color by making the diffuse color that is used in shading computations—whether in a ray tracer or in a fragment shader—dependent on a value looked up from a texture. A textured diffuse component can be used to paste decals, paint decorations, or print text on a surface, and it can also simulate the variation in material color, for example, for wood or stone.
纹理贴图的最基本用途是使着色计算(无论是在光线追踪器还是在片段着色器中)中使用的漫反射颜色依赖于从纹理中查找的值,从而引入颜色变化。纹理漫反射组件可用于在表面上粘贴贴花、绘制装饰或打印文字,它还可以模拟材料颜色的变化,例如木材或石材。
Nothing limits us to varying only the diffuse color, though. Any other parameters, such as the specular reflectance or specular roughness, can also be textured. For instance, a cardboard box with transparent packing tape stuck to it may have the same diffuse color everywhere but be shinier, with higher specular reflectance and lower roughness, where the tape is than elsewhere. In many cases, the maps for different parameters are correlated: for instance, a glossy white ceramic cup with a logo printed on it may be both rougher and darker where it is printed (Figure 11.20), and a book with its title printed in metallic ink might change in diffuse color, specular color, and roughness, all at once.
不过,没有什么可以限制我们只改变漫反射颜色。任何其他参数,如镜面反射率或镜面粗糙度,也可以被纹理化。例如,粘有透明包装胶带的纸板箱可能在各处具有相同的漫反射颜色,但在胶带所在的位置会比其他地方更亮,镜面反射率更高,粗糙度更低。在许多情况下,不同参数的贴图是相关的:例如,印有徽标的光泽白色陶瓷杯在印刷的地方可能更粗糙、更暗(图 11.20 ),而用金属墨水印刷书名的书的漫反射颜色、镜面反射颜色和粗糙度可能会同时发生变化。
Figure 11.20. A ceramic mug with specular roughness controlled by an inverted copy of the diffuse color texture.
图 11.20.具有镜面粗糙度的陶瓷杯,由漫反射颜色纹理的反转副本控制。
Another quantity that is important for shading is the surface normal. With interpolated normals (Section 9.2), we know that the shading normal does not have to be the same as the geometric normal of the underlying surface. Normal mapping takes advantage of this fact by making the shading normal depend on values read from a texture map. The simplest way to do this is just to store the normals in a texture, with three numbers stored at every texel that are interpreted, instead of as the three components of a color, as the 3D coordinates of the normal vector.
对于着色来说,另一个重要的量是表面法线。通过插值法线(第 9.2 节),我们知道着色法线不必与底层表面的几何法线相同。法线贴图利用了这一事实,使着色法线依赖于从纹理贴图中读取的值。最简单的方法是将法线存储在纹理中,每个纹素存储三个数字,这些数字被解释为法线矢量的 3D 坐标,而不是颜色的三个分量。
Before a normal map can be used, though, we need to know what coordinate system the normals read from the map are represented in. Storing normals directly in object space, in the same coordinate system used for representing the surface geometry itself, is simplest: the normal read from the map can be used in exactly the same way as the normal reported by the surface itself: in most cases, it will need to be transformed into world space for lighting calculations, just like a normal that came with the geometry.
然而,在使用法线贴图之前,我们需要知道从法线贴图中读取的法线所表示的坐标系。将法线直接存储在对象空间中,使用与表示表面几何形状本身相同的坐标系,这是最简单的:从法线贴图中读取的法线可以以与表面本身报告的法线完全相同的方式使用:在大多数情况下,它需要转换到世界空间中进行照明计算,就像几何图形附带的法线一样。
However, normal maps that are stored in object space are inherently tied to the surface geometry—even for the normal map to have no effect, to reproduce the result with the geometric normals, the contents of the normal map have to track the orientation of the surface. Furthermore, if the surface is going to deform, so that the geometric normal changes, the object-space normal map can no longer be used, since it would keep providing the same shading normals.
然而,存储在对象空间中的法线贴图本质上与表面几何形状相关 — 即使法线贴图没有效果,为了使用几何法线重现结果,法线贴图的内容也必须跟踪表面的方向。此外,如果表面变形,导致几何法线发生变化,则对象空间法线贴图将不再可用,因为它将继续提供相同的着色法线。
The solution is to define a coordinate system for the normals that is attached to the surface. Such a coordinate system can be defined based on the tangent space of the surface (see Section 2.7): select a pair of tangent vectors and use them to define an orthonormal basis (Section 2.4.5). The texture coordinate function itself provides a useful way to select a pair of tangent vectors: use the directions tangent to lines of constant u and v. These tangents are not generally orthogonal, but we can use the procedure from Section 2.4.7 to “square up” the orthonormal basis, or it can be defined using the surface normal and just one tangent vector.
解决方案是为附着在表面上的法线定义一个坐标系。可以基于表面的切线空间定义这样的坐标系(参见第 2.7 节):选择一对切线向量,并使用它们定义正交基(第 2.4.5 节)。纹理坐标函数本身提供了一种选择一对切线向量的有用方法:使用与 u 和 v 为常数的线相切的方向。这些切线通常不正交,但我们可以使用第 2.4.7 节中的过程来“平方”正交基,或者可以使用表面法线和一个切线向量来定义它。
When normals are expressed in this basis they vary a lot less; since they are mostly pointing near the direction of the normal to the smooth surface, they will be near the vector $(0, 0, 1)^T$ in the normal map.
当法线以此基础表示时,它们的变化要小得多;由于它们大多指向光滑表面的法线方向附近,因此它们将位于法线贴图中的向量 (0,0,1) T附近。
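The variant that builds the frame from the surface normal and a single tangent vector can be sketched as follows. The helper names are hypothetical, vectors are plain 3-tuples, and the tangent is assumed to be supplied with the geometry (for example, the direction of increasing u); it is made orthogonal to the normal by subtracting its normal component.

```python
import math


def normalize(a):
    n = math.sqrt(sum(c * c for c in a))
    return tuple(c / n for c in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def tangent_frame(normal, tangent):
    """Orthonormal basis (t, b, n) from a normal and an approximate tangent."""
    n = normalize(normal)
    d = dot(tangent, n)
    # Remove the part of the tangent that lies along the normal.
    t = normalize(tuple(tc - d * nc for tc, nc in zip(tangent, n)))
    b = cross(n, t)
    return t, b, n


def tangent_to_object(n_ts, frame):
    """Express a tangent-space normal from the map in object space."""
    t, b, n = frame
    return normalize(tuple(n_ts[0] * t[i] + n_ts[1] * b[i] + n_ts[2] * n[i]
                           for i in range(3)))


frame = tangent_frame((0.0, 0.0, 2.0), (1.0, 0.0, 1.0))
unperturbed = tangent_to_object((0.0, 0.0, 1.0), frame)
```

A texel holding (0, 0, 1) reproduces the geometric normal regardless of how the surface is oriented, which is what makes tangent-space maps reusable on deforming surfaces.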
Where do normal maps come from? Often they are computed from a more detailed model to which the smooth surface is an approximation; other times, they can be measured directly from real surfaces. They can also be authored as part of the modeling process; in this case, it’s often nice to use a bump map to specify the normals indirectly. The idea is that a bump map is a height field: a function that gives the local height of the detailed surface above the smooth surface. Where the values are high (where the map looks bright, if you display it as an image), the surface is protruding outside the smooth surface; where the values are low (where the map looks dark), the surface is receding below it. For instance, a narrow dark line in the bump map is a scratch, or a small white dot is a bump.
法线贴图从何而来?通常,它们是根据光滑表面近似的更详细模型计算得出的;其他时候,它们也可以直接从真实表面测量得出。它们也可以作为建模过程的一部分进行创作;在这种情况下,使用凹凸贴图间接指定法线通常很不错。凹凸贴图的理念是高度场:一个给出光滑表面上方详细表面局部高度的函数。当值较高时(如果将其显示为图像,则贴图看起来很亮),表面突出于光滑表面;当值较低时(贴图看起来很暗),表面凹陷在光滑表面之下。例如,凹凸贴图中的窄暗线是划痕,小白点是凹凸。
Deriving a normal map from a bump map is simple: the normal map (expressed in the tangent frame) is computed from the derivatives of the bump map, which give the slope of the detailed surface relative to the smooth one.
从凹凸贴图导出法线贴图很简单:法线贴图(在切线框架中表示)是凹凸贴图的导数。
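One common way to carry out this derivation is to take finite differences of the height field and build the tangent-space normal proportional to $(-\partial h/\partial u, -\partial h/\partial v, 1)$. The sketch below assumes a height field stored as `height[iu][iv]`, derivatives measured per texel, clamped edges, and a hypothetical `scale` parameter that controls how pronounced the bumps appear.

```python
import math


def bump_to_normal_map(height, scale=1.0):
    """Convert a height field into tangent-space unit normals."""
    nu, nv = len(height), len(height[0])

    def h(i, j):
        # Clamp at the borders, so edge slopes are underestimated.
        return height[min(max(i, 0), nu - 1)][min(max(j, 0), nv - 1)]

    normals = []
    for i in range(nu):
        row = []
        for j in range(nv):
            # Central differences approximate the slope in u and v.
            dh_du = (h(i + 1, j) - h(i - 1, j)) / 2.0
            dh_dv = (h(i, j + 1) - h(i, j - 1)) / 2.0
            x, y, z = -scale * dh_du, -scale * dh_dv, 1.0
            length = math.sqrt(x * x + y * y + z * z)
            row.append((x / length, y / length, z / length))
        normals.append(row)
    return normals


flat = bump_to_normal_map([[0.5] * 3 for _ in range(3)])
ramp = bump_to_normal_map([[float(i)] * 3 for i in range(3)])
```

A flat bump map yields (0, 0, 1) everywhere, and a ramp rising in u tilts the normal toward negative u, away from the uphill direction.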
Figure 11.21 shows texture maps being used to create woodgrain color and to simulate increased surface roughness due to finish soaking into the more porous parts of the wood, together with a bump map to create an imperfect finish and gaps between boards, to make a realistic wood floor.
图 11.21显示了使用纹理贴图来创建木纹颜色并模拟由于表面浸入木材多孔部分而导致的表面粗糙度增加,再加上凹凸贴图来创建不完美的表面处理和板之间的间隙,从而制作出逼真的木地板。
Figure 11.21. A wood floor rendered using texture maps to control the shading. (a) Only the diffuse color is modulated by a texture map. (b) The specular roughness is also modulated by a second texture map. (c) The surface normal is modified by a bump map.
图 11.21.使用纹理贴图控制阴影渲染的木地板。(a)只有漫反射颜色由纹理贴图调制。(b)镜面粗糙度也由第二个纹理贴图调制。(c)表面法线由凹凸贴图修改。
A problem with normal maps is that they don’t actually change the surface at all; they are just a shading trick. This becomes obvious when the geometry implied by the normal map should cause noticeable effects in 3D. In still images, the first problem to be noticed is usually that the silhouettes of objects remain smooth despite the appearance of bumps in the interior. In animations, the lack of parallax gives away that the bumps, however convincing, are really just “painted” on the surface.
法线贴图的一个问题是,它们实际上根本不会改变表面;它们只是一种阴影技巧。当法线贴图所暗示的几何形状在 3D 中产生明显的效果时,这一点就变得显而易见。在静态图像中,首先要注意的问题通常是物体的轮廓保持平滑,尽管内部出现凹凸。在动画中,缺乏视差会暴露出凹凸,无论多么逼真,实际上只是“画”在表面上。
Textures can be used for more than just shading, though: they can be used to alter geometry. A displacement map is one of the simplest versions of this idea. The concept is the same as a bump map: a scalar (one-channel) map that gives the height above the “average terrain.” But the effect is different. Rather than deriving a shading normal from the height map while using the smooth geometry, a displacement map actually changes the surface, moving each point along the normal of the smooth surface to a new location. The normals are roughly the same in each case, but the surface is different.
不过,纹理不仅仅可以用于着色,还可以用于改变几何形状。位移图是这种想法最简单的版本之一。其概念与凹凸图相同:标量(单通道)图给出“平均地形”上方的高度。但效果不同。位移图不是在使用平滑几何形状时从高度图得出着色法线,而是实际改变表面,将每个点沿着平滑表面的法线移动到新位置。在每种情况下,法线大致相同,但表面不同。
The most common way to implement displacement maps is to tessellate the smooth surface with a large number of small triangles and then displace the vertices of the resulting mesh using the displacement map. In the graphics pipeline, this can be done using a texture lookup at the vertex stage and is particularly handy for terrain.
实现位移图的最常见方法是用大量小三角形对光滑表面进行细分,然后使用位移图对生成的网格的顶点进行位移。在图形管道中,这可以在顶点阶段使用纹理查找来完成,对于地形来说尤其方便。
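The vertex displacement step amounts to moving each vertex along its normal by the looked-up height. Here is a minimal CPU-side sketch; in a graphics pipeline the same arithmetic would run in the vertex stage. The function name, the `scale` parameter, and the use of a callable `height(u, v)` in place of a texture lookup are assumptions for the example.

```python
def displace_vertices(positions, normals, uvs, height, scale=1.0):
    """Move each vertex along its (unit) normal by the displacement value."""
    displaced = []
    for p, n, (u, v) in zip(positions, normals, uvs):
        h = scale * height(u, v)
        displaced.append(tuple(pc + h * nc for pc, nc in zip(p, n)))
    return displaced


# A flat patch in the z = 0 plane with upward normals, displaced by a
# hypothetical height function that rises with u.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
normals = [(0.0, 0.0, 1.0)] * 4
uvs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
terrain = displace_vertices(positions, normals, uvs, lambda u, v: u, scale=0.5)
```

After displacement the shading normals should also be recomputed from the new geometry or taken from a matching normal map, since the original smooth normals no longer describe the surface.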
Shadows are an important cue to object relationships in a scene, and as we have seen, they are simple to include in ray-traced images. However, it’s not obvious how to get shadows in rasterized renderings, because surfaces are considered one at a time, in isolation. Shadow maps are a technique for using the machinery of texture mapping to get shadows from point light sources.
阴影是场景中物体关系的重要线索,正如我们所见,它们很容易包含在光线追踪图像中。然而,如何在光栅化渲染中获得阴影并不明显,因为表面一次被视为一个,是孤立的。阴影贴图是一种使用纹理贴图机制从点光源获取阴影的技术。
The idea of a shadow map is to represent the volume of space that is illuminated by a point light source. Think of a source like a spotlight or video projector, which emits light from a point into a limited range of directions. The volume that is illuminated—the set of points where you would see light on your hand if you held it there—is the union of line segments joining the light source to the closest surface point along every ray leaving that point.
阴影贴图的理念是代表点光源照亮的空间体积。想象一下聚光灯或视频投影仪之类的光源,它们从一个点向有限的方向发射光线。被照亮的体积(如果你把手放在手上,你会看到光线的点集)是将光源连接到离开该点的每条射线的最近表面点的线段的并集。
Interestingly, this volume is the same as the volume that is visible to a perspective camera located at the same point as the light source: a point is illuminated by a source if and only if it is visible from the light source location. In both cases, there’s a need to evaluate visibility for points in the scene: for visibility, we needed to know whether a fragment was visible to the camera, to know whether to draw it in the image; and for shadowing, we need to know whether a fragment is visible to the light source, to know whether it’s illuminated by that source or not. (See Figure 11.22.)
有趣的是,这个体积与位于与光源相同位置的透视相机可见的体积相同:当且仅当从光源位置可见时,该点才被光源照亮。在这两种情况下,都需要评估场景中点的可见性:对于可见性,我们需要知道片段是否对相机可见,以便知道是否在图像中绘制它;对于阴影,我们需要知道片段是否对光源可见,以便知道它是否被该光源照亮。(见图11.22 。)
Figure 11.22. (a) The region of space illuminated by a point light. (b) That region as approximated by a 10-pixel-wide shadow map.
图 11.22。 (a) 点光源照亮的空间区域。 (b) 该区域由 10 像素宽的阴影图近似。
In both cases, the solution is the same: a depth map that tells the distance to the closest surface along a bunch of rays. In the visibility case, this is the z-buffer (Section 9.2.3), and for the shadowing case, it is called a shadow map. In both cases, visibility is evaluated by comparing the depth of a new fragment to the depth stored in the map, and the surface is hidden from the projection point (occluded or shadowed) if its depth is greater than the depth of the closest visible surface. A difference is that the z-buffer is used to keep track of the closest surface seen so far and is updated during rendering, whereas a shadow map tells the distance to the closest surface in the whole scene.
在这两种情况下,解决方案都是相同的:深度图可以告诉您沿着一束射线到最近表面的距离。在可见性的情况下,这就是 z 缓冲区(第 9.2.3 节),而对于阴影情况,它称为阴影图。在这两种情况下,可见性都是通过将新片段的深度与存储在图中的深度进行比较来评估的,如果表面的深度大于最近的可见表面的深度,则该表面从投影点隐藏(被遮挡或被阴影)。不同之处在于,z 缓冲区用于跟踪迄今为止看到的最近表面,并在渲染期间更新,而阴影图则可以告诉您整个场景中到最近表面的距离。
A shadow map is calculated in a separate rendering pass ahead of time: simply rasterize the whole scene as usual, and retain the resulting depth map (there is no need to bother with calculating pixel values). Then, with the shadow map in hand, you perform an ordinary rendering pass, and when you need to know whether a fragment is visible to the source, you project its location into the shadow map (using the same perspective projection that was used to render the shadow map in the first place) and compare the looked-up value d_map with the actual distance d to the source. If the distances are the same, the fragment’s point is illuminated; if d > d_map, there is a different surface closer to the source, so the point is shadowed.
阴影图是在单独的渲染过程中提前计算的:只需像平常一样栅格化整个场景,并保留生成的深度图(无需费心计算像素值)。然后,有了阴影图,您就可以执行普通的渲染过程,当您需要知道某个片段是否对源可见时,您可以将其位置投影到阴影图中(使用与最初用于渲染阴影图的相同透视投影)并将查找到的值d map与到源的实际距离d进行比较。如果距离相同,则片段的点被照亮;如果 d > d map ,则意味着有一个离源更近的不同表面,因此它被阴影覆盖。
The phrase “if the distances are the same” should raise some red flags in your mind: since all the quantities involved are approximations with limited precision, we can’t expect them to be exactly the same. For visible points, d ≈ d_map, but sometimes d will be a bit larger and sometimes a bit smaller. For this reason, a tolerance is required: a point is considered illuminated if d - d_map < ϵ. This tolerance ϵ is known as shadow bias.
“如果距离相同”这句话应该引起你的警惕:由于所涉及的所有数量都是精度有限的近似值,我们不能指望它们完全相同。对于可见点,d ≈ d映射,但有时 d 会大一点,有时会小一点。因此,需要一个公差:如果 d - d映射< ϵ,则认为该点被照亮。这个公差 ϵ 称为阴影偏差。
When looking up in shadow maps it doesn’t make a lot of sense to interpolate between the depth values recorded in the map. This might lead to more accurate depths (requiring less shadow bias) in smooth areas, but will cause bigger problems near shadow boundaries, where the depth value changes suddenly. Therefore, texture lookups in shadow maps are done using nearest-neighbor reconstruction. To reduce aliasing, multiple samples can be used, with the 1-or-0 shadow results (rather than the depths) averaged; this is known as percentage closer filtering.
在阴影贴图中查找时,在贴图中记录的深度值之间进行插值没有多大意义。这可能会在平滑区域产生更准确的深度(需要更少的阴影偏差),但在阴影边界附近会导致更大的问题,因为深度值会突然改变。因此,阴影贴图中的纹理查找是使用最近邻重建完成的。为了减少混叠,可以使用多个样本,对 1 或 0 的阴影结果(而不是深度)取平均值;这称为百分比更接近过滤。
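The biased depth test and percentage closer filtering can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `in_light` and `pcf`, the default bias value, the square filter footprint, and the clamping at the map border are all assumptions made for the example, and the shadow map is taken to be a plain 2D list of depths indexed as `shadow_map[y][x]`.

```python
def in_light(d, d_map, bias=1e-3):
    """Return 1.0 if a fragment at distance d from the light is lit.

    d_map is the depth stored in the shadow map for the same texel;
    bias plays the role of the shadow bias epsilon (value is an assumption).
    """
    return 1.0 if d - d_map < bias else 0.0


def pcf(shadow_map, x, y, d, bias=1e-3, radius=1):
    """Average the 0/1 shadow test over a (2*radius+1)^2 block of texels.

    The depths themselves are never averaged, only the test results.
    """
    h, w = len(shadow_map), len(shadow_map[0])
    total, count = 0.0, 0
    for j in range(y - radius, y + radius + 1):
        for i in range(x - radius, x + radius + 1):
            # Clamp to the map border (an arbitrary choice for this sketch).
            ii = min(max(i, 0), w - 1)
            jj = min(max(j, 0), h - 1)
            total += in_light(d, shadow_map[jj][ii], bias)
            count += 1
    return total / count
```

Each individual lookup here is a nearest-neighbor read of the map; only the binary results are blended, which gives a fractional shadow value near shadow boundaries.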
Just as a texture is handy for introducing detail into the shading on a surface without having to add more detail to the model, a texture can also be used to introduce detail into the illumination without having to model complicated light source geometry. When light comes from far away compared to the size of objects in view, the illumination changes very little from point to point in the scene. It is handy to make the assumption that the illumination depends only on the direction you look and is the same for all points in the scene, and then to express this dependence of illumination on direction using an environment map.
纹理可以方便地将细节引入表面的阴影中而无需向模型添加更多细节,纹理也可用于将细节引入照明中而无需对复杂的光源几何体进行建模。当光线来自与视野中物体的大小相比较远的地方时,场景中各点的照明变化很小。可以方便地假设照明仅取决于您观察的方向,并且对于场景中的所有点都是相同的,然后使用环境贴图来表达照明对方向的这种依赖性。
The idea of an environment map is that a function defined over directions in 3D is a function on the unit sphere, so it can be represented using a texture map in exactly the same way as we might represent color variation on a spherical object. Instead of computing texture coordinates from the 3D coordinates of a surface point, we use exactly the same formulas to compute texture coordinates from the 3D coordinates of the unit vector that represents the direction from which we want to know the illumination.
环境贴图的理念是,在三维空间中定义在方向上的函数是单位球面上的函数,因此可以使用纹理贴图来表示它,其表示方式与我们表示球形物体上的颜色变化完全相同。我们不是根据表面点的三维坐标计算纹理坐标,而是使用完全相同的公式根据表示我们想要知道照明方向的单位向量的三维坐标来计算纹理坐标。
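As an illustration of computing texture coordinates from a direction, here is a small sketch of a `spheremap_coords` function. The particular longitude-latitude convention (z as the polar axis, u from the azimuth, v from the polar angle) is an assumption of this example; any sphere parameterization would serve, as long as the map was stored using the same one. The direction is assumed to be nonzero.

```python
import math


def spheremap_coords(direction):
    """Map a nonzero 3D direction to (u, v) in [0, 1] x [0, 1].

    Assumed convention: z is the polar axis, u follows the azimuth
    around z, and v runs from 0 at -z to 1 at +z.
    """
    x, y, z = direction
    length = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / length, y / length, z / length
    u = (math.pi + math.atan2(y, x)) / (2.0 * math.pi)
    # Clamp guards against rounding pushing z slightly outside [-1, 1].
    v = (math.pi - math.acos(max(-1.0, min(1.0, z)))) / math.pi
    return u, v
```

Because the input is normalized inside the function, callers can pass any vector that points in the desired direction.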
The simplest application of an environment map is to give colors to rays in a ray tracer that don’t hit any objects:
环境贴图最简单的应用是为光线追踪器中未击中任何物体的光线赋予颜色:
trace_ray(ray, scene) {
  if (surface = scene.intersect(ray)) {
    return surface.shade(ray)
  } else {
    u, v = spheremap_coords(ray.direction)
    return texture_lookup(scene.env_map, u, v)
  }
}
With this change to the ray tracer, shiny objects that reflect other scene objects will now also reflect the background environment.
通过对光线追踪器的这一改变,反射其他场景物体的闪亮物体现在也会反射背景环境。
A similar effect can be achieved in the rasterization context by adding a mirror reflection to the shading computation, which is computed in the same way as in a ray tracer, but simply looks up directly in the environment map with no regard for other objects in the scene:
通过在着色计算中添加镜面反射,可以在光栅化上下文中实现类似的效果,其计算方式与光线追踪器相同,但只是直接在环境图中查找,而不考虑场景中的其他物体:
shade_fragment(view_dir, normal) {
  out_color = diffuse_shading(k_d, normal)
  out_color += specular_shading(k_s, view_dir, normal)
  u, v = spheremap_coords(reflect(view_dir, normal))
  out_color += k_m * texture_lookup(environment_map, u, v)
}
This technique is known as reflection mapping.
这种技术称为反射映射。
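The mirror direction used for the environment lookup can be computed with a short helper. The sketch below assumes that both vectors are unit length and that `view_dir` points from the surface toward the viewer; some systems use the opposite sign convention for the incoming direction, in which case the formula changes sign accordingly.

```python
def reflect(view_dir, normal):
    """Mirror view_dir about normal.

    Assumes unit vectors, with view_dir pointing away from the surface
    (toward the viewer). Returns r = 2 (n . v) n - v.
    """
    d = sum(v * n for v, n in zip(view_dir, normal))
    return tuple(2.0 * d * n - v for v, n in zip(view_dir, normal))
```

The result points away from the surface, so it can be passed directly to a direction-to-texture-coordinate function for the environment map lookup.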
A more advanced use of environment maps computes all the illumination from the environment map, not just the mirror reflection. This is environment lighting and can be computed in a ray tracer using Monte Carlo integration or in rasterization by approximating the environment with a collection of point sources and computing many shadow maps.
环境贴图的更高级用法是计算环境贴图中的所有照明,而不仅仅是镜面反射。这是环境照明,可以使用蒙特卡洛积分在光线追踪器中计算,或者通过使用点源集合近似环境并计算许多阴影图在光栅化中计算。
Environment maps can be stored in any coordinates that could be used for mapping a sphere. Spherical (longitude–latitude) coordinates are one popular option, though the compression of the texture near the poles wastes texture resolution and can create artifacts there. Cubemaps are a more efficient choice, widely used in interactive applications (Figure 11.23).
环境贴图可以存储在任何可用于映射球体的坐标中。球面(经度-纬度)坐标是一种流行的选择,尽管在极点处压缩纹理会浪费纹理分辨率并可能在极点处产生伪影。立方体贴图是一种更有效的选择,广泛用于交互式应用程序(图 11.23 )。
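To give a feel for how a cubemap lookup works, the sketch below picks the cube face from the component of the direction with the largest magnitude and derives face coordinates from the other two components. The face labels and the orientation of (u, v) on each face are assumptions for illustration only; real graphics APIs fix their own per-face conventions, which differ from this one. The direction is assumed to be nonzero.

```python
def cubemap_coords(direction):
    """Return (face, u, v) for a nonzero direction.

    The face is chosen by the dominant axis; u and v come from the two
    remaining components divided by the dominant magnitude, remapped
    from [-1, 1] to [0, 1]. Per-face orientation is hypothetical.
    """
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, major, a, b = ("+x" if x > 0 else "-x"), ax, y, z
    elif ay >= az:
        face, major, a, b = ("+y" if y > 0 else "-y"), ay, x, z
    else:
        face, major, a, b = ("+z" if z > 0 else "-z"), az, x, y
    u = 0.5 * (a / major + 1.0)
    v = 0.5 * (b / major + 1.0)
    return face, u, v
```

Unlike the longitude-latitude mapping, no trigonometric functions are needed, and texels are spread fairly evenly over directions.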
Figure 11.23. A cube map of St. Peter’s Basilica, with the six faces stored in one image in the unwrapped “horizontal cross” arrangement. (texture: Emil Persson)
图 11.23。圣彼得大教堂的立方体贴图,其中六个面以展开的“水平十字”排列存储在图像中。(纹理:Emil Persson)
In previous chapters, we used cr as the diffuse reflectance at a point on an object. For an object that does not have a solid color, we can replace this with a function cr(p) which maps 3D points to RGB colors (Peachey, 1985; Perlin, 1985). This function might just return the reflectance of the object that contains p. But for objects with texture, we should expect cr(p) to vary as p moves across a surface.
在前面的章节中,我们使用c r作为物体上某一点的漫反射率。对于没有纯色的物体,我们可以用函数c r ( p ) 代替它,该函数将 3D 点映射到 RGB 颜色(Peachey,1985;Perlin,1985)。此函数可能只返回包含p的物体的反射率。但对于具有纹理的物体,我们应该预期c r ( p ) 会随着p在表面上移动而变化。
An alternative to defining texture mapping functions that map from a 3D surface to a 2D texture domain is to create a 3D texture that defines an RGB value at every point in 3D space. We will only call it for points p on the surface, but it is usually easier to define it for all 3D points than a potentially strange 2D subset of points that are on an arbitrary surface. The good thing about 3D texture mapping is that it is easy to define the mapping function, because the surface is already embedded in 3D space, and there is no distortion in the mapping from 3D to texture space. Such a strategy is clearly suitable for surfaces that are “carved” from a solid medium, such as a marble sculpture.
定义从 3D 表面映射到 2D 纹理域的纹理映射函数的另一种方法是创建一个 3D 纹理,该纹理在 3D 空间中的每个点上定义一个 RGB 值。我们只会对表面上的点p调用它,但通常为所有 3D 点定义它比为任意表面上可能奇怪的 2D 点子集定义它更容易。3D 纹理映射的好处在于它很容易定义映射函数,因为表面已经嵌入 3D 空间,并且从 3D 到纹理空间的映射没有失真。这种策略显然适用于从固体介质“雕刻”的表面,例如大理石雕塑。
The downside to 3D textures is that storing them as 3D raster images or volumes consumes a great deal of memory. For this reason, 3D texture coordinates are most commonly used with procedural textures in which the texture values are computed using a mathematical procedure rather than by looking them up from a texture image. In this section, we look at a couple of the fundamental tools used to define procedural textures. These could also be used to define 2D procedural textures, though in 2D it is more common to use raster texture images.
3D 纹理的缺点是,将其存储为 3D 光栅图像或体积会占用大量内存。因此,3D 纹理坐标最常用于程序纹理,其中纹理值是使用数学过程计算的,而不是从纹理图像中查找。在本节中,我们将介绍用于定义程序纹理的几个基本工具。这些工具也可用于定义 2D 程序纹理,尽管在 2D 中更常见的是使用光栅纹理图像。
There are a surprising number of ways to make a striped texture. Let’s assume we have two colors c0 and c1 that we want to use to make the stripe color. We need some oscillating function to switch between the two colors. An easy one is a sine:
制作条纹纹理的方法多得令人吃惊。假设我们有两种颜色 c 0和 c 1 ,我们想用它们来制作条纹颜色。我们需要一些振荡函数来在两种颜色之间切换。一个简单的方法是正弦函数:
RGB stripe( point p )
  if (sin(x_p) > 0) then
    return c0
  else
    return c1
We can also make the stripe’s width w controllable:
我们还可以使条纹的宽度w可控:
RGB stripe( point p, real w )
  if (sin(π x_p / w) > 0) then
    return c0
  else
    return c1
If we want to interpolate smoothly between the stripe colors, we can use a parameter t to vary the color linearly:
如果我们想在条纹颜色之间平滑地插值,我们可以使用参数t来线性改变颜色:
RGB stripe( point p, real w )
  t = (1 + sin(π x_p / w)) / 2
  return (1 - t) c0 + t c1
These three possibilities are shown in Figure 11.24.
图 11.24显示了这三种可能性。
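The stripe variants translate almost directly into runnable code. In this sketch, points and colors are plain tuples, the x coordinate of the point is `p[0]`, and the two colors are passed as arguments rather than being global; those representation choices, and the name `smooth_stripe`, are assumptions of the example.

```python
import math


def stripe(p, c0, c1, w=1.0):
    """Hard-edged stripes of width w along the x axis."""
    return c0 if math.sin(math.pi * p[0] / w) > 0 else c1


def smooth_stripe(p, c0, c1, w=1.0):
    """Stripes that blend linearly between c0 and c1 using t in [0, 1]."""
    t = (1.0 + math.sin(math.pi * p[0] / w)) / 2.0
    return tuple((1.0 - t) * a + t * b for a, b in zip(c0, c1))
```

With w = 1 the hard-edged version reduces to the sine of π times x, so each stripe is one unit wide; the first pseudocode version, which uses the sine of x directly, has stripes of width π.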
Figure 11.24. Various stripe textures result from drawing a regular array of xy points while keeping z constant.
图 11.24.通过绘制xy点的规则阵列并保持z不变可以得到各种条纹纹理。
Although regular textures such as stripes are often useful, we would like to be able to make “mottled” textures such as we see on birds’ eggs. This is usually done by using a sort of “solid noise,” usually called Perlin noise after its inventor, who received a technical Academy Award for its impact in the film industry (Perlin, 1985).
虽然条纹等规则纹理通常很有用,但我们希望能够制作出像鸟蛋上看到的“斑驳”纹理。这通常是通过使用一种“固体噪音”来实现的,通常称为Perlin 噪声以其发明者的名字命名,该噪声对电影业产生了重大影响,并因此荣获奥斯卡技术奖(Perlin,1985 年)。
Getting a noisy appearance by generating a random number for every point would not be appropriate, because it would just be like “white noise” in TV static. We would like to make it smoother without losing the random quality. One possibility is to blur white noise, but there is no practical implementation of this. Another possibility is to make a large lattice with a random number at every lattice point and then interpolate these random values for points between lattice nodes; this is just a 3D texture array as described in the last section with random numbers in the array. This technique makes the lattice too obvious. Perlin used a variety of tricks to improve this basic lattice technique so the lattice was not so obvious. This results in a rather baroque-looking set of steps, but essentially there are just three changes from linearly interpolating a 3D array of random values. The first change is to use Hermite interpolation to avoid Mach bands, just as can be done with regular textures. The second change is the use of random vectors rather than values, with a dot product to derive a random number; this makes the underlying grid structure less visually obvious by moving the local minima and maxima off the grid vertices. The third change is to use a 1D array and hashing to create a virtual 3D array of random vectors. This adds computation to lower memory use. Here is his basic method:
通过为每个点调用一个随机数来获得嘈杂的外观并不合适,因为它就像电视静态图像中的“白噪声”一样。我们希望使其更平滑,而不会失去随机性。一种可能性是模糊白噪声,但这并没有实际的实现。另一种可能性是制作一个大格子,每个格子点都有一个随机数,然后在格子节点之间插入这些随机点作为新点;这只是上一节中描述的 3D 纹理数组,数组中有随机数。这种技术使格子太明显了。Perlin 使用各种技巧来改进这种基本的格子技术,使格子不那么明显。这导致了一组看起来相当巴洛克式的步骤,但本质上与线性插入随机值的 3D 数组只有三个变化。第一个变化是使用 Hermite 插值来避免马赫带,就像对常规纹理所做的那样。第二个变化是使用随机向量而不是值,并使用点积来得出随机数;通过将局部最小值和最大值移出网格顶点,这使得底层网格结构在视觉上不那么明显。第三个变化是使用一维数组和散列来创建随机向量的虚拟三维数组。这增加了计算量,降低了内存使用量。这是他的基本方法:
$$n(x, y, z) = \sum_{i=\lfloor x \rfloor}^{\lfloor x \rfloor + 1} \; \sum_{j=\lfloor y \rfloor}^{\lfloor y \rfloor + 1} \; \sum_{k=\lfloor z \rfloor}^{\lfloor z \rfloor + 1} \Omega_{ijk}(x - i,\; y - j,\; z - k),$$
where (x,y,z) are the Cartesian coordinates of x, and
其中 ( x,y,z ) 是x的笛卡尔坐标,并且
$$\Omega_{ijk}(u, v, w) = \omega(u)\,\omega(v)\,\omega(w)\,\bigl(\Gamma_{ijk} \cdot (u, v, w)\bigr),$$
and ω(t) is the cubic weighting function:
其中ω(t)是三次权重函数:
$$\omega(t) = \begin{cases} 2|t|^3 - 3|t|^2 + 1 & \text{if } |t| < 1, \\ 0 & \text{otherwise.} \end{cases}$$
The final piece is that Γijk is a random unit vector for the lattice point (x,y,z) = (i,j,k). Since we want any potential ijk, we use a pseudorandom table:
最后一点是 Γ ijk是格点 ( x,y,z ) = ( i,j,k ) 的随机单位向量。由于我们想要任意潜在的ijk ,因此我们使用伪随机表:
$$\Gamma_{ijk} = G\bigl(\phi(i + \phi(j + \phi(k)))\bigr),$$
where G is a precomputed array of n random unit vectors, and ϕ(i) = P[i mod n] where P is an array of length n containing a permutation of the integers 0 through n - 1. In practice, Perlin reports n = 256 works well. To choose a random unit vector (vx,vy,vz), first set
其中G是预先计算的n 个随机单位向量数组, ϕ ( i ) = P [ i mod n ],其中P是长度为n的数组,包含从 0 到n - 1 的整数排列。在实践中,Perlin 报告n = 256 效果很好。要选择随机单位向量 ( v x ,v y ,v z ),首先设置
$$v_x = 2\xi - 1, \qquad v_y = 2\xi' - 1, \qquad v_z = 2\xi'' - 1,$$
where ξ, ξ′, ξ″ are canonical random numbers (uniform in the interval [0,1)). Then, if (v_x^2 + v_y^2 + v_z^2) < 1, make the vector a unit vector. Otherwise, keep setting it randomly until its length is less than one, and then make it a unit vector. This is an example of a rejection method, which will be discussed more in Chapter 13. Essentially, the “less than” test gets a random point in the unit sphere, and the vector from the origin to that point is uniformly random in direction. That would not be true of random points in the cube, so we “get rid” of the corners with the test.
其中 ξ、ξ′、ξ ″是正则随机数(在区间 [0,1) 内均匀分布)。那么,如果( υ十2 + υ是2 + υ是2 ) < 1 ,使该向量成为单位向量。否则,继续随机设置它,直到其长度小于一,然后使其成为单位向量。这是一个拒绝方法的例子,将在第 13 章中进一步讨论。本质上,“小于”测试会在单位球面中得到一个随机点,并且从原点到该点的向量是均匀随机的。立方体中的随机点则不然,所以我们用测试“去掉”角。
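The pieces described above can be assembled into a compact solid noise sketch. Several specifics are assumptions of this example rather than statements from the text: the noise value is taken to be the sum, over the eight lattice points surrounding the input, of a weighted dot product between the lattice vector and the offset to the input; the weight is the cubic ω(t) = 2|t|³ − 3|t|² + 1 for |t| < 1 and 0 otherwise; and the table lookup is nested as G[ϕ(i + ϕ(j + ϕ(k)))]. The fixed random seed and the guard against a zero-length vector are conveniences added here.

```python
import math
import random

N = 256                       # table size reported to work well
_rng = random.Random(0)       # fixed seed (assumption) for repeatable textures


def _random_unit_vector():
    """Rejection method: sample the cube, keep points inside the unit sphere."""
    while True:
        v = [2.0 * _rng.random() - 1.0 for _ in range(3)]
        len2 = sum(c * c for c in v)
        if 0.0 < len2 < 1.0:  # also reject the (unlikely) zero vector
            s = 1.0 / math.sqrt(len2)
            return tuple(c * s for c in v)


G = [_random_unit_vector() for _ in range(N)]   # random unit vectors
P = list(range(N))                              # permutation of 0..N-1
_rng.shuffle(P)


def phi(i):
    return P[i % N]           # Python's % is non-negative for negative i


def gamma(i, j, k):
    """Pseudorandom lattice vector via nested hashing (assumed form)."""
    return G[phi(i + phi(j + phi(k)))]


def omega(t):
    """Cubic weighting function (assumed form), zero outside |t| < 1."""
    t = abs(t)
    return 2.0 * t ** 3 - 3.0 * t ** 2 + 1.0 if t < 1.0 else 0.0


def noise(x, y, z):
    """Sum weighted dot products over the eight surrounding lattice points."""
    fx, fy, fz = math.floor(x), math.floor(y), math.floor(z)
    total = 0.0
    for i in (fx, fx + 1):
        for j in (fy, fy + 1):
            for k in (fz, fz + 1):
                u, v, w = x - i, y - j, z - k
                g = gamma(i, j, k)
                dot = g[0] * u + g[1] * v + g[2] * w
                total += omega(u) * omega(v) * omega(w) * dot
    return total
```

With this construction the value is exactly zero at every lattice point, since the offset to the nearest lattice point is zero there and the weights of the other seven vanish. Only the two length-256 tables are stored, however large the textured region is.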
Because solid noise can be positive or negative, it must be transformed before being converted to a color. The absolute value of noise over a 10 × 10 square is shown in Figure 11.25, along with stretched versions. These versions are stretched by scaling the points input to the noise function.
由于固体噪声可以是正值也可以是负值,因此在将其转换为颜色之前必须对其进行变换。图 11.25显示了 10 × 10 正方形上的噪声绝对值以及拉伸版本。这些版本通过缩放输入到噪声函数的点来进行拉伸。
Figure 11.25. Absolute value of solid noise, and noise for scaled x and y values.
图 11.25.固体噪声的绝对值以及缩放的x和y值的噪声。
The dark curves are where the original noise function changed from positive to negative. Since noise varies from - 1 to 1, a smoother image can be achieved by using (noise + 1)∕2 for color. However, since noise values close to 1 or - 1 are rare, this will be a fairly smooth image. Larger scaling can increase the contrast (Figure 11.26).
深色曲线是原始噪声函数从正变为负的位置。由于噪声从 -1 变化到 1,因此可以通过使用 (噪声 + 1)∕2 作为颜色来实现更平滑的图像。但是,由于接近 1 或 -1 的噪声值很少见,因此这将是一个相当平滑的图像。更大的缩放比例可以增加对比度(图 11.26 )。
Figure 11.26. Using 0.5(noise+1) (a) and 0.8(noise+1) (b) for intensity.
图 11.26.使用 0.5(noise+1) (a) 和 0.8(noise+1) (b) 作为强度。
Many natural textures contain a variety of feature sizes in the same texture. Perlin uses a pseudofractal “turbulence” function:
许多自然纹理在同一纹理中包含各种特征尺寸。Perlin 使用伪分形“湍流”函数:
$$n_t(\mathbf{x}) = \sum_i \frac{\left| n(2^i \mathbf{x}) \right|}{2^i}.$$
This effectively adds scaled copies of the noise function on top of itself, as shown in Figure 11.27.
这实际上是将噪声函数的缩放副本重复添加到其自身之上,如图 11.27所示。
Figure 11.27. Turbulence function with (from top left to bottom right) one through eight terms in the summation.
图 11.27.湍流函数(从左上角到右下角)的求和项为一至八个。
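A turbulence function of this kind can be sketched as a short loop. The form used here, a sum of absolute noise values in which each successive term doubles the input frequency and halves the amplitude, is the commonly stated one and is an assumption of this example, as is the `terms` parameter that truncates the sum (the figure shows one through eight terms). The noise function is passed in as an argument so that the sketch does not depend on any particular noise implementation.

```python
def turbulence(noise, p, terms=8):
    """Sum |noise(2^i p)| / 2^i for i = 0 .. terms-1.

    noise is any callable taking (x, y, z) and returning a float.
    """
    x, y, z = p
    total = 0.0
    for i in range(terms):
        scale = 2.0 ** i
        total += abs(noise(scale * x, scale * y, scale * z)) / scale
    return total
```

Because the amplitudes form a geometric series, the result is bounded by twice the maximum magnitude of the underlying noise, no matter how many terms are used.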
The turbulence can be used to distort the stripe function:
湍流可用于扭曲条纹函数:
RGB turbstripe( point p, double w )
  double t = (1 + sin(k1 z_p + turbulence(k2 p)) / w) / 2
  return t * s0 + (1 − t) * s1
Various values for k1 and k2 were used to generate Figure 11.28.
使用了不同的k 1和k 2值来生成图 11.28 。
Figure 11.28. Various turbulent stripe textures with different k1, k2. The top row has only the first term of the turbulence series.
图 11.28。具有不同k 1 、 k 2 的各种湍流条纹纹理。顶行仅包含湍流序列的第一项。
How do I implement displacement mapping in ray tracing?
如何在光线追踪中实现位移贴图?
There is no ideal way to do it. Generating all the triangles and caching the geometry when necessary will prevent memory overload (Pharr & Hanrahan, 1996; Pharr, Kolb, Gershbein, & Hanrahan, 1997). Trying to intersect the displaced surface directly is possible when the displacement function is restricted (Patterson, Hoggar, & Logie, 1991; Heidrich & Seidel, 1998; Smits, Shirley, & Stark, 2000).
没有理想的方法。生成所有三角形并在必要时缓存几何图形将防止内存过载(Pharr & Hanrahan,1996;Pharr、Kolb、Gershbein 和 Hanrahan,1997)。当位移函数受到限制时,尝试直接与位移表面相交是可能的(Patterson、Hoggar 和 Logie,1991;Heidrich 和 Seidel,1998;Smits、Shirley 和 Stark,2000)。
Humans are good at seeing small imperfections in surfaces. Geometric imperfections are typically absent in computer-generated images that use texture maps for details, so they look “too smooth.”
人类善于发现表面的细微瑕疵。使用纹理贴图呈现细节的计算机生成的图像通常不存在几何瑕疵,因此看起来“太平滑”。
The discussion of perspective-correct textures is based on Fast Shadows and Lighting Effects Using Texture Mapping (Segal, Korobkin, van Widenfelt, Foran, & Haeberli, 1992) and on 3D Game Engine Design (Eberly, 2000).
透视校正纹理的讨论基于使用纹理映射的快速阴影和照明效果(Segal、Korobkin、van Widenfelt、Foran & Haeberli,1992)和3D 游戏引擎设计(Eberly,2000)。
1. Find several ways to implement an infinite 2D checkerboard using surface and solid techniques. Which is best?
1.找到几种使用表面和实体技术实现无限二维棋盘的方法。哪种最好?
2. Verify that Equation (9.4) is a valid equality using brute-force algebra.
2.使用强力代数验证公式 (9.4) 为有效等式。
3. How could you implement solid texturing by using the z-buffer depth and a matrix transform?
3.如何使用 z 缓冲区深度和矩阵变换实现实体纹理?
4. Expand the function mipmap_sample_trilinear into a single function.
4.将函数 mipmap_sample_trilinear 扩展为单个函数。
Certain data structures seem to pop up repeatedly in graphics applications, perhaps because they address fundamental underlying ideas such as surfaces, space, and scene structure. This chapter talks about several basic and unrelated categories of data structures that are among the most common and useful: mesh structures, spatial data structures, scene graphs, and tiled multidimensional arrays.
某些数据结构似乎在图形应用程序中反复出现,可能是因为它们涉及表面、空间和场景结构等基本底层概念。本章讨论了几种基本且不相关的数据结构类别,这些结构是最常见和最有用的:网格结构、空间数据结构、场景图和平铺多维数组。
For meshes, we discuss the basic storage schemes used for storing static meshes and for transferring meshes to graphics APIs. We also discuss the winged-edge data structure (Baumgart, 1974) and the related half-edge structure, which are useful for managing models where the tessellation changes, such as in subdivision or model simplification. Although these methods generalize to arbitrary polygon meshes, we focus on the simpler case of triangle meshes here.
对于网格,我们讨论了用于存储静态网格和将网格传输到图形 API 的基本存储方案。我们还讨论了翼边数据结构 (Baumgart, 1974) 和相关的半边结构,它们对于管理细分发生变化的模型(例如在细分或模型简化中)非常有用。虽然这些方法可以推广到任意多边形网格,但我们在此重点关注三角形网格的简单情况。
Next, the scene-graph data structure is presented. Various forms of this data structure are ubiquitous in graphics applications because they are so useful in managing objects and transformations. All new graphics APIs are designed to support scene graphs well.
接下来,我们介绍场景图数据结构。这种数据结构的各种形式在图形应用程序中随处可见,因为它们在管理对象和转换方面非常有用。所有新的图形 API 都设计为能够很好地支持场景图。
For spatial data structures, we discuss three approaches to organizing models in 3D space—bounding volume hierarchies, hierarchical space subdivision, and uniform space subdivision—and the use of hierarchical space subdivision (BSP trees) for hidden surface removal. The same methods are also used for other purposes, including geometry culling and collision detection.
对于空间数据结构,我们讨论了在 3D 空间中组织模型的三种方法 - 边界体积层次结构、层次空间细分和均匀空间细分 - 以及使用层次空间细分 (BSP 树) 来移除隐藏表面。同样的方法也用于其他目的,包括几何剔除和碰撞检测。
Finally, the tiled multidimensional array is presented. Originally developed to help paging performance in applications where graphics data needed to be swapped in from disk, such structures are now crucial for memory locality on machines regardless of whether the array fits in main memory.
最后,介绍了平铺多维数组。这种结构最初是为了帮助提高需要从磁盘交换图形数据的应用程序中的分页性能而开发的,现在,无论数组是否适合主内存,这种结构对于机器上的内存局部性都至关重要。
Most real-world models are composed of complexes of triangles with shared vertices. These are usually known as triangular meshes, triangle meshes, or triangular irregular networks (TINs), and handling them efficiently is crucial to the performance of many graphics programs. The kind of efficiency that is important depends on the application. Meshes are stored on disk and in memory, and we’d like to minimize the amount of storage consumed. When meshes are transmitted across networks or from the CPU to the graphics system, they consume bandwidth, which is often even more precious than storage. In applications that perform operations on meshes beyond simply storing and drawing them (such as subdivision, mesh editing, or mesh compression), efficient access to adjacency information is crucial.
大多数现实世界的模型都是由具有共享顶点的三角形复合体组成的。这些通常称为三角网格、三角形网格或不规则三角网络(TIN),有效地处理它们对于许多图形程序的性能至关重要。哪种效率才是重要的取决于应用程序。网格存储在磁盘和内存中,我们希望尽量减少所消耗的存储量。当网格通过网络或从 CPU 传输到图形系统时,它们会消耗带宽,而带宽往往比存储空间更宝贵。在对网格执行操作的应用程序中,除了简单地存储和绘制网格(例如细分、网格编辑、网格压缩或其他操作)之外,高效访问邻接信息也至关重要。
Triangle meshes are generally used to represent surfaces, so a mesh is not just a collection of unrelated triangles, but rather a network of triangles that connect to one another through shared vertices and edges to form a single continuous surface. This is a key insight about meshes: a mesh can be handled more efficiently than a collection of the same number of unrelated triangles.
三角形网格通常用于表示表面,因此网格不仅仅是一组不相关的三角形,而是一个三角形网络,这些三角形通过共享的顶点和边相互连接,形成一个连续的表面。这是关于网格的一个关键见解:与相同数量的不相关三角形的集合相比,网格的处理效率更高。
The minimum information required for a triangle mesh is a set of triangles (triples of vertices) and the positions (in 3D space) of their vertices. But many, if not most, programs require the ability to store additional data at the vertices, edges, or faces to support texture mapping, shading, animation, and other operations. Vertex data are the most common: each vertex can have material parameters, texture coordinates, and irradiances—any parameters whose values change across the surface. These parameters are then linearly interpolated across each triangle to define a continuous function over the whole surface of the mesh. However, it is also occasionally important to be able to store data per edge or per face.
三角形网格所需的最少信息是一组三角形(顶点的三元组)及其顶点的位置(在 3D 空间中)。但许多(如果不是大多数)程序都需要能够在顶点、边或面上存储其他数据,以支持纹理映射、着色、动画和其他操作。顶点数据是最常见的:每个顶点可以具有材料参数、纹理坐标和辐照度 - 任何其值在整个表面上发生变化的参数。然后,这些参数在每个三角形上进行线性插值,以定义整个网格表面上的连续函数。但是,能够按边或按面存储数据有时也很重要。
The idea that meshes are surface-like can be formalized as constraints on the mesh topology—the way the triangles connect together, without regard for the vertex positions. Many algorithms will only work, or are much easier to implement, on a mesh with predictable connectivity. The simplest and most restrictive requirement on the topology of a mesh is for the surface to be a manifold. A manifold mesh is “watertight”—it has no gaps and separates the space on the inside of the surface from the space outside. It also looks like a surface everywhere on the mesh.
网格类似于表面这一概念可以形式化为对网格拓扑的约束——三角形连接在一起的方式,而不考虑顶点位置。许多算法只在具有可预测连通性的网格上有效,或者更容易实现。对网格拓扑最简单、最严格的要求是表面必须是流形。流形网格是“水密的”——它没有间隙,并将表面内部的空间与外部空间分开。它在网格上的任何地方看起来都像一个表面。
We’ll leave the precise definitions to the mathematicians; see the chapter notes.
我们将把精确的定义留给数学家;请参阅章节注释。
The term manifold comes from the mathematical field of topology: roughly speaking, a manifold (specifically a two-dimensional manifold, or 2-manifold) is a surface in which a small neighborhood around any point could be smoothed out into a bit of flat surface. This idea is most clearly explained by counterexample: if an edge on a mesh has three triangles connected to it, the neighborhood of a point on the edge is different from the neighborhood of one of the points in the interior of one of the triangles, because it has an extra “fin” sticking out of it (Figure 12.1). If the edge has exactly two triangles attached to it, points on the edge have neighborhoods just like points in the interior, only with a crease down the middle. Similarly, if the triangles sharing a vertex are in a configuration like the left one in Figure 12.2, the neighborhood is like two pieces of surface glued together at the center, which can’t be flattened without doubling it up. The vertex with the simpler neighborhood shown at right is just fine.
术语流形来自数学领域拓扑学:粗略地说,流形(特别是二维流形或 2 流形)是一种曲面,其中任意点周围的小邻域可以被平滑成一个平面。这个想法通过反例可以最清楚地解释:如果网格上的一条边连接到三个三角形,则边上点的邻域与其中一个三角形内部点的邻域不同,因为它多了一个伸出的“鳍”(图 12.1 )。如果边上恰好有两个三角形连接到它,则边上的点的邻域就像内部的点一样,只是中间有一条折痕。类似地,如果共享一个顶点的三角形的配置类似于图 12.2中左边的配置,则邻域就像两块在中心粘在一起的曲面,不将其对折就无法将其压平。右侧显示的具有更简单邻域的顶点就很好了。
Figure 12.1. Non-manifold (a) and manifold (b) interior edges.
图 12.1.非流形(a)和流形(b)的内部边缘。
Figure 12.2. Non-manifold (a) and manifold (b) interior vertices.
图 12.2.非流形(a)和流形(b)的内部顶点。
Many algorithms assume that meshes are manifold, and it’s always a good idea to verify this property to prevent crashes or infinite loops if you are handed a malformed mesh as input. This verification boils down to checking that all edges are manifold and checking that all vertices are manifold by verifying the following conditions:
许多算法都假设网格是流形的,因此,如果您收到格式错误的网格作为输入,最好验证此属性以防止崩溃或无限循环。此验证归结为通过验证以下条件来检查所有边是否都是流形的以及检查所有顶点是否都是流形的:
Every edge is shared by exactly two triangles.
每条边恰好由两个三角形共享。
Every vertex has a single, complete loop of triangles around it.
每个顶点周围都有一个完整的三角形环。
Figure 12.1 illustrates how an edge can fail the first test by having too many triangles, and Figure 12.2 illustrates how a vertex can fail the second test by having two separate loops of triangles attached to it.
图 12.1说明了由于某条边上有太多三角形,所以它无法通过第一次测试;图 12.2说明了由于某个顶点上附着两个独立的三角形环,所以它无法通过第二次测试。
Manifold meshes are convenient, but sometimes, it’s necessary to allow meshes to have edges or boundaries. Such meshes are not manifolds—a point on the boundary has a neighborhood that is cut off on one side. They are not necessarily watertight. However, we can relax the requirements of a manifold mesh to those for a manifold with boundary without causing problems for most mesh processing algorithms. The relaxed conditions are
流形网格很方便,但有时,有必要允许网格具有边或边界。这样的网格不是流形——边界上的点具有一侧被切断的邻域。它们不一定是无懈可击的。但是,我们可以将流形网格的要求放宽到具有边界的流形的要求,而不会给大多数网格处理算法带来问题。放宽条件是
Every edge is used by either one or two triangles.
每条边由一个或两个三角形使用。
Every vertex connects to a single edge-connected set of triangles.
每个顶点都连接到一组边连接的三角形。
Figure 12.3 illustrates these conditions: from left to right, there is an edge with one triangle, a vertex whose neighboring triangles are in a single edge-connected set, and a vertex with two disconnected sets of triangles attached to it.
图 12.3说明了这些情况:从左到右,有一条边包含一个三角形,一个顶点的邻近三角形位于一个边连通集中,一个顶点上附着有两组不连通的三角形。
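Both sets of conditions can be checked with a few lines of code. In this sketch, a mesh is a list of triangles, each a triple of vertex indices; the function names `edge_counts`, `edges_ok`, and `vertices_ok` are hypothetical. The vertex test treats two triangles around a vertex as adjacent when they share a second vertex, which for triangles is the same as sharing an edge through that vertex. For a mesh whose edges all pass the strict two-triangle test, a single edge-connected set around a vertex is also a complete loop.

```python
from collections import defaultdict


def edge_counts(triangles):
    """Count how many triangles use each undirected edge."""
    counts = defaultdict(int)
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[frozenset((u, v))] += 1
    return counts


def edges_ok(triangles, allow_boundary=False):
    """Every edge has exactly two triangles (or one or two, with boundary)."""
    allowed = {1, 2} if allow_boundary else {2}
    return all(n in allowed for n in edge_counts(triangles).values())


def vertices_ok(triangles):
    """Each vertex's triangles must form a single edge-connected set."""
    incident = defaultdict(list)
    for t, tri in enumerate(triangles):
        for v in tri:
            incident[v].append(t)
    for tris in incident.values():
        seen, stack = {tris[0]}, [tris[0]]
        while stack:                      # flood fill around the vertex
            t = stack.pop()
            for s in tris:
                shared = set(triangles[t]) & set(triangles[s])
                if s not in seen and len(shared) >= 2:
                    seen.add(s)
                    stack.append(s)
        if len(seen) != len(tris):
            return False
    return True
```

A closed tetrahedron passes both tests in strict mode, a lone triangle passes only when boundaries are allowed, and two triangles touching at a single vertex fail the vertex test.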
Finally, in many applications it’s important to be able to distinguish the “front” or “outside” of a surface from the “back” or “inside”—this is known as the orientation of the surface. For a single triangle, we define orientation based on the order in which the vertices are listed: the front is the side from which the triangle’s three vertices are arranged in counterclockwise order. A connected mesh is consistently oriented if its triangles all agree on which side is the front—and this is true if and only if every pair of adjacent triangles is consistently oriented.
最后,在许多应用中,能够区分表面的“正面”或“外侧”与“背面”或“内侧”非常重要——这被称为表面的方向。对于单个三角形,我们根据顶点的排列顺序来定义方向:正面是三角形的三个顶点按逆时针顺序排列的一侧。如果连通网格的所有三角形都一致同意哪一侧是正面,则连通网格的方向一致——当且仅当每对相邻三角形的方向一致时,情况才如此。
Figure 12.3. Conditions at the edge of a manifold with boundary.
图 12.3.有边界的流形的边缘条件。
In a consistently oriented pair of triangles, the two shared vertices appear in opposite orders in the two triangles’ vertex lists (Figure 12.4). What’s important is consistency of orientation—some systems define the front using clockwise rather than counterclockwise order.
在方向一致的三角形对中,两个共享顶点在两个三角形的顶点列表中以相反的顺序出现(图 12.4 )。方向的一致性很重要——有些系统使用顺时针而不是逆时针顺序来定义正面。
Figure 12.4. Triangles (B,A,C) and (D,C,A) are consistently oriented, whereas (B,A,C) and (A,C,D) are inconsistently oriented.
图 12.4三角形 (B,A,C) 和 (D,C,A) 的方向一致,而 (B,A,C) 和 (A,C,D) 的方向不一致。
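The opposite-order rule suggests a simple consistency test: if every triangle is read as three directed edges, then in a consistently oriented mesh no directed edge appears twice, because each shared edge is traversed once in each direction. The sketch below assumes the mesh is already a manifold, possibly with boundary, and the function name is hypothetical.

```python
def consistently_oriented(triangles):
    """True if no directed edge (a, b) is used by more than one triangle.

    Assumes a manifold (possibly with boundary) mesh given as triples
    of vertex identifiers.
    """
    seen = set()
    for a, b, c in triangles:
        for edge in ((a, b), (b, c), (c, a)):
            if edge in seen:
                return False   # two triangles traverse this edge the same way
            seen.add(edge)
    return True
```

For the triangles in Figure 12.4, (B,A,C) and (D,C,A) contain the directed edges A→C and C→A respectively, so they pass; (B,A,C) and (A,C,D) both contain A→C, so they fail.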
Any mesh that has non-manifold edges can’t be oriented consistently. But it’s also possible for a mesh to be a valid manifold with boundary (or even a manifold) and yet have no consistent way to orient the triangles—they are not orientable surfaces. An example is the Möbius band shown in Figure 12.5. This is rarely an issue in practice, however.
任何具有非流形边缘的网格都无法一致地定向。但是,网格也可能是具有边界的有效流形(甚至是流形),但没有一致的方式来定向三角形 - 它们不是可定向表面。一个例子是图 12.5中所示的莫比乌斯带。然而,这在实践中很少成为问题。
Figure 12.5. A triangulated Möbius band, which is not orientable.
图 12.5.不可定向的三角形莫比乌斯带。
A simple triangular mesh is shown in Figure 12.6. You could store these three triangles as independent entities, each of this form:
图 12.6显示了一个简单的三角形网格。您可以将这三个三角形存储为独立的实体,每个实体的形式如下:
Triangle {
  vector3 vertexPosition[3]
}
Figure 12.6. A three-triangle mesh with four vertices, represented with separate triangles (left) and with shared vertices (right).
图 12.6.具有四个顶点的三角网格,用单独的三角形表示(左),用共享顶点表示(右)。
This would result in storing vertex b three times and the other vertices twice each for a total of nine stored points (three vertices for each of three triangles). Or you could instead arrange to share the common vertices and store only four, resulting in a shared-vertex mesh. Logically, this data structure has triangles which point to vertices which contain the vertex data (Figure 12.7):
这将导致顶点b存储三次,其他顶点各存储两次,总共存储九个点(三个三角形各三个顶点)。或者,您可以安排共享公共顶点并仅存储四个,从而产生共享顶点网格。从逻辑上讲,此数据结构具有指向包含顶点数据的顶点的三角形(图 12.7 ):
Triangle {
  Vertex v[3]
}

Vertex {
  vector3 position  // or other vertex data
}
Figure 12.7. The triangle-to-vertex references in a shared-vertex mesh.
图 12.7.共享顶点网格中的三角形到顶点的引用。
Note that the entries in the v array are references, or pointers, to Vertex objects; the vertices are not contained in the triangle.
请注意, v数组中的条目是Vertex对象的引用或指针;顶点不包含在三角形中。
In implementation, the vertices and triangles are normally stored in arrays, with the triangle-to-vertex references handled by storing array indices:
在实现中,顶点和三角形通常存储在数组中,通过存储数组索引来处理三角形到顶点的引用:
IndexedMesh {
  int tInd[nt][3]
  vector3 verts[nv]
}
The index of the kth vertex of the ith triangle is found in tInd[i][k], and the position of that vertex is stored in the corresponding row of the verts array; see Figure 12.8 for an example. This way of storing a shared-vertex mesh is an indexed triangle mesh.
第 i 个三角形的第 k 个顶点的索引位于tInd[i][k]中,该顶点的位置存储在verts数组的相应行中;参见图 12.8中的示例。这种存储共享顶点网格的方式是索引三角形网格。
Figure 12.8. A larger triangle mesh, with part of its representation as an indexed triangle mesh.
图 12.8.较大的三角形网格,其部分表示为索引三角形网格。
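As an illustration of the relationship between the two representations, the following sketch converts a list of separate triangles into an indexed mesh by merging identical vertex positions. The function name and the use of exact position equality to decide that two vertices are the same are assumptions of this example; real data often needs a tolerance or explicit vertex identities.

```python
def build_indexed_mesh(separate_triangles):
    """Convert triangles given as triples of positions into (verts, t_ind).

    verts is a list of unique positions; t_ind[i][k] is the index in
    verts of the k-th vertex of the i-th triangle.
    """
    verts, t_ind, index_of = [], [], {}
    for tri in separate_triangles:
        row = []
        for pos in tri:
            key = tuple(pos)
            if key not in index_of:      # first time this position is seen
                index_of[key] = len(verts)
                verts.append(key)
            row.append(index_of[key])
        t_ind.append(tuple(row))
    return verts, t_ind
```

For three triangles sharing four vertices, the nine stored positions collapse to four, plus nine small integer indices.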
Separate triangles or shared vertices will both work well. Is there a space advantage for sharing vertices? If our mesh has nv vertices and nt triangles, and if we assume that the data for floats, pointers, and ints all require the same storage (a dubious assumption), the space requirements are as follows:
单独的三角形或共享的顶点都可以很好地工作。共享顶点是否有空间优势?如果我们的网格有n v 个顶点和n t 个三角形,并且我们假设浮点数、指针和整数的数据都需要相同的存储空间(一个可疑的假设),则空间要求如下:
Triangle. Three vectors per triangle, for 9nt units of storage;
三角形。每个三角形三个向量,占用 9 n t个存储单元;
IndexedMesh. One vector per vertex and three ints per triangle, for 3nv + 3nt units of storage.
IndexedMesh。每个顶点一个向量,每个三角形三个整数,共计 3 n v + 3 n t个存储单位。
The relative storage requirements depend on the ratio of nt to nv.
相对存储要求取决于n t与n v的比率。
Is this factor of two worth the complication? I think the answer is yes, and it becomes an even bigger win as soon as you start adding “properties” to the vertices.
这两个因素是否值得这么复杂?我认为答案是肯定的,而且一旦你开始为顶点添加“属性”,它就会成为更大的胜利。
As a rule of thumb, a large mesh has each vertex connected to about six triangles (although there can be any number for extreme cases). Since each triangle connects to three vertices, this means that there are generally twice as many triangles as vertices in a large mesh: nt ≈ 2nv. Making this substitution, we can conclude that the storage requirements are 18nv for the Triangle structure and 9nv for IndexedMesh. Using shared vertices reduces storage requirements by about a factor of two, and this seems to hold in practice for most implementations.
根据经验,大型网格的每个顶点都连接到大约六个三角形(尽管在极端情况下可以有任意数量的三角形)。由于每个三角形连接到三个顶点,这意味着大型网格中的三角形数量通常是顶点数量的两倍: nt≈2nv 。进行此替换后,我们可以得出结论, Triangle结构的存储要求为18nv , IndexedMesh的存储要求为9nv 。使用共享顶点可将存储要求降低约一半,这似乎在大多数实现中都适用。
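The storage comparison is easy to reproduce numerically. This small sketch uses the same simplifying assumption as the text, that floats and ints occupy one unit of storage each; the function name is hypothetical.

```python
def storage_units(nv, nt):
    """Return (separate, indexed) storage for nv vertices and nt triangles."""
    separate = 9 * nt            # three 3-vectors per triangle
    indexed = 3 * nv + 3 * nt    # one 3-vector per vertex, three ints per triangle
    return separate, indexed
```

With nt = 2nv, for example 1000 vertices and 2000 triangles, the separate-triangle representation needs 18000 units and the indexed mesh 9000, which is the factor of two discussed above.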
Indexed meshes are the most common in-memory representation of triangle meshes, because they achieve a good balance of simplicity, convenience, and compactness. They are also commonly used to transfer meshes over networks and between the application and graphics pipeline. In applications where even more compactness is desirable, the triangle vertex indices (which take up two-thirds of the space in an indexed mesh with only positions at the vertices) can be expressed more efficiently using triangle strips and triangle fans.
索引网格是三角形网格最常见的内存表示形式,因为它们在简单性、便利性和紧凑性之间实现了良好的平衡。它们还常用于通过网络以及在应用程序和图形管道之间传输网格。在需要更紧凑的应用程序中,三角形顶点索引(在仅包含顶点位置的索引网格中,它们占据了三分之二的空间)可以使用三角形条带和三角形扇形更有效地表示。
A triangle fan is shown in Figure 12.9. In an indexed mesh, the triangles array would contain [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5)]. We are storing 12 vertex indices, although there are only six distinct vertices. In a triangle fan, all the triangles share one common vertex, and the other vertices generate a set of triangles like the vanes of a collapsible fan. The fan in the figure could be specified with the sequence [0, 1, 2, 3, 4, 5]: the first vertex establishes the center, and subsequently each pair of adjacent vertices (1-2, 2-3, etc.) creates a triangle.
图 12.9显示了三角扇。在索引网格中,三角形数组将包含 [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5)]。我们存储了 12 个顶点索引,尽管只有六个不同的顶点。在三角扇中,所有三角形共享一个公共顶点,其他顶点生成一组三角形,就像可折叠风扇的叶片一样。图中的扇形可以用 [0, 1, 2, 3, 4, 5] 序列指定:第一个顶点确定中心,随后每对相邻顶点(1-2、2-3 等)创建一个三角形。
Figure 12.9. A triangle fan.
图 12.9。三角形扇形。
The triangle strip is a similar concept, but it is useful for a wider range of meshes. Here, vertices are added alternating top and bottom in a linear strip as shown in Figure 12.10. The triangle strip in the figure could be specified by the sequence [0 1 2 3 4 5 6 7], and every subsequence of three adjacent vertices (0-1-2, 1-2-3, etc.) creates a triangle. For consistent orientation, every other triangle needs to have its order reversed. In the example, this results in the triangles (0, 1, 2), (2, 1, 3), (2, 3, 4), (4, 3, 5), etc. For each new vertex that comes in, the oldest vertex is forgotten and the order of the two remaining vertices is swapped. See Figure 12.11 for a larger example.
三角形带是一个类似的概念,但它对更广泛的网格有用。在这里,顶点以线性带的形式交替从顶部和底部添加,如图 12.10所示。图中的三角形带可以通过序列 [0 1 2 3 4 5 6 7] 指定,并且三个相邻顶点(0-1-2、1-2-3 等)的每个子序列都会创建一个三角形。为了保持一致的方向,每个其他三角形都需要反转其顺序。在该示例中,这会产生三角形 (0, 1, 2)、(2, 1, 3)、(2, 3, 4)、(4, 3, 5) 等。对于每个进入的新顶点,最旧的顶点会被遗忘,并且剩余两个顶点的顺序会进行交换。有关更大的示例,请参见图 12.11 。
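As an illustration of the two encodings just described, the following sketch expands a fan and a strip back into explicit triangles. The function names are hypothetical, and the winding rule for strips is the one stated in the text (every other triangle has its first two vertices swapped).

```python
def fan_to_triangles(fan):
    """First index is the shared center; each adjacent pair after it forms a triangle."""
    center = fan[0]
    return [(center, fan[i], fan[i + 1]) for i in range(1, len(fan) - 1)]


def strip_to_triangles(strip):
    """Every run of three adjacent indices forms a triangle; odd ones are flipped."""
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        # Swap the first two vertices of every other triangle to keep orientation consistent.
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles


print(fan_to_triangles([0, 1, 2, 3, 4, 5]))
print(strip_to_triangles([0, 1, 2, 3, 4, 5, 6, 7]))
```

Both functions turn n + 2 indices into n triangles, which is where the storage savings discussed next come from.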
Figure 12.10. A triangle strip.
图 12.10。三角形条带。
Figure 12.11. Two triangle strips in the context of a larger mesh. Note that neither strip can be extended to include the triangle marked with an asterisk.
图 12.11。较大网格中的两个三角形条带。请注意,两个条带都无法扩展以包含标有星号的三角形。
In both strips and fans, n + 2 vertices suffice to describe n triangles—a substantial savings over the 3n vertices required by a standard indexed mesh. Long triangle strips will save approximately a factor of three if the program is vertex-bound.
在条带和扇形中,n + 2 个顶点足以描述 n 个三角形 — 与标准索引网格所需的 3n 个顶点相比,节省了大量顶点。如果程序受顶点限制,长三角形条带将节省大约三倍。
It might seem that triangle strips are only useful if the strips are very long, but even relatively short strips already gain most of the benefits. The savings in storage space (for only the vertex indices) are as follows:
三角形条带似乎只有在条带很长时才有用,但即使是相对较短的条带也已经获得了大部分好处。存储空间节省(仅针对顶点索引)如下:
strip length | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 16 | 100 | ∞
relative size | 1.00 | 0.67 | 0.56 | 0.50 | 0.47 | 0.44 | 0.43 | 0.42 | 0.38 | 0.34 | 0.33
So, in fact, there is a rather rapid diminishing return as the strips grow longer. Thus, even for an unstructured mesh, it is worthwhile to use some greedy algorithm to gather the triangles into short strips.
因此,实际上,随着条带变长,收益会迅速递减。因此,即使对于非结构化网格,也值得使用某种贪婪算法将它们聚集成短条带。
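The relative sizes in the table follow directly from the index counts: a strip of n triangles needs n + 2 indices, against 3n for the plain indexed mesh. A small sketch (the function name is an assumption made here for illustration):

```python
def strip_relative_size(n: int) -> float:
    """Index storage of an n-triangle strip relative to n separate index triples."""
    return (n + 2) / (3 * n)


for n in (1, 2, 3, 4, 5, 6, 7, 8, 16, 100):
    print(f"{n:>3}  {strip_relative_size(n):.2f}")
```

As n grows the ratio approaches 1/3, the limiting entry of the table.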
Indexed meshes, strips, and fans are all good, compact representations for static meshes. However, they do not readily allow for meshes to be modified. In order to efficiently edit meshes, more complicated data structures are needed to efficiently answer queries such as
索引网格、条带和扇形都是静态网格的良好、紧凑表示。但是,它们并不容易修改网格。为了有效地编辑网格,需要更复杂的数据结构来有效地回答查询,例如
Given a triangle, what are the three adjacent triangles?
给定一个三角形,三个相邻的三角形是什么?
Given an edge, which two triangles share it?
给定一条边,哪两个三角形共享它?
Given a vertex, which faces share it?
给定一个顶点,哪些面共享它?
Given a vertex, which edges share it?
给定一个顶点,哪些边共享它?
There are many data structures for triangle meshes, polygonal meshes, and polygonal meshes with holes (see the notes at the end of this chapter for references). In many applications, the meshes are very large, so an efficient representation can be crucial.
三角形网格、多边形网格和带孔的多边形网格有许多数据结构(请参阅本章末尾的注释以获取参考)。在许多应用中,网格非常大,因此有效的表示至关重要。
The most straightforward, though bloated, implementation would be to have three types, Vertex, Edge, and Triangle, and to just store all the relationships directly:
最直接但臃肿的实现是有三种类型, Vertex , Edge和Triangle ,并直接存储所有关系:
Triangle {
    Vertex v[3]
    Edge e[3]
}
Edge {
    Vertex v[2]
    Triangle t[2]
}
Vertex {
    Triangle t[]
    Edge e[]
}
This lets us directly look up answers to the connectivity questions above, but because this information is all inter-related, it stores more than is really needed. Also, storing connectivity in vertices makes for variable-length data structures (since vertices can have arbitrary numbers of neighbors), which are generally less efficient to implement. Rather than committing to store all these relationships explicitly, it is best to define a class interface to answer these questions, behind which a more efficient data structure can hide. It turns out we can store only some of the connectivity and efficiently recover the other information when needed.
这样我们就可以直接查找上述连通性问题的答案,但由于这些信息都是相互关联的,因此存储的信息比实际需要的多。此外,将连通性存储在顶点中会产生可变长度的数据结构(因为顶点可以有任意数量的邻居),这通常实施起来效率较低。与其明确地存储所有这些关系,不如定义一个类接口来回答这些问题,在其背后可以隐藏更高效的数据结构。事实证明,我们可以只存储部分连通性,并在需要时有效地恢复其他信息。
The fixed-size arrays in the Edge and Triangle classes suggest that it will be more efficient to store the connectivity information there. In fact, for polygon meshes, in which polygons have arbitrary numbers of edges and vertices, only edges have fixed-size connectivity information, which leads to many traditional mesh data structures being based on edges. But for triangle-only meshes, storing connectivity in the (less numerous) faces is appealing.
Edge和Triangle类中的固定大小数组表明将连接信息存储在那里会更有效。事实上,对于多边形网格,其中多边形具有任意数量的边和顶点,只有边具有固定大小的连接信息,这导致许多传统网格数据结构基于边。但对于只有三角形的网格,将连接存储在(数量较少的)面中很有吸引力。
A good mesh data structure should be reasonably compact and allow efficient answers to all adjacency queries. Efficient means constant-time: the time to find neighbors should not depend on the size of the mesh. We’ll look at three data structures for meshes, one based on triangles and two based on edges.
良好的网格数据结构应合理紧凑,并允许高效地回答所有邻接查询。高效意味着恒定时间:查找邻居的时间不应取决于网格的大小。我们将研究网格的三种数据结构,一种基于三角形,两种基于边。
We can create a compact mesh data structure based on triangles by augmenting the basic shared-vertex mesh with pointers from the triangles to the three neighboring triangles, and a pointer from each vertex to one of the adjacent triangles (it doesn’t matter which one); see Figure 12.12:
我们可以创建一个基于三角形的紧凑网格数据结构,通过在基本共享顶点网格中添加从三角形指向三个相邻三角形的指针,以及从每个顶点指向其中一个相邻三角形(哪一个都行)的指针;见图12.12 :
Triangle {
    Triangle nbr[3];
    Vertex v[3];
}
Vertex {
    // ... per-vertex data ...
    Triangle t; // any adjacent tri
}
Figure 12.12. The references between triangles and vertices in the triangle-neighbor structure.
图 12.12.三角形邻域结构中三角形和顶点之间的引用。
In the array Triangle.nbr, the kth entry points to the neighboring triangle that shares vertices k and (k + 1) mod 3. We call this structure the triangle-neighbor structure. Starting from standard indexed mesh arrays, it can be implemented with two additional arrays: one that stores the three neighbors of each triangle and one that stores a single neighboring triangle for each vertex (see Figure 12.13 for an example):
在数组Triangle.nbr中,第 k 个条目指向共享顶点 k 和 k + 1 的邻近三角形。我们将此结构称为三角形邻居结构。从标准索引网格数组开始,它可以用两个附加数组来实现:一个数组存储每个三角形的三个邻居,另一个数组存储每个顶点的一个邻近三角形(参见图 12.13中的示例):
Mesh {
    // ... per-vertex data ...
    int tInd[nt][3]; // vertex indices
    int tNbr[nt][3]; // indices of neighbor triangles
    int vTri[nv];    // index of any adjacent triangle
}
Figure 12.13. The triangle-neighbor structure as encoded in arrays, and the sequence that is followed in traversing the neighboring triangles of vertex 2.
图 12.13.以数组形式编码的三角形邻域结构,以及遍历顶点 2 的邻域三角形时所遵循的顺序。
Clearly, the neighboring triangles and vertices of a triangle can be found directly in the data structure, but by using this triangle adjacency information carefully, it is also possible to answer connectivity queries about vertices in constant time. The idea is to move from triangle to triangle, visiting only the triangles adjacent to the relevant vertex. If triangle t has vertex v as its kth vertex, then the triangle t.nbr[k] is the next triangle around v in the clockwise direction. This observation leads to the following algorithm to traverse all the triangles adjacent to a given vertex:
显然,可以直接在数据结构中找到三角形的相邻三角形和顶点,但通过仔细使用此三角形邻接信息,也可以在恒定时间内回答有关顶点的连接查询。这个想法是从一个三角形移动到另一个三角形,只访问与相关顶点相邻的三角形。如果三角形 t 的第 k 个顶点是顶点 v,那么三角形t.nbr[ k ]就是顺时针方向围绕 v 的下一个三角形。这一观察结果导致了以下算法来遍历与给定顶点相邻的所有三角形:
TrianglesOfVertex(v) {
    t = v.t
    do {
        find i such that (t.v[i] == v)
        t = t.nbr[i]
    } while (t != v.t)
}
Of course, a real program would do something with the triangles as it found them.
当然,真正的程序在找到三角形后会对它们做一些事情。
This operation finds each subsequent triangle in constant time—even though a search is required to find the position of the central vertex in each triangle’s vertex list, the vertex lists have constant size so the search takes constant time. However, that search is awkward and requires extra branching.
此操作可在常数时间内找到每个后续三角形——尽管需要搜索才能找到每个三角形顶点列表中中心顶点的位置,但顶点列表的大小是恒定的,因此搜索需要常数时间。然而,这种搜索很麻烦,需要额外的分支。
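To make the array form concrete, here is a minimal sketch that derives the neighbor arrays from a plain index list and then walks around a vertex as described above. It assumes a closed, consistently oriented manifold mesh (every directed edge has exactly one opposite); the names `build_triangle_neighbors` and `triangles_of_vertex` are illustrative, not taken from the book.

```python
def build_triangle_neighbors(t_ind, nv):
    """Return (t_nbr, v_tri) for a closed, consistently oriented triangle mesh."""
    # Map each directed edge (a, b) to the triangle that contains it.
    owner = {}
    for t, tri in enumerate(t_ind):
        for k in range(3):
            owner[(tri[k], tri[(k + 1) % 3])] = t
    # Neighbor k lies across the edge from vertex k to vertex k+1,
    # where the same edge appears with the opposite direction.
    t_nbr = [[owner[(tri[(k + 1) % 3], tri[k])] for k in range(3)] for tri in t_ind]
    v_tri = [None] * nv
    for t, tri in enumerate(t_ind):
        for v in tri:
            v_tri[v] = t  # any adjacent triangle will do
    return t_nbr, v_tri


def triangles_of_vertex(v, t_ind, t_nbr, v_tri):
    """Collect the triangles around v by hopping through t_nbr."""
    result = []
    t = v_tri[v]
    while True:
        result.append(t)
        i = t_ind[t].index(v)  # constant-time search over three entries
        t = t_nbr[t][i]
        if t == v_tri[v]:
            return result


# A tetrahedron with consistently oriented faces.
TETRAHEDRON = [(0, 2, 1), (0, 1, 3), (1, 2, 3), (0, 3, 2)]
t_nbr, v_tri = build_triangle_neighbors(TETRAHEDRON, 4)
print(triangles_of_vertex(0, TETRAHEDRON, t_nbr, v_tri))
```

The dictionary is used only while building the arrays; the traversal itself touches nothing but `t_ind`, `t_nbr`, and `v_tri`.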
A small refinement can avoid these searches. The problem is that once we follow a pointer from one triangle to the next, we don’t know from which way we came: we have to search the triangle’s vertices to find the vertex that connects back to the previous triangle. To solve this, instead of storing pointers to neighboring triangles, we can store pointers to specific edges of those triangles by storing an index with the pointer:
稍加改进即可避免这些搜索。问题是,一旦我们跟随指针从一个三角形到达下一个三角形,我们就不知道从哪里来的了:我们必须搜索三角形的顶点来找到连接回前一个三角形的顶点。为了解决这个问题,我们可以通过在指针中存储索引来存储指向这些三角形特定边的指针,而不是存储指向相邻三角形的指针:
Triangle {
    Edge nbr[3];
    Vertex v[3];
}
Edge { // the i-th edge of triangle t
    Triangle t;
    int i; // in {0,1,2}
}
Vertex {
    // ... per-vertex data ...
    Edge e; // any edge leaving vertex
}
In practice, the Edge is stored by borrowing two bits of storage from the triangle index t to store the edge index i, so that the total storage requirements remain the same.
实际上,通过从三角形索引t中借用两位存储空间来存储边索引i来存储边,使得总存储需求保持不变。
In this structure, the neighbor array for a triangle tells which of the neighboring triangles’ edges are shared with the three edges of that triangle. With this extra information, we always know where to find the original triangle, which leads to an invariant of the data structure: for any jth edge of any triangle t,
在这个结构中,三角形的邻居数组会告诉相邻三角形的哪条边与该三角形的三条边共享。有了这些额外的信息,我们总是知道在哪里可以找到原始三角形,这导致了数据结构的不变量:对于任何三角形t的任何第 j 条边,
t.nbr[j].t.nbr[t.nbr[j].i].t == t.
Knowing which edge we came in through lets us know immediately which edge to leave through in order to continue traversing around a vertex, leading to a streamlined algorithm:
知道我们从哪条边进来,让我们立即知道要从哪条边离开,以便继续遍历顶点,从而产生一个简化的算法:
TrianglesOfVertex(v) {
    {t, i} = v.e;
    do {
        {t, i} = t.nbr[i];
        i = (i+1) mod 3;
    } while (t != v.e.t);
}
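The refined structure can be sketched with (triangle, edge index) pairs standing in for the Edge record. As before this assumes a closed, consistently oriented manifold mesh, and the helper names are hypothetical. The final loop checks the invariant stated above on a small example.

```python
def build_edge_neighbors(t_ind, nv):
    """Return (nbr, v_edge); nbr[t][k] is the (triangle, edge) pair across edge k of t."""
    owner = {}
    for t, tri in enumerate(t_ind):
        for k in range(3):
            owner[(tri[k], tri[(k + 1) % 3])] = (t, k)
    nbr = [[owner[(tri[(k + 1) % 3], tri[k])] for k in range(3)] for tri in t_ind]
    v_edge = [None] * nv
    for (a, _b), edge in owner.items():
        v_edge[a] = edge  # this edge leaves vertex a
    return nbr, v_edge


def triangles_of_vertex(v, nbr, v_edge):
    """Walk around v without searching any triangle's vertex list."""
    result = []
    t, i = v_edge[v]
    while True:
        result.append(t)
        t, i = nbr[t][i]   # cross the edge leaving v
        i = (i + 1) % 3    # the following edge of the new triangle leaves v again
        if t == v_edge[v][0]:
            return result


TETRAHEDRON = [(0, 2, 1), (0, 1, 3), (1, 2, 3), (0, 3, 2)]
nbr, v_edge = build_edge_neighbors(TETRAHEDRON, 4)
print(triangles_of_vertex(0, nbr, v_edge))

# Invariant: crossing an edge and crossing back returns to the same triangle.
for t in range(len(TETRAHEDRON)):
    for j in range(3):
        u, i = nbr[t][j]
        assert nbr[u][i][0] == t
```

In a compact implementation the pair would be packed into one integer, as the text notes next; the tuple is used here only for readability.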
The triangle-neighbor structure is quite compact. For a mesh with only vertex positions, we are storing four numbers (three coordinates and an edge) per vertex and six (three vertex indices and three edges) per face, for a total of 4nv + 6nt ≈ 16nv units of storage (about 16 units per vertex), compared with 9nv for the basic indexed mesh.
三角形邻域结构非常紧凑。对于只有顶点位置的网格,我们为每个顶点存储四个数字(三个坐标和一条边),为每个面存储六个数字(三个顶点索引和三条边),总共 4nv + 6nt ≈ 16nv 个存储单元(即每个顶点约 16 个单元),而基本索引网格为 9nv 个存储单元。
The triangle-neighbor structure as presented here works only for closed manifold meshes, because it depends on returning to the starting triangle to terminate the traversal of a vertex’s neighbors, which will not happen at a boundary vertex that doesn’t have a full cycle of triangles. However, it is not difficult to generalize it to manifolds with boundary, by introducing a suitable sentinel value (such as −1) for the neighbors of boundary triangles and taking care that the boundary vertices point to the most counterclockwise neighboring triangle, rather than to any arbitrary triangle.
这里介绍的三角形邻居结构仅适用于流形网格,因为它依赖于返回起始三角形来终止顶点邻居的遍历,而这不会发生在没有完整三角形循环的边界顶点上。但是,将其推广到具有边界的流形并不困难,只需为边界三角形的邻居引入合适的标记值(例如 -1)并注意边界顶点指向最逆时针的相邻三角形,而不是指向任何任意三角形。
One widely used mesh data structure that stores connectivity information at the edges instead of the faces is the winged-edge data structure. This data structure makes edges the first-class citizen of the data structure, as illustrated in Figures 12.14 and 12.15.
一种广泛使用的网格数据结构是翼边数据结构,它将连接信息存储在边而不是面上。这种数据结构将边作为数据结构的一等公民,如图 12.14和12.15所示。
Figure 12.14. An example of a winged-edge mesh structure, stored in arrays.
图 12.14.翼边网格结构的示例,存储在数组中。
Figure 12.15. A tetrahedron and the associated elements for a winged-edge data structure. The two small tables are not unique; each vertex and face stores any one of the edges with which it is associated.
图 12.15.四面体和翼边数据结构的相关元素。这两个小表并不唯一;每个顶点和面都存储与其关联的任意一条边。
In a winged-edge mesh, each edge stores pointers to the two vertices it connects (the head and tail vertices), the two faces it is part of (the left and right faces), and, most importantly, the next and previous edges in the counterclockwise traversal of its left and right faces (Figure 12.16). Each vertex and face also stores a pointer to a single, arbitrary edge that connects to it:
在翼边网格中,每条边都存储指向它连接的两个顶点(头顶点和尾顶点)、它所属的两个面(左面和右面)的指针,以及最重要的是,在逆时针遍历其左面和右面时,存储指向下一条边和上一条边的指针(图 12.16 )。每个顶点和面还存储指向与其连接的单个任意边的指针:
Edge {
    Edge lprev, lnext, rprev, rnext;
    Vertex head, tail;
    Face left, right;
}
Face {
    // ... per-face data ...
    Edge e; // any adjacent edge
}
Vertex {
    // ... per-vertex data ...
    Edge e; // any incident edge
}
Figure 12.16. The references from an edge to the neighboring edges, faces, and vertices in the winged-edge structure.
图 12.16.翼边结构中一条边到相邻边、面和顶点的引用。
The winged-edge data structure supports constant-time access to the edges of a face or of a vertex, and from those edges the adjoining vertices or faces can be found:
翼边数据结构支持对面或顶点的边的恒定时间访问,并且可以从这些边找到相邻的顶点或面:
EdgesOfVertex(v) {
    e = v.e;
    do {
        if (e.tail == v)
            e = e.lprev;
        else
            e = e.rprev;
    } while (e != v.e);
}
EdgesOfFace(f) {
    e = f.e;
    do {
        if (e.left == f)
            e = e.lnext;
        else
            e = e.rnext;
    } while (e != f.e);
}
These same algorithms and data structures will work equally well in a polygon mesh that isn’t limited to triangles; this is one important advantage of edge-based structures.
这些相同的算法和数据结构在不限于三角形的多边形网格中同样有效;这是基于边的结构的一个重要优点。
As with any data structure, the winged-edge data structure makes a variety of time/space tradeoffs. For example, we can eliminate the prev references. This makes it more difficult to traverse clockwise around faces or counterclockwise around vertices, but when we need to know the previous edge, we can always follow the successor edges in a circle until we get back to the original edge. This saves space, but it makes some operations slower. (See the chapter notes for more information on these tradeoffs).
与任何数据结构一样,翼边数据结构会做出各种时间/空间权衡。例如,我们可以消除上一个引用。这使得顺时针遍历面或逆时针遍历顶点变得更加困难,但是当我们需要知道前一个边时,我们总是可以沿着后继边绕一圈,直到回到原始边。这节省了空间,但会使某些操作变慢。(有关这些权衡的更多信息,请参阅章节注释)。
The winged-edge structure is quite elegant, but it has one remaining awkwardness: the need to constantly check which way the edge is oriented before moving to the next edge. This check is directly analogous to the search we saw in the basic version of the triangle-neighbor structure: we are looking to find out whether we entered the present edge from the head or from the tail. The solution is also almost indistinguishable: rather than storing data for each edge, we store data for each half-edge. There is one half-edge for each of the two triangles that share an edge, and the two half-edges are oriented oppositely, each oriented consistently with its own triangle.
翼边结构非常优雅,但它还有一个尴尬之处——在移动到下一条边之前,需要不断检查边的方向。这种检查与我们在三角形邻居结构的基本版本中看到的搜索完全类似:我们正在寻找从头部还是尾部进入当前边的方法。解决方案也几乎没有区别:我们不是存储每条边的数据,而是存储每条半边的数据。两个共享边的三角形各有一条半边,这两条半边的方向相反,每条半边的方向都与自己的三角形一致。
The data normally stored in an edge are split between the two half-edges. Each half-edge points to the face on its side of the edge and to the vertex at its head, and each contains the edge pointers for its face (Figure 12.17). It also points to its neighbor on the other side of the edge, from which the other half of the information can be found. Like the winged-edge, a half-edge can contain pointers to both the previous and next half-edges around its face, or only to the next half-edge. We’ll show the example that uses a single pointer.
通常存储在边中的数据被分成两个半边。每个半边指向其边一侧的面和其头部的顶点,并且每个半边都包含其面的边指针(图 12.17 )。它还指向边另一侧的邻居,从中可以找到另一半信息。与翼边一样,半边可以包含指向其面周围的上一个和下一个半边的指针,或者只包含指向下一个半边的指针。我们将展示使用单个指针的示例。
Figure 12.17. The references from a half-edge to its neighboring mesh components.
图 12.17.从半边到其相邻网格组件的引用。
HEdge {
    HEdge pair, next;
    Vertex v;
    Face f;
}
Face {
    // ... per-face data ...
    HEdge h; // any h-edge of this face
}
Vertex {
    // ... per-vertex data ...
    HEdge h; // any h-edge pointing toward this vertex
}
Traversing a half-edge structure is just like traversing a winged-edge structure except that we no longer need to check orientation, and we follow the pair pointer to access the edges in the opposite face.
遍历半边结构就像遍历翼边结构一样,只是我们不再需要检查方向,并且我们按照对指针来访问相对面上的边。
EdgesOfVertex(v) {
    h = v.h;
    do {
        // v.h points toward v: h.next leaves v, and its pair points back toward v
        h = h.next.pair;
    } while (h != v.h);
}
EdgesOfFace(f) {
    h = f.h;
    do {
        h = h.next;
    } while (h != f.h);
}
The vertex traversal here is clockwise, which is necessary because of omitting the prev pointer from the structure.
这里的顶点遍历是顺时针的,这是必要的,因为从结构中省略了prev指针。
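A half-edge structure can be sketched with parallel arrays. The example below assumes a closed, consistently oriented triangle mesh and numbers half-edge 3f + k as the one running from vertex k to vertex k + 1 of face f; that numbering and all function names are assumptions made for illustration. Because each vertex stores a half-edge pointing toward it, the vertex traversal steps with next (which leaves the vertex) and then pair (which points back at it), so the head stays fixed.

```python
def build_half_edges(triangles, nv):
    """Build head/next/pair arrays plus one half-edge per vertex and per face."""
    tail = [tri[k] for tri in triangles for k in range(3)]
    head = [tri[(k + 1) % 3] for tri in triangles for k in range(3)]
    nxt = [3 * f + (k + 1) % 3 for f in range(len(triangles)) for k in range(3)]
    index_of = {(tail[h], head[h]): h for h in range(len(tail))}
    # The pair is the oppositely directed half-edge in the neighboring face.
    pair = [index_of[(head[h], tail[h])] for h in range(len(tail))]
    v_h = [None] * nv
    for h, v in enumerate(head):
        v_h[v] = h  # any half-edge pointing toward v
    f_h = [3 * f for f in range(len(triangles))]
    return {"head": head, "next": nxt, "pair": pair, "v_h": v_h, "f_h": f_h}


def edges_of_vertex(mesh, v):
    """Half-edges pointing toward v, visited in order around the vertex."""
    result, h = [], mesh["v_h"][v]
    while True:
        result.append(h)
        h = mesh["pair"][mesh["next"][h]]
        if h == mesh["v_h"][v]:
            return result


def edges_of_face(mesh, f):
    """Half-edges bordering face f, in the face's own orientation."""
    result, h = [], mesh["f_h"][f]
    while True:
        result.append(h)
        h = mesh["next"][h]
        if h == mesh["f_h"][f]:
            return result


TETRAHEDRON = [(0, 2, 1), (0, 1, 3), (1, 2, 3), (0, 3, 2)]
mesh = build_half_edges(TETRAHEDRON, 4)
print(edges_of_vertex(mesh, 0), edges_of_face(mesh, 0))
```

Neither traversal inspects orientation, which is the simplification the half-edge structure buys over the winged-edge one.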
Because half-edges are generally allocated in pairs (at least in a mesh with no boundaries), many implementations can do away with the pair pointers. For instance, in an implementation based on array indexing (such as shown in Figure 12.18), the array can be arranged so that an even-numbered edge i always pairs with edge i + 1 and an odd-numbered edge j always pairs with edge j - 1.
由于半边通常成对分配(至少在没有边界的网格中),因此许多实现可以取消对指针。例如,在基于数组索引的实现中(如图 12.18所示),可以安排数组,使得偶数边 i 始终与边 i + 1 配对,奇数边 j 始终与边 j - 1 配对。
Figure 12.18. An example of a half-edge mesh structure, stored in arrays.
图 12.18.半边网格结构的示例,存储在数组中。
In addition to the simple traversal algorithms shown in this chapter, all three of these mesh topology structures can support “mesh surgery” operations of various sorts, such as splitting or collapsing vertices, swapping edges, and adding or removing triangles.
除了本章中展示的简单遍历算法之外,所有这三种网格拓扑结构都可以支持各种“网格手术”操作,例如分割或折叠顶点、交换边、添加或删除三角形。
A triangle mesh manages a collection of triangles that constitute an object in a scene, but another universal problem in graphics applications is arranging the objects in the desired positions. As we saw in Chapter 7, this is done using transformations, but complex scenes can contain a great many transformations and organizing them well makes the scene much easier to manipulate. Most scenes admit to a hierarchical organization, and the transformations can be managed according to this hierarchy using a scene graph.
三角形网格管理构成场景中对象的三角形集合,但图形应用程序中的另一个普遍问题是将对象排列在所需的位置。正如我们在第 7 章中看到的那样,这是使用变换来完成的,但复杂的场景可能包含大量变换,而组织好它们会使场景更容易操作。大多数场景都允许分层组织,并且可以使用场景图根据此层次结构来管理变换。
To motivate the scene-graph data structure, we will use the hinged pendulum shown in Figure 12.19. Consider how we would draw the top part of the pendulum:
为了阐明场景图数据结构,我们将使用图 12.19所示的铰链摆。考虑如何绘制摆的顶部:
Figure 12.19. A hinged pendulum. On the left are the two pieces in their “local” coordinate systems. The hinge of the bottom piece is at point b, and the attachment for the bottom piece is at its local origin. The degrees of freedom for the assembled object are the angles (θ,ϕ) and the location p of the top hinge.
图 12.19。铰链摆。左侧是两个部件在其“局部”坐标系中的位置。底部部件的铰链位于点b ,底部部件的附件位于其局部原点。组装物体的自由度是角度 ( θ,φ ) 和顶部铰链的位置p 。
M1 = rotate(θ)
M2 = translate(p)
M3 = M2M1
Apply M3 to all points in upper pendulum
将 M3 应用于上摆的所有点
The bottom is more complicated, but we can take advantage of the fact that it is attached to the bottom of the upper pendulum at point b in the local coordinate system. First, we rotate the lower pendulum so that it is at an angle ϕ relative to its initial position. Then, we move it so that its top hinge is at point b. Now it is at the appropriate position in the local coordinates of the upper pendulum, and it can then be moved along with that coordinate system. The composite transform for the lower pendulum is
底部更复杂,但我们可以利用它附着在局部坐标系中点b处的上摆底部这一事实。首先,我们旋转下摆,使其相对于其初始位置成φ角。然后,我们移动它,使其顶部铰链位于点b处。现在它位于上摆局部坐标中的适当位置,然后可以沿着该坐标系移动。下摆的复合变换为
Ma = rotate(ϕ)
Mb = translate(b)
Mc = MbMa
Md = M3Mc
Apply Md to all points in lower pendulum
将 Md 应用于下摆的所有点
Thus, we see not only that the lower pendulum lives in its own local coordinate system, but also that this coordinate system itself is moved along with that of the upper pendulum.
因此,我们不仅看到下摆位于其自己的局部坐标系中,而且坐标系本身也随着上摆的坐标系移动。
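The two composite transforms can be written out with 2D homogeneous matrices. This is a minimal sketch using plain nested lists; the helper names (`matmul`, `rotate`, `translate`, `apply`, `pendulum_matrices`) and the sample values of θ, ϕ, p, and b are assumptions chosen for illustration.

```python
import math


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotate(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def translate(d):
    return [[1.0, 0.0, d[0]], [0.0, 1.0, d[1]], [0.0, 0.0, 1.0]]


def apply(M, point):
    """Transform a 2D point by a homogeneous 3x3 matrix."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])


def pendulum_matrices(theta, phi, p, b):
    M3 = matmul(translate(p), rotate(theta))  # upper pendulum: rotate, then move to p
    Mc = matmul(translate(b), rotate(phi))    # lower pendulum within the upper one's frame
    Md = matmul(M3, Mc)                       # lower pendulum in world coordinates
    return M3, Md


M3, Md = pendulum_matrices(math.pi / 2, 0.0, p=(1.0, 2.0), b=(0.0, -1.0))
print(apply(Md, (0.0, 0.0)))  # where the lower hinge ends up in world space
```

Changing θ alone moves both pieces, while changing ϕ moves only the lower one, which is the behavior the hierarchy is meant to capture.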
We can encode the pendulum in a data structure that makes management of these coordinate system issues easier, as shown in Figure 12.20. The appropriate matrix to apply to an object is just the product of all the matrices in the chain from the object to the root of the data structure. For example, consider the model of a ferry that has a car that can move freely on the deck of the ferry and wheels that each move relative to the car as shown in Figure 12.21.
我们可以将钟摆编码到数据结构中,这样可以更轻松地管理这些坐标系问题,如图 12.20所示。适用于对象的适当矩阵只是从对象到数据结构根的链中所有矩阵的乘积。例如,考虑渡轮模型,该模型具有可在渡轮甲板上自由移动的车厢,每个车轮都相对于车厢移动,如图 12.21所示。
Figure 12.20. The scene graph for the hinged pendulum of Figure 12.19.
图 12.20.图 12.19中的铰链摆的场景图。
Figure 12.21. A ferry, a car on the ferry, and the wheels of the car (only two shown) are stored in a scene graph.
图 12.21。一艘渡船、渡船上的一辆汽车以及汽车的车轮(仅显示两个)都存储在场景图中。
As with the pendulum, each object should be transformed by the product of the matrices in the path from the root to the object:
与钟摆一样,每个物体都应该通过从根到物体的路径中的矩阵乘积进行变换:
ferry transform using M0;
使用M 0进行渡轮变换;
car body transform using M0M1;
使用M 0 M 1进行车身变换;
left wheel transform using M0M1M2;
左轮变换使用M0M1M2 ;
other wheel transform using M0M1M3.
另一个车轮使用 M0M1M3 进行变换。
An efficient implementation in the case of rasterization can be achieved using a matrix stack, a data structure supported by many APIs. A matrix stack is manipulated using push and pop operations that add and delete matrices from the right-hand side of a matrix product. For example, calling
在光栅化的情况下,可以使用矩阵堆栈(许多 API 支持的数据结构)实现高效实现。矩阵堆栈使用推送和弹出操作进行操作,这些操作从矩阵乘积的右侧添加和删除矩阵。例如,调用
push(M0)
push(M1)
push(M2)
creates the active matrix M = M0M1M2. A subsequent call to pop() strips the last matrix added so that the active matrix becomes M = M0M1. Combining the matrix stack with a recursive traversal of a scene graph gives us
创建活动矩阵M = M 0 M 1 M 2 。随后调用pop()删除最后添加的矩阵,使活动矩阵变为M = M 0 M 1 。将矩阵堆栈与场景图的递归遍历相结合,我们可以得到
function traverse(node)
    push(Mlocal)
    draw object using composite matrix from stack
    traverse(left child)
    traverse(right child)
    pop()
There are many variations on scene graphs but all follow the basic idea above.
场景图有很多变化但都遵循上述基本思想。
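One possible rendering of the traversal in Python is sketched below. It departs from the pseudocode in two labeled ways: nodes hold a list of children rather than a left and right child, and "drawing" is replaced by a callback that receives the composite matrix. `Node`, `MatrixStack`, and the translation-only ferry scene are all hypothetical names and values.

```python
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def translate(dx, dy):
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


class Node:
    def __init__(self, name, local, children=()):
        self.name, self.local, self.children = name, local, list(children)


class MatrixStack:
    """Keeps the running product so the active matrix is always on top."""

    def __init__(self):
        self._products = [IDENTITY]

    def push(self, M):
        # Multiply on the right-hand side of the current product.
        self._products.append(matmul(self._products[-1], M))

    def pop(self):
        self._products.pop()

    def top(self):
        return self._products[-1]


def traverse(node, stack, draw):
    stack.push(node.local)
    draw(node.name, stack.top())
    for child in node.children:
        traverse(child, stack, draw)
    stack.pop()


wheels = [Node("left wheel", translate(1.0, -1.0)), Node("other wheel", translate(-1.0, -1.0))]
ferry = Node("ferry", translate(10.0, 0.0), [Node("car", translate(2.0, 0.0), wheels)])
traverse(ferry, MatrixStack(), lambda name, M: print(name, (M[0][2], M[1][2])))
```

Storing the running product rather than the individual matrices means each push costs one matrix multiply and each pop is free.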
An elegant property of ray tracing is that it allows very natural application of transformations without changing the representation of the geometry. The basic idea of instancing is to distort all points on an object by a transformation matrix before the object is displayed. For example, if we transform the unit circle (in 2D) by a scale factor (2,1) in x and y, respectively, then rotate it by 45°, and move one unit in the x-direction, the result is an ellipse with semi-axes of length 2 and 1 and a long axis along the (x = −y)-direction, centered at (1,0) (Figure 12.22). The key thing that makes that entity an “instance” is that we store the circle and the composite transform matrix. Thus, the explicit construction of the ellipse is left as a future operation at render time.
光线追踪的一个巧妙特性是它允许非常自然地应用变换而不改变几何表示。实例化的基本思想是在显示对象之前通过变换矩阵扭曲对象上的所有点。例如,如果我们分别在 x 和 y 方向上按比例因子 (2,1) 变换单位圆(在二维中),然后将其旋转 45°,并沿 x 方向移动一个单位,则结果是一个半轴长度分别为 2 和 1 的椭圆,其长轴沿 (x = −y) 方向,中心为 (1,0)(图 12.22)。使该实体成为“实例”的关键是我们存储了圆和复合变换矩阵。因此,椭圆的显式构造留作渲染时的未来操作。
Figure 12.22. An instance of a circle with a series of three transforms is an ellipse.
图 12.22.经过一系列三个变换后的圆的实例是椭圆。
The advantage of instancing in ray tracing is that we can choose the space in which to do intersection. If the base object is composed of a set of points, one of which is p, then the transformed object is composed of that set of points transformed by matrix M, where the example point is transformed to Mp. If we have a ray a + tb that we want to intersect with the transformed object, we can instead intersect an inverse-transformed ray with the untransformed object (Figure 12.23). There are two potential advantages to computing in the untransformed space (i.e., the right-hand side of Figure 12.23):
射线追踪中实例化的优点在于,我们可以选择进行相交的空间。如果基础对象由一组点组成,其中一个是p ,则变换后的对象由经过矩阵M变换的该组点组成,其中示例点被变换为Mp 。如果我们有一条射线a + t b想要与变换后的对象相交,我们可以改为与逆变换后的射线与未变换的物体(图 12.23 )。在未变换空间中计算有两个潜在优势(即图 12.23的右侧):
The untransformed object may have a simpler intersection routine, e.g., a sphere versus an ellipsoid.
未变换的对象可能具有更简单的交叉程序,例如球体与椭圆体。
Many transformed objects can share the same untransformed object, thus reducing storage, e.g., a traffic jam of cars, where individual cars are just transforms of a few base (untransformed) models.
许多经过转换的对象可以共享相同的未转换对象,从而减少存储,例如,汽车交通堵塞,其中单个汽车只是一些基本(未转换)模型的转换。
Figure 12.23. The ray intersection problem in the two spaces is just simple transforms of each other. The object is specified as a sphere plus matrix M. The ray is specified in the transformed (world) space by location a and direction b.
图 12.23。两个空间中的射线相交问题只是彼此的简单变换。对象被指定为球体加矩阵M 。射线在变换后的(世界)空间中由位置 a 和方向b指定。
As discussed in Section 7.2.2, surface normal vectors transform differently. With this in mind and using the concepts illustrated in Figure 12.23, we can determine the intersection of a ray and an object transformed by matrix M. If we create an instance class of type surface, we need to create a hit function:
如第 7.2.2 节所述,表面法向量的变换方式不同。考虑到这一点,并使用图 12.23中所示的概念,我们可以确定射线与矩阵M变换的对象的交点。如果我们创建类型为surface的实例类,则需要创建一个hit函数:
instance::hit(ray a + tb, real t0, real t1, hit-record rec)
    ray r′ = M^{-1}a + tM^{-1}b
    if (base-object→hit(r′, t0, t1, rec)) then
        rec.n = (M^{-1})^T rec.n
        return true
    else
        return false
An elegant thing about this function is that the parameter rec.t does not need to be changed, because it is the same in either space. Also note that we need not compute or store the matrix M itself; only its inverse M^{-1} is used.
此函数的一个巧妙之处在于参数 rec.t 无需更改,因为它在任一空间中都相同。另请注意,我们不需要计算或存储矩阵M 。
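This brings up a very important point: the ray direction b must not be restricted to a unit-length vector, or none of the infrastructure above works. The inverse-transformed direction M^{-1}b generally does not have unit length, and renormalizing it would change the meaning of t, so the same t would no longer identify the hit point in both spaces. For this reason, it is useful not to restrict ray directions to unit vectors.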
这引出了一个非常重要的观点:射线方向b不能限制为单位长度向量,否则上述所有基础结构都无法正常工作。因此,不将射线方向限制为单位向量很有用。
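The hit function can be sketched for one easy special case. The code below assumes M = translate(center) · scale(s) with a nonuniform scale s, so that M^{-1} and its transpose can be applied componentwise without a general matrix inverse; that restriction, the class name, and the tuple-based hit record are all simplifications made here, not the book's interface. Unlike the pseudocode, the returned normal is also normalized.

```python
import math


def hit_unit_sphere(a, b, t0, t1):
    """Smallest t in [t0, t1] with |a + t b| = 1, and the normal there, or None."""
    A = sum(x * x for x in b)
    B = 2.0 * sum(x * y for x, y in zip(a, b))
    C = sum(x * x for x in a) - 1.0
    disc = B * B - 4.0 * A * C
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-B - root) / (2.0 * A), (-B + root) / (2.0 * A)):
        if t0 <= t <= t1:
            return t, tuple(x + t * y for x, y in zip(a, b))
    return None


class ScaledTranslatedInstance:
    """Instance of a base object under M = translate(center) * scale(s)."""

    def __init__(self, base_hit, scale, center):
        self.base_hit, self.scale, self.center = base_hit, scale, center

    def hit(self, a, b, t0, t1):
        # Inverse-transform the ray; the direction is deliberately not renormalized.
        a_local = tuple((ai - ci) / si for ai, ci, si in zip(a, self.center, self.scale))
        b_local = tuple(bi / si for bi, si in zip(b, self.scale))
        rec = self.base_hit(a_local, b_local, t0, t1)
        if rec is None:
            return None
        t, n = rec
        # (M^-1)^T n; for this M the linear part is diagonal, so it is componentwise.
        n_world = tuple(ni / si for ni, si in zip(n, self.scale))
        length = math.sqrt(sum(x * x for x in n_world))
        return t, tuple(x / length for x in n_world)


ellipsoid = ScaledTranslatedInstance(hit_unit_sphere, scale=(2.0, 1.0, 1.0), center=(5.0, 0.0, 0.0))
print(ellipsoid.hit((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.0, math.inf))
```

The value of t returned by the base object is passed through unchanged, which only works because the local direction keeps its non-unit length.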
In many, if not all, graphics applications, the ability to quickly locate geometric objects in particular regions of space is important. Ray tracers need to find objects that intersect rays; interactive applications navigating an environment need to find the objects visible from any given viewpoint; games and physical simulations require detecting when and where objects collide. All these needs can be supported by various spatial data structures designed to organize objects in space so they can be looked up efficiently.
在许多(如果不是全部)图形应用程序中,快速定位特定空间区域中的几何对象的能力非常重要。光线追踪器需要找到与光线相交的物体;在环境中导航的交互式应用程序需要找到从任何给定视点可见的物体;游戏和物理模拟需要检测物体碰撞的时间和地点。所有这些需求都可以通过各种空间数据结构来支持,这些结构旨在组织空间中的对象,以便高效地查找它们。
In this section, we will discuss examples of three general classes of spatial data structures. Structures that group objects together into a hierarchy are object partitioning schemes: objects are divided into disjoint groups, but the groups may end up overlapping in space. Structures that divide space into disjoint regions are space partitioning schemes: space is divided into separate partitions, but one object may have to intersect more than one partition. Space partitioning schemes can be regular, in which space is divided into uniformly shaped pieces, or irregular, in which space is divided adaptively into irregular pieces, with smaller pieces where there are more and smaller objects.
在本节中,我们将讨论三类通用的空间数据结构的示例。将对象分组为层次结构的结构包括对象分割方案:对象被分成不相交的组,但这些组最终可能会在空间中重叠。将空间划分为不相交区域的结构是空间分割方案:空间被划分为单独的分区,但一个对象可能必须与多个分区相交。空间分割方案可以是规则的,其中空间被划分为均匀形状的部分,也可以是不规则的,其中空间被自适应地划分为不规则的部分,其中对象越多、越小,则部分越小。
We will use ray tracing as the primary motivation while discussing these structures, although they can all also be used for view culling or collision detection. In Chapter 4, all objects were looped over while checking for intersections. For N objects, this is an O(N) linear search and is thus slow for large scenes. Like most search problems, the ray-object intersection can be computed in sub-linear time using “divide and conquer” techniques, provided we can create an ordered data structure as a preprocess. There are many techniques to do this.
在讨论这些结构时,我们将使用光线追踪作为主要动机,尽管它们也可用于视图剔除或碰撞检测。在第 4 章中,所有对象在检查相交时都经过循环。对于 N 个对象,这是一个 O(N) 线性搜索,因此对于大型场景来说速度很慢。与大多数搜索问题一样,只要我们可以创建一个有序的数据结构作为预处理,就可以使用“分而治之”技术在亚线性时间内计算出光线与对象的相交。有很多技术可以做到这一点。
This section discusses three of these techniques in detail: bounding volume hierarchies (Rubin & Whitted, 1980; Whitted, 1980; Goldsmith & Salmon, 1987), uniform spatial subdivision (Cleary, Wyvill, Birtwistle, & Vatti, 1983; Fujimoto, Tanaka, & Iwata, 1986; Amanatides & Woo, 1987), and binary space partitioning (Glassner, 1984; Jansen, 1986; Havran, 2000). An example of the first two strategies is shown in Figure 12.24.
本节详细讨论了其中三种技术:边界体积层次结构(Rubin & Whitted,1980;Whitted,1980;Goldsmith & Salmon,1987)、均匀空间细分(Cleary、Wyvill、Birtwistle & Vatti,1983;Fujimoto、Tanaka & Iwata,1986;Amanatides & Woo,1987)和二元空间划分(Glassner,1984;Jansen,1986;Havran,2000)。图 12.24显示了前两种策略的示例。
Figure 12.24. (a) A uniform partitioning of space. (b) Adaptive bounding-box hierarchy. Image courtesy David DeMarle.
图 12.24。 (a)空间的均匀划分。(b)自适应边界框层次结构。图片由 David DeMarle 提供。
A key operation in most intersection-acceleration schemes is computing the intersection of a ray with a bounding box (Figure 12.25). This differs from conventional intersection tests in that we do not need to know where the ray hits the box; we only need to know whether it hits the box.
大多数相交加速方案中的一个关键操作是计算射线与边界框的交点(图 12.25 )。这与传统的相交测试不同,因为我们不需要知道射线在何处击中了边界框;我们只需要知道它是否击中了边界框。
Figure 12.25. The ray is only tested for intersection with the surfaces if it hits the bounding box.
图 12.25.只有当射线击中边界框时,才会测试其是否与表面相交。
To build an algorithm for ray-box intersection, we begin by considering a 2D ray whose direction vector has positive x and y components. We can generalize this to arbitrary 3D rays later. The 2D bounding box is defined by two horizontal and two vertical lines:
为了构建射线框相交算法,我们首先考虑一条 2D 射线,其方向向量具有正的 x 和 y 分量。稍后我们可以将其推广到任意 3D 射线。2D 边界框由两条水平线和两条垂直线定义:
$$x = x_{\min}, \qquad x = x_{\max}, \qquad y = y_{\min}, \qquad y = y_{\max}.$$
The points bounded by these lines can be described in interval notation:
这些线所围成的点可以用区间符号来描述:
$$(x, y) \in [x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}].$$
As shown in Figure 12.26, the intersection test can be phrased in terms of these intervals. First, we compute the ray parameter where the ray hits the line x = xmin:
如图 12.26所示,相交测试可以用这些间隔来表述。首先,我们计算射线与线x = x min相交处的射线参数:
$$t_{x\min} = \frac{x_{\min} - x_e}{x_d}.$$
Figure 12.26. The ray will be inside the interval x ∈ [xmin,xmax] for some interval in its parameter space t ∈ [txmin,txmax]. A similar interval exists for the y interval. The ray intersects the box if it is in both the x interval and y interval at the same time; i.e., the intersection of the two one-dimensional intervals is not empty.
图 12.26。射线将位于其参数空间t ∈ [ t xmin , t xmax ] 中的某个区间 x ∈ [ x min , x max ] 内。y 区间也存在类似的区间。如果射线同时位于x区间和y区间内,则射线与盒子相交;即两个一维区间的交点不为空。
We then make similar computations for txmax, tymin, and tymax. The ray hits the box if and only if the intervals [txmin,txmax] and [tymin,tymax] overlap; i.e., their intersection is nonempty. In pseudocode this algorithm is
txmin = (xmin - xe)/xd
txmax = (xmax - xe)/xd
tymin = (ymin - ye)/yd
tymax = (ymax - ye)/yd
if (txmin > tymax) or (tymin > txmax) then
    return false
else
    return true
The if statement may seem non-obvious. To see the logic of it, note that there is no overlap if the first interval is either entirely to the right or entirely to the left of the second interval.
The first thing we must address is the case when xd or yd is negative. If xd is negative, then the ray will hit xmax before it hits xmin. Thus, the code for computing txmin and txmax expands to
if (xd ≥ 0) then
    txmin = (xmin - xe)/xd
    txmax = (xmax - xe)/xd
else
    txmin = (xmax - xe)/xd
    txmax = (xmin - xe)/xd
A similar code expansion must be made for the y cases. A major concern is that horizontal and vertical rays have a zero value for yd and xd, respectively. This will cause a divide-by-zero, which may be a problem. However, before addressing this directly, we check whether IEEE floating-point computation handles these cases gracefully for us. Recall from Section 1.5 the rules for divide-by-zero: for any positive real number a,
+a/0 = +∞,
-a/0 = -∞.
Consider the case of a vertical ray where xd = 0 and yd > 0. We can then calculate
txmin = (xmin - xe)/0,
txmax = (xmax - xe)/0.
There are three possibilities of interest:
1. xe ≤ xmin (no hit);
2. xmin < xe < xmax (hit);
3. xmax ≤ xe (no hit).
For the first case, we have
txmin = (positive number)/0,
txmax = (positive number)/0.
This yields the interval (txmin,txmax) = (∞,∞). That interval will not overlap with any interval, so there will be no hit, as desired. For the second case, we have
txmin = (negative number)/0,
txmax = (positive number)/0.
This yields the interval (txmin,txmax) = (-∞,∞) which will overlap with all intervals and thus will yield a hit as desired. The third case results in the interval (-∞,-∞) which yields no hit, as desired. (When xe equals xmin or xmax exactly, the numerator is zero and 0/0 produces NaN rather than an infinity, so that boundary case falls outside this argument.) Because these cases work as desired, we need no special checks for them. As is often the case, IEEE floating-point conventions are our ally. However, there is still a problem with this approach.
Consider the code segment:
if (xd ≥ 0) then
    tmin = (xmin - xe)/xd
    tmax = (xmax - xe)/xd
else
    tmin = (xmax - xe)/xd
    tmax = (xmin - xe)/xd
This code breaks down when xd = -0. Under IEEE rules the comparison -0 ≥ 0 is true, so the first branch is taken, yet dividing by -0 produces infinities of the opposite sign, which swaps the endpoints of the interval. This can be overcome by testing on the reciprocal of xd (Williams, Barrus, Morley, & Shirley, 2005):
a = 1/xd
if (a ≥ 0) then
    tmin = a(xmin - xe)
    tmax = a(xmax - xe)
else
    tmin = a(xmax - xe)
    tmax = a(xmin - xe)
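As a rough illustration, the reciprocal-based test might be sketched in Python as follows. This is only a sketch under stated assumptions: the names reciprocal, slab_interval, and ray_hits_box are ours, the clipping range [t0, t1] is an addition, and because Python raises ZeroDivisionError for float division by zero, the helper emulates the IEEE rule (including the sign of -0.0) instead of relying on the hardware.

```python
import math

def reciprocal(d):
    """IEEE-style 1/d. Python raises on 1.0/0.0, so we emulate +/-infinity."""
    if d == 0.0:
        return math.copysign(math.inf, d)  # copysign preserves the sign of -0.0
    return 1.0 / d

def slab_interval(lo, hi, e, d):
    """Parameter interval (tmin, tmax) during which e + t*d lies in [lo, hi]."""
    a = reciprocal(d)
    if a >= 0.0:
        return a * (lo - e), a * (hi - e)
    return a * (hi - e), a * (lo - e)

def ray_hits_box(box_min, box_max, origin, direction, t0=0.0, t1=math.inf):
    """True if origin + t*direction is inside the box for some t in [t0, t1].

    Works for 2D or 3D tuples. The case where the origin lies exactly on a
    slab plane of a zero direction component (inf * 0 = NaN) is not treated
    specially in this sketch.
    """
    for lo, hi, e, d in zip(box_min, box_max, origin, direction):
        tmin, tmax = slab_interval(lo, hi, e, d)
        t0 = max(t0, tmin)  # intersect the running interval with this slab
        t1 = min(t1, tmax)
        if t0 > t1:
            return False
    return True
```

With this helper, a vertical ray gives the same answer whether its x direction component is 0.0 or -0.0, which is the point of testing the reciprocal.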
The basic idea of hierarchical bounding boxes can be seen by the common tactic of placing an axis-aligned 3D bounding box around all the objects as shown in Figure 12.27. Rays that hit the bounding box will actually be more expensive to compute than in a brute force search, because testing for intersection with the box is not free. However, rays that miss the box are cheaper than the brute force search. Such bounding boxes can be made hierarchical by partitioning the set of objects in a box and placing a box around each partition as shown in Figure 12.28. The data structure for the hierarchy shown in Figure 12.29 might be a tree with the large bounding box at the root and the two smaller bounding boxes as left and right subtrees. These would in turn each point to a list of three triangles. The intersection of a ray with this particular hard-coded tree would be
if (ray hits root box) then
    if (ray hits left subtree box) then
        check three triangles for intersection
    if (ray intersects right subtree box) then
        check other three triangles for intersection
    if (an intersection is returned from each subtree) then
        return the closest of the two hits
    else if (an intersection is returned from exactly one subtree) then
        return that intersection
    else
        return false
else
    return false
Figure 12.27. A 2D ray e + t d is tested against a 2D bounding box.
Figure 12.28. The bounding boxes can be nested by creating boxes around subsets of the model.
Figure 12.29. The gray box is a tree node that points to the three gray spheres, and the thick black box points to the three black spheres. Note that not all spheres enclosed by the box are guaranteed to be pointed to by the corresponding tree node.
Some observations related to this algorithm are that there is no geometric ordering between the two subtrees, and that a ray may well hit both subtrees. Indeed, nothing prevents the two subtrees from overlapping.
A key point of such data hierarchies is that a box is guaranteed to bound all objects that are below it in the hierarchy, but it is not guaranteed to contain all objects that overlap it spatially, as shown in Figure 12.29. This makes this geometric search somewhat more complicated than a traditional binary search on strictly ordered one-dimensional data. The reader may note that several possible optimizations present themselves. We defer optimizations until we have a full hierarchical algorithm.
If we restrict the tree to be binary and require that each node in the tree has a bounding box, then this traversal code extends naturally. Furthermore, assume that every node is either a leaf that contains a primitive or an internal node that contains one or two subtrees.
The bvh-node class should be of type surface, so it should implement surface::hit. The data it contains should be simple:
class bvh-node subclass of surface
    virtual bool hit(ray e + td, real t0, real t1, hit-record rec)
    virtual box bounding-box()
    surface-pointer left
    surface-pointer right
    box bbox
The traversal code can then be called recursively in an object-oriented style:
function bool bvh-node::hit(ray a + tb, real t0, real t1, hit-record rec)
    if (bbox.hitbox(a + tb, t0, t1)) then
        hit-record lrec, rrec
        left-hit = (left ≠ NULL) and (left → hit(a + tb, t0, t1, lrec))
        right-hit = (right ≠ NULL) and (right → hit(a + tb, t0, t1, rrec))
        if (left-hit and right-hit) then
            if (lrec.t < rrec.t) then
                rec = lrec
            else
                rec = rrec
            return true
        else if (left-hit) then
            rec = lrec
            return true
        else if (right-hit) then
            rec = rrec
            return true
        else
            return false
    else
        return false
Note that because left and right point to surfaces rather than bvh-nodes specifically, we can let the virtual functions take care of distinguishing between internal and leaf nodes; the appropriate hit function will be called. If the tree is built properly, we can eliminate the check for left being NULL. If we want to eliminate the check for right being NULL, we can replace NULL right pointers with a redundant pointer to left. This will end up checking left twice, but will eliminate the check throughout the tree. Whether that is worth it will depend on the details of tree construction.
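To make the recursion concrete, here is a minimal Python sketch of the traversal. It rests on several assumptions of ours: boxes are (lo, hi) tuple pairs, hit returns a (t, surface) pair or None instead of filling a hit-record, the Sphere leaf and the helper names box_hit and combine are hypothetical, and the box test checks zero direction components explicitly rather than relying on IEEE infinities.

```python
import math

def box_hit(box, origin, direction, t0, t1):
    """Slab test of a ray against an axis-aligned box given as (lo, hi)."""
    lo, hi = box
    for l, h, e, d in zip(lo, hi, origin, direction):
        if d == 0.0:                      # ray parallel to this pair of planes
            if e < l or e > h:
                return False
            continue
        ta, tb = (l - e) / d, (h - e) / d
        t0, t1 = max(t0, min(ta, tb)), min(t1, max(ta, tb))
        if t0 > t1:
            return False
    return True

def combine(b1, b2):
    """Smallest box that contains both boxes."""
    return (tuple(min(p, q) for p, q in zip(b1[0], b2[0])),
            tuple(max(p, q) for p, q in zip(b1[1], b2[1])))

class Sphere:
    """Stand-in leaf primitive."""
    def __init__(self, center, radius):
        self.center, self.radius = center, radius

    def bounding_box(self):
        return (tuple(c - self.radius for c in self.center),
                tuple(c + self.radius for c in self.center))

    def hit(self, origin, direction, t0, t1):
        """Return (t, self) for the nearest root in [t0, t1], else None."""
        oc = [o - c for o, c in zip(origin, self.center)]
        a = sum(d * d for d in direction)
        b = 2.0 * sum(d * o for d, o in zip(direction, oc))
        c = sum(o * o for o in oc) - self.radius ** 2
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
            if t0 <= t <= t1:
                return t, self
        return None

class BVHNode:
    def __init__(self, left, right, bbox):
        self.left, self.right, self.bbox = left, right, bbox

    def bounding_box(self):
        return self.bbox

    def hit(self, origin, direction, t0, t1):
        if not box_hit(self.bbox, origin, direction, t0, t1):
            return None
        # Children may be leaves or BVHNodes; both expose the same hit method.
        hits = [child.hit(origin, direction, t0, t1)
                for child in (self.left, self.right) if child is not None]
        hits = [h for h in hits if h is not None]
        return min(hits, key=lambda h: h[0]) if hits else None
```

Because both children are always queried, the order in which they are stored does not affect the result; only the nearer hit is returned.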
There are many ways to build a tree for a bounding volume hierarchy. It is convenient to make the tree binary, roughly balanced, and to have the boxes of sibling subtrees not overlap too much. A heuristic to accomplish this is to sort the surfaces along an axis before dividing them into two sublists. If the axes are defined by an integer with x = 0, y = 1, and z = 2, we have
function bvh-node::create(object-array A, int AXIS)
    N = A.length
    if (N = 1) then
        left = A[0]
        right = NULL
        bbox = bounding-box(A[0])
    else if (N = 2) then
        left = A[0]
        right = A[1]
        bbox = combine(bounding-box(A[0]), bounding-box(A[1]))
    else
        sort A by the object center along AXIS
        left = new bvh-node(A[0..N/2 - 1], (AXIS + 1) mod 3)
        right = new bvh-node(A[N/2..N - 1], (AXIS + 1) mod 3)
        bbox = combine(left → bbox, right → bbox)
The quality of the tree can be improved by carefully choosing AXIS each time. One way to do this is to choose the axis such that the sum of the volumes of the bounding boxes of the two subtrees is minimized. Compared to rotating through the axes, this change will make little difference for scenes composed of isotropically distributed small objects, but it may help significantly in less well-behaved scenes. This code can also be made more efficient by doing just a partition rather than a full sort.
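The sort-and-split construction with axis cycling can be sketched in Python roughly as follows. The Box leaf, the center and combine helpers, and count_leaves are hypothetical names introduced only for this illustration, and a full sort is used rather than the cheaper partition mentioned in the text.

```python
class Box:
    """Axis-aligned box used here as a stand-in primitive."""
    def __init__(self, lo, hi):
        self.lo, self.hi = tuple(lo), tuple(hi)

    def bounding_box(self):
        return self.lo, self.hi

def combine(b1, b2):
    """Smallest box that contains both boxes."""
    return (tuple(min(p, q) for p, q in zip(b1[0], b2[0])),
            tuple(max(p, q) for p, q in zip(b1[1], b2[1])))

def center(obj, axis):
    lo, hi = obj.bounding_box()
    return 0.5 * (lo[axis] + hi[axis])

class BVHNode:
    def __init__(self, objects, axis=0):
        n = len(objects)
        if n == 0:
            raise ValueError("cannot build a BVH node from an empty list")
        if n == 1:
            self.left, self.right = objects[0], None
            self.bbox = objects[0].bounding_box()
        elif n == 2:
            self.left, self.right = objects
            self.bbox = combine(objects[0].bounding_box(),
                                objects[1].bounding_box())
        else:
            # Sort by object center along the current axis, split in half,
            # and cycle to the next axis for the children.
            ordered = sorted(objects, key=lambda o: center(o, axis))
            mid = n // 2
            next_axis = (axis + 1) % 3
            self.left = BVHNode(ordered[:mid], next_axis)
            self.right = BVHNode(ordered[mid:], next_axis)
            self.bbox = combine(self.left.bbox, self.right.bbox)

    def bounding_box(self):
        return self.bbox

def count_leaves(node):
    """Number of primitives reachable from node."""
    if node is None:
        return 0
    if not isinstance(node, BVHNode):
        return 1
    return count_leaves(node.left) + count_leaves(node.right)
```

Choosing the axis by minimizing the summed child volumes, as suggested above, would replace the fixed (axis + 1) % 3 rule with a trial over all three axes.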
Another, and probably better, way to build the tree is to have the subtrees contain about the same amount of space rather than the same number of objects. To do this, we partition the list based on space:
function bvh-node::create(object-array A, int AXIS)
    N = A.length
    if (N = 1) then
        left = A[0]
        right = NULL
        bbox = bounding-box(A[0])
    else if (N = 2) then
        left = A[0]
        right = A[1]
        bbox = combine(bounding-box(A[0]), bounding-box(A[1]))
    else
        find the midpoint m of the bounding box of A along AXIS
        partition A into lists with lengths k and (N - k) surrounding m
        left = new bvh-node(A[0..k - 1], (AXIS + 1) mod 3)
        right = new bvh-node(A[k..N - 1], (AXIS + 1) mod 3)
        bbox = combine(left → bbox, right → bbox)
Although this results in an unbalanced tree, it allows for easy traversal of empty space and is cheaper to build because partitioning is cheaper than sorting.
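The partition step might look like the Python sketch below. It assumes boxes are plain (lo, hi) tuple pairs and uses a hypothetical function name; the fallback to an even split when one side would be empty is our addition, since the pseudocode does not say what to do when k is 0 or N.

```python
def partition_at_midpoint(boxes, axis):
    """Split boxes by whether their center is below the spatial midpoint m."""
    lo = min(b[0][axis] for b in boxes)
    hi = max(b[1][axis] for b in boxes)
    m = 0.5 * (lo + hi)                    # midpoint of the enclosing box

    def center(b):
        return 0.5 * (b[0][axis] + b[1][axis])

    left = [b for b in boxes if center(b) < m]
    right = [b for b in boxes if center(b) >= m]
    if not left or not right:
        # Degenerate split (for example, all centers coincide): fall back to
        # an even split so that recursion on the two halves still terminates.
        ordered = sorted(boxes, key=center)
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
    return left, right
```

For four unit boxes at x = 0, 1, 2, and 9, the midpoint is 5, so three boxes go left and one goes right: the tree is unbalanced, but the large empty gap is isolated near the root.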
Another strategy to reduce intersection tests is to divide space. This is fundamentally different from dividing objects as was done with hierarchical bounding volumes:
In hierarchical bounding volumes, each object belongs to one of two sibling nodes, whereas a point in space may be inside both sibling nodes.
In spatial subdivision, each point in space belongs to exactly one node, whereas objects may belong to many nodes.
In uniform spatial subdivision, the scene is partitioned into axis-aligned boxes. These boxes are all the same size, although they are not necessarily cubes. The ray traverses these boxes as shown in Figure 12.30. When an object is hit, the traversal ends.
Figure 12.30. In uniform spatial subdivision, the ray is tracked forward through cells until an object in one of those cells is hit. In this example, only objects in the shaded cells are checked.
The grid itself should be a subclass of surface and should be implemented as a 3D array of pointers to surface. For empty cells, these pointers are NULL. For cells with one object, the pointer points to that object. For cells with more than one object, the pointer can point to a list, another grid, or another data structure, such as a bounding volume hierarchy.
This traversal is done in an incremental fashion. The regularity comes from the way that a ray hits each set of parallel planes, as shown in Figure 12.31. To see how this traversal works, first consider the 2D case where the ray direction has positive x and y components and starts outside the grid. Assume the grid is bounded by points (xmin,ymin) and (xmax,ymax). The grid has nx × ny cells.
Figure 12.31. Although the pattern of cell hits seems irregular (left), the hits on sets of parallel planes are very even.
Our first order of business is to find the index (i,j) of the first cell hit by the ray e + td. Then, we need to traverse the cells in an appropriate order. The key parts to this algorithm are finding the initial cell (i,j) and deciding whether to increment i or j (Figure 12.32). Note that when we check for an intersection with objects in a cell, we restrict the range of t to be within the cell (Figure 12.33). Most implementations make the 3D array of type “pointer to surface.” To improve the locality of the traversal, the array can be tiled as discussed in Section 12.5.
Figure 12.32. To decide whether we advance right or upward, we keep track of the intersections with the next vertical and horizontal boundary of the cell.
Figure 12.33. Only hits within the cell should be reported. Otherwise, the case above would cause us to report hitting object b rather than object a.
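The incremental traversal for the 2D case described above (positive x and y direction components) could be sketched in Python as follows. The function name grid_cells and its parameters are hypothetical; the sketch only enumerates cell indices and leaves out the per-cell object tests and the restriction of t to each cell.

```python
def grid_cells(origin, direction, grid_min, grid_max, nx, ny):
    """Yield the (i, j) cells crossed, in order, by a ray with xd > 0, yd > 0."""
    (xe, ye), (xd, yd) = origin, direction
    if xd <= 0 or yd <= 0:
        raise ValueError("this sketch handles only positive direction components")
    (xmin, ymin), (xmax, ymax) = grid_min, grid_max

    # Parameter range over which the ray is inside the grid bounds.
    t_enter = max((xmin - xe) / xd, (ymin - ye) / yd, 0.0)
    t_exit = min((xmax - xe) / xd, (ymax - ye) / yd)
    if t_enter >= t_exit:
        return                              # the ray misses the grid

    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny     # cell size
    # Index of the first cell, clamped against rounding at the boundary.
    i = min(max(int((xe + t_enter * xd - xmin) / dx), 0), nx - 1)
    j = min(max(int((ye + t_enter * yd - ymin) / dy), 0), ny - 1)

    # Ray parameters at the next vertical and horizontal cell boundaries,
    # and the constant parameter step between successive boundaries.
    t_next_x = (xmin + (i + 1) * dx - xe) / xd
    t_next_y = (ymin + (j + 1) * dy - ye) / yd
    dt_x, dt_y = dx / xd, dy / yd

    while i < nx and j < ny:
        yield i, j
        if t_next_x < t_next_y:             # vertical boundary comes first
            i += 1
            t_next_x += dt_x
        else:                               # horizontal boundary comes first
            j += 1
            t_next_y += dt_y
```

The two running values t_next_x and t_next_y are exactly the "next vertical and horizontal boundary" intersections of Figure 12.32; each step costs one comparison and one addition.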
We can also partition space in a hierarchical data structure such as a binary space partitioning tree (BSP tree). This is similar to the BSP tree used for visibility sorting in Section 12.4, but it’s most common to use axis-aligned, rather than polygon-aligned, cutting planes for ray intersection.
A node in this structure contains a single cutting plane and a left and right subtree. Each subtree contains all the objects on one side of the cutting plane. Objects that pass through the plane are stored in both subtrees. If we assume the cutting plane is parallel to the yz plane at x = D, then the node class is
class bsp-node subclass of surface
    virtual bool hit(ray e + td, real t0, real t1, hit-record rec)
    virtual box bounding-box()
    surface-pointer left
    surface-pointer right
    real D
We generalize this to y and z cutting planes later. The intersection code can then be called recursively in an object-oriented style. The code considers the four cases shown in Figure 12.34. For our purposes, the origin of these rays is a point at parameter t0:
p = a + t0 b.
Figure 12.34. The four cases of how a ray relates to the BSP cutting plane x = D.
The four cases are
1. The ray only interacts with the left subtree, and we need not test it for intersection with the cutting plane. It occurs for xp < D and xb < 0.
2. The ray is tested against the left subtree, and if there are no hits, it is then tested against the right subtree. We need to find the ray parameter at x = D, so we can make sure we only test for intersections within the subtree. This case occurs for xp < D and xb > 0.
3. This case is analogous to case 1 and occurs for xp > D and xb > 0.
4. This case is analogous to case 2 and occurs for xp > D and xb < 0.
The resulting traversal code handling these cases in order is
function bool bsp-node::hit(ray a + tb, real t0, real t1, hit-record rec)
    xp = xa + t0 xb
    if (xp < D) then
        if (xb < 0) then
            return (left ≠ NULL) and (left → hit(a + tb, t0, t1, rec))
        t = (D - xa)/xb
        if (t > t1) then
            return (left ≠ NULL) and (left → hit(a + tb, t0, t1, rec))
        if (left ≠ NULL) and (left → hit(a + tb, t0, t, rec)) then
            return true
        return (right ≠ NULL) and (right → hit(a + tb, t, t1, rec))
    else
        analogous code for cases 3 and 4
This is very clean code. However, to get it started, we need to hit some root object that includes a bounding box so we can initialize the traversal parameters t0 and t1. An issue we have to address is that the cutting plane may be along any axis. We can add an integer index axis to the bsp-node class. If we allow an indexing operator for points, this will result in some simple modifications to the code above, for example,
xp = xa + t0 xb
would become
up = a[axis] + t0 b[axis]
which will result in some additional array indexing, but will not generate more branches.
While the processing of a single bsp-node is faster than processing a bvh-node, the fact that a single surface may exist in more than one subtree means there are more nodes and, potentially, a higher memory use. How “well” the trees are built determines which is faster. Building the tree is similar to building the BVH tree. We can pick axes to split in a cycle, and we can split in half each time, or we can try to be more sophisticated in how we divide.
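A Python sketch of the axis-indexed traversal, with all four cases folded into a near/far formulation, is given below. The assumptions are ours: hit returns a (t, surface) pair or None rather than filling a hit-record, the Sphere leaf and hit_child helper are hypothetical, and a ray parallel to the cutting plane is sent to the near side only, a case the pseudocode above does not spell out.

```python
import math

class Sphere:
    """Stand-in leaf primitive."""
    def __init__(self, center, radius):
        self.center, self.radius = center, radius

    def hit(self, origin, direction, t0, t1):
        """Return (t, self) for the nearest root in [t0, t1], else None."""
        oc = [o - c for o, c in zip(origin, self.center)]
        a = sum(d * d for d in direction)
        b = 2.0 * sum(d * o for d, o in zip(direction, oc))
        c = sum(o * o for o in oc) - self.radius ** 2
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
            if t0 <= t <= t1:
                return t, self
        return None

def hit_child(child, origin, direction, t0, t1):
    return None if child is None else child.hit(origin, direction, t0, t1)

class BSPNode:
    def __init__(self, axis, D, left, right):
        self.axis, self.D, self.left, self.right = axis, D, left, right

    def hit(self, origin, direction, t0, t1):
        u_p = origin[self.axis] + t0 * direction[self.axis]  # start coordinate
        u_b = direction[self.axis]
        if u_p < self.D:      # cases 1 and 2: the ray starts on the left
            near, far, leaving = self.left, self.right, u_b <= 0.0
        else:                 # cases 3 and 4: the ray starts on the right
            near, far, leaving = self.right, self.left, u_b >= 0.0
        if leaving:           # moving away from (or parallel to) the plane
            return hit_child(near, origin, direction, t0, t1)
        t = (self.D - origin[self.axis]) / u_b   # parameter at the plane
        if t > t1:
            return hit_child(near, origin, direction, t0, t1)
        found = hit_child(near, origin, direction, t0, t)
        if found is not None:
            return found
        return hit_child(far, origin, direction, t, t1)
```

Restricting the near child to [t0, t] and the far child to [t, t1] is what keeps an object stored in both subtrees from being reported on the wrong side of the plane.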
Another geometric problem in which spatial data structures can be used is determining the visibility ordering of objects in a scene with changing viewpoint.
If we are making many images of a fixed scene composed of planar polygons from different viewpoints (as is often the case for applications such as games), we can use a binary space partitioning scheme closely related to the method for ray intersection discussed in the previous section. The difference is that for visibility sorting, we use non-axis-aligned splitting planes, so that the planes can be made coincident with the polygons. This leads to an elegant algorithm known as the BSP tree algorithm to order the surfaces from front to back. The key aspect of the BSP tree is that it uses a preprocess to create a data structure that is useful for any viewpoint. So, as the viewpoint changes, the same data structure is used without change.
The BSP tree algorithm is an example of a painter’s algorithm. A painter’s algorithm draws every object from back-to-front, with each new polygon potentially overdrawing previous polygons, as is shown in Figure 12.35. It can be implemented as follows:
sort objects back to front relative to viewpoint
for each object do
    draw object on screen
Figure 12.35. A painter’s algorithm starts with a blank image and then draws the scene one object at a time from back-to-front, overdrawing whatever is already there. This automatically eliminates hidden surfaces.
The problem with the first step (the sort) is that the relative order of multiple objects is not always well defined, even if the order of every pair of objects is. This problem is illustrated in Figure 12.36 where the three triangles form a cycle.
Figure 12.36. A cycle occurs if a global back-to-front ordering is not possible for a particular eye position.
The BSP tree algorithm works on any scene composed of polygons where no polygon crosses the plane defined by any other polygon. This restriction is then relaxed by a preprocessing step. For the rest of this discussion, triangles are assumed to be the only primitive, but the ideas extend to arbitrary polygons.
The basic idea of the BSP tree can be illustrated with two triangles, T1 and T2. We first recall (see Section 2.7.3) the implicit plane equation of the plane containing T1: f1(p) = 0. The key property of implicit planes that we wish to take advantage of is that for all points p+ on one side of the plane, f1(p+) > 0; and for all points p- on the other side of the plane, f1(p-) < 0. Using this property, we can find out on which side of the plane T2 lies. Again, this assumes all three vertices of T2 are on the same side of the plane. For discussion, assume that T2 is on the f1(p) < 0 side of the plane. Then, we can draw T1 and T2 in the right order for any eyepoint e:
if (f1(e) < 0) then
    draw T1
    draw T2
else
    draw T2
    draw T1
The reason this works is that if T2 and e are on the same side of the plane containing T1, there is no way for T2 to be fully or partially blocked by T1 as seen from e, so it is safe to draw T1 first. If e and T2 are on opposite sides of the plane containing T1, then T2 cannot fully or partially block T1, and the opposite drawing order is safe (Figure 12.37).
Figure 12.37. When e and T2 are on opposite sides of the plane containing T1, then it is safe to draw T2 first and T1 second. If e and T2 are on the same side of the plane, then T1 should be drawn before T2. This is the core idea of the BSP tree algorithm.
This observation can be generalized to many objects provided none of them span the plane defined by T1. If we use a binary tree data structure with T1 as root, the negative branch of the tree contains all the triangles whose vertices have f1(p) < 0, and the positive branch of the tree contains all the triangles whose vertices have f1(p) > 0. We can draw in proper order as follows:
function draw(bsptree tree, point e)
    if (tree.empty) then
        return
    if (ftree.root(e) < 0) then
        draw(tree.plus, e)
        rasterize tree.triangle
        draw(tree.minus, e)
    else
        draw(tree.minus, e)
        rasterize tree.triangle
        draw(tree.plus, e)
The nice thing about that code is that it will work for any viewpoint e, so the tree can be precomputed. Note that, if each subtree is itself a tree, where the root triangle divides the other triangles into two groups relative to the plane containing it, the code will work as is. It can be made slightly more efficient by terminating the recursive calls one level higher, but the code will still be simple. A tree illustrating this code is shown in Figure 12.38. As discussed in Section 2.7.5, the implicit equation for a point p on a plane containing three non-colinear points a, b, and c is
f(p) = ((b - a) × (c - a)) · (p - a) = 0.    (12.1)
Figure 12.38. Three triangles and a BSP tree that is valid for them. The “positive” and “negative” are encoded by right and left subtree position, respectively.
It can be faster to store the (A,B,C,D) of the implicit equation of the form
f(x,y,z) = Ax + By + Cz + D = 0.    (12.2)
Equations (12.1) and (12.2) are equivalent, as is clear when you recall that the gradient of the implicit equation is the normal to the triangle. The gradient of Equation (12.2) is n = (A,B,C) which is just the normal vector
n = (b - a) × (c - a).
We can solve for D by plugging in any point on the plane, e.g., a:
D = -Axa - Bya - Cza = -n · a.
This suggests the form:
f(p) = n · p - n · a = n · (p - a) = 0,
which is the same as Equation (12.1) once you recall that n is computed using the cross product. Which form of the plane equation you use and whether you store only the vertices, n and the vertices, or n, D, and the vertices, is probably a matter of taste, a classic time-storage tradeoff that will be settled best by profiling. For debugging, using Equation (12.1) is probably the best.
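The two ways of evaluating the plane function can be compared with a small Python sketch. The helper names (sub, dot, cross, plane_from_triangle, f_from_vertices, f_from_stored) are hypothetical and points are plain 3-tuples; the first evaluator recomputes the cross product from the vertices on every call, while the second uses a stored normal n = (A, B, C) and offset D.

```python
def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_from_triangle(a, b, c):
    """Return (n, D) such that f(p) = n . p + D, with n = (b - a) x (c - a)."""
    n = cross(sub(b, a), sub(c, a))
    return n, -dot(n, a)

def f_from_vertices(a, b, c, p):
    """Vertex-only form: ((b - a) x (c - a)) . (p - a)."""
    return dot(cross(sub(b, a), sub(c, a)), sub(p, a))

def f_from_stored(n, D, p):
    """Stored-coefficient form: Ax + By + Cz + D with n = (A, B, C)."""
    return dot(n, p) + D
```

Both evaluators return the same signed value for any point, so the choice between them affects only storage and speed, not the classification of a vertex.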
The only issue that prevents the code above from working in general is that one cannot guarantee that a triangle can be uniquely classified on one side of a plane or the other. It can have two vertices on one side of the plane and the third on the other. Or it can have vertices on the plane. This is handled by splitting the triangle into smaller triangles using the plane to “cut” them.
If none of the triangles in the dataset cross each other’s planes, so that all triangles are on one side of all other triangles, a BSP tree that can be traversed using the code above can be built using the following algorithm:
tree-root = node(T1)
for i ∈ {2,…,N} do
    tree-root.add(Ti)

function add(triangle T)
    if (f(a) < 0 and f(b) < 0 and f(c) < 0) then
        if (negative subtree is empty) then
            negative-subtree = node(T)
        else
            negative-subtree.add(T)
    else if (f(a) > 0 and f(b) > 0 and f(c) > 0) then
        if (positive subtree is empty) then
            positive-subtree = node(T)
        else
            positive-subtree.add(T)
    else
        we have assumed this case is impossible
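Putting construction and drawing together for the restricted case (no triangle crosses another triangle's plane) might look like this in Python. It is a sketch with hypothetical names: triangles are tuples of three 3-tuples, "rasterizing" is replaced by appending to a list so the order can be inspected, and a triangle that touches or crosses a plane raises an error instead of being split.

```python
def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

class BSPNode:
    def __init__(self, triangle):
        a, b, c = triangle
        self.triangle = triangle
        self.n = cross(sub(b, a), sub(c, a))   # plane normal
        self.D = -dot(self.n, a)               # plane offset
        self.minus = None                      # subtree with f < 0
        self.plus = None                       # subtree with f > 0

    def f(self, p):
        return dot(self.n, p) + self.D

    def add(self, triangle):
        values = [self.f(v) for v in triangle]
        if all(v < 0 for v in values):
            if self.minus is None:
                self.minus = BSPNode(triangle)
            else:
                self.minus.add(triangle)
        elif all(v > 0 for v in values):
            if self.plus is None:
                self.plus = BSPNode(triangle)
            else:
                self.plus.add(triangle)
        else:
            raise ValueError("triangle touches or crosses the plane")

def draw_order(tree, e, out=None):
    """Return the triangles in back-to-front order as seen from eye point e."""
    out = [] if out is None else out
    if tree is None:
        return out
    if tree.f(e) < 0:
        draw_order(tree.plus, e, out)
        out.append(tree.triangle)
        draw_order(tree.minus, e, out)
    else:
        draw_order(tree.minus, e, out)
        out.append(tree.triangle)
        draw_order(tree.plus, e, out)
    return out
```

The same tree yields a different, but always back-to-front, order for each eye point, which is why it can be built once as a preprocess.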
The only thing we need to fix is the case where the triangle crosses the dividing plane, as shown in Figure 12.39. Assume, for simplicity, that the triangle has vertices a and b on one side of the plane, and vertex c is on the other side. In this case, we can find the intersection points A and B, where edges ac and bc meet the plane, and cut the triangle into three new triangles with vertices
T1 = (a, b, A),
T2 = (b, B, A),
T3 = (A, B, c),
as shown in Figure 12.40. This order of vertices is important so that the direction of the normal remains the same as for the original triangle. If we assume that f(c) < 0, the following code could add these three triangles to the tree assuming the positive and negative subtrees are not empty:
positive-subtree.add(T1)
positive-subtree.add(T2)
negative-subtree.add(T3)
Figure 12.39. When a triangle spans a plane, there will be one vertex on one side and two on the other.
Figure 12.40. When a triangle is cut, we break it into three triangles, none of which span the cutting plane.
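The cut itself can be sketched in Python as below. The naming is an assumption on our part: A is taken to be the intersection of edge ac with the plane and B the intersection of edge bc, which is the assignment that keeps all three pieces oriented like the original triangle. The function names are hypothetical, and the plane values fa, fb, fc are assumed to be precomputed with fa and fb strictly on one side and fc strictly on the other.

```python
def intersect(p, q, fp, fq):
    """Point where segment pq meets the plane, given f(p) and f(q) of opposite sign."""
    t = fp / (fp - fq)       # f varies linearly along the segment
    return tuple(pi + t * (qi - pi) for pi, qi in zip(p, q))

def split_triangle(a, b, c, fa, fb, fc):
    """Cut triangle (a, b, c) into three triangles that do not span the plane."""
    A = intersect(a, c, fa, fc)
    B = intersect(b, c, fb, fc)
    # (a, b, A) and (b, B, A) lie on the side of a and b; (A, B, c) on the side of c.
    return (a, b, A), (b, B, A), (A, B, c)
```

For the plane x = 0 and the triangle a = (-1, 0, 0), b = (-1, 2, 0), c = (1, 1, 0), this gives A = (0, 0.5, 0) and B = (0, 1.5, 0), and the three pieces have the same winding as the original.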
A precision problem that will plague a naive implementation occurs when a vertex is very near the splitting plane. For example, if we have two vertices on one side of the splitting plane and the other vertex is only an extremely small distance on the other side, we will create a new triangle almost the same as the old one, a triangle that is a sliver, and a triangle of almost zero size. It would be better to detect this as a special case and not split into three new triangles. One might expect this case to be rare, but because many models have tessellated planes and triangles with shared vertices, it occurs frequently and thus must be handled carefully. Some simple manipulations that accomplish this are
function add(triangle T)
    fa = f(a)
    fb = f(b)
    fc = f(c)
    if (abs(fa) < ϵ) then
        fa = 0
    if (abs(fb) < ϵ) then
        fb = 0
    if (abs(fc) < ϵ) then
        fc = 0
    if (fa ≤ 0 and fb ≤ 0 and fc ≤ 0) then
        if (negative subtree is empty) then
            negative-subtree = node(T)
        else
            negative-subtree.add(T)
    else if (fa ≥ 0 and fb ≥ 0 and fc ≥ 0) then
        if (positive subtree is empty) then
            positive-subtree = node(T)
        else
            positive-subtree.add(T)
    else
        cut triangle into three triangles and add to each side
This takes any vertex whose f value is within ϵ of the plane and treats it as lying exactly on the plane, so that it can be counted as either positive or negative. The constant ϵ is a small positive real chosen by the user. The technique above is a rare instance where testing for floating-point equality is useful; it works because the zero value is set rather than computed. Comparing a computed floating-point value for equality is almost never advisable, but we are not doing that.
这将取f值在平面 ϵ 范围内的任何顶点,并将其计为正数或负数。常数 ϵ 是用户选择的一个小的正实数。上述技术是一种罕见的情况,其中测试浮点相等性很有用,并且有效,因为零值是设置的而不是计算的。将相等性与计算的浮点值进行比较几乎从来都不是明智的,但我们不会这样做。
Filling out the details of the last case, “cut triangle into three triangles and add to each side,” is straightforward but tedious. Because BSP tree construction is a preprocess, peak efficiency is not critical, so we should aim for clean, compact code instead. A nice trick is to force many of the cases into one by ensuring that c is on one side of the plane and the other two vertices are on the other. This is easily done with swaps. Filling out the details in the final else statement (assuming the subtrees are nonempty for simplicity) gives
填写最后一个案例“将三角形切成三个三角形并添加到每条边”的细节很简单,但很乏味。我们应该利用 BSP 树构造作为预处理,其中最高效率不是关键。相反,我们应该尝试拥有一个干净紧凑的代码。一个不错的技巧是通过确保c在平面的一侧而其他两个顶点在另一侧来强制将许多案例合并为一个。这可以通过交换轻松完成。填写最后一个 else 语句中的详细信息(为简单起见假设子树非空)给出
if (fa * fc ≥ 0) then
  swap(fb, fc)
  swap(b, c)
  swap(fa, fb)
  swap(a, b)
else if (fb * fc ≥ 0) then
  swap(fa, fc)
  swap(a, c)
  swap(fa, fb)
  swap(a, b)
compute A
compute B
T1 = (a, b, A)
T2 = (b, B, A)
T3 = (A, B, c)
if (fc ≥ 0) then
  negative-subtree.add(T1)
  negative-subtree.add(T2)
  positive-subtree.add(T3)
else
  positive-subtree.add(T1)
  positive-subtree.add(T2)
  negative-subtree.add(T3)
This code takes advantage of the fact that the product of two numbers is positive if they have the same sign, hence the first if statement. If vertices are swapped, we must do two swaps to keep the vertices ordered counterclockwise. Note that one of the vertices may lie exactly on the plane, in which case the code above will work, but one of the generated triangles will have zero area. This can be handled by ignoring the possibility, which is not that risky, because the rasterization code must handle zero-area triangles in screen space (i.e., edge-on triangles). You can also add a check that does not add zero-area triangles to the tree. Finally, you can put in a special case for when exactly one of fa, fb, and fc is zero, which cuts the triangle into two triangles.
此代码利用了以下事实:如果 a 和 b 具有相同的符号,则它们的乘积为正数 — 因此是第一个 if 语句。如果交换顶点,我们必须进行两次交换以保持顶点按逆时针顺序排列。请注意,恰好有一个顶点可能恰好位于平面上,在这种情况下,上面的代码将起作用,但生成的三角形之一将具有零面积。这可以通过忽略这种可能性来处理,这并不那么危险,因为光栅化代码必须处理屏幕空间中的零面积三角形(即边缘三角形)。您还可以添加一个检查,不会将零面积三角形添加到树中。最后,您可以添加一个特殊情况,即 fa、fb 和 fc 中恰好有一个为零,这会将三角形切成两个三角形。
To compute A and B, a line segment and implicit plane intersection is needed. For example, the parametric line connecting a and c is
$$\mathbf{p}(t) = \mathbf{a} + t(\mathbf{c} - \mathbf{a}).$$
为了计算A和B ,需要线段和隐式平面交点。例如,连接a和c 的参数线是
The point of intersection with the plane n ⋅ p + D = 0 is found by plugging p(t) into the plane equation:
$$\mathbf{n} \cdot (\mathbf{a} + t(\mathbf{c} - \mathbf{a})) + D = 0,$$
将p ( t ) 代入平面方程,可以找到与平面n ⋅ p + D = 0 的交点:
and solving for t:
$$t = -\frac{\mathbf{n} \cdot \mathbf{a} + D}{\mathbf{n} \cdot (\mathbf{c} - \mathbf{a})}.$$
并求解t :
Calling this solution tA, we can write the expression for A:
$$\mathbf{A} = \mathbf{a} + t_A(\mathbf{c} - \mathbf{a}).$$
将此解称为t A ,我们可以写出A的表达式:
A similar computation will give B.
类似的计算将得到B 。
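As a concrete sketch of the splitting step, the Python code below classifies a triangle against the plane n ⋅ p + D = 0, rotates the vertices so that c is alone on its side, and computes A and B by solving n ⋅ (a + t(c − a)) + D = 0 for t. The names `split_triangle` and `intersect`, the tuple-based vectors, and the default `eps` are our own assumptions rather than part of the text, and the function returns the pieces for each side instead of inserting them into a tree.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]
Tri = Tuple[Vec, Vec, Vec]


def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def intersect(n: Vec, D: float, p0: Vec, p1: Vec) -> Vec:
    """Point where the segment p0 -> p1 crosses the plane n . p + D = 0."""
    d = (p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2])
    t = -(dot(n, p0) + D) / dot(n, d)
    return (p0[0] + t * d[0], p0[1] + t * d[1], p0[2] + t * d[2])


def split_triangle(tri: Tri, n: Vec, D: float,
                   eps: float = 1e-9) -> Tuple[List[Tri], List[Tri]]:
    """Return (positive_side, negative_side) lists of triangles."""
    a, b, c = tri
    # Evaluate f(p) = n . p + D and snap near-zero values onto the plane.
    fa, fb, fc = (dot(n, p) + D for p in (a, b, c))
    fa, fb, fc = (0.0 if abs(f) < eps else f for f in (fa, fb, fc))

    if fa <= 0 and fb <= 0 and fc <= 0:
        return [], [tri]
    if fa >= 0 and fb >= 0 and fc >= 0:
        return [tri], []

    # Cyclically rotate (same effect as the paired swaps) so that c is
    # alone on its side; a rotation preserves the winding order.
    if fa * fc >= 0:
        a, b, c = c, a, b
        fa, fb, fc = fc, fa, fb
    elif fb * fc >= 0:
        a, b, c = b, c, a
        fa, fb, fc = fb, fc, fa

    A = intersect(n, D, a, c)
    B = intersect(n, D, b, c)
    t1, t2, t3 = (a, b, A), (b, B, A), (A, B, c)
    if fc >= 0:
        return [t3], [t1, t2]
    return [t1, t2], [t3]


# Example: split a triangle against the plane z = 0.
tri = ((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0))
positive, negative = split_triangle(tri, (0.0, 0.0, 1.0), 0.0)
```

A vertex that lies within `eps` of the plane is snapped to f = 0 before the tests, so a triangle that only barely pokes through the plane is returned whole instead of being cut into slivers.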
The efficiency of tree creation is much less of a concern than tree traversal because it is a preprocess. The traversal of the BSP tree takes time proportional to the number of nodes in the tree. (How well balanced the tree is does not matter.) There will be one node for each triangle, including the triangles that are created as a result of splitting. This number can depend on the order in which triangles are added to the tree. For example, in Figure 12.41, if T1 is the root, there will be two nodes in the tree, but if T2 is the root, there will be more nodes, because T1 will be split.
与树遍历相比,树创建的效率不那么令人担忧,因为它是一个预处理过程。遍历 BSP 树所需的时间与树中的节点数成正比。(树的平衡程度并不重要。)每个三角形都会有一个节点,包括由于分裂而创建的三角形。这个数字可能取决于三角形添加到树中的顺序。例如,在图 12.41中,如果T 1是根,则树中将有两个节点,但如果T 2是根,则将有更多节点,因为T 1将被分裂。
Figure 12.41. Using T1 as the root of a BSP tree will result in a tree with two nodes. Using T2 as the root will require a cut and thus make a larger tree.
图 12.41.使用T 1作为 BSP 树的根将产生一棵有两个节点的树。使用T 2作为根将需要切割,从而产生一棵更大的树。
It is difficult to find the “best” order of triangles to add to the tree. For N triangles, there are N! orderings that are possible. So trying all orderings is not usually feasible. Alternatively, some predetermined number of orderings can be tried from a random collection of permutations, and the best one can be kept for the final tree.
很难找到添加到树中的三角形的“最佳”顺序。对于N 个三角形,可能有N ! 种排序。因此尝试所有排序通常是不可行的。或者,可以从随机排列集合中尝试一些预定数量的排序,并将最佳排序保留用于最终树。
The splitting algorithm described above splits one triangle into three triangles. It could be more efficient to split a triangle into a triangle and a convex quadrilateral. This is probably not worth it if all input models have only triangles, but would be easy to support for implementations that accommodate arbitrary polygons.
上面描述的分割算法将一个三角形分割成三个三角形。将一个三角形分割成一个三角形和一个凸四边形可能更有效。如果所有输入模型都只有三角形,这可能不值得,但对于容纳任意多边形的实现来说,这很容易支持。
Effectively utilizing the memory hierarchy is a crucial task in designing algorithms for modern architectures. Making sure that multidimensional arrays have data in a “nice” arrangement is accomplished by tiling, sometimes also called bricking. A traditional 2D array is stored as a 1D array together with an indexing mechanism; for example, an Nx by Ny array is stored in a 1D array of length NxNy and the 2D index (x,y) (which runs from (0,0) to (Nx - 1, Ny - 1)) maps to the 1D index (running from 0 to NxNy - 1) using the formula
$$\text{index} = x + N_x y.$$
有效利用内存层次结构是设计现代架构算法的关键任务。确保多维数组中的数据排列“良好”是通过平铺(有时也称为砖块化)来实现的。传统的二维数组与索引机制一起存储为一维数组;例如, N x N y数组存储在长度为N x N y的一维数组中,二维索引 ( x,y )(从 (0,0) 到 ( N x - 1, N y - 1))使用公式映射到一维索引(从 0 到N x N y - 1)
An example of how that memory lays out is shown in Figure 12.42. A problem with this layout is that although two adjacent array elements that are in the same row are next to each other in memory, two adjacent elements in the same column will be separated by Nx elements in memory. This can cause poor memory locality for large Nx. The standard solution to this is to use tiles to make memory locality for rows and columns more equal. An example is shown in Figure 12.43 where 2 × 2 tiles are used. The details of indexing such an array are discussed in the next section. A more complicated example, with two levels of tiling on a 3D array, is covered after that.
图 12.42显示了该内存布局的一个示例。此布局的一个问题是,尽管同一行中的两个相邻数组元素在内存中彼此相邻,但同一列中的两个相邻元素在内存中将由N x 个元素分隔开。当N x较大时,这会导致较差的内存局部性。该问题的标准解决方案是使用平铺使行和列的内存局部性更加相等。图 12.43显示了一个例子,其中使用了 2 × 2 个平铺。下一节将讨论对此类数组进行索引的细节。之后将介绍一个更复杂的例子,即在 3D 数组上使用两层平铺。
Figure 12.42. The memory layout for an untiled 2D array with Nx = 4 and Ny = 3.
图 12.42. N x = 4 和N y = 3 的未平整二维数组的内存布局。
Figure 12.43. The memory layout for a tiled 2D array with Nx = 4 and Ny = 3 and 2 × 2 tiles. Note that padding on the top of the array is needed because Ny is not a multiple of the tile size two.
图 12.43。N x = 4、 N y = 3 和 2 × 2 个图块的平铺二维数组的内存布局。请注意,需要在数组顶部填充,因为N y不是图块大小 2 的倍数。
A key question is what size to make the tiles. In practice, they should be similar to the memory-unit size on the machine. For example, if we are using 16-bit (2-byte) data values on a machine with 128-byte cache lines, 8 × 8 tiles fit exactly in a cache line. However, using 32-bit floating-point numbers, which fit 32 elements to a cache line, 5 × 5 tiles are a bit too small and 6 × 6 tiles are a bit too large. Because there are also coarser-sized memory units such as pages, hierarchical tiling with similar logic can be useful.
一个关键问题是要将分块分成多大尺寸。实际上,它们应该与机器上的内存单元大小相似。例如,如果我们在具有 128 字节缓存行的机器上使用 16 位(2 字节)数据值,则 8 × 8 分块正好适合缓存行。但是,使用 32 位浮点数(可将 32 个元素放入缓存行),5 × 5 分块有点太小,6 × 6 分块有点太大。由于还有大小更粗的内存单元(例如页面),因此具有类似逻辑的分层分块可能很有用。
If we assume an Nx × Ny array decomposed into square n × n tiles (Figure 12.44), then the number of tiles required is
$$B_x = N_x / n, \qquad B_y = N_y / n.$$
如果我们假设一个N x × N y数组分解为 n × n 个正方形瓷砖(图 12.44 ),那么所需的瓷砖数量为
Figure 12.44. A tiled 2D array composed of Bx × By tiles each of size n by n.
图 12.44.由B x × B y个平铺二维数组组成,每个平铺的大小为n × n 。
Here, we assume that n divides Nx and Ny exactly. When this is not true, the array should be padded. For example, if Nx = 15 and n = 4, then Nx should be changed to 16. To work out a formula for indexing such an array, we first find the tile indices (bx,by) that give the row/column for the tiles (the tiles themselves form a 2D array):
$$b_x = x \div n, \qquad b_y = y \div n,$$
这里,我们假设 n 能整除N x和N y 。如果情况不成立,则应该对数组进行填充。例如,如果N x = 15 且n = 4,则应该将N x改为 16。要计算出索引此类数组的公式,我们首先找到给出图块行/列的图块索引 (b x ,b y )(图块本身形成一个 2D 数组):
where ÷ is integer division, e.g., 12 ÷ 5 = 2. If we order the tiles along rows as shown in Figure 12.42, then the index of the first element of the tile (bx,by) is
$$\text{index} = n^2 (B_x b_y + b_x).$$
其中 ÷ 是整数除法,例如 12 ÷ 5 = 2。如果我们按照图 12.42所示沿行对图块进行排序,则图块 ( b x ,b y ) 的第一个元素的索引为
The memory in that tile is arranged like a traditional 2D array as shown in Figure 12.43. The partial offsets (x′,y′) inside the tile are
$$x' = x \bmod n, \qquad y' = y \bmod n,$$
该图块中的内存排列方式与传统二维数组类似,如图 12.43所示。图块内部的部分偏移量 ( x′,y′ ) 为
where mod is the remainder operator, e.g., 12 mod 5 = 2. Therefore, the offset inside the tile is
$$\text{offset} = y' n + x'.$$
其中mod是余数运算符,例如12 mod 5 = 2。因此,图块内的偏移量为
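Thus, the full formula for finding the 1D index of element (x,y) in an Nx × Ny array with n × n tiles is
$$\text{index} = n^2 (B_x b_y + b_x) + y' n + x' = n^2 \big( (N_x \div n)(y \div n) + x \div n \big) + (y \bmod n)\, n + (x \bmod n).$$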
因此,在具有n × n 个图块的N x × N y数组中查找一维索引元素 ( x,y ) 的完整公式为
This expression contains many integer multiplication, divide, and modulus operations, which are costly on some processors. When n is a power of two, these operations can be converted to bitshifts and bitwise logical operations. However, as noted above, the ideal size is not always a power of two. Some of the multiplications can be converted to shift/add operations, but the divide and modulus operations are more problematic. The indices could be computed incrementally, but this would require tracking counters, with numerous comparisons and poor branch prediction performance.
此表达式包含许多整数乘法、除法和模数运算,这些运算在某些处理器上成本高昂。当 n 是 2 的幂时,这些运算可以转换为位移位和按位逻辑运算。但是,如上所述,理想的大小并不总是 2 的幂。一些乘法可以转换为移位/加法运算,但除法和模数运算更成问题。索引可以增量计算,但这需要跟踪计数器,需要进行大量比较,并且分支预测性能较差。
However, there is a simple solution; note that the index expression can be written as
$$\text{index} = F_x(x) + F_y(y),$$
不过,有一个简单的解决方案;请注意,索引表达式可以写成
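where
$$F_x(x) = n^2 (x \div n) + (x \bmod n), \qquad F_y(y) = n^2 (N_x \div n)(y \div n) + (y \bmod n)\, n.$$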
在哪里
We tabulate Fx and Fy, and use x and y to find the index into the data array. These tables will consist of Nx and Ny elements, respectively. The total size of the tables will fit in the primary data cache of the processor, even for very large dataset sizes.
我们将F x和F y制成表格,并使用 x 和 y 查找数据数组中的索引。这些表格分别由N x和N y元素组成。即使对于非常大的数据集,表格的总大小也适合处理器的主数据缓存。
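The table-based indexing can be sketched in a few lines of Python. Here `tiled_index` evaluates the tile index, tile offset, and in-tile offset directly with integer division and remainder, while `build_tables` precomputes one table per axis so that a lookup costs two table reads and an add. The function names and the `padded` helper are assumptions made for this illustration.

```python
def padded(size, n):
    """Smallest multiple of n that is >= size."""
    return -(-size // n) * n


def tiled_index(x, y, nx, n):
    """1D index of element (x, y) in an array of width nx stored in n-by-n tiles."""
    bx, by = x // n, y // n        # which tile the element falls in
    xo, yo = x % n, y % n          # offset of the element inside that tile
    tiles_per_row = nx // n        # number of tiles along x
    return n * n * (tiles_per_row * by + bx) + yo * n + xo


def build_tables(nx, ny, n):
    """Per-axis tables with tiled_index(x, y) == fx[x] + fy[y]."""
    fx = [n * n * (x // n) + (x % n) for x in range(nx)]
    fy = [n * n * (nx // n) * (y // n) + (y % n) * n for y in range(ny)]
    return fx, fy


# A 4 x 3 array with 2 x 2 tiles; the height is padded from 3 to 4.
n = 2
nx, ny = padded(4, n), padded(3, n)
fx, fy = build_tables(nx, ny, n)
layout = [[fx[x] + fy[y] for x in range(nx)] for y in range(ny)]
```

Each row of `layout` lists the memory positions of one row of the array, so elements that are vertical neighbors inside a tile end up only n positions apart.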
Effective TLB utilization is also becoming a crucial factor in algorithm performance. The same technique can be used to improve TLB hit rates in a 3D array by creating m × m × m bricks of n × n × n cells. For example, a 40 × 20 × 19 volume could be decomposed into 4 × 2 × 2 macrobricks of 2 × 2 × 2 bricks of 5 × 5 × 5 cells. This corresponds to m = 2 and n = 5. Because 19 is not a multiple of mn = 10, one level of padding is needed. Empirically useful sizes are m = 5 for 16-bit datasets and m = 6 for float datasets.
有效利用 TLB 也正在成为算法性能的一个关键因素。通过创建m × m × m个n × n × n个单元的块,可以使用相同的技术来提高 3D 阵列中的 TLB 命中率。例如,40 × 20 × 19 的体积可以分解为 4 × 2 × 2 个宏块,每个宏块由 2 × 2 × 2 个 5 × 5 × 5 个单元的块组成。这对应于m = 2 和n = 5。由于 19 不能被mn = 10 分解,因此需要一层填充。经验上有用的大小是 16 位数据集的 m = 5 和浮点数据集的m = 6。
TLB: translation lookaside buffer, a cache that is part of the virtual memory system.
TLB:转换后备缓冲区,是虚拟内存系统一部分的缓存。
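The resulting index into the data array can be computed for any (x,y,z) triple with the expression
$$\begin{aligned}
\text{index} = {} & ((x \div n) \div m)\, n^3 m^3 \,((N_z \div n) \div m)((N_y \div n) \div m) \\
& + ((y \div n) \div m)\, n^3 m^3 \,((N_z \div n) \div m) \\
& + ((z \div n) \div m)\, n^3 m^3 \\
& + ((x \div n) \bmod m)\, n^3 m^2 \\
& + ((y \div n) \bmod m)\, n^3 m \\
& + ((z \div n) \bmod m)\, n^3 \\
& + (x \bmod n)\, n^2 + (y \bmod n)\, n + (z \bmod n),
\end{aligned}$$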
可以使用以下表达式计算任何 (x,y,z) 三元组的数据数组的结果索引
where Nx, Ny, and Nz are the respective sizes of the dataset.
其中N x 、 N y和N z分别是数据集的各自大小。
Note that, as in the simpler 2D one-level case, this expression can be written as
$$\text{index} = F_x(x) + F_y(y) + F_z(z),$$
请注意,与更简单的二维单层情况一样,该表达式可以写成
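where
$$\begin{aligned}
F_x(x) &= ((x \div n) \div m)\, n^3 m^3 \,((N_z \div n) \div m)((N_y \div n) \div m) + ((x \div n) \bmod m)\, n^3 m^2 + (x \bmod n)\, n^2, \\
F_y(y) &= ((y \div n) \div m)\, n^3 m^3 \,((N_z \div n) \div m) + ((y \div n) \bmod m)\, n^3 m + (y \bmod n)\, n, \\
F_z(z) &= ((z \div n) \div m)\, n^3 m^3 + ((z \div n) \bmod m)\, n^3 + (z \bmod n).
\end{aligned}$$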
在哪里
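A minimal sketch of the two-level scheme follows, using the 40 × 20 × 19 example with n = 5 and m = 2. The ordering of the axes is an assumption of this sketch: z varies fastest and x slowest at every level (cells within a brick, bricks within a macrobrick, and macrobricks within the volume). The names `brick_tables` and `padded` are hypothetical.

```python
def padded(size, block):
    """Smallest multiple of block that is >= size."""
    return -(-size // block) * block


def brick_tables(nx, ny, nz, n, m):
    """Per-axis tables so that index(x, y, z) = fx[x] + fy[y] + fz[z].

    Assumed layout: z fastest, then y, then x, at all three levels.
    nx, ny, and nz must already be multiples of n * m.
    """
    mn = n * m
    macro_y = ny // mn             # macrobricks along y
    macro_z = nz // mn             # macrobricks along z
    n3 = n ** 3                    # cells per brick
    macro = n3 * m ** 3            # cells per macrobrick

    def table(size, cell, brick, macrobrick):
        return [(v % n) * cell               # cell inside its brick
                + ((v // n) % m) * brick     # brick inside its macrobrick
                + (v // mn) * macrobrick     # macrobrick inside the volume
                for v in range(size)]

    fx = table(nx, n * n, n3 * m * m, macro * macro_z * macro_y)
    fy = table(ny, n, n3 * m, macro * macro_z)
    fz = table(nz, 1, n3, macro)
    return fx, fy, fz


# The 40 x 20 x 19 volume from the text; 19 is padded up to 20.
nx, ny, nz = padded(40, 10), padded(20, 10), padded(19, 10)
fx, fy, fz = brick_tables(nx, ny, nz, n=5, m=2)
index = fx[7] + fy[3] + fz[12]   # memory position of cell (7, 3, 12)
```

Any other consistent choice of fastest axis gives an equally valid layout; only the strides passed to `table` change.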
Does tiling really make that much difference in performance?
平铺真的会对性能造成那么大的差异吗?
On some volume rendering applications, a two-level tiling strategy made as much as a factor-of-ten performance difference. When the array does not fit in main memory, it can effectively prevent thrashing in some applications such as image editing.
在某些体积渲染应用中,两级平铺策略可使性能提高十倍之多。当阵列无法装入主内存时,它可以有效防止图像编辑等应用中的抖动。
How do I store the lists in a winged-edge structure?
如何将列表存储在翼边结构中?
For most applications, it is feasible to use arrays and indices for the references. However, if many delete operations are to be performed, then it is wise to use linked lists and pointers.
对于大多数应用程序来说,使用数组和索引作为引用是可行的。但是,如果要执行许多删除操作,则使用链表和指针是明智的。
The discussion of the winged-edge data structure is based on the course notes of Ching-Kuang Shene (2003). There are smaller mesh data structures than winged-edge. The tradeoffs in using such structures are discussed in Directed Edges—A Scalable Representation for Triangle Meshes (Campagna, Kobbelt, & Seidel, 1998). The tiled-array discussion is based on Interactive Ray Tracing for Volume Visualization (Parker et al., 1999). A structure similar to the triangle neighbor structure is discussed in a technical report by Charles Loop (Loop, 2000). A discussion of manifolds can be found in an introductory topology text (Munkres, 2000).
翼边数据结构的讨论基于Ching-Kuang Shene (2003) 的课程笔记。有比翼边更小的网格数据结构。使用此类结构的权衡在《定向边——三角网格的可扩展表示》(Campagna、Kobbelt 和 Seidel,1998)中进行了讨论。平铺阵列的讨论基于《用于体积可视化的交互式光线追踪》 (Parker 等人,1999)。Charles Loop(Loop,2000)在技术报告中讨论了一种类似于三角形邻域结构的结构。在拓扑学入门教材(Munkres,2000)中可以找到有关流形的讨论。
1. What is the memory difference for a simple tetrahedron stored as four independent triangles and one stored in a winged-edge data structure?
1.存储为四个独立三角形的简单四面体与存储在翼边数据结构中的简单四面体的内存差异是什么?
2. Diagram a scene graph for a bicycle.
2.绘制自行车的场景图。
3. How many lookup tables are needed for a single-level tiling of an n-dimensional array?
3.对 n 维数组进行单级平铺需要多少个查找表?
4. Given N triangles, what is the minimum number of triangles that could be added to a resulting BSP tree? What is the maximum number?
4.给定N 个三角形,可以添加到最终 BSP 树中的三角形的最小数量是多少?最大数量是多少?
Many applications in graphics require “fair” sampling of unusual spaces, such as the space of all possible lines. For example, we might need to generate random edges within a pixel, or random sample points on a pixel that vary in density according to some density function. This chapter provides the machinery for such probability operations. These techniques will also prove useful for numerically evaluating complicated integrals using Monte Carlo integration, also covered in this chapter.
图形学中的许多应用需要对不寻常的空间进行“公平”采样,例如所有可能的线的空间。例如,我们可能需要在像素内生成随机边缘,或者在像素上生成密度根据某个密度函数变化的随机采样点。本章提供了此类概率运算的机制。这些技术对于使用蒙特卡洛对复杂积分进行数值评估也很有用集成,也在本章中介绍。
Although the words “integral” and “measure” often seem intimidating, they relate to some of the most intuitive concepts found in mathematics, and they should not be feared. For our very non-rigorous purposes, a measure is just a function that maps subsets to ℝ⁺ in a manner consistent with our intuitive notions of length, area, and volume. For example, on the 2D real plane ℝ², we have the area measure A which assigns a value to a set of points in the plane. Note that A is just a function that takes pieces of the plane and returns area. This means the domain of A is all possible subsets of ℝ², which we denote as the power set P(ℝ²). Thus, we can characterize A in arrow notation:
$$A : \mathcal{P}(\mathbb{R}^2) \rightarrow \mathbb{R}^+.$$
尽管“积分”和“测度”这两个词常常看起来令人生畏,但它们与数学中一些最直观的概念有关,不应害怕它们。对于我们非常不严谨的目的,测度只是一个将子集映射到 ℝ + 的函数,其方式与我们对长度、面积和体积的直观概念一致。例如,在二维实平面 ℝ 2上,我们有面积测度 A,它将一个值分配给平面中的一组点。请注意,A 只是一个获取平面碎片并返回面积的函数。这意味着 A 的定义域是 ℝ 2的所有可能子集,我们将其表示为幂集P (ℝ 2 )。 因此,我们可以用箭头符号来描述 A:
An example of applying the area measure shows that the area of the square with side length one is one:
$$A([a, a+1] \times [b, b+1]) = 1,$$
应用面积测量的一个例子表明,边长为 1 的正方形的面积为 1:
where (a,b) is just the lower left-hand corner of the square. Note that a single point such as (3,7) is a valid subset of ℝ² and has zero area: A((3,7)) = 0. The same is true of the set of points S on the x-axis, S = {(x,y) such that (x,y) ∈ ℝ² and y = 0}, i.e., A(S) = 0. Such sets are called zero measure sets.
其中 ( a,b ) 就是正方形的左下角。注意,单个点(例如 (3,7))是 ℝ 2的有效子集,面积为零:A((3,7)) = 0。x 轴上的点集S也是如此, S = ( x,y ),使得 ( x,y ) ∈ ℝ 2且y = 0,即A(S) = 0。这样的集合称为零测量集。
To be considered a measure, a function has to obey certain area-like properties. Suppose we have a function μ : P(𝕊) → ℝ⁺. For μ to be a measure, the following conditions must be true:
要被视为测度,函数必须遵循某些类似面积的性质。例如,我们有一个函数 μ: P (𝕊) → ℝ + 。要使μ成为测度,必须满足以下条件:
The measure of the empty set is zero: μ(∅) = 0.
空集的测度为零: μ (∅) = 0。
The measure of two disjoint sets together is the sum of their individual measures. The general rule, which allows for possible intersections, is
$$\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B),$$
两个不同集合的度量之和等于它们各自的度量之和。此规则可能存在交集,如下所示
where ∪ is the set union operator, and ∩ is the set intersection operator.
其中∪是集合并运算符,∩是集合交运算符。
When we actually compute measures, we usually use integration. We can think of integration as really just notation:
$$A(S) \equiv \int_{\mathbf{x} \in S} dA(\mathbf{x}).$$
当我们实际计算度量时,我们通常使用积分。我们可以将积分视为实际上只是符号:
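You can informally read the right-hand side as “take all points x in the region S, and sum their associated differential areas.” The integral is often written other ways including
$$\int_S dA, \qquad \int_{\mathbf{x} \in S} d\mathbf{x}, \qquad \int_{\mathbf{x} \in S} dA_{\mathbf{x}}.$$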
你可以非正式地将右边理解为“取区域S中的所有点x ,并求出它们相关的微分面积之和”。积分通常以其他方式编写,包括
All of the above formulas represent “the area of region S.” We will stick with the first one we used, because it is so verbose it avoids ambiguity. To evaluate such integrals analytically, we usually need to lay down some coordinate system and use our bag of calculus tricks to solve the equations. But have no fear if those skills have faded, as we usually have to numerically approximate integrals, and that requires only the simple techniques described in Section 13.3.
上述所有公式都表示“区域 S 的面积”。我们将坚持使用第一个公式,因为它非常冗长,可以避免歧义。要以分析方式评估此类积分,我们通常需要设置一些坐标系并使用我们的微积分技巧来求解方程。但如果这些技能已经消失,也不必担心,因为我们通常必须对积分进行数值近似,而这只需要第 13.3 节中描述的简单技巧。
Given a measure on a set S, we can always create a new measure by weighting with a nonnegative function w : S → ℝ⁺. This is best expressed in integral notation. For example, we can start with the example of the simple area measure on [0,1]²:
$$\int_{\mathbf{x} \in S} dA(\mathbf{x}),$$
给定集合 S 上的一个测度,我们总是可以通过用非负函数 w : S → ℝ +加权来创建一个新的测度。这最好用积分符号来表达。例如,我们可以从 [0,1] 2上的简单面积测度的例子开始:
and we can use a “radially weighted” measure by inserting a weighting function of radius squared:
$$\int_{\mathbf{x} \in S} \|\mathbf{x}\|^2 \, dA(\mathbf{x}).$$
我们可以通过插入半径平方的加权函数来使用“径向加权”度量:
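To evaluate this analytically, we can expand using a Cartesian coordinate system with dA ≡ dx dy:
$$\int_{\mathbf{x} \in S} \|\mathbf{x}\|^2 \, dA(\mathbf{x}) = \int_{x=0}^{1} \int_{y=0}^{1} (x^2 + y^2) \, dx \, dy.$$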
为了从分析角度评估这一点,我们可以使用笛卡尔坐标系来展开,其中dA ≡ dx dy :
The key thing here is that if you think of the ‖x‖² term as married to the dA term, and that these together form a new measure, we can call that measure ν. This would allow us to write ν(S) instead of the whole integral. If this strikes you as just a bunch of notation and bookkeeping, you are right. But it does allow us to write down equations that are either compact or expanded depending on our preference.
这里的关键是,如果你认为 ∥ x ∥ 2项与 dA 项结合在一起,并且它们一起形成一个新的测度,我们可以将该测度称为 ν。这样我们就可以写出 ν(S) 而不是整个积分。如果你觉得这只是一堆符号和记账,那你是对的。但它确实允许我们根据自己的喜好写下紧凑或扩展的方程式。
Measures really start paying off when taking averages of a function. You can only take an average with respect to a particular measure, and you would like to select a measure that is “natural” for the application or domain. Once a measure is chosen, the average of a function f over a region S with respect to measure μ is
$$\text{average}(f) \equiv \frac{\int_{\mathbf{x} \in S} f(\mathbf{x}) \, d\mu(\mathbf{x})}{\int_{\mathbf{x} \in S} d\mu(\mathbf{x})}.$$
当对函数取平均值时,度量才真正开始发挥作用。您只能针对特定度量取平均值,并且您希望选择一个对应用程序或领域“自然”的度量。一旦选择了度量,函数 f 在区域 S 上关于度量μ 的平均值就是
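For example, the average of the function f(x,y) = x² over [0,2]² with respect to the area measure is
$$\text{average}(f) \equiv \frac{\int_{0}^{2} \int_{0}^{2} x^2 \, dx \, dy}{\int_{0}^{2} \int_{0}^{2} dx \, dy} = \frac{4}{3}.$$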
例如,函数f ( x,y ) = x 2在 [0,2] 2上的面积测度的平均值是
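To make the dependence on the measure concrete, the sketch below approximates the average of f(x,y) = x² over [0,2]² twice: once with respect to the plain area measure, and once with respect to the radially weighted measure whose weight is x² + y². It uses a simple midpoint rule; the helper names `integrate` and `average` and the grid resolution are assumptions made for this illustration.

```python
def integrate(g, x0, x1, y0, y1, n=200):
    """Midpoint-rule approximation of the integral of g over a rectangle."""
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy


def average(f, w, box, n=200):
    """Average of f over box with respect to the measure w(x, y) dx dy."""
    numerator = integrate(lambda x, y: f(x, y) * w(x, y), *box, n=n)
    denominator = integrate(w, *box, n=n)
    return numerator / denominator


box = (0.0, 2.0, 0.0, 2.0)


def f(x, y):
    return x * x


avg_area = average(f, lambda x, y: 1.0, box)              # close to 4/3
avg_radial = average(f, lambda x, y: x * x + y * y, box)  # close to 28/15
```

The same function has a different average under each measure (analytically 4/3 and 28/15), which is why the measure must be fixed before an average is meaningful.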
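This machinery helps solve seemingly hard problems where choosing the measure is the tricky part. Such problems often arise in integral geometry, a field that studies measures on geometric entities, such as lines and planes. For example, one might want to know the average length of a line through [0,1]². That is, by definition,
$$\text{average}(\text{length}) = \frac{\int_{\text{lines } L \text{ through } [0,1]^2} \text{length}(L) \, d\mu(L)}{\int_{\text{lines } L \text{ through } [0,1]^2} d\mu(L)}.$$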
这种机制有助于解决看似困难的问题,因为选择测度是棘手的部分。这类问题经常出现在积分几何中,该领域研究几何实体(如线和平面)的测度。例如,人们可能想知道通过 [0,1] 2 的一条线的平均长度。也就是说,根据定义,
All that is left, once we know that, is choosing the appropriate μ for the application. This is dealt with for lines in the next section.
一旦我们知道了,剩下的就是为应用选择合适的μ 。下一节将讨论线条的问题。
What measure μ is “natural”?
什么量度μ是“自然的”?
If you parameterize the lines as y = mx + b, you might think of a given line as a point (m,b) in “slope-intercept” space. An easy measure to use would be dm db, but this would not be a “good” measure in that not all equal size “bundles” of lines would have the same measure. More precisely, the measure would not be invariant with respect to change of coordinate system. For example, if you took all lines through the square [0,1]², the measure of lines through it would not be the same as the measure through a unit square rotated 45°. What we would really like is a “fair” measure that does not change with rotation or translation of a set of lines. This idea is illustrated in Figures 13.1 and 13.2.
如果将直线参数化为y = mx + b ,您可能会将给定直线视为“斜率截距”空间中的点 ( m,b )。一种易于使用的度量是dm db ,但这不是一个“好”的度量,因为并非所有相等大小的直线“束”都具有相同的度量。更准确地说,该度量不会随着坐标系的变化而保持不变。例如,如果取所有穿过正方形 [0,1] 2 的直线,则穿过该正方形的直线的度量将不同于穿过旋转 45∘ 的单位正方形的度量。我们真正想要的是一个“公平”的度量,它不会随着一组直线的旋转或平移而改变。图 13.1和13.2说明了这个想法。
Figure 13.1. These two bundles of lines should have the same measure. They have different intersection lengths with the y-axis so using db would be a poor choice for a differential measure.
图 13.1。这两束线应该具有相同的测量值。它们与y轴的交点长度不同,因此使用db进行差分测量不是一个好选择。
Figure 13.2. These two bundles of lines should have the same measure. Since they have different values for change in slope, using dm would be a poor choice for a differential measure.
图 13.2。这两束线应该具有相同的测量值。由于它们具有不同的斜率变化值,因此使用dm作为差分测量不是一个好选择。
To develop a natural measure on the lines, we should first start thinking of them as points in a dual space. This is a simple concept: the line y = mx + b can be specified as the point (m,b) in a slope-intercept space. This concept is illustrated in Figure 13.3. It is more straightforward to develop a measure in (ϕ,b) space. In that space, b is the y-intercept, while ϕ is the angle the line makes with the x-axis, as shown in Figure 13.4. Here, the differential measure dϕ db almost works, but it would not be fair due to the effect shown in Figure 13.1. To account for the larger span b that a constant width bundle of lines makes, we must add a cosine factor:
$$d\mu = \cos\phi \, d\phi \, db.$$
要对直线建立自然测度,我们首先应该将它们视为对偶空间中的点。这是一个简单的概念:直线y = mx + b可以指定为斜率截距空间中的点 ( m,b )。图 13.3说明了此概念。在 ( φ , b ) 空间中建立测度更为直接。在该空间中,b 是 y 截距,而φ是直线与x轴的夹角,如图13.4所示。这里,微分测度dφdb几乎有效,但由于图 13.1所示的效果,它并不公平。为了解释恒定宽度的线束所形成的较大跨度 b,我们必须添加一个余弦因子:
Figure 13.3. The set of points on the line y = m x + b in (x, y) space can also be represented by a single point in (m, b) space so the top line and the bottom point represent the same geometric entity: a 2D line.
图 13.3 ( x, y ) 空间中直线y = mx + b上的点集也可以用 ( m, b ) 空间中的单个点表示,因此顶部直线和底部点表示相同的几何实体:二维直线。
Figure 13.4. In angle-intercept space we parameterize the line by angle ϕ ∈ [-π∕2,π∕2) rather than slope.
图 13.4.在角度截距空间中,我们用角度ϕ ∈ [-π∕2,π∕2) 而不是斜率来参数化直线。
It can be shown that this measure, up to a constant, is the only one that is invariant with respect to rotation and translation.
可以证明,该测度(直到常数)是唯一一个对于旋转和平移不变的测度。
This measure can be converted into an appropriate measure for other parameterizations of the line. For example, the appropriate measure for (m,b) space is
$$d\mu = \frac{dm \, db}{(1 + m^2)^{3/2}}.$$
该测度可以转换为适合其他线参数化的测度。例如,( m,b )空间的适当测度为
For the space of lines parameterized in (u,v) space,
$$ux + vy + 1 = 0,$$
对于 ( u,v ) 空间中参数化的线空间,
the appropriate measure is
$$d\mu = \frac{du \, dv}{(u^2 + v^2)^{3/2}}.$$
适当的衡量标准是
For lines parameterized in terms of (a,b), the x-intercept and y-intercept, the measure is
$$d\mu = \frac{|ab| \, da \, db}{(a^2 + b^2)^{3/2}}.$$
对于以 ( a,b )、x 截距和y截距为参数的直线,度量为
Note that any of those spaces are equally valid ways to specify lines, and which is best depends upon the circumstances. However, one might wonder whether there exists a coordinate system where the measure of a set of lines is just an area in the dual space. In fact, there is such a coordinate system, and it is delightfully simple; it is the normal coordinates which specify a line in terms of the normal distance from the origin to the line, and the angle the normal of the line makes with respect to the x-axis (Figure 13.5). The implicit equation for such lines is
$$x \cos\theta + y \sin\theta - r = 0.$$
请注意,这些空间中的任何一种都是同样有效的指定直线的方法,哪种方法最好取决于具体情况。但是,人们可能想知道是否存在一个坐标系,其中一组直线的度量只是对偶空间中的一个面积。事实上,有这样一个坐标系,而且非常简单;它是法向坐标,根据从原点到直线的法向距离以及直线法向与 x 轴的夹角来指定一条直线(图 13.5 )。这种直线的隐式方程是
Figure 13.5. The normal coordinates of a line use the normal distance to the origin and an angle to specify a line.
图 13.5。线的法向坐标使用到原点的法向距离和角度来指定线。
And, indeed, the measure in that space is
$$d\mu = dr \, d\theta.$$
事实上,那个空间的测量结果是
We shall use these measures to choose fair random lines in a later section.
我们将在后面的部分使用这些措施来选择公平的随机线。
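Because the fair measure is just area in normal coordinates, the earlier question about the average length of a line through [0,1]² can be approximated by laying a uniform grid over (r, θ) and averaging the chord length over the grid cells whose lines hit the square. The sketch below does this; the function names, the grid resolution, and the range chosen for r are assumptions. A classical result of integral geometry gives π times area divided by perimeter for a convex region, which is π/4 for the unit square, and the estimate should land near that value.

```python
import math


def chord_length(r, theta):
    """Length of the line x cos(theta) + y sin(theta) = r inside [0,1]^2."""
    c, s = math.cos(theta), math.sin(theta)
    px, py = r * c, r * s      # point on the line closest to the origin
    dx, dy = -s, c             # unit direction along the line
    t0, t1 = -math.inf, math.inf
    for p, d in ((px, dx), (py, dy)):
        if abs(d) < 1e-12:
            # Line is parallel to this pair of sides: inside the slab or not.
            if p < 0.0 or p > 1.0:
                return 0.0
        else:
            ta, tb = (0.0 - p) / d, (1.0 - p) / d
            t0 = max(t0, min(ta, tb))
            t1 = min(t1, max(ta, tb))
    return max(0.0, t1 - t0)


def mean_chord(n_theta=360, n_r=600, r_max=1.5):
    """Average chord length with respect to the measure dr dtheta."""
    total, hits = 0.0, 0
    for i in range(n_theta):
        theta = (i + 0.5) * math.pi / n_theta          # theta in [0, pi)
        for j in range(n_r):
            r = -r_max + (j + 0.5) * 2.0 * r_max / n_r  # signed distance
            length = chord_length(r, theta)
            if length > 0.0:
                total += length
                hits += 1
    return total / hits


estimate = mean_chord()   # should be close to pi / 4
```

Using a signed r with θ restricted to [0, π) represents every undirected line exactly once, so no line is counted twice.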
In 3D, there are many ways to parameterize lines. Perhaps, the simplest way is to use their intersection with a particular plane along with some specification of their orientation. For example, we could chart the intersection with the xy plane along with the spherical coordinates of its orientation. Thus, each line would be specified as a (x,y,θ,ϕ) quadruple. This shows that lines in 3D are 4D entities; i.e., they can be described as points in a 4D space.
在三维空间中,有许多方法可以参数化线。也许最简单的方法是使用它们与特定平面的交点以及它们方向的某些指定。例如,我们可以绘制与 xy 平面的交点以及其方向的球面坐标。因此,每条线将被指定为 (x,y,θ, ϕ ) 四元组。这表明三维空间中的线是 4D 实体;即,它们可以描述为 4D 空间中的点。
The differential measure of a line should not vary with (x,y), but bundles of lines with equal cross section should have equal measure. Thus, a fair differential measure is
直线的微分测度不应随( x,y )变化,但具有相等横截面积的线束应具有相等的测度。因此,公平的微分测度是
Another way to parameterize lines is to chart the intersection with two parallel planes. For example, if the line intersects the plane z = 0 at (x = u,y = v) and the plane z = 1 at (x = s,y = t), then the line can be described by the quadruple (u,v,s,t). Note that like the previous parameterization, this one is degenerate for lines parallel to the xy plane. The differential measure is more complicated for this parameterization although it can be approximated as
参数化直线的另一种方法是绘制与两个平行平面的交点。例如,如果直线在 ( x = u,y = v ) 处与平面z = 0 相交,在 ( x = s,y = t ) 处与平面z = 1 相交,则该直线可以用四元组 ( u,v,s,t ) 来描述。请注意,与上一个参数化一样,对于与 xy 平面平行的直线,此参数化是退化的。此参数化的微分测度更为复杂,尽管可以将其近似为
for bundles of lines nearly parallel to the z-axis. This is the measure often implicitly used in image-based rendering.
表示与 z 轴几乎平行的线束。这是基于图像的渲染中经常隐式使用的度量。
For sets of lines that intersect a sphere, we can use the parameterization of the two points where the line intersects the sphere. If these are in spherical coordinates, then the line can be described by the quadruple (θ1,ϕ1,θ2,ϕ2) and the measure is just the differential area associated with each point:
$$d\mu = \sin\theta_1 \, d\theta_1 \, d\phi_1 \, \sin\theta_2 \, d\theta_2 \, d\phi_2.$$
对于与球面相交的线集,我们可以使用线与球面相交的两个点的参数化。如果这些是在球面坐标中,那么该点可以用四元组 ( θ 1 , ϕ 1 ,θ 2 , ϕ 2 ) 来描述,并且度量只是与每个点相关的微分面积:
This implies that picking two uniform random endpoints on the sphere results in a line with uniform density. This observation was used to compute form-factors by Mateu Sbert in his dissertation (Sbert, 1997).
这意味着在球体上随机选取两个均匀端点将产生一条密度均匀的线。Mateu Sbert 在其论文(Sbert,1997)中利用这一观察结果计算了形状因子。
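The two-endpoint construction is easy to try numerically. The sketch below picks pairs of uniformly distributed points on the unit sphere and averages the length of the chord between them; the sampling routine (uniform z plus uniform azimuth), the seed, and the sample count are our assumptions. The mean distance between two uniform points on a unit sphere is known to be 4/3, which matches the mean chord length 4V/S of a unit ball under the fair line measure.

```python
import math
import random


def uniform_sphere_point(rng):
    """Uniform point on the unit sphere: uniform height z, uniform azimuth."""
    z = 1.0 - 2.0 * rng.random()
    phi = 2.0 * math.pi * rng.random()
    s = math.sqrt(max(0.0, 1.0 - z * z))
    return (s * math.cos(phi), s * math.sin(phi), z)


def random_chord(rng):
    """A random line through the sphere, given by its two endpoints."""
    return uniform_sphere_point(rng), uniform_sphere_point(rng)


rng = random.Random(1)   # fixed seed so the run is repeatable
n = 100_000
mean_length = sum(math.dist(*random_chord(rng)) for _ in range(n)) / n
```

With this many samples, `mean_length` should agree with 4/3 to about two decimal places.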
Note that sometimes we want to parameterize directed lines, and sometimes we want the order of the endpoints not to matter. This is a bookkeeping detail that is especially important for rendering applications where the amount of light flowing along a line is different in the two directions along the line.
请注意,有时我们想要参数化有向线,有时我们不想让端点的顺序影响我们。这是一个记账细节,对于渲染应用来说尤其重要,因为在渲染应用中,沿线流动的光量在沿线的两个方向上是不同的。
Many graphics algorithms use probability to construct random samples to solve integration and averaging problems. This is the domain of applied continuous probability which has basic connections to measure theory.
许多图形算法利用概率构造随机样本来解决积分和平均问题。这是应用连续概率的领域,与测度论有基本的联系。
Loosely speaking, a continuous random variable x is a scalar or vector quantity that “randomly” takes on some value from the real line ℝ = (-∞,+∞). The behavior of x is entirely described by the distribution of values it takes. This distribution of values can be quantitatively described by the probability density function (pdf), p, associated with x (the relationship is denoted x ~ p). The probability that x assumes a value in some interval [a,b] is given by the following integral:
$$\text{Probability}(x \in [a,b]) = \int_a^b p(x) \, dx.$$
粗略地说,连续随机变量x 是“随机”从实数线 ℝ = (-∞,+∞) 中取某个值的标量或矢量。x 的行为完全由其取值的分布描述。该值的分布可以通过与 x 相关的概率密度函数(pdf) p定量描述(该关系表示为x ~ p )。x 在某个区间 [a,b] 中取特定值的概率由以下积分给出:
Loosely speaking, the probability density function p describes the relative likelihood of a random variable taking a certain value; if p(x1) = 6.0 and p(x2) = 3.0, then a random variable with density p is twice as likely to have a value “near” x1 as it is to have a value near x2. The density p has two characteristics:
$$p(x) \ge 0 \quad \text{(probability is nonnegative)},$$
$$\int_{-\infty}^{+\infty} p(x) \, dx = 1 \quad (\text{Probability}(x \in \mathbb{R}) = 1).$$
粗略地说,概率密度函数 p 描述的是随机变量取某个值的相对可能性;如果p ( x 1 ) = 6.0 且p ( x 2 ) = 3.0,则密度为p的随机变量取“接近” x 1 的值的可能性是取“接近” x 2 的值的可能性的两倍。密度 p 具有两个特点:
As an example, the canonical random variable ξ takes on values between zero (inclusive) and one (non-inclusive) with uniform probability (here uniform simply means each value for ξ is equally likely). This implies that the probability density function q for ξ is
$$q(\xi) = \begin{cases} 1 & \text{if } 0 \le \xi < 1, \\ 0 & \text{otherwise}. \end{cases}$$
例如,正则随机变量 ξ 以均匀概率取 0(包括)和 1(不包括)之间的值(此处均匀仅表示 ξ 的每个值都具有同等可能性)。这意味着 ξ 的概率密度函数 q 为
The space over which ξ is defined is simply the interval [0,1). The probability that ξ takes on a value in a certain interval [a,b] ⊂ [0,1) is
$$\text{Probability}(a \le \xi \le b) = \int_a^b 1 \, dx = b - a.$$
定义 ξ 的空间就是区间 [0,1)。ξ 在某个区间 [ a,b ] ∈ [0,1) 内取值的概率为
The average value that a real function f of a one-dimensional random variable with underlying pdf p will take on is called its expected value, E(f(x)) (sometimes written Ef(x)):
$$E(f(x)) = \int f(x) \, p(x) \, dx.$$
具有基础概率密度函数p的一维随机变量的实函数f的平均值称为其期望值, E(f(x)) (有时写为Ef(x)) :
The expected value of a one-dimensional random variable can be calculated by setting f(x) = x. The expected value has a surprising and useful property: the expected value of the sum of two random variables is the sum of the expected values of those variables:
$$E(x + y) = E(x) + E(y),$$
一维随机变量的期望值可以通过设置f(x) = x来计算。期望值具有令人惊讶且有用的特性:两个随机变量之和的期望值是这些变量的期望值之和:
for random variables x and y. Because functions of random variables are themselves random variables, this linearity of expectation applies to them as well:
$$E(f(x) + g(y)) = E(f(x)) + E(g(y)).$$
对于随机变量x和y 。由于随机变量的函数本身也是随机变量,因此期望的线性也适用于它们:
An obvious question to ask is whether this property holds if the random variables being summed are correlated (loosely, variables whose values do not influence one another are called independent). In fact, this linearity property holds whether or not the variables are independent! This summation property is vital for most Monte Carlo applications.
一个显而易见的问题是,如果被求和的随机变量是相关的(不相关的变量称为独立变量),该属性是否成立。事实上,无论变量是否独立,该线性属性都成立!该求和属性对于大多数蒙特卡罗应用至关重要。
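A quick numerical experiment illustrates that linearity does not need independence. In the sketch below, x is a canonical uniform variable and y = x² is completely determined by x, yet the sample mean of x + y still equals the sum of the sample means; by contrast, the mean of the product xy is not the product of the means. The seed and sample count are arbitrary choices.

```python
import random


def mean(values):
    return sum(values) / len(values)


rng = random.Random(7)   # fixed seed so the run is repeatable
n = 200_000
xs = [rng.random() for _ in range(n)]
ys = [x * x for x in xs]   # y depends entirely on x

e_x = mean(xs)                                  # near 1/2
e_y = mean(ys)                                  # near 1/3
e_sum = mean([x + y for x, y in zip(xs, ys)])   # near 1/2 + 1/3 = 5/6
e_prod = mean([x * y for x, y in zip(xs, ys)])  # near 1/4, not 1/2 * 1/3
```

Here `e_sum` matches `e_x + e_y` up to floating-point rounding, while `e_prod` differs clearly from `e_x * e_y`.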
The discussion of random variables and their expected values extends naturally to multidimensional spaces. Most graphics problems will be in such higher-dimensional spaces. For example, many lighting problems are phrased on the surface of the hemisphere. Fortunately, if we define a measure μ on the space the random variables occupy, everything is very similar to the one-dimensional case. Suppose the space S has associated measure μ; for example, S is the surface of a sphere and μ measures area. We can define a pdf p : S → ℝ, and if x is a random variable with x ~ p, then the probability that x will take on a value in some region Si ⊂ S is given by the integral
$$\text{Probability}(x \in S_i) = \int_{S_i} p(x) \, d\mu.$$
对随机变量及其期望值的讨论自然延伸到多维空间。大多数图形问题都将在这种高维空间中。例如,许多照明问题都是在半球表面上提出的。幸运的是,如果我们在随机变量占据的空间上定义一个测度μ ,那么一切都与一维情况非常相似。假设空间 S 具有相关测度μ ;例如, S是球体的表面, μ测量面积。我们可以定义一个 pdf p : S ↦ℝ,如果x是一个随机变量, x ~ p ,那么 x 在某个区域 S ⊂ S 中取值的概率由积分给出
Here, Probability (event) is the probability that event is true, so the integral is the probability that x takes on a value in the region Si.
这里,概率(事件)是事件为真的概率,因此积分是x在区域 S 中取值的概率。
In graphics, S is often an area (dμ = dA = dx dy) or a set of directions (points on a unit sphere: dμ = dω = sin θ dθ dϕ). As an example, let α be a two-dimensional random variable that is uniformly distributed on a disk of radius R. Here, uniformly means uniform with respect to area, e.g., the way a bad dart player’s hits would be distributed on a dart board. Since it is uniform, we know that p(α) is some constant. From the fact that the area of the disk is πR² and that the total probability is one, we can deduce that
在图形学中,S 通常是一个面积(dμ = dA = dx dy)或一组方向(单位球面上的点:dμ = dω = sin θ dθ dϕ)。例如,设 α 是半径为 R 的圆盘上均匀分布的二维随机变量。这里,均匀是指相对于面积均匀,例如,一个糟糕的飞镖选手的命中点在飞镖盘上的分布方式。由于它是均匀的,我们知道 p(α) 是某个常数。根据圆盘面积为 πR² 且总概率为 1 的事实,我们可以推断出
$$p(\alpha) = \frac{1}{\pi R^2}$$
This means that the probability that α is in a certain subset S1 of the disk is just
这意味着 α 在圆盘的某个子集 S1 中的概率只是
$$\operatorname{Probability}(\alpha \in S_1) = \int_{S_1} \frac{1}{\pi R^2}\,dA$$
This is all very abstract. To actually use this information, we need the integral in a form we can evaluate. Suppose S1 is the portion of the disk closer to the center than to the perimeter. If we convert to polar coordinates, then α is represented as an (r,ϕ) pair, and S1 is the region where r < R∕2. Note that just because α is uniform, it does not follow that ϕ or r is necessarily uniform (in fact, ϕ is uniform, and r is not). The differential area dA is just r dr dϕ. Thus,
这一切都非常抽象。要实际使用这些信息,我们需要以可求值的形式来表示积分。假设 S1 是圆盘中距离圆心比距离圆周更近的部分。如果我们转换为极坐标,则 α 表示为 (r, ϕ) 对,而 S1 是 r < R∕2 的区域。请注意,仅仅因为 α 是均匀的,并不意味着 ϕ 或 r 一定是均匀的(事实上,ϕ 是均匀的,而 r 不是均匀的)。微分面积 dA 就是 r dr dϕ。因此,
$$\operatorname{Probability}\left(r < \frac{R}{2}\right) = \int_0^{2\pi} \int_0^{R/2} \frac{1}{\pi R^2}\, r\,dr\,d\phi = 0.25$$
The formula for expected value of a real function applies to the multidimensional case:
实函数期望值的公式适用于多维情况:
$$E(f(x)) = \int_S f(x)\,p(x)\,d\mu$$
where x ∈ S, f : S↦ℝ, and p : S↦ℝ. For example, on the unit square S = [0,1] × [0,1] with p(x,y) = 4xy, the expected value of the x coordinate for (x,y) ~ p is
其中 x ∈ S,f : S↦ℝ,且 p : S↦ℝ。例如,在单位正方形 S = [0,1] × [0,1] 上取 p(x,y) = 4xy,则 (x,y) ~ p 的 x 坐标的期望值为
$$E(x) = \int_S f(x, y)\,p(x, y)\,dA = \int_0^1 \int_0^1 4x^2 y\,dx\,dy = \frac{2}{3}$$
Note that here f(x,y) = x.
注意这里 f( x,y )= x 。
The variance, V(x), of a one-dimensional random variable is, by definition, the expected value of the square of the difference between x and E(x):
根据定义,一维随机变量的方差 V(x)是x与E(x)之差的平方的期望值:
$$V(x) \equiv E\left([x - E(x)]^2\right)$$
Some algebraic manipulation gives the non-obvious expression:
一些代数运算给出了非显而易见的表达式:
$$V(x) = E(x^2) - [E(x)]^2$$
The expression E([x − E(x)]²) is more useful for thinking intuitively about variance, while the algebraically equivalent expression E(x²) − [E(x)]² is usually more convenient for calculations. The variance of a sum of random variables is the sum of the variances if the variables are independent. This summation property of variance is one of the reasons it is frequently used in analysis of probabilistic models. The square root of the variance is called the standard deviation, σ, which gives some indication of the expected absolute deviation from the expected value.
表达式 E([x − E(x)]²) 更适合直观地思考方差,而代数等价表达式 E(x²) − [E(x)]² 通常更便于计算。如果变量是独立的,则随机变量之和的方差是方差之和。方差的这种求和性质是它经常用于概率模型分析的原因之一。方差的平方根称为标准差 σ,它给出了与期望值的预期绝对偏差的某种指示。
Many problems involve sums of independent random variables xi, where the variables share a common density p. Such variables are said to be independent identically distributed (iid) random variables. When the sum is divided by the number of variables, we get an estimate of E(x):
许多问题涉及独立随机变量 xi 的和,其中变量共享一个共同的密度 p。这样的变量被称为独立同分布 (iid) 随机变量。当和除以变量数量时,我们得到 E(x) 的估计值:
$$E(x) \approx \frac{1}{N} \sum_{i=1}^{N} x_i$$
As N increases, the variance of this estimate decreases. We want N to be large enough so that we have confidence that the estimate is “close enough.” However, there are no sure things in Monte Carlo; we just gain statistical confidence that our estimate is good. To be sure, we would have to have N = ∞. This confidence is expressed by the Law of Large Numbers:
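随着 N 的增加,该估计值的方差会减小。我们希望 N 足够大,这样我们才能确信估计值“足够接近”。然而,蒙特卡洛中没有确定的事情;我们只是从统计上确信我们的估计值是好的。要完全确定,我们必须让 N = ∞。这种信心由大数定律表达:
$$\operatorname{Probability}\left[E(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i\right] = 1$$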
In this section, the basic Monte Carlo solution methods for definite integrals are outlined. These techniques are then straightforwardly applied to certain integral problems. All of the basic material of this section is also covered in several of the classic Monte Carlo texts. (See the “Notes” section at the end of this chapter.)
本节概述了定积分的基本蒙特卡罗求解方法。然后,这些技术直接应用于某些积分问题。本节的所有基本材料也在几本经典的蒙特卡罗教材中有所介绍。(请参阅本章末尾的“注释”部分。)
As discussed earlier, given a function f : S↦ℝ and a random variable x ~ p, we can approximate the expected value of f(x) by a sum:
如前所述,给定一个函数 f : S↦ℝ 和一个随机变量 x ~ p,我们可以通过以下总和来近似 f(x) 的期望值:
$$E(f(x)) = \int_{x \in S} f(x)\,p(x)\,d\mu \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i)$$
Because the expected value can be expressed as an integral, the integral is also approximated by the sum. The form of Equation (13.4) is a bit awkward; we would usually like to approximate an integral of a single function g rather than a product fp. We can accomplish this by substituting g = fp as the integrand:
因为期望值可以表示为积分,所以积分也可以用和来近似。方程 (13.4) 的形式有点别扭;我们通常希望近似单个函数 g 的积分,而不是乘积 fp。我们可以通过将 g = fp 代入被积函数来实现这一点:
$$\int_{x \in S} g(x)\,d\mu \approx \frac{1}{N} \sum_{i=1}^{N} \frac{g(x_i)}{p(x_i)}$$
For this formula to be valid, p must be positive when g is nonzero.
为了使该公式有效,当g非零时, p必须为正。
So to get a good estimate, we want as many samples as possible, and we want g∕p to have a low variance (g and p should have a similar shape). Choosing p intelligently is called importance sampling, because if p is large where g is large, there will be more samples in important regions. Equation (13.4) also shows the fundamental problem with Monte Carlo integration: diminishing return. Because the variance of the estimate is proportional to 1/N, the standard deviation is proportional to 1/√N. Since the error in the estimate behaves similarly to the standard deviation, we will need to quadruple N to halve the error.
因此,为了得到一个好的估计,我们需要尽可能多的样本,并且我们希望 g∕p 具有较低的方差(g 和 p 应该具有相似的形状)。明智地选择 p 称为重要性抽样,因为如果 p 在 g 较大的地方也较大,则重要区域中的样本会更多。等式 (13.4) 还显示了蒙特卡洛积分的基本问题:收益递减。由于估计的方差与 1/N 成正比,因此标准差与 1/√N 成正比。由于估计中的误差表现与标准差类似,我们需要将 N 增加到四倍才能将误差减半。
Another way to reduce variance is to partition S, the domain of the integral, into several smaller domains Si, and evaluate the integral as a sum of integrals over the Si. This is called stratified sampling, the technique that jittering employs in pixel sampling (Chapter 4). Normally, only one sample is taken in each Si (with density pi), and in this case, the variance of the estimate is
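减少方差的另一种方法是将积分域 S 划分为几个较小的域 Si,并将积分作为各 Si 上的积分之和来求值。这称为分层抽样,即抖动在像素抽样中采用的技术(第 4 章)。通常,每个 Si(密度为 pi)只取一个样本,在这种情况下,估计值的方差为
$$\operatorname{var}\left(\sum_{i=1}^{N} \frac{g(x_i)}{p_i(x_i)}\right) = \sum_{i=1}^{N} \operatorname{var}\left(\frac{g(x_i)}{p_i(x_i)}\right)$$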
It can be shown that the variance of stratified sampling is never higher than unstratified if all strata have equal measure:
可以证明,如果所有层都有相同的测度,则分层抽样的方差永远不会高于非分层抽样的方差:
$$\int_{S_i} p(x)\,d\mu = \frac{1}{N} \int_S p(x)\,d\mu$$
The most common example of stratified sampling in graphics is jittering for pixel sampling.
图形中分层采样最常见的例子是像素采样的抖动。
As an example of the Monte Carlo solution of an integral I, set g(x) equal to x over the interval (0, 4):
作为积分 I 的蒙特卡洛解的一个例子,设 g(x) 等于区间 (0, 4) 上的 x:
$$I = \int_0^4 x\,dx = 8$$
The impact of the shape of the function p on the variance of the N sample estimates is shown in Table 13.1. Note that the variance is reduced when the shape of p is similar to the shape of g. The variance drops to zero if p = g∕I, but I is not usually known or we would not have to resort to Monte Carlo. One important principle illustrated in Table 13.1 is that stratified sampling is often far superior to importance sampling (Mitchell, 1996). Although the variance for this stratification on I is inversely proportional to the cube of the number of samples, there is no general result for the behavior of variance under stratification. There are some functions for which stratification does no good. One example is a white noise function, where the variance is constant for all regions. On the other hand, most functions will benefit from stratified sampling, because the variance in each subcell will usually be smaller than the variance of the entire domain.
表 13.1显示了函数p的形状对 N 个样本估计值的方差的影响。请注意,当p的形状与 g 的形状相似时,方差会减小。如果p = g∕I ,则方差降至零,但I通常未知,否则我们就不必求助于蒙特卡罗了。表 13.1说明的一个重要原则是,分层抽样通常远远优于重要性抽样(Mitchell,1996)。虽然这种分层对 I 的方差与样本数量的立方成反比,但是对于分层下的方差行为并没有普遍的结果。有些函数不适合分层。一个例子是白噪声函数,其中的方差对于所有区域都是恒定的。另一方面,大多数函数都将受益于分层抽样,因为每个子单元的方差通常小于整个域的方差。
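As a rough numerical check of the behavior described above, the following Python sketch estimates I = ∫₀⁴ x dx = 8 in three ways. The function names, the seeded `random.Random` generator, and the particular importance density p(x) = (x + 2)/16 are illustrative assumptions; they are not taken from Table 13.1.

```python
import random
import statistics

def g(x):
    # Integrand of the example: g(x) = x on (0, 4), so I = 8.
    return x

def estimate_uniform(n, rng):
    # p(x) = 1/4 on (0, 4); each term is g(x) / p(x).
    return sum(g(4.0 * rng.random()) / 0.25 for _ in range(n)) / n

def estimate_importance(n, rng):
    # Assumed density p(x) = (x + 2) / 16, sampled by inverting its
    # cumulative distribution P(x) = (x^2 + 4x) / 32.
    total = 0.0
    for _ in range(n):
        x = -2.0 + (4.0 + 32.0 * rng.random()) ** 0.5
        total += g(x) / ((x + 2.0) / 16.0)
    return total / n

def estimate_stratified(n, rng):
    # One uniform sample in each of n equal-width strata; within a
    # stratum the density is 1 / width, so g / p = g * width.
    width = 4.0 / n
    return sum(g((i + rng.random()) * width) * width for i in range(n))

def spread(estimator, n, trials, seed=1):
    # Empirical variance of repeated n-sample estimates.
    rng = random.Random(seed)
    return statistics.pvariance([estimator(n, rng) for _ in range(trials)])
```

Comparing `spread(estimate_uniform, 16, 200)`, `spread(estimate_importance, 16, 200)`, and `spread(estimate_stratified, 16, 200)` should show the stratified estimator with by far the smallest variance for this integrand, in line with the inverse-cube behavior mentioned above.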
A popular method for quadrature is to replace the random points in Monte Carlo integration with quasi-random points. Such points are deterministic, but are in some sense uniform. For example, on the unit square [0,1]², a set of N quasi-random points should have the following property on a region of area A within the square:
一种常用的求积方法是用拟随机点代替蒙特卡洛积分中的随机点。这些点是确定性的,但在某种意义上是均匀的。例如,在单位正方形 [0,1]² 上,一组 N 个拟随机点在正方形内面积为 A 的区域上应具有以下性质:
$$\frac{\text{number of points in the region}}{N} \approx A$$
For example, a set of regular samples in a lattice has this property.
例如,格中的一组规则样本具有此属性。
Quasi-random points can improve performance in many integration applications. Sometimes, care must be taken to make sure that they do not introduce aliasing. It is especially nice that, in any application where calls are made to random or stratified points in [0,1]^d, one can substitute d-dimensional quasi-random points with no other changes.
拟随机点可以提高许多积分应用程序的性能。有时,必须小心确保它们不会引入混叠。特别好的是,在任何调用 [0,1]^d 中的随机或分层点的应用程序中,都可以用 d 维拟随机点替换而无需进行其他更改。
The key intuition motivating quasi–Monte Carlo integration is that when estimating the average value of an integrand, any set of sample points will do, provided they are “fair.”
激发准蒙特卡洛积分的关键直觉是,当估计被积函数的平均值时,任何一组样本点都可以,只要它们是“公平的”。
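The text does not name a particular quasi-random construction. As one possible choice, the sketch below generates points of the Halton sequence (radical inverses in bases 2 and 3); the function names are hypothetical, and the Halton sequence is an assumption made here for illustration only.

```python
def radical_inverse(index, base):
    # Mirror the base-`base` digits of `index` about the radix point,
    # e.g. index 3 = 11 (base 2) becomes 0.11 (base 2) = 0.75.
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton_2d(n):
    # Points 1..n of the two-dimensional Halton sequence on [0, 1)^2.
    return [(radical_inverse(i, 2), radical_inverse(i, 3))
            for i in range(1, n + 1)]
```

Counting how many of the points returned by `halton_2d(n)` fall in a region of area A and dividing by n should give a value close to A, which is the fairness property described above.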
We often want to generate sets of random or pseudorandom points on the unit square for applications such as distribution ray tracing. There are several methods for doing this, e.g., jittering. These methods give us a set of N reasonably equidistributed points on the unit square [0,1]²: (u1,v1) through (uN,vN).
我们经常想在单位正方形上生成一组随机或伪随机点,用于诸如分布射线追踪之类的应用。有几种方法可以做到这一点,例如抖动。这些方法为我们提供了一组 N 个在单位正方形 [0,1]² 上合理均匀分布的点:(u1,v1) 至 (uN,vN)。
Sometimes, our sampling space may not be square (e.g., a circular lens) or may not be uniform (e.g., a filter function centered on a pixel). It would be nice if we could write a mathematical transformation that would take our equidistributed points (ui,vi) as input and output a set of points in our desired sampling space with our desired density. For example, to sample a camera lens, the transformation would take (ui,vi) and output (ri,ϕi) such that the new points are approximately equidistributed on the disk of the lens. While we might be tempted to use the transform
有时,我们的采样空间可能不是正方形(例如,圆形镜头)或可能不是均匀的(例如,以像素为中心的滤波函数)。如果我们可以编写一个数学变换,将均匀分布的点 (ui,vi) 作为输入,并输出我们所需采样空间中具有所需密度的一组点,那就太好了。例如,要对相机镜头进行采样,变换将采用 (ui,vi) 并输出 (ri,ϕi),以使新点在镜头圆盘上大致均匀分布。虽然我们可能倾向于使用变换
$$\phi_i = 2\pi u_i, \qquad r_i = v_i R,$$
it has a serious problem. While the points do cover the lens, they do so nonuniformly (Figure 13.6). What we need in this case is a transformation that takes equal-area regions to equal-area regions—one that takes uniform sampling distributions on the square to uniform distributions on the new domain.
它有一个严重的问题。虽然这些点确实覆盖了镜头,但它们的覆盖范围并不均匀(图 13.6 )。在这种情况下,我们需要的是一个将等面积区域转换为等面积区域的变换——将正方形上的均匀采样分布转换为新域上的均匀分布。
Figure 13.6. The transform that takes the horizontal and vertical dimensions uniformly to (r,ϕ) does not preserve relative area; not all of the resulting areas are the same.
图 13.6.将水平和垂直维度统一到 ( r , ϕ ) 的变换不保留相对面积;并非所有得到的面积都是相同的。
There are several ways to generate such nonuniform points or uniform points on non-rectangular domains, and the following sections review the three most often used: function inversion, rejection, and Metropolis.
有几种方法可以在非矩形域上生成这种非均匀点或均匀点,以下章节回顾了最常用的三种方法:函数反转、拒绝和 Metropolis。
If the density f(x) is one-dimensional and defined over the interval x ∈ [xmin,xmax], then we can generate random numbers αi that have density f from a set of uniform random numbers ξi, where ξi ∈ [0,1]. To do this, we need the cumulative probability distribution function P(x):
如果密度 f(x) 是一维的,且定义在区间 x ∈ [xmin,xmax] 上,那么我们可以从一组均匀随机数 ξi 中生成密度为 f 的随机数 αi,其中 ξi ∈ [0,1]。为此,我们需要累积概率分布函数 P(x):
$$P(x) = \operatorname{Probability}(\alpha < x) = \int_{x_{\min}}^{x} f(x')\,d\mu$$
To get αi, we simply transform ξi:
为了得到 αi,我们只需变换 ξi:
$$\alpha_i = P^{-1}(\xi_i)$$
where P⁻¹ is the inverse of P. If P is not analytically invertible, then numerical methods will suffice, because an inverse exists for all valid probability distribution functions.
其中 P⁻¹ 是 P 的逆。如果 P 不是解析可逆的,那么数值方法就足够了,因为所有有效的概率分布函数都存在逆。
Note that analytically inverting a function is more confusing than it should be due to notation. For example, if we have the function
请注意,由于符号的原因,解析地求函数的逆比它应该有的更令人困惑。例如,如果我们有函数
$$y = x^2$$
for x > 0, then the inverse function expresses x as a function of y:
对于 x > 0,则反函数把 x 表示为 y 的函数:
$$x = \sqrt{y}$$
When the function is analytically invertible, it is almost always that simple. However, things are a little more opaque with the standard notation:
当函数解析可逆时,它几乎总是那么简单。然而,使用标准符号时,事情会变得有点不透明:
$$f(x) = x^2, \qquad f^{-1}(x) = \sqrt{x}$$
Here, x is just a dummy variable. You may find it easier to use the less standard notation:
这里, x只是一个虚拟变量。你可能会发现使用不太标准的符号更容易:
$$y = x^2, \qquad x = \sqrt{y},$$
while keeping in mind that these are inverse functions of each other.
同时要记住,它们是互为反函数。
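For example, to choose random points xi that have density
例如,要选择具有如下密度的随机点 xi
$$p(x) = \frac{3x^2}{2}$$
on [-1,1], we see that
在 [-1,1] 上,我们看到
$$P(x) = \frac{x^3 + 1}{2}$$
and
和
$$P^{-1}(x) = \sqrt[3]{2x - 1},$$
so we can “warp” a set of canonical random numbers (ξ1,…,ξN) to the properly distributed numbers
因此我们可以将一组规范随机数 (ξ1,…,ξN) “扭曲”为适当分布的数字
$$(x_1, \ldots, x_N) = \left(\sqrt[3]{2\xi_1 - 1}, \ldots, \sqrt[3]{2\xi_N - 1}\right).$$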
Of course, this same warping function can be used to transform “uniform” jittered samples into nicely distributed samples with the desired density.
当然,同样的扭曲函数可用于将“均匀”抖动样本转换为具有所需密度的分布良好的样本。
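The warping idea can be sketched in a few lines of Python. This assumes the example density is 3x²/2 on [-1,1], whose cumulative distribution (x³ + 1)/2 inverts to a cube root; the names `warp` and `jittered` are hypothetical helpers introduced here for illustration.

```python
import math
import random

def warp(xi):
    # Inverse of P(x) = (x^3 + 1) / 2, i.e. the real cube root of 2*xi - 1.
    # copysign keeps the root real for negative arguments.
    t = 2.0 * xi - 1.0
    return math.copysign(abs(t) ** (1.0 / 3.0), t)

def jittered(n, rng):
    # One canonical sample in each of n equal strata of [0, 1).
    return [(i + rng.random()) / n for i in range(n)]

def warped_samples(n, seed=0):
    # Jittered canonical numbers pushed through the inverse CDF.
    rng = random.Random(seed)
    return [warp(xi) for xi in jittered(n, rng)]
```

Because `warp` is monotonic, the stratification of the jittered inputs carries over to the warped outputs, which is the property the paragraph above points out.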
If we have a random variable α = (αx,αy) with two-dimensional density f(x,y) defined on [xmin,xmax] × [ymin,ymax], then we need the two-dimensional distribution function:
如果我们有一个随机变量 α = (αx,αy),其二维密度 f(x,y) 定义在 [xmin,xmax] × [ymin,ymax] 上,那么我们需要二维分布函数:
$$F(x, y) = \operatorname{Probability}(\alpha_x < x \text{ and } \alpha_y < y) = \int_{y_{\min}}^{y} \int_{x_{\min}}^{x} f(x', y')\,d\mu(x', y')$$
We first choose an xi using the marginal distribution F(x,ymax) and then choose yi according to F(xi,y)∕F(xi,ymax). If f(x,y) is separable (expressible as g(x)h(y)), then the one-dimensional techniques can be used on each dimension.
我们首先使用边际分布F(x , y max ) 选择一个x ,然后根据F ( x ,y)∕F( x , y max ) 选择y 。如果 f( x,y ) 是可分离的(可表示为g(x)h(y) ),则可以在每个维度上使用一维技术。
Returning to our earlier example, suppose we are sampling uniformly from the disk of radius R, so p(r,ϕ) = 1∕(πR²). The two-dimensional distribution function is
回到我们之前的例子,假设我们从半径为 R 的圆盘上均匀采样,因此 p(r,ϕ) = 1∕(πR²)。二维分布函数为
$$F(r, \phi) = \int_0^{\phi} \int_0^{r} \frac{r'\,dr'\,d\phi'}{\pi R^2} = \frac{\phi r^2}{2\pi R^2}$$
This means that a canonical pair (ξ1,ξ2) can be transformed to a uniform random point on the disk:
这意味着规范随机数对 (ξ1,ξ2) 可以变换为圆盘上的均匀随机点:
$$\phi = 2\pi\xi_1, \qquad r = R\sqrt{\xi_2}$$
This mapping is shown in Figure 13.7.
该映射如图 13.7所示。
Figure 13.7. A mapping that takes equal area regions in the unit square to equal area regions in the disk.
图 13.7.将单位正方形中的等面积区域映射到磁盘中的等面积区域。
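To make the difference between the equal-area mapping and the tempting linear mapping concrete, here is a small Python sketch. It assumes the equal-area mapping is ϕ = 2πξ1, r = R√ξ2 and the naive one is r = Rξ2; the function names are hypothetical.

```python
import math
import random

def sample_disk(xi1, xi2, radius=1.0):
    # Equal-area mapping from the unit square to the disk.
    phi = 2.0 * math.pi * xi1
    r = radius * math.sqrt(xi2)
    return r * math.cos(phi), r * math.sin(phi)

def sample_disk_naive(xi1, xi2, radius=1.0):
    # Linear radius: covers the disk but crowds points toward the center.
    phi = 2.0 * math.pi * xi1
    r = radius * xi2
    return r * math.cos(phi), r * math.sin(phi)

def inner_fraction(sampler, n, radius=1.0, seed=0):
    # Fraction of samples with r < R/2; uniform sampling should give 1/4.
    rng = random.Random(seed)
    pts = (sampler(rng.random(), rng.random(), radius) for _ in range(n))
    return sum(math.hypot(x, y) < radius / 2.0 for x, y in pts) / n
```

With enough samples, `inner_fraction(sample_disk, n)` should approach 0.25, matching the probability computed earlier for r < R∕2, while `inner_fraction(sample_disk_naive, n)` approaches 0.5.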
To choose reflected ray directions for some realistic rendering applications, we choose points on the unit hemisphere according to the density:
为了为某些真实感渲染应用选择反射射线方向,我们根据如下密度选择单位半球上的点:
$$p(\theta, \phi) = \frac{n + 1}{2\pi} \cos^n\theta,$$
where n is a Phong-like exponent, θ is the angle from the surface normal with θ ∈ [0,π∕2] (i.e., the direction is on the upper hemisphere), and ϕ is the azimuthal angle (ϕ ∈ [0,2π]). The cumulative distribution function is
其中 n 是 Phong 型指数,θ 是与表面法线的夹角,θ ∈ [0,π∕2](即方向位于上半球),ϕ 是方位角(ϕ ∈ [0,2π])。累积分布函数为
$$P(\theta, \phi) = \int_0^{\phi} \int_0^{\theta} p(\theta', \phi') \sin\theta'\,d\theta'\,d\phi'$$
The sin θ′ term arises because, on the sphere, dω = sin θ dθ dϕ. When the marginal densities are found, p (as expected) is separable, and we find that a (ξ1,ξ2) pair of canonical random numbers can be transformed to a direction by
sin θ′ 项的出现是因为,在球面上,dω = sin θ dθ dϕ。当找到边缘密度时,p(如预期)是可分离的,我们发现一对规范随机数 (ξ1,ξ2) 可以通过以下方式转换为一个方向
$$(\theta, \phi) = \left(\arccos\left((1 - \xi_1)^{\frac{1}{n+1}}\right),\; 2\pi\xi_2\right)$$
Again, a nice thing about this is that a set of jittered points on the unit square can be easily transformed to a set of jittered points on the hemisphere with the desired distribution. Note that if n is set to 1, we have a diffuse distribution, as is often needed.
同样,这样做的好处是,单位正方形上的一组抖动点可以轻松转换为半球上的一组抖动点,并具有所需的分布。请注意,如果 n 设置为 1,我们将得到一个弥散分布,这通常是需要的。
Often, we must map the point on the sphere into an appropriate direction with respect to a uvw basis. To do this, we can first convert the angles to a unit vector a:
通常,我们必须将球面上的点映射到相对于 uvw 基的适当方向。为此,我们可以首先将角度转换为单位向量 a:
$$\mathbf{a} = (\cos\phi \sin\theta,\; \sin\phi \sin\theta,\; \cos\theta)$$
As an efficiency improvement, we can avoid taking trigonometric functions of inverse trigonometric functions (e.g., cos(arccos θ)). For example, when n = 1 (a diffuse distribution), the vector a simplifies to
为了提高效率,我们可以避免对反三角函数再取三角函数(例如 cos(arccos θ))。例如,当 n = 1(漫反射分布)时,向量 a 简化为
$$\mathbf{a} = \left(\cos(2\pi\xi_2)\sqrt{\xi_1},\; \sin(2\pi\xi_2)\sqrt{\xi_1},\; \sqrt{1 - \xi_1}\right)$$
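A minimal Python sketch of this hemisphere sampling follows. It assumes the density (n + 1)/(2π) cosⁿθ and the inversion cos θ = (1 − ξ1)^(1/(n+1)), ϕ = 2πξ2, and it works with cos θ directly so that no inverse trigonometric function is needed. The function name is hypothetical, and the result is expressed in a local frame whose z axis is the surface normal.

```python
import math
import random

def sample_phong_lobe(xi1, xi2, n):
    # cos(theta) comes straight from the inverted CDF; sin(theta)
    # follows from sin^2 + cos^2 = 1 without calling arccos.
    cos_theta = (1.0 - xi1) ** (1.0 / (n + 1))
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    phi = 2.0 * math.pi * xi2
    return (math.cos(phi) * sin_theta,
            math.sin(phi) * sin_theta,
            cos_theta)

def mean_cosine(n, count, seed=0):
    # Average of cos(theta) over many samples; for this density the
    # exact expected value is (n + 1) / (n + 2).
    rng = random.Random(seed)
    total = sum(sample_phong_lobe(rng.random(), rng.random(), n)[2]
                for _ in range(count))
    return total / count
```

For n = 1 the average cosine should be close to 2/3, which is a quick sanity check that the samples follow the diffuse distribution.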
A rejection method chooses points according to some simple distribution and rejects some of them so that the points that remain follow a more complex distribution. There are several scenarios where rejection is used, and we show some of these by example.
拒绝方法根据某个简单分布选择点,并拒绝其中的一部分,使保留下来的点服从更复杂的分布。有几种使用拒绝的场景,我们通过示例展示其中的一些。
Suppose we want uniform random points within the unit circle. We can first choose uniform random points (x,y) ∈ [-1,1]² and reject those outside the circle. If the function r() returns a canonical random number, then the procedure is
假设我们想要单位圆内的均匀随机点。我们可以先选择均匀随机点 (x,y) ∈ [-1,1]² 并拒绝圆外的点。如果函数 r() 返回一个规范随机数,则过程如下
done = false
while (not done) do
    x = -1 + 2r()
    y = -1 + 2r()
    if (x² + y² < 1) then
        done = true
If we want a random number x ~ p and we know that p : [a,b]↦ℝ, and that for all x, p(x) < m, then we can generate random points in the rectangle [a,b] × [0,m] and take those where y < p(x):
如果我们想要一个随机数x ~ p ,并且我们知道p : [a,b] ↦ℝ,且对于所有的x , p(x) < m ,那么我们可以在矩形[a,b] × [0, m ] 内生成随机点,并取y < p(x)处的点:
done = false
while (not done) do
    x = a + r()(b - a)
    y = r()m
    if (y < p(x)) then
        done = true
This same idea can be applied to take random points on the surface of a sphere. To pick a random unit vector with uniform directional distribution, we first pick a random point in the unit sphere and then treat that point as a direction vector by taking the unit vector in the same direction:
同样的思路也适用于在球体表面随机取点。要选取具有均匀方向分布的随机单位向量,我们首先在单位球体中选取一个随机点,然后通过沿相同方向取单位向量将该点视为方向向量:
done = false
while (not done) do
    x = -1 + 2r()
    y = -1 + 2r()
    z = -1 + 2r()
    if ((l = sqrt(x² + y² + z²)) < 1) then
        done = true
x = x∕l
y = y∕l
z = z∕l
Although the rejection method is usually simple to code, it is rarely compatible with stratification. For this reason, it tends to converge more slowly and should thus be used mainly for debugging or in particularly difficult circumstances.
尽管拒绝方法通常易于编码,但它很少与分层兼容。因此,它往往收敛得更慢,因此主要用于调试或特别困难的情况。
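The three rejection procedures above translate almost line for line into Python. In this sketch the canonical generator r() is played by the `random()` method of a `random.Random` instance passed in as `rng`; the function names are hypothetical, and the bound is taken as p(x) ≤ m.

```python
import math
import random

def random_in_unit_disk(rng):
    # Uniform point in the unit disk by rejecting points of [-1, 1]^2.
    while True:
        x = -1.0 + 2.0 * rng.random()
        y = -1.0 + 2.0 * rng.random()
        if x * x + y * y < 1.0:
            return x, y

def random_from_density(p, a, b, m, rng):
    # Sample x ~ p on [a, b], assuming p(x) <= m everywhere on [a, b].
    while True:
        x = a + rng.random() * (b - a)
        y = rng.random() * m
        if y < p(x):
            return x

def random_unit_vector(rng):
    # Uniform direction: reject points outside the unit ball, then normalize.
    while True:
        x, y, z = (-1.0 + 2.0 * rng.random() for _ in range(3))
        length = math.sqrt(x * x + y * y + z * z)
        if 0.0 < length < 1.0:  # length > 0 guards the division below
            return x / length, y / length, z / length
```

For example, `random_from_density(lambda x: 2.0 * x, 0.0, 1.0, 2.0, random.Random(0))` draws one sample from the density 2x on [0,1]. On average the disk test accepts a fraction π/4 of its candidates and the ball test accepts π/6, so the loops terminate quickly.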
The Metropolis method uses random mutations to produce a set of samples with a desired density. This concept is used extensively in the Metropolis Light Transport algorithm referenced in the chapter notes. Suppose we have a random point x0 in a domain S. Furthermore, suppose for any point x, we have a way to generate random y ~ px. We use the marginal notation px(y) ≡ p(x → y) to denote this density function. Now, suppose we let x1 be a random point in S selected with underlying density p(x0 → x1). We generate x2 with density p(x1 → x2) and so on. In the limit, where we generate an infinite number of samples, it can be proved that the samples will have some underlying density determined by p regardless of the initial point x0.
Metropolis 方法使用随机突变来生成一组具有所需密度的样本。这一概念在章节注释中引用的 Metropolis Light Transport 算法中被广泛使用。假设我们在域 S 中有一个随机点 x0。此外,假设对于任何点 x,我们都有办法生成随机 y ~ px。我们使用边际符号 px(y) ≡ p(x → y) 来表示该密度函数。现在,假设我们让 x1 成为 S 中以基础密度 p(x0 → x1) 选取的一个随机点。我们以密度 p(x1 → x2) 生成 x2,依此类推。在生成无限多个样本的极限情况下,可以证明无论初始点 x0 是什么,样本都会具有由 p 决定的某个基础密度。
Now, suppose we want to choose p such that the underlying density of samples to which we converge is proportional to a function f(x) where f is a nonnegative function with domain S. Furthermore, suppose we can evaluate f, but we have little or no additional knowledge about its properties (such functions are common in graphics). Also, suppose we have the ability to make “transitions” from xi to xi+1 with underlying density function t(xi → xi+1). To add flexibility, further suppose we add the potentially nonzero probability that xi transitions to itself, i.e., xi+1 = xi. We phrase this as generating a potential candidate y ~ t(xi → y) and “accepting” this candidate (i.e., xi+1 = y) with probability a(xi → y) and rejecting it (i.e., xi+1 = xi) with probability 1 - a(xi → y). Note that the sequence x0,x1,x2,… will be a random set, but there will be some correlation among samples. They will still be suitable for Monte Carlo integration or density estimation, but analyzing the variance of those estimates is much more challenging.
现在,假设我们要选择 p,使得我们收敛到的样本的底层密度与函数 f(x) 成比例,其中 f 是定义域为 S 的非负函数。此外,假设我们可以评估 f,但是我们对其属性知之甚少或一无所知(此类函数在图形学中很常见)。此外,假设我们能够使用底层密度函数 t( x → x i+1 ) 从x到x i +1进行“转换”。为了增加灵活性,进一步假设我们添加x转换到自身的潜在非零概率,即x i+1 = x 。我们将其表述为生成潜在候选 y ~ t( x → y) 并以概率 a( x → y)“接受”该候选(即x i+1 = y)并以概率 1 - a( x → y) 拒绝它(即x i+1 = x )。请注意,序列x 0 、 x 1 、 x 2 、… 将是一个随机集,但样本之间会存在一些相关性。它们仍然适用于蒙特卡罗积分或密度估计,但分析这些估计的方差要困难得多。
Now, suppose we are given a transition function t(x → y) and a function f(x) whose distribution we want to mimic. Can we choose the acceptance probability a so that the points are distributed in the shape of f? Or more precisely,
现在,假设我们给定一个转换函数 t(x → y) 和一个我们想模仿其分布的函数 f(x)。我们能否选择接受概率 a,使得点呈 f 的形状分布?或者更准确地说,
$$x_i \sim \frac{f}{\int_S f}$$
It turns out this can be forced by making sure the xi are stationary in some strong sense. If you visualize a huge collection of sample points x, you want the “flow” between two points to be the same in each direction. If we assume the density of points near x and y are proportional to f(x) and f(y), respectively, then the flow in the two directions should be the same:
事实证明,这可以通过确保 xi 在某种强意义上是平稳的来强制实现。如果你想象出一个巨大的样本点 x 集合,你希望两个点之间的“流动”在每个方向上都是相同的。如果我们假设 x 和 y 附近的点的密度分别与 f(x) 和 f(y) 成比例,那么两个方向上的流动应该是相同的:
$$\text{flow}(x \to y) = k\,f(x)\,t(x \to y)\,a(x \to y),$$
$$\text{flow}(y \to x) = k\,f(y)\,t(y \to x)\,a(y \to x),$$
where k is some positive constant. Setting these two flows equal gives a constraint on a:
其中 k 是某个正常数。令这两个流相等会对 a 施加约束:
$$\frac{a(y \to x)}{a(x \to y)} = \frac{f(x)\,t(x \to y)}{f(y)\,t(y \to x)}$$
Thus, if either a(y → x) or a(x → y) is known, so is the other. Making them larger improves the chance of acceptance, so the usual technique is to set the larger of the two to 1.
因此,如果a(y → x)或a(x → y)已知,则另一个也已知。使它们更大可以提高接受的机会,因此通常的方法是将两者中较大的一个设置为 1。
A difficulty in using the Metropolis sample generation technique is that it is hard to estimate how many points are needed before the set of points is “good.” Things are accelerated if the first n points are discarded, although choosing n wisely is nontrivial.
使用 Metropolis 样本生成技术的一个困难是,很难估计需要多少个点才能使点集达到“良好”状态。如果丢弃前 n 个点,速度会加快,尽管明智地选择 n 个点并非易事。
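The procedure can be sketched in Python under one simplifying assumption that the text does not require: the transition density is symmetric, t(x → y) = t(y → x). Setting the larger acceptance probability to 1 then reduces to accepting a candidate y with probability min(1, f(y)∕f(x)). The function names, the example target f(x) = x² on [0,1], and the mutation step size are all illustrative choices.

```python
import random

def metropolis(f, x0, mutate, n, rng):
    # Generate n correlated samples whose density is proportional to f.
    # Assumes f(x0) > 0 and a symmetric mutation t(x -> y) = t(y -> x).
    samples = []
    x, fx = x0, f(x0)
    for _ in range(n):
        y = mutate(x, rng)
        fy = f(y)
        # Accept with probability min(1, fy / fx); random() < 1 always,
        # so ratios of at least 1 are accepted every time.
        if rng.random() < fy / fx:
            x, fx = y, fy
        samples.append(x)  # a rejected candidate repeats the current point
    return samples

def example_target(x):
    # Unnormalized target: f(x) = x^2 on [0, 1], zero elsewhere.
    return x * x if 0.0 <= x <= 1.0 else 0.0

def example_mutate(x, rng):
    # Symmetric mutation: a uniform step of at most 0.25 either way.
    return x + rng.uniform(-0.25, 0.25)
```

A call such as `metropolis(example_target, 0.5, example_mutate, 60000, random.Random(0))` returns a chain from which the first few hundred points can be discarded, as suggested above. The mean of the remaining points should be near 3/4, the mean of the normalized density 3x².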
As an example of the full process of designing a sampling strategy, consider the problem of finding random lines that intersect the unit square [0,1]². We want this process to be fair; that is, we would like the lines to be uniformly distributed within the square. Intuitively, we can see that there is some subtlety to this problem; there are “more” lines at an oblique angle than in horizontal or vertical directions. This is because the cross section of the square is not uniform.
作为设计采样策略的完整过程的一个例子,考虑寻找与单位正方形 [0,1]² 相交的随机线的问题。我们希望这个过程是公平的;也就是说,我们希望线条在正方形内均匀分布。直观地看,我们可以看到这个问题有一些微妙之处;斜角的线条比水平或垂直方向的线条“更多”。这是因为正方形的横截面并不均匀。
Our first goal is to find a function-inversion method, if one exists, and then to fall back on rejection or Metropolis if that fails. This is because we would like to have stratified samples in line space. We try using normal coordinates first, because the problem of choosing random lines in the square is just the problem of finding uniform random points in whatever part of (r,θ) space corresponds to lines in the square.
我们的第一个目标是找到一种函数反演方法(如果存在的话),然后如果失败,则求助于拒绝或 Metropolis。这是因为我们希望在线空间中有分层样本。我们首先尝试使用正常坐标,因为在正方形中选择随机线的问题就是在 ( r,θ ) 空间中与正方形中的线相对应的任何部分中找到均匀随机点的问题。
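Consider the region where −π∕2 < θ < 0. What values of r correspond to lines that hit the square? For those angles, the lines with r < cos θ are exactly the lines that hit the square, as shown in Figure 13.8. Similar reasoning in the other quadrants finds the region in (r,θ) space that must be sampled, as shown in Figure 13.9. The equation of the boundary of that region, rmax(θ), is
考虑 −π∕2 < θ < 0 的区域。r 的哪些值对应于与正方形相交的线?对于这些角度,满足 r < cos θ 的线恰好是所有与正方形相交的线,如图 13.8 所示。在其他象限中进行类似推理可以找到 (r,θ) 空间中必须采样的区域,如图 13.9 所示。该区域的边界 rmax(θ) 的方程为
$$r_{\max}(\theta) = \begin{cases} 0 & \text{if } \theta \in [-\pi, -\frac{\pi}{2}], \\ \cos\theta & \text{if } \theta \in [-\frac{\pi}{2}, 0], \\ \sqrt{2}\cos\left(\theta - \frac{\pi}{4}\right) & \text{if } \theta \in [0, \frac{\pi}{2}], \\ \sin\theta & \text{if } \theta \in [\frac{\pi}{2}, \pi]. \end{cases}$$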
Figure 13.8. The largest distance r that corresponds to a line hitting the square for θ ∈ [−π∕2, 0]. Because the square has side length one, r = cos θ.
图 13.8。对于 θ ∈ [−π∕2, 0],与正方形相交的线所对应的最大距离 r。由于正方形的边长为 1,所以 r = cos θ。
Figure 13.9. The maximum radius for lines hitting the unit square [0,1]² as a function of θ.
图 13.9。与单位正方形 [0,1]² 相交的线的最大半径与 θ 的关系。
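Because the region to be sampled lies under the graph of rmax(θ) and is bounded below by r = 0, we can sample it by first choosing θ according to the density function:
因为待采样区域位于 rmax(θ) 的图像之下并以 r = 0 为下界,所以我们可以先根据密度函数选择 θ 来对其进行采样:
$$p(\theta) = \frac{r_{\max}(\theta)}{\int_{-\pi}^{\pi} r_{\max}(\theta)\,d\theta}$$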
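The denominator here is 4. Now, we can compute the cumulative probability distribution function:
这里的分母是 4。现在,我们可以计算累积概率分布函数:
$$P(\theta) = \begin{cases} (1 + \sin\theta)/4 & \text{if } \theta \in [-\frac{\pi}{2}, 0], \\ (2 + \sin\theta - \cos\theta)/4 & \text{if } \theta \in [0, \frac{\pi}{2}], \\ (3 - \cos\theta)/4 & \text{if } \theta \in [\frac{\pi}{2}, \pi]. \end{cases}$$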
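We can invert this by manipulating ξ1 = P(θ) into the form θ = g(ξ1). This yields
我们可以通过将 ξ1 = P(θ) 转换为 θ = g(ξ1) 的形式来求逆。这得到
$$\theta = \begin{cases} \arcsin(4\xi_1 - 1) & \text{if } \xi_1 < \frac{1}{4}, \\ \arcsin\left(\frac{\sqrt{2}}{2}(4\xi_1 - 2)\right) + \frac{\pi}{4} & \text{if } \xi_1 \in [\frac{1}{4}, \frac{3}{4}], \\ \arccos(3 - 4\xi_1) & \text{if } \xi_1 > \frac{3}{4}. \end{cases}$$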
Once we have θ, then r is simply
一旦我们有了 θ,那么 r 就是
$$r = \xi_2\, r_{\max}(\theta)$$
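The whole strategy can be sketched in Python. This assumes a line is described in normal coordinates by x cos θ + y sin θ = r, and that the boundary of the sampled region is the piecewise function rmax(θ) = cos θ on [−π∕2, 0], √2 cos(θ − π∕4) on [0, π∕2], and sin θ on [π∕2, π], whose integral is 4. The function names are hypothetical.

```python
import math

def r_max(theta):
    # Largest r for which the line with normal angle theta hits [0, 1]^2.
    if theta < -math.pi / 2.0:
        return 0.0
    if theta <= 0.0:
        return math.cos(theta)
    if theta <= math.pi / 2.0:
        return math.sqrt(2.0) * math.cos(theta - math.pi / 4.0)
    if theta <= math.pi:
        return math.sin(theta)
    return 0.0

def sample_line(xi1, xi2):
    # Invert the cumulative distribution of theta piece by piece,
    # then place r uniformly in [0, r_max(theta)].
    if xi1 < 0.25:
        theta = math.asin(4.0 * xi1 - 1.0)
    elif xi1 <= 0.75:
        theta = math.asin(math.sqrt(2.0) / 2.0 * (4.0 * xi1 - 2.0)) + math.pi / 4.0
    else:
        theta = math.acos(3.0 - 4.0 * xi1)
    return xi2 * r_max(theta), theta
```

Feeding jittered (ξ1,ξ2) pairs into `sample_line` gives stratified samples in line space, which was the reason for preferring function inversion here.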
As discussed earlier, there are many parameterizations of the line, and each has an associated “fair” measure. We can generate random lines in any of these spaces as well. For example, in slope-intercept space, the region that hits the square is shown in Figure 13.10. By similar reasoning to the normal space, the density function for the slope is
如前所述,直线有许多参数化,每个参数化都有一个相关的“公平”测度。我们也可以在任何这些空间中生成随机线。例如,在斜率截距空间中,与正方形相交的区域如图 13.10 所示。通过与法线坐标空间类似的推理,斜率的密度函数为
$$p(m) = \frac{1 + |m|}{4}$$
with respect to the differential measure
关于微分测度
$$d\mu = \frac{dm}{(1 + m^2)^{3/2}}.$$
This gives rise to the cumulative distribution function:
这产生了累积分布函数:
$$P(m) = \begin{cases} \dfrac{1}{4} + \dfrac{m + 1}{4\sqrt{1 + m^2}} & \text{if } m < 0, \\ \dfrac{3}{4} + \dfrac{m - 1}{4\sqrt{1 + m^2}} & \text{if } m \geq 0. \end{cases}$$
These can be inverted by solving two quadratic equations. Given an m generated using ξ1, we then have
这些可以通过求解两个二次方程来求逆。给定使用 ξ1 生成的 m,我们得到
$$b = \begin{cases} (1 - m)\,\xi_2 & \text{if } \xi_1 < \frac{1}{2}, \\ -m + (1 + m)\,\xi_2 & \text{otherwise.} \end{cases}$$
Figure 13.10. The region of (m,b) space that contains lines that intersect the unit square [0,1]².
图 13.10。(m,b) 空间中包含与单位正方形 [0,1]² 相交的线的区域。
This is not a better way than using normal coordinates; it is just an alternative way.
这并不是比使用普通坐标更好的方法;它只是一种替代方法。
This chapter discussed probability but not statistics. What is the distinction?
本章讨论的是概率,而不是统计学。两者的区别是什么?
Probability is the study of how likely an event is. Statistics infers characteristics of large, but finite, populations of random variables. In that sense, statistics could be viewed as a specific type of applied probability.
概率是研究某事件发生的可能性。统计学可以推断大量但有限的随机变量群体的特征。从这个意义上讲,统计学可以被视为一种特定类型的应用概率。
Is Metropolis sampling the same as the Metropolis Light Transport Algorithm?
Metropolis 采样和 Metropolis Light Transport 算法一样吗?
No. The Metropolis Light Transport Algorithm (Veach & Guibas, 1997) uses Metropolis sampling as part of its procedure, but it is specifically for rendering, and it has other steps as well.
不是。Metropolis Light Transport 算法 (Veach & Guibas,1997) 使用 Metropolis 采样作为其过程的一部分,但它专门用于渲染,并且还有其他步骤。
The classic reference for geometric probability is Geometric Probability (Solomon, 1978). Another method for picking random edges in a square is given in Random–Edge Discrepancy of Supersampling Patterns (Dobkin & Mitchell, 1993). More information on quasi–Monte Carlo methods for graphics can be found in Efficient Multidimensional Sampling (Kollig & Keller, 2002). Three classic and very readable books on Monte Carlo methods are Monte Carlo Methods (Hammersley & Handscomb, 1964), Monte Carlo Methods, Basics (Kalos & Whitlock, 1986), and The Monte Carlo Method (Sobel, Stone, & Messer, 1975).
几何概率的经典参考书目是《几何概率》 (Solomon,1978 年)。另一种在正方形中随机选取边的方法在《超采样模式的随机边差异》 (Dobkin & Mitchell,1993 年)中给出。有关图形拟蒙特卡罗方法的更多信息,请参阅《高效多维采样》 (Kollig & Keller,2002 年)。关于蒙特卡罗方法的三本经典且非常易读的书籍是《蒙特卡罗方法》 (Hammersley & Handscomb,1964 年)、《蒙特卡罗方法基础》 (Kalos & Whitlock,1986 年)和《蒙特卡罗方法》 (Sobel、Stone 和 Messer,1975 年)。
1. What is the average value of the function xyz in the unit cube (x,y,z) ∈ [0,1]³?
1. 单位立方体 (x,y,z) ∈ [0,1]³ 中函数 xyz 的平均值是多少?
2. What is the average value of r on the unit-radius disk: (r,ϕ) ∈ [0,1] × [0,2π)?
2.单位半径圆盘上 r 的平均值是多少: ( r , ϕ ) ∈ [0,1] × [0,2π)?
3. Show that the uniform mapping of canonical random points (ξ1,ξ2) to the barycentric coordinates of any triangle is β = 1 − √(1 − ξ1) and γ = (1 − β)ξ2.
3. 证明规范随机点 (ξ1,ξ2) 到任意三角形重心坐标的均匀映射为 β = 1 − √(1 − ξ1),且 γ = (1 − β)ξ2。
4. What is the average length of a line inside the unit square? Verify your answer by generating ten million random lines in the unit square and averaging their lengths.
4.单位正方形内直线的平均长度是多少?在单位正方形内生成一千万条随机直线,并计算其平均长度,以验证您的答案。
5. What is the average length of a line inside the unit cube? Verify your answer by generating ten million random lines in the unit cube and averaging their lengths.
5.单位立方体内线的平均长度是多少?通过在单位立方体中生成一千万条随机线并计算其平均长度来验证您的答案。
6. Show from the definition of variance that V(x) = E(x²) − [E(x)]².
6. 根据方差的定义证明 V(x) = E(x²) − [E(x)]²。
While all rendering is to some extent “physics-based,” the term implies in practice that we adhere strictly to physical models rather than being “phenomenological,” that is, capturing subjective perceptual features heuristically, for example with an empirical formula that puts a highlight in the “right” place. This chapter covers physics-based rendering at a high level, defines the units and terms used in the area, and provides a brute force “path tracing” algorithm that can produce very accurate images very slowly. We do not delve into the details of the many rendering algorithms out there, but almost all of them can be viewed as improvements in efficiency over the brute force algorithm. Note that improvements in efficiency are why we have realistic graphics in movies and games and are nothing to be sneezed at; we don’t cover these details because they are a moving target and are well covered elsewhere, most notably in the superb PBRT book and code base of Pharr et al. (2016).
虽然所有渲染在某种程度上都是“基于物理的”,但“基于物理”一词在实践中意味着我们将严格遵守物理模型,而不是“现象学的”,后者启发式地捕捉主观感知特征,例如将高光放在“正确”位置的经验公式。本章从高层次介绍了基于物理的渲染,定义了该领域使用的单位和术语,并提供了一种可以非常缓慢地生成非常精确图像的强力“路径跟踪”算法。我们不会深入研究现有的许多渲染算法的细节,但几乎所有算法都可以看作是对强力算法的改进。请注意,效率的提高是我们在电影和游戏中拥有逼真图形的原因,这一点不容小觑;我们不讨论这些细节,因为它们是一个移动目标,并且在生态系统中已经得到了很好的覆盖,最著名的是 Pharr 等人 (2016) 的出色 PBRT 书籍和代码库。
To aid our intuition, we will describe radiometry in terms of collections of large numbers of photons, and this section establishes what is meant by a photon in this context. Note that in graphics, “photon” does not necessarily mean precisely what it does in physics and many physicists are confused when they read discussions of things like “photon tracing” written by graphics people. For us, a photon is usually just an energy packet that behaves in a way that obeys geometric optics (where light travels in straight lines and doesn’t have wave properties).
为了帮助我们理解,我们将用大量光子的集合来描述辐射测量,本节将介绍光子在此上下文中的含义。请注意,在图形中,“光子”并不一定意味着它在物理学中的含义,许多物理学家在阅读图形人员撰写的有关“光子追踪”等内容的讨论时会感到困惑。对我们来说,光子通常只是一个能量包,其行为遵循几何光学(光沿直线传播,不具有波动性)。
More precisely, for us a photon is a packet of light that has a position, a direction of propagation, and a wavelength λ. Somewhat strangely, the SI unit used for wavelength is the nanometer (nm). This is mainly for historical reasons, and 1 nm = 10⁻⁹ m. Another unit, the angstrom, is sometimes used, and one nanometer is ten angstroms. A photon also has a speed c that depends only on the refractive index n of the medium through which it propagates. Sometimes the frequency f = c∕λ is also used for light. This is convenient because unlike λ and c, f does not change when the photon refracts into a medium with a new refractive index. Another invariant measure is the amount of energy q carried by a photon, which is given by the following relationship:
更准确地说,对我们来说,光子是一个光包,它有位置、传播方向和波长 λ。有点奇怪的是,波长的 SI 单位是纳米 (nm)。这主要是出于历史原因,1 nm = 10⁻⁹ m。有时也会使用另一个单位埃,一纳米等于十埃。光子也具有速度 c,该速度仅取决于其传播介质的折射率 n。有时也使用光的频率 f = c∕λ。这很方便,因为与 λ 和 c 不同,当光子折射到具有新折射率的介质中时,f 不会改变。另一个不变的度量是光子携带的能量 q,它由以下关系给出:
$$q = hf = \frac{hc}{\lambda}$$
where h = 6.63 × 10⁻³⁴ J s is Planck’s constant. Although these quantities can be measured in any unit system, we will use SI units whenever possible.
其中 h = 6.63 × 10⁻³⁴ J s 是普朗克常数。虽然这些量可以用任何单位制来测量,但只要有可能,我们就会使用 SI 单位。
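As a small worked example of the relationship q = hf = hc∕λ, the Python sketch below computes the energy of a single photon from its wavelength in vacuum. The value of Planck's constant is the one quoted above; the speed of light in vacuum, 2.998 × 10⁸ m/s, is an assumed value that the text does not give, and the function name is hypothetical.

```python
PLANCK_CONSTANT = 6.63e-34  # J s, the value quoted in the text
SPEED_OF_LIGHT = 2.998e8    # m/s in vacuum; assumed, not given in the text

def photon_energy(vacuum_wavelength_nm):
    # Energy q = h f = h c / lambda of one photon, in joules.
    wavelength_m = vacuum_wavelength_nm * 1e-9  # 1 nm = 1e-9 m
    frequency = SPEED_OF_LIGHT / wavelength_m   # f = c / lambda, in hertz
    return PLANCK_CONSTANT * frequency
```

For green light at 500 nm this gives roughly 4 × 10⁻¹⁹ J per photon, and shorter wavelengths carry more energy per photon than longer ones.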
For a smooth metal, light either reflects specularly as described in Section 4.5.4 or is refracted into the surface and then quickly absorbed (with very thin metal coats as sometimes done to glass, you can see light making it all the way through and that metals are not actually opaque). The amount of light reflected is determined by the Fresnel equations. These equations are straightforward, but cumbersome. In addition, their values vary with the polarization of the light, a characteristic usually ignored in graphics. The main visual effect of the Fresnel equations is that the reflectance increases with the incident angle, particularly near grazing angles where it goes to 100%.
对于光滑的金属,光线要么像第 4.5.4 节中所述进行镜面反射,要么被折射到表面然后迅速被吸收(如果金属涂层非常薄,就像玻璃有时采用的方式,您可以看到光线完全穿过金属,而金属实际上并非不透明)。反射光的量由菲涅尔方程确定。这些方程很简单,但很麻烦。此外,它们的值会随着光的偏振而变化,而这一特性在图形中通常会被忽略。菲涅尔方程的主要视觉效果是反射率随入射角的增加而增加,特别是在掠射角附近,反射率达到 100%。
Almost all graphics programs use a simple approximation to the Fresnel equations developed by Schlick (1994). For a metal, we typically specify the reflectance at normal incidence, R_0(λ). The reflectance should vary with angle according to the Fresnel equations, and Schlick’s approximation captures that variation well:
几乎所有图形程序都使用 Schlick (1994) 开发的菲涅尔方程的简单近似值。对于金属,我们通常指定法向入射的反射率 R 0 (λ)。反射率应根据菲涅尔方程而变化,Schlick (1994) 给出了一个很好的近似值
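Schlick’s approximation, which this section later refers to as Equation 14.2, has the standard form:

```latex
R(\theta, \lambda) = R_0(\lambda) + \bigl(1 - R_0(\lambda)\bigr)\,(1 - \cos\theta)^5 \tag{14.2}
```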
where θ is the angle between the direction of light propagation and the surface normal. This approximation allows us to simply set the normal-incidence reflectance of the metal, either from data or by eye.
其中θ是光传播方向与表面法线之间的角度。在这里,这种近似允许我们根据数据或肉眼设定金属的法线反射率。
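As a minimal sketch, Schlick’s approximation R = R_0 + (1 − R_0)(1 − cos θ)^5 can be evaluated per wavelength as below. The function name and the clamping of the cosine are illustrative choices, not something the text prescribes.

```python
def schlick_reflectance(r0: float, cos_theta: float) -> float:
    """Schlick's approximation R = R0 + (1 - R0) * (1 - cos(theta))**5.

    r0        -- reflectance at normal incidence for one wavelength
    cos_theta -- cosine of the angle between the light direction and the normal
    """
    # Clamp to guard against small numerical overshoot in a dot product.
    c = min(max(cos_theta, 0.0), 1.0)
    return r0 + (1.0 - r0) * (1.0 - c) ** 5


# Reflectance rises from R0 at normal incidence toward 1 at grazing angles.
for cos_theta in (1.0, 0.5, 0.1, 0.0):
    print(cos_theta, schlick_reflectance(0.04, cos_theta))
```

The value R_0 = 0.04 in the loop is only a sample input, roughly what a glass-like dielectric gives.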
Dielectrics are clear materials that refract light, and a reasonable heuristic is that any material that is not a metal is a dielectric. So skin, milk, hair, cloth, and almost all everyday materials are dielectrics, although that is not obvious: they tend to be opaque because they are mixtures of materials with different refractive indices and light-absorbing impurities. But smooth homogeneous dielectrics are transparent; examples are glass, water, and the lens in the eye. For a smooth dielectric, there are only three important properties:
电介质是折射光线的透明材料,并且如果它们不是金属,那么它们就是电介质,这是一个不错的启发。因此,皮肤、牛奶、头发、衣服和几乎所有日常材料都是电介质,尽管这并不明显,因为它们往往是不透明的,因为它们是不同折射率和吸光杂质的混合物。但光滑的均质电介质是透明的,例如玻璃、水和眼睛中的晶状体。对于光滑的电介质,只有三个重要属性:
How much light is reflected at each incident angle and wavelength.
每个入射角度和波长反射多少光。
What fraction of light is absorbed as it travels through the material for a given distance and wavelength.
当光以给定的距离和波长穿过材料时,有多少比例的光被吸收。
What the directions of the reflected and refracted light are.
反射和折射光的方向是什么。
How light bends geometrically and what fraction is reflected or transmitted depend on the refractive index n(λ) of the material. For a dielectric, the same Schlick approximation (Equation 14.2) for reflectance works as it does for metals. However, when one of the materials is air, we can set R_0(λ) in terms of n(λ):
光的几何弯曲程度以及反射/透射的比例取决于材料的折射率n (λ)。对于电介质,反射率的 Schlick 方程 14.2 与金属相同。但是,当其中一种材料是空气时,我们可以根据 n(λ) 设定R 0 (λ)
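For a dielectric of refractive index n(λ) in air, the Fresnel equations at normal incidence give:

```latex
R_0(\lambda) = \left(\frac{n(\lambda) - 1}{n(\lambda) + 1}\right)^2
```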
In the case where neither of the refractive indices on the two sides of the interface is 1.0 (as it is for air or vacuum), this formula applies:
如果等式两边的折射率不是 1.0(如空气或真空),则适用以下公式:
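Writing n and n_t for the refractive indices on the two sides of the interface (the same symbols this section uses for Snell’s law), the general normal-incidence form is:

```latex
R_0(\lambda) = \left(\frac{n(\lambda) - n_t(\lambda)}{n(\lambda) + n_t(\lambda)}\right)^2
```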
Typically, n is treated as constant across wavelengths, but for applications where dispersion (the different wavelengths separate from each other, producing rainbows) is important, n can vary with λ. Refractive indices that are often useful include water (n = 1.33), glass (n = 1.4 to n = 1.7), and diamond (n = 2.4).
通常, n不会随波长而变化,但对于色散(不同波长相互分散,产生彩虹)很重要的应用, n可能会变化。通常有用的折射率包括水( n = 1.33)、玻璃( n = 1.4 至n = 1.7)和钻石( n = 2.4)。
The amount of light transmitted is whatever is not reflected (a result of energy conservation). So we don’t need to explicitly compute a formula for the transmitted fraction.
透射光量是未被反射的光量(能量守恒的结果)。因此,我们不需要明确计算透射分数的公式。
Dielectrics also filter and refract light; some glass filters out more red and blue light than green light, so the glass takes on a green tint. When a ray travels from a medium with refractive index n into one with refractive index n_t, some of the light is transmitted, and it bends. This is shown for n_t > n in Figure 14.1. Snell’s law tells us that
电介质也会过滤和折射光线;有些玻璃过滤掉的红光和蓝光比绿光多,因此玻璃会呈现绿色色调。当光线从折射率为n的介质传播到折射率为n t 的介质时,部分光线会被透射,并且会发生弯曲。图 14.1中 n t > n显示了这种情况。斯涅尔定律告诉我们
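With θ the angle of the incoming ray and ϕ the angle of the transmitted ray, both measured from the surface normal as in Figure 14.1, Snell’s law reads:

```latex
n \sin\theta = n_t \sin\phi
```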
Figure 14.1. Snell’s law describes how the angle ϕ depends on the angle θ and the refractive indices of the object and the surrounding medium.
图 14.1.斯涅尔定律描述了角度ϕ如何依赖于角度θ以及物体和周围介质的折射率。
Example values of n: air: 1.00; water: 1.33–1.34; window glass: 1.51; optical glass: 1.49–1.92; diamond: 2.42.
n的示例值:空气:1.00;水:1.33–1.34;窗玻璃:1.51;光学玻璃:1.49–1.92;钻石:2.42。
Computing the sine of an angle between two vectors is usually not as convenient as computing the cosine, which is a simple dot product for unit vectors such as those we have here. Using the trigonometric identity sin^2 θ + cos^2 θ = 1, we can derive a refraction relationship for cosines:
计算两个向量之间的角度的正弦通常不如计算余弦那么方便,余弦就是我们这里得到的单位向量的简单点积。使用三角恒等式 sin 2 θ + cos 2 θ = 1,我们可以得出余弦的折射关系:
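Substituting sin ϕ = (n∕n_t) sin θ from Snell’s law into the identity gives:

```latex
\cos^2\phi = 1 - \frac{n^2 \left(1 - \cos^2\theta\right)}{n_t^2}
```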
Note that if n and n_t are reversed, then so are θ and ϕ, as shown on the right of Figure 14.1.
请注意,如果n和n t反转,则 θ 和ϕ也反转,如图 14.1右侧所示。
To convert sinϕ and cosϕ into a 3D vector, we can set up a 2D orthonormal basis in the plane of the surface normal, n, and the ray direction, d.
为了将 sin ϕ和 cos ϕ转换为三维矢量,我们可以在表面法线n和射线方向d的平面上建立二维正交基。
From Figure 14.2, we can see that n and b form an orthonormal basis for the plane of refraction. By definition, we can describe the direction of the transmitted ray, t, in terms of this basis:
从图 14.2中我们可以看出, n和b构成了折射平面的正交基。根据定义,我们可以用这个基来描述变换后的射线t的方向:
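Taking ϕ as the angle between t and −n, and writing the surface normal in bold to distinguish it from the refractive index n, the expansion in this basis is:

```latex
\mathbf{t} = \sin\phi\,\mathbf{b} - \cos\phi\,\mathbf{n}
```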
Figure 14.2. The vectors n and b form a 2D orthonormal basis that is parallel to the transmission vector t.
图 14.2.矢量n和b形成与透射矢量t平行的二维正交基。
Since we can describe d in the same basis, and d is known, we can solve for b:
因为我们可以用相同的基础描述d ,并且d是已知的,所以我们可以解出b :
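With θ the angle between d and −n (surface normal in bold, refractive index n in italics), the expansion of d and its solution for b are:

```latex
\mathbf{d} = \sin\theta\,\mathbf{b} - \cos\theta\,\mathbf{n},
\qquad
\mathbf{b} = \frac{\mathbf{d} + \mathbf{n}\cos\theta}{\sin\theta}
```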
This means that we can solve for t with known variables:
这意味着我们可以用已知变量来求解t :
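Using sin ϕ = (n∕n_t) sin θ, cos θ = −d · n, and the cosine form of Snell’s law, the transmitted direction in terms of known quantities (bold n is the normal, italic n and n_t are refractive indices) is:

```latex
\mathbf{t}
= \frac{n\,(\mathbf{d} + \mathbf{n}\cos\theta)}{n_t} - \mathbf{n}\cos\phi
= \frac{n\,\bigl(\mathbf{d} - \mathbf{n}(\mathbf{d}\cdot\mathbf{n})\bigr)}{n_t}
  - \mathbf{n}\sqrt{1 - \frac{n^2\bigl(1 - (\mathbf{d}\cdot\mathbf{n})^2\bigr)}{n_t^2}}
```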
Note that this equation works regardless of which of n and n_t is larger. An immediate question is, “What should you do if the number under the square root is negative?” In this case, there is no refracted ray and all of the energy is reflected. This is known as total internal reflection, and it is responsible for much of the rich appearance of glass objects.
请注意,无论 n 和 n t中哪个更大,此方程都适用。一个直接的问题是,“如果平方根下的数字是负数,你该怎么办?”在这种情况下,没有折射光线,所有能量都被反射。这被称为全内反射,它是玻璃物体丰富外观的主要原因。
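The refraction rule, including the total internal reflection case, can be sketched as follows. This assumes unit vectors stored as 3-tuples and a normal that points back toward the incoming side; the function name and the `None` return convention are illustrative choices.

```python
import math


def refract(d, normal, n, n_t):
    """Return the unit transmitted direction, or None on total internal reflection.

    d      -- unit direction of the incoming ray (pointing toward the surface)
    normal -- unit surface normal on the incoming side, so dot(d, normal) < 0
    n, n_t -- refractive indices of the incoming and transmitting media
    """
    d_dot_n = sum(di * ni for di, ni in zip(d, normal))
    # Quantity under the square root, equal to cos^2(phi).
    cos2_phi = 1.0 - (n * n) * (1.0 - d_dot_n * d_dot_n) / (n_t * n_t)
    if cos2_phi < 0.0:
        return None  # total internal reflection: no refracted ray
    cos_phi = math.sqrt(cos2_phi)
    # Tangential part scaled by n / n_t, minus the normal part.
    return tuple(
        n * (di - ni * d_dot_n) / n_t - ni * cos_phi
        for di, ni in zip(d, normal)
    )


# A ray entering glass (n_t = 1.5) from air at 45 degrees bends toward the normal.
s = math.sin(math.radians(45.0))
print(refract((s, 0.0, -s), (0.0, 0.0, 1.0), 1.0, 1.5))
```

A caller that receives `None` would reflect the ray instead, since all of the energy is reflected in that case.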
For homogeneous impurities, as are found in typical colored glass, a light-carrying ray’s intensity will be attenuated according to Beer’s law. As the ray travels through the medium, it loses intensity according to dI = −C I dx, where dx is distance. Thus, dI∕dx = −C I. We can solve this equation and get the exponential I = k exp(−Cx). The degree of attenuation is described by the RGB attenuation constant a, which is the fraction of the intensity that remains after one unit of distance. Putting in boundary conditions, we know that I(0) = I_0 and I(1) = a I(0). The former implies I(x) = I_0 exp(−Cx). The latter implies I_0 a = I_0 exp(−C), so −C = ln(a). Thus, the final formula is
对于均匀的杂质(如在典型的有色玻璃中发现的杂质),根据比尔定律,光线的强度会衰减。当光线穿过介质时,其强度会根据dI = − CI dx而减弱,其中dx是距离。因此, dI∕dx = -CI 。我们可以解这个方程并得到指数I = k exp( -Cx )。衰减程度由 RGB 衰减常数 a 描述,即单位距离后的衰减量。代入边界条件,我们知道I (0) = I 0和I (1) = aI (0)。前者意味着I(x) = I 0 exp(- Cx )。后者意味着I 0 a = I 0 exp(- C ),所以 - C = ln( a )。因此,最终公式为
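Substituting −C = ln(a) into I(x) = I_0 exp(−Cx), with s as the distance variable, gives:

```latex
I(s) = I(0)\, e^{\ln(a)\, s}
```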
where I(s) is the intensity of the beam at distance s from the interface. Because exp(ln(a) s) = a^s, we can also write that as
其中I(s)是距离界面 s 处的光束强度。由于对数的指数存在,我们也可以将其写为
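Since the exponential of ln(a) times s is a raised to the power s, the equivalent form is:

```latex
I(s) = I(0)\, a^{s}
```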
In practice, we reverse-engineer a by eye, because such data are rarely easy to find. The effect of Beer’s law can be seen in Figure 14.3, where the glass takes on a green tint.
在实践中,我们用肉眼进行逆向工程,因为这样的数据很少容易找到。比尔定律的影响可以在图 14.3中看到,其中玻璃呈现出绿色色调。
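A small sketch of Beer’s law applied per RGB channel, using the form I(s) = I(0) a^s. The tuple representation and the sample attenuation values are assumptions for illustration; as the text says, a would normally be tuned by eye.

```python
import math


def beer_attenuate(i0, a, s):
    """Attenuate an RGB intensity over a path of length s.

    i0 -- (r, g, b) intensity entering the medium
    a  -- (r, g, b) fraction of intensity remaining after one unit of distance
    s  -- distance travelled inside the medium
    """
    return tuple(c * ac ** s for c, ac in zip(i0, a))


# A greenish glass: red and blue are attenuated more strongly than green.
green_glass = (0.5, 0.9, 0.5)
for distance in (0.0, 1.0, 2.0):
    print(distance, beer_attenuate((1.0, 1.0, 1.0), green_glass, distance))
```

Thicker paths through the same glass look both darker and more saturated, which matches the green tint described for Figure 14.3.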
Figure 14.3. The color of the glass is affected by total internal reflection and Beer’s law. The amount of light transmitted and reflected is determined by the Fresnel equations. Note that when light comes in at normal incidence, more is transmitted.
图 14.3。玻璃的颜色受全内反射和比尔定律的影响。透射和反射的光量由菲涅尔方程确定。请注意,当光以法线入射时,透射的光更多。
This effect works for transmitted light as well. These ideas are shown diagrammatically in Figure 14.4. Note that the light is repeatedly reflected and refracted as shown in Figure 14.5. Usually, only one or two of the reflected images are easily visible.
这种效应也适用于透射光。图 14.4以图表形式显示了这些想法。请注意,光线反复反射和折射,如图 14.5所示。通常,只有一个或两个反射图像清晰可见。
Figure 14.4. The amount of light reflected and transmitted by glass varies with the angle.
图 14.4.玻璃反射和透射的光量随角度而变化。
Figure 14.5. Light is repeatedly reflected and refracted by glass, with the fractions of energy shown.
图 14.5.光在玻璃上反复反射和折射,图中显示了能量的分数。
With just smooth dielectrics we can render a surprising array of materials. Many surfaces that look “matte” and opaque can be simulated as multiple dielectrics. Consider a perfect ice cube. It will look like a block of glass, just with less bend in its refraction (ice has a lower refractive index than glass does). Now place many small spherical air pockets inside that ice cube, and it will become increasingly opaque as more air bubbles are added.
仅使用光滑的电介质,我们就可以渲染出令人惊讶的多种材质。许多看起来“无光泽”和不透明的表面都可以模拟为多种电介质。考虑一个完美的冰块。它看起来像一块玻璃,只是折射弯曲较少(冰的折射率低于玻璃)。现在在冰块内放置许多小球形气穴,随着气泡的增加,它会变得越来越不透明。
This basic idea of scattering elements that could be air or other substances is responsible for much of the opacity we see. Another way to make the ice cube look opaque is to roughen the surface. This could be done on a computer graphics model by finely tessellating the surface and then applying a small random perturbation to the position of every triangle vertex. This would have an effect much like that of frosted glass (which is essentially that sort of surface), where the surface becomes more opaque as the perturbation becomes rougher.
这种散射元素(可能是空气或其他物质)的基本思想在很大程度上决定了我们看到的不透明度。另一种让冰块看起来不透明的方法是使表面粗糙。这可以在计算机图形模型上完成,方法是精细地细分表面,然后对每个三角形顶点的位置进行微小的随机扰动。这将产生与毛玻璃(本质上是那种表面)非常类似的效果,扰动越粗糙,不透明度就越高。
Further complexity can be achieved by inserting particles that have a color and thus activate Beer’s law. This is a fairly easy and surprisingly accurate way to simulate paint.
通过插入有颜色的粒子并激活比尔定律,可以实现进一步的复杂性。这是一种相当简单且出奇准确的油漆模拟方法。
There is a wealth of materials whose complexity can be simulated with explicit modeling of the microstructure. For example, human skin can be modeled as a rough surface over layers of slightly different refractive index, pigment particles (obeying Beer’s law), and blood (obeying Beer’s law).
很多材料的复杂性都可以通过显式的微观结构建模来模拟。例如,人类皮肤可以建模为粗糙的表面和折射率略有不同的层、色素颗粒(遵循比尔定律)和血液(遵循比尔定律)。
Suppose we model a scene as dielectrics with microstructure and one light-emitting object. How could we most simply render it and produce an image? In this section, we discuss how to simulate the photons by brute force, and then how to do that in reverse by sending adjoint (backward) photons from the sensor. This can actually produce some of the best pictures possible in graphics, with very little code, but it will be extremely slow.
假设我们确实将场景建模为具有微结构的电介质和一个发光物体?我们如何才能最简单地渲染它并生成图像?在本节中,我们将讨论如何仅模拟光子的强力,然后如何反向执行此操作,从传感器发送伴随(向后)光子。这实际上可以用很少的代码生成图形中最好的一些图片,但速度会非常慢。
In order to produce an image, we must have some concept of an image capture device. A simple one would be an array of sensors (like a CCD) inside a box with a small hole in it, so that it acts like a pinhole camera. Each sensor in the array essentially acts as a photon counter (with some wavelengths causing more of a response than others). After bombarding it with photons, the sensor array will have values that can be written out as an image. Sensors that receive few photons will be written as black, and those that receive many will be written as white, with various gray levels in between.
为了生成图像,我们必须对图像捕获设备有一些概念。一个简单的设备就是一个简单的传感器阵列(如 CCD)和一个带有小孔的盒子,这样它就可以像针孔相机一样工作。阵列中的每个传感器本质上都充当光子计数器(某些波长比其他波长引起的响应更多)。在用光子轰击它之后,传感器阵列将具有可以写出为图像的值。接收少量光子的传感器将被写入黑色,而接收大量光子的传感器将被写入白色,中间有各种灰度。
To produce a color image, we can put red, green, and blue filters in some pattern in front of the sensors. The simplest filters would be bandpass filters:
为了生成彩色图像,我们可以在传感器前面以某种图案放置红、绿和蓝滤光片。最简单的滤光片是带通滤光片:
Blue: no response except for λ ∈ [400, 500] nm, which has full response.
蓝色除 λ ∈ [400, 500 nm] 有完全响应外,无响应。
Green: no response except for λ ∈ [500, 600] nm, which has full response.
绿色除 λ ∈ [500, 600 nm] 有完全响应外,无响应。
Red: no response except for λ ∈ [600, 700] nm, which has full response.
红色除 λ ∈ [600, 700 nm] 有完全响应外,没有响应。
If you initialize all the sensors to zero, then full response just means to increment the number stored in the sensor when the photon hits it.
如果将所有传感器初始化为零,则完全响应仅意味着当光子撞击传感器时增加存储在传感器中的数字。
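The bandpass filters and photon counting can be sketched as below. Two simplifications are assumed here that the text does not specify: a single pixel keeps all three counters (rather than one filter per sensor in a pattern), and the shared boundaries at 500 and 600 nm are assigned to the longer-wavelength band.

```python
def new_pixel():
    """A pixel modeled as three photon counters, all starting at zero."""
    return {"blue": 0, "green": 0, "red": 0}


def record_photon(pixel, wavelength_nm):
    """Increment the counter whose bandpass filter admits this wavelength.

    Returns the channel name that responded, or None if no filter passes it.
    """
    if 400.0 <= wavelength_nm < 500.0:
        channel = "blue"
    elif 500.0 <= wavelength_nm < 600.0:
        channel = "green"
    elif 600.0 <= wavelength_nm <= 700.0:
        channel = "red"
    else:
        return None  # outside the visible band: no response
    pixel[channel] += 1  # "full response" is a simple increment
    return channel


pixel = new_pixel()
for wavelength in (450.0, 520.0, 580.0, 650.0, 350.0):
    record_photon(pixel, wavelength)
print(pixel)
```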
To trace the photons, pick a random point on the light source, a random direction, and a random wavelength between 400 and 700 nm (other wavelengths won’t influence the sensor array, so they don’t need to be computed). Now trace the ray in the same way as described in Chapter 4. When it hits a surface, compute its reflectance and decide whether to reflect or refract. This decision is made by evaluating the Schlick approximation for that wavelength and incident angle; call the result R. Now generate a uniform random number ξ ∈ [0,1).
要追踪光子,请在光源上随机选取一个点,并随机选取一个方向和一个介于 400 到 700 nm 之间的随机波长(其他波长不会影响传感器阵列,因此无需计算)。现在以第 4 章中描述的方式追踪射线。当它撞击表面时,计算其反射率并决定是反射还是折射。此决定是通过评估该波长和入射角的 Schlick 近似来做出的,我们将其称为R。现在生成一个均匀随机数 ξ ∈ [0,1)。
if (ξ < R)
    generate a reflection photon (ray) and trace it
else
    generate a refraction photon (ray) and trace it
If the surface is metal, then this changes only in that the photon is absorbed in the refraction case and we start a new one at the light.
如果表面是金属,那么变化的唯一原因是光子在折射情况下被吸收,而我们在光处开始一个新的光子。
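The reflect-or-refract decision, including the metal case, might look like the sketch below. The function name, the string return values, and the optional `xi` parameter (which lets a caller supply the random number for reproducibility) are all illustrative assumptions.

```python
import random


def choose_event(r0, cos_theta, is_metal, xi=None):
    """Pick what happens to a photon at a smooth surface.

    r0        -- normal-incidence reflectance at the photon's wavelength
    cos_theta -- cosine of the incident angle
    is_metal  -- True if the surface is a metal, False for a dielectric
    xi        -- uniform random number in [0, 1); drawn if not supplied
    """
    if xi is None:
        xi = random.random()
    # Schlick approximation for the reflectance R at this angle.
    reflectance = r0 + (1.0 - r0) * (1.0 - cos_theta) ** 5
    if xi < reflectance:
        return "reflect"
    # A refracted photon is absorbed by a metal and transmitted by a dielectric.
    return "absorb" if is_metal else "refract"


# Estimate the reflected fraction for glass-like R0 at normal incidence.
random.seed(1)
trials = 100000
hits = sum(choose_event(0.04, 1.0, False) == "reflect" for _ in range(trials))
print(hits / trials)
```

Over many photons the reflected fraction converges to R, which is what makes the random choice equivalent to splitting the energy.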
If the photon enters a dielectric and the Beer’s law coefficient is not one (meaning it absorbs light like the green glass), then the photon might probabilistically be absorbed (and thus die). We decide using the same basic technique as for choosing between reflection and refraction:
如果光子进入电介质,并且比尔定律系数不是 1(意味着它像绿色玻璃一样吸收光线),那么光子可能会被吸收(从而消亡)。我们使用与决定反射和折射相同的基本技术来决定:
if (ξ < probability of absorption)
    absorb and start new photon
else
    allow this photon to exit medium at the next hit
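The text does not state the absorption probability. One natural choice, assumed here, follows from Beer’s law: if a fraction a^s of the light survives a path of length s, a single photon is absorbed with probability 1 − a^s. The names below are hypothetical.

```python
def absorption_probability(a, distance):
    """Probability that a photon is absorbed over `distance` (assumed 1 - a**s).

    a -- fraction of light remaining after one unit of distance at this wavelength
    """
    return 1.0 - a ** distance


def photon_survives(a, distance, xi):
    """Return True if the photon reaches the next hit, given uniform xi in [0, 1)."""
    return xi >= absorption_probability(a, distance)


# With a = 1 the medium is perfectly clear and nothing is ever absorbed.
print(absorption_probability(1.0, 5.0), absorption_probability(0.5, 2.0))
```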
The procedure above is all that is needed to create great images! It will just be slow.
上述步骤就是创建出色图像所需的全部步骤!只是速度会比较慢。
The pinhole camera above will produce very sharp images just like a real pinhole camera will. But it will need long exposures because very few photons are lucky enough to make it through the pinhole before being absorbed.
上面的针孔相机将像真正的针孔相机一样产生非常清晰的图像。但它需要长时间曝光,因为很少有光子能够在被吸收之前幸运地穿过针孔。
We can solve this by making the pinhole bigger and placing a lens in that hole. We can model a real set of compound glass lenses, or we can insert an ideal thin lens.
我们可以通过给相机添加镜头来解决这个问题,方法是将针孔弄大,然后将镜头放在那个孔中。我们可以模拟一组真实的复合玻璃透镜,也可以插入一个理想的薄透镜。
The simplest glass lens we could make is the intersection of two spheres (a “bi-convex spherical lens”). This has decent imaging properties (though not as good as the compound lenses in most real cameras), and its ray intersection code is pretty easy to write.
我们可以把最简单的玻璃透镜做成两个球体的交点(“双凸球面透镜”)。它具有不错的成像特性(尽管不如大多数真实相机中的复合透镜那么好),并且很容易编写射线相交代码。
A thin lens is an idealized lens that is infinitely thin (so would be a disk in a ray tracing program) and is specified only with a radius (physical size of the lens) and a focal length f. The thin lens can be implemented by enforcing the following three properties:
薄透镜是一种理想化的透镜,其厚度无限薄(因此在光线追踪程序中为圆盘),并且仅指定半径(透镜的物理尺寸)和焦距 f。薄透镜可以通过执行以下三个属性来实现:
A ray leaving a 3D point p that goes to the lens center will not be bent (we call the line from that point through the lens center the center line of p).
离开三维点 p 并到达镜头中心的射线不会弯曲(我们称从该点穿过镜头中心的这条线为 p 的中心线)。
All rays leaving p that hit the lens will converge at a point q on the center line.
从 p 出发到达镜头的所有射线都会汇聚在中心线上的点 q 处。
The distance a along the lens optical axis of the point p and the distance b of the point q along the lens optical axis obey the thin lens rule: 1∕a + 1∕b = 1∕f.
点p沿透镜光轴的距离a、点q沿透镜光轴的距离b遵循薄透镜规则:1∕ a +1∕ b =1∕ f 。
By “along the optical axis,” we mean distance measured along the axis perpendicular to the lens (that is, the direction in which the camera is “aimed”).
“沿光轴”是指沿垂直于镜头的轴(即相机“瞄准”的方向)测量的距离。
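The three thin lens properties can be turned into code as in the sketch below. It assumes a coordinate frame that is not in the text: the lens sits at the origin in the plane z = 0, the optical axis is z, and the point p lies on the z < 0 side farther from the lens than f, so a real image exists.

```python
import math


def thin_lens_image(p, f):
    """Image point q of p for an ideal thin lens of focal length f."""
    a = -p[2]  # distance of p along the optical axis
    if a <= f:
        raise ValueError("no real image: p must be farther away than f")
    b = 1.0 / (1.0 / f - 1.0 / a)  # thin lens rule 1/a + 1/b = 1/f
    # q lies on the center line of p (through the origin) at axial distance b.
    scale = -b / a
    return (p[0] * scale, p[1] * scale, p[2] * scale)


def bent_direction(p, hit, f):
    """Unit direction of a ray from p after it passes through lens point `hit`."""
    q = thin_lens_image(p, f)
    v = tuple(qi - hi for qi, hi in zip(q, hit))
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


p = (1.0, 0.0, -2.0)
print(thin_lens_image(p, 1.0))                   # where rays from p converge
print(bent_direction(p, (0.3, 0.2, 0.0), 1.0))   # a ray through an off-center point
```

Every ray from p that hits the lens is simply redirected toward q, so points off the focused distance spread over several sensors and blur.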
Note that having a real or ideal lens will automatically give the kind of blurring one gets in real photos, which is usually called defocus blur or depth of field.
请注意,拥有真实或理想的镜头将自动产生真实照片中所呈现的那种模糊,这通常称为散焦模糊或景深。
Motion blur occurs where moving objects are blurred in a photo, or where a moving camera blurs objects that are not moving at the same speed as the camera (e.g., think of a photo taken from a train where the train interior is sharp and the scenery is blurry). This effect comes automatically if we support two features:
为了解释运动模糊,即照片中的移动物体变得模糊,或者移动相机使移动速度与相机不同物体变得模糊(例如,想象从火车上拍摄的照片,火车内部清晰,而风景模糊)。如果我们支持以下两个功能,此效果就会自动出现:
Photons are emitted from the light at a random time within the interval during which the camera is recording (that is, while the camera shutter is open).
光子是在相机记录(或相机快门打开)的时间间隔内以随机时间从光中发射出来的。
The ray tracer has the concept of moving objects, so the ray intersection code takes time as an argument.
光线追踪器具有将物体移动到光线相交点的概念,代码以时间作为参数。
A simple example of a moving object would be a sphere whose center follows a line based on time:
运动物体的一个简单例子是球体,其中心随时间沿着一条线运动:
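The text only says that the center follows a line; one plausible form, assumed here, is center(t) = c0 + t·v for a start position c0 and a constant velocity v. The sketch below shows ray intersection code that takes time as an argument; all names are hypothetical.

```python
import math


def sphere_center(c0, velocity, time):
    """Center of the moving sphere at `time` (assumed linear motion)."""
    return tuple(c + time * v for c, v in zip(c0, velocity))


def hit_moving_sphere(origin, direction, time, c0, velocity, radius):
    """Smallest positive ray parameter where the ray hits the sphere, or None."""
    center = sphere_center(c0, velocity, time)
    oc = tuple(o - c for o, c in zip(origin, center))
    a = sum(d * d for d in direction)
    half_b = sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    discriminant = half_b * half_b - a * c
    if discriminant < 0.0:
        return None  # the ray misses the sphere at this time
    root = math.sqrt(discriminant)
    for t in ((-half_b - root) / a, (-half_b + root) / a):
        if t > 0.0:
            return t
    return None


# The same ray hits the sphere at time 0 but misses once it has moved away.
ray = ((0.0, 0.0, 0.0), (0.0, 0.0, -1.0))
print(hit_moving_sphere(*ray, 0.0, (0.0, 0.0, -5.0), (1.0, 0.0, 0.0), 1.0))
print(hit_moving_sphere(*ray, 2.0, (0.0, 0.0, -5.0), (1.0, 0.0, 0.0), 1.0))
```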
The photon tracer above will work and work well, but will be very slow because even after you insert a lens, most photons will never hit the lens (if you simulate the sun as the light source, you will be lucky to have any of the photons make it to the camera).
上述光子示踪器可以工作并且效果很好,但是速度会非常慢,因为即使在插入镜头后,大多数光子也不会撞击镜头(如果模拟太阳作为光源,那么任何光子能够到达相机就算幸运了)。
Something usually done in graphics is to reverse time: send rays from the camera and record them when they hit the light. This is remarkably easy in the photon tracer we have made. We send photons (technically adjoint photons) from each pixel and see where they hit light sources. The wavelength of these photons is determined by treating the colored filters as light emission curves, and when we hit a light, we record the photon (on the pixel sensor array) with a weight given by the emitted light.
图形学中通常会做一件事,即反转时间并从相机发出光线,并在光线击中光线时记录下来。在我们制作的光子追踪器中,这非常简单。相反,我们从每个像素发出光子(技术上称为伴随光子),并查看它们击中光源的位置。这些光子的波长是通过将彩色滤光片视为光发射曲线来确定的,当我们击中光线时,我们会用发射光的权重记录光子(在像素传感器阵列上)。
In practice, the brute force renderer of the last section is not feasible for applications. So rather than modeling microgeometry, we model bulk behavior. This is done using the tools developed for the practical problem of measuring light, usually called radiometry. The terms that arise in radiometry may at first seem strange, and the terminology and notation may be hard to keep straight. The most important quantity of this section is radiance, but most graphics people, if they do learn what radiance is, map it to the intuitive concepts of brightness/color/intensity, and in practice this works 99% of the time. But sometimes the actual definitions are needed, so we provide them here.
实际上,上一节的强力渲染器对于应用程序来说是不可行的。因此,我们不是建模微几何,而是建模体积行为。这是使用用于测量光的实际问题的工具完成的,通常称为辐射测量。辐射测量中出现的术语乍一看可能很奇怪,而且术语和符号可能很难理解。本节最重要的量是辐射度,但大多数图形人员如果了解了辐射度是什么,会将其映射到亮度/颜色/强度的直观概念上,在实践中,这 99% 的时间都是有效的。但有时,需要实际的定义,我们在这里提供它们。
Although we can define radiometric units in many systems, we use SI (International System of Units) units. Familiar SI units include the metric units of meter (m) and gram (g). Light is fundamentally a propagating form of energy, so it is useful to define the SI unit of energy, which is the joule (J).
虽然我们可以在许多系统中定义辐射单位,但我们使用的是SI (国际单位制)单位。常见的 SI 单位包括公制单位米( m ) 和克( g )。光从根本上来说是一种传播的能量形式,因此定义能量的 SI 单位(焦耳( J ))很有用。
If we have a large collection of photons, their total energy Q can be computed by summing the energy qi of each photon. A reasonable question to ask is “How is the energy distributed across wavelengths?” An easy way to answer this is to partition the photons into bins, essentially histogramming them. We then have an energy associated with an interval. For example, we can count all the energy between λ = 500 nm and λ = 600 nm and have it turn out to be 10.2 J, and this might be denoted q[500,600] = 10.2. If we divided the wavelength interval into two 50 nm intervals, we might find that q[500,550] = 5.2 and q[550,600] = 5.0. This tells us there was a little more energy in the short wavelength half of the interval [500,600]. If we divide into 25 nm bins, we might find q[500,525] = 2.5, and so on. The nice thing about the system is that it is straightforward. The bad thing about it is that the choice of the interval size determines the number.
如果我们有大量的光子,它们的总能量Q可以通过将每个光子的能量q相加来计算。一个合理的问题是“能量在各个波长上是如何分布的?”回答这个问题的一个简单方法是将光子分成不同的箱体,本质上是对它们进行直方图化。然后我们得到与间隔相关的能量。例如,我们可以计算 λ = 500 nm 和 λ = 600 nm 之间的所有能量,结果是 10.2 J,这可以表示为q [500,600] = 10.2。如果我们将波长间隔分成两个 50 nm 的间隔,我们可能会发现q [500,550] = 5.2 和q [550,600] = 5.0。这告诉我们在间隔 [500,600] 的短波长一半中能量稍多一些。如果我们将其分成 25 nm 个区间,我们可能会发现q [500,525] = 2.5,依此类推。该系统的优点在于它很简单。缺点在于区间大小的选择决定了数字。
A more commonly used system is to divide the energy by the size of the interval. So instead of q[500,600] = 10.2, we would have
更常用的系统是将能量除以间隔的大小。因此,我们不会得到q [500,600] = 10.2,而是
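Dividing the 10.2 J found in the interval by its 100 nm width, and writing Q_λ for the result as in the paragraphs that follow, gives:

```latex
Q_\lambda[500, 600] = \frac{10.2\ \mathrm{J}}{100\ \mathrm{nm}} = 0.102\ \mathrm{J\,(nm)^{-1}}
```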
This approach is nice, because the size of the interval has much less impact on the overall size of the numbers. An immediate idea would be to drive the interval size Δλ to zero. This could be awkward, because for a sufficiently small Δλ, Qλ will either be zero or huge depending on whether there is a single photon or no photon in the interval. There are two schools of thought to solve that dilemma. The first is to assume that Δλ is small, but not so small that the quantum nature of light comes into play. The second is to assume that the light is a continuum rather than individual photons, so a true derivative dQ∕dλ is appropriate. Both ways of thinking about it are appropriate and lead to the same computational machinery. In practice, it seems that most people who measure light prefer small, but finite, intervals, because that is what they can measure in the lab. Most people who do theory or computation prefer infinitesimal intervals, because that makes the machinery of calculus available.
这种方法很好,因为间隔的大小对数字的总体大小影响要小得多。一个直接的想法是将间隔大小 Δλ 设为零。这可能很尴尬,因为对于足够小的 Δλ,Q λ要么为零,要么为巨大,这取决于间隔中是否有单个光子或没有光子。有两种思想流派可以解决这个难题。第一种是假设 Δλ 很小,但不会小到光的量子性质发挥作用。第二种是假设光是连续体而不是单个光子,因此真正的导数 dQ∕dλ 是合适的。这两种思考方式都是合适的,并且会导致相同的计算机制。在实践中,似乎大多数测量光的人都喜欢小而有限的间隔,因为这是他们可以在实验室中测量的。大多数从事理论或计算的人更喜欢无穷小的间隔,因为这样就可以使用微积分机制。
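The binning-then-dividing idea can be sketched as follows, reusing the energies from the example in the text. Representing photons as (wavelength, energy) pairs and the function name are assumptions for illustration.

```python
def spectral_energy_histogram(photons, lo_nm, hi_nm, bin_width_nm):
    """Estimate spectral energy (J per nm) in equal-width wavelength bins.

    photons -- iterable of (wavelength_nm, energy_joules) pairs
    """
    n_bins = int(round((hi_nm - lo_nm) / bin_width_nm))
    totals = [0.0] * n_bins
    for wavelength, energy in photons:
        if lo_nm <= wavelength < hi_nm:
            index = min(int((wavelength - lo_nm) // bin_width_nm), n_bins - 1)
            totals[index] += energy
    # Dividing by the bin width makes the numbers nearly independent of bin size.
    return [total / bin_width_nm for total in totals]


# Two lumps of energy standing in for the 5.2 J and 5.0 J halves of [500, 600] nm.
photons = [(510.0, 5.2), (560.0, 5.0)]
print(spectral_energy_histogram(photons, 500.0, 600.0, 100.0))
print(spectral_energy_histogram(photons, 500.0, 600.0, 50.0))
```

Halving the bin width leaves the estimates near 0.1 J per nm, whereas the raw per-bin energies would have halved.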
The quantity Qλ is called spectral energy, and it is an intensive quantity as opposed to an extensive quantity such as energy, length, or mass. Intensive quantities can be thought of as density functions that tell the density of an extensive quantity at an infinitesimal point. For example, the energy Q at a specific wavelength is probably zero, but the spectral energy (energy density) Qλ is a meaningful quantity. A probably more familiar example is that the population of a country may be 25 million, but the population at a point in that country is meaningless. However, the population density measured in people per square meter is meaningful, provided it is measured over large enough areas. Much like with photons, population density works best if we pretend that we can view population as a continuum where population density never becomes granular even when the area is small.
这个量Q λ被称为谱能量,它是一个强度量,与能量、长度或质量等广延量相对。强度量可以看作是密度函数,它表示广延量在无穷小点的密度。例如,特定波长下的能量Q可能为零,但谱能量(能量密度) Q λ是一个有意义的量。一个可能更为熟悉的例子是,一个国家的人口可能是 2500 万,但该国某一点的人口毫无意义。然而,以每平方米人口数来衡量的人口密度是有意义的,只要它是在足够大的区域内测量的。与光子非常相似,如果我们假设可以将人口视为一个连续体,即使面积很小,人口密度也不会变得颗粒状,那么人口密度的效果最好。
We will follow the convention of graphics where spectral energy is almost always used, and energy is rarely used. This results in a proliferation of λ subscripts if “proper” notation is used. Instead, we will drop the subscript and use Q to denote spectral energy. This can result in some confusion when people outside of graphics read graphics papers, so be aware of this standards issue. Your intuition about spectral energy might be aided by imagining a measurement device with a sensor that measures light energy Δq. If you place a colored filter in front of the sensor that allows only light in the interval [λ - Δλ∕2,λ + Δλ∕2], then the spectral energy at λ is Q = Δq∕Δλ.
我们将遵循图形学的惯例,其中几乎总是使用光谱能量,而很少使用能量。如果使用“正确”的符号,这会导致 λ 下标的激增。相反,我们将删除下标并使用Q来表示光谱能量。这可能会使图形学以外的人阅读图形论文时产生一些混淆,因此请注意这个标准问题。想象一个带有传感器的测量设备可能会有助于您理解光谱能量。如果您在传感器前面放置一个彩色滤光片,只允许区间 [λ - Δλ∕ 2 ,λ + Δλ∕2] 内的光通过,则 λ 处的光谱能量为Q = Δ q ∕Δλ。
It is useful to estimate a rate of energy production for light sources. This rate is called power, and it is measured in Watts, W, which is another name for joules per second. This is easiest to understand in a steady state, but because power is an intensive quantity (a density over time), it is well defined even when energy production is varying over time. The units of power may be more familiar, e.g., a 100-watt light bulb. Such bulbs draw approximately 100 J of energy each second. The power of the light produced will actually be less than 100 W because of heat loss, etc., but we can still use this example to help understand more about photons. For example, we can get a feel for how many photons are produced in a second by a 100 W light. Suppose the average photon produced has the energy of a λ = 500 nm photon. The frequency of such a photon is
估算光源的能量产生率很有用。该速率称为功率,以瓦特( W )为单位,这是焦耳/秒的另一个名称。这在稳定状态下最容易理解,但由于功率是一个密集量(随时间变化的密度),因此即使能量产生随时间变化,功率也有明确的定义。功率的单位可能更为熟悉,例如 100 瓦的灯泡。这种灯泡每秒消耗约 100 焦耳的能量。由于热损失等原因,产生的光的功率实际上将小于 100 W,但我们仍然可以使用此示例来帮助更多地了解光子。例如,我们可以了解 100 W 灯在一秒钟内产生了多少个光子。假设产生的平均光子具有 λ = 500 nm 光子的能量。这种光子的频率为
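Taking the speed of light as roughly 3 × 10^8 m/s, the frequency works out as:

```latex
f = \frac{c}{\lambda}
  = \frac{3 \times 10^{8}\ \mathrm{m\,s^{-1}}}{500 \times 10^{-9}\ \mathrm{m}}
  = 6 \times 10^{14}\ \mathrm{s^{-1}}
```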
The energy of that photon is hf ≈ 4 × 10^-19 J. That means a staggering number of photons, on the order of 10^20, are produced each second, even if the bulb is not very efficient. This explains why simulating a camera with a fast shutter speed and directly simulated photons is an inefficient choice for producing images.
该光子的能量为 hf ≈ 4 × 10^-19 J。这意味着即使灯泡效率不高,每秒也会产生数量级约为 10^20 的惊人数量的光子。这解释了为什么模拟具有快速快门速度的相机并直接模拟光子对于生成图像而言是一种低效的选择。
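The photon count estimate is easy to check numerically. The sketch below uses h = 6.63 × 10^-34 J s from the text and a rounded speed of light of 3 × 10^8 m/s, and treats all 100 W as light at 500 nm, which overestimates a real bulb.

```python
PLANCK_H = 6.63e-34      # J s, as given in the text
SPEED_OF_LIGHT = 3.0e8   # m/s, rounded


def photon_energy(wavelength_m):
    """Energy q = h f = h c / wavelength of a single photon, in joules."""
    return PLANCK_H * SPEED_OF_LIGHT / wavelength_m


def photons_per_second(power_watts, wavelength_m):
    """Number of photons emitted each second if all power is at one wavelength."""
    return power_watts / photon_energy(wavelength_m)


print(photon_energy(500e-9))
print(photons_per_second(100.0, 500e-9))
```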
As with energy, we are really interested in spectral power, measured in W (nm)^-1. Again, although the formal standard symbol for spectral power is Φ_λ, we will use Φ with no subscript for convenience and consistency with most of the graphics literature. One thing to note is that the spectral power for a light source is usually a smaller number than the power. For example, if a light emits a power of 100 W evenly distributed over wavelengths 400–800 nm, then the spectral power will be 100 W/400 nm = 0.25 W (nm)^-1. This is something to keep in mind if you set the spectral power of light sources by hand for debugging purposes.
与能量一样,我们真正感兴趣的是光谱功率(以 W(nm) -1为单位)。同样,尽管光谱功率的正式标准符号是 Φ λ ,但为了方便起见并与大多数图形文献保持一致,我们将使用不带下标的 Φ。需要注意的一点是,光源的光谱功率通常小于功率。例如,如果光源发出的功率为 100 W,均匀分布在 400-800 nm 的波长上,则光谱功率为 100 W/400 nm = 0.25 W(nm) -1 。如果您出于调试目的手动设置光源的光谱功率,则需要记住这一点。
The measurement device for spectral energy in the last section could be modified by taking a reading with a shutter that is open for a time interval Δt centered at time t. The spectral power would then be Φ = Δq∕(ΔtΔλ).
上一节中光谱能量的测量装置可以进行修改,使用以时间 t 为中心的时间间隔 Δt 的快门进行读数。光谱功率将为 Φ = Δq∕(ΔtΔλ)。
The quantity irradiance arises naturally if you ask the question “How much light hits this point?” Of course, the answer is “none,” and again, we must use a density function. If the point is on a surface, it is natural to use area to define our density function. We modify the device from the last section to have a finite ΔA area sensor that is smaller than the light field being measured. The spectral irradiance H is just the power per unit area ΔΦ∕ΔA. Fully expanded this is
如果您问“有多少光照射到这个点?”,那么辐照度这个量自然就会出现。当然,答案是“没有”,同样,我们必须使用密度函数。如果该点位于表面上,那么使用面积来定义我们的密度函数是很自然的。我们修改了上一节中的设备,使其具有一个有限的 Δ A面积传感器,该传感器小于被测光场。光谱辐照度 H 就是单位面积的功率 ΔΦ∕Δ A 。完全展开后为
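Expanding the spectral power as Φ = Δq∕(Δt Δλ) from the previous section, the irradiance becomes:

```latex
H = \frac{\Delta \Phi}{\Delta A} = \frac{\Delta q}{\Delta A\,\Delta t\,\Delta\lambda}
```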
Thus, the full units of irradiance are J m^-2 s^-1 (nm)^-1. Note that the SI units for irradiance include inverse-meter-squared for area and inverse-nanometer for wavelength. This seeming inconsistency (using both nanometer and meter) arises because these are the natural units for area and for visible light wavelengths.
因此,辐照度的完整单位是 J m -2 s -1 (nm) -1 。请注意,辐照度的 SI 单位包括面积的平方米倒数和波长的纳米倒数。这种看似不一致的情况(同时使用纳米和米)是由于面积和可见光波长的自然单位而产生的。
When the light is leaving a surface, e.g., when it is reflected, the same quantity as irradiance is called radiant exitance, E. It is useful to have different words for incident and exitant light, because the same point has potentially different irradiance and radiant exitance.
当光离开表面时,例如当它被反射时,与辐照度相同的量称为辐射出射度,E。对入射光和出射光使用不同的词很有用,因为同一点可能具有不同的辐照度和辐射出射度。
Although irradiance tells us how much light is arriving at a point, it tells us little about the direction that light comes from. To measure something analogous to what we see with our eyes, we need to be able to associate “how much light” with a specific direction. We can imagine a simple device to measure such a quantity (Figure 14.6). We use a small irradiance meter and add a conical “baffler” which limits light hitting the counter to a range of angles with solid angle Δσ. The response of the detector is as follows:
虽然辐照度告诉我们有多少光到达某一点,但它几乎不能告诉我们光来自哪个方向。要测量类似于我们用眼睛看到的东西,我们需要能够将“光量”与特定方向联系起来。我们可以想象一个简单的设备来测量这样的量(图 14.6 )。我们使用一个小型辐照度计,并添加一个锥形“挡板”,将照射到计数器上的光限制在立体角为 Δσ 的角度范围内。探测器的响应如下:
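Dividing the irradiance reading by the solid angle of the cone, the detector response is:

```latex
\text{response} = \frac{\Delta H}{\Delta\sigma}
               = \frac{\Delta q}{\Delta A\,\Delta\sigma\,\Delta t\,\Delta\lambda}
```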
Figure 14.6. By adding a blinder that shows only a small solid angle Δσ to the irradiance detector, we measure radiance.
图 14.6.通过向辐照度探测器添加一个仅显示小立体角 Δσ 的遮光器,我们可以测量辐照度。
This is the spectral radiance of light traveling in space. Again, we will drop the “spectral” in our discussion and assume that it is implicit.
这是光在空间中传播的光谱辐射度。同样,我们将在讨论中放弃“光谱”一词,并假设它是隐含的。
Radiance is what we are usually computing in graphics programs. A wonderful property of radiance is that it does not vary along a line in space. To see why this is true, examine the two radiance detectors both looking at a surface as shown in Figure 14.7. Assume the lines the detectors are looking along are close enough together that the surface is emitting/reflecting light “the same” in both of the areas being measured. Because the area of the surface being sampled is proportional to squared distance, and because the light reaching the detector is inversely proportional to squared distance, the two detectors should have the same reading.
辐射度是我们通常在图形程序中计算的。辐射度的一个奇妙特性是它不会沿空间中的一条线变化。要了解为什么这是事实,请检查两个辐射度检测器,它们都观察一个表面,如图 14.7所示。假设检测器观察的线足够近,以至于表面在两个测量区域中发射/反射的光“相同”。因为被采样表面的面积与平方距离成正比,并且到达检测器的光与平方距离成反比,所以两个检测器应该具有相同的读数。
Figure 14.7. The signal a radiance detector receives does not depend on the distance to the surface being measured. This figure assumes the detectors are pointing at areas on the surface that are emitting light in the same way.
图 14.7。辐射探测器接收的信号与被测表面的距离无关。该图假设探测器指向以相同方式发光的表面区域。
It is useful to measure the radiance hitting a surface. We can think of placing the cone baffler from the radiance detector at a point on the surface and measuring the irradiance H on the surface originating from directions within the cone (Figure 14.8). Note that the surface “detector” is not aligned with the cone. For this reason, we need to add a cosine correction term to our definition of radiance:
测量照射到表面的辐射度很有用。我们可以将辐射度探测器的锥形挡板放置在表面上的某个点,然后测量来自锥体内方向的表面辐射度H (图 14.8 )。请注意,表面“探测器”未与锥体对齐。因此,我们需要在辐射度定义中添加余弦校正项:
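With θ the angle between the cone axis and the surface normal, as in Figure 14.8, the cosine-corrected response is:

```latex
\text{response} = \frac{\Delta H}{\Delta\sigma \cos\theta}
               = \frac{\Delta q}{\Delta A\,\cos\theta\,\Delta\sigma\,\Delta t\,\Delta\lambda}
```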
Figure 14.8. The irradiance at the surface as masked by the cone is smaller than that measured at the detector by a cosine factor.
图 14.8.被锥体遮挡的表面辐照度比探测器测量的辐照度小一个余弦因子。
As with irradiance and radiant exitance, it is useful to distinguish between radiance incident at a point on a surface and exitant from that point. Terms for these concepts sometimes used in the graphics literature are surface radiance L_s for the radiance of (leaving) a surface, and field radiance L_f for the radiance incident at a surface. Both require the cosine term, because they both correspond to the configuration in Figure 14.8:
与辐照度和辐射出射度一样,区分入射到表面某点的辐射度和从该点发出的辐射度很有用。图形文献中有时使用这些概念的术语,如表面辐射度L s表示(离开)表面的辐射度,场辐射度L f表示入射到表面的辐射度。两者都需要余弦项,因为它们都对应于图 14.8中的配置:
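Using the radiant exitance E for light leaving the surface and the irradiance H for light arriving at it, the two definitions are:

```latex
L_s = \frac{\Delta E}{\Delta\sigma \cos\theta},
\qquad
L_f = \frac{\Delta H}{\Delta\sigma \cos\theta}
```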
If we have a surface whose field radiance is L_f, then we can derive all of the other radiometric quantities from it. This is one reason radiance is considered the “fundamental” radiometric quantity. For example, the irradiance can be expressed as
如果我们有一个表面,其场辐射度为L f ,那么我们可以从中推导出所有其他辐射量。这就是辐射度被视为“基本”辐射量的原因之一。例如,辐照度可以表示为
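Rearranging the definition of field radiance and integrating over all incident directions k, where θ is the angle between k and the surface normal, gives:

```latex
H = \int_{\text{all } \mathbf{k}} L_f(\mathbf{k}) \cos\theta \, d\sigma
```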
This formula has several notational conventions that are common in graphics that make such formulae opaque to readers not familiar with them (Figure 14.9). First, k is an incident direction and can be thought of as a unit vector, a direction, or a (θ,ϕ) pair in spherical coordinates with respect to the surface normal. The direction has a differential solid angle dσ associated with it. The field radiance is potentially different for every direction, so we write it as a function L(k).
此公式有几个图形学中常见的符号约定,使得不熟悉这些公式的读者难以理解(图 14.9 )。首先, k是入射方向,可以被认为是相对于表面法线的球坐标中的单位向量、方向或 ( θ , ϕ ) 对。该方向具有与之相关的微分立体角 dσ。场辐射率对于每个方向都可能不同,因此我们将其写为函数 L( k )。
Figure 14.9. The direction k has a differential solid angle dσ associated with it.
As an example, we can compute the irradiance H at a surface that has constant field radiance Lf in all directions. To integrate, we use a classic spherical coordinate system and recall that the differential solid angle is
$$d\sigma \equiv \sin\theta \, d\theta \, d\phi,$$
so the irradiance is
$$H = \int_{\phi=0}^{2\pi} \int_{\theta=0}^{\pi/2} L_f \cos\theta \sin\theta \, d\theta \, d\phi = \pi L_f.$$
This relation shows us our first occurrence of a potentially surprising constant π. These factors of π occur frequently in radiometry and are an artifact of how we chose to measure solid angles; i.e., the area of a unit sphere is a multiple of π rather than a multiple of one.
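As a quick sanity check on the factor of π, we can evaluate the hemisphere integral numerically. The sketch below is only an illustration: the function name and the midpoint-rule resolution are our own choices, and it assumes the constant field radiance case discussed above.

```python
import math

def irradiance_constant_radiance(L_f, n_theta=2000):
    """Midpoint-rule estimate of H = ∫∫ L_f cos(θ) sin(θ) dθ dφ over the hemisphere."""
    d_theta = (math.pi / 2) / n_theta
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        total += L_f * math.cos(theta) * math.sin(theta) * d_theta
    # The integrand does not depend on φ, so the φ integral contributes 2π.
    return 2.0 * math.pi * total

H = irradiance_constant_radiance(L_f=1.0)
print(H, math.pi)  # the two values should agree closely
```

The θ integral alone evaluates to 1/2, so the full result is 2π · 1/2 = π times the radiance.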
Similarly, we can find the power hitting a surface by integrating the irradiance across the surface area:
$$\Phi = \int_{\text{all } \mathbf{x}} H(\mathbf{x}) \, dA,$$
where x is a point on the surface, and dA is the differential area associated with that point. Note that we don’t have special terms or symbols for incoming versus outgoing power. The distinction does not seem to come up often enough to have motivated separate terms.
The photon tracing of Section 14.5 assumed that all surfaces are smooth at the level of ray interactions and allowed for potentially complex geometry with very fine geometric details. In practice, the bulk properties of a region are averaged so that an area behaves like the fine geometry without our having to store it. The most important concept of this section is that for a rough surface (e.g., brushed steel), rather than representing all the tiny scratches with actual geometry, we statistically characterize the scratches and make the smooth surface randomly reflect light in multiple directions as if there were invisibly small details in the surface. The function that describes this statistical reflection is called the bidirectional reflectance distribution function (BRDF). The second important concept is a very similar idea for how light is scattered in a volume (e.g., by the bubbles in an ice cube or the water droplets in a cloud). Rather than representing all the little particles/bubbles/droplets in the volume, we just make a statistical model of how likely light is to scatter at any given 3D location, how likely it is to be absorbed, and what the directional distribution of the scattered light is. That directional distribution is just a PDF and is called the phase function of the volume. We don’t discuss phase functions or volume transport further; see the chapter notes for more information on those subjects.
Because we are interested in surface appearance, we would like to characterize how a surface reflects light. At an intuitive level, for any incident light coming from direction ki, there is some fraction scattered in a small solid angle near the outgoing direction ko. There are many ways we could formalize such a concept, and not surprisingly, the standard way to do so is inspired by building a simple measurement device. Such a device is shown in Figure 14.10, where a small light source is positioned in direction ki as seen from a point on a surface, and a detector is placed in direction ko. For every directional pair (ki,ko), we take a reading with the detector.
Figure 14.10. A simple measurement device for directional reflectance. The positions of light and detector are moved to each possible pair of directions. Note that both ki and ko point away from the surface to allow reciprocity.
Now we just have to decide how to measure the strength of the light source and make our reflection function independent of this strength. For example, if we replaced the light with a brighter light, we would not want to think of the surface as reflecting light differently. We could place a radiance meter at the point being illuminated to measure the light. However, for this to get an accurate reading that would not depend on the Δσ of the detector, we would need the light to subtend a solid angle bigger than Δσ. Unfortunately, the measurement taken by our roving radiance detector in direction ko will also count light that comes from points outside the new detector’s cone. So this does not seem like a practical solution.
Alternatively, we can place an irradiance meter at the point on the surface being measured. This will take a reading that does not depend strongly on subtleties of the light source geometry. This suggests characterizing reflectance as a ratio:
$$\rho = \frac{L_s}{H},$$
where this fraction ρ will vary with incident and exitant directions ki and ko, H is the irradiance for light position ki, and Ls is the surface radiance measured in direction ko. If we take such a measurement for all direction pairs, we end up with a 4D function ρ(ki,ko). This function is called the bidirectional reflectance distribution function (BRDF). The BRDF is all we need to know to characterize the directional properties of how a surface reflects light.
Given a BRDF, it is straightforward to ask, “What fraction of incident light is reflected?” However, the answer is not so easy; the fraction reflected depends on the directional distribution of incoming light. For this reason, we typically only set a fraction reflected for a fixed incident direction ki. This fraction is called the directional hemispherical reflectance. This fraction, R(ki), is defined by
$$R(\mathbf{k}_i) = \frac{\text{power scattered to all directions}}{\text{power in a beam from direction } \mathbf{k}_i}.$$
Note that this quantity is between zero and one for reasons of energy conservation. If we allow the incident power Φi to hit a small area ΔA, then the irradiance is Φi∕ΔA. Also, the fraction of the incoming power that is reflected is just the ratio of the radiant exitance to the irradiance:
$$R(\mathbf{k}_i) = \frac{E}{H}.$$
The radiance in a particular direction resulting from this power is by the definition of BRDF:
$$L(\mathbf{k}_o) = H \rho(\mathbf{k}_i, \mathbf{k}_o) = \frac{\Phi_i}{\Delta A}\, \rho(\mathbf{k}_i, \mathbf{k}_o).$$
And from the definition of radiance, we also have
$$L(\mathbf{k}_o) = \frac{\Delta E}{\Delta\sigma_o \cos\theta_o},$$
where E is the radiant exitance of the small patch in direction ko. Using these two definitions for radiance, we get
$$H \rho(\mathbf{k}_i, \mathbf{k}_o) = \frac{\Delta E}{\Delta\sigma_o \cos\theta_o}.$$
Rearranging terms, we get
$$\frac{\Delta E}{H} = \rho(\mathbf{k}_i, \mathbf{k}_o)\, \Delta\sigma_o \cos\theta_o.$$
This is just the small contribution to E∕H that is reflected near the particular ko. To find the total R(ki), we sum over all outgoing ko. In integral form, this is
$$R(\mathbf{k}_i) = \int_{\text{all } \mathbf{k}_o} \rho(\mathbf{k}_i, \mathbf{k}_o) \cos\theta_o \, d\sigma_o.$$
An idealized diffuse surface is called Lambertian. Such surfaces are impossible in nature for thermodynamic reasons, but mathematically, they do conserve energy. The Lambertian BRDF has ρ equal to a constant for all angles. This means the surface will have the same radiance for all viewing angles, and this radiance will be proportional to the irradiance.
If we compute R(ki) for a Lambertian surface with ρ = C, we get
$$R(\mathbf{k}_i) = \int_{\text{all } \mathbf{k}_o} C \cos\theta_o \, d\sigma_o = \int_{\phi_o=0}^{2\pi} \int_{\theta_o=0}^{\pi/2} C \cos\theta_o \sin\theta_o \, d\theta_o \, d\phi_o = \pi C.$$
Thus, for a perfectly reflecting Lambertian surface (R = 1), we have ρ = 1∕π, and for a Lambertian surface where R(ki) = r, we have
$$\rho(\mathbf{k}_i, \mathbf{k}_o) = \frac{r}{\pi}.$$
This is another example where the use of a steradian for the solid angle determines the normalizing constant and thus introduces factors of π.
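The directional hemispherical reflectance is easy to check numerically for any BRDF we can evaluate. The following sketch is one possible way to do so, not a prescribed method: the `brdf(theta_i, theta_o, phi_o)` calling convention, the function names, and the grid resolution are all assumptions made for illustration. It integrates ρ cosθo over the outgoing hemisphere with a midpoint rule and confirms that a Lambertian BRDF of r∕π reflects the fraction r.

```python
import math

def directional_hemispherical_reflectance(brdf, theta_i, n_theta=200, n_phi=400):
    """Midpoint-rule estimate of R(k_i) = ∫ ρ(k_i, k_o) cos(θ_o) dσ_o.

    `brdf(theta_i, theta_o, phi_o)` is a hypothetical callable returning ρ;
    all angles are in radians and measured from the surface normal.
    """
    d_theta = (math.pi / 2) / n_theta
    d_phi = (2 * math.pi) / n_phi
    total = 0.0
    for i in range(n_theta):
        theta_o = (i + 0.5) * d_theta
        # cos(θ_o) times the differential solid angle sin(θ_o) dθ_o dφ_o
        weight = math.cos(theta_o) * math.sin(theta_o) * d_theta * d_phi
        for j in range(n_phi):
            phi_o = (j + 0.5) * d_phi
            total += brdf(theta_i, theta_o, phi_o) * weight
    return total

def lambertian(r):
    """Lambertian BRDF with directional hemispherical reflectance r."""
    return lambda theta_i, theta_o, phi_o: r / math.pi

R = directional_hemispherical_reflectance(lambertian(0.5), theta_i=0.3)
print(R)  # should be very close to 0.5
```

The same routine can be pointed at any candidate BRDF to see whether R stays at or below one for every incident angle.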
With the definition of BRDF, we can describe the radiance of a surface in terms of the incoming radiance from all different directions. Because in computer graphics, we can use idealized mathematics that might be impractical to instantiate in the lab, we can also write the BRDF in terms of radiance only. If we take a small part of the light with solid angle Δσi with radiance Li and “measure” the reflected radiance in direction ko due to this small piece of the light, we can compute a BRDF (Figure 14.11). The irradiance due to the small piece of light is H = Li cosθiΔσi. Thus, the BRDF is
$$\rho = \frac{L_o}{L_i \cos\theta_i \, \Delta\sigma_i}.$$
Figure 14.11. The geometry for the transport equation in its directional form.
This form can be useful in some situations. Rearranging terms, we can write down the part of the radiance that is due to light coming from direction ki:
$$\Delta L_o = \rho(\mathbf{k}_i, \mathbf{k}_o) L_i \cos\theta_i \, \Delta\sigma_i.$$
If there is light coming from many directions Li(ki), we can sum all of them. In integral form, with notation for surface and field radiance, this is
$$L_s(\mathbf{k}_o) = \int_{\text{all } \mathbf{k}_i} \rho(\mathbf{k}_i, \mathbf{k}_o) L_f(\mathbf{k}_i) \cos\theta_i \, d\sigma_i. \tag{14.4}$$
This is often called the rendering equation (Kajiya, 1986) in computer graphics and was first written in this form by Immel, Cohen, and Greenberg (1986).
Sometimes, it is useful to write the transport equation in terms of surface radiances only. Note that in a closed environment, the field radiance Lf(ki) comes from some surface with surface radiance Ls(-ki) = Lf(ki) (Figure 14.12). The solid angle subtended by the point x′ in the figure is given by
$$\Delta\sigma_i = \frac{\Delta A' \cos\theta'}{\|\mathbf{x} - \mathbf{x}'\|^2},$$
Figure 14.12. The light coming into one point comes from another point.
where ΔA′ is the area we associate with x′. Substituting for Δσi in terms of ΔA′ suggests the following transport equation:
$$L_s(\mathbf{x}, \mathbf{k}_o) = \int_{\text{all } \mathbf{x}' \text{ visible to } \mathbf{x}} \frac{\rho(\mathbf{k}_i, \mathbf{k}_o)\, L_s(\mathbf{x}', \mathbf{x} - \mathbf{x}') \cos\theta_i \cos\theta'}{\|\mathbf{x} - \mathbf{x}'\|^2} \, dA'.$$
Note that we are using a non-normalized vector x − x′ to indicate the direction from x′ to x. Also note that we are writing Ls as a function of position and direction.
The only problem with this new transport equation is that the domain of integration is awkward. If we introduce a visibility function, we can trade off complexity in the domain with complexity in the integrand:
$$L_s(\mathbf{x}, \mathbf{k}_o) = \int_{\text{all } \mathbf{x}'} \frac{\rho(\mathbf{k}_i, \mathbf{k}_o)\, L_s(\mathbf{x}', \mathbf{x} - \mathbf{x}')\, v(\mathbf{x}, \mathbf{x}') \cos\theta_i \cos\theta'}{\|\mathbf{x} - \mathbf{x}'\|^2} \, dA',$$
where
$$v(\mathbf{x}, \mathbf{x}') = \begin{cases} 1 & \text{if } \mathbf{x} \text{ and } \mathbf{x}' \text{ are mutually visible}, \\ 0 & \text{otherwise}. \end{cases}$$
Many real materials have a visible structure at normal viewing distances. For example, most carpets have easily visible pile that contributes to appearance. For our purposes, such structure is not part of the material property but is, instead, part of the geometric model. Structure whose details are invisible at normal viewing distances, but which does determine macroscopic material appearance, is part of the material property. For example, the fibers in paper have a complex appearance under magnification, but they are blurred together into a homogeneous appearance when viewed at arm’s length. This distinction between structure that is modeled as geometry and microstructure that is folded into the BRDF is somewhat arbitrary and depends on what one defines as “normal” viewing distance and visual acuity, but the distinction has proven quite useful in practice.
Many BRDF models appear in the literature and are used in industry. Many fields use BRDF models, including remote sensing, heat transfer, materials science, and, of course, computer graphics. Unfortunately, there is no standard set of terms across these fields, or even within graphics. However, there are some terms that most people agree on, and those are shown in Figure 14.13.
Figure 14.13. A taxonomy of material terms advocated by McGuire et al. (2020)
Practice in the industry has been to classify materials into these categories and to have a different BRDF model for each term. These model terms are then combined with either constants or simple weights determined by Fresnel (Schlick) variation. A key question is then what terms to use. Increasingly, practice has settled on variations of the Burley/Disney model, which was developed for computer-generated animation (Burley, 2012) but is now widely adopted in games and product design as well. It includes terms for each of the major categories in Figure 14.13, as well as an additional lobe: sheen. Sheen was not included in the taxonomy of Figure 14.13 because it is not yet a widely used term inside graphics and is not used as a technical term outside of graphics. The sheen term is used to account for grazing angle effects such as rim lighting that can be seen especially on skin and fabrics, and more information can be found in Estevez, Imageworks, and Kulla (n.d.). The retro-reflective term is very important in some circumstances but is not very commonly used in graphics.
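To make the idea of a Fresnel (Schlick) weight concrete, here is a minimal sketch of Schlick's approximation used to blend a diffuse and a glossy term. The blending rule, the function names, and the default normal-incidence reflectance `f0 = 0.04` (a value often quoted for common dielectrics) are our assumptions for illustration; this is not the formulation used by the Burley/Disney model.

```python
def schlick_fresnel(cos_theta, f0):
    """Schlick's approximation: F = f0 + (1 - f0) * (1 - cos(theta))^5."""
    m = min(max(1.0 - cos_theta, 0.0), 1.0)  # clamp against rounding error
    return f0 + (1.0 - f0) * m ** 5

def blend_lobes(diffuse, glossy, cos_theta, f0=0.04):
    """Hypothetical combination rule: weight the glossy term by F, the diffuse term by 1 - F."""
    f = schlick_fresnel(cos_theta, f0)
    return (1.0 - f) * diffuse + f * glossy

# At normal incidence the glossy weight is f0; at grazing angles it approaches 1.
print(schlick_fresnel(1.0, 0.04), schlick_fresnel(0.0, 0.04))
```

With this weighting the glossy lobe takes over toward grazing angles, which matches the qualitative behavior described in the text.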
The two most universal terms used in BRDF models in practice are the diffuse and glossy lobes. A constant is often used for diffuse, but there are other variations that get darker at grazing angles where the glossy lobe takes over. The most dominant form of glossy lobe at present is the GGX microfacet lobe (Walter et al., 2007). The microfacet lobes are very closely related to our brute force representation of a rough surface, but they use only the statistical distribution of the surface normals of the microfacets. Note that these microfacet methods are an approximation to an actual surface with microfacets. This subject is discussed in depth, with an eye toward implementation, by Heitz (2014).
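The "statistical distribution of surface normals" can be made concrete with the GGX normal distribution function. The sketch below shows only that one ingredient, in its commonly published isotropic form with a roughness parameter `alpha` (the parameter name is our choice); a complete microfacet BRDF also needs Fresnel and shadowing-masking terms, which are omitted here. A useful property to verify is that the microfacets' projected area integrates to one.

```python
import math

def ggx_ndf(cos_theta_h, alpha):
    """Isotropic GGX distribution D(h), where cos_theta_h = n · h."""
    c2 = cos_theta_h * cos_theta_h
    denom = c2 * (alpha * alpha - 1.0) + 1.0
    return (alpha * alpha) / (math.pi * denom * denom)

def projected_area_integral(alpha, n_theta=4000):
    """Midpoint-rule estimate of ∫ D(h) cos(θ_h) dσ over the hemisphere."""
    d_theta = (math.pi / 2) / n_theta
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        total += ggx_ndf(math.cos(theta), alpha) * math.cos(theta) * math.sin(theta) * d_theta
    return 2.0 * math.pi * total  # D does not depend on φ

print(projected_area_integral(0.3))  # should be very close to 1
```

Small values of `alpha` concentrate the normals around the macroscopic normal (a shinier surface), while larger values spread them out.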
Once we have expressed lighting as an integral, we can solve it using Monte Carlo Integration (see Section 2.12). Recall that for an integral of f(x), the Monte Carlo integral is just the average from many random samples:
$$\int_{x \in S} f(x) \, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)},$$
where p is a probability density function over the domain S. If we apply that to Equation 14.4, then for one random sample direction q, we get
$$L_s(\mathbf{k}_o) \approx \frac{\rho(\mathbf{q}, \mathbf{k}_o)\, L_f(\mathbf{q}) \cos\theta_i}{p(\mathbf{q})}.$$
Note that for a less noisy image, we would average over many noisy samples. So how do we do that? First, we need a way to generate random directions for some PDF p. Remember p can be any valid PDF, so let’s do uniform: p = 1∕(2π). That is the value because the integral is over “the solid angle of all incoming directions above the surface” and the solid angle is the area of the projection onto the unit sphere, and the directions “above” the surface are half the sphere, which has an area of 2π.
In code, this would look something like this for an incident ray direction a:
    pick random direction q
    color = ρ(q, a) Lf(q) cos θi ∕ p(q)
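The two lines of pseudocode above can be fleshed out as follows. This is a sketch under stated assumptions rather than a reference implementation: the surface normal is taken to be +z, directions are plain 3-tuples, and `brdf` and `field_radiance` are hypothetical callables standing in for ρ and Lf. Uniform hemisphere directions are generated by choosing cosθ uniformly in [0, 1), which gives equal probability to equal areas on the unit sphere.

```python
import math
import random

def sample_uniform_hemisphere(rng):
    """Return a unit vector uniformly distributed over the hemisphere z >= 0."""
    z = rng.random()                      # cos(theta) is uniform on [0, 1)
    phi = 2.0 * math.pi * rng.random()
    s = math.sqrt(max(0.0, 1.0 - z * z))  # sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), z)

def estimate_surface_radiance(brdf, field_radiance, a, n_samples, rng):
    """Average n_samples one-sample estimates ρ(q, a) L_f(q) cos(θ_i) / p(q)."""
    pdf = 1.0 / (2.0 * math.pi)           # uniform density over the hemisphere
    total = 0.0
    for _ in range(n_samples):
        q = sample_uniform_hemisphere(rng)
        cos_theta_i = q[2]                # the surface normal is assumed to be +z
        total += brdf(q, a) * field_radiance(q) * cos_theta_i / pdf
    return total / n_samples

# Hypothetical test case: Lambertian surface (R = 0.8) under constant field radiance 1.
rng = random.Random(1)
estimate = estimate_surface_radiance(
    brdf=lambda q, a: 0.8 / math.pi,
    field_radiance=lambda q: 1.0,
    a=(0.0, 0.0, 1.0),
    n_samples=100_000,
    rng=rng,
)
print(estimate)  # should be close to 0.8, up to Monte Carlo noise
```

For this test case the exact answer is (0.8∕π) · π = 0.8, so the estimate gives a direct feel for how the noise shrinks as the number of samples grows.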
But what is Lf, the color coming from direction q? We can in fact apply Monte Carlo integration recursively (this is not obvious, but can be shown using the property of random variables that expected values sum even when the terms of the sum are not independent). If we write a function radiance(o, d) that returns the color arriving at a point o from the direction d, and add emitted light so there is something to see, then we can write a recursive function:
function rgb radiance(o, d)
    if ray o + td hits something then
        p = hit point
        q = random direction
        return emitted(p) + (ρ(q, -d) cos θi ∕ p(q)) * radiance(p, q)
    else
        return background(o, d)
Note that if the environment is closed that function never terminates, so some termination bailout should be added for closed environments. The background function could either return a constant or look up into an environment map or other function that varies with direction. Finally, there don’t need to be lights other than the background to get good pictures.
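The need for a termination bailout can be seen in the simplest possible closed environment, a "furnace" in which every ray hits a Lambertian surface with the same reflectance and the same emitted radiance. The sketch below is our own toy setup, not a renderer: there is no geometry, the function signature is hypothetical, and the bailout is a plain depth limit. For this scene the exact radiance is emitted∕(1 − albedo), which gives us something to compare against.

```python
import math
import random

def radiance(depth, max_depth, albedo, emitted, rng):
    """Recursive estimate in a closed scene where every ray hits the same
    emitting Lambertian material."""
    if depth > max_depth:
        return 0.0                      # bailout: truncate the recursion
    cos_theta = rng.random()            # uniform hemisphere sample: cos(theta) ~ U[0, 1)
    pdf = 1.0 / (2.0 * math.pi)
    rho = albedo / math.pi              # Lambertian BRDF
    weight = rho * cos_theta / pdf
    return emitted + weight * radiance(depth + 1, max_depth, albedo, emitted, rng)

rng = random.Random(7)
n = 20_000
mean = sum(radiance(0, 20, 0.5, 1.0, rng) for _ in range(n)) / n
print(mean)  # close to emitted / (1 - albedo) = 2, up to noise and truncation
```

A hard depth limit like this one discards the energy carried by longer paths, so it slightly underestimates the result; with an albedo of 0.5 and 20 bounces the missing energy is negligible, but for highly reflective scenes the limit must be chosen with care.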
You may note this is very similar to the adjoint photon tracer developed earlier in this chapter. The method of derivation is different, taking an explicit radiometric integration approach, but it does reach a similar conclusion.
In practice, that algorithm will be very noisy when the lights are small because the values of emitted() can be large and, for small lights, are rarely encountered. Instead, people often break out direct lighting by computing the contribution of light-emitting objects separately. For example,
function rgb radiance(o, d)
    if ray o + td hits something then
        p = hit point
        q = random direction
        return directLightAt(p) + (ρ(q, -d) cos θi ∕ p(q)) * radiance(p, q)
    else
        return background(o, d)
Here, directLightAt(p) computes the direct lighting, i.e., the color due to photons that leave the lights and get to p without any intervening surfaces. This code has the idiosyncrasy that lights seen directly will look black (there is no explicit emitted term in the code), so in practice real code needs to deal with this somehow.
Direct lighting is often also computed using Monte Carlo, but usually using an area measure to pick samples only on light sources and thus evaluate Equation 14.5.
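As an illustration of sampling with an area measure, the sketch below estimates the direct lighting on a Lambertian point from a disk-shaped light. The configuration is a deliberately simple assumption of ours: the shaded point is at the origin with normal +z, the disk is parallel to the surface and centered on the z-axis, nothing occludes it (so the visibility term is 1), and the function name is hypothetical. For this setup the exact reflected radiance is albedo · L · a²∕(h² + a²) for a disk of radius a at height h.

```python
import math
import random

def direct_light_disk(albedo, light_radiance, radius, height, n_samples, rng):
    """Area-sampled estimate of reflected radiance due to an unoccluded disk light."""
    rho = albedo / math.pi
    area = math.pi * radius * radius
    pdf = 1.0 / area                               # uniform density over the light's area
    total = 0.0
    for _ in range(n_samples):
        # Distance of a uniform point on the disk from its center; the azimuth
        # is not needed because this configuration is rotationally symmetric.
        s = radius * math.sqrt(rng.random())
        dist2 = s * s + height * height
        cos_theta_i = height / math.sqrt(dist2)    # cosine at the shaded point
        cos_theta_l = cos_theta_i                  # cosine at the light (parallel planes)
        visibility = 1.0                           # assumption: nothing blocks the light
        integrand = rho * light_radiance * visibility * cos_theta_i * cos_theta_l / dist2
        total += integrand / pdf
    return total / n_samples

estimate = direct_light_disk(0.5, 10.0, 1.0, 2.0, 20_000, random.Random(3))
exact = 0.5 * 10.0 * 1.0 ** 2 / (2.0 ** 2 + 1.0 ** 2)
print(estimate, exact)  # the estimate should be close to the exact value
```

Because every sample lands on the light, none of them is wasted, which is why this strategy is so much less noisy than waiting for random directions to find a small emitter.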
Note that the above method works well as long as the materials are not perfectly smooth (the impulses in Figure 14.13). Technically, for those materials the BRDF is a delta function with infinite value at exactly one direction, so it works formally but not with computer floating point arithmetic. We therefore need an “if” for such materials, which we handle just as we did for rays in Chapter 4.
What is “intensity”?
The term intensity is used in a variety of contexts and its use varies with both era and discipline. In practice, it is no longer meaningful as a specific radiometric quantity, but it is useful for intuitive discussion. Most papers that use it do so in place of radiance.
What is “radiosity”?
The term radiosity is used in place of radiant exitance in some fields. It is also sometimes used to describe world-space light transport algorithms.
My images look too smooth, even with a complex BRDF. What am I doing wrong?
BRDFs only capture subpixel detail that is too small to be resolved by the eye. Most real surfaces also have some small variations, such as the wrinkles in skin, that can be seen. If you want true realism, some sort of texture or displacement map is needed.
How do I integrate the BRDF with texture mapping?
Texture mapping can be used to control any parameter on a surface. So any kind of color or control parameter used by a BRDF should be programmable.
I have very pretty code except for my material class. What am I doing wrong?
You are probably doing nothing wrong. Material classes tend to be the ugly thing in everybody’s programs. If you find a nice way to deal with it, please let us know! Our own code uses a shader architecture (Hanrahan & Lawson, 1990), which makes the material include much of the rendering algorithm.
What models do people use for phase functions?
Almost the only model used is the Henyey–Greenstein function which has a single parameter that controls how “stretched” it is.
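For readers who want to experiment, the Henyey–Greenstein function is short enough to write down directly. The sketch below uses its standard form with the single parameter g in (−1, 1); the function names and the numerical check are our own additions. The check confirms two known properties: the function integrates to one over the sphere of directions, and its mean cosine equals g, which is why g controls how "stretched" the scattering is toward the forward direction.

```python
import math

def henyey_greenstein(cos_theta, g):
    """Henyey-Greenstein phase function, normalized over the full sphere."""
    denom = 1.0 + g * g - 2.0 * g * cos_theta
    return (1.0 - g * g) / (4.0 * math.pi * denom ** 1.5)

def sphere_moments(g, n=20000):
    """Midpoint-rule estimates of the integral and the mean cosine over the sphere."""
    d_mu = 2.0 / n
    norm = 0.0
    mean_cos = 0.0
    for i in range(n):
        mu = -1.0 + (i + 0.5) * d_mu     # mu = cos(theta)
        w = 2.0 * math.pi * henyey_greenstein(mu, g) * d_mu
        norm += w
        mean_cos += mu * w
    return norm, mean_cos

print(sphere_moments(0.7))  # approximately (1.0, 0.7)
```

Setting g = 0 gives isotropic scattering with the constant value 1∕(4π), positive g favors forward scattering, and negative g favors backward scattering.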
The BRDF and the phase function seem pretty similar so why are they treated so differently?
The BRDF actually can be treated as a phase function and scattering albedo, but for historical and practical measurement reasons, they are usually treated differently. Giving them a unified representation in code works fine, but will need some explanation to people used to BRDFs for surfaces.
There are many, many other advanced methods that can be implemented in the ray-tracing framework. Some resources for further information are Glassner’s An Introduction to Ray Tracing and Principles of Digital Image Synthesis, Shirley’s Ray Tracing in One Weekend series, Pharr et al.’s Physically Based Rendering: From Theory to Implementation, Akenine-Möller et al.’s Real-Time Rendering, the two Ray Tracing Gems collections, and McGuire’s Graphics Codex.
A common radiometric quantity not described in this chapter is radiant intensity (I), which is the spectral power per steradian emitted from an infinitesimal point source. It should usually be avoided in graphics programs because point sources cause implementation problems. A more rigorous treatment of radiometry can be found in Analytic Methods for Simulated Light Transport (Arvo, 1995a). The radiometric and photometric terms in this chapter are from the Illuminating Engineering Society’s standard that is increasingly used by all fields of science and engineering (American National Standard Institute, 1986). A broader discussion of radiometric and appearance standards can be found in Principles of Digital Image Synthesis (Glassner, 1995).
There are many BRDF models described in the literature, and only a few of them have been described here. Others include (Cook & Torrance, 1982; He et al., 1992; Oren & Nayar, 1994; Schlick, 1994; Lafortune, Foo, Torrance, & Greenberg, 1997; Stam, 1999; Ashikhmin, Premože, & Shirley, 2000; Ershov, Kolchin, & Myszkowski, 2001; Matusik, Pfister, Brand, & McMillan, 2003; Lawrence, Rusinkiewicz, & Ramamoorthi, 2004; Stark, Arvo, & Smits, 2005). The desired characteristics of BRDF models are discussed in Making Shaders More Physically Plausible (Lewis, 1994). The activity at modern film and games studios is very much worth paying attention to, as material models are still advancing, with Unity, Solid Angle, Disney, and Sony being interesting examples. For the glossy term, the dominant model, by far, is the GGX model, which is covered extensively in the PBRT book and in many papers co-authored by Eric Heitz, notably (Heitz & d’Eon, 2014).
1. Suppose that instead of the Lambertian BRDF, we used a BRDF of the form C cos^a θi. What must C be to conserve energy?
2. The BRDF in Exercise 1 is not reciprocal. Can you modify it to be reciprocal?
3. Something like a highway sign is a retroreflector. This means that the BRDF is large when ki and ko are near each other. Make a model inspired by the Phong model that captures retroreflection behavior while being reciprocal and conserving energy.
4. For a diffuse surface with outgoing radiance L, what is the radiant exitance?
5. What is the total power exiting a diffuse surface with an area of 4 m² and a radiance of L?
6. If a fluorescent light and an incandescent light both consume 20 Watts of power, why is the fluorescent light usually preferred?
Michael Gleicher
Intuitively, think of a curve as something you can draw with a pen. The curve is the set of points that the pen traces over an interval of time. While we usually think of a pen writing on paper (e.g., a curve that is in a 2D space), the pen could move in 3D to generate a space curve, or you could imagine the pen moving in some other kind of space.
Mathematically, definitions of curve can be seen in at least two ways:
the continuous image of some interval in an n-dimensional space;
a continuous map from a one-dimensional space to an n-dimensional space.
Both of these definitions start with the idea of an interval range (the time over which the pen traces the curve). However, there is a significant difference: in the first definition, the curve is the set of points the pen traces (the image), while in the second definition, the curve is the mapping between time and that set of points. For this chapter, we use the first definition.
A curve is an infinitely large set of points. The points in a curve have the property that any point has two neighbors, except for a small number of points that have one neighbor (these are the endpoints). Some curves have no endpoints, either because they are infinite (like a line) or they are closed (loop around and connect to themselves).
Because the “pen” of the curve is thin (infinitesimally), it is difficult to create filled regions. While space-filling curves are possible (by having them fold over themselves infinitely many times), we do not consider such mathematical oddities here. Generally, we think of curves as the outlines of things, not the “insides.”
The problem that we need to address is how to specify a curve: how to give a name or representation to a curve so that we can represent it on a computer. For some curves, the problem of naming them is easy since they have known shapes: line segments, circles, elliptical arcs, etc. A general curve that does not have a “named” shape is sometimes called a free-form curve. Because free-form curves can take on just about any shape, they are much harder to specify.
There are three main ways to specify curves mathematically:
Implicit curve representations define the set of points on a curve by giving a procedure that can test whether a point is on the curve. Usually, an implicit curve representation is defined by an implicit function of the form
$$f(x, y) = 0,$$
so that the curve is the set of points for which this equation is true. Note that the implicit function f is a scalar function (it returns a single real number).
Parametric curve representations provide a mapping from a free parameter to the set of points on the curve. That is, this free parameter provides an index to the points on the curve. The parametric form of a curve is a function that assigns positions to values of the free parameter. Intuitively, if you think of a curve as something you can draw with a pen on a piece of paper, the free parameter is time, ranging over the interval from the time that we began drawing the curve to the time that we finish. The parametric function of this curve tells us where the pen is at any instant in time:
$$(x, y) = \mathbf{f}(t).$$
Note that the parametric function is a vector-valued function. This example is a 2D curve, so the output of the function is a 2-vector; in 3D it would be a 3-vector.
Generative or procedural curve representations provide procedures that can generate the points on the curve that do not fall into the first two categories. Examples of generative curve descriptions include subdivision schemes and fractals.
Remember that a curve is a set of points. These representations give us ways to specify those sets. Any curve has many possible representations. For this reason, mathematicians typically are careful to distinguish between a curve and its representations. In computer graphics we are often sloppy, since we usually only refer to the representation, not the actual curve itself. So when someone says “an implicit curve,” they are either referring to the curve that is represented by some implicit function or to the implicit function that is one of the representations of some curve. Such distinctions are not usually important, unless we need to consider different representations of the same curve. We will consider different curve representations in this chapter, so we will be more careful. When we use a term like “polynomial curve,” we will mean the curve that can be represented by the polynomial.
请记住,曲线是一组点。这些表示为我们提供了指定这些集合的方法。任何曲线都有许多可能的表示。因此,数学家通常会小心区分曲线及其表示。在计算机图形学中,我们经常很马虎,因为我们通常只指表示,而不是实际曲线本身。因此,当有人说“隐式曲线”时,他们要么指的是某个隐式函数表示的曲线,要么是指作为某条曲线的表示之一的隐式函数。这种区别通常并不重要,除非我们需要考虑同一条曲线的不同表示。我们将在本章中考虑不同的曲线表示,因此我们会更加小心。当我们使用“多项式曲线”这样的术语时,我们指的是可以用多项式表示的曲线。
By the definition given at the beginning of this chapter, for something to be a curve it must have a parametric representation. However, many curves have other representations. For example, a circle in 2D with its center at the origin and radius equal to 1 can be written in implicit form as
根据本章开头给出的定义,曲线必须具有参数表示。然而,许多曲线还有其他表示。例如,一个圆心在原点、半径等于 1 的二维圆可以隐式写成
or in parametric form as
或者以参数形式
The parametric form need not be the most convenient representation for a given curve. In fact, it is possible to have curves with simple implicit or generative representations for which it is difficult to find a parametric representation.
参数形式不一定是给定曲线最方便的表示。事实上,可能存在具有简单隐式或生成表示的曲线,但很难找到参数表示。
Different representations of curves have advantages and disadvantages. For example, parametric curves are much easier to draw, because we can sample the free parameter. Generally, parametric forms are the most commonly used in computer graphics since they are easier to work with. Our focus will be on parametric representations of curves.
不同的曲线表示法各有优缺点。例如,参数曲线更容易绘制,因为我们可以对自由参数进行采样。一般来说,参数形式是计算机图形学中最常用的形式,因为它们更容易使用。我们将重点介绍曲线的参数表示。
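To make the "sample the free parameter" remark concrete, here is a minimal sketch for the unit circle mentioned earlier. It assumes the standard forms x^2 + y^2 - 1 = 0 (implicit) and (cos t, sin t) (parametric); the function names are hypothetical and chosen only for this illustration.

```python
import math

def circle_point(t):
    """Parametric form of the unit circle: maps the parameter t to a 2D point."""
    return (math.cos(t), math.sin(t))

def circle_implicit(x, y):
    """Implicit form of the unit circle: evaluates to zero exactly on the curve."""
    return x * x + y * y - 1.0

def sample_circle(n):
    """Return n + 1 points obtained by stepping t evenly over [0, 2*pi]."""
    return [circle_point(2.0 * math.pi * i / n) for i in range(n + 1)]

points = sample_circle(8)
# Every sampled point should satisfy the implicit equation up to rounding error.
residuals = [abs(circle_implicit(x, y)) for x, y in points]
```

Drawing the curve then amounts to connecting consecutive sampled points. The implicit form can only test whether a given point lies on the circle; it does not hand us points directly.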
A parametric curve refers to the curve that is given by a specific parametric function over some particular interval. To be more precise, a parametric curve is given by a function that maps an interval of parameter values to positions. It is often convenient to have the parameter run over the unit interval from 0 to 1. When the free parameter varies over the unit interval, we often denote the parameter as u.
参数曲线是指由特定参数函数在特定区间内给出的曲线。更准确地说,参数曲线具有给定函数,该函数是从参数区间映射而来。让参数在从 0 到 1 的单位区间内运行通常很方便。当自由参数在单位区间内变化时,我们通常将参数表示为u 。
If we view the parametric curve to be a line drawn with a pen, we can consider u = 0 as the time when the pen is first set down on the paper and the unit of time to be the amount of time it takes to draw the curve (u = 1 is the end of the curve).
如果我们将参数曲线看作是用笔画出的一条线,我们可以将u = 0 视为笔第一次放在纸上的时间,将时间单位视为绘制曲线所需的时间( u = 1 是曲线的终点)。
The curve can be specified by a function that maps time (in these unit coordinates) to positions. Basically, the specification of the curve is a function that can answer the question, “Where is the pen at time u?”
曲线可以通过将时间(在这些单位坐标中)映射到位置的函数来指定。基本上,曲线的规范是一个可以回答以下问题的函数:“笔在时间u处在哪里?”
If we are given a function f(t) that specifies a curve over interval [a, b], we can easily define a new function f2(u) that specifies the same curve over the unit interval. We can first define
如果我们给定一个函数f ( t ),它指定区间 [ a, b ] 上的曲线,我们可以轻松定义一个新的函数f 2 ( u ),它指定单位区间上的同一条曲线。我们可以首先定义
and then
进而
The two functions f and f2 both represent the same curve; however, they provide different parameterizations of the curve. The process of creating a new parameterization for an existing curve is called reparameterization, and the mapping from old parameters to the new ones (g, in this example) is called the reparameterization function.
这两个函数f和f 2都表示同一条曲线;但是,它们提供了不同的曲线参数化。为现有曲线创建新参数化的过程称为重新参数化,从旧参数到新参数(本例中为g )的映射称为重新参数化函数。
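As a sketch of reparameterization, the code below assumes the linear map g(u) = a + (b - a)u and the composition f2(u) = f(g(u)), which is the natural way to move from the interval [a, b] to the unit interval. The helper name and the sample curve are hypothetical.

```python
def reparameterize_to_unit(f, a, b):
    """Return f2, defined on [0, 1], that traces the same curve as f on [a, b]."""
    def g(u):
        # Assumed reparameterization function: maps [0, 1] linearly onto [a, b].
        return a + (b - a) * u

    def f2(u):
        return f(g(u))

    return f2

def f(t):
    """Hypothetical curve for illustration: a parabola, used over [2, 5]."""
    return (t, t * t)

f2 = reparameterize_to_unit(f, 2.0, 5.0)
start = f2(0.0)   # same point as f(2.0)
middle = f2(0.5)  # same point as f(3.5)
end = f2(1.0)     # same point as f(5.0)
```

Any other monotonic map from [0, 1] onto [a, b] would give yet another parameterization of the same set of points.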
If we have defined a curve by some parameterization, infinitely many others exist (because we can always reparameterize). Being able to have multiple parameterizations of a curve is useful, because it allows us to create parameterizations that are convenient. However, it can also be problematic, because it makes it difficult to compare two functions to see if they represent the same curve.
如果我们通过某种参数化定义了一条曲线,那么就会存在无数其他的参数化(因为我们总是可以重新参数化)。能够对一条曲线进行多种参数化很有用,因为它允许我们创建方便的参数化。然而,这也可能有问题,因为它使得比较两个函数以查看它们是否代表同一条曲线变得困难。
The essence of this problem is more general: the existence of the free parameter (or the element of time) adds an invisible, potentially unknown element to our representation of the curves. When we look at the curve after it is drawn, we don’t necessarily know the timing. The pen might have moved at a constant speed over the entire time interval, or it might have started slowly and sped up. For example, while u = 0.5 is halfway through the parameter space, it may not be halfway along the curve if the motion of the pen starts slowly and speeds up at the end. Consider the following representations of a very simple curve:
这个问题的本质更为普遍:自由参数(或时间元素)的存在为我们对曲线的表示添加了一个不可见的、可能未知的元素。当我们在绘制曲线后查看它时,我们不一定知道时间。笔可能在整个时间间隔内以恒定速度移动,或者它可能一开始很慢然后加速。例如,虽然u = 0.5 位于参数空间的一半,但如果笔的运动一开始很慢然后在最后加速,它可能不在曲线的一半。考虑以下一个非常简单的曲线的表示:
All three functions represent the same curve on the unit interval; however when u is not 0 or 1, f(u) refers to a different point depending on the representation of the curve.
这三个函数在单位区间上表示相同的曲线;但是当u不为 0 或 1 时, f ( u ) 会根据曲线的表示指向不同的点。
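The effect is easy to check numerically. The three functions below are assumed examples (powers of u along the segment from (0, 0) to (1, 1)), chosen only to illustrate the point; they are a sketch, not necessarily the exact functions intended above.

```python
def f_linear(u):
    """Constant-speed parameterization of the segment from (0, 0) to (1, 1)."""
    return (u, u)

def f_quadratic(u):
    """Same segment, but the pen starts slowly and speeds up."""
    return (u ** 2, u ** 2)

def f_quintic(u):
    """Same segment, with an even slower start."""
    return (u ** 5, u ** 5)

parameterizations = (f_linear, f_quadratic, f_quintic)

# All three agree at the ends of the unit interval...
starts = [f(0.0) for f in parameterizations]
ends = [f(1.0) for f in parameterizations]
# ...but u = 0.5 lands on a different point of the segment in each case.
halfway = [f(0.5) for f in parameterizations]
```

Here `halfway` contains (0.5, 0.5), (0.25, 0.25), and (0.03125, 0.03125): the same parameter value, three different points on the same curve.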
If we are given a parameterization of a curve, we can use it directly as our specification of the curve, or we can develop a more convenient parameterization.
如果我们给出了一条曲线的参数化,我们可以直接用它作为曲线的规范,或者我们可以开发一个更方便的参数化。
Usually, the natural parameterization is created in a way that is convenient (or natural) for specifying the curve, so we don’t have to know about how the speed changes along the curve.
通常,自然参数化是以方便(或自然)的方式指定曲线的,因此我们不必知道速度沿曲线如何变化。
If we know that the pen moves at a constant velocity, then the values of the free parameters have more meaning. Halfway through parameter space is halfway along the curve. Rather than measuring time, the parameter can be thought to measure length along the curve. Such parameterizations are called arc-length parameterizations because they define curves by functions that map from the distance along the curve (known as the arc length) to positions. We often use the variable s to denote an arc-length parameter.
如果我们知道笔以恒定速度移动,那么自由参数的值就更有意义了。参数空间的一半就是曲线的一半。参数可以被认为是测量沿曲线的长度,而不是测量时间。这种参数化称为弧长参数化,因为它们通过从沿曲线的距离(称为弧长)映射到位置的函数来定义曲线。我们经常使用变量s来表示弧长参数。
Technically, a parameterization is an arc-length parameterization if its tangent (that is, the derivative of the parameterization with respect to the parameter) has constant magnitude. Expressed as an equation,
从技术角度上讲,如果参数化的正切值(即参数化相对于参数的导数)具有常数值,则该参数化为弧长参数化。用公式表示如下:
Computing the length along a curve can be tricky. In general, it is defined by the integral of the magnitude of the derivative (intuitively, the magnitude of the derivative is the speed of the pen as it moves along the curve). So, given a value for the parameter v, you can compute s (the arc-length distance along the curve from the point f(0) to the point f(v)) as
计算曲线的长度可能很棘手。通常,它由导数幅值的积分定义(直观地讲,导数幅值是笔沿曲线移动的速度)。因此,给定参数v的值,您可以计算s (从点f (0) 到点f ( v ) 沿曲线的弧长距离),如下所示
where f(t) is a function that defines the curve with a natural parameterization.
其中f ( t ) 是一个用自然参数化定义曲线的函数。
Using the arc-length parameterization requires being able to solve Equation (15.1) for t, given s. For many of the kinds of curves we examine, this cannot be done in closed form and must be done numerically.
使用弧长参数化需要能够求解方程 (15.1) 中的t ,给定s 。对于我们研究的许多类型的曲线,它不能以封闭形式(简单)的方式完成,必须通过数值方式完成。
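One possible numerical approach is sketched below. It assumes arc length is approximated by summing many short chords (a stand-in for integrating the speed) and that the inverse problem is solved by bisection; the parabola and all function names are hypothetical choices for illustration, not a prescribed method.

```python
import math

def f(t):
    """Hypothetical example curve: the parabola f(t) = (t, t^2)."""
    return (t, t * t)

def arc_length(f, v, steps=2000):
    """Approximate the arc length from f(0) to f(v) by summing short chords."""
    total = 0.0
    prev = f(0.0)
    for i in range(1, steps + 1):
        cur = f(v * i / steps)
        total += math.dist(prev, cur)
        prev = cur
    return total

def parameter_at_length(f, s, t_max, tol=1e-6):
    """Find t in [0, t_max] whose arc length is close to s, by bisection.

    This works because arc length never decreases as t increases.
    """
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if arc_length(f, mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

total = arc_length(f, 1.0)
# Parameter value at which the pen has covered half of the curve's length.
t_half = parameter_at_length(f, 0.5 * total, 1.0)
```

For this curve the pen speeds up as t grows, so `t_half` comes out larger than 0.5: halfway through the parameter range is less than halfway along the curve.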
Generally, we use the variable u to denote free parameters that range over the unit interval, s to denote arc-length free parameters, and t to represent parameters that aren’t one of the other two.
一般来说,我们用变量u来表示在单位间隔内变化的自由参数,用s来表示弧长自由参数,用t来表示不是另外两个之一的参数。
For some curves, defining a parametric function that represents their shape is easy. For example, lines, circles, and ellipses all have simple functions that define the points they contain in terms of a parameter. For many curves, finding a function that specifies their shape can be hard. The main strategy that we use to create complex curves is divide-and-conquer: we break the curve into a number of simpler smaller pieces, each of which has a simple description.
对于某些曲线,定义表示其形状的参数函数很容易。例如,直线、圆和椭圆都有简单的函数,可以根据参数定义它们包含的点。对于许多曲线,找到一个指定其形状的函数可能很困难。我们用来创建复杂曲线的主要策略是分而治之:我们将曲线分成许多更简单的小部分,每个部分都有一个简单的描述。
For example, consider the curves in Figure 15.1. The first two curves are easily specified in terms of two pieces. In the case of the curve in Figure 15.1(b), we need two different kinds of pieces: a line segment and a circular arc.
例如,考虑图 15.1中的曲线。前两条曲线很容易用两段来指定。对于图 15.1(b)中的曲线,我们需要两种不同类型的段:线段和圆。
Figure 15.1. (a) A curve that can be easily represented as two lines; (b) a curve that can be easily represented as a line and a circular arc; (c) a curve approximating curve (b) with five line segments.
图 15.1。 (a) 一条可以很容易地表示为两条线的曲线;(b) 一条可以很容易地表示为一条线和一条圆弧的曲线;(c) 一条用五条线段近似曲线 (b) 的曲线。
To create a parametric representation of a compound curve (like the curve in Figure 15.1(b)), we need to have our parametric function switch between the functions that represent the pieces. If we define our parametric functions over the range 0 ≤ u ≤ 1, then the curve in Figures 15.1(a) or (b) might be defined as
要创建复合曲线的参数表示(如图15.1(b)中的曲线),我们需要让参数函数在表示各个部分的函数之间切换。如果我们在 0 ≤ u ≤ 1 的范围内定义参数函数,则图 15.1(a)或(b)中的曲线可能定义为
where f1 is a parameterization of the first piece, f2 is a parameterization of the second piece, and both of these functions are defined over the unit interval.
其中f 1是第一部分的参数化, f 2是第二部分的参数化,这两个函数都是在单位间隔内定义的。
We need to be careful in defining the functions f1 and f2 to make sure that the pieces of the curve fit together. If f1(1) ≠ f2(0), then our curve pieces will not connect and will not form a single continuous curve.
我们需要小心定义函数f 1和f 2 ,以确保曲线各部分能够紧密贴合。如果f 1 (1) ≠ f 2 (0),则曲线各部分将不会连接,也不会形成一条连续的曲线。
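A compound curve of this kind can be sketched as follows. The switch at u = 0.5 and the rescaling 2u and 2u - 1 are assumptions consistent with both pieces being defined over the unit interval; the two line-segment pieces are hypothetical.

```python
def f1(u):
    """First piece (hypothetical): line segment from (0, 0) to (1, 1)."""
    return (u, u)

def f2(u):
    """Second piece (hypothetical): line segment from (1, 1) to (2, 0)."""
    return (1.0 + u, 1.0 - u)

def compound(u):
    """Evaluate the two-piece curve for 0 <= u <= 1.

    Each half of the parameter range is rescaled so that the piece in use
    sees its own parameter running from 0 to 1.
    """
    if u <= 0.5:
        return f1(2.0 * u)
    return f2(2.0 * u - 1.0)

# The pieces join only if the end of the first equals the start of the second.
pieces_connect = f1(1.0) == f2(0.0)
```

With these pieces `pieces_connect` is true, so `compound` traces one continuous curve; changing `f2` to start anywhere else would leave a gap at u = 0.5.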
To represent the curve in Figure 15.1(b), we needed to use two different types of pieces: a line segment and a circular arc. For simplicity’s sake, we may prefer to use a single type of piece. If we try to represent the curve in Figure 15.1(b) with only one type of piece (line segments), we cannot exactly re-create the curve (unless we use an infinite number of pieces). While the new curve made of line segments (as in Figure 15.1(c)) may not be exactly the same shape as in Figure 15.1(b), it might be close enough for our use. In such a case, we might prefer the simplicity of using the simpler line segment pieces to having a curve that more accurately represents the shape.
为了表示图 15.1(b)中的曲线,我们需要使用两种不同类型的片段:线段和圆弧。为简单起见,我们可能更愿意使用单一类型的片段。如果我们尝试仅使用一种类型的片段(线段)来表示图 15.1(b)中的曲线,则我们无法精确地重新创建曲线(除非我们使用无数个片段)。虽然由线段组成的新曲线(如图 15.1(c)所示)的形状可能与图 15.1(b)不完全相同,但对于我们使用来说可能足够接近。在这种情况下,我们可能更喜欢使用更简单的线段片段,而不是使用更准确地表示形状的曲线。
Also, notice that as we use an increasing number of pieces, we can get a better approximation. In the limit (using an infinite number of pieces), we can exactly represent the original shape.
另外,请注意,随着我们使用的碎片数量增加,我们可以得到更好的近似值。在极限情况下(使用无限数量的碎片),我们可以精确地表示原始形状。
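The improvement can be quantified for a simple case. The sketch below assumes a circular arc split into n equal chords; the largest gap between arc and chord occurs at the chord's midpoint and equals radius * (1 - cos(angle / (2n))). The function name and the quarter-circle default are illustrative choices.

```python
import math

def max_chord_error(n, angle=math.pi / 2.0, radius=1.0):
    """Largest distance between a circular arc and its n-segment approximation.

    Each chord spans angle / n of the arc; the gap is greatest at the chord's
    midpoint, where it equals radius * (1 - cos(angle / (2 * n))).
    """
    return radius * (1.0 - math.cos(angle / (2.0 * n)))

# Error for a quarter circle approximated by 1, 2, 4, 8, and 16 segments.
errors = {n: max_chord_error(n) for n in (1, 2, 4, 8, 16)}
```

Each doubling of the number of pieces cuts the error to roughly a quarter, but the error reaches zero only in the limit.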
One advantage to using a piecewise representation is that it allows us to make a tradeoff between
使用分段表示的一个优点是,它允许我们在
how well our represented curve approximates the real shape we are trying to represent;
我们所表示的曲线与我们试图表示的真实形状的近似程度如何;
how complicated the pieces that we use are;
我们使用的部件有多复杂;
how many pieces we use.
我们使用了多少件。
So, if we are trying to represent a complicated shape, we might decide that a crude approximation is acceptable and use a small number of simple pieces. To improve the approximation, we can choose between using more pieces and using more complicated pieces.
因此,如果我们试图表示一个复杂的形状,我们可能会决定粗略的近似是可以接受的,并使用少量的简单部件。为了改进近似,我们可以选择使用更多部件或使用更复杂的部件。
In computer graphics practice, we tend to prefer using relatively simple curve pieces (either line segments, arcs, or polynomial segments).
在计算机图形学实践中,我们倾向于使用相对简单的曲线片段(线段、圆弧或多项式段)。
Before computers, when draftsmen wanted to draw a smooth curve, one tool they employed was a stiff piece of metal that they would bend into the desired shape for tracing. Because the metal would bend, not fold, it would have a smooth shape. The stiffness meant that the metal would bend as little as possible to make the desired shape. This stiff piece of metal was called a spline.
在计算机出现之前,绘图员想要绘制平滑曲线时,他们使用的工具之一是一块坚硬的金属,他们可以将其弯曲成所需的形状以便描图。由于金属会弯曲而不是折叠,因此它会具有平滑的形状。硬度意味着金属会尽可能少地弯曲以形成所需的形状。这块坚硬的金属被称为样条线。
Mathematicians found that they could represent the curves created by a draftsman’s spline with piecewise polynomial functions. Initially, they used the term spline to mean a smooth, piecewise polynomial function. More recently, the term spline has been used to describe any piecewise polynomial function. We prefer this latter definition.
数学家发现,他们可以用分段多项式函数来表示绘图员样条线创建的曲线。最初,他们使用样条线一词来表示平滑的分段多项式函数。最近,样条线一词已被用来描述任何分段多项式函数。我们更喜欢后一种定义。
For us, a spline is a piecewise polynomial function. Such functions are very useful for representing curves.
对于我们来说,样条函数是分段多项式函数。此类函数对于表示曲线非常有用。
To describe a curve, we need to give some facts about its properties. For “named” curves, the properties are usually specific according to the type of curve. For example, to describe a circle, we might provide its radius and the position of its center. For an ellipse, we might also provide the orientation of its major axis and the ratio of the lengths of the axes. For free-form curves however, we need to have a more general set of properties to describe individual curves.
要描述一条曲线,我们需要给出一些关于其属性的事实。对于“命名”曲线,属性通常根据曲线的类型而特定。例如,要描述一个圆,我们可能会提供其半径和圆心的位置。对于椭圆,我们可能还会提供其长轴的方向和轴长比。然而,对于自由曲线,我们需要一组更通用的属性来描述单个曲线。
Some properties of curves are attributed to only a single location on the curve, while other properties require knowledge of the whole curve. For an intuition of the difference, imagine that the curve is a train track. If you are standing on the track on a foggy day, you can tell whether the track is straight or curved and whether or not you are at an endpoint. These are local properties. You cannot tell whether or not the track is a closed curve, or crosses itself, or how long it is. We call this type of property a global property.
曲线的某些属性仅归因于曲线上的单个位置,而其他属性则需要了解整个曲线。为了直观地了解差异,请将曲线想象成火车轨道。如果您在雾天站在轨道上,您可以判断轨道是直的还是弯的,以及您是否位于终点。这些都是局部属性。您无法判断轨道是否是闭合曲线,或者是否与自身相交,或者它有多长。我们将这种类型的属性称为全局属性。
The study of local properties of geometric objects (curves and surfaces) is known as differential geometry. Technically, to be a differential property, there are some mathematical restrictions about the properties (roughly speaking, in the train-track analogy, you would not be able to have a GPS or a compass). Rather than worry about this distinction, we will use the term local property rather than differential property.
研究几何对象(曲线和曲面)局部属性的学科称为微分几何。从技术上讲,作为微分属性,属性存在一些数学限制(粗略地说,在火车轨道类比中,您不能拥有 GPS 或指南针)。我们不必担心这种区别,而是使用术语“局部属性”而不是“微分属性”。
Local properties are important tools for describing curves because they do not require knowledge about the whole curve. Local properties include
局部属性是描述曲线的重要工具,因为它们不需要了解整个曲线。局部属性包括
position at a specific place on the curve,
在曲线上的特定位置,
direction at a specific place on the curve,
曲线上特定位置的方向,
curvature (and other derivatives).
曲率(和其他衍生物)。
Often, we want to specify that a curve includes a particular point. A curve is said to interpolate a point if that point is part of the curve. A function f interpolates a value v if there is some value of the parameter t for which f(t) = v. We call the place of interpolation, that is, the value of t, the site.
我们经常想指定曲线包含某个特定点。如果该点是曲线的一部分,则称该曲线插值了该点。如果参数t的某个值满足f(t) = v,则称函数f插值了值v。我们将插值的位置(即t的值)称为站点。
It will be very important to understand the local properties of a curve where two parametric pieces come together. If a curve is defined using an equation like Equation (15.2), then we need to be careful about how the pieces are defined. If f1(1) ≠ f2(0), then the curve will be “broken”: we would not be able to draw the curve in a continuous stroke of a pen. We call the conditions under which the curve pieces fit together continuity conditions because, if they hold, the curve can be drawn as a continuous piece. Because our definition of “curve” at the beginning of this chapter requires a curve to be continuous, technically a “broken curve” is not a curve.
理解曲线中两个参数片段相接处的局部属性非常重要。如果使用方程 (15.2) 之类的方程来定义曲线,那么我们需要小心定义片段的方式。如果f1(1) ≠ f2(0),则曲线将“断开”:我们将无法用笔连续地绘制曲线。我们将曲线片段能够相接的条件称为连续性条件,因为如果它们成立,曲线就可以绘制为连续的一段。由于我们在本章开头对“曲线”的定义要求曲线是连续的,因此从技术上讲,“断开的曲线”不是曲线。
In addition to the positions, we can also check that the derivatives of the pieces match correctly. If f1′(1) ≠ f2′(0), then the combined curve will have an abrupt change in its first derivative at the switching point; the first derivative will not be continuous. In general, we say that a curve is Cn continuous if all of its derivatives up to n match across pieces. We denote the position itself as the zeroth derivative, so that the C0 continuity condition means that the positions of the curve are continuous, and C1 continuity means that positions and first derivatives are continuous. The definition of curve requires the curve to be C0.
除了位置之外,我们还可以检查各片段的导数是否正确匹配。如果f1′(1) ≠ f2′(0),则组合曲线在切换点处的一阶导数将发生突变;一阶导数将不连续。一般而言,如果曲线直到n阶的所有导数在各个片段之间都匹配,则我们称该曲线为Cn连续。我们将位置本身表示为零阶导数,因此C0连续性条件意味着曲线的位置是连续的,而C1连续性意味着位置和一阶导数都是连续的。曲线的定义要求曲线为C0。
An illustration of some continuity conditions is shown in Figure 15.2. A discontinuity in the first derivative (the curve is C0 but not C1) is usually noticeable because it displays a sharp corner. A discontinuity in the second derivative is sometimes visually noticeable. Discontinuities in higher derivatives might matter, depending on the application. For example, if the curve represents a motion, an abrupt change in the second derivative is noticeable, so third derivative continuity is often useful. If the curve is going to have a fluid flowing over it (for example, if it is the shape for an airplane wing or boat hull), a discontinuity in the fourth or fifth derivative might cause turbulence.
图 15.2说明了一些连续性条件。一阶导数的不连续性(曲线为C 0而不是C 1 )通常很明显,因为它显示出一个尖角。二阶导数的不连续性有时在视觉上很明显。根据应用情况,高阶导数的不连续性可能很重要。例如,如果曲线表示运动,则二阶导数的突然变化是明显的,因此三阶导数连续性通常很有用。如果曲线上有流体流过(例如,如果它是飞机机翼或船体的形状),四阶或五阶导数的不连续性可能会导致湍流。
Figure 15.2. An illustration of various types of continuity between two curve segments.
图 15.2.两个曲线段之间各种连续性的图示。
The type of continuity we have just introduced (Cn) is commonly referred to as parametric continuity as it depends on the parameterization of the two curve pieces. If the “speed” of each piece differs where the pieces meet, then their derivatives will not match there and the curve will not be C1 continuous, even if its shape is smooth. For cases where we care about the shape of the curve, and not its parameterization, we define geometric continuity that requires that the derivatives of the curve pieces match when the curves are parameterized equivalently (for example, using an arc-length parameterization). Intuitively, this means that the corresponding derivatives must have the same direction, even if they have different magnitudes.
我们刚刚引入的连续性类型 ( C n ) 通常称为参数连续性,因为它取决于两个曲线段的参数化。如果每个曲线段的“速度”不同,则它们将不连续。对于我们关心曲线形状而不是其参数化的情况,我们定义几何连续性,要求当曲线等效参数化时(例如,使用弧长参数化),曲线段的导数必须匹配。直观地说,这意味着相应的导数必须具有相同的方向,即使它们具有不同的幅度。
So, if the C1 continuity condition is
因此,如果C 1连续性条件
the G1 continuity condition would be
G 1连续性条件为
for some value of scalar k. Generally, geometric continuity is less restrictive than parametric continuity. A Cn curve is also Gn except when the parametric derivatives vanish.
对于标量k的某个值。通常,几何连续性比参数连续性限制较少。除非参数导数为零,否则C n曲线也是G n 。
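The difference between the two conditions can be sketched as a pair of tests on the tangent vectors at the junction. The code assumes 2D tangents, takes "same direction" to mean a positive scalar k, and uses hypothetical function names and a hypothetical tolerance.

```python
def is_c1(d1, d2, tol=1e-9):
    """C1 test: the two first-derivative vectors are equal."""
    return all(abs(a - b) <= tol for a, b in zip(d1, d2))

def is_g1(d1, d2, tol=1e-9):
    """G1 test for 2D vectors: d1 = k * d2 for some scalar k > 0.

    The vectors are parallel when their 2D cross product is zero, and they
    point the same way when their dot product is positive. A zero vector
    fails the test, matching the caveat about vanishing derivatives.
    """
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    dot = d1[0] * d2[0] + d1[1] * d2[1]
    return abs(cross) <= tol and dot > 0.0

# Same direction, different speed: G1 but not C1.
same_direction = (is_c1((2.0, 0.0), (1.0, 0.0)), is_g1((2.0, 0.0), (1.0, 0.0)))
# A right-angle corner: neither C1 nor G1.
corner = (is_c1((1.0, 0.0), (0.0, 1.0)), is_g1((1.0, 0.0), (0.0, 1.0)))
```

Here `d1` stands for the derivative at the end of the first piece and `d2` for the derivative at the start of the second.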
The most widely used representations of curves in computer graphics are built by piecing together basic elements that are defined by polynomials, called polynomial pieces. For example, a line element is given by a linear polynomial. In Section 15.3.1, we give a formal definition and explain how to put pieces of polynomial together.
计算机图形学中最广泛使用的曲线表示是通过将多项式定义的基本元素拼凑在一起而实现的,这些基本元素称为多项式片段。例如,线元素由线性多项式给出。在第 15.3.1 节中,我们给出了正式定义并解释了如何将多项式片段拼凑在一起。
Polynomials are functions of the form
多项式是形式为
The ai are called the coefficients, and n is called the degree of the polynomial if an ≠ 0. We also write Equation (15.3) in the form
如果a n ≠ 0,则 a 称为系数, n称为多项式的次数。我们还将方程 (15.3) 写成如下形式
We call this the canonical form of the polynomial.
我们称之为多项式的标准形式。
We can generalize the canonical form to
我们可以将规范形式推广到
where bi(t) is a polynomial. We can choose these polynomials in a convenient form for different applications, and we call them basis functions or blending functions (see Section 15.3.5). In Equation (15.4), the powers of t play the role of the bi(t) of Equation (15.5). If the set of basis functions is chosen correctly, any polynomial of degree at most n can be represented by an appropriate choice of c.
其中 b( t ) 是多项式。我们可以为不同的应用选择方便的形式的多项式,我们称它们为基函数或混合函数(参见第 15.3.5 节)。在公式 (15.4) 中,t 是公式 (15.5) 中的 b( t )。如果正确选择了基函数集,则任何n +1 次多项式都可以用适当的c表示。
The canonical form does not always have convenient coefficients. For practical purposes, throughout this chapter, we will find sets of basis functions such that the coefficients are convenient ways to control the curves represented by the polynomial functions.
标准形式并不总是具有方便的系数。出于实际目的,在本章中,我们将找到一组基函数,使得系数成为控制多项式函数所表示的曲线的方便方法。
To specify a curve embedded in two dimensions, one can either specify two polynomials in t: one for how x varies with t and one for how y varies with t; or specify a single polynomial where each of the ai is a 2D point. An analogous situation exists for any curve in an n-dimensional space.
要指定嵌入二维的曲线,可以指定t中的两个多项式:一个表示x随t变化的方式,另一个表示y随t变化的方式;或者指定单个多项式,其中每个a都是一个二维点。n维空间中的任何曲线都存在类似的情况。
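The second option, a single polynomial whose coefficients are points, can be sketched as below. Horner's rule is used for the evaluation, which is a choice made here for brevity; the function name and the sample coefficients are hypothetical.

```python
def eval_polynomial(coeffs, t):
    """Evaluate sum(a_i * t**i), where each a_i is a point (a tuple of floats).

    Horner's rule: start from the highest coefficient and repeatedly multiply
    by t and add the next one, applied to every coordinate at once.
    """
    dim = len(coeffs[0])
    result = [0.0] * dim
    for a in reversed(coeffs):
        result = [r * t + c for r, c in zip(result, a)]
    return tuple(result)

# Hypothetical 2D quadratic: x(t) = 2t and y(t) = 1 - t^2,
# stored as the point coefficients a0, a1, a2.
a = [(0.0, 1.0), (2.0, 0.0), (0.0, -1.0)]
point = eval_polynomial(a, 0.5)
```

The same function works unchanged for 3D curves, since the dimension is read from the coefficients.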
To introduce the concepts of piecewise polynomial curve representations, we will discuss line segments. In practice, line segments are so simple that the mathematical derivations will seem excessive. However, by understanding this simple case, things will be easier when we move on to more complicated polynomials.
为了介绍分段多项式曲线表示的概念,我们将讨论线段。实际上,线段非常简单,数学推导似乎有些多余。但是,通过理解这个简单的情况,当我们转向更复杂的多项式时,事情会变得更容易。
Consider a line segment that connects point p0 to p1. We could write the parametric function over the unit domain for this line segment as
考虑连接点p 0和p 1 的线段。我们可以将此线段在单位域上的参数函数写为
By writing this in vector form, we have hidden the dimensionality of the points and the fact that we are dealing with each dimension separately. For example, were we working in 2D, we could have created separate equations:
通过以矢量形式编写,我们隐藏了点的维数以及我们分别处理每个维度的事实。例如,如果我们在 2D 中工作,我们可以创建单独的方程式:
The line that we specify is determined by the two endpoints, but from now on we will stick to vector notation since it is cleaner. We will call the vector of control parameters, p, the control points, and each element of p, a control point.
我们指定的线由两个端点决定,但从现在开始我们将坚持使用矢量符号,因为它更清晰。我们将控制参数的矢量p称为控制点,将p的每个元素称为控制点。
While describing a line segment by the positions of its endpoints is obvious and usually convenient, there are other ways to describe a line segment. For example,
虽然用线段端点的位置来描述线段是显而易见且通常很方便的,但还有其他方法可以描述线段。例如,
the position of the center of the line segment, the orientation, and the length;
线段中心的位置、方向、长度;
the position of one endpoint and the position of the second point relative to the first;
一个端点的位置以及第二个点相对于第一个端点的位置;
the position of the middle of the line segment and one endpoint.
线段中点和一个端点的位置。
It is obvious that given one kind of a description of a line segment, we can switch to another one.
显然,给定一种线段描述,我们可以切换到另一种描述。
A different way to describe a line segment is using the canonical form of the polynomial (as discussed in Section 15.3.1),
描述线段的另一种方法是使用多项式的标准形式(如第 15.3.1 节所述),
Any line segment can be represented either by specifying a0 and a1 or the endpoints (p0 and p1). It is usually more convenient to specify the endpoints, because we can compute the other parameters from the endpoints.
任何线段都可以通过指定a 0和a 1或端点( p 0和p 1 )来表示。指定端点通常更方便,因为我们可以从端点计算其他参数。
To write the canonical form as a vector expression, we define a vector u that is a vector of the powers of u:
为了将标准形式写成向量表达式,我们定义一个向量u ,它是u的幂的向量:
so that Equation (15.4) can be written as
因此公式 (15.4) 可以写成
This vector notation will make transforming between different forms of the curve easier.
这种矢量符号将使不同形式的曲线之间的转换变得更容易。
Equation (15.8) describes a curve segment by the set of polynomial coefficients for the simple form of the polynomial. We call such a representation the canonical form. We will denote the parameters of the canonical form by a.
方程 (15.8) 通过多项式简单形式的多项式系数集描述曲线段。我们将这种表示称为规范形式。我们将用 表示规范形式的参数。
While it is mathematically simple, the canonical form is not always the most convenient way to specify curves. For example, we might prefer to specify a line segment by the positions of its endpoints. If we want to define p0 to be the beginning of the segment (where the segment is when u = 0) and p1 to be the end of the line segment (where the line segment is at u = 1), we can write
虽然从数学上来说很简单,但规范形式并不总是指定曲线的最方便方法。例如,我们可能更喜欢通过线段端点的位置来指定线段。如果我们想将p 0定义为线段的起点( u = 0 时线段所在的位置),将p 1定义为线段的终点( u = 1 时线段所在的位置),我们可以写成
We can solve these equations for a0 and a1:
我们可以解这些方程求出a 0和a 1 :
While this first example was easy enough to solve, for more complicated examples it will be easier to write Equation (15.9) in the form
虽然第一个例子很容易解决,但对于更复杂的例子,将公式 (15.9) 写成以下形式会更容易
Alternatively, we can write
或者我们可以写
where we call C the constraint matrix (see footnote 1). If having vectors of points bothers you, you can consider each dimension independently (so that p is [x0 x1] or [y0 y1] and a is handled correspondingly).
我们称C为约束矩阵。1如果有点向量让您感到困扰,您可以独立考虑每个维度(这样p就是 [ x 0 x 1 ] 或 [ y 0 y 1 ] 并且a会得到相应处理)。
We can solve Equation (15.10) for a by finding the inverse of C. This inverse matrix, which we will denote by B, is called the basis matrix. The basis matrix is very handy since it tells us how to convert between the convenient parameters p and the canonical form a, and, therefore, gives us an easy way to evaluate the curve
我们可以通过求C的逆来求解方程 (15.10) 中的a 。我们将用B表示的这个逆矩阵称为基矩阵。基矩阵非常方便,因为它告诉我们如何在方便的参数p和标准形式a之间进行转换,因此,它为我们提供了一种评估曲线的简单方法
We can find a basis matrix for whatever form of the curve that we want, providing that there are no nonlinearities in the definition of the parameters. Examples of nonlinearly defined parameters include the length and angle of the line segment.
只要参数定义中不存在非线性,我们就能找到所需曲线的任何形式的基矩阵。非线性定义参数的例子包括线段的长度和角度。
Now, suppose we want to parameterize the line segment so that p0 is the halfway point (u = 0.5), and p1 is the ending point (u = 1). To derive the basis matrix for this parameterization, we set
现在,假设我们要参数化线段,使p 0为中点( u = 0.5), p 1为终点( u = 1)。为了推导此参数化的基础矩阵,我们设置
So
所以
and therefore
因此
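The procedure for line segments can be sketched in code. The sketch assumes the conventions p = C a and a = B p with B the inverse of C, and that each row of C has the form [1, u] for the site u at which a control point is specified; all function names are hypothetical, and exact fractions are used so the matrices come out without rounding.

```python
from fractions import Fraction

def constraint_matrix(sites):
    """One row [1, u] per control point: the row states p_i = a0 + u_i * a1."""
    return [[Fraction(1), Fraction(u)] for u in sites]

def invert_2x2(m):
    """Inverse of a 2x2 matrix using the closed-form adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def evaluate(basis, p, u):
    """f(u) = [1 u] * B * p for scalar control values p = [p0, p1]."""
    a0 = basis[0][0] * p[0] + basis[0][1] * p[1]
    a1 = basis[1][0] * p[0] + basis[1][1] * p[1]
    return a0 + a1 * u

# Control points at the two ends (u = 0 and u = 1).
B_ends = invert_2x2(constraint_matrix([0, 1]))
# Control points at the halfway point and the end (u = 0.5 and u = 1).
B_mid = invert_2x2(constraint_matrix([Fraction(1, 2), 1]))
```

With these conventions `B_ends` is [[1, 0], [-1, 1]] and `B_mid` is [[2, -1], [-2, 2]]. For example, `evaluate(B_mid, [3, 5], Fraction(1, 2))` returns 3, the value requested at the halfway point.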
Line segments are so simple that finding a basis matrix is trivial. However, it was good practice for curves of higher degree. First, let’s consider quadratics (curves of degree two). The advantage of the canonical form (Equation (15.4)) is that it works for these more complicated curves, just by letting n be a larger number.
线段非常简单,因此找到基矩阵并不难。但是,对于更高阶的曲线来说,这是个很好的做法。首先,让我们考虑二次曲线(二次曲线)。标准形式(公式 (15.4))的优点在于,只要让n为一个更大的数字,它就可以适用于这些更复杂的曲线。
1 We assume the form of a vector (row or column) is obvious from the context, and we will skip all of the transpose symbols for vectors.
1我们假设向量的形式(行或列)从上下文中是显而易见的,并且我们将跳过所有向量的转置符号。
A quadratic (a degree-two polynomial) has three coefficients, a0, a1, and a2. These coefficients are not convenient for describing the shape of the curve. However, we can use the same basis matrix method to devise more convenient parameters. If we know the value of u, Equation (15.4) becomes a linear equation in the parameters, and the linear algebra from the last section still works.
二次多项式有三个系数, a 0 、 a 1和a 2 。这些系数不便于描述曲线的形状。但是,我们可以使用相同的基矩阵方法来设计更方便的参数。如果我们知道u的值,方程(15.4)就变成了参数中的线性方程,上一节中的线性代数仍然有效。
Suppose that we wanted to describe our curves by the position of the beginning (u = 0), middle (u = 0.5; see footnote 2), and end (u = 1). Entering the appropriate values into Equation (15.4):
假设我们想用起点( u = 0)、中间2 ( u = 0.5)和终点( u = 1)的位置来描述曲线。将适当的值代入公式 (15.4):
So the constraint matrix is
所以约束矩阵是
and the basis matrix is
基础矩阵为
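As a check on this construction, the sketch below applies a basis matrix to three control values and confirms that the resulting quadratic passes through them. The matrix entries were worked out by hand by inverting the constraint rows [1, 0, 0], [1, 0.5, 0.25], and [1, 1, 1], assuming the control values are ordered beginning, middle, end; the names and sample values are hypothetical.

```python
# Basis matrix for a quadratic specified by its values at u = 0, 0.5, and 1
# (derived by hand under the ordering assumption stated above).
B = [
    [1, 0, 0],
    [-3, 4, -1],
    [2, -4, 2],
]

def canonical_coefficients(basis, p):
    """Compute a = B p: convert control values into canonical coefficients."""
    return [sum(b * value for b, value in zip(row, p)) for row in basis]

def eval_quadratic(a, u):
    """Evaluate a0 + a1 * u + a2 * u^2."""
    return a[0] + a[1] * u + a[2] * u * u

p = [1.0, 3.0, 2.0]  # hypothetical values at the beginning, middle, and end
a = canonical_coefficients(B, p)
values = [eval_quadratic(a, u) for u in (0.0, 0.5, 1.0)]
```

For these control values the canonical coefficients are [1, 7, -6], and `values` reproduces [1, 3, 2].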
There is an additional type of constraint (or parameter) that is sometimes convenient to specify: the derivative of the curve (with respect to its free parameter) at a particular value. Intuitively, the derivatives tell us how the curve is changing, so that the first derivative tells us what direction the curve is going, the second derivative tells us how quickly the curve is changing direction, etc. We will see examples of why it is useful to specify derivatives later.
有时,还有一种额外的约束(或参数)类型便于指定:曲线在特定值处的导数(相对于其自由参数)。直观地说,导数告诉我们曲线是如何变化的,因此一阶导数告诉我们曲线的走向,二阶导数告诉我们曲线改变方向的速度,等等。稍后我们将通过示例来说明指定导数为何有用。
For the quadratic,
对于二次函数,
the derivatives are simple:
导数很简单:
and
和
2 Notice that this is the middle of the parameter space, which might not be the middle of the curve itself.
2注意,这是参数空间的中间,可能不是曲线本身的中间。
Or, more generally,
或者更一般地,
For example, consider a case where we want to specify a quadratic curve segment by the position, first, and second derivative at its middle (u = 0.5).
例如,考虑这样一种情况,我们想要通过中间的位置、一阶和二阶导数来指定二次曲线段( u = 0.5)。
The constraint matrix is
约束矩阵为
and the basis matrix is
基础矩阵为
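Derivative constraints fit the same machinery, because the k-th derivative of the canonical form is still linear in the coefficients. The sketch below builds the constraint rows for position, first derivative, and second derivative at u = 0.5 and checks a hand-derived basis matrix against them; the function names and the row ordering are assumptions made for illustration.

```python
from fractions import Fraction

def derivative_row(degree, u, k):
    """Constraint-matrix row for the k-th derivative at parameter u.

    Entry i is the factor multiplying a_i in the k-th derivative of
    sum(a_i * u**i): i! / (i - k)! * u**(i - k), or 0 when i < k.
    """
    row = []
    for i in range(degree + 1):
        if i < k:
            row.append(Fraction(0))
        else:
            falling = 1
            for j in range(k):
                falling *= i - j
            row.append(falling * Fraction(u) ** (i - k))
    return row

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

half = Fraction(1, 2)
# Rows: position, first derivative, second derivative, all at u = 0.5.
C = [derivative_row(2, half, k) for k in range(3)]
# Hand-derived inverse of C (the basis matrix under the assumed row ordering).
B = [[1, -half, Fraction(1, 8)], [0, 1, -half], [0, 0, half]]
identity = matmul(C, B)
```

The product comes out as the identity matrix, which confirms that this B inverts this C.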
Cubic polynomials are popular in graphics (See Section 15.5). The derivations for the various forms of cubics are just like the derivations we’ve seen already in this section. We will work through one more example for practice.
三次多项式在图形学中很常见(参见第 15.5 节)。各种形式的三次多项式的推导与本节中我们已经看到的推导一样。我们将再举一个例子来练习。
A very useful form of a cubic polynomial is the Hermite form, where we specify the position and first derivative at the beginning and end, that is,
三次多项式的一个非常有用的形式是Hermite形式,其中我们指定开始和结束的位置和一阶导数,即
Thus, the constraint matrix is
因此,约束矩阵为
and the basis matrix is
基础矩阵为
We will discuss Hermite cubic splines in Section 15.5.2.
我们将在第 15.5.2 节讨论 Hermite 三次样条。
If we know the basis matrix, B, we can multiply it by the parameter vector, u, to get a vector of functions
如果我们知道基础矩阵B ,我们可以将其乘以参数向量u ,得到一个函数向量
Notice that we denote this vector by b(u) to emphasize the fact that its value depends on the free parameter u. We call the elements of b(u) the blending functions, because they specify how to blend the values of the control point vector together:
请注意,我们用b ( u ) 表示这个向量,以强调其值取决于自由参数u。我们将b ( u ) 的元素称为混合函数,因为它们指定如何将控制点向量的值混合在一起:
It is important to note that for a chosen value of u, Equation (15.11) is a linear equation specifying a linear blend (or weighted average) of the control points. This is true no matter what degree polynomials are “hidden” inside of the bi functions.
值得注意的是,对于选定的u值,公式 (15.11) 是一个线性方程,指定控制点的线性混合(或加权平均值)。无论b函数内部“隐藏”了多少次多项式,情况都是如此。
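A sketch of blending functions in code, using the Hermite cubic as the example. It assumes the control values are ordered start position, start derivative, end position, end derivative, and that the blending weights are the powers of u multiplied into the columns of B; the matrix was derived by hand under that ordering, and the names and control values are hypothetical.

```python
# Hermite basis matrix, assuming the control order
# [f(0), f'(0), f(1), f'(1)] and rows indexed by the power of u.
HERMITE_B = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [-3, -2, 3, -1],
    [2, 1, -2, 1],
]

def blending_weights(basis, u):
    """b(u) = [1, u, u^2, ...] * B: one weight per control point."""
    powers = [u ** i for i in range(len(basis))]
    columns = range(len(basis[0]))
    return [sum(powers[i] * basis[i][j] for i in range(len(basis))) for j in columns]

def blend(weights, controls):
    """Weighted sum of the control vectors, coordinate by coordinate."""
    dim = len(controls[0])
    return tuple(sum(w * c[d] for w, c in zip(weights, controls)) for d in range(dim))

# Hypothetical controls: start point, start tangent, end point, end tangent.
controls = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
weights_mid = blending_weights(HERMITE_B, 0.5)
point_mid = blend(weights_mid, controls)
```

Once u is fixed the weights are plain numbers, so evaluating the curve is just a weighted sum, even though each weight is a cubic in u.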
Blending functions provide a nice abstraction for describing curves. Any curve of this kind can be represented as a linear combination of its control points, where the weights are computed as functions of the free parameter.
混合函数为描述曲线提供了很好的抽象。任何类型的曲线都可以表示为其控制点的线性组合,其中这些权重被计算为自由参数的一些任意函数。
In general, a polynomial of degree n can interpolate a set of n + 1 values. If we are given a vector p = (p0,...,pn) of points to interpolate and a vector t = (t0,...,tn) of increasing parameter values, ti ≠ tj, we can use the methods described in the previous sections to determine an (n + 1) × (n + 1) basis matrix that gives us a function f(t) such that f(ti) = pi. For any given vector t, we need to set up and solve an (n + 1) × (n + 1) linear system. This provides us with a set of n + 1 basis functions that perform interpolation:
一般来说,n次多项式可以插值一组n + 1个值。如果给定一个要插值的点向量p = (p0,...,pn) 和一个参数值递增的向量t = (t0,...,tn),ti ≠ tj,我们可以使用前面几节中描述的方法确定一个 (n + 1) × (n + 1) 的基矩阵,该矩阵给出一个函数f(t),使得f(ti) = pi。对于任何给定向量t,我们需要建立并求解一个 (n + 1) × (n + 1) 的线性系统。这为我们提供了一组执行插值的n + 1个基函数:
These interpolating basis functions can be derived in other ways. One particularly elegant way to define them is the Lagrange form:
这些插值基函数可以通过其他方式导出。一种特别优雅的定义方法是拉格朗日形式:
There are more computationally efficient ways to express the interpolating basis functions than the Lagrange form (see De Boor (1978) for details).
与拉格朗日形式相比,有更多计算效率更高的方法来表达插值基函数(详情请参阅 De Boor (1978))。
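The Lagrange form can be sketched directly. The code assumes the usual definition, in which basis function i is the product over all j ≠ i of (t - tj) / (ti - tj); the function names, sites, and points are hypothetical.

```python
def lagrange_basis(sites, i, t):
    """b_i(t): equals 1 at sites[i] and 0 at every other site."""
    value = 1.0
    for j, tj in enumerate(sites):
        if j != i:
            value *= (t - tj) / (sites[i] - tj)
    return value

def interpolate(sites, points, t):
    """Evaluate the interpolating polynomial as a blend of the points."""
    dim = len(points[0])
    weights = [lagrange_basis(sites, i, t) for i in range(len(sites))]
    return tuple(sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim))

sites = [0.0, 0.5, 1.0]                        # parameter values t_i
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]  # points to interpolate
at_sites = [interpolate(sites, points, t) for t in sites]
between = interpolate(sites, points, 0.25)
```

This direct evaluation costs on the order of n^2 operations per point, which is one reason more efficient formulations exist.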
Interpolating polynomials provide a mechanism for defining curves that interpolate a set of points. Figure 15.3 shows some examples. While it is possible to create a single polynomial to interpolate any number of points, we rarely use high-order polynomials to represent curves in computer graphics. Instead, interpolating splines (piecewise polynomial functions) are preferred. Some reasons for this are considered in Section 15.5.3.
插值多项式提供了一种定义插值一组点的曲线的机制。图 15.3显示了一些示例。虽然可以创建一个多项式来插值任意数量的点,但我们很少使用高阶多项式来表示计算机图形学中的曲线。相反,插值样条(分段多项式函数)更受欢迎。第 15.5.3 节讨论了其中的一些原因。
Figure 15.3. Interpolating polynomials through multiple points. In (a) and (b), the curve contains extra wiggles and over-shooting between points. And when the sixth point is added in (c), it completely changes the shape of the curve due to the non-local nature of interpolating polynomials.
图 15.3.通过多个点插值多项式。在 (a) 和 (b) 中,曲线包含点之间的额外摆动和过冲。当在 (c) 中添加第六个点时,由于插值多项式的非局部性质,它完全改变了曲线的形状。
Now that we’ve seen how to make individual pieces of polynomial curves, we can consider how to put these pieces together.
现在我们已经了解了如何制作多项式曲线的各个部分,我们可以考虑如何将这些部分组合在一起。
The basic idea of a piecewise parametric function is that each piece is only used over some parameter range. For example, if we want to define a function that has two piecewise linear segments that connect three points (as shown in Figure 15.4(a)), we might define
分段参数函数的基本思想是,每个部分只在某些参数范围内使用。例如,如果我们想定义一个函数,该函数具有连接三个点的两个分段线性段(如图 15.4(a)所示),我们可以定义
Figure 15.4. (a) Two line segments connect three points; (b) the blending functions for each of the points are graphed at right.
图 15.4. (a) 两条线段连接三个点;(b) 每个点的混合函数如右图所示。
where f1 and f2 are functions for each of the two line segments. Notice that we have rescaled the parameter for each of the pieces to facilitate writing their equations as
其中f 1和f 2是两条线段的函数。请注意,我们重新调整了每段的参数,以便将其方程写为
For each polynomial in our piecewise function, there is a site (or parameter value) where it starts and ends. Sites where a piece function begins or ends are called knots. For the example in Equation (15.13), the values of the knots are 0, 0.5, and 1.
对于分段函数中的每个多项式,都有一个起点和终点(或参数值)。分段函数的起点和终点称为节点。对于公式 (15.13) 中的例子,节点的值为 0、0.5 和 1。
We may also write piecewise polynomial functions as the sum of basis functions, each scaled by a coefficient. For example, we can rewrite the two line segments of Equation (15.13) as
我们也可以将分段多项式函数写成基函数之和,每个基函数都用一个系数缩放。例如,我们可以将方程 (15.13) 中的两条线段重写为
where the function b1(u) is defined as
其中函数b 1 ( u ) 定义为
and b2 and b3 are defined similarly. These functions are plotted in Figure 15.4(b).
并且b 2和b 3的定义类似。这些函数绘制在图 15.4(b)中。
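The two descriptions of the two-segment curve can be compared numerically. The sketch below assumes scalar control values p1, p2, p3, knots at 0, 0.5, and 1, and "tent"-shaped blending functions that are consistent with the rescaled line segments described above; the exact function definitions are assumptions made for this example.

```python
def lerp(a, b, u):
    """Linear interpolation between a and b for u in [0, 1]."""
    return (1.0 - u) * a + u * b


def f_piecewise(p1, p2, p3, u):
    """Two line segments, each with its parameter rescaled to [0, 1]."""
    if u <= 0.5:
        return lerp(p1, p2, 2.0 * u)
    return lerp(p2, p3, 2.0 * u - 1.0)


def b1(u):
    return max(0.0, 1.0 - 2.0 * u)            # 1 at u = 0, 0 from u = 0.5 on


def b2(u):
    return max(0.0, 1.0 - abs(2.0 * u - 1.0))  # peak of 1 at u = 0.5


def b3(u):
    return max(0.0, 2.0 * u - 1.0)            # 0 up to u = 0.5, 1 at u = 1


def f_blended(p1, p2, p3, u):
    """The same curve as a sum of control values scaled by basis functions."""
    return p1 * b1(u) + p2 * b2(u) + p3 * b3(u)


if __name__ == "__main__":
    for u in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(u, f_piecewise(0.0, 10.0, 4.0, u), f_blended(0.0, 10.0, 4.0, u))
```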
The knots of a piecewise polynomial function are the combination of the knots of all of the pieces that are used to create it. The knot vector is a vector that stores all of the knot values in ascending order.
分段多项式函数的节点是用于创建它的所有部分的节点的组合。节点向量是按升序存储所有节点值的向量。
Notice that in this section we have used two different mechanisms for combining polynomial pieces: using independent polynomial pieces for different ranges of the parameter and blending together piecewise polynomial functions.
请注意,在本节中,我们使用了两种不同的机制来组合多项式段:对不同范围的参数使用独立的多项式段,以及将分段多项式函数混合在一起。
In Section 15.3, we defined pieces of polynomials over the unit parameter range. If we want to assemble these pieces, we need to convert from the parameter of the overall function to the value of the parameter for the piece. The simplest way to do this is to define the overall curve over the parameter range [0,n] where n is the number of segments. Depending on the value of the parameter, we can shift it to the required range.
在第 15.3 节中,我们定义了单位参数范围内的多项式片段。如果我们想要组装这些片段,我们需要将整体函数的参数转换为片段的参数值。最简单的方法是在参数范围 [0 ,n ] 上定义整体曲线,其中n是段数。根据参数的值,我们可以将其移动到所需的范围。
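The parameter shift described above can be sketched as follows. This is one possible convention, assuming unit-length parameter ranges per segment and assigning the final value t = n to the last segment; the helper name `locate` is hypothetical.

```python
import math


def locate(t, n):
    """Map a global parameter t in [0, n] to (segment index, local u in [0, 1])."""
    if not 0.0 <= t <= n:
        raise ValueError("t must lie in [0, n]")
    # t == n would floor to n, so clamp it into the last segment.
    i = min(int(math.floor(t)), n - 1)
    return i, t - i


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.25, 3.0):
        print(t, locate(t, 3))
```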
If we want to make a single curve from two line segments, we need to make sure that the end of the first line segment is at the same location as the beginning of the next. There are three ways to connect the two segments (in order of simplicity):
如果我们想用两条线段画一条曲线,我们需要确保第一条线段的末端与下一条线段的起点位于同一位置。连接两条线段的方法有三种(按简单顺序):
Represent the line segment as its two endpoints, and then use the same point for both. We call this a shared-point scheme.
将线段表示为其两个端点,然后对两个端点使用同一个点。我们称之为共享点方案。
Copy the value of the end of the first segment to the beginning of the second segment every time that the parameters of the first segment change. We call this a dependency scheme.
每次第一个段的参数发生变化时,将第一个段末尾的值复制到第二个段的开头。我们称之为依赖方案。
Write an explicit equation for the connection, and enforce it through numerical methods as the other parameters are changed.
为连接写出一个明确的方程,并在其他参数改变时通过数值方法强制执行它。
While the simpler schemes are preferable since they require less work, they also place more restrictions on the way the line segments are parameterized. For example, if we want the user to be able to specify the center of each line segment directly, we might parameterize each segment by its beginning point and its center. This forces us to use the dependency scheme.
虽然更简单的方案更可取,因为它们需要的工作较少,但它们也对线段的参数化方式施加了更多限制。例如,如果我们想使用线段的中心作为参数(以便用户可以直接指定它),我们将使用每个线段的开头和线段的中心作为它们的参数。这将迫使我们使用依赖方案。
Notice that if we use a shared-point or dependency scheme, the total number of control points is less than n * m, where n is the number of segments and m is the number of control points for each segment; many of the control points of the independent pieces will be computed as functions of other pieces. For example, if we use the shared-point scheme for lines (each segment uses its two endpoints as parameters and shares interior points with its neighbors) or the dependency scheme (such as the example with the first endpoint and midpoint), we end up with n + 1 controls for an n-segment curve.
请注意,如果我们使用共享点或依赖方案,控制点总数将小于n * m,其中n是段数, m是每个段的控制点数;独立部分的许多控制点将作为其他部分的函数进行计算。请注意,如果我们对线使用共享点方案(每个段使用其两个端点作为参数并与其相邻段共享内部点),或者如果我们使用依赖方案(例如具有第一个端点和中点的示例方案),我们最终会得到n段曲线的n + 1 个控制点。
Dependency schemes have a more serious problem. A change in one place in the curve can propagate through the entire curve. This is called a lack of locality. Locality means that if you move a point on a curve, it will only affect a local region. The local region might be big, but it will be finite. If a curve’s controls do not have locality, changing a control point may affect points infinitely far away.
依赖方案存在更严重的问题。曲线上一个位置的更改可能会影响整个曲线。这称为缺乏局部性。局部性意味着,如果移动曲线上的一个点,它只会影响局部区域。局部区域可能很大,但它是有限的。如果曲线的控制点没有局部性,则更改控制点可能会影响无限远的点。
To see locality, and the lack thereof, in action, consider two chains of line segments, as shown in Figure 15.5. One chain has its pieces parameterized by their endpoints and uses point-sharing to maintain continuity. The other has its pieces parameterized by an endpoint and midpoint and uses dependency propagation to keep the segments together. The two segment chains can represent the same curves: they are both a set of n connected line segments. However, because of locality issues, the endpoint-shared form is likely to be more convenient for the user. Consider changing the position of the first control point in each chain. For the endpoint-shared version, only the first segment will change, while all of the segments will be affected in the midpoint version, as in Figure 15.5. In fact, for any point moved in the endpoint-shared version, at most two line segments will change. In the midpoint version, all segments after the control point that is moved will change, even if the chain is infinitely long.
要实际了解局部性及其缺乏,请考虑两条线段链,如图 15.5所示。一条链的各部分由其端点参数化,并使用点共享来保持连续性。另一条链的各部分由端点和中点参数化,并使用依赖传播来使线段保持在一起。两条线段链可以表示相同的曲线:它们都是一组n 个相连的线段。但是,由于局部性问题,端点共享形式可能对用户更方便。考虑更改每条链中第一个控制点的位置。对于端点共享版本,只有第一条线段会发生变化,而在中点版本中,所有线段都会受到影响,如图15.5所示。事实上,对于端点共享版本中移动的任何点,最多只有两条线段会发生变化。在中点版本中,即使链无限长,移动的控制点之后的所有线段都会发生变化。
Figure 15.5. A chain of line segments with local control and one with non-local control.
图 15.5.具有局部控制的线段链和具有非局部控制的线段链。
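The difference between the two chains can be reproduced in a few lines. The sketch below works in one dimension for brevity and assumes that, in the midpoint version, each segment's end is the reflection of its beginning through its center and is copied to the next segment; all function names are hypothetical.

```python
def segments_from_shared(points):
    """Endpoint-shared chain: segment i runs from points[i] to points[i + 1]."""
    return [(points[i], points[i + 1]) for i in range(len(points) - 1)]


def segments_from_midpoints(start, midpoints):
    """Midpoint chain: each segment is given by its beginning and its center."""
    segments = []
    begin = start
    for mid in midpoints:
        end = 2.0 * mid - begin   # reflect the beginning through the center
        segments.append((begin, end))
        begin = end               # dependency: the next segment starts here
    return segments


def count_changed(before, after):
    """Number of segments that differ between two versions of a chain."""
    return sum(1 for a, b in zip(before, after) if a != b)


if __name__ == "__main__":
    shared = [0.0, 1.0, 2.0, 3.0, 4.0]
    mids = [0.5, 1.5, 2.5, 3.5]
    print(count_changed(segments_from_shared(shared),
                        segments_from_shared([-1.0] + shared[1:])))
    print(count_changed(segments_from_midpoints(0.0, mids),
                        segments_from_midpoints(-1.0, mids)))
```

Both chains initially describe the same four segments; moving the first control changes one segment in the shared version and every segment in the midpoint version.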
In this example, the dependency propagation scheme was the one that did not have local control. This is not always true. There are direct sharing schemes that are not local and propagation schemes that are local.
在此示例中,依赖传播方案是没有本地控制的方案。但情况并非总是如此。有些直接共享方案不是本地的,而有些传播方案是本地的。
We emphasize that locality is a convenience-of-control issue. While it is inconvenient to have the entire curve change every time a control is moved, the same changes can still be made to the curve; doing so simply requires moving several points in unison.
我们强调局部性是为了便于控制。虽然每次都改变整个曲线很不方便,但可以对曲线进行相同的更改。只需同时移动几个点即可。
In graphics, when we represent curves using piecewise polynomials, we usually use either line segments or cubic polynomials for the pieces. There are a number of reasons why cubics are popular in computer graphics:
在图形学中,当我们使用分段多项式表示曲线时,我们通常使用线段或三次多项式来表示分段。三次多项式在计算机图形学中流行的原因有很多:
Piecewise cubic polynomials allow for C2 continuity, which is generally sufficient for most visual tasks. The C1 smoothness that quadratics offer is often insufficient. The greater smoothness offered by higher-order polynomials is rarely important.
分段三次多项式允许C 2连续性,这通常足以满足大多数视觉任务的要求。二次多项式提供的C 1平滑度通常不够。高阶多项式提供的更高平滑度很少重要。
Cubic curves provide the minimum-curvature interpolants to a set of points. That is, if you have a set of n + 3 points and define the “smoothest” curve that passes through them (that is, the curve that has the minimum curvature over its length), this curve can be represented as a piecewise cubic with n segments.
三次曲线为点集提供最小曲率插值。也就是说,如果您有一组n +3 个点,并定义经过这些点的“最平滑”曲线(即其长度上曲率最小的曲线),则该曲线可以表示为具有n 个段的分段三次曲线。
Cubic polynomials have a nice symmetry where position and derivative can be specified at the beginning and end.
三次多项式具有良好的对称性,可以在开始和结束时指定位置和导数。
Cubic polynomials have a nice tradeoff between the numerical issues in computation and the smoothness.
三次多项式在计算的数值问题和平滑度之间有很好的权衡。
Notice that we do not have to use cubics; they just tend to be a good tradeoff between the amount of smoothness and complexity. Different applications may have different tradeoffs. We focus on cubics since they are the most commonly used.
请注意,我们不必使用三次函数;它们只是在平滑度和复杂性之间取得良好的平衡。不同的应用可能有不同的平衡。我们专注于三次函数,因为它们是最常用的。
The canonical form of a cubic polynomial is
三次多项式的标准形式是
As we discussed in Section 15.3, these canonical form coefficients are not a convenient way to describe a cubic segment.
正如我们在第 15.3 节中讨论的那样,这些标准形式系数并不是描述三次线段的便捷方式。
We seek forms of cubic polynomials for which the coefficients are a convenient way to control the resulting curve represented by the cubic. One of the main conveniences will be to provide ways to ensure the connectedness of the pieces and the continuity between the segments.
我们寻求三次多项式的形式,其中的系数是控制三次多项式所表示的结果曲线的便捷方式。主要的便利之一将是提供确保各部分的连通性和各段之间的连续性的方法。
Each cubic polynomial piece requires four coefficients or control points. That means for a piecewise polynomial with n pieces, we may require up to 4n control points if no sharing between segments is done or dependencies used. More often, some part of each segment is either shared or depends on an adjacent segment, so the total number of control points is much lower. Also, note that a control point might be a position or a derivative of the curve.
每个三次多项式片段需要四个系数或控制点。这意味着对于具有n 个片段的分段多项式,如果片段之间没有共享或没有使用依赖项,我们可能需要最多 4 n 个控制点。更常见的是,每个片段的某些部分要么是共享的,要么依赖于相邻片段,因此控制点的总数要少得多。另外,请注意,控制点可能是曲线的位置或导数。
Unfortunately, there is no single “best” representation for a piecewise cubic. It is not possible to have a piecewise polynomial curve representation that has all of the following desirable properties:
不幸的是,分段三次函数没有单一的“最佳”表示。分段多项式曲线表示不可能具有以下所有理想属性:
each piece of the curve is a cubic;
每条曲线都是三次的;
the curve interpolates the control points;
曲线插值控制点;
the curve has local control;
曲线有局部控制;
the curve has C2 continuity.
曲线具有C2连续性。
We can have any three of these properties, but not all four; there are representations that have any combination of three. In this book, we will discuss cubic B-splines that do not interpolate their control points (but have local control and are C2); Cardinal splines and Catmull-Rom splines that interpolate their control points and offer local control, but are not C2; and natural cubics that interpolate and are C2, but do not have local control.
我们可以拥有这三种属性中的任意三种,但不能拥有全部四种;有些表示法可以有三种的任意组合。在本书中,我们将讨论不对控制点进行插值(但具有局部控制并且是C 2 )的三次 B 样条函数;对控制点进行插值并提供局部控制但不是C 2 的基数样条函数和 Catmull-Rom 样条函数;以及可以进行插值并且是C 2但没有局部控制的自然三次样条函数。
The continuity properties of cubics refer to the continuity between the segments (at the knot points). The cubic pieces themselves have infinite continuity in their derivatives (the way we have been talking about continuity so far). Note that if you have a lot of control points (or knots), the curve can be wiggly, which might not seem “smooth.”
三次曲线的连续性是指线段之间的连续性(在结点处)。三次曲线本身的导数具有无限连续性(我们到目前为止一直在谈论连续性)。请注意,如果您有很多控制点(或结点),曲线可能会很波动,看起来可能不“平滑”。
With a piecewise cubic curve, it is possible to create a C2 curve. To do this, we need to specify the position and first and second derivative at the beginning of each segment (so that we can make sure that it is the same as at the end of the previous segment). Notice that each curve segment receives three out of its four parameters from the previous curve in the chain. These C2 continuous chains of cubics are sometimes referred to as natural cubic splines.
使用分段三次曲线,可以创建C 2曲线。为此,我们需要指定每个段开头的位置以及一阶和二阶导数(以便确保它与上一个段结尾的位置相同)。请注意,每个曲线段从链中的前一个曲线接收四个参数中的三个。这些C 2连续三次链有时被称为自然三次样条。
For one segment of the natural cubic, we need to parameterize the cubic by the positions of its endpoints and the first and second derivative at the beginning point. The control points are therefore
对于自然立方体的一个部分,我们需要通过其端点的位置以及起始点的一阶和二阶导数来参数化立方体。因此,控制点是
Therefore, the constraint matrix is
因此,约束矩阵为
and the basis matrix is
基础矩阵为
Given a set of n control points, a natural cubic spline has n − 1 cubic segments. The first segment uses the control points to define its beginning position, ending position, and first and second derivative at the beginning. A dependency scheme copies the position, and first and second derivative of the end of the first segment for use in the second segment.
给定一组n 个控制点,自然三次样条线具有n − 1 个三次段。第一段使用控制点来定义其起始位置、终止位置以及起始处的一阶和二阶导数。依赖方案复制第一段末尾的位置以及一阶和二阶导数,以供第二段使用。
A disadvantage of natural cubic splines is that they are not local. Any change in any segment may require the entire curve to change (at least the part after the change was made). To make matters worse, natural cubic splines tend to be ill-conditioned: a small change at the beginning of the curve can lead to large changes later. Another issue is that we only have control over the derivatives of the curve at its beginning. Segments after the beginning of the curve determine their derivatives from their beginning point.
自然三次样条函数的一个缺点是它们不是局部的。任何段中的任何变化都可能需要改变整个曲线(至少是变化后的部分)。更糟糕的是,自然三次样条函数往往是病态的:曲线开头的一个小变化可能会导致以后的大变化。另一个问题是我们只能控制曲线开头的导数。曲线开头之后的段从它们的起点确定它们的导数。
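The dependency scheme and its ill-conditioning can be illustrated with a small experiment. The sketch below assumes scalar values, the canonical form f(u) = a0 + a1 u + a2 u² + a3 u³, and the control ordering [f(0), f'(0), f''(0), f(1)]; under those assumptions the constraint matrix is [[1,0,0,0],[0,1,0,0],[0,0,2,0],[1,1,1,1]] and the basis matrix is its inverse. The names are hypothetical.

```python
# Inverse of the constraint matrix for controls [f(0), f'(0), f''(0), f(1)].
NATURAL_BASIS = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0],
    [-1.0, -1.0, -0.5, 1.0],
]


def coefficients(controls):
    """Canonical coefficients [a0, a1, a2, a3] for one segment."""
    return [sum(NATURAL_BASIS[r][c] * controls[c] for c in range(4))
            for r in range(4)]


def natural_spline(start, d1, d2, ends):
    """Chain segments through `ends`, propagating f, f', f'' across knots."""
    segments = []
    for end in ends:
        a = coefficients([start, d1, d2, end])
        segments.append(a)
        # Copy the state at u = 1 to the beginning of the next segment.
        start = end
        d1 = a[1] + 2.0 * a[2] + 3.0 * a[3]
        d2 = 2.0 * a[2] + 6.0 * a[3]
    return segments


if __name__ == "__main__":
    ends = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    base = natural_spline(0.0, 1.0, 0.0, ends)
    nudged = natural_spline(0.0, 1.001, 0.0, ends)
    for a, b in zip(base, nudged):
        print(b[1] - a[1])   # change in each segment's starting derivative
```

In this experiment the change in the starting derivative grows in magnitude from segment to segment, which is the behavior the paragraph above warns about.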
Hermite cubic polynomials were introduced in Section 15.3.4. A segment of a cubic Hermite spline allows the positions and first derivatives of both of its endpoints to be specified. A chain of segments can be linked into a C1 spline by using the same values for the position and derivative of the end of one segment and for the beginning of the next.
15.3.4 节介绍了 Hermite 三次多项式。三次 Hermite 样条线的线段允许指定其两个端点的位置和一阶导数。通过将一个线段的端点和下一个线段的起点的位置和导数设置为相同的值,可以将一串线段链接成C 1样条线。
Given a set of n control points, where every other control point is a derivative value, a cubic Hermite spline contains (n − 2)/2 cubic segments. The spline interpolates the points, as shown in Figure 15.6, but can guarantee only C1 continuity.
给定一组n 个控制点,其中每个其他控制点都是导数值,三次 Hermite 样条包含 ( n − 2) / 2 个三次线段。样条对点进行插值,如图 15.6所示,但只能保证C 1连续性。
Figure 15.6. A Hermite cubic spline made up of three segments.
图 15.6.由三段构成的 Hermite 三次样条。
Hermite cubics are convenient because they provide local control over the shape, and provide C1 continuity. However, since the user must specify both positions and derivatives, a special interface for the derivatives must be provided. One possibility is to provide the user with points that represent where the derivative vectors would end if they were “placed” at the position point.
埃尔米特立方很方便,因为它们可以局部控制形状,并提供C 1连续性。但是,由于用户必须同时指定位置和导数,因此必须为导数提供特殊接口。一种可能性是向用户提供表示导数向量在“放置”在位置点时结束的位置的点。
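A chain of Hermite segments can be evaluated with the standard cubic Hermite blending functions. The sketch below assumes scalar values, a control list that alternates position and derivative ([p0, v0, p1, v1, ...]), and a global parameter running over [0, number of segments]; the function names are hypothetical.

```python
def hermite_segment(p0, v0, p1, v1, u):
    """Cubic with f(0) = p0, f'(0) = v0, f(1) = p1, f'(1) = v1."""
    u2 = u * u
    u3 = u2 * u
    return ((2 * u3 - 3 * u2 + 1) * p0 + (u3 - 2 * u2 + u) * v0
            + (-2 * u3 + 3 * u2) * p1 + (u3 - u2) * v1)


def hermite_spline(controls, t):
    """Evaluate the chain; len(controls) = n gives (n - 2) / 2 segments."""
    n_segments = (len(controls) - 2) // 2
    i = min(int(t), n_segments - 1)   # t == n_segments falls in the last piece
    u = t - i
    # Adjacent segments share one position and one derivative value.
    p0, v0, p1, v1 = controls[2 * i:2 * i + 4]
    return hermite_segment(p0, v0, p1, v1, u)


if __name__ == "__main__":
    controls = [0.0, 1.0, 2.0, 0.0, 5.0, -1.0]   # three positions, three derivatives
    for t in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(t, hermite_spline(controls, t))
```

Because each segment reads only four consecutive entries of the control list, moving one position or derivative affects at most the two segments that share it.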
A cardinal cubic spline is a type of C1 interpolating spline made up of cubic polynomial segments. Given a set of n control points, a cardinal cubic spline uses n – 2 cubic polynomial segments to interpolate all of its points except for the first and last.
基数三次样条函数是一种由三次多项式段组成的C 1插值样条函数。给定一组n 个控制点,基数三次样条函数使用n – 2 个三次多项式段对除第一个和最后一个点之外的所有点进行插值。
Cardinal splines have a parameter called tension that controls how “tight” the curve is between the points it interpolates. The tension is a number in the range [0, 1) that controls how the curve bends toward the next control point. For the important special case of t = 0, the splines are called Catmull-Rom splines.
基数样条线有一个称为张力的参数,用于控制插值点之间的曲线的“紧密度”。张力是一个介于 [0, 1) 之间的数字,用于控制曲线如何向下一个控制点弯曲。对于t = 0 的重要特殊情况,样条线称为Catmull-Rom样条线。
Each segment of the cardinal spline uses four control points. For segment i, the points used are i, i + 1, i + 2, and i + 3 as the segments share three points with their neighbors. Each segment begins at its second control point and ends at its third control point. The derivative at the beginning of the curve is determined by the vector between the first and third control points, while the derivative at the end of the curve is given by the vector between the second and fourth points, as shown in Figure 15.7.
基数样条线的每个段使用四个控制点。对于段i ,使用的点为i 、 i + 1 、 i + 2 和i + 3 ,因为这些段与相邻段共享三个点。每个段从其第二个控制点开始,到其第三个控制点结束。曲线起点处的导数由第一个和第三个控制点之间的矢量确定,而曲线终点处的导数由第二个和第四个控制点之间的矢量确定,如图 15.7所示。
Figure 15.7. A segment of a cardinal cubic spline interpolates its second and third control points (p2 and p3), and uses its other points to determine the derivatives at the beginning and end.
图 15.7 基数三次样条线的一段对其第二和第三个控制点( p 2和p 3 )进行插值,并利用其其他点确定起点和终点的导数。
The tension parameter adjusts how much the derivatives are scaled. Specifically, the derivatives are scaled by (1 − t)/2. The constraints on the cubic are therefore
张力参数调整导数的缩放程度。具体来说,导数按 (1 − t ) / 2 缩放。因此,立方上的约束为
Solving these equations for the control points (defining s = (1 − t)/2) gives
求解这些方程中的控制点(定义s = (1 − t ) / 2)可得
This yields the cardinal matrix
这产生了基数矩阵
Since the third point of segment i is the second point of segment i+ 1, adjacent segments of the cardinal spline connect. Similarly, the same points are used to specify the first derivative of each segment, providing C1 continuity.
由于线段i的第三点是线段i + 1 的第二点,因此基数样条的相邻线段相连。同样,相同的点用于指定每个线段的一阶导数,从而提供C 1连续性。
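One segment of a cardinal spline can be evaluated from its four control points as sketched below. The matrix is derived here from the constraints stated above (the segment starts at its second point, ends at its third, and has end derivatives equal to s times the vectors between the first and third and between the second and fourth points), under the assumption of the canonical form a0 + a1 u + a2 u² + a3 u³. Scalar values are used for brevity and the names are hypothetical.

```python
def cardinal_matrix(t):
    """Matrix mapping four control points to canonical cubic coefficients."""
    s = (1.0 - t) / 2.0
    return [
        [0.0, 1.0, 0.0, 0.0],
        [-s, 0.0, s, 0.0],
        [2.0 * s, s - 3.0, 3.0 - 2.0 * s, -s],
        [-s, 2.0 - s, s - 2.0, s],
    ]


def cardinal_segment(q, u, t=0.0):
    """Evaluate the segment controlled by q = [q0, q1, q2, q3] at u in [0, 1].

    t is the tension; t = 0 gives a Catmull-Rom segment.
    """
    basis = cardinal_matrix(t)
    a = [sum(basis[r][c] * q[c] for c in range(4)) for r in range(4)]
    return a[0] + u * (a[1] + u * (a[2] + u * a[3]))


if __name__ == "__main__":
    points = [0.0, 1.0, 3.0, 2.0, 4.0]
    # Segment i uses points i, i + 1, i + 2, and i + 3.
    for i in range(len(points) - 3):
        print([cardinal_segment(points[i:i + 4], u) for u in (0.0, 0.5, 1.0)])
```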
Cardinal splines are useful because they provide an easy way to interpolate a set of points with C1 continuity and local control. They are only C1, so they sometimes get “kinks” in them. The tension parameter gives some control over what happens between the interpolated points, as shown in Figure 15.8, which shows a set of cardinal splines through a set of points. The curves use the same control points, but they use different values for the tension parameter. Note that the first and last control points are not interpolated.
基数样条线很有用,因为它们提供了一种简单的方法来插值具有C 1连续性和局部控制的一组点。它们只有C 1 个点,所以有时会出现“扭结”。张力参数可以控制插值点之间发生的情况,如图 15.8所示,其中显示了一组通过一组点的基数样条线。曲线使用相同的控制点,但它们对张力参数使用不同的值。请注意,第一个和最后一个控制点未插值。
Figure 15.8. Cardinal splines through seven control points with varying values of tension parameter t.
图 15.8.通过七个控制点的基数样条线,其张力参数t的值各不相同。
Given a set of n points to interpolate, you might wonder why we might prefer to use a cardinal cubic spline (that is a set of n − 2 cubic pieces) rather than a single, order n polynomial as described in Section 15.3.6. Some of the disadvantages of the interpolating polynomial are:
给定一组n个点进行插值,您可能想知道为什么我们更喜欢使用基数三次样条(即一组n - 2 个三次样条),而不是第 15.3.6 节中描述的单个n阶多项式。插值多项式的一些缺点是:
The interpolating polynomial tends to overshoot the points, as seen in Figure 15.9. This overshooting gets worse as the number of points grows larger. The cardinal splines tend to be well behaved in between the points.
插值多项式往往会超出点数,如图15.9所示。随着点数的增加,这种超出情况会变得更糟。基数样条函数在点数之间表现良好。
Figure 15.9. Splines interpolating nine control points (marked with small crosses). The thick orange line shows an interpolating polynomial. The thin line shows a Catmull-Rom spline. The latter is made of seven cubic segments, which are each shown in alternating blue tones.
图 15.9。插值九个控制点的样条线(标有小十字)。粗橙色线表示插值多项式。细线表示 Catmull-Rom 样条线。后者由七个三次线段组成,每个线段都以交替的蓝色色调显示。
Control of the interpolating polynomial is not local. Changing a point at the beginning of the spline affects the entire spline. Cardinal splines are local: any place on the spline is affected by its four neighboring points at most.
插值多项式的控制不是局部的。改变样条曲线开头的点会影响整个样条曲线。基数样条曲线是局部的:样条曲线上的任何位置最多受其四个相邻点的影响。
Evaluation of the interpolating polynomial is not local. Evaluating a point on the polynomial requires access to all of its points. Evaluating a point on the piecewise cubic requires a fixed small number of computations, no matter how large the total number of points is.
插值多项式的求值不是局部的。求多项式上的一个点需要访问其所有点。求分段三次函数上的一个点需要固定的少量计算,无论点的总数有多大。
There are a variety of other numerical and technical issues in using interpolating splines as the number of points grows larger. See De Boor (2001) for more information.
随着点数的增加,使用插值样条函数还存在各种其他数值和技术问题。有关详细信息,请参阅 De Boor (2001)。
A cardinal spline has the disadvantage that it does not interpolate the first or last point, which can be easily fixed by adding an extra point at either end of the sequence. The cardinal spline is also less continuous, providing only C1 continuity at the knots.
基数样条线的缺点是它不插入第一个点或最后一个点,这可以通过在序列的任一端添加一个额外的点来轻松修复。基数样条线的连续性也不强——在结点处仅提供C 1连续性。
It might seem like the easiest way to control a curve is to specify a set of points for it to interpolate. In practice, however, interpolation schemes often have undesirable properties because they have less continuity and offer no control of what happens between the points. Curve schemes that only approximate the points are often preferred. With an approximating scheme, the control points influence the shape of the curve, but do not specify it exactly. Although we give up the ability to directly specify points for the curve to pass through, we gain better behavior of the curve and local control. Should we need to interpolate a set of points, the positions of the control points can be computed such that the curve passes through these interpolation points.
控制曲线的最简单方法似乎是指定一组要插值的点。然而,在实践中,插值方案通常具有不良特性,因为它们的连续性较差,并且无法控制点之间发生的事情。通常首选仅近似点的曲线方案。使用近似方案,控制点会影响曲线的形状,但不能精确指定它。虽然我们放弃了直接指定曲线要经过的点的能力,但我们获得了更好的曲线行为和局部控制。如果我们需要插值一组点,则可以计算控制点的位置,以使曲线经过这些插值点。
The two most important types of approximating curves in computer graphics are Bézier curves and B-spline curves.
计算机图形学中两种最重要的近似曲线类型是贝塞尔曲线和B样条曲线。
Bézier curves are one of the most common representations for free-form curves in computer graphics. The curves are named for Pierre Bézier, one of the people who was instrumental in their development. Bézier curves have an interesting history where they were concurrently developed by several independent groups.
贝塞尔曲线是计算机图形学中最常见的自由曲线表示之一。这些曲线以皮埃尔·贝塞尔 (Pierre Bézier) 的名字命名,他是贝塞尔曲线开发过程中发挥了重要作用的人之一。贝塞尔曲线有着一段有趣的历史,它们是由几个独立的团队同时开发的。
A Bézier curve is a polynomial curve that approximates its control points. The curves can be a polynomial of any degree. A curve of degree d is controlled by d + 1 control points. The curve interpolates its first and last control points, and the shape is directly influenced by the other points.
贝塞尔曲线是一条近似其控制点的多项式曲线。曲线可以是任意次数的多项式。d 次曲线由 d + 1 个控制点控制。曲线插值其第一个和最后一个控制点,其形状直接受其他点的影响。
Often, complex shapes are made by connecting a number of Bézier curves of low degree, and in computer graphics, cubic (d = 3) Bézier curves are commonly used for this purpose. Many popular illustration programs, such as Adobe Illustrator, and font representation schemes, such as that used in PostScript, use cubic Bézier curves. Bézier curves are extremely popular in computer graphics because they are easy to control, have a number of useful properties, and there are very efficient algorithms for working with them.
通常,复杂形状是通过连接多条低阶贝塞尔曲线来形成的,在计算机图形学中,三次( d = 3)贝塞尔曲线通常用于此目的。许多流行的插图程序(例如 Adobe Illustrator)和字体表示方案(例如 PostScript 中使用的方案)都使用三次贝塞尔曲线。贝塞尔曲线在计算机图形学中非常流行,因为它们易于控制,具有许多有用的特性,并且有非常有效的算法来处理它们。
Bézier curves are constructed such that:
贝塞尔曲线的构造如下:
The curve interpolates the first and last control points, with u = 0 and 1, respectively.
曲线插入第一个和最后一个控制点, u分别等于 0 和 1。
The first derivative of the curve at its beginning (end) is determined by the vector between the first and second (next to last and last) control points. The derivatives are given by the vectors between these points scaled by the degree of the curve.
曲线在其起点(终点)的一阶导数由第一个和第二个(倒数第二个和最后一个)控制点之间的矢量决定。导数由这些点之间的矢量按曲线的度数缩放给出。
Higher derivatives at the beginning (end) of the curve depend on the points at the beginning (end) of the curve. The nth derivative depends on the first (last) n + 1 points.
曲线起点(终点)的高阶导数取决于曲线起点(终点)的点。第 n个导数取决于前(后) n + 1 个点。
For example, consider the Bézier curve of degree 3 (cubic) as in Figure 15.10. The curve has four (d + 1) control points. It begins at the first control point (p0) and ends at the last (p3). The first derivative at the beginning is proportional to the vector between the first and second control points (p1 − p0). Specifically, f'(0) = 3(p1 − p0). Similarly, the first derivative at the end of the curve is given by f'(1) = 3(p3 − p2). The second derivative at the beginning of the curve can be determined from control points p0, p1, and p2.
例如,考虑图 15.10中的 3 次(三次)贝塞尔曲线。该曲线有四个( d + 1)控制点。它始于第一个控制点( p0 ),终于最后一个控制点( p3 )。起始处的一阶导数与第一个和第二个控制点之间的矢量( p1 − p0 )成比例。具体而言, f'(0) = 3( p1 − p0 )。类似地,曲线终点处的一阶导数由 f'(1) = 3( p3 − p2 ) 给出。曲线起始处的二阶导数可由控制点 p0 、 p1 和 p2 确定。
Figure 15.10. A cubic Bézier curve is controlled by four points. It interpolates the first and last, and the beginning and final derivatives are three times the vectors between the first two (or last two) points.
图 15.10。三次贝塞尔曲线由四个点控制。它对第一个和最后一个点进行插值,并且起始和最终导数是前两个(或最后两个)点之间向量的三倍。
Using the facts about Bézier cubics in the preceding paragraph, we can use the methods of Section 15.5 to create a parametric function for them. The definitions of the beginning and end interpolation and derivatives give
利用上一段中关于贝塞尔三次函数的事实,我们可以使用第 15.5 节的方法为其创建参数函数。起始和终止插值和导数的定义如下
This can be solved for the basis matrix
这可以求解基础矩阵
and then written as
然后写成
or
或者
where the bi,3 are the Bézier blending functions of degree 3:
其中b i, 3是 3 阶贝塞尔混合函数:
Fortunately, the blending functions for Bézier curves have a special form that works for all degrees. These functions are known as the Bernstein basis polynomials and have the general form
幸运的是,贝塞尔曲线的混合函数具有适用于所有阶数的特殊形式。这些函数称为伯恩斯坦基多项式,其一般形式为
where n is the order of the Bézier curve, and k is the blending function number between 0 and n (inclusive). C(n, k) are the binomial coefficients:
其中n是贝塞尔曲线的阶数, k是 0 到n (含)之间的混合函数数。C ( n, k ) 是二项式系数:
Given the positions of the control points pk, the function to evaluate the Bézier curve of order n (with n + 1 control points) is
给定控制点pk的位置,求n阶贝塞尔曲线(有n + 1 个控制点)的函数为
Some Bézier segments are shown in Figure 15.11.
图 15.11显示了一些贝塞尔线段。
Figure 15.11. Various Bézier segments of degree 2–6. The control points are shown with crosses, and the control polygons (line segments connecting the control points) are also shown.
图 15.11。2至 6 度的各种贝塞尔线段。控制点以十字表示,还显示了控制多边形(连接控制点的线段)。
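The Bernstein form of the blending functions translates directly into code. The following is a minimal sketch, assuming control points given as tuples of coordinates and taking n to be the number of control points minus one; the function names are hypothetical, and `math.comb` (Python 3.8 or later) supplies the binomial coefficients C(n, k).

```python
from math import comb


def bernstein(n, k, u):
    """k-th Bernstein basis polynomial of degree n: C(n, k) u^k (1 - u)^(n - k)."""
    return comb(n, k) * u ** k * (1.0 - u) ** (n - k)


def bezier(points, u):
    """Evaluate the Bézier curve with the given control points at u in [0, 1]."""
    n = len(points) - 1
    dim = len(points[0])
    return tuple(
        sum(bernstein(n, k, u) * p[d] for k, p in enumerate(points))
        for d in range(dim)
    )


if __name__ == "__main__":
    cubic = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
    for u in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(u, bezier(cubic, u))
```

The weights are non-negative on [0, 1] and sum to one, so every evaluated point is a convex combination of the control points.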
Bézier segments have several useful properties:
贝塞尔线段有几个有用的属性:
The curve is bounded by the convex hull of the control points.
曲线由控制点的凸包所界定。
Any line intersects the curve no more times than it intersects the set of line segments connecting the control points. This is called the variation diminishing property. This property is illustrated in Figure 15.12.
任何直线与曲线的交点数都不会多于它与连接控制点的线段集的交点数。这被称为变异递减性质。该性质如图 15.12所示。
Figure 15.12. The variation diminishing property of Bézier curves means that the curve does not cross a line more than its control polygon does. Therefore, if the control polygon has no “wiggles,” the curve will not have them either. B-splines (Section 15.6.2) also have this property.
图 15.12。贝塞尔曲线的变化递减特性意味着曲线与线的交点不会超过其控制多边形与线的交点。因此,如果控制多边形没有“摆动”,曲线也不会有“摆动”。B 样条线(第 15.6.2 节)也具有此特性。
The curves are symmetric: reversing the order of the control points yields the same curve, with a reversed parameterization.
曲线是对称的:反转控制点的顺序会产生相同的曲线,但具有反转的参数化。
The curves are affine invariant. This means that translating, scaling, rotating, or skewing the control points is the same as performing those operations on the curve itself.
曲线具有仿射不变性。这意味着平移、缩放、旋转或倾斜控制点与在曲线本身上执行这些操作相同。
There are good simple algorithms for evaluating and subdividing Bézier curves into pieces that are themselves Bézier curves. Because subdivision can be done effectively using the algorithm described later, a divide and conquer approach can be used to create effective algorithms for important tasks such as rendering Bézier curves, approximating them with line segments, and determining the intersection between two curves.
有一些简单且不错的算法可用于评估贝塞尔曲线并将其细分为贝塞尔曲线本身。由于可以使用后面介绍的算法有效地进行细分,因此可以使用分而治之的方法为重要任务创建有效的算法,例如渲染贝塞尔曲线、用线段近似贝塞尔曲线以及确定两条曲线之间的交点。
When Bézier segments are connected together to make a spline, connectivity between the segments is created by sharing the endpoints. However, continuity of the derivatives must be created by positioning the other control points. This provides the user of a Bézier spline with control over the smoothness. For G1 continuity, the second-to-last point of the first curve and the second point of the second curve must be collinear with the equated endpoints. For C1 continuity, the distances between the points must be equal as well. This is illustrated in Figure 15.13. Higher degrees of continuity can be created by properly positioning more points.
当贝塞尔线段连接在一起形成样条曲线时,线段之间的连接是通过共享端点来创建的。但是,必须通过定位其他控制点来创建导数的连续性。这为贝塞尔样条曲线的用户提供了对平滑度的控制。对于G 1连续性,第一条曲线的倒数第二个点和第二条曲线的第二个点必须与相等的端点共线。对于C 1连续性,点之间的距离也必须相等。如图 15.13所示。通过正确定位更多点可以创建更高程度的连续性。
Figure 15.13. Two Bézier segments connect to form a C1 spline, because the vector between the last two points of the first segment is equal to the vector between the first two points of the second segment.
图 15.13两个贝塞尔线段连接形成一条C 1样条线,因为第一条线段的最后两点之间的矢量等于第二条线段的前两点之间的矢量。
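The continuity conditions at a joint can be checked mechanically. The sketch below assumes two Bézier segments of the same degree with 2D control points, and it ignores degenerate cases such as coincident control points; the function name and the returned labels are hypothetical.

```python
def joint_continuity(first, second, tol=1e-9):
    """Classify the joint between two Bézier segments of equal degree."""
    end, start = first[-1], second[0]
    if abs(end[0] - start[0]) > tol or abs(end[1] - start[1]) > tol:
        return "disconnected"
    # Vectors into and out of the shared endpoint.
    incoming = (end[0] - first[-2][0], end[1] - first[-2][1])
    outgoing = (second[1][0] - start[0], second[1][1] - start[1])
    if (abs(incoming[0] - outgoing[0]) <= tol
            and abs(incoming[1] - outgoing[1]) <= tol):
        return "C1"   # collinear and equally spaced
    cross = incoming[0] * outgoing[1] - incoming[1] * outgoing[0]
    dot = incoming[0] * outgoing[0] + incoming[1] * outgoing[1]
    if abs(cross) <= tol and dot > 0.0:
        return "G1"   # collinear, same direction, different lengths
    return "C0"       # connected, but with a corner


if __name__ == "__main__":
    first = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 1.0)]
    second = [(3.0, 1.0), (4.0, 0.0), (5.0, 0.0), (6.0, 1.0)]
    print(joint_continuity(first, second))
```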
Bézier curves can be derived from geometric principles, as well as from the algebraic methods described above. We outline the geometric principles because they provide intuition on how Bézier curves work.
贝塞尔曲线既可以从几何原理推导而来,也可以从上述代数方法推导而来。我们概述了几何原理,因为它们提供了贝塞尔曲线工作原理的直观感受。
Imagine that we have a set of control points from which we want to create a smooth curve. Simply connecting the points with lines (to form the control polygon) will lead to something that is non-smooth. It will have sharp corners. We could imagine “smoothing” this polygon by cutting off the sharp corners, yielding a new polygon that is smoother, but still not “smooth” in the mathematical sense (since the curve is still a polygon, and therefore only C0). We can repeat this process, each time yielding a smoother polygon, as shown in Figure 15.14. In the limit, that is, if we repeated the process infinitely many times, we would obtain a C1 smooth curve.
假设我们有一组控制点,想要基于它们创建一条平滑曲线。如果简单地用线连接这些点(形成控制多边形),那么得到的结果将不平滑,它将具有尖角。我们可以想象通过切掉尖角来“平滑”该多边形,得到一个更平滑的新多边形,但在数学意义上仍然不是“平滑”的(因为该曲线仍然是多边形,因此仅为C0连续)。我们可以重复此过程,每次都会得到一个更平滑的多边形,如图 15.14所示。在极限情况下,也就是说,如果我们无限次重复该过程,我们将获得一条C1平滑曲线。
Figure 15.14. Subdivision procedure for quadratic Béziers. Each line segment is divided in half and these midpoints are connected (blue points and lines). The interior control point is moved to the midpoint of the new line segment (orange point).
图 15.14。二次贝塞尔的细分过程。每条线段被分成两半,并将这些中点连接起来(蓝色点和线)。内部控制点移动到新线段的中点(橙色点)。
What we have done with corner cutting is define a subdivision scheme. That is, we have defined curves by a process for breaking a simpler curve into smaller pieces (i.e., subdividing it). The resulting curve is the limit curve that is achieved by applying the process infinitely many times. If the subdivision scheme is defined correctly, the result will be a smooth curve, and it will have a parametric form.
我们对切角所做的就是定义细分方案。也就是说,我们通过将较简单的曲线分解成较小的部分(例如,对其进行细分)的过程来定义曲线。得到的曲线是通过无限次应用该过程实现的极限曲线。如果正确定义了细分方案,则结果将是一条平滑的曲线,并且它将具有参数形式。
Let us consider applying corner cutting to a single corner. Given three points (p0, p1, p2), we repeatedly “cut off the corners” as shown in Figure 15.15. At each step, we divide each line segment in half, connect the midpoints, and then move the corner point to the midpoint of the new line segment. Note that in this process, new points are introduced, moved once, and then remain in this position for any remaining iterations. The endpoints never move.
让我们考虑将切角应用于单个角。给定三个点( p0 、 p1 、 p2 ),我们反复“切角”,如图 15.15所示。在每一步中,我们将每条线段分成两半,连接中点,然后将角点移动到新线段的中点。请注意,在此过程中,会引入新点,移动一次,然后在任何剩余的迭代中保持在此位置。端点永远不会移动。
Figure 15.15. By repeatedly cutting the corners off a polygon, we approach a smooth curve.
图 15.15通过反复切掉多边形的角,我们得到一条平滑的曲线。
If we compute the “new” position for the corner point p1 as the midpoint of the midpoints, we get the expression
如果我们将角点 p1 的“新”位置计算为中点的中点,则得到表达式
The construction actually works for other proportions of distance along each segment. If we let u be the distance between the beginning and the end of each segment where we place the middle point, we can rewrite this expression as
这种构造方法实际上适用于每条线段上其他比例的距离。如果我们让u表示每条线段的起点和终点之间的距离,我们可以将该表达式重写为
Regrouping terms gives the quadratic Bézier function:
重新组合项可得出二次贝塞尔函数:
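Written out, the three expressions in this derivation would read as follows. This is a reconstruction from the construction described above, not a quotation: the primed symbol $p_1'$ for the moved corner point (the middle of the three points) and the name $B_2$ for the quadratic Bézier function are notation assumed here.

```latex
\begin{align*}
% midpoint of the two segment midpoints (the case u = 1/2)
p_1' &= \tfrac{1}{2}\left(\tfrac{1}{2}p_0 + \tfrac{1}{2}p_1\right)
      + \tfrac{1}{2}\left(\tfrac{1}{2}p_1 + \tfrac{1}{2}p_2\right) \\
% the same construction at an arbitrary proportion u
p(u) &= (1-u)\bigl((1-u)\,p_0 + u\,p_1\bigr) + u\bigl((1-u)\,p_1 + u\,p_2\bigr) \\
% regrouped by control point
B_2(u) &= (1-u)^2\,p_0 + 2u(1-u)\,p_1 + u^2\,p_2
\end{align*}
```

Setting $u = \tfrac{1}{2}$ in the last line gives $\tfrac{1}{4}p_0 + \tfrac{1}{2}p_1 + \tfrac{1}{4}p_2$, which agrees with the first line.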
One nice feature of Bézier curves is that there is a very simple and general method for computing and subdividing them. The method, called the de Casteljau algorithm, uses a sequence of linear interpolations to compute positions along a Bézier curve of arbitrary order. It is a generalization of the subdivision scheme described in the previous section.
贝塞尔曲线的一个优点是,有一种非常简单且通用的方法来计算和细分它们。这种方法称为de Casteljau 算法,它使用一系列线性插值来计算任意阶贝塞尔曲线上的位置。它是上一节中描述的细分方案的推广。
The de Casteljau algorithm begins by connecting each adjacent pair of the n control points with a line segment and finding the point on each segment at interpolation parameter u, giving a set of n − 1 points. These points are then connected with line segments, which are interpolated again (by the same u), giving a set of n − 2 points. This process is repeated until only one point remains. An illustration of this process is shown in Figure 15.16.
德卡斯特里奥算法首先用线连接每个相邻的点集,然后在这些线上找到u插值的点,得到一组n - 1 个点。然后用直线连接这些点,对这些线进行插值(再次通过u 插值),得到一组n - 2 个点。重复此过程,直到得到一个点。图 15.16显示了此过程的说明。
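The repeated interpolation just described can be sketched as follows. This is a minimal sketch, assuming control points are given as coordinate tuples; the names `lerp` and `de_casteljau` are illustrative.

```python
def lerp(a, b, u):
    """Linear interpolation between points a and b at parameter u."""
    return tuple((1.0 - u) * x + u * y for x, y in zip(a, b))


def de_casteljau(points, u):
    """Evaluate a Bezier curve of any degree at u by repeated interpolation."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        # Each pass replaces m points with m - 1 interpolated points.
        pts = [lerp(a, b, u) for a, b in zip(pts, pts[1:])]
    return pts[0]


control = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
print(de_casteljau(control, 0.5))  # (2.0, 1.5)
```

The same function works for a quadratic (three points) or any higher degree, since the loop simply runs until one point is left.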
Figure 15.16. An illustration of the de Casteljau algorithm for a cubic Bézier. The left-hand image shows the construction for u = 0.5. The right-hand image shows the construction for 0.25, 0.5, and 0.75.
图 15.16。三次贝塞尔曲线的德卡斯特里奥算法图示。左图显示u = 0.5 的构造。右图显示 0.25、0.5 和 0.75 的构造。
The process of computing a point on a Bézier segment also provides a method for dividing the segment at the point. The intermediate points computed during the de Casteljau algorithm form the new control points of the new, smaller segments, as shown in Figure 15.17.
计算贝塞尔线段上某个点的过程也提供了一种在该点处分割线段的方法。在德卡斯特里奥算法中计算出的中间点构成了新的、更小的线段的新控制点,如图 15.17所示。
Figure 15.17. The de Casteljau algorithm is used to subdivide a cubic Bézier segment. The initial points (black diamonds A, B, C, and D) are linearly interpolated to yield blue circles (AB, BC, CD), which are linearly interpolated to yield orange circles (AC, BD), which are linearly interpolated to give the point on the cubic, AD. This process has also subdivided the Bézier segment with control points A, B, C, D into two Bézier segments with control points A, AB, AC, AD and AD, BD, CD, D.
图 15.17。de Casteljau 算法用于细分三次贝塞尔线段。初始点(黑色菱形 A、B、C 和 D)经过线性插值得到蓝色圆(AB、BC、CD),再经过线性插值得到橙色圆(AC、BD),最后经过线性插值得到三次 AD 上的点。此过程还将控制点为 A、B、C、D 的贝塞尔线段细分为两个控制点为 A、AB、AC、AD 和 AD、BD、CD、D 的贝塞尔线段。
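The subdivision in Figure 15.17 amounts to keeping the first and last point of every interpolation level. A minimal sketch, with the hypothetical function name `split_bezier` and points assumed to be coordinate tuples:

```python
def lerp(a, b, u):
    """Linear interpolation between points a and b at parameter u."""
    return tuple((1.0 - u) * x + u * y for x, y in zip(a, b))


def split_bezier(points, u=0.5):
    """Split a Bezier segment at u into two segments of the same degree."""
    pts = [tuple(p) for p in points]
    left = [pts[0]]     # A, then AB, AC, AD for a cubic
    right = [pts[-1]]   # D, then CD, BD, AD for a cubic (reversed below)
    while len(pts) > 1:
        pts = [lerp(a, b, u) for a, b in zip(pts, pts[1:])]
        left.append(pts[0])
        right.append(pts[-1])
    right.reverse()
    return left, right


left, right = split_bezier([(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)])
print(left)   # [(0.0, 0.0), (0.5, 1.0), (1.25, 1.5), (2.0, 1.5)]
print(right)  # [(2.0, 1.5), (2.75, 1.5), (3.5, 1.0), (4.0, 0.0)]
```

The two halves share the point AD, which is also the point on the curve at u.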
The existence of a good algorithm for dividing Bézier curves makes divide-and-conquer algorithms possible. For example, when drawing a Bézier curve segment, it is easy to check whether the curve is close to being a straight line because it is bounded by its convex hull. If the control points of the curve are all close to being collinear, the curve can be drawn as a straight line. Otherwise, the curve can be divided into smaller pieces, and the process can be repeated. Similar algorithms can be used for determining the intersection between two curves. Because of the existence of such algorithms, other curve representations are often converted to Bézier form for processing.
存在一个好的贝塞尔曲线分割算法,使得分治算法成为可能。例如,在绘制贝塞尔曲线段时,很容易检查该曲线是否接近直线,因为它由凸包包围。如果曲线的控制点都接近共线,则可以将曲线绘制为直线。否则,可以将曲线划分为较小的部分,然后重复该过程。可以使用类似的算法来确定两条曲线之间的交点。由于存在这样的算法,其他曲线表示通常被转换为贝塞尔形式进行处理。
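One way the divide-and-conquer drawing idea could look in code is sketched below for 2D curves. The flatness test, the tolerance `tol`, the depth limit `max_depth`, and all function names are assumptions made for this illustration; the text does not prescribe a particular test.

```python
import math


def lerp(a, b, u):
    return tuple((1.0 - u) * x + u * y for x, y in zip(a, b))


def split_bezier(points, u=0.5):
    """de Casteljau subdivision into two segments."""
    pts = [tuple(p) for p in points]
    left, right = [pts[0]], [pts[-1]]
    while len(pts) > 1:
        pts = [lerp(a, b, u) for a, b in zip(pts, pts[1:])]
        left.append(pts[0])
        right.append(pts[-1])
    right.reverse()
    return left, right


def distance_to_segment(p, a, b):
    """Distance from 2D point p to the segment from a to b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    if len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    s = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + s * dx), py - (ay + s * dy))


def flatten(points, tol=1e-3, depth=0, max_depth=16):
    """Approximate a 2D Bezier segment by a polyline."""
    a, b = tuple(points[0]), tuple(points[-1])
    # The curve lies in the convex hull of its control points, so if every
    # control point is within tol of the chord, the whole curve is too.
    flat = all(distance_to_segment(p, a, b) <= tol for p in points[1:-1])
    if flat or depth >= max_depth:
        return [a, b]
    left, right = split_bezier(points, 0.5)
    return (flatten(left, tol, depth + 1, max_depth)[:-1]
            + flatten(right, tol, depth + 1, max_depth))


polyline = flatten([(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)], tol=0.01)
```

Flat regions of the curve end up with few vertices and strongly curved regions with more, which is the practical benefit of subdividing adaptively.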
B-splines provide a method for approximating a set of n points with a curve made up of polynomials of degree d that gives C(d−1) continuity. Unlike the Bézier splines of the previous section, B-splines allow curves to be generated for any desired degree of continuity (almost up to the number of points). Because of this, B-splines are a preferred way to specify very smooth curves (high degrees of continuity) in computer graphics. If we want a C2 or higher curve defined by an arbitrary number of points, B-splines are probably the right method.
B 样条曲线提供了一种方法,可以使用由d次多项式组成的曲线来近似一组n个点,该曲线具有C ( d −1)连续性。与上一节中的贝塞尔样条曲线不同,B 样条曲线允许生成具有任意连续度(几乎与点数相同)的曲线。因此,B 样条曲线是计算机图形学中指定非常平滑的曲线(高连续度)的首选方法。如果我们想要一条通过任意数量的点的C 2或更高的曲线,B 样条曲线可能是正确的方法。
We can represent a curve using a linear combination of B-spline basis functions. Since these basis functions are themselves splines, we call them basis splines or B-splines for short. Each B-spline or basis function is made up of a set of d + 1 polynomials each of degree d. The methods of B-splines provide general procedures for defining these functions.
我们可以使用 B 样条基函数的线性组合来表示曲线。由于这些基函数本身就是样条函数,因此我们将其简称为基样条函数或 B 样条函数。每个 B 样条函数或基函数由一组d + 1 个多项式组成,每个多项式的次数为d 。B 样条函数的方法提供了定义这些函数的一般程序。
The term B-spline specifically refers to one of the basis functions, not the function created by the linear combination of a set of B-splines. However, there is inconsistency in how the term is used in computer graphics. Commonly, a “B-spline curve” is used to mean a curve represented by the linear combination of B-splines.
B 样条这一术语特指基函数之一,而不是一组 B 样条的线性组合所创建的函数。然而,该术语在计算机图形学中的用法不一致。通常,“B 样条曲线”用于表示由 B 样条的线性组合表示的曲线。
The idea of representing a polynomial as the linear combination of other polynomials has been discussed in Sections 15.3.1 and 15.3.5. Representing a spline as a linear combination of other splines was shown in Section 15.4.1. In fact, the example given there is a simple case of a B-spline.
将多项式表示为其他多项式的线性组合的想法已在15.3.1 节和 15.3.5 节中讨论过。将样条表示为其他样条的线性组合已在15.4.1 节中展示。实际上,给出的示例是 B 样条的一个简单情况。
The general notation for representing a function as a linear combination of other functions is
将函数表示为其他函数的线性组合的一般符号是
where the p_i are the coefficients and the b_i are the basis functions. If the coefficients are points (e.g., 2-vectors or 3-vectors), we refer to them as control points. The key to making such a method work is to define the b_i appropriately. B-splines provide a very general way to do this.
其中p是系数,b 是基函数。如果系数是点(例如 2 或 3 个向量),我们将它们称为控制点。使这种方法奏效的关键是适当地定义 b。B 样条线提供了一种非常通用的方法来做到这一点。
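In symbols, the linear combination just described would be written as below. This is a sketch that assumes n coefficients indexed from 1 to n, as in the rest of this section, and the name $f$ for the resulting function.

```latex
f(t) = \sum_{i=1}^{n} p_i \, b_i(t)
```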
A set of B-splines can be defined for a number of coefficients n and a parameter value k (see footnote 3). The value of k is one more than the degree of the polynomials used to make the B-splines (k = d + 1).
可以为一定数量的系数n和一个参数值 k 定义一组 B 样条线。3 k的值比用于制作 B 样条线的多项式的次数多一 ( k = d + 1 )。
B-splines are important because they provide a very general method for creating functions (that will be useful for representing curves) that have a number of useful properties. A curve with n points made with B-splines with parameter value k:
B 样条线很重要,因为它们提供了一种非常通用的方法来创建具有许多有用属性的函数(这对于表示曲线很有用)。使用参数值为k 的B 样条线绘制的具有n 个点的曲线:
is C(k−2) continuous;
是C ( k −2) 连续的;
is made of polynomials of degree k − 1;
由k − 1 次多项式组成;
has local control: any point on the curve depends on only k of the control points;
具有局部控制——曲线上的任何站点仅取决于k个控制点;
is bounded by the convex hull of the points;
由点的凸包所界定;
exhibits the variation diminishing property illustrated in Figure 15.12.
表现出图 15.12所示的变差减小特性。
A curve created using B-splines does not necessarily interpolate its control points.
使用 B 样条创建的曲线不一定会插入其控制点。
We will introduce B-splines by first looking at a specific, simple case to introduce the concepts. We will then generalize the methods and show why they are interesting. Because the method for computing B-splines is very general, we delay introducing it until we have shown what these generalizations are.
我们将首先通过一个具体的简单案例来介绍 B 样条线的概念。然后我们将概括这些方法并说明它们为什么有趣。由于计算 B 样条线的方法非常通用,因此我们推迟介绍它,直到我们展示了这些概括是什么。
Consider a set of basis functions of the following form:
考虑一组以下形式的基函数:
Each of these functions looks like a little triangular “hat” between i and i + 2 with its peak at i + 1. Each is a piecewise polynomial, with knots at i, i + 1, and i + 2. Two of them are graphed in Figure 15.18.
这些函数中的每一个看起来都像i和i + 2 之间的一个小三角“帽子”,其峰值在i + 1 处。每个函数都是一个分段多项式,在i 、 i + 1 和i + 2 处有节点。其中两个函数的图形如图 15.18 所示。
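A piecewise definition consistent with this description (a hat that rises from 0 at t = i to 1 at t = i + 1 and falls back to 0 at t = i + 2) would be the following. It is reconstructed from the description, so treat the exact form as an assumption.

```latex
b_{i,2}(t) =
\begin{cases}
t - i       & \text{if } i \le t < i + 1, \\
i + 2 - t   & \text{if } i + 1 \le t \le i + 2, \\
0           & \text{otherwise.}
\end{cases}
```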
Each of these functions bi,2 is a first-degree (linear) B-spline. Because we will consider B-splines of other parameter values later, we denote these with the 2 in the subscript.
这些函数b i ,2中的每一个都是一阶(线性)B 样条函数。因为我们稍后会考虑其他参数值的 B 样条函数,所以我们用下标中的 2 来表示这些函数。
Figure 15.18. B-splines with d = 1 or k = 2.
图 15.18. d = 1 或k = 2 的 B 样条。
Footnote 3: The B-spline parameter is actually the order of the polynomials used in the B-splines. Although terminology is not uniform in the literature, it is common to define the B-spline parameter k as one greater than the polynomial degree; some texts (see the chapter notes) instead write all of the equations in terms of polynomial degree.
3 B 样条参数实际上是 B 样条中使用的多项式的阶数。虽然这一术语在文献中并不统一,但 B 样条参数k的使用被广泛使用为比多项式阶数大一的值,尽管有些文本(参见章节注释)将所有方程式都写成多项式阶数。
Notice that we have chosen to put the lower edge of the B-spline (its first knot) at i. Therefore, the first knot of the first B-spline (i = 1) is at 1. Iteration over the B-splines or elements of the coefficient vector is from 1 to n (see Equation 15.15). When B-splines are implemented, as well as in many other discussions of them, they often are numbered from 0 to n − 1.
请注意,我们已选择将 B 样条的下边缘(其第一个节点)置于i处。因此,第一个 B 样条( i = 1)的第一个节点位于 1 处。对 B 样条或系数向量元素的迭代从 1 到n (参见公式 15.15)。在实现 B 样条时,以及在许多其他关于 B 样条的讨论中,它们通常从 0 到n − 1 编号。
We can create a function from a set of n control points using Equation 15.15, with these functions used for the bi, to create an “overall function” that is influenced by the coefficients. If we use these (k = 2) B-splines to define the overall function, we define a piecewise polynomial function that linearly interpolates the coefficients pi between t = k and t = n + 1. Note that while (k = 2) B-splines interpolate all of their coefficients, B-splines of higher degree do so only under specific conditions that we will discuss in Section 15.6.3.
我们可以使用公式 15.15 从一组n 个控制点创建一个函数,这些函数用于 b 以创建一个受系数影响的“整体函数”。如果我们要使用这些( k = 2)B 样条来定义整体函数,我们将定义一个分段多项式函数,该函数在t = k和t = n + 1 之间线性插值系数p。请注意,虽然( k = 2)B 样条会插值其所有系数,但更高阶的 B 样条会在某些特定条件下执行此操作,我们将在第 15.6.3 节中讨论。
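The interpolation claim for k = 2 is easy to check numerically. The sketch below assumes the hat-shaped basis described earlier (first knot of B-spline i at t = i, peak at t = i + 1) and uses scalar coefficients; the names `b_linear` and `f_linear` are illustrative.

```python
def b_linear(i, t):
    """Linear (k = 2) B-spline with knots at i, i + 1, and i + 2."""
    if i <= t < i + 1:
        return t - i
    if i + 1 <= t <= i + 2:
        return i + 2 - t
    return 0.0


def f_linear(coeffs, t):
    """Overall function: sum of p_i * b_{i,2}(t), with coeffs[0] holding p_1."""
    return sum(p * b_linear(i, t) for i, p in enumerate(coeffs, start=1))


coeffs = [10.0, 20.0, 40.0, 30.0]   # n = 4, so the valid range is t = 2 to t = 5
print(f_linear(coeffs, 2.0))   # 10.0 (p_1 is interpolated at t = 2)
print(f_linear(coeffs, 2.5))   # 15.0 (halfway between p_1 and p_2)
print(f_linear(coeffs, 5.0))   # 30.0 (p_4 is interpolated at t = n + 1)
```

Because B-spline i peaks at t = i + 1, coefficient p_i is reached at t = i + 1, which is why the usable range runs from t = 2 to t = n + 1.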
Some properties of B-splines can be seen in this simple case. We will write these in the general form using k, the parameter, and n for the number of coefficients or control points:
在这个简单的例子中,我们可以看到 B 样条的一些属性。我们将以一般形式写出这些属性,其中k为参数, n为系数或控制点的数量:
Each B-spline has k + 1 knots.
每个 B 样条线有k + 1 个节点。
Each B-spline is zero before its first knot and after its last knot.
每个 B 样条线在其第一个结点之前和最后一个结点之后都为零。
The overall spline has local control because each coefficient is only multiplied by one B-spline, and this B-spline is nonzero only between k + 1 knots.
整体样条具有局部控制,因为每个系数仅乘以一个 B 样条,并且该 B 样条仅在k + 1 个节点之间为非零。
The overall spline has n + k knots.
整体样条线有n + k个节点。
Each B-spline is C(k − 2) continuous, therefore the overall spline is C(k − 2) continuous.
每个 B 样条曲线都是C ( k − 2) 连续的,因此整体样条曲线也是C ( k − 2) 连续的。
The set of B-splines sums to 1 for all parameter values between knots k and n + 1. This range is where there are k B-splines that are nonzero. Summing to 1 is important because it means that the B-splines are shift invariant: translating the control points will translate the entire curve.
对于节点k和n + 1 之间的所有参数值,B 样条曲线集的总和为 1。此范围是k 个非零 B 样条曲线的范围。总和为 1 很重要,因为这意味着 B 样条曲线具有平移不变性:平移控制点将平移整个曲线。
Between each of its knots, the B-spline is a single polynomial of degree d = k − 1. Therefore, the overall curve (that sums these together) can also be expressed as a single, degree d polynomial between any adjacent knots.
在每个节点之间,B 样条线都是一个度为d = k − 1 的多项式。因此,整体曲线(将这些曲线相加)也可以表示为任何相邻节点之间的单个度为d 的多项式。
In this example, we have chosen the knots to be uniformly spaced. We will consider B-splines with nonuniform spacing later. When the knot spacing is uniform, each of the B-splines is identical except for being shifted. B-splines with uniform knot spacing are sometimes called uniform B-splines or periodic B-splines.
在本例中,我们选择节点间距均匀。稍后我们将考虑间距不均匀的 B 样条。当节点间距均匀时,每个 B 样条除了移位外都相同。节点间距均匀的 B 样条有时称为均匀 B 样条或周期 B 样条。
The properties of B-splines listed in the previous section were intentionally written for arbitrary n and k. A general procedure for constructing the B-splines will be provided later, but first, let us consider another specific case with k = 3.
上一节列出的 B 样条函数的性质是针对任意n和k编写的。稍后将提供构建 B 样条函数的一般过程,但首先让我们考虑另一个k = 3 的具体情况。
The B-spline b2,3 is shown in Figure 15.19. It is made of three quadratic (degree 2) pieces. It is C1 continuous and is nonzero only within the four knots that it spans. The three pieces of a quadratic B-spline lie between its first and second knots, its second and third knots, and its third and fourth knots. In Section 15.6.3 we will see a general procedure for building these functions. For now, we simply examine these functions:
图 15.19显示了 B 样条线b 2,3 。它由二次段(2 阶)组成,共有 3 个。它是C 1连续的,并且仅在其跨越的 4 个节点内为非零。请注意,二次 B 样条线由 3 个段组成,一个在节点 1 和 2 之间,一个在节点 2 和 3 之间,一个在节点 3 和 4 之间。在第 15.6.3 节中,我们将看到构建这些函数的一般过程。现在,我们只检查这些函数:
Figure 15.19. The B-spline b2,3 with uniform knot spacing.
图 15.19.具有均匀节点间距的 B 样条线b 2,3 。
In order to make the expressions simpler, we wrote the function for each part as if it applied over the range 0 to 1.
为了使表达式更简单,我们为每个部分编写函数,就好像它适用于 0 到 1 的范围一样。
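With uniform knots at the integers, the three quadratic pieces take the standard form below, each written in a local parameter u that runs from 0 to 1 over its own knot interval. This is a reconstruction under that convention, not a quotation of the numbered equation.

```latex
b_{i,3}(t) =
\begin{cases}
\tfrac{1}{2}u^2            & \text{if } i \le t < i+1,   & u = t - i, \\
-u^2 + u + \tfrac{1}{2}    & \text{if } i+1 \le t < i+2, & u = t - (i+1), \\
\tfrac{1}{2}(1-u)^2        & \text{if } i+2 \le t < i+3, & u = t - (i+2), \\
0                          & \text{otherwise.}
\end{cases}
```

Adjacent pieces agree in value and first derivative at the knots (for example, both the first and second pieces equal $\tfrac{1}{2}$ with slope 1 where they meet), and the three expressions sum to 1 for any u.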
If we evaluate the overall function made from summing together the B-splines, at any time only k (3 in this case) of them are nonzero. One of them will be in the first part of Equation 15.17, one will be in the second part, and one will be in the third part. Therefore, we can think of any piece of the overall function as being made up of a degree d = k − 1 polynomial that depends on k coefficients. For the k = 3 case, we can write
如果我们评估由 B 样条相加而形成的总体函数,则在任何时候,只有k 个(在本例中为 3 个)非零。其中一个将位于公式 15.17 的第一部分,一个将位于第二部分,一个将位于第三部分。因此,我们可以将总体函数的任何部分视为由依赖于k个系数的d = k − 1 度多项式组成。对于k = 3 的情况,我们可以写成
where u = t − i. This defines the piece of the overall function when i ≤ t < i+ 1.
其中u = t − i。这定义了当i ≤ t < i + 1 时整体函数的部分。
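Under the indexing used in this section, where the first knot of B-spline i sits at t = i, the three B-splines that are nonzero for i ≤ t < i + 1 are $b_{i-2,3}$, $b_{i-1,3}$, and $b_{i,3}$. A sketch of the resulting piece, assuming that labelling of the control points (a different choice of starting index would only shift the subscripts):

```latex
f(u) = \tfrac{1}{2}(1-u)^2 \, p_{i-2}
     + \left(-u^2 + u + \tfrac{1}{2}\right) p_{i-1}
     + \tfrac{1}{2}u^2 \, p_{i},
\qquad u = t - i, \quad i \le t < i + 1 .
```

At u = 0 this reduces to $\tfrac{1}{2}(p_{i-2} + p_{i-1})$, the midpoint of two neighboring control points, so the quadratic curve touches the control polygon at segment midpoints instead of passing through the control points.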
If we have a set of n points, we can use the B-splines to create a curve. If we have seven points, we will need a set of seven B-splines. A set of seven B-splines for k = 3 is shown in Figure 15.20. Notice that there are n + k (10) knots and that the sum of the B-splines is 1 over the range k to n + 1 (knots 3 through 8). A curve specified using these B-splines and a set of points is shown in Figure 15.21.
如果我们有一组n个点,我们可以使用 B 样条线来创建曲线。如果我们有七个点,我们将需要一组七条 B 样条线。图 15.20显示了k = 3 的一组七条 B 样条线。请注意,有n + k (10)个节点,B 样条线的总和在k到n + 1 的范围内为 1(节点 3 到 8)。图15.21显示了使用这些 B 样条线和一组点指定的曲线。
Figure 15.20. The set of seven B-splines with k = 3 and uniform knot spacing
图 15.20。k = 3 且节点间距均匀的七条 B 样条线集
Figure 15.21. Curve made from seven quadratic (k = 3) B-splines, using seven control points.
图 15.21.由七条二次( k = 3)B 样条线构成的曲线,使用了七个控制点。
Because cubic polynomials are so popular in computer graphics, the special case of B-splines with k = 4 is sufficiently important that we consider it before discussing the general case. A B-spline of third degree is defined by four cubic polynomial pieces. The general process by which these pieces are determined is described later, but the result is
由于三次多项式在计算机图形学中非常流行,因此k = 4 的 B 样条线的特殊情况非常重要,因此我们在讨论一般情况之前先考虑它。三次 B 样条线由四个三次多项式片段定义。确定这些片段的一般过程将在后面描述,但结果是
This degree 3 B-spline is graphed for i = 1 in Figure 15.22.
图 15.22中绘制了i = 1 时的 3 阶 B 样条曲线。
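For uniform integer knots, the four cubic pieces have the standard form below, again with a local parameter u in [0, 1) on each knot interval. This is a reconstruction under that convention and should be read as a sketch.

```latex
b_{i,4}(t) =
\begin{cases}
\tfrac{1}{6}u^3                          & \text{if } i \le t < i+1,   & u = t - i, \\
\tfrac{1}{6}\left(-3u^3 + 3u^2 + 3u + 1\right) & \text{if } i+1 \le t < i+2, & u = t - (i+1), \\
\tfrac{1}{6}\left(3u^3 - 6u^2 + 4\right)       & \text{if } i+2 \le t < i+3, & u = t - (i+2), \\
\tfrac{1}{6}\left(-u^3 + 3u^2 - 3u + 1\right)  & \text{if } i+3 \le t < i+4, & u = t - (i+3), \\
0                                        & \text{otherwise.}
\end{cases}
```

The four polynomials sum to 1 for every u, and neighboring pieces match in value at the knots ($\tfrac{1}{6}$, $\tfrac{4}{6}$, $\tfrac{1}{6}$).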
Figure 15.22. The cubic (k = 4) B-spline with uniform knots.
图 15.22.具有均匀节点的三次( k = 4)B 样条。
We can write the function for the overall curve between knots i + 3 and i + 4 as a function of the parameter u between 0 and 1 and the four control points that influence it:
我们可以将节点i + 3 和i + 4 之间的整体曲线的函数写为 0 到 1 之间的参数u以及影响它的四个控制点的函数:
This can be rewritten using the matrix notation of the previous sections, giving a basis matrix for cubic B-splines of
这可以使用前几节的矩阵符号重写,给出三次 B 样条函数的基矩阵
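Assuming the uniform cubic pieces given by the standard formulas and a row vector of powers ordered as $[u^3\; u^2\; u\; 1]$ (the ordering is a convention chosen for this sketch and may differ from the one used in earlier sections), the segment and its basis matrix can be written as:

```latex
\begin{align*}
f(u) &= \tfrac{1}{6}\left(-u^3 + 3u^2 - 3u + 1\right) p_i
      + \tfrac{1}{6}\left(3u^3 - 6u^2 + 4\right) p_{i+1} \\
     &\quad + \tfrac{1}{6}\left(-3u^3 + 3u^2 + 3u + 1\right) p_{i+2}
      + \tfrac{1}{6}u^3 \, p_{i+3}, \\[4pt]
f(u) &= \begin{bmatrix} u^3 & u^2 & u & 1 \end{bmatrix}
\frac{1}{6}
\begin{bmatrix}
-1 &  3 & -3 & 1 \\
 3 & -6 &  3 & 0 \\
-3 &  0 &  3 & 0 \\
 1 &  4 &  1 & 0
\end{bmatrix}
\begin{bmatrix} p_i \\ p_{i+1} \\ p_{i+2} \\ p_{i+3} \end{bmatrix}.
\end{align*}
```

Each column of the matrix holds the coefficients of the polynomial that multiplies one control point.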
Unlike the matrices that were derived from constraints in Section 15.5, this matrix is created from the polynomials that are determined by the general B-spline procedure defined in the next section.
与第 15.5 节中约束导出的矩阵不同,该矩阵是根据下一节中定义的一般 B 样条程序确定的多项式创建的。
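A small sketch of evaluating one uniform cubic B-spline segment in matrix form follows. The matrix is the standard uniform cubic B-spline basis with powers ordered as [u^3, u^2, u, 1]; that ordering and the function names are assumptions of this example.

```python
# Rows correspond to u^3, u^2, u, 1; columns to p_i, p_{i+1}, p_{i+2}, p_{i+3}.
CUBIC_BSPLINE_BASIS = [
    [-1.0, 3.0, -3.0, 1.0],
    [3.0, -6.0, 3.0, 0.0],
    [-3.0, 0.0, 3.0, 0.0],
    [1.0, 4.0, 1.0, 0.0],
]


def cubic_bspline_weights(u):
    """Weights of the four control points at local parameter u in [0, 1]."""
    powers = [u ** 3, u ** 2, u, 1.0]
    return [
        sum(powers[r] * CUBIC_BSPLINE_BASIS[r][c] for r in range(4)) / 6.0
        for c in range(4)
    ]


def cubic_bspline_segment(p0, p1, p2, p3, u):
    """Point on the segment influenced by four consecutive control points."""
    weights = cubic_bspline_weights(u)
    controls = (p0, p1, p2, p3)
    return tuple(
        sum(w * p[d] for w, p in zip(weights, controls))
        for d in range(len(p0))
    )


point = cubic_bspline_segment((0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0), 0.5)
```

At u = 0 the weights are 1/6, 4/6, 1/6, and 0, so the segment starts near, but not at, its second control point.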
One nice feature of B-splines is that they can be defined for any k > 1. So if we need a smoother curve, we can simply increase the value of k. This is illustrated in Figure 15.23.
B 样条曲线的一个优点是,它们可以定义为任意的k > 1。因此,如果我们需要更平滑的曲线,只需增加k的值即可。如图 15.23所示。
Figure 15.23. B-spline curves using the same uniform set of knots and the same control points, for various values of k. Note that as k increases, the valid parameter range for the curve shrinks.
图 15.23。对于不同的k值,使用相同均匀节点集和相同控制点的 B 样条曲线。请注意,随着k 的增加,曲线的有效参数范围会缩小。
So far, we have said that B-splines generalize to any k > 1 and any n ≥ d. There is one last generalization to introduce before we show how to actually compute these B-splines. B-splines are defined for any non-decreasing knot vector.
到目前为止,我们已经说过 B 样条可以推广到任何k > 1 和任何n ≥ d 。在我们展示如何实际计算这些 B 样条之前,还有最后一个推广要介绍。B 样条适用于任何非递减节点向量。
For a given n and k, the set of B-splines (and the function created by their linear combination) has n + k knots. We can write the values of these knots as a vector, which we will denote as t. For the uniform B-splines, the knot vector is [1, 2, 3, ..., n + k]. However, B-splines can be generated for any knot vector of length n + k, provided the values are non-decreasing (i.e., ti+1 ≥ ti).
对于给定的n和k, B 样条线集(以及由它们的线性组合创建的函数)有n + k个节点。我们可以将这些节点的值写成一个向量,我们将其表示为t 。对于均匀 B 样条线,节点向量为 [1, 2, 3 ,...,n + k ]。但是,只要值不减少(例如, t i +1 ≥ t),就可以为长度为n + k的任何节点向量生成 B 样条线。
There are two main reasons why nonuniform knot spacing is useful: it gives us control over what parameter range of the overall function each coefficient affects, and it allows us to repeat knots (i.e., create knots with no spacing in between) in order to create functions with different properties around these points. The latter will be considered later in this section.
非均匀节点间距之所以有用,主要有两个原因:它使我们能够控制每个系数影响的整个函数的参数范围,并且它允许我们重复节点(例如,创建中间没有间距的节点),以便围绕这些点创建具有不同属性的函数。后者将在本节后面讨论。
The ability to specify knot values for B-splines is similar to being able to specify the interpolation sites for interpolating spline curves. It allows us to associate curve features with parameter values. By specifying a nonuniform knot vector, we specify what parameter range each coefficient of a B-spline curve affects. Remember that B-spline i is nonzero only between knot i and knot i + k. Therefore, the coefficient associated with it only affects the curve between these parameter values.
为 B 样条曲线指定节点值的能力类似于能够为插值样条曲线指定插值点。它允许我们将曲线特征与参数值关联起来。通过指定非均匀节点向量,我们可以指定 B 样条曲线的每个系数影响的参数范围。请记住,B 样条曲线i仅在节点i和节点i + k之间为非零。因此,与其关联的系数仅影响这些参数值之间的曲线。
One place where control over knot values is particularly useful is in inserting or deleting knots near the beginning of a sequence. To illustrate this, consider a curve defined using linear B-splines (k = 2) as discussed in Section 15.6.2. For n = 4, the uniform knot vector is [1, 2, 3, 4, 5, 6]. This curve is controlled by a set of four points and spans the parameter range t = 2 to t = 5. The “end” of the curve (t = 5) interpolates the last control point. If we insert a new point in the middle of the point set, we need a longer knot vector. The locality properties of the B-splines prevent this insertion from affecting the values of the curve at the ends. The longer curve still interpolates its last control point at its end. However, if we chose to keep the uniform knot spacing, the new knot vector would be [1, 2, 3, 4, 5, 6, 7]. The end of the curve would be at t = 6, so the last control point would be interpolated at a different parameter value than before the insertion. With nonuniform knot spacing, we can use the knot vector [1, 2, 3, 3.5, 4, 5, 6] so that the ends of the curve are unaffected by the change. The ability to have nonuniform knot spacing makes the locality property of B-splines an algebraic property as well as a geometric one.
控制节点值特别有用的一个地方是在序列开头附近插入或删除节点。为了说明这一点,考虑一条使用线性 B 样条( k = 2)定义的曲线,如第 15.6.2 节所述。对于n = 4,均匀节点向量为 [1, 2, 3, 4, 5, 6]。该曲线由一组四个点控制,跨越参数范围t = 2 到t = 5。曲线的“端点”( t = 5)插入最后一个控制点。如果我们在点集中间插入一个新点,则需要一个更长的节点向量。B 样条的局部性可防止这种插入影响曲线两端的值。较长的曲线仍会在其端点插入其最后一个控制点。但是,如果我们选择保持均匀的节点间距,则新的节点向量将为 [1, 2, 3, 4, 5, 6, 7]。曲线的末端将位于t = 6,最后一个控制点插入时的参数值将与插入前的参数值不同。对于非均匀节点间距,我们可以使用节点向量 [1, 2, 3, 3.5, 4, 5, 6],这样曲线的末端就不会受到变化的影响。具有非均匀节点间距的能力使 B 样条的局部性成为代数性质,也是几何性质。
We now introduce the general method for defining B-splines. Given values for the number of coefficients n, the B-spline parameter k, and the knot vector t (which has length n + k), the following recursive equations define the B-splines:
现在我们介绍定义 B 样条的一般方法。给定系数个数n、 B 样条参数k和节点向量t (长度为n + k )的值,以下递归方程定义 B 样条:
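In its usual form, with the first knot of B-spline i at $t_i$ as in this section, the recurrence reads as follows. The statement is given here from the standard definition; the convention that a term with a zero denominator is treated as zero (needed when knots repeat) is an assumption stated explicitly.

```latex
\begin{align*}
b_{i,1,\mathbf{t}}(t) &=
\begin{cases}
1 & \text{if } t_i \le t < t_{i+1}, \\
0 & \text{otherwise,}
\end{cases} \\[4pt]
b_{i,k,\mathbf{t}}(t) &=
\frac{t - t_i}{t_{i+k-1} - t_i}\, b_{i,k-1,\mathbf{t}}(t)
+ \frac{t_{i+k} - t}{t_{i+k} - t_{i+1}}\, b_{i+1,k-1,\mathbf{t}}(t).
\end{align*}
```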
This equation is known as the Cox–de Boor recurrence. It may be used to compute specific values for specific B-splines. However, it is more often applied algebraically to derive equations such as Equation 15.17 or 15.18.
这个方程称为Cox–de Boor 递归。它可用于计算特定 B 样条函数的特定值。然而,它更常用于代数推导方程,例如方程 15.17 或 15.18。
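For computing specific values, the recurrence translates almost directly into a recursive function. The sketch below assumes the standard form of the recurrence, 1-based B-spline indices with the knot vector stored in a 0-based Python list, and the convention that a term with a zero denominator contributes nothing; the function names are illustrative.

```python
def bspline_basis(i, k, knots, t):
    """Value of B-spline i with parameter k at t (i is 1-based)."""
    def knot(j):
        return knots[j - 1]   # knots[0] stores t_1

    if k == 1:
        return 1.0 if knot(i) <= t < knot(i + 1) else 0.0

    value = 0.0
    left_den = knot(i + k - 1) - knot(i)
    if left_den > 0:
        value += (t - knot(i)) / left_den * bspline_basis(i, k - 1, knots, t)
    right_den = knot(i + k) - knot(i + 1)
    if right_den > 0:
        value += (knot(i + k) - t) / right_den * bspline_basis(i + 1, k - 1, knots, t)
    return value


def bspline_function(coeffs, k, knots, t):
    """Overall function: sum of p_i * b_{i,k}(t), with coeffs[0] holding p_1."""
    return sum(p * bspline_basis(i, k, knots, t)
               for i, p in enumerate(coeffs, start=1))


uniform = list(range(1, 11))              # n = 7, k = 3: n + k = 10 knots
print(bspline_basis(2, 3, uniform, 2.5))  # 0.125, i.e. 0.5 * u**2 with u = 0.5
```

This direct recursion recomputes lower-order B-splines many times, so it is meant for clarity, not speed.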
As an example, consider how we would have derived Equation 15.17. Using a uniform knot vector [1, 2, 3,...], ti = i, and the value k = 3 in Equation 15.20 yields
举个例子,考虑一下如何推导出公式 15.17。使用均匀节点向量 [1, 2, 3,...], t = i ,公式 15.20 中的值k = 3 得出
Continuing the recurrence, we must evaluate the recursive expressions:
继续递归,我们必须评估递归表达式:
Inserting these results into Equation 15.22 gives:
将这些结果代入公式 15.22 可得出:
To see that this expression is equivalent to Equation 15.17, we note that each of the (k = 1) B-splines is like a switch, turning on only for a particular parameter range. For instance, bi,1 is only nonzero between i and i + 1. So, if i ≤ t < i + 1, only the first of the (k = 1) B-splines in the expression is nonzero, so
为了证明此表达式等同于公式 15.17,我们注意到每个 ( k = 1) B 样条线就像一个开关,仅在特定参数范围内打开。例如, b i ,1仅在i和i + 1 之间为非零值。因此,如果i ≤ t < i + 1,则表达式中只有第一个 ( k = 1) B 样条线为非零值,因此
Similar manipulations give the other parts of Equation 15.17.
类似的操作可以得到公式 15.17 的其他部分。
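The steps of this derivation can be written out as follows. This is a worked sketch that assumes the standard form of the Cox–de Boor recurrence and uniform knots $t_i = i$; equation numbers are deliberately omitted.

```latex
\begin{align*}
% k = 3 with uniform knots
b_{i,3}(t) &= \tfrac{1}{2}(t - i)\, b_{i,2}(t) + \tfrac{1}{2}(i + 3 - t)\, b_{i+1,2}(t), \\[4pt]
% the two k = 2 B-splines it depends on
b_{i,2}(t)   &= (t - i)\, b_{i,1}(t) + (i + 2 - t)\, b_{i+1,1}(t), \\
b_{i+1,2}(t) &= (t - i - 1)\, b_{i+1,1}(t) + (i + 3 - t)\, b_{i+2,1}(t), \\[4pt]
% substituting and collecting the k = 1 "switches"
b_{i,3}(t) &= \tfrac{1}{2}(t - i)^2\, b_{i,1}(t) \\
           &\quad + \tfrac{1}{2}\bigl[(t - i)(i + 2 - t) + (i + 3 - t)(t - i - 1)\bigr]\, b_{i+1,1}(t) \\
           &\quad + \tfrac{1}{2}(i + 3 - t)^2\, b_{i+2,1}(t), \\[4pt]
% first interval only
b_{i,3}(t) &= \tfrac{1}{2}(t - i)^2 = \tfrac{1}{2}u^2
\qquad \text{for } i \le t < i + 1,\; u = t - i .
\end{align*}
```

On the middle interval, substituting $u = t - (i+1)$ into the bracket gives $(u+1)(1-u) + (2-u)u = -2u^2 + 2u + 1$, so the piece is $-u^2 + u + \tfrac{1}{2}$; on the last interval, $u = t - (i+2)$ gives $\tfrac{1}{2}(1-u)^2$.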
While B-splines have many nice properties, functions defined using them generally do not interpolate the coefficients. This can be inconvenient if we are using them to define a curve that we want to interpolate a specific point. We give a brief overview of how to interpolate a specific point using B-splines here. A more complete discussion can be found in the books listed in the chapter notes.
虽然 B 样条具有许多优良特性,但使用它们定义的函数通常不会对系数进行插值。如果我们使用它们来定义想要插值特定点的曲线,这可能会很不方便。我们在这里简要概述了如何使用 B 样条插值特定点。在章节注释中列出的书籍中可以找到更完整的讨论。
One way to cause B-splines to interpolate their coefficients is to repeat knots. If all of the interior knots for a particular B-spline have the same value, then the overall function will interpolate this B-spline’s coefficient. An example of this is shown in Figure 15.24.
使 B 样条函数插值其系数的一种方法是重复节点。如果特定 B 样条函数的所有内部节点都具有相同的值,则整体函数将插值此 B 样条函数的系数。图 15.24显示了此示例。
Figure 15.24. A curve parameterized by quadratic B-splines (k = 3) with seven control points. In (a) a uniform knot vector [1,2,3,4,5,6,7,8,9,10] is used, while in (b) the nonuniform knot vector [1,2,3,4,4,6,7,8,8,10] is used. The duplication of the 4th and 8th knots means that all interior knots of the 3rd and 7th B-splines are equal, so the curve interpolates the control points associated with those B-splines.
图 15.24。由具有七个控制点的二次 B 样条 (k = 3) 参数化的曲线。在 (a) 中使用均匀节点向量 [1,2,3,4,5,6,7,8,9,10],而在 (b) 中使用非均匀节点向量 [1,2,3,4,4,6,7,8,8,10]。第 4 个和第 8 个节点的重复意味着第 3 个和第 7 个 B 样条的所有内部点都相等,因此曲线会插入与这些节点相关的控制点。
Interpolation by repeated knots comes at a high cost: it removes the smoothness of the B-spline and the resulting overall function and represented curve.
通过重复节点进行插值的成本很高:它消除了 B 样条线的平滑度以及由此产生的整体函数和表示曲线。
However, at the beginning and end of the spline, where continuity is not an issue, knot repetition is useful for creating endpoint interpolating B-splines. While the first (or last) knot’s value is not important for interpolation, for simplicity, we make the first (or last) k knots have the same value to achieve interpolation.
然而,在样条线的开始和结束处,连续性不是问题,节点重复对于创建端点插值 B 样条线很有用。虽然第一个(或最后一个)节点的值对于插值并不重要,但为简单起见,我们让前(或最后一个) k 个节点具有相同的值以实现插值。
Endpoint interpolating quadratic B-splines are shown in Figure 15.25. The first two and last two B-splines are different from the uniform ones. Their expressions can be derived through the use of the Cox–de Boor recurrence:
端点插值二次 B 样条如图 15.25所示。前两个和最后两个 B 样条与均匀样条不同。它们的表达式可以通过使用 Cox-de Boor 递归得出:
Figure 15.25. Endpoint interpolating quadratic (k = 3) B-splines, for n = 8. The knot vector is [0,0,0,1,2,3,4,5,6,6,6]. The first two and last two B-splines are aperiodic, while the middle four (shown as dotted lines) are periodic and identical to the ones in Figure 15.20.
图 15.25.端点插值二次( k = 3)B 样条, n = 8。节点向量为 [0,0,0,1,2,3,4,5,6,6,6]。第一个和最后一个 B 样条是非周期性的,而中间四个(显示为虚线)是周期性的,与图 15.20中的相同。
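The effect of repeating the first and last k knots can be checked numerically with the knot vector from Figure 15.25. The sketch assumes the standard Cox–de Boor recurrence with zero-denominator terms dropped; the coefficient values and function names are made up for the illustration.

```python
def bspline_basis(i, k, knots, t):
    """Cox-de Boor recurrence; i is 1-based and knots[0] stores t_1."""
    if k == 1:
        return 1.0 if knots[i - 1] <= t < knots[i] else 0.0
    value = 0.0
    left_den = knots[i + k - 2] - knots[i - 1]
    if left_den > 0:
        value += (t - knots[i - 1]) / left_den * bspline_basis(i, k - 1, knots, t)
    right_den = knots[i + k - 1] - knots[i]
    if right_den > 0:
        value += (knots[i + k - 1] - t) / right_den * bspline_basis(i + 1, k - 1, knots, t)
    return value


def bspline_function(coeffs, k, knots, t):
    return sum(p * bspline_basis(i, k, knots, t)
               for i, p in enumerate(coeffs, start=1))


clamped = [0, 0, 0, 1, 2, 3, 4, 5, 6, 6, 6]           # n = 8, k = 3
coeffs = [5.0, 1.0, 4.0, 2.0, 7.0, 3.0, 6.0, 9.0]     # hypothetical scalar values

print(bspline_function(coeffs, 3, clamped, 0.0))      # 5.0, the first coefficient
print(bspline_basis(1, 3, clamped, 0.5))              # 0.25, i.e. (1 - t)**2
print(bspline_basis(2, 3, clamped, 0.5))              # 0.625, i.e. 2t - 1.5 * t**2
print(bspline_basis(3, 3, clamped, 0.5))              # 0.125, same as the uniform case
```

Because the k = 1 B-splines use half-open intervals, this sketch evaluates to zero exactly at the last knot; evaluating just below t = 6 approaches the last coefficient.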
Despite all of the generality B-splines provide, there are some functions that cannot be exactly represented using them. In particular, B-splines cannot exactly represent conic sections such as circles and ellipses. To represent such curves, a ratio of two polynomials is used. Nonuniform B-splines are used to represent both the numerator and the denominator. The most general form of these is the nonuniform rational B-spline, or NURBS for short.
尽管 B 样条函数具有所有通用性,但有些函数无法用它们精确表示。特别是,B 样条函数不能表示圆锥曲线。为了表示此类曲线,需要使用两个多项式的比率。非均匀 B 样条函数用于表示分子和分母。其中最通用的形式是非均匀有理 B 样条函数,简称 NURBS。
NURBS associate a scalar weight hi with every control point pi and use the same B-splines for both the numerator and the denominator:
NURBS 将标量权重 h 与每个控制点 p 关联,并对两者使用相同的 B 样条:
where bi,k,t are the B-splines with parameter k and knot vector t.
其中b i,k, t是带有参数k和节点向量t的 B 样条曲线。
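The usual way to write this ratio, assuming n control points indexed from 1 as elsewhere in this section, is:

```latex
f(t) = \frac{\displaystyle\sum_{i=1}^{n} h_i \, p_i \, b_{i,k,\mathbf{t}}(t)}
            {\displaystyle\sum_{i=1}^{n} h_i \, b_{i,k,\mathbf{t}}(t)}
```

When all weights $h_i$ are equal, the denominator is a constant multiple of the sum of the B-splines, and the curve reduces to an ordinary B-spline curve wherever the B-splines sum to 1.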
NURBS are very widely used to represent curves and surfaces in geometric modeling because of the amazing versatility they provide, in addition to the useful properties of B-splines.
NURBS 除具有 B 样条的有用特性外,还具有惊人的多功能性,因此被广泛用于表示几何造型中的曲线和曲面。
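As a concrete illustration of why the rational form matters, the sketch below draws an exact quarter circle with three weighted control points. It assumes the knot vector [0, 0, 0, 1, 1, 1] with k = 3, for which the three quadratic B-splines reduce to the Bernstein polynomials used in the code; the control points, the weight sqrt(2)/2, and the function name are choices made for this example.

```python
import math


def rational_quadratic(points, weights, t):
    """Weighted (rational) quadratic curve for t in [0, 1]."""
    # Quadratic B-splines on the knot vector [0, 0, 0, 1, 1, 1].
    basis = [(1.0 - t) ** 2, 2.0 * t * (1.0 - t), t ** 2]
    denom = sum(h * b for h, b in zip(weights, basis))
    return tuple(
        sum(h * b * p[d] for h, b, p in zip(weights, basis, points)) / denom
        for d in range(len(points[0]))
    )


points = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
weights = [1.0, math.sqrt(2.0) / 2.0, 1.0]

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    x, y = rational_quadratic(points, weights, t)
    print(round(math.hypot(x, y), 12))   # 1.0 at every sample
```

With all three weights set to 1, the same control points give a parabolic arc that only approximates the circle.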
In this chapter, we have discussed a number of representations for free-form curves. The most important ones for computer graphics are:
在本章中,我们讨论了自由曲线的多种表示方法。对于计算机图形学来说,最重要的是:
Cardinal splines use a set of cubic pieces to interpolate control points. They are generally preferred to interpolating polynomials because they are local and easier to evaluate.
基数样条函数使用一组三次样条函数来插值控制点。它们通常比插值多项式更受欢迎,因为它们是局部的并且更容易求值。
Bézier curves approximate their control points and have many useful properties and associated algorithms. For this reason, they are popular in graphics applications.
贝塞尔曲线近似于其控制点,具有许多有用的属性和相关算法。因此,它们在图形应用中很受欢迎。
B-spline curves represent the curve as a linear combination of B-spline functions. They are general and have many useful properties such as being bounded by their convex hull and being variation diminishing. B-splines are often used when smooth curves are desired.
B 样条曲线将曲线表示为 B 样条函数的线性组合。它们很通用,并且具有许多有用的特性,例如受凸包约束和方差减小。当需要平滑曲线时,通常会使用 B 样条曲线。
The problem of representing shapes mathematically is an entire field unto itself, generally known as geometric modeling. Representing curves is just the beginning and is generally a precursor to modeling surfaces and solids. A more thorough discussion of curves can be found in most geometric modeling texts; see, for example, Geometric Modeling (Mortenson, 1985) for a text that is accessible to computer graphics students. Many geometric modeling books specifically focus on smooth curves and surfaces. Texts such as An Introduction to Splines for Use in Computer Graphics (Bartels, Beatty, & Barsky, 1987), Curves and Surfaces for CAGD: A Practical Guide (Farin, 2002), and Geometric Modeling with Splines: An Introduction (E. Cohen, Riesenfeld, & Elber, 2001) provide considerable detail about curve and surface representations. Other books focus on the mathematics of splines; A Practical Guide to Splines (De Boor, 2001) is a standard reference.
用数学方法表示形状的问题本身就是一个完整的领域,通常称为几何建模。表示曲线只是开始,通常是建模表面和实体的前提。在大多数几何建模教材中都可以找到关于曲线的更深入的讨论,例如,参见《几何建模》 (Mortenson,1985 年),这是一本计算机图形学学生可以阅读的教材。许多几何建模书籍专门关注平滑曲线和曲面。诸如《计算机图形学样条函数简介》 (Bartels、Beatty 和 Barsky,1987 年)、 《CAGD 曲线和曲面:实用指南》 (Farin,2002 年)和《使用样条函数进行几何建模:简介》 (E. Cohen、Riesenfeld 和 Elber,2001 年)等教材提供了有关曲线和曲面表示的大量详细信息。其他书籍则侧重于样条函数的数学; 《样条函数实用指南》 (De Boor,2001 年)是一本标准参考书。
The history of the development of curve and surface representations is complex; see the chapter by Farin in Handbook of Computer Aided Geometric Design (Farin, Hoschek, & Kim, 2002) or the book An Introduction to NURBS: With Historical Perspective (D. F. Rogers, 2000) for a discussion.
曲线和曲面表示的发展历史非常复杂,有关讨论请参阅 Farin 在《计算机辅助几何设计手册》 (Farin、Hoschek 与 Kim,2002 年)中的章节或有关NURBS 简介:从历史角度看(DF Rogers,2000 年)的书籍。
Many ideas were independently developed by multiple groups who approached the problems from different disciplines. Because of this, it can be difficult to attribute ideas to a single person or to point at the “original” sources. It has also led to a diversity of notation, terminology, and ways of introducing the concepts in the literature.
许多想法是由多个研究不同学科问题的团队独立开发的。正因为如此,很难将想法归因于一个人或指出“原始”来源。这也导致了文献中符号、术语和引入概念的方式的多样性。
For Exercises 1–4, find the constraint matrix, the basis matrix, and the basis functions. To invert the matrices you can use a program such as MATLAB or OCTAVE (a free MATLAB-like system).
对于练习 1-4,找到约束矩阵、基矩阵和基函数。要求逆矩阵,您可以使用 MATLAB 或 OCTAVE(一个免费的类似 MATLAB 的系统)等程序。
1. A line segment: parameterized with p0 located 25% of the way along the segment (u = 0.25), and p1 located 75% of the way along the segment.
1.线段:参数化p 0位于线段长度的 25%( u = 0.25), p 1位于线段长度的 75%。
2. A quadratic: parameterized with p0 as the position of the beginning point (u = 0), p1, the first derivative at the beginning point, and p2, the second derivative at the beginning point.
2.二次方程:以p 0为起点位置( u = 0), p 1为起点的一阶导数, p 2为起点的二阶导数。
3. A cubic: its control points are equally spaced (p0 has u = 0, p1 has u = 1/3, p2 has u = 2/3, and p3 has u = 1).
3.立方体:其控制点间距相等( p 0的u = 0, p 1的u = 1/3, p 2的u = 2/3, p 3的u = 1)。
4. A quintic (a degree five polynomial, so the matrices will be 6×6), where p0 is the beginning position, p1 is the beginning derivative, p2 is the middle (u = 0.5) position, p3 is the first derivative at the middle, p4 is the position at the end, and p5 is the first derivative at the end.
4.五次多项式:(五次多项式,因此矩阵为 6×6)其中p 0是起始位置, p 1是起始导数, p 2是中间( u = 0.5)位置, p 3是中间的一阶导数, p 4是末尾位置, p 5是末尾的一阶导数。
5. The Lagrange form (Equation (15.12)) can be used to represent the interpolating cubic of Exercise 3. Use it at several different parameter values to confirm that it does produce the same results as the basis functions derived in Exercise 3.
5.拉格朗日形式(公式 (15.12))可用于表示练习 3 的插值三次函数。在几个不同的参数值下使用它来确认它确实产生与练习 3 中推导出的基函数相同的结果。
6. Devise an arc-length parameterization for the curve represented by the parametric function
6.设计一个由参数函数表示的曲线的弧长参数化
7. Given the four control points of a segment of a Hermite spline, compute the control points of an equivalent Bézier segment.
7.给定 Hermite 样条线段的四个控制点,计算等效 Bézier 线段的控制点。
8. Use the de Casteljau algorithm to evaluate the position of the cubic Bézier curve with its control points at (0,0), (0,1), (1,1) and (1,0) for parameter values u = 0.5 and u = 0.75. Drawing a sketch will help you do this.
8.使用 de Casteljau 算法计算三次贝塞尔曲线的位置,其控制点位于 (0,0)、(0,1)、(1,1) 和 (1,0),参数值为u = 0.5 和u = 0.75。绘制草图将有助于您完成此操作。
9. Use the Cox–de Boor recurrence to derive Equation (15.16).
9.利用Cox-de Boor递推公式(15.16)。
Michael Ashikhmin
Animation is derived from the Latin anima and means the act, process, or result of imparting life, interest, spirit, motion, or activity. Motion is a defining property of life and much of the true art of animation is about how to tell a story, show emotion, or even express subtle details of human character through motion. A computer is a secondary tool for achieving these goals–it is a tool which a skillful animator can use to help get the result he wants faster and without concentrating on technicalities in which he is not interested. Animation without computers, which is now often called “traditional” animation, has a long and rich history of its own which is continuously being written by hundreds of people still active in this art. As in any established field, some time-tested rules have been crystallized which give general high-level guidance to how certain things should be done and what should be avoided. These principles of traditional animation apply equally to computer animation, and we will discuss some of them in this chapter.
动画源自拉丁语anima ,意为赋予生命、兴趣、精神、动作或活动的行为、过程或结果。动作是生命的决定性属性,而真正的动画艺术很大程度上是关于如何通过动作讲述故事、表达情感,甚至表达人类性格的微妙细节。计算机是实现这些目标的辅助工具——熟练的动画师可以使用它来帮助更快地获得想要的结果,而无需专注于他不感兴趣的技术细节。不使用计算机的动画,现在通常被称为“传统”动画,它本身有着悠久而丰富的历史,数百名仍然活跃在这一艺术领域的人不断对其进行创作。与任何既定领域一样,一些经过时间考验的规则已经结晶,这些规则为某些事情应该如何做以及应该避免什么提供了一般性的高级指导。这些传统动画的原则同样适用于计算机动画,我们将在本章中讨论其中的一些原则。
The computer, however, is more than just a tool. In addition to making the animator’s main task less tedious, computers also add some truly unique abilities that were simply not available or were extremely difficult to obtain before. Modern modeling tools allow the relatively easy creation of detailed three-dimensional models, rendering algorithms can produce an impressive range of appearances, from fully photorealistic to highly stylized, powerful numerical simulation algorithms can help to produce desired physics-based motion for particularly hard to animate objects, and motion capture systems give the ability to record and use real-life motion. These developments led to an exploding use of computer animation techniques in motion pictures and commercials, automotive design and architecture, medicine and scientific research, among many other areas. Completely new domains and applications have also appeared including fully computer-animated feature films, virtual/augmented reality systems, and, of course, computer games.
然而,计算机不仅仅是一种工具。除了使动画师的主要任务变得不那么繁琐之外,计算机还增加了一些以前根本不存在或极难获得的真正独特的能力。现代建模工具可以相对轻松地创建详细的三维模型,渲染算法可以产生令人印象深刻的外观,从完全逼真到高度风格化,强大的数值模拟算法可以帮助为特别难以动画的对象产生所需的基于物理的运动,而运动捕捉系统可以记录和使用真实运动。这些发展导致计算机动画技术在电影和商业广告、汽车设计和建筑、医学和科学研究等许多领域的应用激增。还出现了全新的领域和应用,包括完全由计算机制作的动画电影、虚拟/增强现实系统,当然还有电脑游戏。
Other chapters of this book cover many of the developments mentioned above (for example, geometric modeling and rendering) more directly. Here, we will provide an overview only of techniques and algorithms directly used to create and manipulate motion. In particular, we will loosely distinguish and briefly describe four main computer animation approaches:
本书的其他章节更直接地介绍了上面提到的许多发展(例如几何建模和渲染)。在这里,我们将仅概述直接用于创建和操纵运动的技术和算法。特别是,我们将粗略地区分并简要描述四种主要的计算机动画方法:
Keyframing gives the most direct control to the animator who provides necessary data at some moments in time and the computer fills in the rest.
关键帧为动画师提供了最直接的控制权,动画师在某些时刻提供必要的数据,然后计算机填充其余部分。
Procedural animation involves specially designed, often empirical, mathematical functions and procedures whose output resembles some particular motion.
程序动画涉及专门设计的、通常是经验的数学函数和程序,其输出类似于某些特定的运动。
Physics-based techniques solve the differential equations of motion.
基于物理的技术解决运动微分方程。
Motion capture uses special equipment or techniques to record real-world motion and then transfers this motion into that of computer models.
动作捕捉使用特殊的设备或技术来记录现实世界的动作,然后将该动作转换为计算机模型的动作。
We do not touch upon the artistic side of the field at all here. In general, we cannot possibly do more here than just scratch the surface of the fascinating subject of creating motion with a computer. We hope that readers truly interested in the subject will continue their journey well beyond the material of this chapter.
我们在这里完全不涉及该领域的艺术方面。一般来说,我们在这里所做的只能是触及使用计算机创建动作这一迷人主题的表面。我们希望真正对这个主题感兴趣的读者能够继续阅读本章的内容。
In his seminal 1987 SIGGRAPH paper (Lasseter, 1987), John Lasseter brought key principles developed as early as the 1930s by traditional animators at the Walt Disney studios to the attention of the then-fledgling computer animation community. Twelve principles were mentioned: squash and stretch, timing, anticipation, follow-through and overlapping action, slow-in and slow-out, staging, arcs, secondary action, straight-ahead and pose-to-pose action, exaggeration, solid drawing skill, and appeal. Almost two decades later, these time-tested rules, which can make the difference between a natural and entertaining animation and a mechanistic-looking and boring one, are as important as ever. For computer animation, in addition, it is very important to balance the control and flexibility given to the animator against taking full advantage of the computer’s abilities. Although these principles are widely known, many factors affect how much attention is paid to these rules in practice. While a character animator working on a feature film might spend many hours trying to follow some of these suggestions (for example, tweaking his timing to be just right), many game designers tend to believe that their time is better spent elsewhere.
在他 1987 年发表的开创性 SIGGRAPH 论文(Lasseter, 1987)中,约翰·拉塞特 (John Lasseter) 向当时刚刚起步的计算机动画界介绍了早在 20 世纪 30 年代由沃尔特·迪斯尼工作室的传统动画师开发的关键原则。他提到了十二条原则:挤压和拉伸、时间安排、预期、跟进和重叠动作、慢进和慢出、分阶段、弧线、次要动作、直线和姿势间动作、夸张、扎实的绘图技巧和吸引力。近二十年后,这些久经考验的规则仍然像以往一样重要,它们可以使动画变得自然有趣,而不是机械无趣。此外,对于计算机动画而言,在赋予动画师的控制力和灵活性与充分利用计算机的能力之间取得平衡也非常重要。尽管这些原则广为人知,但许多因素会影响人们在实践中对这些规则的关注程度。虽然制作故事片的角色动画师可能会花费大量时间来尝试遵循其中的一些建议(例如,调整时间以使其恰到好处),但许多游戏设计师往往认为他们的时间最好花在其他地方。
Timing, or the speed of action, is at the heart of any animation. How fast things happen affects the meaning of action, emotional state, and even perceived weight of objects involved. Depending on its speed, the same action, a turn of a character’s head from left to right, can mean anything from a reaction to being hit by a heavy object to slowly seeking a book on a bookshelf or stretching a neck muscle. It is very important to set timing appropriate for the specific action at hand. Action should occupy enough time to be noticed while avoiding too slow and potentially boring motions. For computer animation projects involving recorded sound, the sound provides a natural timing anchor to be followed. In fact, in most productions, the actor’s voice is recorded first and the complete animation is then synchronized to this recording. Since large and heavy objects tend to move slower than small and light ones (with less acceleration, to be more precise), timing can be used to provide significant information about the weight of an object.
时间,即动作速度,是任何动画的核心。事情发生的速度会影响动作的意义、情绪状态,甚至影响所涉及物体的重量。根据速度的不同,同一个动作,角色的头部从左到右转动,可能意味着任何事情,从被重物击中后的反应,到慢慢地在书架上寻找一本书,或拉伸颈部肌肉。为手头的具体动作设置适当的时间非常重要。动作应该占用足够的时间才能被注意到,同时避免动作太慢和可能令人厌烦。对于涉及录音的计算机动画项目,声音提供了自然的时间锚点。事实上,在大多数作品中,演员的声音是先录制的,然后将整个动画与此录音同步。由于大而重的物体往往比小而轻的物体移动得慢(更准确地说,加速度较小),因此可以使用时间提供有关物体重量的重要信息。
At any moment during an animation, it should be clear to the viewer what idea (action, mood, expression) is being presented. Good staging, or high-level planning of the action, should lead a viewer’s eye to where the important action is currently concentrated, effectively telling him “look at this, and now, look at this” without using any words. Some familiarity with human perception can help us with this difficult task. Since human visual systems react mostly to relative changes rather than absolute values of stimuli, a sudden motion in a still environment or lack of motion in some part of a busy scene naturally draws attention. The same action presented so that the silhouette of the object is changing can often be much more noticeable compared with a frontal arrangement (see Figure 16.1 (bottom left)).
在动画的任何时刻,观看者都应该清楚地知道正在呈现什么想法(动作、情绪、表情)。好的舞台布置或高层次的动作规划应该引导观看者的眼睛到当前重要动作集中的地方,有效地告诉观看者“看这个,现在,看这个”,而无需使用任何语言。熟悉人类感知可以帮助我们完成这项艰巨的任务。由于人类视觉系统主要对刺激的相对变化而不是绝对值做出反应,因此静止环境中的突然运动或繁忙场景中某些部分的静止自然会引起注意。与正面布置相比,以物体轮廓变化的方式呈现的相同动作通常会更加引人注目(见图16.1 (左下))。
Figure 16.1. Action layout. Left: Staging action properly is crucial for bringing attention to currently important motion. The act of raising a hand would be prominent on the top but harder to notice on the bottom. A change in nose length, on the contrary, might be completely invisible in the first case. Note that this might be intentionally hidden, for example, to be suddenly revealed later. Neither arrangement is particularly good if both motions should be attended to. Middle: The amount of anticipation can tell much about the following action. The action which is about to follow (throwing a ball) is very short, but it is clear what is about to happen. The more wound up the character is, the faster the following action is perceived to be. Right: The follow-through phase is especially important for secondary appendages (hair) whose motion follows the leading part (head). The motion of the head is very simple, but leads to nontrivial follow-through behavior of the hair itself. It is impossible to create a natural animation without a follow-through phase and overlapping action in this case. Figure courtesy Peter Shirley and Christina Villarruel.
图 16.1。动作布局。左图:适当安排动作对于引起对当前重要动作的注意至关重要。举手的动作在顶部很突出,但在底部很难注意到。相反,鼻子长度的变化在第一种情况下可能完全不可见。请注意,这可能是故意隐藏的,例如,稍后突然显露出来。如果两个动作都需要注意,那么这两种安排都不是特别好。中图:预期量可以说明很多有关后续动作的信息。即将发生的动作(投球)非常短,但即将发生的事情很清楚。角色越紧张,后续动作就越快。右图:跟进阶段对于运动跟随前导部分(头部)的次要附属物(头发)尤其重要。头部的运动非常简单,但会导致头发本身的跟进行为不平凡。在这种情况下,如果没有跟进阶段和重叠动作,就不可能创建自然的动画。图片由 Peter Shirley 和 Christina Villarruel 提供。
On a slightly lower level, each action can be split into three parts: anticipation (preparation for the action), the action itself, and follow-through (termination of the action). In many cases, the action itself is the shortest part and, in some sense, the least interesting. For example, kicking a football might involve extensive preparation on the part of the kicker and long “visual tracking” of the departing ball with ample opportunities to show the stress of the moment, emotional state of the kicker, and even the reaction to the expected result of the action. The action itself (motion of the leg to kick the ball) is rather plain and takes just a fraction of a second in this case.
在稍低的层次上,每个动作可以分为三个部分:预期(动作的准备)、动作本身和后续动作(动作的结束)。在许多情况下,动作本身是最短的部分,从某种意义上说,也是最无趣的部分。例如,踢足球可能需要踢球者进行大量准备,并长时间“视觉跟踪”离去的球,有充足的机会展示当时的压力、踢球者的情绪状态,甚至对动作预期结果的反应。在这种情况下,动作本身(踢球的腿动作)相当简单,只需几分之一秒。
The goal of anticipation is to prepare the viewer for what is about to happen. This becomes especially important if the action itself is very fast, greatly important, or extremely difficult. Creating a more extensive anticipation for such actions serves to underscore these properties and, in case of fast events, makes sure the action will not be missed (see Figure 16.1 (bottom center)).
预期的目的是让观众为即将发生的事情做好准备。如果动作本身非常快、非常重要或极其困难,这一点就变得尤为重要。为这些动作创建更广泛的预期有助于强调这些特性,并且在快速事件的情况下,确保不会错过动作(参见图 16.1 (底部中心))。
In real life, the main action often causes one or more other overlapping actions. Different appendages or loose parts of the object typically drag behind the main leading section and keep moving for a while in the follow-through part of the main action, as shown in Figure 16.1 (bottom right). Moreover, the next action often starts before the previous one is completely over. A player might start running while he is still tracking the ball he just kicked. Ignoring this natural flow makes the motion look as if there are pauses between actions and can result in robot-like mechanical motion. While overlapping is necessary to keep the motion natural, secondary action is often added by the animator to make the motion more interesting and to achieve realistic complexity in the animation. It is important not to allow secondary action to dominate the main action.
在现实生活中,主要动作经常会引起一个或多个其他动作重叠动作。物体的不同附属物或松散部分通常拖在主要引导部分后面,并在主要动作的后续部分继续移动一段时间,如图 16.1 (右下)所示。此外,下一个动作通常在前一个动作完全结束之前开始。球员可能会在追踪刚踢出的球时开始奔跑。忽略这种自然的流畅性通常会被认为动作之间存在停顿,并可能导致类似机器人的机械运动。虽然重叠对于保持动作自然是必要的,但动画师通常会添加次要动作,以使动作更有趣并实现动画的真实复杂性。重要的是不要让次要动作主导主要动作。
Several specific techniques can be used to make motion look more natural. The most important one is probably squash and stretch, which calls for changing the shape of a moving object in a particular way as it moves. One would generally stretch an object in the direction of motion and squash it when a force is applied to it, as demonstrated in Figure 16.2 for a classic animation of a bouncing ball. It is important to preserve the total volume as this happens to avoid the illusion that the object is growing or shrinking. The greater the speed of motion (or the force), the more stretching (or squashing) is applied. Such deformations are used for several reasons. For very fast motion, an object can move between two sequential frames so quickly that there is no overlap between the object at the time of the current frame and at the time of the previous frame, which can lead to strobing (a variant of aliasing). Having the object elongated in the direction of motion can ensure better overlap and helps the eye to fight this unpleasant effect. Stretching/squashing can also be used to show the flexibility of the object, with more deformation applied for more pliable materials. If the object is intended to appear rigid, its shape is purposefully left the same when it moves.
可以使用几种特定的技术使运动看起来更自然。最重要的一种可能是挤压和拉伸,即在移动物体移动时以特定方式改变其形状。人们通常会在运动方向上拉伸物体,并在对其施加力时将其挤压,如图16.2中所示的经典弹跳球动画。在发生这种情况时保持总体积很重要,以避免物体变大或缩小的错觉。运动速度(或力)越大,施加的拉伸(或挤压)就越多。使用这种变形有几个原因。对于非常快的运动,物体可以在两个连续帧之间移动得如此之快,以至于当前帧和前一帧之间的物体之间没有重叠,这会导致频闪(混叠的一种变体)。将物体沿运动方向拉长可以确保更好的重叠,并有助于眼睛抵抗这种不愉快的效果。拉伸/挤压也可用于显示物体的柔韧性,对于柔韧性更强的材料,变形幅度更大。如果物体想要显得刚性,则在移动时故意保持其形状不变。
Figure 16.2. Classic example of applying the squash and stretch principle. Note that the volume of the bouncing ball should remain roughly the same throughout the animation.
图 16.2。应用挤压和拉伸原理的经典示例。请注意,弹跳球的体积在整个动画过程中应保持大致相同。
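The volume-preservation rule can be sketched in a few lines. This is a minimal illustration, not a production deformer: the function name `squash_stretch_scales` and the choice to scale the two perpendicular axes equally are assumptions made here. If the object is scaled by a factor `stretch` along the direction of motion, scaling both perpendicular directions by `1/sqrt(stretch)` keeps the product of the three factors, and hence the volume, equal to one.

```python
import math


def squash_stretch_scales(stretch):
    """Return (along, across, across) scale factors that preserve volume.

    stretch > 1 elongates the object along its direction of motion,
    stretch < 1 squashes it (for example, on impact with the ground).
    Assumption: both perpendicular axes are scaled by the same amount.
    """
    if stretch <= 0.0:
        raise ValueError("stretch must be positive")
    across = 1.0 / math.sqrt(stretch)
    return stretch, across, across


# A ball stretched by 50% along its velocity keeps its volume.
along, across_a, across_b = squash_stretch_scales(1.5)
volume_ratio = along * across_a * across_b
```

An animation system would typically tie `stretch` to speed or applied force, as the text describes; that mapping is an artistic choice and is left out here.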
Natural motion rarely happens along straight lines, so this should generally be avoided in animation and arcs should be used instead. Similarly, no real-world motion can instantly change its speed, since this would require an infinite amount of force to be applied to an object. It is desirable to avoid such situations in animation as well. In particular, the motion should start and end gradually (slow in and out). While hand-drawn animation is sometimes done via straight-ahead action, with an animator starting at the first frame and drawing one frame after another in sequence until the end, pose-to-pose action, also known as keyframing, is much more suitable for computer animation. In this technique, animation is carefully planned through a series of relatively sparsely spaced key frames, with the rest of the animation (in-between frames) filled in only after the keys are set (Figure 16.3). This allows more precise timing and allows the computer to take over the most tedious part of the process, the creation of the in-between frames, using algorithms presented in the next section.
自然运动很少沿直线发生,因此动画中通常应避免直线运动,而应使用弧线。同样,现实世界中的运动不可能立即改变其速度——这需要对物体施加无限大的力。动画中也应避免这种情况。特别是,运动应该逐渐开始和结束(慢进慢出)。虽然手绘动画有时是通过直线动作完成的,动画师从第一帧开始,按顺序一帧一帧地绘制,直到结束,但姿势到姿势的动作,也称为关键帧更适合计算机动画。在这种技术中,动画是通过一系列相对稀疏的关键帧精心规划的,动画的其余部分(中间帧)仅在设置关键帧后才填充(图 16.3 )。这允许更精确的计时,并允许计算机使用下一节中介绍的算法接管过程中最繁琐的部分——创建中间帧。
Figure 16.3. Keyframing (top) encourages detailed action planning while straight-ahead action (bottom) leads to a more spontaneous result.
图 16.3。关键帧(顶部)鼓励详细的行动规划,而直接行动(底部)则可产生更自发的结果。
Almost any of the techniques outlined above can be used with some reasonable amount of exaggeration to achieve greater artistic effect or underscore some specific property of an action or a character. The ultimate goal is to achieve something the audience will want to see, something which is appealing. Extreme complexity or too much symmetry in a character or action tends to be less appealing. To create good results, a traditional animator needs solid drawing skills. Analogously, a computer animator should certainly understand computer graphics and have a solid knowledge of the tools he uses.
几乎上述任何一种技巧都可以在适当夸张的情况下使用,以实现更好的艺术效果或强调动作或角色的某些特定属性。最终目标是实现观众想要看到的东西,吸引人的东西。角色或动作的极端复杂性或过多的对称性往往不太吸引人。要创造良好的效果,传统动画师需要扎实的绘画技巧。类似地,计算机动画师当然应该了解计算机图形学并对其使用的工具有扎实的了解。
In traditional animation, the animator has complete control over all aspects of the production process and nothing prevents the final product from being exactly as planned in every detail. The price paid for this flexibility is that every frame is created by hand, leading to an extremely time- and labor-consuming enterprise. In computer animation, there is a clear tradeoff between, on the one hand, giving an animator more direct control over the result but asking him to contribute more work and, on the other hand, relying on more automatic techniques which might require setting just a few input parameters but offer little or no control over some of the properties of the result. A good algorithm should provide sufficient flexibility while asking the animator only for information which is intuitive, easy to provide, and which he himself feels is necessary for achieving the desired effect. While perfect compliance with this requirement is unlikely in practice, since it would probably take something close to a mind-reading machine, we do encourage the reader to evaluate any computer-animation technique from the point of view of providing such balance.
在传统动画中,动画师可以完全控制制作过程的各个方面,没有什么可以阻止最终产品在每个细节上都按照计划进行。这种灵活性的代价是每一帧都是手工创建的,这导致制作过程极其耗时耗力。在计算机动画中,一方面,让动画师更直接地控制结果,但要求他付出更多的工作;另一方面,依靠更自动化的技术,这些技术可能只需要设置几个输入参数,但对结果的某些属性几乎没有控制权。一个好的算法应该提供足够的灵活性,同时只要求动画师提供直观、易于提供的信息,以及他自己认为实现预期效果所必需的信息。虽然在实践中不可能完全满足这一要求,因为这可能需要一些类似于读心术的东西,但我们确实鼓励读者从提供这种平衡的角度来评估任何计算机动画技术。
The term keyframing can be misleading when applied to 3D computer animation since no actual completed frames (i.e., images) are typically involved. At any given moment, a 3D scene being animated is specified by a set of numbers: the positions of the centers of all objects, their RGB colors, the amount of scaling applied to each object along each axis, modeling transformations between different parts of a complex object, camera position and orientation, light source intensities, etc. To animate a scene, some subset of these values has to change with time. One can, of course, directly set these values at every frame, but this is not particularly efficient. Short of that, some number of important moments in time (key frames $t_k$) can be chosen along the timeline of the animation for each of the parameters, and values of this parameter (key values $f_k$) are set only for these selected frames. We will call a combination $(t_k, f_k)$ of key frame and key value simply a key. Key frames do not have to be the same for different parameters, but it is often logical to set keys at least for some of them simultaneously. For example, key frames chosen for the x-, y-, and z-coordinates of a specific object might be set at exactly the same frames, forming a single position vector key $(t_k, \mathbf{p}_k)$. These key frames, however, might be completely different from those chosen for the object’s orientation or color. The closer key frames are to each other, the more control the animator has over the result; however, the cost of the extra work of setting the keys has to be assessed. It is, therefore, typical to have large spacing between keys in parts of the animation which are relatively simple, concentrating them in intervals where complex action occurs, as shown in Figure 16.4.
关键帧这一术语在应用于 3D 计算机动画时可能会产生误导,因为通常不涉及任何实际完成的帧(即图像)。在任何给定时刻,正在动画化的 3D 场景由一组数字指定:所有对象的中心位置、它们的 RGB 颜色、在每个轴上应用于每个对象的缩放量、复杂对象的不同部分之间的建模变换、相机位置和方向、光源强度等。要为场景制作动画,这些值中的一些子集必须随时间而变化。当然,我们可以直接在每一帧设置这些值,但这不是特别有效。除此之外,可以沿着动画时间轴为每个参数选择一些重要的时间时刻(关键帧 $t_k$),并且仅为这些选定的帧设置此参数的值(关键值 $f_k$)。我们将关键帧和关键值的组合 $(t_k, f_k)$ 简称为关键。不同参数的关键帧不必相同,但同时为其中至少一些参数设置关键帧通常是合乎逻辑的。例如,为特定对象的 x、y 和 z 坐标选择的关键帧可能设置在完全相同的帧上,形成单个位置向量关键帧 $(t_k, \mathbf{p}_k)$。然而,这些关键帧可能与为对象的方向或颜色选择的关键帧完全不同。关键帧之间的距离越近,动画师对结果的控制力就越强;但是,必须评估设置关键帧的更多工作量的成本。因此,通常在动画相对简单的部分中,关键帧之间的间距较大,而将关键帧集中在发生复杂动作的间隔中,如图 16.4 所示。
Figure 16.4. Different patterns of setting keys (black circles above) can be used simultaneously for the same scene. It is assumed that there are more frames before, as well as after, this portion.
图 16.4。可同时对同一场景使用不同的设置键模式(上方的黑色圆圈)。假设此部分之前和之后有更多帧。
Once the animator sets the keys $(t_k, f_k)$, the system has to compute values of $f$ for all other frames. Although we are ultimately interested only in a discrete set of values, it is convenient to treat this as a classical interpolation problem which fits a continuous animation curve $f(t)$ through a provided set of data points (Figure 16.5). Extensive discussion of curve-fitting algorithms can be found in Chapter 15, and we will not repeat it here. Since the animator initially provides only the keys and not the derivatives (tangents), methods which compute all necessary information directly from the keys are preferable for animation. The speed of parameter change along the curve is given by the derivative of the curve with respect to time, $df/dt$. Therefore, to avoid sudden jumps in velocity, $C^1$ continuity is typically necessary. A higher degree of continuity is typically not required from animation curves, since the second derivative, which corresponds to acceleration or applied force, can experience very sudden changes in real-world situations (a ball hitting a solid wall), and higher derivatives do not directly correspond to any parameters of physical motion. These considerations make Catmull-Rom splines one of the best choices for initial animation curve creation.
一旦动画师设置了关键帧( tk , fk ),系统就必须计算其他所有帧的f值。尽管我们最终只对一组离散的值感兴趣,但将其视为经典的插值问题会很方便,即通过提供的一组数据点(图 16.5 )拟合连续动画曲线 f ( t )。第 15 章详细介绍了曲线拟合算法,我们在此不再赘述。由于动画师最初只提供关键帧而不是导数(切线),因此直接从关键帧计算所有必要信息的方法对于动画来说是更好的选择。参数沿曲线变化的速度由曲线对时间的导数df/dt给出。因此,为了避免速度突然跳跃,通常需要C 1连续性。动画曲线通常不需要更高程度的连续性,因为对应于加速度或施加力的二阶导数在真实世界中可能会经历非常突然的变化(球击中实心墙),而更高阶导数并不直接对应于任何物理运动参数。这些考虑使得 Catmull-Rom 样条线成为初始动画曲线创建的最佳选择之一。
Figure 16.5. A continuous curve f(t) is fit through the keys provided by the animator even though only values at frame positions are of interest. The derivative of this function gives the speed of parameter change and is at first determined automatically by the fitting procedure.
图 16.5.即使只对帧位置的值感兴趣,也可以通过动画器提供的键来拟合连续曲线f(t)。此函数的导数给出了参数变化的速度,并且首先由拟合程序自动确定。
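As a rough illustration of fitting an animation curve directly from the keys, the sketch below evaluates a Catmull-Rom-style cubic Hermite curve through scalar keys. The function names, the handling of non-uniformly spaced keys (tangent taken as the slope between the two neighboring keys), and the one-sided tangents at the first and last key are assumptions made for this example; Chapter 15 discusses the actual construction.

```python
from bisect import bisect_right


def catmull_rom_tangents(keys):
    """Estimate df/dt at every key from its neighbors (needs >= 2 keys)."""
    n = len(keys)
    tangents = []
    for k in range(n):
        # Interior keys use both neighbors; end keys fall back to one side.
        (t0, f0), (t1, f1) = keys[max(k - 1, 0)], keys[min(k + 1, n - 1)]
        tangents.append((f1 - f0) / (t1 - t0))
    return tangents


def evaluate_animation_curve(keys, t):
    """Evaluate f(t) for keys [(t_k, f_k), ...] sorted by increasing t_k."""
    times = [tk for tk, _ in keys]
    m = catmull_rom_tangents(keys)
    t = min(max(t, times[0]), times[-1])            # clamp to the keyed range
    k = min(bisect_right(times, t) - 1, len(keys) - 2)
    (t0, f0), (t1, f1) = keys[k], keys[k + 1]
    h = t1 - t0
    u = (t - t0) / h                                # local parameter in [0, 1]
    # Cubic Hermite basis functions.
    h00 = 2 * u**3 - 3 * u**2 + 1
    h10 = u**3 - 2 * u**2 + u
    h01 = -2 * u**3 + 3 * u**2
    h11 = u**3 - u**2
    return h00 * f0 + h10 * h * m[k] + h01 * f1 + h11 * h * m[k + 1]


# Keys are sparse; the in-between frames are computed.
keys = [(0.0, 0.0), (10.0, 5.0), (15.0, 2.0), (30.0, 4.0)]
in_betweens = [evaluate_animation_curve(keys, float(frame)) for frame in range(31)]
```

Because the same tangent is used on both sides of each key, the resulting curve is $C^1$, which is the level of continuity the text argues is sufficient.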
Most animation systems give the animator the ability to perform interactive fine editing of this initial curve, including inserting more keys, adjusting existing keys, or modifying automatically computed tangents. Another useful technique which can help to tweak the shape of the curve is called TCB control (TCB stands for tension, continuity, and bias). The idea is to introduce three new parameters which can be used to modify the shape of the curve near a key through coordinated adjustment of incoming and outgoing tangents at this point. For keys uniformly spaced in time with distance $\Delta t$ between them, the standard Catmull-Rom expression for the incoming tangent $T_k^{in}$ and outgoing tangent $T_k^{out}$ at an internal key $(t_k, f_k)$ can be rewritten as
大多数动画系统都允许动画师对初始曲线进行交互式精细编辑,包括插入更多关键帧、调整现有关键帧或修改自动计算的切线。另一种有助于调整曲线形状的有用技术称为 TCB 控制(TCB 代表张力、连续性和偏差)。其理念是引入三个新参数,这些参数可用于通过协调调整此点的传入和传出切线来修改关键帧附近的曲线形状。对于在时间上均匀分布且彼此之间距离为 $\Delta t$ 的关键帧,内部键 $(t_k, f_k)$ 处传入切线 $T_k^{in}$ 和传出切线 $T_k^{out}$ 的标准 Catmull-Rom 表达式可以重写为
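With uniform key spacing, the Catmull-Rom tangent at an interior key is the central difference $(f_{k+1} - f_{k-1})/(2\Delta t)$. A reconstruction of the rewritten form, which splits that difference into contributions from the left and right neighbors, is given below; the symbols $T_k^{in}$ and $T_k^{out}$ for the incoming and outgoing tangents are notation assumed here.

$$
T_k^{in} = T_k^{out} = \frac{1}{2\Delta t}\,(f_k - f_{k-1}) + \frac{1}{2\Delta t}\,(f_{k+1} - f_k).
$$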
Modified tangents of a TCB spline are
TCB 样条的修改切线为
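A reconstruction in the usual Kochanek-Bartels form, written with the tension $t$, continuity $c$, and bias $b$ parameters and assuming uniform key spacing $\Delta t$, would read as follows. Setting $t = c = b = 0$ recovers the Catmull-Rom tangents.

$$
T_k^{in} = \frac{(1-t)(1-c)(1+b)}{2\Delta t}\,(f_k - f_{k-1}) + \frac{(1-t)(1+c)(1-b)}{2\Delta t}\,(f_{k+1} - f_k),
$$

$$
T_k^{out} = \frac{(1-t)(1+c)(1+b)}{2\Delta t}\,(f_k - f_{k-1}) + \frac{(1-t)(1-c)(1-b)}{2\Delta t}\,(f_{k+1} - f_k).
$$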
The tension parameter t controls the sharpness of the curve near the key by scaling both incoming and outgoing tangents. Larger tangents (lower tension) lead to a flatter curve shape near the key. Bias b allows the animator to selectively increase the weight of a key’s neighbors, locally pulling the curve closer to a straight line connecting the key with its left (b near 1, “overshooting” the action) or right (b near −1, “undershooting” the action) neighbor. A nonzero value of continuity c makes incoming and outgoing tangents different, allowing the animator to create kinks in the curve at the key value. Practically useful values of TCB parameters are typically confined to the interval [−1, 1], with defaults t = c = b = 0 corresponding to the original Catmull-Rom spline. Examples of possible curve shape adjustments are shown in Figure 16.6.
张力参数t通过缩放传入和传出切线来控制关键点附近曲线的锐度。切线越大(张力越小),关键点附近的曲线形状越平坦。偏差b允许动画师有选择地增加关键点邻居的权重,从而将曲线局部拉近到连接关键点与其左侧( b接近 1,“超过”动作)或右侧( b接近 -1,“未达到”动作)邻居的直线。非零的连续性c使传入和传出切线不同,从而允许动画师在关键值处创建曲线的扭结。实际有用的 TCB 参数值通常限制在区间 [-1; 1] 内,默认值t = c = b = 0,对应于原始 Catmull-Rom 样条线。图 16.6显示了可能的曲线形状调整示例。
Figure 16.6. Editing the default interpolating spline (middle column) using TCB controls. Note that all keys remain at the same positions.
图 16.6.使用 TCB 控件编辑默认插值样条线(中间一列)。请注意,所有键都保持在相同的位置。
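The effect of each TCB parameter on the tangents can be checked numerically. The sketch below assumes the standard Kochanek-Bartels tangent formulas with uniform key spacing `dt`; the function name `tcb_tangents` and its argument names are hypothetical.

```python
def tcb_tangents(f_prev, f_key, f_next, dt, tension=0.0, continuity=0.0, bias=0.0):
    """Return (incoming, outgoing) tangents at an interior key.

    Assumes uniformly spaced keys and the Kochanek-Bartels form of the
    TCB tangents; all three parameters are normally kept in [-1, 1].
    """
    left = (f_key - f_prev) / (2.0 * dt)    # contribution of the left neighbor
    right = (f_next - f_key) / (2.0 * dt)   # contribution of the right neighbor
    t, c, b = tension, continuity, bias
    t_in = (1 - t) * (1 - c) * (1 + b) * left + (1 - t) * (1 + c) * (1 - b) * right
    t_out = (1 - t) * (1 + c) * (1 + b) * left + (1 - t) * (1 - c) * (1 - b) * right
    return t_in, t_out


default = tcb_tangents(0.0, 1.0, 4.0, 1.0)                  # Catmull-Rom tangents
tight = tcb_tangents(0.0, 1.0, 4.0, 1.0, tension=1.0)       # both tangents vanish
kinked = tcb_tangents(0.0, 1.0, 4.0, 1.0, continuity=1.0)   # in and out differ
biased = tcb_tangents(0.0, 1.0, 4.0, 1.0, bias=1.0)         # left neighbor only
```

With the defaults, both tangents equal the central difference, so the key positions and the original Catmull-Rom curve are unchanged.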
So far, we have described how to control the shape of the animation curve through key positioning and fine tweaking of tangent values at the keys. This, however, is generally not sufficient when one would like to have control both over where the object is moving, i.e., its path, and how fast it moves along this path. Given a set of positions in space as keys, automatic curve-fitting techniques can fit a curve through them, but the resulting motion is only constrained by forcing the object to arrive at a specified key position $\mathbf{p}_k$ at the corresponding key frame $t_k$, and nothing is directly said about the speed of motion between the keys. This can create problems. For example, if an object moves along the x-axis with velocity 11 meters per second for 1 second and then with 1 meter per second for 9 seconds, it will arrive at position x = 20 after 10 seconds, thus satisfying the animator’s keys (0, 0) and (10, 20). It is rather unlikely that this jerky motion was actually desired, and uniform motion with speed 2 meters per second is probably closer to what the animator wanted when setting these keys. Although typically not displaying such extreme behavior, polynomial curves resulting from standard fitting procedures do exhibit nonuniform speed of motion between keys, as demonstrated in Figure 16.7. While this can be tolerable (within limits) for some parameters for which the human visual system is not very good at determining nonuniformities in the rate of change (such as color or even rate of rotation), we have to do better for the position $\mathbf{p}$ of the object, where velocity directly corresponds to everyday experience.
到目前为止,我们已经描述了如何通过关键点定位和对关键点处的切线值进行微调来控制动画曲线的形状。然而,当人们想要控制物体移动的位置(即其路径)以及它沿此路径移动的速度时,这通常是不够的。给定一组空间位置作为关键点,自动曲线拟合技术可以通过它们进行曲线拟合,但最终的运动仅受到强制物体在相应关键帧t k处到达指定关键位置p k 的限制,并且没有直接说明关键点之间的运动速度。这可能会产生问题。例如,如果一个物体沿x轴以每秒 11 米的速度移动 1 秒,然后以每秒 1 米的速度移动 9 秒,它将在 10 秒后到达位置x = 20,从而满足动画师的关键点 (0,0) 和 (10, 20)。这种急促的运动不太可能是真正想要的,速度为 2 米/秒的匀速运动可能更接近动画师在设置这些关键点时想要的效果。虽然通常不会表现出这种极端行为,但标准拟合程序产生的多项式曲线确实表现出关键点之间运动速度不均匀的情况,如图 16.7所示。虽然对于某些参数(人类视觉系统不太擅长确定变化率的不均匀性,例如颜色甚至旋转速度),这种情况是可以容忍的(在一定范围内),但对于物体的位置p ,我们必须做得更好,因为速度直接对应于日常经验。
Figure 16.7. All three motions are along the same 2D path and satisfy the set of keys at the tips of the black triangles. The tips of the white triangles show object position at Δt = 1 intervals. Uniform speed of motion between the keys (top) might be closer to what the animator wanted, but automatic fitting procedures could result in either of the other two motions.
图 16.7。所有三个动作都沿着相同的 2D 路径,并满足黑色三角形尖端的一组键。白色三角形的尖端以 Δ t = 1 间隔显示对象位置。键之间的均匀运动速度(顶部)可能更接近动画师想要的速度,但自动拟合程序可能会导致其他两种运动中的任何一种。
We will first distinguish curve parameterization used during the fitting procedure from that used for animation. When a curve is fit through position keys, we will write the result as a function p(u) of some parameter u. This will describe the geometry of the curve in space. The arc length s is the physical length of the curve. A natural way for the animator to control the motion along the now-existing curve is to specify an extra function s(t) which corresponds to how far along the curve the object should be at any given time. To get an actual position in space, we need one more auxiliary function u(s) which computes a parameter value u for given arc length s. The complete process of computing an object position for a given time t is then given by composing these functions (see Figure 16.8):
我们首先将区分拟合过程中使用的曲线参数化和动画中使用的曲线参数化。当通过位置键拟合曲线时,我们将结果写为某个参数u的函数p ( u )。这将描述曲线在空间中的几何形状。弧长s是曲线的物理长度。动画师控制沿现有曲线运动的自然方法是指定一个额外的函数s ( t ),该函数对应于对象在任何给定时间沿曲线应该走多远。为了获得空间中的实际位置,我们需要一个辅助函数u ( s ),它可以计算给定弧长s的参数值u 。然后通过组合这些函数来给出计算给定时间t的对象位置的完整过程(见图16.8 ):
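Written out, the composition described above is (a reconstruction using the symbols already defined in the text):

$$
\mathbf{p}(t) = \mathbf{p}\bigl(u(s(t))\bigr).
$$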
Figure 16.8. To get the position in space at a given time t, one first utilizes user-specified motion control to obtain the distance along the curve s(t) and then computes the corresponding curve parameter value u(s(t)). The previously fitted curve p(u) can now be used to find the position p(u(s(t))).
图 16.8。要获取给定时间t时的空间位置,首先利用用户指定的运动控制来获取沿曲线s ( t ) 的距离,然后计算相应的曲线参数值u ( s ( t ))。现在可以使用先前拟合的曲线P ( u ) 来查找位置P ( u ( s ( t )))。
Several standard functions can be used as the distance-time function $s(t)$. One of the simplest is the linear function corresponding to constant velocity: $s(t) = vt$ with $v = \mathrm{const}$. Another common example is the motion with constant acceleration $a$ (and initial speed $v_0$), which is described by the parabolic $s(t) = v_0 t + a t^2/2$. Since velocity is changing gradually here, this function can help to model desirable ease-in and ease-out behavior. More generally, the slope of $s(t)$ gives the velocity of motion, with negative slope corresponding to motion backwards along the curve. To achieve the most flexibility, the ability to interactively edit $s(t)$ is typically provided to the animator by the animation system. The distance-time function is not the only way to control motion. In some cases it might be more convenient for the user to specify a velocity-time function $v(t)$ or even an acceleration-time function $a(t)$. Since these are correspondingly the first and second derivatives of $s(t)$, to use these types of controls, the system first recovers the distance-time function by integrating the user input (twice in the case of $a(t)$).
有几种标准函数可用作距离-时间函数 $s(t)$。最简单的函数之一是对应于恒定速度的线性函数:$s(t) = vt$,其中 $v = \mathrm{const}$。另一个常见的例子是具有恒定加速度 $a$(和初始速度 $v_0$)的运动,可由抛物线 $s(t) = v_0 t + a t^2/2$ 描述。由于速度在这里逐渐变化,因此此函数有助于模拟理想的缓入和缓出行为。更一般地,$s(t)$ 的斜率给出运动速度,负斜率对应于沿曲线向后的运动。为了实现最大的灵活性,动画系统通常为动画师提供以交互方式编辑 $s(t)$ 的能力。距离-时间函数不是控制运动的唯一方法。在某些情况下,用户指定速度-时间函数 $v(t)$ 甚至加速度-时间函数 $a(t)$ 可能会更方便。由于这些分别是 $s(t)$ 的一阶和二阶导数,为了使用这些类型的控制,系统首先通过积分用户输入来恢复距离-时间函数(对于 $a(t)$ 来说为两次)。
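The relationship between a user-specified velocity-time function and the recovered distance-time function can be sketched as follows. The function names and the choice of the trapezoid rule are assumptions for this example; any standard numerical integrator would serve.

```python
def constant_acceleration_distance(t, v0, a):
    """Closed-form s(t) = v0*t + a*t^2/2 for constant acceleration."""
    return v0 * t + 0.5 * a * t * t


def distance_from_velocity(velocity, t_end, steps=1000):
    """Recover s(t_end) by integrating v(t) from 0 with the trapezoid rule."""
    dt = t_end / steps
    s = 0.0
    for i in range(steps):
        t0 = i * dt
        s += 0.5 * (velocity(t0) + velocity(t0 + dt)) * dt
    return s


# A velocity curve v(t) = v0 + a*t integrates back to the parabolic s(t).
v0, a = 2.0, 3.0
s_closed_form = constant_acceleration_distance(4.0, v0, a)
s_integrated = distance_from_velocity(lambda t: v0 + a * t, 4.0)
```

For an acceleration-time control, the same routine would be applied twice: once to obtain $v(t)$ and once more to obtain $s(t)$.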
The relationship between the curve parameter u and arc length s is established automatically by the system. In practice, the system first determines the dependence of arc length on the parameter u (i.e., the inverse function s(u)). Using this function, for any given S it is possible to solve the equation s(u) − S = 0 with unknown u, obtaining u(S). For most curves, the function s(u) cannot be expressed in closed analytic form and numerical integration is necessary (see Chapter 14). Standard numerical root-finding procedures (such as the Newton-Raphson method, for example) can then be directly used to solve the equation s(u) − S = 0 for u.
系统自动建立曲线参数u和弧长s之间的关系。实际上,系统首先确定弧长对参数u的依赖关系(即反函数s ( u ))。利用此函数,对于任何给定的S ,都可以解方程s ( u ) − S = 0 中未知数u得到u ( S )。对于大多数曲线,函数s ( u ) 不能用封闭的解析形式表示,需要进行数值积分(参见第 14 章)。然后可以直接使用标准数值求根程序(例如牛顿-拉夫逊法)解方程s ( u ) − S = 0 求u 。
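The two numerical steps, integrating to obtain s(u) and root finding to invert it, can be sketched as below. The example curve (the parabola p(u) = (u, u²) on [0, 1]), the function names, and the use of Simpson's rule are assumptions chosen for illustration. Newton-Raphson uses the fact that ds/du is the magnitude of the curve derivative |p'(u)|.

```python
import math


def arc_length(speed, u, intervals=200):
    """Simpson's rule estimate of s(u), the integral of |p'(v)| over [0, u].

    `intervals` must be even.
    """
    if u == 0.0:
        return 0.0
    h = u / intervals
    total = speed(0.0) + speed(u)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * speed(i * h)
    return total * h / 3.0


def parameter_for_arc_length(speed, S, u_max=1.0, tol=1e-10, max_iter=50):
    """Solve s(u) - S = 0 for u with Newton-Raphson."""
    u = u_max * S / arc_length(speed, u_max)   # initial guess: uniform speed
    for _ in range(max_iter):
        error = arc_length(speed, u) - S
        if abs(error) < tol:
            break
        u -= error / speed(u)                  # ds/du = |p'(u)|
        u = min(max(u, 0.0), u_max)            # stay inside the curve's range
    return u


def parabola_speed(u):
    """|p'(u)| for the example curve p(u) = (u, u^2)."""
    return math.sqrt(1.0 + 4.0 * u * u)


S = arc_length(parabola_speed, 0.7)
u_recovered = parameter_for_arc_length(parabola_speed, S)
```

The step `u -= error / speed(u)` assumes the speed never vanishes on the interval, which holds for this example but not for every parameterization.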
An alternative technique is to approximate the curve itself as a set of linear segments between points $\mathbf{p}_i$ computed at some set of sufficiently densely spaced parameter values $u_i$. One then creates a table of approximate arc lengths
另一种方法是将曲线本身近似为点p之间的一组线性段,这些点 p 是在一组足够密集的参数值 u 处计算出来的。然后创建一个近似弧长表
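One way to write the table entries, consistent with the piecewise-linear approximation just described (the notation $s_i$ for the accumulated length at $u_i$ is assumed here), is the running sum of segment lengths:

$$
s_0 = 0, \qquad s_i = s_{i-1} + \lVert \mathbf{p}_i - \mathbf{p}_{i-1} \rVert, \quad i = 1, 2, \ldots
$$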
Since s(u) is a non-decreasing function of u, one can then find the interval containing the value S by simple searching through the table (see Figure 16.9). Linear interpolation of the interval’s u end values is then performed to finally find u(S). If greater precision is necessary, a few steps of the Newton-Raphson algorithm with this value as the starting point can be applied.
由于s ( u ) 是u的非减函数,因此只需通过表格简单搜索即可找到包含值S的区间(见图16.9 )。然后对区间的u端值进行线性插值,最终找到 u(S)。如果需要更高的精度,可以以该值作为起点,应用几步牛顿-拉夫森算法。
Figure 16.9. To create a tabular version of s(u), the curve can be approximated by a number of line segments connecting points on the curve positioned at equal parameter increments. The table is searched to find the u-interval for a given S. For the curve above, for example, the value of u corresponding to the position of S = 6.5 lies between u = 0.6 and u = 0.8.
图 16.9。要创建s ( u ) 的表格版本,可以通过连接曲线上位于相等参数增量处的点的多个线段来近似曲线。搜索表格以找到给定S的u间隔。例如,对于上面的曲线,与S = 6.5 的位置相对应的u值位于u = 0.6 和u = 0.8 之间。
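The table-based lookup can be sketched as follows. The function names and the use of a binary search (rather than a linear scan of the table) are choices made for this example; the curve is passed in as any function returning a point for a parameter value in [0, 1].

```python
import math
from bisect import bisect_left


def build_arc_length_table(curve, segments):
    """Tabulate approximate arc length at equal parameter increments."""
    us = [i / segments for i in range(segments + 1)]
    points = [curve(u) for u in us]
    lengths = [0.0]
    for a, b in zip(points, points[1:]):
        lengths.append(lengths[-1] + math.dist(a, b))  # add one chord length
    return us, lengths


def parameter_from_table(us, lengths, S):
    """Find u(S) by locating the table interval and interpolating linearly."""
    S = min(max(S, 0.0), lengths[-1])
    i = max(bisect_left(lengths, S), 1)     # lengths[i-1] <= S <= lengths[i]
    s0, s1 = lengths[i - 1], lengths[i]
    if s1 == s0:                            # degenerate zero-length segment
        return us[i - 1]
    w = (S - s0) / (s1 - s0)
    return us[i - 1] + w * (us[i] - us[i - 1])


# Straight line of length 5: arc length is proportional to u.
us, lengths = build_arc_length_table(lambda u: (3.0 * u, 4.0 * u), 5)
u_half = parameter_from_table(us, lengths, 2.5)
```

The search relies on the table being non-decreasing, which is the property of s(u) noted in the text. The returned value can also seed a few Newton-Raphson steps when more precision is needed.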
The techniques presented above can be used to interpolate the keys set for most of the parameters describing the scene. Three-dimensional rotation is one important motion for which more specialized interpolation methods and representations are common. The reason for this is that applying standard techniques to 3D rotations often leads to serious practical problems. Rotation (a change in orientation of an object) is the only motion other than translation which leaves the shape of the object intact. It therefore plays a special role in animating rigid objects.
上面介绍的技术可用于对描述场景的大多数参数的键集进行插值。三维旋转是一种重要的运动,对此更专业的插值方法和表示很常见。原因是将标准技术应用于 3D 旋转通常会导致严重的实际问题。旋转(物体方向的改变)是除平移之外唯一保持物体形状不变的运动。因此,它在为刚性物体制作动画时起着特殊的作用。
There are several ways to specify the orientation of an object. First, transformation matrices as described in Chapter 6 can be used. Unfortunately, naive (element-by-element) interpolation of rotation matrices does not produce a correct result. For example, the matrix “halfway” between 2D clock- and counterclockwise 90 degree rotation is the null matrix:
有几种方法可以指定对象的方向。首先,可以使用第 6 章中描述的变换矩阵。不幸的是,旋转矩阵的简单(逐个元素)插值不会产生正确的结果。例如,二维顺时针和逆时针 90 度旋转“中间”的矩阵是零矩阵:
$$\frac{1}{2}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
The correct result is, of course, the unit matrix corresponding to no rotation. Second, one can specify arbitrary orientation as a sequence of exactly three rotations around coordinate axes chosen in some specific order. These axes can be fixed in space (fixed-angle representation) or embedded into the object, therefore changing after each rotation (Euler-angle representation as shown in Figure 16.10). These three angles of rotation can be animated directly through standard keyframing, but a subtle problem known as gimbal lock arises. Gimbal lock occurs if during rotation one of the three rotation axes is by accident aligned with another, thereby reducing by one the number of available degrees of freedom as shown in Figure 16.11 for a physical device. This effect is more common than one might think: a single 90 degree turn to the right (or left) can potentially put an object into a gimbal lock. Finally, any orientation can be specified by choosing an appropriate axis in space and angle of rotation around this axis. While animating in this representation is relatively straightforward, combining two rotations, i.e., finding the axis and angle corresponding to a sequence of two rotations both represented by axis and angle, is nontrivial. A special mathematical apparatus, quaternions, has been developed to make this representation suitable both for combining several rotations into a single one and for animation.
当然,正确的结果是对应于无旋转的单位矩阵。其次,可以将任意方向指定为围绕以特定顺序选择的坐标轴的三个旋转序列。这些轴可以在空间中固定(固定角度表示)或嵌入到对象中,因此每次旋转后都会发生变化(欧拉角表示,如图 16.10所示)。这三个旋转角度可以直接通过标准关键帧进行动画处理,但会出现一个称为万向节锁的微妙问题。如果在旋转过程中三个旋转轴中的一个意外与另一个旋转轴对齐,就会发生万向节锁,从而将可用的自由度数减少一个,如图 16.11所示的物理设备。这种效果比人们想象的更常见——向右(或向左)旋转 90 度就可能使物体陷入万向节锁。最后,可以通过选择空间中的适当轴和绕该轴的旋转角度来指定任何方向。虽然用这种表示法制作动画相对简单,但组合两个旋转(即找到由轴和角度表示的两个旋转序列对应的轴和角度)并非易事。一种特殊的数学装置,四元数的开发使得这种表示既适合将多个旋转组合成一个旋转,也适合用于动画。
Figure 16.10. Three Euler angles can be used to specify arbitrary object orientation through a sequence of three rotations around coordinate axes embedded into the object (axis Y always points to the tip of the cone). Note that each rotation is given in a new coordinate system. Fixed angle representation is very similar, but the coordinate axes it uses are fixed in space and do not rotate with the object.
图 16.10。三个欧拉角可用于通过围绕嵌入物体的坐标轴(Y 轴始终指向圆锥的尖端)进行一系列旋转来指定任意物体方向。请注意,每次旋转都在一个新的坐标系中给出。固定角度表示非常相似,但它使用的坐标轴在空间中是固定的,不会随物体旋转。
Figure 16.11. In this example, gimbal lock occurs when a 90 degree turn around axis Z is made. Both X and Y rotations are now performed around the same axis leading to the loss of one degree of freedom.
图 16.11。在此示例中,当绕 Z 轴旋转 90 度时,就会发生万向节锁定。X 和 Y 旋转现在都绕同一轴进行,导致失去一个自由度。
Given a 3D vector v = (x, y, z) and a scalar s, a quaternion q is formed by combining the two into a four-component object: q = [s x y z] = [s; v]. Several new operations are then defined for quaternions. Quaternion addition simply sums scalar and vector parts separately:
给定一个三维向量v = ( x, y, z ) 和一个标量s ,将两者组合成一个四分量对象,即可得到四元数q : q = [ sxyz ] = [ s ; v ] 。然后为四元数定义了几个新运算。四元数加法只是分别对标量和向量部分求和:
$$q_1 + q_2 = [s_1 + s_2;\ \mathbf{v}_1 + \mathbf{v}_2]$$
Multiplication by a scalar a gives a new quaternion
与标量a相乘得到一个新的四元数
$$a q = [a s;\ a\mathbf{v}]$$
More complex quaternion multiplication is defined as
更复杂的四元数乘法定义为
$$q_1 q_2 = [s_1 s_2 - \mathbf{v}_1 \cdot \mathbf{v}_2;\ s_1 \mathbf{v}_2 + s_2 \mathbf{v}_1 + \mathbf{v}_1 \times \mathbf{v}_2]$$
where × denotes a vector cross product. It is easy to see that, similar to matrices, quaternion multiplication is associative, but not commutative. We will be interested mostly in normalized quaternions, those for which the quaternion norm |q| = √(s² + |v|²) is equal to one. One final definition we need is that of an inverse quaternion:
其中 × 表示向量叉积。很容易看出,与矩阵类似,四元数乘法是结合的,但不是交换的。我们主要对规范化四元数感兴趣,即四元数范数 |q| = √(s² + |v|²) 等于一的四元数。我们需要的最后一个定义是逆四元数:
$$q^{-1} = \frac{1}{|q|^2}\,[s;\ -\mathbf{v}]$$
To represent a rotation by angle ϕ around an axis passing through the origin whose direction is given by the normalized vector n, a normalized quaternion
为了表示绕通过原点的轴旋转角度ϕ ,其方向由归一化向量n给出,需要使用归一化四元数
$$q = \left[\cos\frac{\phi}{2};\ \sin\frac{\phi}{2}\,\mathbf{n}\right]$$
is formed. To rotate point p, one turns it into the quaternion qp = [0; p] and computes the quaternion product
形成。要旋转点p ,可以将其变成四元数q p = [0; p ] 并计算四元数乘积
$$q'_p = q\, q_p\, q^{-1}$$
which is guaranteed to have a zero scalar part and the rotated point as its vector part. Composite rotation is given simply by the product of quaternions representing each of the separate rotation steps. To animate with quaternions, one can treat them as points in a four-dimensional space and set keys directly in this space. To keep quaternions normalized, one should, strictly speaking, restrict interpolation procedures to a unit sphere (a 3D object) in this 4D space. However, a spherical version of even linear interpolation (often called slerp) already results in rather unpleasant math. Simple 4D linear interpolation followed by projection onto the unit sphere shown in Figure 16.12 is much simpler and often sufficient in practice. Smoother results can be obtained via repeated application of a linear interpolation procedure using the de Casteljau algorithm.
保证标量部分为零,旋转点为矢量部分。复合旋转由表示每个单独旋转步骤的四元数的乘积给出。要使用四元数制作动画,可以将它们视为四维空间中的点,并直接在此空间中设置键。要使四元数保持规范化,严格来说,应该将插值过程限制在此 4D 空间中的单位球面(3D 对象)。但是,均匀线性插值的球面版本(通常称为slerp )已经产生了相当令人不快的数学运算。图 16.12所示的简单 4D 线性插值,然后投影到单位球体上,这要简单得多,而且在实践中通常就足够了。通过使用 de Casteljau 算法重复应用线性插值过程,可以获得更平滑的结果。
Figure 16.12. Interpolating quaternions should be done on the surface of a 3D unit sphere embedded in 4D space. However, much simpler interpolation along a 4D straight line (open circles) followed by re-projection of the results onto the sphere (black circles) is often sufficient.
图 16.12。四元数插值应在嵌入四维空间的三维单位球体表面上进行。然而,沿四维直线(空心圆)进行更简单的插值,然后将结果重新投影到球体上(黑色圆圈)通常就足够了。
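The quaternion operations above translate almost directly into code. The following is a minimal sketch, assuming a quaternion is stored as a pair `(s, (x, y, z))`; all function names are hypothetical. `q_nlerp` implements the simple 4D linear interpolation followed by projection onto the unit sphere, not slerp.

```python
import math

def q_mul(q1, q2):
    """Quaternion product [s1 s2 - v1.v2; s1 v2 + s2 v1 + v1 x v2]."""
    s1, v1 = q1
    s2, v2 = q2
    dot = sum(a * b for a, b in zip(v1, v2))
    cross = (v1[1] * v2[2] - v1[2] * v2[1],
             v1[2] * v2[0] - v1[0] * v2[2],
             v1[0] * v2[1] - v1[1] * v2[0])
    vec = tuple(s1 * b + s2 * a + c for a, b, c in zip(v1, v2, cross))
    return (s1 * s2 - dot, vec)

def q_inverse(q):
    """Inverse quaternion [s; -v] / |q|^2."""
    s, v = q
    n2 = s * s + sum(a * a for a in v)
    return (s / n2, tuple(-a / n2 for a in v))

def q_from_axis_angle(n, phi):
    """Normalized quaternion for rotation by phi about the unit axis n."""
    half = 0.5 * phi
    return (math.cos(half), tuple(math.sin(half) * a for a in n))

def q_rotate(q, p):
    """Rotate point p: the vector part of q [0; p] q^-1."""
    _, v = q_mul(q_mul(q, (0.0, tuple(p))), q_inverse(q))
    return v

def q_nlerp(q0, q1, t):
    """4D linear interpolation followed by re-projection onto the unit sphere.
    Undefined when the interpolated quaternion has zero norm (opposite keys)."""
    s = (1.0 - t) * q0[0] + t * q1[0]
    v = tuple((1.0 - t) * a + t * b for a, b in zip(q0[1], q1[1]))
    norm = math.sqrt(s * s + sum(a * a for a in v))
    return (s / norm, tuple(a / norm for a in v))
```

Composite rotation is then `q_mul(q_second, q_first)`, and an in-between orientation for two keys is `q_nlerp(key0, key1, t)`.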
Although techniques for object deformation might be more properly treated as modeling tools, they are traditionally discussed together with animation methods. Probably the simplest example of an operation which changes object shape is a nonuniform scaling. More generally, some function can be applied to local coordinates of all points specifying the object (i.e., vertices of a triangular mesh or control polygon of a spline surface), repositioning these points and creating a new shape: p′ = f(p, γ) where γ is a vector of parameters used by the deformation function. Choosing different f (and combining them by applying one after another) can help to create very interesting deformations. Examples of useful simple functions include bend, twist, and taper which are shown in Figure 16.13. Animating shape change is very easy in this case by keyframing the parameters of the deformation function. Disadvantages of this technique include difficulty of choosing the mathematical function for some nonstandard deformations and the fact that the resulting deformation is global in the sense that the complete object, and not just some part of it, is reshaped.
尽管对象变形技术可能更适合作为建模工具来处理,但它们传统上与动画方法一起讨论。改变对象形状的操作最简单的例子可能是非均匀缩放。更一般地,可以将某个函数应用于指定对象的所有点的局部坐标(即三角网格的顶点或样条曲面的控制多边形),重新定位这些点并创建新形状: p′ = f(p, γ),其中 γ 是变形函数使用的参数向量。选择不同的f (并通过一个接一个地应用它们来组合它们)有助于创建非常有趣的变形。有用的简单函数示例包括弯曲、扭曲和锥化,如图 16.13所示。在这种情况下,通过对变形函数的参数进行关键帧处理,可以非常轻松地为形状变化制作动画。这种技术的缺点包括难以为某些非标准变形选择数学函数,并且所产生的变形是全局的,因为整个对象(而不仅仅是其中的一部分)被重塑。
Figure 16.13. Popular examples of global deformations. Bending and twist angles, as well as the degree of taper, can all be animated to achieve dynamic shape change.
图 16.13。整体变形的常见示例。弯曲和扭转角度以及锥度都可以制作成动画,以实现动态形状变化。
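As an illustration of p′ = f(p, γ), here is a minimal sketch of two global deformations with a keyframed parameter. The specific formulas (twist angle and taper scale both varying linearly along the Y axis) and all names are assumptions chosen for the example, not the definitions used to produce Figure 16.13.

```python
import math

def twist(p, angle_per_unit):
    """Rotate p about the Y axis by an angle proportional to its height y."""
    x, y, z = p
    a = angle_per_unit * y
    c, s = math.cos(a), math.sin(a)
    return (c * x + s * z, y, -s * x + c * z)

def taper(p, rate):
    """Scale x and z by a factor that shrinks linearly with height y."""
    x, y, z = p
    k = 1.0 - rate * y
    return (k * x, y, k * z)

def lerp(a, b, t):
    """Linear interpolation between two key values of a parameter."""
    return (1.0 - t) * a + t * b

def animate_twist(points, key0, key1, t):
    """Keyframe the twist parameter and deform every point of the object."""
    gamma = lerp(key0, key1, t)
    return [twist(p, gamma) for p in points]
```

Deformations can be combined by applying one after another, for example `taper(twist(p, 0.5), 0.2)`.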
To deform an object locally while providing more direct control over the result, one can choose a single vertex, move it to a new location and adjust vertices within some neighborhood to follow the seed vertex. The area affected by the deformation and the specific amount of displacement in different parts of the object are controlled by an attenuation function which decreases with distance (typically computed over the object’s surface) to the seed vertex. Seed vertex motion can be keyframed to produce animated shape change.
为了在局部变形物体的同时更直接地控制结果,可以选择单个顶点,将其移动到新位置,并调整某个邻域内的顶点以跟随种子顶点。变形影响的区域和物体不同部分的特定位移量由衰减函数控制,该函数随着与种子顶点的距离(通常在物体表面上计算)而减小。种子顶点运动可以设置为关键帧以产生动画形状变化。
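A rough sketch of seed-vertex deformation follows. It makes two simplifying assumptions that are not from the text: distance is measured as straight-line (Euclidean) distance rather than over the object's surface, and the attenuation function is a cosine falloff with a hypothetical `radius` parameter.

```python
import math

def falloff(distance, radius):
    """Attenuation: 1 at the seed, decreasing smoothly to 0 at `radius`."""
    if distance >= radius:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * distance / radius))

def move_seed(vertices, seed_index, displacement, radius):
    """Displace the seed vertex fully and its neighbors by an attenuated
    amount. Euclidean distance is used here for simplicity."""
    seed = vertices[seed_index]
    result = []
    for v in vertices:
        w = falloff(math.dist(v, seed), radius)
        result.append(tuple(a + w * d for a, d in zip(v, displacement)))
    return result
```

Keyframing `displacement` over time then produces an animated local shape change.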
A more general deformation technique is called free-form deformation (FFD) (Sederberg & Parry, 1986). A local (in most cases rectilinear) coordinate grid is first established to encapsulate the part of the object to be deformed, and coordinates (s, t, u) of all relevant points are computed with respect to this grid. The user then freely reshapes the grid of lattice points Pijk into a new distorted lattice P′ijk (Figure 16.14). The object is reconstructed using coordinates computed in the original undistorted grid in the trivariate analog of Bézier interpolants (see Chapter 15) with distorted lattice points P′ijk serving as control points in this expression:
一种更通用的变形技术称为自由变形 (FFD) (Sederberg & Parry, 1986)。首先建立一个局部(大多数情况下是直线)坐标网格来封装要变形的物体部分,并计算所有相关点相对于该网格的坐标 ( s, t, u )。然后,用户可以自由地将格点 Pijk 的网格重塑为新的扭曲格子 P′ijk(图 16.14 )。使用在原始未失真网格中计算出的坐标,在贝塞尔插值的三变量模拟中(参见第 15 章)重建对象,其中失真格点 P′ijk 充当此表达式中的控制点:
$$\mathbf{p}(s,t,u) = \sum_{i=0}^{L}\binom{L}{i}(1-s)^{L-i}s^{i}\left[\sum_{j=0}^{M}\binom{M}{j}(1-t)^{M-j}t^{j}\left[\sum_{k=0}^{N}\binom{N}{k}(1-u)^{N-k}u^{k}\,\mathbf{P}'_{ijk}\right]\right]$$
Figure 16.14. Adjusting the FFD lattice results in the deformation of the object.
图 16.14.调整 FFD 晶格会导致物体变形。
where L, M, N are maximum indices of lattice points in each dimension. In effect, the lattice serves as a low-resolution version of the object for the purpose of deformation, allowing for a smooth shape change of an arbitrarily complex object through a relatively small number of intuitive adjustments. FFD lattices can themselves be treated as regular objects by the system and can be transformed, animated, and even further deformed if necessary, leading to corresponding changes in the object to which the lattice is attached. For example, moving a deformation tool consisting of the original lattice and distorted lattice representing a bulge across an object results in a bulge moving across the object.
其中L、M、N是每个维度上晶格点的最大索引。实际上,晶格是用于变形的物体的低分辨率版本,允许通过相对较少的直观调整平滑地改变任意复杂物体的形状。FFD 晶格本身可以被系统视为常规物体,并且可以进行变换、动画,甚至在必要时进一步变形,从而导致晶格所附着的物体发生相应的变化。例如,移动由原始晶格和表示物体凸起的扭曲晶格组成的变形工具会导致凸起在物体上移动。
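The trivariate Bézier evaluation can be sketched as below. The helper names and the nested-list lattice layout `lattice[i][j][k]` are assumptions for illustration; the coordinates (s, t, u) are taken to lie in [0, 1] with respect to the undistorted grid.

```python
from math import comb

def bernstein(n, i, t):
    """Bernstein polynomial: C(n, i) (1 - t)^(n - i) t^i."""
    return comb(n, i) * (1.0 - t) ** (n - i) * t ** i

def make_lattice(L, M, N):
    """Undistorted lattice on the unit cube with (L+1)(M+1)(N+1) points."""
    return [[[(i / L, j / M, k / N) for k in range(N + 1)]
             for j in range(M + 1)]
            for i in range(L + 1)]

def ffd(stu, lattice):
    """Evaluate the deformed position of a point with lattice coordinates
    (s, t, u), using the (possibly distorted) lattice as control points."""
    s, t, u = stu
    L = len(lattice) - 1
    M = len(lattice[0]) - 1
    N = len(lattice[0][0]) - 1
    out = [0.0, 0.0, 0.0]
    for i in range(L + 1):
        bi = bernstein(L, i, s)
        for j in range(M + 1):
            bj = bernstein(M, j, t)
            for k in range(N + 1):
                w = bi * bj * bernstein(N, k, u)
                point = lattice[i][j][k]
                for c in range(3):
                    out[c] += w * point[c]
    return tuple(out)
```

With an undistorted lattice the function returns the input point unchanged; moving a lattice point, for example `lattice[2][2][2] = (1.0, 1.0, 2.0)`, pulls nearby object points toward it.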
Animation of articulated figures is most often performed through a combination of keyframing and specialized deformation techniques. The character model intended for animation typically consists of at least two main layers as shown in Figure 16.15. The motion of a highly detailed surface representing the outer shell or skin of the character is what the viewer will eventually see in the final product. The skeleton underneath it is a hierarchical structure (a tree) of joints which provides a kinematic model of the figure and is used exclusively for animation. In some cases, additional intermediate layer(s) roughly corresponding to muscles are inserted between the skeleton and the skin.
关节运动人物的动画通常是通过关键帧和专门的变形技术的组合来实现的。用于动画的角色模型通常至少包含两个主要层,如图 16.15所示。代表角色外壳或皮肤的高度详细表面的运动是观看者最终将在最终产品中看到的。其下方的骨架是关节的层次结构(树),它提供了人物的运动模型,专门用于动画。在某些情况下,在骨架和皮肤之间插入大致对应于肌肉的额外中间层。
Figure 16.15. (Left) A hierarchy of joints, a skeleton, serves as a kinematic abstraction of the character; (middle) repositioning the skeleton deforms a separate skin object attached to it; (right) a tree data structure is used to represent the skeleton. For compactness, the internal structure of several nodes is hidden (they are identical to a corresponding sibling).
图 16.15。 (左)关节层次结构(即骨架)充当角色的运动抽象;(中)重新定位骨架会使附在其上的单独皮肤对象变形;(右)使用树形数据结构来表示骨架。为了紧凑,隐藏了几个节点的内部结构(它们与相应的兄弟节点相同)。
Each of the skeleton’s joints acts as a parent for the hierarchy below it. The root represents the whole character and is positioned directly in the world coordinate system. If a local transformation matrix which relates a joint to its parent in the hierarchy is available, one can obtain a transformation which relates local space of any joint to the world system (i.e., the system of the root) by simply concatenating transformations along the path from the root to the joint. To evaluate the whole skeleton (i.e., find position and orientation of all joints), a depth-first traversal of the complete tree of joints is performed. A transformation stack is a natural data structure to help with this task. While traversing down the tree, the current composite matrix is pushed on the stack and a new one is created by multiplying the current matrix with the one stored at the joint. When backtracking to the parent, this extra transformation should be undone before another branch is visited; this is easily done by simply popping the stack. Although this general and simple technique for evaluating hierarchies is used throughout computer graphics, in animation (and robotics) it is given a special name–forward kinematics (FK). While general representations for all transformations can be used, it is common to use specialized sets of parameters, such as link lengths or joint angles, to specify skeletons. To animate with forward kinematics, rotational parameters of all joints are manipulated directly. The technique also allows the animator to change the distance between joints (link lengths), but one should be aware that this corresponds to limb stretching and can often look rather unnatural.
骨架的每个关节都充当其下层层次结构的父级。根代表整个角色,并直接定位在世界坐标系中。如果存在将关节与其在层次结构中的父级相关联的局部变换矩阵,则只需沿着从根到关节的路径连接变换,即可获得将任何关节的局部空间与世界系统(即根的系统)相关联的变换。要评估整个骨架(即找到所有关节的位置和方向),需要对整个关节树进行深度优先遍历。变换堆栈是一种自然的数据结构,可帮助完成此任务。在遍历树时,当前复合矩阵被推送到堆栈上,并通过将当前矩阵与存储在关节处的矩阵相乘来创建一个新矩阵。当回溯到父级时,应在访问另一个分支之前撤消此额外变换;这可以通过简单地弹出堆栈轻松完成。虽然这种评估层次结构的通用而简单的技术在整个计算机图形学中都有使用,但在动画(和机器人技术)中,它被赋予了一个特殊的名称——向前运动学(FK)。虽然可以使用所有变换的通用表示,但通常使用专门的参数集(例如链接长度或关节角度)来指定骨架。要使用正向运动学制作动画,需要直接操纵所有关节的旋转参数。该技术还允许动画师改变关节之间的距离(链接长度),但应该注意,这对应于肢体拉伸,并且通常看起来相当不自然。
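The traversal with a transformation stack can be sketched as follows. To keep the example short it assumes a planar (2D) skeleton in which each joint stores a rotation angle and an offset from its parent, combined into a 3×3 homogeneous matrix; the `Joint` class and function names are hypothetical.

```python
import math

class Joint:
    def __init__(self, name, angle, offset, children=()):
        self.name = name          # label used in the result
        self.angle = angle        # joint rotation (radians)
        self.offset = offset      # position in the parent's local space
        self.children = list(children)

def local_matrix(joint):
    """Transformation relating the joint to its parent: translate to the
    joint's offset, then rotate by the joint angle."""
    c, s = math.cos(joint.angle), math.sin(joint.angle)
    tx, ty = joint.offset
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def evaluate_skeleton(root):
    """Depth-first traversal returning the world position of every joint."""
    current = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    stack = []
    world = {}

    def visit(joint):
        nonlocal current
        stack.append(current)                       # push composite matrix
        current = mat_mul(current, local_matrix(joint))
        world[joint.name] = (current[0][2], current[1][2])
        for child in joint.children:
            visit(child)
        current = stack.pop()                       # undo when backtracking

    visit(root)
    return world
```

Animating with forward kinematics then amounts to changing the `angle` values over time and re-evaluating the skeleton.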
Forward kinematics requires the user to set parameters for all joints involved in the motion (Figure 16.16 (top)). Most of these joints, however, belong to internal nodes of the hierarchy, and their motion is typically not something the animator wants to worry about. In most situations, the animator just wants them to move naturally “on their own,” and one is much more interested in specifying the behavior of the endpoint of a joint chain, which typically corresponds to something performing a specific action, such as an ankle or a tip of a finger. The animator would rather have parameters of all internal joints be determined from the motion of the end effector automatically by the system. Inverse kinematics (IK) allows us to do just that (see Figure 16.16 (bottom)).
正向运动学要求用户为参与运动的所有关节设置参数(图 16.16 (顶部))。然而,这些关节中的大多数都属于层次结构的内部节点,动画师通常不想担心它们的运动。在大多数情况下,动画师只是希望它们“自行”自然地移动,而动画师更感兴趣的是指定关节链端点的行为,这通常对应于执行特定动作的某个物体,例如脚踝或指尖。动画师更希望系统根据末端执行器的运动自动确定所有内部关节的参数。逆运动学(IK) 使我们能够做到这一点(见图16.16 (底部))。
Figure 16.16. Forward kinematics (top) requires the animator to put all joints into the correct position. In inverse kinematics (bottom), parameters of some internal joints are computed based on desired end effector motion.
图 16.16。正向运动学(顶部)要求动画师将所有关节置于正确位置。在逆向运动学(底部)中,一些内部关节的参数是根据所需的末端执行器运动计算的。
Let x be the position of the end effector and α be the vector of parameters needed to specify all internal joints along the chain from the root to the final joint. Sometimes the orientation of the final joint is also directly set by the animator, in which case we assume that the corresponding variables are included in the vector x. For simplicity, however, we will write all specific expressions for the vector:
令x为末端执行器的位置, α为指定从根关节到最终关节的链条上所有内部关节所需的参数向量。有时最终关节的方向也由动画师直接设置,在这种情况下,我们假设相应的变量包含在向量x中。但为了简单起见,我们将为向量写出所有具体表达式:
$$\mathbf{x} = (x_1, x_2, x_3)^T$$
Since each of the variables in x is a function of α, it can be written as a vector equation x = F(α). If we change the internal joint parameters by a small amount δα, a resulting change δx in the position of the end effector can be approximately written as
由于x中的每个变量都是α的函数,因此可以将其写成矢量方程x = F ( α )。如果我们将内部关节参数改变少量δα ,则末端执行器位置的变化δ x可以近似写为
$$\delta\mathbf{x} = \frac{\partial \mathbf{F}}{\partial \boldsymbol{\alpha}}\,\delta\boldsymbol{\alpha} \tag{16.1}$$
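where ∂F/∂α is the matrix of partial derivatives called the Jacobian:
其中 ∂F/∂α 是偏导数矩阵,称为雅可比矩阵:
$$\frac{\partial \mathbf{F}}{\partial \boldsymbol{\alpha}} = \begin{bmatrix} \dfrac{\partial f_1}{\partial \alpha_1} & \dfrac{\partial f_1}{\partial \alpha_2} & \cdots & \dfrac{\partial f_1}{\partial \alpha_n} \\ \dfrac{\partial f_2}{\partial \alpha_1} & \dfrac{\partial f_2}{\partial \alpha_2} & \cdots & \dfrac{\partial f_2}{\partial \alpha_n} \\ \dfrac{\partial f_3}{\partial \alpha_1} & \dfrac{\partial f_3}{\partial \alpha_2} & \cdots & \dfrac{\partial f_3}{\partial \alpha_n} \end{bmatrix}$$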
At each moment in time, we know the desired position of the end effector (set by the animator) and, of course, the effector’s current position. Subtracting the two, we will get the desired adjustment δx. Elements of the Jacobian matrix are related to changes in a coordinate of the end effector when a particular internal parameter is changed while others remain fixed (see Figure 16.17). These elements can be computed for any given skeleton configuration using geometric relationships. The only remaining unknowns in the system of equations (16.1) are the changes in internal parameters α. Once we solve for them, we update α = α+δα which gives all the necessary information for the FK procedure to reposition the skeleton.
在每个时间点,我们都知道末端执行器的期望位置(由动画师设置),当然还有执行器的当前位置。将两者相减,我们将得到期望的调整值δ x 。雅可比矩阵的元素与当特定内部参数改变而其他参数保持不变时末端执行器坐标的变化有关(参见图 16.17 )。可以使用几何关系为任何给定的骨架配置计算这些元素。方程组 (16.1) 中唯一剩下的未知数是内部参数α的变化。一旦我们解出它们,我们就会更新α = α + δα ,这将提供 FK 程序重新定位骨架所需的所有信息。
Figure 16.17. Partial derivative ∂x/∂αknee is given by the limit of Δx/Δαknee. Effector displacement is computed while all joints, except the knee, are kept fixed.
图 16.17。偏导数∂x/∂α膝关节由 Δ x /Δ α膝关节的极限给出。计算效应器位移时,除膝关节外的所有关节均保持固定。
Unfortunately, the system (16.1) cannot usually be solved analytically and, moreover, it is in most cases underconstrained, i.e., the number of unknown internal joint parameters α exceeds the number of variables in vector x. This means that different motions of the skeleton can result in the same motion of the end effector. Some examples are shown in Figure 16.18. Many ways of obtaining a specific solution for such systems are available, including those taking into account natural constraints needed for some real-life joints (bending a knee only in one direction, for example). One should also remember that the computed Jacobian matrix is valid only for one specific configuration, and it has to be updated as the skeleton moves. The complete IK framework is presented in Figure 16.19. Of course, the root joint for IK does not have to be the root of the whole hierarchy, and multiple IK solvers can be applied to independent parts of the skeleton. For example, one can use separate solvers for right and left feet and yet another one to help animate grasping with the right hand, each with its own root.
遗憾的是,系统 (16.1) 通常无法通过分析求解,而且在大多数情况下是欠约束的,即未知的内部关节参数α的数量超过了向量x中的变量数量。这意味着骨架的不同运动可能导致末端执行器的相同运动。图 16.18显示了一些示例。有许多方法可以获得此类系统的具体解决方案,包括考虑到某些现实生活中关节所需的自然约束(例如,膝盖只能朝一个方向弯曲)。还应记住,计算出的雅可比矩阵仅对一种特定配置有效,并且必须在骨架移动时进行更新。完整的 IK 框架如图 16.19所示。当然,IK 的根关节不必是整个层次结构的根,并且可以将多个 IK 解算器应用于骨架的独立部分。例如,人们可以对右脚和左脚使用单独的解算器,并使用另一个解算器来帮助制作右手抓握的动画,每个解算器都有自己的根。
Figure 16.18. Multiple configurations of internal joints can result in the same effector position. (Top) disjoint “flipped” solutions; (bottom) a continuum of solutions.
图 16.18。内部关节的多种配置可导致相同的效应器位置。(顶部)不相交的“翻转”解决方案;(底部)解决方案的连续体。
Figure 16.19. A diagram of the inverse kinematic algorithm.
图 16.19.逆运动学算法图。
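The iterative loop (compute δx, recompute the Jacobian for the current configuration, obtain δα, update α, run FK) can be sketched for a planar joint chain. Note the stand-in used for the solve step: instead of solving system (16.1), this sketch uses the Jacobian transpose method, δα = h Jᵀ δx with a small hypothetical step size `h`, which is only one of many possible choices. All names are assumptions.

```python
import math

def end_effector(alphas, lengths):
    """FK for a planar chain: joint angles are relative to the parent link."""
    x = y = theta = 0.0
    for a, l in zip(alphas, lengths):
        theta += a
        x += l * math.cos(theta)
        y += l * math.sin(theta)
    return x, y

def jacobian_columns(alphas, lengths):
    """Column j holds (dx/d alpha_j, dy/d alpha_j), from chain geometry."""
    thetas, theta = [], 0.0
    for a in alphas:
        theta += a
        thetas.append(theta)
    cols = []
    for j in range(len(alphas)):
        dx = -sum(l * math.sin(t) for l, t in zip(lengths[j:], thetas[j:]))
        dy = sum(l * math.cos(t) for l, t in zip(lengths[j:], thetas[j:]))
        cols.append((dx, dy))
    return cols

def ik_solve(alphas, lengths, target, h=0.1, iterations=1000, tol=1e-8):
    """Repeatedly nudge the joint angles so the effector approaches target."""
    alphas = list(alphas)
    for _ in range(iterations):
        x, y = end_effector(alphas, lengths)
        ex, ey = target[0] - x, target[1] - y        # desired adjustment
        if ex * ex + ey * ey < tol * tol:
            break
        # The Jacobian is valid only for the current configuration,
        # so it is recomputed on every iteration.
        for j, (dx, dy) in enumerate(jacobian_columns(alphas, lengths)):
            alphas[j] += h * (dx * ex + dy * ey)     # J^T times delta x
    return alphas
```

Because the system is underconstrained, the configuration reached depends on the starting angles; joint limits such as a knee bending only one way would need to be enforced separately.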
A combination of FK and IK approaches is typically used to animate the skeleton. Many common motions (walking or running cycles, grasping, reaching, etc.) exhibit well-known patterns of mutual joint motion, making it possible to quickly create natural-looking motion or even use a library of such “clips.” The animator then adjusts this generic result according to the physical parameters of the character and also to give it more individuality.
通常使用 FK 和 IK 方法的组合来制作骨骼动画。许多常见动作(行走或跑步循环、抓握、伸手等)都表现出众所周知的相互关节运动模式,这使得快速创建自然动作甚至使用此类“剪辑”库成为可能。然后,动画师根据角色的物理参数调整此通用结果,并赋予其更多个性。
When a skeleton changes its position, it acts as a special type of deformer applied to the skin of the character. The motion is transferred to this surface by assigning each skin vertex one (rigid skinning) or more (smooth skinning) joints as drivers (see Figure 16.20). In the first case, a skin vertex is simply frozen into the local space of the corresponding joint, which can be the one nearest in space or one chosen directly by the user. The vertex then repeats whatever motion this joint experiences, and its position in world coordinates is determined by standard FK procedure. Although it is simple, rigid skinning makes it difficult to obtain sufficiently smooth skin deformation in areas near the joints or also for more subtle effects resembling breathing or muscle action. Additional specialized deformers called flexors can be used for this purpose. In smooth skinning, several joints can influence a skin vertex according to some weight assigned by the animator, providing more detailed control over the results. Displacement vectors, d_i, suggested by different joints affecting a given skin vertex (each again computed with standard FK) are averaged according to their weights w_i to compute the final displacement of the vertex d = Σ w_i d_i. Normalized weights (Σ w_i = 1) are the most common but not fundamentally necessary. Setting smooth skinning weights to achieve the desired effect is not easy and requires significant skill from the animator.
当骨架改变其位置时,它充当一种应用于角色皮肤的特殊变形器。通过为每个皮肤顶点分配一个(刚性蒙皮)或多个(平滑蒙皮)关节作为驱动器,将运动转移到该表面(见图16.20 )。在第一种情况下,皮肤顶点只是冻结在相应关节的局部空间中,该关节可以是空间中最近的关节,也可以是用户直接选择的关节。然后顶点重复此关节经历的任何运动,其在世界坐标中的位置由标准 FK 程序确定。虽然很简单,但刚性蒙皮很难在关节附近的区域获得足够平滑的皮肤变形,也很难获得类似呼吸或肌肉动作的更微妙的效果。为此可以使用称为屈肌的其他专用变形器。在平滑蒙皮中,多个关节可以根据动画师指定的一些权重影响皮肤顶点,从而提供对结果的更详细控制。位移向量 d 由影响给定皮肤顶点的不同关节所表示(每个关节再次使用标准 FK 计算),根据其权重 w 取平均值,以计算顶点的最终位移d = Σ wd。归一化权重(Σ w = 1)最常见,但并非必不可少。设置平滑蒙皮权重以实现所需效果并不容易,需要动画师具备丰富的技能。
Figure 16.20. Top: Rigid skinning assigns skin vertices to a specific joint. Those belonging to the elbow joint are shown in black; Bottom: Soft skinning can blend the influence of several joints. Weights for the elbow joint are shown (lighter = greater weight). Note smoother skin deformation of the inner part of the skin near the joint.
图 16.20。顶部:刚性蒙皮将皮肤顶点分配给特定关节。属于肘关节的皮肤顶点显示为黑色;底部:软蒙皮可以混合多个关节的影响。显示了肘关节的权重(越轻 = 权重越大)。请注意关节附近皮肤内侧的皮肤变形更平滑。
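The weighted displacement d = Σ w_i d_i can be sketched as below. For brevity the example works in 2D and assumes each joint's motion is a rotation about a pivot; the `(weight, pivot, angle)` influence format and the function names are hypothetical. Rigid skinning is the special case of a single influence with weight 1.

```python
import math

def rotate_about(p, pivot, angle):
    """Position of p after the driving joint rotates by `angle` about `pivot`."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = p[0] - pivot[0], p[1] - pivot[1]
    return (pivot[0] + c * dx - s * dy, pivot[1] + s * dx + c * dy)

def skin_vertex(p, influences):
    """Blend the displacements suggested by each joint.
    influences: list of (weight, pivot, angle) tuples."""
    dx = dy = 0.0
    for w, pivot, angle in influences:
        q = rotate_about(p, pivot, angle)      # where this joint would put p
        dx += w * (q[0] - p[0])                # accumulate w_i * d_i
        dy += w * (q[1] - p[1])
    return (p[0] + dx, p[1] + dy)
```

Blending a stationary joint and one rotated by 90 degrees with equal weights moves the vertex (1, 0) to (0.5, 0.5), which lies closer to the pivot than either suggested position; this is one reason weights need careful tuning near joints.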
Skeletons are well suited for creating most motions of a character’s body, but they are not very convenient for realistic facial animation. The reason is that the skin of a human face is moved by muscles directly attached to it, contrary to other parts of the body where the primary objective of the muscles is to move the bones of the skeleton and any skin deformation is a secondary outcome. The result of this facial anatomical arrangement is a very rich set of dynamic facial expressions humans use as one of the main instruments of communication. We are all very well trained to recognize such facial variations and can easily notice any unnatural appearance. This not only puts special demands on the animator but also requires a high-resolution geometric model of the face and, if photorealism is desired, accurate skin reflection properties and textures.
骨骼非常适合创建角色身体的大多数动作,但它们对于逼真的面部动画来说并不十分方便。原因是人类面部的皮肤是由直接附着在其上的肌肉移动的,而身体其他部位的肌肉则主要用来移动骨骼,皮肤变形则是其次。这种面部解剖结构导致人类拥有一组非常丰富的动态面部表情,而面部表情是人类交流的主要工具之一。我们都受过良好的训练,能够识别此类面部变化,并能轻易发现任何不自然的表情。这不仅对动画师提出了特殊要求,还需要高分辨率的面部几何模型,如果想要照片级真实感,还需要准确的皮肤反射特性和纹理。
While it is possible to set key poses of the face vertex-by-vertex and interpolate between them or directly simulate the behavior of the underlying muscle structure using physics-based techniques (see Section 16.5), more specialized high-level approaches also exist. The static shape of a specific face can be characterized by a relatively small set of so-called conformational parameters (overall scale, distance from the eye to the forehead, length of the nose, width of the jaws, etc.) which are used to morph a generic face model into one with individual features. An additional set of expressive parameters can be used to describe the dynamic shape of the face for animation. Examples include rigid rotation of the head, how wide the eyes are open, movement of some feature point from its static position, etc. These are chosen so that most of the interesting expressions can be obtained through some combination of parameter adjustments, thereby allowing a face to be animated via standard keyframing. To achieve a higher level of control, one can use expressive parameters to create a set of expressions corresponding to common emotions (neutral, sadness, happiness, anger, surprise, etc.) and then blend these key poses to obtain a “slightly sad” or “angrily surprised” face. Similar techniques can be used to perform lip-synch animation, but key poses in this case correspond to different phonemes. Instead of using a sequence of static expressions to describe a dynamic one, the Facial Action Coding System (FACS) (Eckman & Friesen, 1978) decomposes dynamic facial expressions directly into a sum of elementary motions called action units (AUs). The set of AUs is based on extensive psychological research and includes such movements as raising the inner brow, wrinkling the nose, stretching lips, etc. Combining AUs can be used to synthesize a necessary expression.
虽然可以逐个顶点设置面部的关键姿势并在它们之间进行插值,或者直接使用基于物理的技术模拟底层肌肉结构的行为(参见第 16.5 节),但也存在更专业的高级方法。特定面部的静态形状可以通过一组相对较小的所谓构象参数(整体比例、从眼睛到前额的距离、鼻子的长度、下颌的宽度等)来表征,这些参数用于将通用面部模型变形为具有个体特征的模型。一组额外的表现参数可用于描述动画中面部的动态形状。例子包括头部的刚性旋转、眼睛睁开的宽度、某些特征点从其静态位置的移动等。选择这些参数是为了通过某种参数调整组合来获得大多数有趣的表情,从而允许通过标准关键帧来制作面部动画。为了实现更高级别的控制,可以使用表情参数创建一组与常见情绪(中性、悲伤、快乐、愤怒、惊讶等)相对应的表情,然后混合这些关键姿势以获得“略带悲伤”或“愤怒惊讶”的面部表情。可以使用类似的技术来执行口型同步动画,但在这种情况下,关键姿势对应于不同的音素。面部动作编码系统 (FACS)(Eckman & Friesen,1978)不是使用一系列静态表情来描述动态表情,而是将动态面部表情直接分解为一组称为动作单元 (AU) 的基本动作。 这组AU是基于广泛的心理学研究而来的,其中包括抬起眉毛内侧,皱鼻子,张开嘴唇等动作。组合AU可以合成必要的表情。
Even with the help of the techniques described above, creating realistic-looking character animation from scratch remains a daunting task. It is therefore only natural that much attention is directed toward techniques which record an actor’s motion in the real world and then apply it to computer-generated characters. Two main classes of such motion capture (MC) techniques exist: electromagnetic and optical.
即使借助上述技术,从头开始创建逼真的角色动画仍然是一项艰巨的任务。因此,人们自然而然地将注意力集中在记录演员在现实世界中的动作并将其应用于计算机生成角色的技术上。此类动作捕捉(MC) 技术主要分为两类:电磁和光学。
In electromagnetic motion capture, an electromagnetic sensor directly measures its position (and possibly orientation) in 3D, often providing the captured results in real time. Disadvantages of this technique include significant equipment cost, possible interference from nearby metal objects, and noticeable size of sensors and batteries which can be an obstacle in performing high-amplitude motions. In optical MC, small colored markers are used instead of active sensors, making it a much less intrusive procedure. Figure 16.21 shows the operation of such a system. In the most basic arrangement, the motion is recorded by two calibrated video cameras, and simple triangulation is used to extract the marker’s 3D position. More advanced computer vision algorithms used for accurate tracking of multiple markers from video are computationally expensive, so, in most cases, such processing is done offline. Optical tracking is generally less robust than electromagnetic. Occlusion of a given marker in some frames, possible misidentification of markers, and noise in images are just a few of the common problems which have to be addressed. Introducing more cameras observing the motion from different directions improves both accuracy and robustness, but this approach is more expensive and it takes longer to process such data. Optical MC becomes more attractive as available computational power increases and better computer vision algorithms are developed. Because of the low-impact nature of markers, optical methods are suitable for delicate facial motion capture and can also be used with objects other than humans, for example, animals or even tree branches in the wind.
在电磁运动捕捉中,电磁传感器直接测量其在 3D 中的位置(可能还有方向),通常实时提供捕捉结果。这种技术的缺点包括设备成本高、附近金属物体的可能干扰以及传感器和电池的明显尺寸,这可能会妨碍执行高振幅运动。在光学 MC 中,使用小的彩色标记代替有源传感器,使其成为一种干扰性小得多的程序。图 16.21显示了这种系统的操作。在最基本的布置中,运动由两个校准的摄像机记录,并使用简单的三角测量来提取标记的 3D 位置。用于从视频中精确跟踪多个标记的更先进的计算机视觉算法在计算上是昂贵的,因此,在大多数情况下,这种处理是离线完成的。光学跟踪通常不如电磁跟踪那么稳健。某些帧中给定标记的遮挡、可能的错误识别标记以及图像中的噪声只是必须解决的常见问题中的一小部分。引入更多摄像头从不同方向观察动作可以提高准确性和稳健性,但这种方法成本更高,处理此类数据需要更长的时间。随着可用计算能力的提高和更好的计算机视觉算法的开发,光学 MC 变得越来越有吸引力。由于标记物的影响较小,光学方法适用于精细的面部动作捕捉,也可以用于人类以外的物体——例如动物,甚至是风中的树枝。
Figure 16.21. Optical motion capture: markers attached to a performer’s body allow skeletal motion to be extracted. Image courtesy of Motion Analysis Corp.
图 16.21。光学动作捕捉:附在表演者身上的标记可以提取骨骼运动。图片由 Motion Analysis Corp 提供。
With several sensors or markers attached to a performer’s body, a set of time-dependent 3D positions of some collection of points can be recorded. These tracking locations are commonly chosen near joints, but, of course, they still lie on the skin surface and not at points where actual bones meet. Therefore, some additional care and a bit of extra processing is necessary to convert recorded positions into those of the physical skeleton joints. For example, putting two markers on opposite sides of the elbow or ankle allows the system to obtain better joint position by averaging locations of the two markers. Without such extra care, very noticeable artifacts can appear due to offset joint positions as well as inherent noise and insufficient measurement accuracy. Because of such physical inaccuracy, character limbs can, for example, lose contact with objects they are supposed to touch during walking or grasping, and problems like foot-sliding (skating) of the skeleton can occur. Most of these problems can be corrected by using inverse kinematics techniques which can explicitly force the required behavior of the limb’s end.
通过在演员的身体上安装多个传感器或标记,可以记录一些点集合的一组时间相关 3D 位置。这些跟踪位置通常选择在关节附近,但当然,它们仍然位于皮肤表面,而不是实际骨骼相接的点。因此,需要一些额外的注意和一些额外的处理才能将记录的位置转换为物理骨骼关节的位置。例如,将两个标记放在肘部或脚踝的相对两侧,系统可以通过平均两个标记的位置来获得更好的关节位置。如果没有这样的额外注意,由于关节位置偏移以及固有噪声和测量精度不足,可能会出现非常明显的伪影。例如,由于运动过程中的物理不准确性,角色肢体可能会在行走或抓握过程中与它们应该接触的物体失去接触,从而可能出现骨骼脚滑动(滑行)等问题。大多数这些问题可以通过使用逆运动学技术来纠正,该技术可以明确强制肢体末端所需的行为。
Recovered joint positions can now be directly applied to the skeleton of a computer-generated character. This procedure assumes that the physical dimensions of the character are identical to those of the performer. Retargeting recorded motion to a different character and, more generally, editing MC data, requires significant care to satisfy necessary constraints (such as maintaining feet on the ground or not allowing an elbow to bend backwards) and preserve an overall natural appearance of the modified motion. Generally, the greater the desired change from the original, the less likely it will be possible to maintain the quality of the result. An interesting approach to the problem is to record a large collection of motions and stitch together short clips from this library to obtain desired movement. Although this topic is currently a very active research area, limited ability to adjust the recorded motion to the animator’s needs remains one of the main disadvantages of motion capture technique.
现在可以将恢复的关节位置直接应用于计算机生成角色的骨架。此过程假设角色的物理尺寸与表演者的物理尺寸相同。将记录的动作重新定位到不同的角色,更一般地说,编辑 MC 数据,需要非常小心地满足必要的约束条件(例如保持双脚着地或不允许肘部向后弯曲)并保持修改后的动作的整体自然外观。通常,与原始动作相比,所需的变化越大,保持结果质量的可能性就越小。解决这个问题的一个有趣方法是记录大量动作,并将该库中的短片拼接在一起以获得所需的动作。虽然这个主题目前是一个非常活跃的研究领域,但根据动画师的需求调整记录的动作的能力有限仍然是运动捕捉技术的主要缺点之一。
The world around us is governed by physical laws, many of which can be formalized as sets of partial or, in some simpler cases, ordinary differential equations. One of the original applications of computers was (and remains) solving such equations. It is therefore only natural to attempt to use numerical techniques developed over the past several decades to obtain realistic motion for computer animation.
我们周围的世界受物理定律的支配,其中许多定律可以形式化为偏微分方程组,或者在一些更简单的情况下,常微分方程组。计算机最初的应用之一就是(现在仍然是)求解此类方程。因此,尝试使用过去几十年开发的数值技术来获得计算机动画的逼真运动是很自然的。
Because of its relative complexity and significant cost, physics-based animation is most commonly used in situations when other techniques are either unavailable or do not produce sufficiently realistic results. Prime examples include animation of fluids (which includes many gaseous phase phenomena described by the same equations: smoke, clouds, fire, etc.), cloth simulation (an example is shown in Figure 16.22), rigid body motion, and accurate deformation of elastic objects. Governing equations and details of commonly used numerical approaches are different in each of these cases, but many fundamental ideas and difficulties remain applicable across applications. Many methods for numerically solving ODEs and PDEs exist, but discussing them in detail is far beyond the scope of this book. To give the reader a flavor of physics-based techniques and some of the issues involved, we will briefly mention here only the finite difference approach, one of the conceptually simplest and most popular families of algorithms, which has been applied to most, if not all, differential equations encountered in animation.
由于其相对复杂性和显著成本,基于物理的动画最常用于其他技术不可用或不能产生足够逼真效果的情况。主要的例子包括流体动画(包括许多用相同方程描述的气相现象:烟、云、火等)、布料模拟(图 16.22中显示了一个示例)、刚体运动和弹性物体的精确变形。在每种情况下,控制方程和常用数值方法的细节都不同,但许多基本思想和难点在各种应用中仍然适用。存在许多用于数值求解 ODE 和 PDE 的方法,但详细讨论它们远远超出了本书的范围。为了让读者了解基于物理的技术和其中涉及的一些问题,我们将在这里简要提到有限差分方法 - 概念上最简单、最流行的算法系列之一,已应用于动画中遇到的大多数(如果不是全部)微分方程。
Figure 16.22. Realistic cloth simulation is often performed with physics-based methods. In this example, forces are due to collisions and gravity.
图 16.22.逼真的布料模拟通常采用基于物理的方法。在此示例中,力是由碰撞和重力引起的。
The key idea of this approach is to replace a differential equation with its discrete analog–a difference equation. To do this, the continuous domain of interest is represented by a finite set of points at which the solution will be computed. In the simplest case, these are defined on a uniform rectangular grid as shown in Figure 16.23. Every derivative present in the original ODE or PDE is then replaced by its approximation through function values at grid points. One way of doing this is to subtract the function value at a given point from the function value for its neighboring point on the grid:
这种方法的关键思想是用离散模拟——差分方程来代替微分方程。为此,感兴趣的连续域由一组有限的点表示,将在这些点上计算解。在最简单的情况下,这些点定义在均匀的矩形网格上,如图 16.23所示。然后,原始 ODE 或 PDE 中存在的每个导数都由其在网格点处通过函数值的近似值替换。一种方法是从网格上相邻点的函数值中减去给定点处的函数值:
$$\frac{df(t)}{dt} \approx \frac{f(t+\Delta t) - f(t)}{\Delta t} \quad \text{or} \quad \frac{\partial f(x,t)}{\partial x} \approx \frac{f(x+\Delta x,\, t) - f(x,t)}{\Delta x}. \tag{16.2}$$
Figure 16.23. Two possible difference schemes for an equation involving derivatives ∂f/∂x and ∂f/∂t. (Left) An explicit scheme expresses unknown values (open circles) only through known values at the current (orange circles) and possibly past (blue circles) time; (Right) Implicit schemes mix known and unknown values in a single equation making it necessary to solve all such equations as a system. For both schemes, information about values on the right boundary is needed to close the process.
图 16.23。涉及导数 ∂ f /∂ x和 ∂ f /∂ t的方程的两种可能的差分方案。(左)显式方案仅通过当前(橙色圆圈)和可能过去(蓝色圆圈)时间的已知值来表示未知值(空心圆圈);(右)隐式方案将已知值和未知值混合在一个方程中,因此需要将所有此类方程作为一个系统求解。对于这两种方案,都需要有关右边界上值的信息来关闭该过程。
These expressions are, of course, not the only option. One can, for example, use f(t − Δt) instead of f(t) above and divide by 2Δt, which gives a difference centered at t. For an equation containing a time derivative, it is now possible to propagate values of an unknown function forward in time in a sequence of Δt-size steps by solving the system of difference equations (one at each spatial location) for the unknown f(t + Δt). Some initial conditions, i.e., values of the unknown function at t = 0, are necessary to start the process. Other information, such as values on the boundary of the domain, might also be required depending on the specific problem.
当然,这些表达式并不是唯一的方法。例如,可以使用f ( t − Δ t ) 代替上面的f ( t ),然后除以 2Δ t 。对于包含时间导数的方程,现在可以通过求解未知f ( t + Δ t ) 的差分方程组(每个空间位置一个),以 Δ t大小的步骤序列将未知函数的值随时间向前传播。启动该过程需要一些初始条件,即t = 0 时未知函数的值。根据具体问题,可能还需要其他信息,例如域边界上的值。
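As a rough numerical illustration of the two difference quotients discussed here, the sketch below compares the one-sided (forward) and centered approximations of a derivative on a function whose derivative is known exactly. The test function (sin), the evaluation point, and all function names are assumptions chosen for this example only.

```python
import math

def forward_difference(f, t, dt):
    """Approximate df/dt at t as (f(t + dt) - f(t)) / dt."""
    return (f(t + dt) - f(t)) / dt

def central_difference(f, t, dt):
    """Approximate df/dt at t as (f(t + dt) - f(t - dt)) / (2 dt)."""
    return (f(t + dt) - f(t - dt)) / (2.0 * dt)

def errors(dt, t=1.0):
    """Absolute errors of both approximations for f = sin, whose derivative is cos."""
    exact = math.cos(t)
    return (abs(forward_difference(math.sin, t, dt) - exact),
            abs(central_difference(math.sin, t, dt) - exact))

if __name__ == "__main__":
    for dt in (0.1, 0.01, 0.001):
        fwd, ctr = errors(dt)
        print(f"dt={dt:<6} forward error={fwd:.2e}  central error={ctr:.2e}")
```

For a smooth function, the centered form is usually noticeably more accurate at the same step size, which is one reason the choice of difference expression matters.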
Computing f(t + Δt) is easy for so-called explicit schemes, in which all other values in the difference equation are taken at the current time, so that the only unknown, f(t + Δt), is expressed directly through these known values. Implicit schemes mix values at current and future times and might use, for example,
对于所谓的显式方案,当所有其他值都取自当前时间,并且相应差分方程f ( t + Δ t ) 中唯一的未知数通过这些已知值表示时,可以轻松计算f ( t + Δ t )。隐式方案混合了当前和未来时间的值,例如,可能会使用
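$$\frac{f(x+\Delta x,\, t+\Delta t) - f(x-\Delta x,\, t+\Delta t)}{2\Delta x}$$
as an approximation of $\partial f/\partial x$. In this case one has to solve a system of algebraic equations at each step.
近似为 $\partial f/\partial x$。在这种情况下,每一步都必须求解代数方程组。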
The choice of difference scheme can dramatically affect all aspects of the algorithm. The most obvious among them is accuracy. In the limit Δt → 0 or Δx → 0, expressions of the type in Equation (16.2) are exact, but for a finite step size some schemes approximate the derivative better than others. Stability of a difference scheme is related to how fast numerical errors, which are always present in practice, can grow with time. For stable schemes this growth is bounded, while for unstable ones it is exponential and can quickly overwhelm the solution one seeks (see Figure 16.24). It is important to realize that while some inaccuracy in the solution is tolerable (and, in fact, the accuracy demanded in physics and engineering is rarely needed for animation), an unstable result is completely meaningless, and one should avoid using unstable schemes. Generally, explicit schemes are either unstable or can become unstable at larger step sizes, while implicit ones are unconditionally stable. Implicit schemes allow a greater step size (and, therefore, fewer steps), which is why they are popular despite the need to solve a system of algebraic equations at each step. Explicit schemes are attractive because of their simplicity, provided their stability conditions can be satisfied. Developing a good difference scheme and a corresponding algorithm for a specific problem is not easy, and for most standard situations it is advisable to use an existing method. Ample literature discussing the details of these techniques is available.
差分格式的选择会极大地影响算法的各个方面。其中最明显的是精度。在极限 Δ t → 0 或 Δ x → 0 时,方程 (16.2) 中的表达式是精确的,但对于有限步长,某些格式比其他格式更能近似导数。差分格式的稳定性与数值误差(在实践中始终存在)随时间增长的速度有关。对于稳定格式,这种增长是有界的,而对于不稳定格式,这种增长是指数级的,并且可能很快压倒人们寻求的解决方案(参见图 16.24 )。重要的是要认识到,虽然解决方案中的一些不准确性是可以容忍的(事实上,物理和工程所要求的精度很少用于动画),但不稳定的结果完全没有意义,应该避免使用不稳定的格式。通常,显式格式要么不稳定,要么在较大的步长下会变得不稳定,而隐式格式则是无条件稳定的。隐式方案允许更大的步长(因此,步数更少),这就是为什么它们很受欢迎,尽管每一步都需要求解代数方程组。如果能够满足显式方案的稳定性条件,则显式方案因其简单性而具有吸引力。为特定问题开发一个好的差分方案和相应的算法并不容易,对于大多数标准情况,建议使用现有方法。有大量文献讨论这些技术的细节。
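The stability contrast can be seen on a very small model problem. The sketch below assumes the decay equation df/dt = −k f, which is not taken from the text, and steps it with the simplest explicit and implicit schemes (forward and backward Euler). For this scalar linear equation, the implicit "system" collapses to a single equation that can be solved in closed form.

```python
import math

def explicit_euler(f0, k, dt, steps):
    """Explicit scheme: f(t + dt) = f(t) + dt * (-k * f(t))."""
    f = f0
    for _ in range(steps):
        f = f + dt * (-k * f)
    return f

def implicit_euler(f0, k, dt, steps):
    """Implicit scheme: f(t + dt) = f(t) + dt * (-k * f(t + dt)),
    solved for the unknown as f(t + dt) = f(t) / (1 + k * dt)."""
    f = f0
    for _ in range(steps):
        f = f / (1.0 + k * dt)
    return f

if __name__ == "__main__":
    k = 10.0
    for dt in (0.01, 0.3):
        steps = 20
        exact = math.exp(-k * dt * steps)
        print(f"dt={dt}: exact={exact:.3e} "
              f"explicit={explicit_euler(1.0, k, dt, steps):.3e} "
              f"implicit={implicit_euler(1.0, k, dt, steps):.3e}")
```

With the small step both schemes track the decaying solution. With the large step (k Δt > 2) the explicit result oscillates and grows without bound, while the implicit one still decays, although it is not especially accurate.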
Figure 16.24. An unstable solution might follow the exact one initially, but can deviate arbitrarily far from it with time. Accuracy of a stable solution might still be insufficient for a specific application.
图 16.24。不稳定解决方案最初可能遵循精确解决方案,但随着时间的推移可能会偏离精确解决方案。稳定解决方案的精度可能仍然不足以满足特定应用的要求。
One should remember that, in many cases, just computing all necessary terms in the equation is a difficult and time-consuming task on its own. In rigid body or cloth simulation, for example, most of the forces acting on the system are due to collisions among objects. At each step during animation, one therefore has to solve a purely geometric, but very nontrivial, problem of collision detection. In such conditions, schemes which require fewer evaluations of such forces might provide significant computational savings.
需要记住的是,在许多情况下,仅计算方程中的所有必要项本身就是一项困难且耗时的任务。例如,在刚体或布料模拟中,作用于系统的大部分力都是由于物体之间的碰撞而产生的。因此,在动画的每个步骤中,都必须解决一个纯几何但非常不平凡的碰撞检测问题。在这种情况下,需要较少评估此类力的方案可能会节省大量计算量。
Although solving appropriate time-dependent equations gives very realistic motion, this approach has its limitations. First of all, it is very hard to control the result of physics-based animation. Fundamental mathematical properties of these equations state that once the initial conditions are set, the solution is uniquely defined. This does not leave much room for animator input and, if the result is not satisfactory for some reason, one has only a few options. They are mostly limited to adjusting the initial conditions used, changing physical properties of the system, or even modifying the equations themselves by introducing artificial terms intended to "drive" the solution in the direction the animator wants. Making such changes requires significant skill as well as an understanding of the underlying physics and, ideally, numerical methods. Without this knowledge, the realism provided by physics-based animation can be destroyed or severe numerical problems might appear.
尽管求解适当的时间相关方程的结果可以产生非常逼真的运动,但这种方法也有其局限性。首先,很难控制基于物理的动画的结果。这些方程的基本数学性质表明,一旦设置了初始条件,解决方案就被唯一定义。这没有为动画师的输入留下太多空间,如果由于某种原因结果不令人满意,则只有少数选择。它们大多仅限于调整使用的初始条件,更改系统的物理属性,甚至通过引入旨在“推动”解决方案朝着动画师想要的方向发展的人工术语来修改方程本身。进行此类更改需要相当的技能以及对基础物理学和理想情况下的数值方法的理解。如果没有这些知识,基于物理的动画所提供的真实感可能会被破坏,或者可能会出现严重的数值问题。
Imagine that one could write (and implement on a computer) a mathematical function which outputs precisely the desired motion given some animator guidance. Physics-based techniques outlined above can be treated as a special case of such an approach when the “function” involved is the procedure to solve a particular differential equation and “guidance” is the set of initial and boundary conditions, extra equation terms, etc.
想象一下,我们可以编写(并在计算机上实现)一个数学函数,在动画师的指导下,该函数可以精确输出所需的动作。当所涉及的“函数”是求解特定微分方程的过程,而“指导”是一组初始条件和边界条件、额外的方程项等时,上述基于物理的技术可以视为这种方法的一个特例。
However, if we are only concerned with the final result, we do not have to follow a physics-based approach. For example, a simple constant-amplitude wave on the surface of a lake can be directly created by applying the function f(x, t) = A cos(ωt − k · x + ϕ) with constant frequency ω, wave vector k, and phase ϕ to get the displacement at the 2D point x at time t. A collection of such waves with random phases and appropriately chosen amplitudes, frequencies, and wave vectors can result in a very realistic animation of the surface of water without explicitly solving any fluid dynamics equations. It turns out that other rather simple mathematical functions can also create very interesting patterns or objects. Several such functions, most based on lattice noises, have been described in Section 11.5. Adding time dependence to these functions allows us to animate certain complex phenomena much more easily and cheaply than with physics-based techniques while maintaining very high visual quality of the results. If noise(x) is the underlying pattern-generating function, one can create a time-dependent variant of it by moving the argument position through the lattice. The simplest case is motion with constant speed: timenoise(x, t) = noise(x + vt), but more complex motion through the lattice is, of course, also possible and, in fact, more common. One such path, a spiral, is shown in Figure 16.25. Another approach is to animate the parameters used to generate the noise function. This is especially appropriate if the appearance changes significantly with time, for example a cloud becoming more turbulent. In this way one can animate the dynamic process of cloud formation using the function that generates static clouds.
但是,如果我们只关心最终结果,则不必遵循基于物理的方法。例如,可以通过应用函数f ( x ,t ) = A cos( ωt − kx + ϕ )(其频率为恒定的ω 、波矢为k和相位为ϕ )来直接创建湖面上的简单恒定振幅波,以得到时间t时二维点x处的位移。具有随机相位和适当选择的振幅、频率和波矢的集合可以产生非常逼真的水面动画,而无需明确求解任何流体动力学方程。事实证明,其他相当简单的数学函数也可以创建非常有趣的图案或对象。第 11.5 节中描述了几个这样的函数,其中大多数基于晶格噪声。在这些函数中添加时间依赖性使我们能够比使用基于物理的技术更轻松、更便宜地为某些复杂现象制作动画,同时保持结果的非常高的视觉质量。如果噪声( x ) 是底层的模式生成函数,则可以通过在晶格中移动自变量位置来创建它的时间相关变体。最简单的情况是恒速运动:时间噪声( x ,t ) =噪声( x + vt ),但当然,通过晶格的更复杂的运动也是可能的,事实上,更常见。 图 16.25显示了一条这样的路径,即螺旋线。另一种方法是将用于生成噪声函数的参数制作成动画。如果外观随时间发生显著变化(例如,云变得更加湍急),这种方法尤其适用。这样,就可以使用生成静态云的函数将云的动态形成过程制作成动画。
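A minimal sketch of both ideas follows: a water height field built as a sum of cosine waves with random phases, and a time-dependent wrapper that moves the argument of an arbitrary noise function with constant velocity. The amplitude rule, the deep-water relation ω = sqrt(g|k|) used to pick frequencies, and all names are assumptions made for this example and are not prescribed by the text.

```python
import math
import random

def make_waves(count, seed=0):
    """Build `count` waves with random directions, wavelengths, and phases."""
    rng = random.Random(seed)
    waves = []
    for _ in range(count):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        wavelength = rng.uniform(1.0, 8.0)
        k = 2.0 * math.pi / wavelength           # magnitude of the wave vector
        waves.append({
            "A": 0.05 * wavelength,              # assumed: longer waves are taller
            "omega": math.sqrt(9.81 * k),        # assumed: deep-water dispersion
            "k": (k * math.cos(angle), k * math.sin(angle)),
            "phi": rng.uniform(0.0, 2.0 * math.pi),
        })
    return waves

def height(waves, x, y, t):
    """Sum of A cos(omega t - k . x + phi) over all waves at point (x, y)."""
    return sum(
        w["A"] * math.cos(w["omega"] * t - (w["k"][0] * x + w["k"][1] * y) + w["phi"])
        for w in waves
    )

def time_noise(noise, x, v, t):
    """timenoise(x, t) = noise(x + v t); x and v are tuples of equal length."""
    return noise(tuple(xi + vi * t for xi, vi in zip(x, v)))

if __name__ == "__main__":
    waves = make_waves(8, seed=1)
    for frame in range(3):
        print(frame, round(height(waves, 0.3, 0.7, frame / 24.0), 4))
```

Any pattern-generating function can be passed as `noise`, for example a lattice noise implementation from Section 11.5. A constant velocity is the simplest path through the lattice.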
Figure 16.25. A path through the cube defining procedural noise is traversed to animate the resulting pattern.
图 16.25.定义程序噪声的立方体路径被遍历,以使生成的图案动起来。
For some procedural techniques, time dependence is a more integral component. The simplest cellular automata operate on a 2D rectangular grid where a binary value is stored at each location (cell). To create a time-varying pattern, some user-provided rules for modifying these values are repeatedly applied. Rules typically involve some set of conditions on the current value and that of the cell's neighbors. For example, the rules of the popular 2D Game of Life cellular automaton invented in 1970 by British mathematician John Conway are the following:
对于某些程序技术来说,时间依赖性是一个更不可或缺的组成部分。最简单的细胞自动机在二维矩形网格上运行,每个位置(细胞)都存储一个二进制值。为了创建随时间变化的模式,需要反复应用一些用户提供的修改这些值的规则。规则通常涉及当前值和细胞邻居值的一组条件。例如,英国数学家约翰·康威于 1970 年发明的流行二维生命游戏细胞自动机的规则如下:
A dead cell (i.e., binary value at a given location is 0) with exactly three live neighbors becomes a live cell (i.e., its value set to 1).
一个死细胞(即,在给定位置的二进制值为 0)如果恰好有三个活邻居,则变为活细胞(即,其值设置为 1)。
A live cell with two or three live neighbors stays alive.
具有两个或三个活邻居的活细胞仍可存活。
In all other cases, a cell dies or remains dead.
在所有其他情况下,细胞都会死亡或保持死亡状态。
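The three rules above can be sketched as one update function. This version assumes a finite grid stored as a list of lists of 0/1 values, counts the eight surrounding cells as neighbors, and treats everything outside the grid as dead. The boundary handling is a choice made for this example, not something the rules specify.

```python
def life_step(grid):
    """Apply the Game of Life rules once to every cell and return the new grid."""
    rows, cols = len(grid), len(grid[0])

    def live_neighbors(r, c):
        total = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) == (0, 0):
                    continue
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:  # outside the grid counts as dead
                    total += grid[rr][cc]
        return total

    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = live_neighbors(r, c)
            if grid[r][c] == 0 and n == 3:
                new_grid[r][c] = 1      # dead cell with exactly three live neighbors
            elif grid[r][c] == 1 and n in (2, 3):
                new_grid[r][c] = 1      # live cell with two or three live neighbors
            # in all other cases the cell dies or remains dead (stays 0)
    return new_grid

if __name__ == "__main__":
    blinker = [[0, 0, 0],
               [1, 1, 1],
               [0, 0, 0]]
    for row in life_step(blinker):
        print(row)
```

All cells are updated from the old grid into a fresh one, so the rules are applied to every location simultaneously rather than in place.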
Once the rules are applied to all grid locations, a new pattern is created and a new evolution cycle can be started. Three sample snapshots of the live cell distribution at different times are shown in Figure 16.26. More sophisticated automata simultaneously operate on several 3D grids of possibly floating point values and can be used for modeling dynamics of clouds and other gaseous phenomena or biological systems for which this apparatus was originally invented (note the terminology). Surprising pattern complexity can arise from just a few well-chosen rules, but how to write such rules to create the desired behavior is often not obvious. This is a common problem with procedural techniques: there is only limited, if any, guidance on how to create new procedures or even adjust parameters of existing ones. Therefore, a lot of tweaking and learning by trial-and-error (“by experience”) is usually needed to unlock the full potential of procedural methods.
一旦将规则应用于所有网格位置,就会创建一个新模式并开始新的进化周期。图 16.26显示了不同时间的活细胞分布的三个示例快照。更复杂的自动机可以同时在几个可能是浮点值的 3D 网格上运行,并且可以用于模拟云和其他气体现象或生物系统的动态,这种装置最初就是为此而发明的(请注意术语)。令人惊讶的模式复杂性可能仅仅来自几个精心挑选的规则,但是如何编写这样的规则来创建所需的行为通常并不明显。这是程序技术的一个常见问题:关于如何创建新程序或甚至调整现有程序参数的指导非常有限(如果有的话)。因此,通常需要通过反复试验(“通过经验”)进行大量的调整和学习,才能充分发挥程序方法的潜力。
Figure 16.26. Several (non-consecutive) stages in the evolution of a Game of Life automaton. Live cells are shown in black. Stable objects, oscillators, traveling patterns, and many other interesting constructions can result from the application of very simple rules. Figure created using a program by Alan Hensel.
图 16.26。生命游戏自动机演化的几个(非连续)阶段。活细胞以黑色显示。应用非常简单的规则可以产生稳定的物体、振荡器、行进模式和许多其他有趣的构造。该图使用 Alan Hensel 的程序创建。
Another interesting approach that was also originally developed to describe biological objects is the technique called L-systems (named after their original inventor, Aristid Lindenmayer). This approach is based on grammars, or sets of recursive rules for rewriting strings of symbols. There are two types of symbols, terminals and nonterminals. Terminal symbols stand for elements of something we want to represent with a grammar. Depending on their meaning, grammars can describe the structure of trees and bushes, buildings and whole cities, or programming and natural languages. In animation, L-systems are most popular for representing plants, and the corresponding terminals are instructions to the geometric modeling system: put a leaf (or a branch) at the current position (we will use the symbol @ and just draw a circle), move the current position forward by some number of units (symbol f), turn the current direction 60 degrees around the world Z-axis (symbol +), push (symbol [) or pop (symbol ]) the current position/orientation, etc. Auxiliary nonterminal symbols (denoted by capital letters) have only semantic rather than any direct meaning. They are intended to be eventually rewritten through terminals. We start from the special nonterminal start symbol S and keep applying grammar rules to the current string in parallel, i.e., we replace all nonterminals currently present to get the new string, until we end up with a string containing only terminals, so that no more substitution is possible. This string of modeling instructions is then used to output the actual geometry. For example, a set of rules (productions)
另一种有趣的方法最初也是为了描述生物对象而开发的,称为L 系统(以其最初发明者 Aristid Lindenmayer 的名字命名)。这种方法基于语法或一组用于重写符号字符串的递归规则。有两种类型的符号:终端符号代表我们想要用语法表示的事物的元素。根据其含义,语法可以描述树木和灌木、建筑物和整个城市的结构,或编程和自然语言。在动画中,L 系统最常用于表示植物,相应的终端是几何建模系统的指令:将一片叶子(或一个树枝)放在当前位置——我们将使用符号 @ 并绘制一个圆圈,将当前位置向前移动一定数量的单位(符号f ),将当前方向绕世界Z轴旋转 60 度(符号 +),推送(符号 [)或弹出(符号 ])当前位置/方向等。辅助非终端符号(用大写字母表示)仅具有语义,而没有任何直接含义。它们最终将通过终端重写。我们从特殊的非终结符起始符号S开始,并继续并行地将语法规则应用于当前字符串,即替换当前存在的所有非终结符以获取新字符串,直到我们最终得到一个仅包含终结符的字符串,因此无法再进行替换。然后使用此建模指令字符串输出实际几何图形。例如,一组规则(产生式)
might result in the following sequence of rewriting steps demonstrated in Figure 16.27:
可能导致如图 16.27所示的重写步骤序列:
Figure 16.27. Consecutive derivation steps using a simple L-system. Capital letters denote nonterminals and illustrate positions at which corresponding nonterminal will be expanded. They are not part of the actual output.
图 16.27.使用简单 L 系统的连续推导步骤。大写字母表示非终结符,并说明相应非终结符将扩展的位置。它们不是实际输出的一部分。
As shown above, there are typically many different productions for the same nonterminal allowing the generation of many different objects with the same grammar. The choice of which rule to apply can depend on which symbols are located next to the one being replaced (context-sensitivity) or can be performed at random with some assigned probability for each rule (stochastic L-systems). More complex rules can model interaction with the environment, such as pruning to a particular shape, and parameters can be associated with symbols to control geometric commands issued.
如上所示,对于同一个非终结符,通常有许多不同的产生式,从而允许生成具有相同语法的许多不同对象。选择应用哪条规则可能取决于哪些符号位于被替换符号的旁边(上下文敏感性),或者可以随机执行,并为每条规则分配一些概率(随机 L 系统)。更复杂的规则可以模拟与环境的交互,例如修剪到特定形状,并且可以将参数与符号相关联以控制发出的几何命令。
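The parallel rewriting process and the stochastic choice among productions can be sketched as follows. The productions in `RULES`, their probabilities, and the function names are illustrative assumptions for this example. They are not claimed to be the rule set behind Figure 16.27. Terminals (f, +, @, and the brackets) are simply copied through unchanged.

```python
import random

# Hypothetical stochastic productions: nonterminal -> list of (probability, body).
RULES = {
    "S": [(1.0, "A")],
    "A": [(0.5, "[+B]fA"), (0.5, "B")],
    "B": [(0.5, "fB"), (0.5, "f@")],
}

def rewrite(string, rules, rng):
    """Replace every nonterminal in `string` in parallel; copy terminals as-is."""
    out = []
    for symbol in string:
        if symbol in rules:
            weights = [p for p, _ in rules[symbol]]
            bodies = [body for _, body in rules[symbol]]
            out.append(rng.choices(bodies, weights=weights)[0])
        else:
            out.append(symbol)
    return "".join(out)

def derive(rules, start="S", seed=0, max_steps=50):
    """Return all intermediate strings, stopping when only terminals remain
    or after `max_steps` rewriting steps (a safety limit for this sketch)."""
    rng = random.Random(seed)
    steps = [start]
    while any(s in rules for s in steps[-1]) and len(steps) <= max_steps:
        steps.append(rewrite(steps[-1], rules, rng))
    return steps

if __name__ == "__main__":
    for i, s in enumerate(derive(RULES, seed=1)):
        print(i, s)
```

Changing the seed gives a different object from the same grammar. Each intermediate string in the returned list is itself a valid sequence of modeling instructions once its nonterminals are ignored.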
L-systems already capture plant topology changes with time: each intermediate string obtained in the rewriting process can be interpreted as a “younger” version of the plant (see Figure 16.27). For more significant changes, different productions can be in effect at different times allowing the structure of the plant to change significantly as it grows. A young tree, for example, produces a lot of new branches, while an older one branches only moderately.
L 系统已经捕捉到了植物拓扑结构随时间的变化:重写过程中获得的每个中间字符串都可以解释为植物的“年轻”版本(见图16.27 )。对于更显著的变化,不同的产生可以在不同的时间生效,从而使植物的结构在生长过程中发生显著变化。例如,一棵幼树会长出许多新枝,而一棵老树只会适度地分枝。
Very realistic plant models have been created with L-systems. However, as with most procedural techniques, one needs some experience to meaningfully apply existing L-systems, and writing new grammars to capture some desired effect is certainly not easy.
使用 L 系统已经创建了非常逼真的植物模型。然而,与大多数程序技术一样,人们需要一些经验才能有意义地应用现有的 L 系统,而编写新语法来捕捉一些期望的效果肯定并不容易。
To animate multiple objects one can, of course, simply apply standard techniques described in this chapter so far to each of them. This works reasonably well for a moderate number of independent objects whose desired motion is known in advance. However, in many cases, some kind of coordinated action in a dynamic environment is necessary. If only a few objects are involved, the animator can use an artificial intelligence (AI)-based system to automatically determine immediate tasks for each object based on some high-level goal, plan necessary motion, and execute the plan. Many modern games use such autonomous objects to create smart monsters or player’s collaborators.
当然,要为多个对象制作动画,只需将本章迄今为止描述的标准技术应用于每个对象即可。对于数量适中的独立对象(这些对象的预期运动是预先知道的),这种方法效果很好。然而,在许多情况下,动态环境中的某种协调动作是必要的。如果只涉及几个对象,动画师可以使用基于人工智能 (AI) 的系统根据某些高级目标自动确定每个对象的即时任务,规划必要的运动并执行该计划。许多现代游戏使用这种自主对象来创建智能怪物或玩家的合作者。
Interestingly, as the number of objects in a group grows from just a few to several dozens, hundreds, and thousands, individual members of a group need only very limited "intelligence" in order for the group as a whole to exhibit what looks like coordinated goal-driven motion. It turns out that this flocking is emergent behavior which can arise as a result of limited interaction of group members with just a few of their closest neighbors (Reynolds, 1987). Flocking should be familiar to anyone who has observed the fascinatingly synchronized motion of a flock of birds or a school of fish. The technique can also be used to control groups of animals moving over terrain or even a human crowd.
有趣的是,随着群体中物体的数量从几个增加到几十个、几百个和几千个,群体中的个体成员必须只具有非常有限的“智能”,才能使整个群体表现出看起来像协调的目标驱动运动。事实证明,这群集行为是一种突发行为,可能由于群体成员与少数几个最亲密的邻居之间的有限互动而产生(Reynolds,1987)。任何观察过鸟群或鱼群令人着迷的同步运动的人都应该熟悉群集行为。该技术还可用于控制在地形上移动的动物群,甚至人类群体。
At any given moment, the motion of a member of a group, often called a boid when applied to flocks, is the result of balancing several often contradictory tendencies, each of which suggests its own velocity vector (see Figure 16.28). First, there are external physical forces F acting on the boid, such as gravity or wind. The new velocity due to those forces can be computed directly through Newton's law as
在任何给定时刻,群体中一个成员的运动(在用于鸟群时通常称为“个体”)是平衡几种经常相互矛盾的趋势的结果,每种趋势都表明了其自己的速度矢量(见图16.28 )。首先,有外部物理力F作用于个体,例如重力或风。这些力产生的新速度可以直接通过牛顿定律计算为
$$\mathbf{v}_{new}^{physics} = \mathbf{v}_{old} + \mathbf{F}\,\Delta t / m.$$
Figure 16.28. (Left) Individual flock member (boid) can experience several urges of different importance (shown by line thickness) which have to be negotiated into a single velocity vector. A boid is aware of only its limited neighborhood (circle). (Right) Boid control is commonly implemented as three separate modules.
图 16.28。 (左)个体群体成员(群体)可以感受到不同重要性的几种冲动(以线条粗细表示),这些冲动必须被协调为一个速度矢量。群体只知道其有限的邻域(圆圈)。(右)群体控制通常作为三个独立模块实现。
Second, a boid should react to the global environment and to the behavior of other group members. Collision avoidance is one of the main results of such interaction. It is crucial for flocking that each group member has only a limited field of view and is therefore aware only of things happening within some neighborhood of its current position. To avoid objects in the environment, the simplest, if imperfect, strategy is to set up a limited-extent repulsive force field around each such object. This will create a second desired velocity vector $\mathbf{v}_{new}^{col\text{-}avoid}$, also given by Newton's law. Interaction with other group members can be modeled by simultaneously applying different steering behaviors, resulting in several additional desired velocity vectors $\mathbf{v}_{new}^{steer}$. Moving away from neighbors to avoid crowding, steering toward flock mates to ensure flock cohesion, and adjusting a boid's speed to align with the average heading of neighbors are the most common. Finally, some additional desired velocity vectors $\mathbf{v}_{new}^{goal}$ are usually applied to achieve needed global goals. These can be vectors along some path in space, following some specific designated leader of the flock, or simply representing the migratory urge of a flock member.
其次,群体应该对全局环境和其他群体成员的行为做出反应。避免碰撞是这种互动的主要结果之一。对于群体而言,至关重要的是,每个群体成员只有有限的视野,因此只能知道其当前位置附近发生的事情。为了避开环境中的物体,最简单的(尽管不完美)策略是围绕每个这样的物体设置一个有限范围的排斥力场。这将创建第二个期望速度矢量 $\mathbf{v}_{new}^{col\text{-}avoid}$,也是由牛顿定律给出的。与其他小组成员的互动可以通过同时应用不同的转向行为来建模,从而产生几个额外的期望速度矢量 $\mathbf{v}_{new}^{steer}$。最常见的做法是远离邻居以避免拥挤,转向同伴以确保群体凝聚力,以及调整个体的速度以与邻居的平均航向保持一致。最后,一些额外的期望速度矢量 $\mathbf{v}_{new}^{goal}$ 通常用于实现所需的全球目标。这些可以是沿着空间某条路径的矢量,跟随羊群中某个特定的指定领导者,或者仅仅代表羊群成员的迁徙冲动。
Once all $\mathbf{v}_{new}$ are determined, the final desired vector is negotiated based on priorities among them. Collision avoidance and velocity matching typically have higher priority. Instead of simple averaging of the desired velocity vectors, which can lead to cancellation of urges and unnatural "moving nowhere" behavior, an acceleration allocation strategy is used. Some fixed total amount of acceleration is made available for a boid, and fractions of it are given to each urge in order of priority. If the total available acceleration runs out, some lower-priority urges will have less effect on the motion or be completely ignored. The hope is that once the currently most important task (collision avoidance in most situations) is accomplished, other tasks can be taken care of in the near future. It is also important to respect some physical limitations of real objects, for example, by clamping excessive accelerations or speeds to some realistic values. Depending on the internal complexity of the flock member, the final stage of animation might be to turn the negotiated velocity vector into a specific set of parameters (bird's wing positions, orientation of a plane model in space, leg skeleton bone configuration) used to control a boid's motion. A diagram of a system implementing flocking is shown in Figure 16.28 (right).
一旦确定了所有v new ,就会根据它们之间的优先级协商最终的期望向量。碰撞避免和速度匹配通常具有更高的优先级。与可能导致取消冲动和不自然的“无处移动”行为的期望速度向量的简单平均不同,我们使用了加速度分配策略。为一个 boid 提供一些固定的总加速度,并将其中的部分按优先级顺序分配给每个冲动。如果总可用加速度用尽,一些优先级较低的冲动将对运动产生较小的影响或被完全忽略。希望一旦完成了当前最重要的任务(在大多数情况下是碰撞避免),其他任务就可以在不久的将来完成。尊重真实物体的一些物理限制也很重要,例如,将过高的加速度或速度限制在某些实际值内。根据群体成员的内部复杂性,动画的最后阶段可能是将协商的速度矢量转换为一组特定的参数(鸟的翅膀位置、空间中平面模型的方向、腿部骨骼配置),用于控制群体的运动。图 16.28 (右)显示了实现群体的系统图。
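One possible reading of the acceleration allocation strategy is sketched below in 2D. It assumes each urge has already been converted into a requested acceleration vector and that the list is ordered from highest to lowest priority. The function names, the budget measured by vector magnitude, and the speed clamp are assumptions of this example rather than details given in the text.

```python
import math

def allocate_acceleration(urges, max_accel):
    """Grant each urge's request in priority order until the budget runs out.

    urges: list of (ax, ay) requested accelerations, highest priority first.
    max_accel: total acceleration magnitude available to the boid this step.
    """
    total_x, total_y = 0.0, 0.0
    remaining = max_accel
    for ax, ay in urges:
        if remaining <= 0.0:
            break                      # lower-priority urges are ignored
        magnitude = math.hypot(ax, ay)
        if magnitude == 0.0:
            continue
        scale = min(1.0, remaining / magnitude)  # partial grant if budget is short
        total_x += ax * scale
        total_y += ay * scale
        remaining -= magnitude * scale
    return (total_x, total_y)

def step_velocity(velocity, accel, dt, max_speed):
    """Integrate velocity over dt and clamp the speed to a realistic maximum."""
    vx = velocity[0] + accel[0] * dt
    vy = velocity[1] + accel[1] * dt
    speed = math.hypot(vx, vy)
    if speed > max_speed:
        vx, vy = vx * max_speed / speed, vy * max_speed / speed
    return (vx, vy)

if __name__ == "__main__":
    # Priority order assumed here: collision avoidance, velocity matching, goal seeking.
    urges = [(3.0, 0.0), (0.0, 4.0), (-5.0, 0.0)]
    accel = allocate_acceleration(urges, max_accel=5.0)
    print(accel, step_velocity((0.0, 0.0), accel, 1.0, 2.5))
```

Two directly opposed urges average to zero, but under this scheme the higher-priority one is served first, which avoids the "moving nowhere" behavior.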
Figure 16.29. After being emitted by a directional source, particles collide with an object and then are blown down by a local wind field once they clear the obstacle.
图 16.29.粒子由定向源发射后与物体发生碰撞,然后在越过障碍物后被局部风场吹落。
A much simpler, but still very useful, version of group control is implemented by particle systems (Reeves, 1983). The number of particles in a system is typically much larger than the number of boids in a flock and can be in the tens or hundreds of thousands, or even more. Moreover, the exact number of particles can fluctuate during animation, with new particles being born and some of the old ones destroyed at each step. Particles are typically completely independent of each other, ignoring their neighbors and interacting with the environment only by experiencing external forces and collisions with objects, not through collision avoidance as was the case for flocks. At each step during animation, the system first creates new particles with some initial parameters, terminates old ones, and then computes the necessary forces and updates the velocities and positions of the remaining particles according to Newton's law.
一个更简单但仍然非常有用的组控制版本是通过以下方式实现的:粒子系统(Reeves,1983)。系统中的粒子数量通常比群体中的个体数量多得多,可以是数万、数十万,甚至更多。此外,在动画过程中,粒子的确切数量可能会波动,因为每一步都会产生新粒子,而一些旧粒子会被销毁。粒子通常彼此完全独立,忽略相邻粒子,并且仅通过受到外力和与物体的碰撞来与环境交互,而不是像群体那样通过避免碰撞来交互。在动画的每个步骤中,系统首先使用一些初始参数创建新粒子,终止旧粒子,然后根据牛顿定律计算必要的力并更新剩余粒子的速度和位置。
All parameters of a particle system (number of particles, particle life span, initial velocity and location of a particle, etc.) are usually under the direct control of the animator. Prime applications of particle systems include modeling fireworks, explosions, spraying liquids, smoke and fire, or other fuzzy objects and phenomena with no sharp boundaries. To achieve a realistic appearance, it is important to introduce some randomness to all parameters, for example, having a random number of particles born (and destroyed) at each step with their velocities generated according to some distribution. In addition to setting appropriate initial parameters, controlling the motion of a particle system is commonly done by creating a specific force pattern in space, for example, blowing a particle in a new direction once it reaches some specific location or adding a center of attraction. One should remember that for all their advantages, of which simplicity of implementation and ease of control are the prime ones, particle systems typically do not provide the level of realism characteristic of true physics-based simulation of the same phenomena.
粒子系统的所有参数(粒子数量、粒子寿命、初始速度和粒子位置等)通常由动画师直接控制。粒子系统的主要应用包括模拟烟花、爆炸、喷洒液体、烟雾和火焰,或其他没有明显边界的模糊物体和现象。为了实现逼真的外观,重要的是为所有参数引入一些随机性,例如,在每个步骤中产生(和销毁)随机数量的粒子,并根据某种分布生成它们的速度。除了设置适当的初始参数外,控制粒子系统的运动通常是通过在空间中创建特定的力模式来实现的——例如,一旦粒子到达某个特定位置,就将其吹向新的方向或添加引力中心。应该记住,尽管粒子系统具有所有优点,但实现简单和易于控制是主要优点,它们通常无法提供真实物理模拟相同现象的真实感。
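The per-step procedure (create new particles, terminate old ones, then apply forces and update the rest) can be sketched as below. The emitter at the origin, the Gaussian velocity distribution, the life span range, unit particle mass, and gravity as the only force are all assumptions chosen for this example.

```python
import random

def emit(rng, count_range=(5, 15)):
    """Create a random number of new particles with randomized initial velocity."""
    new_particles = []
    for _ in range(rng.randint(*count_range)):
        new_particles.append({
            "pos": [0.0, 0.0, 0.0],                       # assumed emitter location
            "vel": [rng.gauss(0.0, 0.5), rng.gauss(5.0, 1.0), rng.gauss(0.0, 0.5)],
            "life": rng.uniform(1.0, 2.0),                # remaining life span, seconds
        })
    return new_particles

def step(particles, dt, rng, gravity=(0.0, -9.81, 0.0)):
    """Advance the system by dt: birth, death, then force and motion update."""
    particles = particles + emit(rng)                     # 1. create new particles
    particles = [p for p in particles if p["life"] > 0.0] # 2. terminate old ones
    for p in particles:                                   # 3. Newton's law, unit mass
        for i in range(3):
            p["vel"][i] += gravity[i] * dt
            p["pos"][i] += p["vel"][i] * dt
        p["life"] -= dt
    return particles

if __name__ == "__main__":
    rng = random.Random(0)
    system = []
    for frame in range(40):
        system = step(system, 0.1, rng)
    print(len(system), "particles alive after 40 steps")
```

Particles never look at each other, so the cost per step grows only linearly with the particle count. A wind field or a center of attraction could be added as further terms next to `gravity`.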
In this chapter we have concentrated on techniques used in 3D animation. There also exists a rich set of algorithms to help with 2D animation production and post-processing of images created by computer graphics rendering systems. These include techniques for cleaning up scanned-in artist drawings, feature extraction, automatic 2D in-betweening, colorization, image warping, enhancement and compositing, and many others.
本章我们集中讨论了 3D 动画中使用的技术。此外,还有一组丰富的算法可用于帮助制作 2D 动画和对计算机图形渲染系统创建的图像进行后期处理。这些算法包括清理扫描的艺术家绘图、特征提取、自动 2D 中间、着色、图像扭曲、增强和合成等技术。
One of the most significant developments in the area of computer animation has been the increasing power and availability of sophisticated animation systems. While different in their specific set of features, internal structure, details of user interface, and price, most such systems include extensive support not only for animation, but also for modeling and rendering, turning them into complete production platforms. It is also common to use these systems to create still images. For example, many images for figures in this section were produced using Maya software generously donated by Alias.
计算机动画领域最重要的发展之一是复杂动画系统的功能和可用性不断增强。尽管它们的具体功能集、内部结构、用户界面细节和价格各不相同,但大多数此类系统不仅包括对动画的广泛支持,还包括对建模和渲染的支持,从而将它们变成了完整的制作平台。使用这些系统创建静态图像也很常见。例如,本节中的许多人物图像都是使用 Alias 慷慨捐赠的 Maya 软件制作的。
Large-scale animation production is an extremely complex process which typically involves a combined effort by dozens of people with different backgrounds spread across many departments or even companies. To better coordinate this activity, a certain production pipeline is established which starts with a story and character sketches, proceeds to record necessary sound, build models, and rig characters for animation. Once actual animation commences, it is common to go back and revise the original designs, models, and rigs to fix any discovered motion and appearance problems. Setting up lighting and material properties is then necessary, after which it is possible to start rendering. In most sufficiently complex projects, extensive postprocessing and compositing stages bring together images from different sources and finalize the product.
大型动画制作是一个极其复杂的过程,通常需要来自多个部门甚至公司的数十名具有不同背景的人共同努力。为了更好地协调这项活动,建立了一定的生产流程,从故事和角色草图开始,然后录制必要的声音、构建模型并为动画角色装配。一旦实际动画开始,通常会回过头来修改原始设计、模型和装配,以修复任何发现的运动和外观问题。然后需要设置照明和材质属性,之后就可以开始渲染了。在大多数足够复杂的项目中,大量的后期处理和合成阶段会将来自不同来源的图像汇集在一起并最终完成产品。
We conclude this chapter by reminding the reader that in the field of computer animation, any technical sophistication is secondary to a good story, expressive characters, and other artistic factors, most of which are hard or simply impossible to quantify. It is safe to say that Snow White and her seven dwarfs will always share the screen with green ogres and donkeys, and most of the audience will be much more interested in the characters and the story rather than in which, if any, computers (and in what exact way) helped to create them.
我们在本章的最后提醒读者,在计算机动画领域,任何技术复杂性都是次要的,而好的故事、富有表现力的角色和其他艺术因素才是最重要的,其中大多数因素很难或根本无法量化。可以肯定地说,白雪公主和她的七个小矮人将永远与绿色妖怪和驴子一起出现在屏幕上,大多数观众对角色和故事更感兴趣,而不是对哪些计算机(以及计算机以何种方式)帮助创造了它们更感兴趣。
Peter Willemsen
Throughout most of this book, the focus is on the fundamentals that underlie computer graphics rather than on any specifics relating to the APIs or hardware on which the algorithms may be implemented. This chapter takes a slightly different route and blends the details of using graphics hardware with some of the practical issues associated with programming that hardware. This chapter is designed to be an introductory guide to graphics hardware and could be used as the basis for a set of weekly labs that investigate graphics hardware.
本书的大部分内容都集中在计算机图形学的基础知识上,而不是与可以实现算法的 API 或硬件有关的任何细节。本章采用了略有不同的方式,将使用图形硬件的细节与与编程该硬件相关的一些实际问题融合在一起。本章旨在成为图形硬件的入门指南,可以用作一组研究图形硬件的每周实验室的基础。
Graphics hardware describes the hardware components necessary to quickly render 3D objects as pixels on your computer’s screen using specialized rasterization-based (and in some cases, ray-tracer–based) hardware architectures. The use of the term graphics hardware is meant to elicit a sense of the physical components necessary for performing a range of graphics computations. In other words, the hardware is the set of chipsets, transistors, buses, processors, and computing cores found on current video cards. As you will learn in this chapter, and eventually experience yourself, current graphics hardware is very good at processing descriptions of 3D objects and transforming those representations into the colored pixels that fill your monitor.
图形硬件描述了使用专门的基于光栅化(在某些情况下,基于光线跟踪器)的硬件架构将 3D 对象快速渲染为计算机屏幕上的像素所需的硬件组件。使用图形硬件这一术语是为了引出执行一系列图形计算所需的物理组件的感觉。换句话说,硬件是当前视频卡上的芯片组、晶体管、总线、处理器和计算核心的集合。正如您将在本章中学习并最终亲身体验到的那样,当前的图形硬件非常擅长处理 3D 对象的描述并将这些表示转换为填满显示器的彩色像素。
Real-Time Graphics: By real-time graphics, we generally mean that the graphics-related computations are being carried out fast enough that the results can be viewed immediately. Being able to conduct operations at 60 Hz or higher is considered real time. Once the frame rate (the rate at which the display is refreshed) drops below 15 Hz, the speed is considered more interactive than real-time, but this distinction is not critical. Because the computations need to be fast, the equations used to render the graphics are often approximations to what could be done if more time were available.
实时图形:实时图形通常是指图形相关计算的执行速度足够快,可以立即查看结果。能够以 60Hz 或更高的频率进行操作被认为是实时的。一旦刷新显示的时间(帧速率)低于 15Hz,速度就被认为比实时更具交互性,但这种区别并不重要。由于计算需要快速进行,因此用于渲染图形的方程式通常是如果有更多时间可以完成的近似值。
Graphics hardware has certainly changed very rapidly over the last decade. Newer graphics hardware provides more parallel processing capabilities, as well as better support for specialized rendering. One explanation for the fast pace is the video game industry and its economic momentum. Essentially what this means is that each new graphics card provides better performance and processing capabilities. As a result, video games appear more visually realistic. The processors on graphics hardware, often called GPUs, or Graphics Processing Units, are highly parallel and afford thousands of concurrent threads of execution. The hardware is designed for throughput which allows larger numbers of pixels and vertices to be processed in shorter amounts of time. All of this parallelism is good for graphics algorithms, but other work has benefited from the parallel hardware. In addition to video games, GPUs are used to accelerate physics computations, develop real-time ray tracing codes, solve Navier-Stokes related equations for fluid flow simulations, and develop faster codes for understanding the climate (Purcell, Buck, Mark, & Hanrahan, 2002; S. G. Parker et al., 2010; Harris, 2004). Several APIs and SDKs have been developed that afford more direct general purpose computation, such as OpenCL and NVIDIA’s CUDA. Hardware accelerated ray tracing APIs also exist to accelerate ray-object intersection (S. G. Parker et al., 2010). Similarly, the standard APIs that are used to program the graphics components of video games, such as OpenGL and DirectX, also allow mechanisms to leverage the graphics hardware’s parallel capabilities. Many of these APIs change as new hardware is developed to support more sophisticated computations.
在过去十年中,图形硬件确实发生了非常迅速的变化。较新的图形硬件提供了更多的并行处理能力,以及对专业渲染的更好支持。这种快速发展的一个解释是视频游戏行业及其经济发展势头。本质上,这意味着每张新显卡都提供了更好的性能和处理能力。因此,视频游戏看起来更加逼真。图形硬件上的处理器通常称为 GPU 或图形处理单元,它们高度并行,可支持数千个并发执行线程。该硬件专为吞吐量而设计,允许在更短的时间内处理大量像素和顶点。所有这些并行性都有利于图形算法,但其他工作也受益于并行硬件。除了视频游戏之外,GPU 还用于加速物理计算、开发实时光线追踪代码、求解流体流动模拟的 Navier-Stokes 相关方程以及开发更快的代码以了解气候(Purcell、Buck、Mark 和 Hanrahan,2002 年;SG Parker 等,2010 年;Harris,2004 年)。已经开发了多个 API 和 SDK,以提供更直接的通用计算,例如 OpenCL 和 NVIDIA 的 CUDA。还存在硬件加速光线追踪 API,以加速光线与物体的交叉(SG Parker 等,2010 年)。同样,用于编程视频游戏图形组件的标准 API(例如 OpenGL 和 DirectX)也允许机制利用图形硬件的并行功能。随着新硬件的开发以支持更复杂的计算,其中许多 API 都会发生变化。
Fragment: Fragment is a term that describes the information associated with a pixel prior to being processed in the final stages of the graphics pipeline. This definition includes much of the data that might be used to calculate the color of the pixel, such as the pixel’s scene depth, texture coordinates, or stencil information.
片段:片段是一个术语,用于描述在图形管道的最后阶段进行处理之前与像素相关的信息。此定义包括可用于计算像素颜色的许多数据,例如像素的场景深度、纹理坐标或模板信息。
Graphics hardware is programmable. As a developer, you have control over many of the computations associated with processing geometry, vertices, and the fragments that eventually become pixels. Recent hardware changes as well as ongoing updates to the APIs, such as OpenGL or DirectX, support a completely programmable pipeline. These changes afford developers creative license to exploit the computation available on GPUs. Prior to this, fixed-function rasterization pipelines forced the computation to a specific style of vertex transformations, lighting, and fragment processing. The fixed functionality of the pipeline ensured that basic coloring, lighting, and texturing could occur very quickly. Whether it is a programmable interface or fixed-function computation, the basic computations of the rasterization pipeline are similar, and follow the illustration in Figure 17.1. In the rasterization pipeline, vertices are transformed from local space to global space, and eventually into screen coordinates, after being transformed by the viewing and projection transformation matrices. The set of screen coordinates associated with a geometry’s vertices is rasterized into fragments. The final stages of the pipeline process the fragments into pixels and can apply per-fragment texture lookups, lighting, and any necessary blending. In general, the pipeline lends itself to parallel execution and the GPU cores can be used to process both vertices and fragments concurrently. Additional details about the rasterization pipeline can be found in Chapter 8.
图形硬件是可编程的。作为开发人员,您可以控制与处理几何图形、顶点和最终变成像素的片段相关的大部分计算。最近的硬件变化以及 API(如 OpenGL 或 DirectX)的持续更新支持完全可编程的管道。这些变化为开发人员提供了创造性的许可,以利用 GPU 上可用的计算。在此之前,固定功能光栅化管道将计算强制为特定样式的顶点变换、照明和片段处理。管道的固定功能确保可以非常快速地进行基本着色、照明和纹理处理。无论是可编程接口还是固定功能计算,光栅化管道的基本计算都是相似的,并遵循图 17.1中的说明。在光栅化管道中,顶点在经过查看和投影变换矩阵的变换后,从局部空间变换到全局空间,最终变换到屏幕坐标。与几何图形的顶点相关的一组屏幕坐标被光栅化为片段。管道的最后阶段将片段处理成像素,并可以应用每个片段的纹理查找、照明和任何必要的混合。一般来说,管道适合并行执行,GPU 核心可用于同时处理顶点和片段。有关光栅化管道的更多详细信息,请参阅第 8 章。
Figure 17.1. The basic graphics hardware pipeline consists of stages that transform 3D data into 2D screen objects ready for rasterizing and coloring by the pixel processing stages.
图 17.1。基本图形硬件管道由将 3D 数据转换为 2D 屏幕对象(可供像素处理阶段进行光栅化和着色)的阶段组成。
Host: In a graphics hardware program, the host refers to the CPU components of the application.
主机:在图形硬件程序中,主机是指应用程序的 CPU 组件。
Device: The GPU side of the graphics application, including the data and computation that are stored and executed on the GPU.
设备:图形应用程序的 GPU 端,包括在 GPU 上存储和执行的数据和计算。
When using graphics hardware, it is convenient to distinguish between the CPU and the GPU as separate computational entities. In this context, the term host is used to refer to the CPU, including the threads and memory available to it. The term device is used to refer to the GPU, or the graphics processing units, and the threads and memory associated with it. This makes some sense because most graphics hardware consists of external hardware that is connected to the machine via the PCI bus. The hardware may also be soldered to the machine as a separate chipset. In this sense, the graphics hardware represents a specialized co-processor, since the CPU and its cores can be programmed, as can the GPU and its cores. All programs that utilize graphics hardware must first establish a mapping between the CPU and the GPU memory. This is a rather low-level detail that is necessary so that the graphics hardware driver residing within the operating system can interface between the hardware and the operating system and windowing system software. Recall that because the host (CPU) and the device (GPU) are separate, data must be communicated between the two systems. More formally, this mapping between the operating system, the hardware driver, the hardware, and the windowing system is known as the graphics context. The context is usually established through API calls to the windowing system. Details about establishing a context are outside the scope of this chapter, but many windowing system development libraries have ways to query the graphics hardware for various capabilities and establish the graphics context based on those requirements. Because setting up the context is windowing system dependent, such code is not likely to be cross-platform code. However, in practice, or at least when starting out, it is very unlikely that such low-level context setup code will be required, since many higher level APIs exist to help people develop portable interactive applications.
使用图形硬件时,将 CPU 和 GPU 区分为单独的计算实体很方便。在这种情况下,术语“主机”用于指代 CPU,包括其可用的线程和内存。术语“设备”用于指代 GPU 或图形处理单元,以及与其关联的线程和内存。这有一定道理,因为大多数图形硬件都由通过 PCI 总线连接到机器的外部硬件组成。硬件也可以作为单独的芯片组焊接到机器上。从这个意义上讲,图形硬件代表了一个专门的协处理器,因为 CPU(及其内核)都可以编程,GPU 及其内核也可以编程。所有使用图形硬件的程序都必须首先在 CPU 和 GPU 内存之间建立映射。这是一个相当低级的细节,但对于驻留在操作系统中的图形硬件驱动程序来说,它是必要的,以便硬件和操作系统以及窗口系统软件之间进行交互。回想一下,由于主机(CPU)和设备(GPU)是分开的,因此必须在两个系统之间传递数据。更正式地说,操作系统、硬件驱动程序、硬件和窗口系统之间的这种映射称为图形上下文。上下文通常通过对窗口系统的 API 调用来建立。关于建立上下文的详细信息超出了本章的范围,但许多窗口系统开发库都有方法可以查询图形硬件的各种功能并根据这些要求建立图形上下文。 由于设置上下文依赖于窗口系统,因此这也意味着此类代码不太可能是跨平台代码。然而,在实践中,或者至少在开始时,不太可能需要这种低级上下文设置代码,因为存在许多高级 API 来帮助人们开发可移植的交互式应用程序。
Many of the frameworks for developing interactive applications support querying input devices such as the keyboard or mouse. Some frameworks provide access to the network, audio system, and other higher level system resources. In this regard, many of these APIs are the preferred way to develop graphics, and even game applications.
许多用于开发交互式应用程序的框架都支持查询键盘或鼠标等输入设备。一些框架提供对网络、音频系统和其他更高级别系统资源的访问。在这方面,许多此类 API 是开发图形甚至游戏应用程序的首选方式。
Cross-platform hardware acceleration is often achieved with the OpenGL API. OpenGL is an open industry standard graphics API that supports hardware acceleration on many types of graphics hardware. OpenGL represents one of the most common APIs for programming graphics hardware along with APIs such as DirectX. While OpenGL is available on many operating systems and hardware architectures, DirectX is specific to Microsoft-based systems. For the purposes of this chapter, hardware programming concepts and examples will be presented with OpenGL.
跨平台硬件加速通常通过 OpenGL API 实现。OpenGL 是一种开放的行业标准图形 API,支持多种类型的图形硬件上的硬件加速。OpenGL 是用于编程图形硬件的最常见 API 之一,与 DirectX 等 API 一样。虽然 OpenGL 可用于许多操作系统和硬件架构,但 DirectX 特定于基于 Microsoft 的系统。出于本章的目的,将使用 OpenGL 介绍硬件编程概念和示例。
When you program with the OpenGL API, you are writing code for at least two processors: the CPU(s) and the GPU(s). OpenGL is implemented in a C-style API and all functions are prefixed with “gl” to indicate their inclusion with OpenGL. OpenGL function calls change the state of the graphics hardware and can be used to declare and define geometry, load vertex and fragment shaders, and determine how computation will occur as data passes through the hardware.
使用 OpenGL API 进行编程时,您至少要为两个处理器编写代码:CPU 和 GPU。OpenGL 以 C 样式 API 实现,所有函数都以“gl”为前缀,以表明它们包含在 OpenGL 中。OpenGL 函数调用会改变图形硬件的状态,可用于声明和定义几何图形、加载顶点和片段着色器,以及确定数据通过硬件时如何进行计算。
The variant of OpenGL that this chapter presents is the OpenGL 3.3 Core Profile version. While not the most recent version of OpenGL, the 3.3 version of OpenGL is in line with the future direction of OpenGL programming. These versions are focused on improving efficiency while also fully placing the programming of the pipeline within the hands of the developer. Many of the function calls present in earlier versions of OpenGL are not present in these newer APIs. For instance, immediate mode rendering is deprecated. Immediate mode rendering was used to send data from the CPU memory to the graphics card memory as needed each frame and was often very inefficient, especially for larger models and complex scenes. The current API focuses on storing data on the graphics card before it is needed and instancing it at render time. As another example, OpenGL’s matrix stacks have been deprecated as well, leaving the developer to use third-party matrix libraries (such as GLM) or their own classes to create the necessary matrices for viewing, projection, and transformation, as presented in Chapter 7. As a result, OpenGL’s shader language (GLSL) has taken on larger roles as well, performing the necessary matrix transformations along with lighting and shading within the shaders. Because the fixed-function pipeline which performed per-vertex transformation and lighting is no longer present, programmers must develop all shaders themselves. The shading examples presented in this chapter will utilize the GLSL 3.3 Core Profile version shader specification. Future readers of this chapter will want to explore the current OpenGL and OpenGL Shading Language specifications for additional details on what these APIs and languages can support.
本章介绍的 OpenGL 变体是 OpenGL 3.3 Core Profile 版本。虽然 3.3 版 OpenGL 不是最新版本,但它符合 OpenGL 编程的未来发展方向。这些版本专注于提高效率,同时将管道编程完全交到开发人员手中。OpenGL 早期版本中的许多函数调用在这些较新的 API 中都不存在。例如,立即模式渲染已被弃用。立即模式渲染用于根据需要每帧将数据从 CPU 内存发送到显卡内存,这通常非常低效,尤其是对于较大的模型和复杂的场景。当前 API 专注于在需要之前将数据存储在显卡上,并在渲染时实例化它。另一个例子是,OpenGL 的矩阵堆栈也已被弃用,开发人员只能使用第三方矩阵库(如 GLM)或他们自己的类来创建查看、投影和变换所需的矩阵,如第 7 章所述。因此,OpenGL 的着色器语言 (GLSL) 也承担了更重要的角色,在着色器中执行必要的矩阵变换以及照明和阴影。由于执行每个顶点变换和照明的固定功能管道不再存在,程序员必须自己开发所有着色器。本章中介绍的着色示例将使用 GLSL 3.3 Core Profile 版本着色器规范。本章的未来读者将希望探索当前的 OpenGL 和 OpenGL 着色语言规范,以了解这些 API 和语言可以支持哪些内容的更多详细信息。
Three concepts will help in understanding contemporary graphics hardware programming. The first is the notion of a data buffer, which is, quite simply, a linear allocation of memory on the device that can store various data on which the GPUs will operate. The second is the idea that the graphics card maintains a computational state that determines how computations associated with scene data and shaders will occur on the graphics hardware. Moreover, state can be communicated from the host to the device and even within the device between shaders. The third is the shader, the mechanism by which computation related to per-vertex or per-fragment processing occurs on the GPU. This chapter will focus on vertex and fragment shaders, but specialized geometry and compute shaders also exist in the current versions of OpenGL. Shaders play a very important role in how modern graphics hardware functions.
三个概念将有助于理解当代图形硬件编程。第一个是数据缓冲区的概念,它很简单,就是设备上内存的线性分配,可以存储 GPU 将在其上运行的各种数据。第二个概念是,显卡维护一种计算状态,该状态决定了与场景数据和着色器相关的计算将如何在图形硬件上发生。此外,状态可以从主机传达到设备,甚至可以在设备内部的着色器之间传达。着色器表示在 GPU 上发生与每个顶点或每个片段处理相关的计算的机制。本章将重点介绍顶点和片段着色器,但当前版本的 OpenGL 中也存在专门的几何和计算着色器。着色器在现代图形硬件的运行中起着非常重要的作用。
Buffers are the primary structure to store data on graphics hardware. They represent the graphics hardware’s internal memory associated with everything from geometry and textures to image plane data. With regard to the rasterization pipeline described in Chapter 8, the computations associated with hardware-accelerated rasterization read and write the various buffers on the GPU. From a programming standpoint, an application must initialize the buffers on the GPU that are needed for the application. This amounts to a host to device copy operation. At the end of various stages of execution, device to host copies can be performed as well to pull data from the GPU to the CPU memory. Additionally, mechanisms exist in OpenGL’s API that allow device memory to be mapped into host memory so that an application program can write directly to the buffers on the graphics card.
缓冲区是图形硬件上存储数据的主要结构。它们代表与几何、纹理和图像平面数据等所有内容相关的图形硬件内部存储器。关于第 8 章中描述的光栅化管道,与硬件加速光栅化相关的计算会读取和写入 GPU 上的各种缓冲区。从编程的角度来看,应用程序必须初始化 GPU 上应用程序所需的缓冲区。这相当于主机到设备的复制操作。在各个执行阶段结束时,也可以执行设备到主机的复制,以将数据从 GPU 拉到 CPU 内存。此外,OpenGL 的 API 中确实存在允许将设备内存映射到主机内存的机制,以便应用程序可以直接写入显卡上的缓冲区。
In the graphics pipeline, the final set of pixel colors can be linked to the display, or they may be written to disk as a PNG image. The data associated with these pixels is generally a 2D array of color values. The data is inherently 2D, but it is efficiently represented on the GPU as a 1D linear array of memory. This array implements the display buffer, which eventually gets mapped to the window. Rendering images involves communicating the changes to the display buffer on the graphics hardware through the graphics API. At the end of the rasterization pipeline, the fragment processing and blending stages write data to the output display buffer memory. Meanwhile, the windowing system reads the contents of the display buffer to produce the raster images on the monitor’s window.
在图形管道中,最终的一组像素颜色可以链接到显示器,也可以作为 PNG 图像写入磁盘。与这些像素关联的数据通常是颜色值的 2D 数组。数据本质上是 2D 的,但它在 GPU 上可以有效地表示为内存的 1D 线性数组。此数组实现显示缓冲区,最终映射到窗口。渲染图像涉及通过图形 API 将更改传达给图形硬件上的显示缓冲区。在光栅化管道的末尾,片段处理和混合阶段将数据写入输出显示缓冲区内存。同时,窗口系统读取显示缓冲区的内容以在显示器窗口上生成光栅图像。
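The mapping from a 2D pixel location to a 1D linear array can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the ColorBuffer type and the row-major layout are hypothetical, and actual GPU memory layouts are hardware dependent and may differ (for example, tiled layouts).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side stand-in for a color buffer: a 2D image
// stored row after row in a single linear array.
struct ColorBuffer {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;  // one packed RGBA value per pixel

    ColorBuffer(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0u) {}

    // Row-major mapping from a 2D pixel location to a 1D array index.
    std::size_t index(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return static_cast<std::size_t>(y) * width + x;
    }

    void set(int x, int y, std::uint32_t rgba) { pixels[index(x, y)] = rgba; }
    std::uint32_t get(int x, int y) const { return pixels[index(x, y)]; }
};
```

For a buffer that is 4 pixels wide, pixel (1, 2) lands at linear index 2 * 4 + 1 = 9.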
Most applications prefer a double-buffered display state. What this means is that there are two buffers associated with a graphics window: the front buffer and the back buffer. The purpose of the double-buffered system is that the application can communicate changes to the back buffer (and thus, write changes to that buffer) while the front-buffer memory is used to drive the pixel colors on the window.
大多数应用程序都喜欢双缓冲显示状态。这意味着图形窗口有两个缓冲区:前缓冲区和后缓冲区。双缓冲系统的目的是,应用程序可以将更改传达给后缓冲区(从而将更改写入该缓冲区),而前缓冲区内存用于驱动窗口上的像素颜色。
At the end of the rendering loop, the buffers are swapped through a pointer exchange. The front-buffer pointer points to the back buffer and the back-buffer pointer is then assigned to the previous front buffer. In this way, the windowing system will refresh the content of the window with the most up-to-date buffer. If the buffer pointer swap is synchronized with the windowing system’s refresh of the entire display, the rendering will appear seamless. Otherwise, users may observe a tearing of the geometry on the actual display as changes to the scene’s geometry and fragments are processed (and thus written to the display buffer) faster than the screen is refreshed.
在渲染循环结束时,通过指针交换来交换缓冲区。前缓冲区指针指向后缓冲区,然后将后缓冲区指针分配给前一个前缓冲区。这样,窗口系统将使用最新的缓冲区刷新窗口的内容。如果缓冲区指针交换与窗口系统对整个显示器的刷新同步,则渲染将显得无缝。否则,用户可能会在实际显示器上观察到几何图形的撕裂,因为对场景的几何图形和片段的更改的处理(从而写入显示缓冲区)比屏幕刷新的速度更快。
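The pointer exchange described above can be illustrated with a small CPU-side sketch. The DoubleBuffer class is hypothetical and only mimics the idea; a real swap is carried out by the windowing system and driver, not by application code like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of a double-buffered display: drawing goes to the
// back buffer while the front buffer is the one being shown.
class DoubleBuffer {
public:
    explicit DoubleBuffer(std::size_t pixelCount)
        : a_(pixelCount, 0u), b_(pixelCount, 0u), front_(&a_), back_(&b_) {}

    // Copying would leave the internal pointers aimed at the source object.
    DoubleBuffer(const DoubleBuffer&) = delete;
    DoubleBuffer& operator=(const DoubleBuffer&) = delete;

    std::vector<std::uint32_t>& back() { return *back_; }
    const std::vector<std::uint32_t>& front() const { return *front_; }

    // The swap exchanges two pointers; no pixel data is copied.
    void swap() { std::swap(front_, back_); }

private:
    std::vector<std::uint32_t> a_;
    std::vector<std::uint32_t> b_;
    std::vector<std::uint32_t>* front_;
    std::vector<std::uint32_t>* back_;
};
```

Writes to back() stay invisible through front() until swap() is called, which is the property that keeps partially drawn frames off the screen.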
When the display is considered a memory buffer, one of the simplest operations on the display is essentially a memory setting (or copying) operation that zeros-out, or clears the memory to a default state. For a graphics program, this likely means clearing the background of the window to a specific color. To clear the background color (to black) in an OpenGL application, the following code can be used:
当显示器被视为内存缓冲区时,显示器上最简单的操作之一本质上是内存设置(或复制)操作,该操作将内存清零或清除为默认状态。对于图形程序,这可能意味着将窗口的背景清除为特定颜色。要在 OpenGL 应用程序中清除背景颜色(为黑色),可以使用以下代码:
glClearColor( 0.0f, 0.0f, 0.0f, 1.0f );
glClear( GL_COLOR_BUFFER_BIT );
The first three arguments for the glClearColor function represent the red, green, and blue color components, specified within the range [0, 1]. The fourth argument represents opacity, or alpha value, ranging from 0.0 being completely transparent to 1.0 being completely opaque. The alpha value is used to determine transparency through various fragment blending operations in the final stages of the pipeline.
glClearColor函数的前三个参数表示红色、绿色和蓝色分量,范围在 [0 1] 内。第四个参数表示不透明度或alpha值,范围从 00(表示完全透明)到 10(表示完全不透明)。alpha值用于在管道的最后阶段通过各种片段混合操作确定透明度。
This operation only clears the color buffer. In addition to the color buffer, specified by GL_COLOR_BUFFER_BIT and cleared to black in this case, graphics hardware also uses a depth buffer to represent the distance of fragments from the camera (you may recall the discussion of the z-buffer algorithm in Chapter 8). Clearing the depth buffer is necessary to ensure operation of the z-buffer algorithm and allow correct hidden surface removal to occur. Both buffers can be cleared in one call by or’ing the two bit field values together, as follows:
此操作仅清除颜色缓冲区。除了由GL_COLOR_BUFFER_BIT指定的颜色缓冲区(在本例中被清除为黑色)之外,图形硬件还使用深度缓冲区来表示片段相对于相机的距离(您可能还记得第 8 章中对 z 缓冲区算法的讨论)。清除深度缓冲区是必要的,以确保 z 缓冲区算法的运行并允许正确的隐藏表面移除。清除深度缓冲区可以通过将两个位字段值或在一起来实现,如下所示:
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
Within a basic interactive graphics application, this step of clearing is normally the first operation performed before any geometry or fragments are processed.
在基本的交互式图形应用程序中,清除步骤通常是在处理任何几何图形或片段之前执行的第一个操作。
By illustrating the buffer-clearing operation for the display’s color and depth buffers, the idea of graphics hardware state is also introduced. The glClearColor function sets the default color values that are written to all the pixels within the color buffer when glClear is called. The clear call initializes the color component of the display buffer and can also reset the values of the depth buffer. If the clear color does not change within an application, the clear color need only be set once, and often this is done in the initialization of an OpenGL program. Each time that glClear is called it uses the previously set state of the clear color.
通过说明显示器颜色和深度缓冲区的缓冲区清除操作,还介绍了图形硬件状态的概念。glClearColor函数设置在调用glClear时写入颜色缓冲区内所有像素的默认颜色值。清除调用初始化显示缓冲区的颜色组件,还可以重置深度缓冲区的值。如果清除颜色在应用程序内没有变化,则只需设置一次清除颜色,通常在 OpenGL 程序的初始化中完成。每次调用glClear时,它都会使用先前设置的清除颜色状态。
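The interplay between stored state and the clear call can be mimicked in a few lines of CPU code. This is a sketch, not how a driver implements it: SoftwareFramebuffer and its members are hypothetical, while the two mask constants mirror the values of GL_COLOR_BUFFER_BIT and GL_DEPTH_BUFFER_BIT and the initial clear values mirror OpenGL's defaults.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Mirror the values of GL_COLOR_BUFFER_BIT and GL_DEPTH_BUFFER_BIT.
constexpr unsigned kColorBufferBit = 0x00004000u;
constexpr unsigned kDepthBufferBit = 0x00000100u;

// Hypothetical software framebuffer that keeps "clear" state between calls.
struct SoftwareFramebuffer {
    std::vector<std::array<float, 4>> color;
    std::vector<float> depth;
    std::array<float, 4> clearColor{0.0f, 0.0f, 0.0f, 0.0f};  // stored state
    float clearDepth = 1.0f;                                   // stored state

    // Buffers start with arbitrary contents (gray color, depth 0.5).
    explicit SoftwareFramebuffer(std::size_t pixelCount)
        : color(pixelCount, std::array<float, 4>{0.5f, 0.5f, 0.5f, 1.0f}),
          depth(pixelCount, 0.5f) {}

    // Analogous to glClearColor: only records state, touches no pixels.
    void setClearColor(float r, float g, float b, float a) {
        clearColor = {r, g, b, a};
    }

    // Analogous to glClear: each set bit selects a buffer to reset.
    void clear(unsigned mask) {
        if (mask & kColorBufferBit) color.assign(color.size(), clearColor);
        if (mask & kDepthBufferBit) depth.assign(depth.size(), clearDepth);
    }
};
```

Calling setClearColor once and clear every frame matches the usage pattern described above: the clear color is state, and each clear simply reuses it.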
Note also that the z-buffer algorithm state can be enabled and disabled as needed. The z-buffer algorithm is also known in OpenGL as the depth test. By enabling it, a fragment’s depth value will be compared to the depth value currently stored in the depth buffer prior to writing any fragment colors to the color buffer. Sometimes, the depth test is not necessary and could potentially slow down an application. Disabling the depth test will prevent the z-buffer computation and change the behavior of the executable. Enabling the z-buffer test with OpenGL is done as follows:
还要注意,可以根据需要启用和禁用 z 缓冲区算法状态。z 缓冲区算法在 OpenGL 中也称为深度测试。通过启用它,在将任何片段颜色写入颜色缓冲区之前,片段的深度值将与深度缓冲区中当前存储的深度值进行比较。有时,深度测试不是必需的,可能会降低应用程序的速度。禁用深度测试将阻止 z 缓冲区计算并改变可执行文件的行为。使用 OpenGL 启用 z 缓冲区测试的操作如下:
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS);
The glEnable call turns on the depth test while the glDepthFunc call sets the mechanism for how the depth comparison is performed. In this case, the depth function is set to its default value of GL_LESS to show that other state variables exist and can be modified. The converse of a glEnable call is a glDisable call.
glEnable调用打开深度测试,而glDepthFunc调用设置执行深度比较的机制。在这种情况下,深度函数设置为其默认值GL_LESS ,以显示存在其他状态变量并且可以修改。glEnable调用的逆向是glDisable调用。
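What the depth test does per fragment, with the GL_LESS comparison, can be sketched on the CPU as shown below. The DepthTestedBuffer type is hypothetical; it assumes a depth buffer cleared to 1.0 and, as in OpenGL, leaves the depth buffer unmodified when the test is disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-purpose model of the per-fragment depth test.
struct DepthTestedBuffer {
    std::vector<float> depth;
    std::vector<unsigned> color;
    bool depthTestEnabled = false;  // the test starts disabled

    explicit DepthTestedBuffer(std::size_t pixelCount)
        : depth(pixelCount, 1.0f), color(pixelCount, 0u) {}

    // Returns true if the fragment's color was written.
    bool writeFragment(std::size_t i, float z, unsigned rgba) {
        assert(i < depth.size());
        if (depthTestEnabled) {
            if (!(z < depth[i])) return false;  // "less than" comparison fails
            depth[i] = z;                       // passing fragments update depth
        }
        color[i] = rgba;
        return true;
    }
};
```

With the test enabled, a fragment at depth 0.7 arriving after one at depth 0.5 is rejected, so the nearer surface remains visible regardless of drawing order.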
The idea of state in OpenGL mimics the use of static variables in object-oriented classes. As needed, programmers enable, disable, and/or set the state of OpenGL variables that reside on the graphics card. These states then affect any succeeding computations on the hardware. In general, efficient OpenGL programs attempt to minimize state changes, enabling states that are needed, while disabling states that are not required for rendering.
OpenGL 中的状态概念模仿了面向对象类中静态变量的使用。根据需要,程序员可以启用、禁用和/或设置驻留在显卡上的 OpenGL 变量的状态。然后,这些状态会影响硬件上的任何后续计算。通常,高效的 OpenGL 程序会尝试尽量减少状态更改,启用所需的状态,同时禁用渲染不需要的状态。
A simple and basic OpenGL application has, at its heart, a display loop that is called either as fast as possible, or at a rate that coincides with the refresh rate of the monitor or display device. The example loop below uses the GLFW library, which supports OpenGL coding across multiple platforms.
简单而基本的 OpenGL 应用程序的核心是显示循环,该循环要么尽可能快地调用,要么以与显示器或显示设备的刷新率一致的速率调用。下面的示例循环使用 GLfW 库,该库支持跨多个平台的 OpenGL 编码。
while (!glfwWindowShouldClose(window))
{
    // OpenGL code is called here,
    // each time this loop is executed.
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    // Swap front and back buffers
    glfwSwapBuffers(window);
    // Poll for events
    glfwPollEvents();
    // Request that the window close when Escape is pressed
    if (glfwGetKey( window, GLFW_KEY_ESCAPE ) == GLFW_PRESS)
        glfwSetWindowShouldClose(window, 1);
}
The loop is tightly constrained to operate only while the window is open. This example loop resets the color buffer values and also resets the z-buffer depth values in the graphics hardware memory based on previously set (or default) values. Input devices, such as keyboards, mice, network connections, or some other interaction mechanism, are processed at the end of the loop to change the state of data structures associated with the program. The call to glfwSwapBuffers synchronizes the graphics context with the display refresh, performing the pointer swap between the front and back buffers so that the updated graphics state is displayed on the user’s screen. The call to swap the buffers occurs after all graphics calls have been issued.
循环严格限制为仅在窗口打开时运行。此示例循环重置颜色缓冲区值,并根据先前设置的(或默认)值重置图形硬件内存中的 z 缓冲区深度值。输入设备(如键盘、鼠标、网络或其他交互机制)在循环结束时进行处理,以更改与程序相关的数据结构的状态。对glfwSwapBuffers的调用将图形上下文与显示刷新同步,在前后缓冲区之间执行指针交换,以便在用户的屏幕上显示更新的图形状态。交换缓冲区的调用发生在所有图形调用都已发出之后。
While conceptually separate, the depth and color buffers are often collectively called the framebuffer. By clearing the contents of the framebuffer, the application can proceed with additional OpenGL calls to push geometry and fragments through the graphics pipeline. The framebuffer is directly related to the size of the window that has been opened to contain the graphics context. The window, or viewport, dimensions are needed by OpenGL to construct the Mvp matrix (from Chapter 7) within the hardware. This is accomplished through the following code, demonstrated again with the GLFW toolkit, which provides functions for querying the requested window (or framebuffer) dimensions:
尽管从概念上讲,深度缓冲区和颜色缓冲区是分开的,但它们通常统称为帧缓冲区。通过清除帧缓冲区的内容,应用程序可以继续进行其他 OpenGL 调用,以将几何图形和片段推送到图形管道。帧缓冲区与已打开以包含图形上下文的窗口的大小直接相关。OpenGL 需要窗口或视口尺寸来在硬件内构建M vp矩阵(来自第 7 章)。这是通过以下代码完成的,再次使用 GLfW 工具包演示,它提供了查询请求的窗口(或帧缓冲区)尺寸的函数:
int nx, ny;
glfwGetFramebufferSize(window, &nx, &ny);
glViewport(0, 0, nx, ny);
In this example, glViewport sets the OpenGL viewport state: the viewport is specified to start at the window origin (0, 0), and nx and ny give its width and height.
在此示例中, glViewport使用nx和ny表示窗口的宽度和高度,并指定视口从原点开始,从而设置窗口尺寸的 OpenGL 状态。
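The effect of the viewport state can be sketched as a small CPU-side function that maps coordinates in the range [-1, 1] to window coordinates. The names below are hypothetical. The formula follows the OpenGL convention for the viewport transformation; the Mvp matrix of Chapter 7 may differ by a half-pixel offset depending on where pixel centers are placed.

```cpp
#include <cassert>

// Hypothetical mirror of the four arguments passed to glViewport.
struct Viewport {
    int x;
    int y;
    int width;
    int height;
};

struct Point2 {
    double x;
    double y;
};

// Map normalized coordinates in [-1, 1] to window coordinates.
Point2 ndcToWindow(const Viewport& vp, double xNdc, double yNdc) {
    assert(vp.width > 0 && vp.height > 0);
    return Point2{(xNdc + 1.0) * 0.5 * vp.width + vp.x,
                  (yNdc + 1.0) * 0.5 * vp.height + vp.y};
}
```

For an 800 by 600 viewport starting at the origin, the center of the canonical view volume maps to (400, 300), and the corner (-1, -1) maps to the window origin.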
Technically, OpenGL writes to the framebuffer memory as a result of operations that rasterize geometry and process fragments. These writes happen before the pixels are displayed on the user’s monitor.
从技术上讲,OpenGL 写入帧缓冲区内存是光栅化几何体和处理片段操作的结果。这些写入操作发生在像素显示在用户显示器上之前。
Similar to the idea of a display buffer, geometry is also specified using arrays to store vertex data and other vertex attributes, such as vertex colors, normals, or texture coordinates needed for shading. The concept of buffers will be used to allocate storage on the graphics hardware, transferring data from the host to the device.
与显示缓冲区的概念类似,几何图形也使用数组来指定,以存储顶点数据和其他顶点属性,例如着色所需的顶点颜色、法线或纹理坐标。缓冲区的概念将用于在图形硬件上分配存储空间,将数据从主机传输到设备。
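On the host side, vertex data of this kind is typically prepared as a plain array before it is copied to the device. The sketch below shows one possible layout, with position and color interleaved per vertex; the Vertex struct and helper names are hypothetical, and other layouts (such as separate arrays per attribute) are equally valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interleaved vertex: a position followed by a color.
struct Vertex {
    float position[3];
    float color[3];
};

static_assert(sizeof(Vertex) == 6 * sizeof(float),
              "Vertex is expected to be tightly packed");

// Bytes from one vertex to the next, and from the start of a vertex to
// its color: the two numbers an attribute description needs.
constexpr std::size_t kStride = sizeof(Vertex);
constexpr std::size_t kColorOffset = offsetof(Vertex, color);

// One triangle inside the canonical view volume, one color per corner.
std::vector<Vertex> makeTriangle() {
    std::vector<Vertex> tri;
    tri.push_back(Vertex{{-0.5f, -0.5f, 0.0f}, {1.0f, 0.0f, 0.0f}});
    tri.push_back(Vertex{{0.5f, -0.5f, 0.0f}, {0.0f, 1.0f, 0.0f}});
    tri.push_back(Vertex{{0.0f, 0.5f, 0.0f}, {0.0f, 0.0f, 1.0f}});
    return tri;
}
```

The byte size of the host array, tri.size() * sizeof(Vertex), is the amount of device storage that a buffer for this geometry would need.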
One of the challenges with graphics hardware programming is the management of the 3D data and its transfer to and from the memory of the graphics hardware. Most graphics hardware works with specific sets of geometric primitives. The different primitive types trade primitive complexity for processing speed on the graphics hardware. Simpler primitives can sometimes be processed very fast. The caveat is that the primitive types need to be general purpose so as to model a wide range of geometry from very simple to very complex. On typical graphics hardware, the primitive types are limited to one or more of the following:
图形硬件编程的挑战之一是 3D 数据的管理及其与图形硬件内存之间的传输。大多数图形硬件都使用特定的几何图元集。不同的图元类型利用图元复杂性来提高图形硬件的处理速度。有时可以非常快速地处理较简单的图元。需要注意的是,图元类型需要具有通用性,以便对从非常简单到非常复杂的各种几何图形进行建模。在典型的图形硬件上,图元类型仅限于以下一种或多种:
Primitives: The three primitive types (points, lines, and triangles) are really the only primitives available! Quadrilaterals are grouped with the triangle primitives. Even when creating spline-based surfaces, such as NURBS, the surfaces are tessellated into triangle primitives by the graphics hardware.
基元:三种基元(点、线、三角形和四边形)实际上是唯一可用的基元!即使在创建基于样条线的曲面(如 NURBS)时,图形硬件也会将曲面细分为三角形基元。
Point Rendering: Point and line primitives may initially appear to be limited in use, but researchers have used points to render very complex geometry (Rusinkiewicz & Levoy, 2000; Dachsbacher, Vogelgsang, & Stamminger, 2003).
点渲染:点和线图元最初可能看起来用途有限,但研究人员已经使用点来渲染非常复杂的几何图形(Rusinkiewicz & Levoy,2000;Dachsbacher,Vogelgsang & Stamminger,2003)。
points—single vertices used to represent points or particle systems;
点——用于表示点或粒子系统的单个顶点;
lines—pairs of vertices used to represent lines, silhouettes, or edge-highlighting;
线——用来表示线条、轮廓或边缘突出显示的顶点对;
triangles—triangles, triangle strips, indexed triangles, indexed triangle strips, quadrilaterals, or triangle meshes approximating geometric surfaces.
三角形— 三角形、三角形条带、索引三角形、索引三角形条带、四边形或近似几何表面的三角形网格。
These three primitive types form the basic building blocks for most geometry that can be defined. An example of a triangle mesh rendered with OpenGL is shown in Figure 17.2.
这三种图元类型构成了大多数可定义的几何图形的基本构建块。图 17.2显示了使用 OpenGL 渲染的三角形网格的示例。
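The difference between independent, indexed, and strip organizations of triangles can be made concrete with a unit square. The sketch below uses hypothetical names and plain host arrays; it shows a quadrilateral expressed as two indexed triangles and the vertex counts the other organizations would need.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical indexed mesh: shared vertex positions plus triangle indices.
struct IndexedMesh {
    std::vector<float> positions;   // x, y, z per vertex
    std::vector<unsigned> indices;  // three indices per triangle
};

// A unit square built from two triangles that share an edge.
IndexedMesh makeQuad() {
    IndexedMesh mesh;
    mesh.positions = {0.0f, 0.0f, 0.0f,   // vertex 0
                      1.0f, 0.0f, 0.0f,   // vertex 1
                      1.0f, 1.0f, 0.0f,   // vertex 2
                      0.0f, 1.0f, 0.0f};  // vertex 3
    mesh.indices = {0, 1, 2,   // first triangle
                    0, 2, 3};  // second triangle reuses vertices 0 and 2
    return mesh;
}

// Vertices submitted when every triangle is specified independently.
std::size_t listVertexCount(std::size_t triangleCount) {
    return 3 * triangleCount;
}

// Vertices submitted for one triangle strip: each triangle after the
// first adds a single new vertex.
std::size_t stripVertexCount(std::size_t triangleCount) {
    return triangleCount == 0 ? 0 : triangleCount + 2;
}
```

For the square, independent triangles need 6 vertices, while the indexed mesh and the strip each need only 4, which hints at why the organization of geometry affects performance.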
Figure 17.2. How your geometry is organized will affect the performance of your application. This wireframe depiction of the Little Cottonwood Canyon terrain dataset shows tens of thousands of triangles organized as a triangle mesh running at real-time rates. The image is rendered using the VTerrain Project terrain system courtesy of Ben Discoe.
图 17.2。几何图形的组织方式将影响应用程序的性能。这个 Little Cottonwood Canyon 地形数据集的线框描绘显示了数以万计的三角形,它们以实时速率运行,并组织成三角形网格。该图像是使用 Ben Discoe 提供的 VTerrain Project 地形系统渲染的。
Modern versions of OpenGL require that shaders be used to process vertices and fragments. As such, no primitives can be rendered without at least one vertex shader to process the incoming primitive vertices and another shader to process the rasterized fragments. Advanced shader types exist within OpenGL and the OpenGL Shading Language: geometry shaders and compute shaders. Geometry shaders are designed to process primitives, potentially creating additional primitives, and can support geometric instancing operations. Compute shaders are designed for performing general computation on the GPU, and can be linked into the set of shaders necessary for a specific application. For more information on geometry and compute shaders, the reader is referred to the OpenGL specification documents and other resources.
OpenGL 的现代版本要求使用着色器来处理顶点和片段。因此,如果没有至少一个顶点着色器来处理传入的图元顶点,以及另一个着色器来处理光栅化片段,则无法渲染图元。OpenGL 和 OpenGL 着色语言中存在高级着色器类型:几何着色器和计算着色器。几何着色器旨在处理图元,可能创建其他图元,并可支持几何实例化操作。计算着色器旨在在 GPU 上执行一般计算,可链接到特定应用程序所需的着色器集。有关几何和计算着色器的更多信息,读者可以参考 OpenGL 规范文档和其他资源。
Vertex shaders provide control over how vertices are transformed and often help prepare data for use in fragment shaders. In addition to standard transformations and potential per-vertex lighting operations, vertex shaders could be used to perform general computation on the GPU. For instance, if the vertices represent particles and the particle motion can be (simply) modeled within the vertex shader computations, the CPU can mostly be removed from performing those computations. The ability to perform computations on the vertices already stored in the graphics hardware memory is a potential performance gain. While this approach is useful in some situations, advanced general computation may be more appropriately coded with compute shaders.
顶点着色器控制顶点的变换方式,通常有助于准备用于片段着色器的数据。除了标准变换和潜在的每个顶点的照明操作之外,顶点着色器还可用于在 GPU 上执行一般计算。例如,如果顶点表示粒子,并且粒子运动可以在顶点着色器计算中(简单地)建模,则 CPU 基本上可以不再执行这些计算。对已存储在图形硬件内存中的顶点执行计算的能力是一种潜在的性能提升。虽然这种方法在某些情况下很有用,但高级一般计算可能更适合用计算着色器进行编码。
In Chapter 7, the viewport matrix Mvp was introduced. It transforms the canonical view volume coordinates to screen coordinates. Within the canonical view volume, coordinates exist in the range of [–1, 1]. Anything outside of this range is clipped. If we make an initial assumption that the geometry exists within this range and the z-value is ignored, we can create a very simple vertex shader. This vertex shader passes the vertex positions through to the rasterization stage, where the final viewport transformation will occur. Note that because of this simplification, there are no projection, viewing, or model transforms that will be applied to the incoming vertices. This is initially cumbersome for creating anything except very simple scenes, but will help introduce the concepts of shaders and allow you to render an initial triangle to the screen. The passthrough vertex shader follows:
在第 7 章中,我们介绍了视口矩阵M vp 。它将标准视口坐标转换为屏幕坐标。在标准视口中,坐标的范围为 [-1, 1]。超出此范围的坐标将被裁剪。如果我们最初假设几何体存在于此范围内,并且忽略 z 值,则可以创建一个非常简单的顶点着色器。此顶点着色器将顶点位置传递到光栅化阶段,最终的视口变换将在此发生。请注意,由于这种简化,不会对传入的顶点应用任何投影、查看或模型变换。除了非常简单的场景外,这对于创建任何东西来说最初都很麻烦,但有助于引入着色器的概念,并允许您将初始三角形渲染到屏幕上。直通顶点着色器如下:
#version 330 core
layout(location=0) in vec3 in_Position;
void main(void)
{
    gl_Position = vec4(in_Position, 1.0);
}
This vertex shader does only one thing. It passes the incoming vertex position out as the gl_Position that OpenGL uses to rasterize fragments. Note that gl_Position is a built-in, reserved variable that signifies one of the key outputs required from a vertex shader. Also note the version string in the first line. In this case, the string instructs the GLSL compiler that version 3.3 of the GLSL Core profile is to be used to compile the shading language.
这个顶点着色器只做一件事。它将传入的顶点位置作为 OpenGL 用于光栅化片段的gl_Position传递出去。请注意, gl_Position是一个内置的保留变量,表示顶点着色器所需的关键输出之一。还请注意第一行中的版本字符串。在本例中,该字符串指示 GLSL 编译器使用 GLSL Core 配置文件的 3.3 版本来编译着色语言。
Vertex and fragment shaders are SIMD operations that respectively operate on all the vertices or fragments being processed in the pipeline. Additional data can be communicated from the host to the shaders executing on the device by using input, output, or uniform variables. Data that is passed into a shader is prefixed with the keyword in. The location of that data as it relates to specific vertex attributes or fragment output indices is also specified directly in the shader. Thus,
顶点和片段着色器是 SIMD 操作,分别对管道中正在处理的所有顶点或片段进行操作。可以使用输入、输出或统一变量将其他数据从主机传送到在设备上执行的着色器。传入着色器的数据以关键字in 为前缀。该数据与特定顶点属性或片段输出索引相关的位置也直接在着色器中指定。因此,
layout(location=0) in vec3 in_Position;
specifies that in_Position is an input variable that is of type vec3. The source of that data is the attribute index 0 that is associated with the geometry. The name of this variable is determined by the programmer, and the link between the incoming geometry and the shader occurs while setting up the vertex data on the device. GLSL contains a nice variety of types useful to graphics programs, including vec2, vec3, vec4, mat2, mat3, and mat4, to name a few. Standard types such as int or float also exist. In shader programming, vectors such as vec4 hold four components corresponding to the x, y, z, and w components of a homogeneous coordinate, or the r, g, b, and a components of an RGBA tuple. The component labels can be reordered as needed (and even repeated) in what is called swizzling (e.g., in_Position.zyx). Labels from the two naming sets cannot be mixed within a single swizzle, and a swizzle can only name components that the vector actually has. Moreover, the component-wise labels are overloaded and can be used appropriately to provide context.
指定in_Position是vec3类型的输入变量。该数据的来源是与几何图形关联的属性索引 0。此变量的名称由程序员确定,传入几何图形与着色器之间的链接在设备上设置顶点数据时发生。GLSL 包含多种对图形程序有用的类型,包括vec2 、 vec3 、 vec4 、 mat2 、 mat3和mat4等等。还存在int或float等标准类型。在着色器编程中,向量(例如vec4 )包含与齐次坐标的x 、 y 、 z和w分量或 RGBA 元组的r 、 g 、 b和a分量相对应的 4 个分量。类型的标签可以根据需要互换(甚至重复),这被称为swizzling (例如,in_Position.zyxa)。此外,组件标签超载,可以适当使用来提供上下文。
All shaders must have a main function that performs the primary computation across all inputs. In this example, the main function simply copies the input vertex position (in_Position), which is of type vec3, into the built-in vertex shader output variable gl_Position, which is of type vec4. Note that many of the built-in types have constructors that are useful for conversions such as the one presented here to convert the incoming vertex position’s vec3 type into gl_Position’s vec4 type. Homogeneous coordinates are used with OpenGL, so 1.0 is specified as the fourth coordinate to indicate that the vector is a position.
所有着色器都必须有一个主函数,用于对所有输入执行主要计算。在此示例中,主函数只是将类型为vec3的输入顶点位置( in_Position )复制到类型为vec4的内置顶点着色器输出变量gl_Position中。请注意,许多内置类型都有可用于转换的构造函数,例如此处介绍的将传入顶点位置的vec3类型转换为gl_Position的vec4类型的构造函数。OpenGL 使用齐次坐标,因此将 1.0 指定为第四个坐标,以指示该向量是一个位置。
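The role of the fourth coordinate can be checked on the host with a few lines of C++. The following is only a minimal sketch: the Vec4 alias and the helper names toPoint, toDirection, and translate are hypothetical stand-ins and are not part of GLSL or OpenGL. It shows that a translation affects a vector with w = 1 (a position) but leaves a vector with w = 0 (a direction) unchanged.

```cpp
#include <array>

// Four-component homogeneous vector, loosely analogous to GLSL's vec4.
using Vec4 = std::array<float, 4>;

// Promote a 3D position to homogeneous coordinates (w = 1), mirroring
// the GLSL constructor call vec4(in_Position, 1.0).
Vec4 toPoint(float x, float y, float z) { return {x, y, z, 1.0f}; }

// Promote a 3D direction to homogeneous coordinates (w = 0).
Vec4 toDirection(float x, float y, float z) { return {x, y, z, 0.0f}; }

// Apply a homogeneous translation by (tx, ty, tz). This is the result of
// multiplying by a 4x4 translation matrix: the offset is scaled by w.
Vec4 translate(const Vec4& v, float tx, float ty, float tz) {
    return {v[0] + tx * v[3], v[1] + ty * v[3], v[2] + tz * v[3], v[3]};
}
```

Translating toPoint(1, 2, 3) by 10 units in x moves the x component to 11, while the same translation applied to toDirection(1, 2, 3) leaves it at 1.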
If the simplest vertex shader simply passes clip coordinates through, the simplest fragment shader sets the color of the fragment to a constant value.
如果最简单的顶点着色器只是传递剪辑坐标,那么最简单的片段着色器就会将片段的颜色设置为常数值。
#version 330 core
layout(location=0) out vec4 out_FragmentColor;
void main(void)
{
out_FragmentColor = vec4(0.49, 0.87, 0.59, 1.0);
}
In this example, all fragments will be set to a light shade of green. One key difference from the vertex shader is the use of the out keyword. In general, the keywords in and out in shader programs indicate the flow of data into, and out of, shaders. While the vertex shader receives incoming vertices and writes them to a built-in variable, the fragment shader declares its own output variable, whose value is written to the color buffer:
在此示例中,所有片段都将设置为浅绿色。一个关键区别是使用out关键字。通常,着色器程序中的关键字in和out表示数据流入和流出着色器。顶点着色器接收传入的顶点并将其输出到内置变量,而片段着色器则声明其输出值,该值将写入颜色缓冲区:
layout(location=0) out vec4 out_FragmentColor;
The output variable out_FragmentColor is again user defined. The location of the output is color buffer index 0. Fragment shaders can output to multiple buffers, but this is an advanced topic left to the reader; it becomes necessary when working with OpenGL’s framebuffer objects. The use of the layout and location keywords makes an explicit connection between the application’s geometric data in the vertex shader and the output color buffers in the fragment shader.
输出变量out_FragmentColor也是用户定义的。输出的位置是颜色缓冲区索引 0。片段着色器可以输出到多个缓冲区,但这是一个留给读者的高级主题,如果研究 OpenGL 的帧缓冲区对象,则需要它。使用布局和位置关键字在顶点着色器中的应用程序几何数据和片段着色器中的输出颜色缓冲区之间建立了明确的联系。
Shader programs are transferred onto the graphics hardware in the form of character strings. They must then be compiled and linked. Furthermore, shaders are coupled together into shader programs so that vertex and fragment processing occur in a consistent manner. A developer can activate a shader that has been successfully compiled and linked into a shader program as needed, while also deactivating shaders when not required. While the detailed process of creating, loading, compiling, and linking shader programs is not provided in this chapter, the following OpenGL functions will be helpful in creating shaders:
着色器程序以字符串的形式传输到图形硬件上。然后必须对其进行编译和链接。此外,着色器被耦合到着色器程序中,以便以一致的方式进行顶点和片段处理。开发人员可以根据需要激活已成功编译并链接到着色器程序的着色器,同时在不需要时停用着色器。虽然本章未提供创建、加载、编译和链接着色器程序的详细过程,但以下 OpenGL 函数将有助于创建着色器:
glCreateShader creates a handle to a shader on the hardware.
glCreateShader创建硬件上着色器的句柄。
glShaderSource loads the character strings into the graphics hardware memory.
glShaderSource将字符串加载到图形硬件内存中。
glCompileShader performs the actual compilation of the shader within the hardware.
glCompileShader在硬件内执行着色器的实际编译。
The functions above need to be called for each shader. So, for the simple pass-through shaders, each of those functions would be called for both the vertex shader code and the fragment shader code provided. At the end of the compilation phase, compilation status and any errors can be queried using additional OpenGL commands.
需要为每个着色器调用上述函数。因此,对于简单的直通着色器,将为提供的顶点着色器代码和片段着色器代码调用上述每个函数。在编译阶段结束时,可以使用其他 OpenGL 命令查询编译状态和任何错误。
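Because the OpenGL calls themselves need a live rendering context, the sketch below stays on the host and only shows the data that would be handed to them. It holds the fragment shader from above as a C++ string and builds the pointer and length arrays that correspond to the count, string, and length parameters of glShaderSource. The names kFragmentSource, ShaderSourceArgs, and makeShaderSourceArgs are hypothetical, and using int and char in place of GLint and GLchar is an assumption made so the code compiles without OpenGL headers.

```cpp
#include <string>
#include <vector>

// GLSL source for the constant-color fragment shader from the text, held on
// the host as an ordinary C++ string (a raw string literal keeps the newlines).
const std::string kFragmentSource = R"(#version 330 core
layout(location=0) out vec4 out_FragmentColor;
void main(void)
{
    out_FragmentColor = vec4(0.49, 0.87, 0.59, 1.0);
}
)";

// Hypothetical helper type (not part of OpenGL): the two parallel arrays
// that describe a set of source strings.
struct ShaderSourceArgs {
    std::vector<const char*> strings;  // one pointer per source string
    std::vector<int> lengths;          // length of each string in characters
};

// Build the arrays for a set of source strings. The returned pointers refer
// to the caller's strings, so `sources` must outlive the result.
ShaderSourceArgs makeShaderSourceArgs(const std::vector<std::string>& sources) {
    ShaderSourceArgs args;
    for (const std::string& s : sources) {
        args.strings.push_back(s.c_str());
        args.lengths.push_back(static_cast<int>(s.size()));
    }
    return args;
}
```

With a context available, args.strings.size(), args.strings.data(), and args.lengths.data() would be the natural candidates for the last three arguments of glShaderSource, followed by a call to glCompileShader on the same shader handle.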
After both shader codes are loaded and compiled, they can be linked into a shader program. The shader program is what is used to affect rendering of geometry.
两个着色器代码加载并编译后,可以将它们链接到着色器程序中。着色器程序用于影响几何体的渲染。
glCreateProgram creates a program object that will contain the previously compiled shaders.
glCreateProgram创建一个包含之前编译的着色器的程序对象。
glAttachShader attaches a shader to the shader program object. In the simple example, this function will be called for both the compiled vertex shader and the compiled fragment shader objects.
glAttachShader将着色器附加到着色器程序对象。在简单的示例中,此函数将针对已编译的顶点着色器和已编译的片段着色器对象调用。
glLinkProgram links the shaders internally after all shaders have been attached to the program object.
在所有着色器附加到程序对象后, glLinkProgram在内部链接着色器。
glUseProgram binds the shader program for use on the graphics hardware. As shaders are needed, the program handles are bound using this function. When no shaders are needed, they can be unbound by using the shader program handle 0 as an argument to this function.
glUseProgram绑定着色器程序以供图形硬件使用。当需要着色器时,使用此函数绑定程序句柄。当不需要着色器时,可以使用着色器程序句柄 0 作为此函数的参数来解除着色器的绑定。
Vertices are stored on the graphics hardware using buffers, known as vertex buffer objects. In addition to vertices, any additional vertex attributes, such as colors, normal vectors, or texture coordinates, will also be specified using vertex buffer objects.
顶点使用缓冲区(称为顶点缓冲区对象)存储在图形硬件上。除了顶点之外,任何其他顶点属性(例如颜色、法线向量或纹理坐标)也将使用顶点缓冲区对象指定。
First, let’s focus on specifying the geometric primitives themselves. This starts by allocating the vertices associated with the primitive within the host memory of the application. The most general way to do this is to define an array on the host to contain the vertices needed for the primitive. For instance, a single triangle, fully contained within the canonical volume, could be defined statically on the host as follows:
首先,让我们专注于指定几何图元本身。首先,在应用程序的主机内存中分配与图元相关的顶点。最常用的方法是定义主机上的数组来包含图元所需的顶点。例如,一个完全包含在规范体积内的单个三角形可以在主机上静态定义,如下所示:
GLfloat vertices[] = {-0.5f, -0.5f, 0.0f, 0.5f, -0.5f, 0.0f, 0.0f, 0.5f, 0.0f};
If the simple passthrough shaders are used for this triangle, then all vertices will be rendered. Although the triangle is placed on the z = 0 plane, the z coordinates for this example do not really matter since they are essentially dropped in the final transformation into screen coordinates. Another thing to note is the use of the type GLfloat in these examples. Just as the GLSL language has specialized types, OpenGL has related types, which generally intermix well with the standard types (like float). For precision, the OpenGL types will be used when necessary.
如果对此三角形使用简单的直通着色器,则将渲染所有顶点。尽管三角形位于z = 0 平面上,但此示例中的z坐标实际上并不重要,因为它们在最终转换为屏幕坐标时基本上被丢弃。另外要注意的是这些示例中使用了GLfloat类型。正如 GLSL 语言具有专门的类型一样,OpenGL 具有相关类型,通常可以与标准类型(如 float)很好地混合。为了准确起见,将在必要时使用 OpenGL 类型。
OpenGL Coordinate System: The coordinate system used by OpenGL is identical to that presented in this book. It is a right-handed coordinate system with +x to the right, +y up, and +z pointing out of the screen (or window) toward the viewer. Thus, -z points into the monitor.
OpenGL 坐标系: OpenGL 使用的坐标系与本书中介绍的坐标系相同。它是一个右手坐标系,+ x向右,+ y向上,+ z远离屏幕(或窗口)。因此, -z指向显示器内部。
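The handedness claim can be verified numerically. In a right-handed system the cross product of the +x and +y axes is +z, and a triangle whose vertices are listed counterclockwise (as seen by the viewer) has a normal pointing along +z. The sketch below is illustrative only; Vec3, cross, and triangleNormal are hypothetical helper names, and the normal is left unnormalized.

```cpp
#include <array>

using Vec3 = std::array<float, 3>;

// Cross product a x b.
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

// Unnormalized normal of triangle (a, b, c): (b - a) x (c - a).
Vec3 triangleNormal(const Vec3& a, const Vec3& b, const Vec3& c) {
    Vec3 e1 = {b[0] - a[0], b[1] - a[1], b[2] - a[2]};
    Vec3 e2 = {c[0] - a[0], c[1] - a[1], c[2] - a[2]};
    return cross(e1, e2);
}
```

For the triangle defined earlier, with vertices (-0.5, -0.5, 0), (0.5, -0.5, 0), and (0, 0.5, 0), triangleNormal returns a vector along +z, so the triangle faces the viewer.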
Before the vertices can be processed, a vertex buffer is first created on the device to store the vertices. The vertices on the host are then transferred to the device. After this, the vertex buffer can be referenced as needed to draw the array of vertices stored in the buffer. Moreover, after the initial transfer of vertex data, no additional copying of data across the host to device bus need occur, especially if the geometry remains static across rendering loop updates. Any host memory can also be deleted if it was dynamically allocated.
在处理顶点之前,首先在设备上创建一个顶点缓冲区来存储顶点。然后将主机上的顶点传输到设备。此后,可以根据需要引用顶点缓冲区来绘制存储在缓冲区中的顶点数组。此外,在初始传输顶点数据后,无需在主机到设备总线之间进行额外的数据复制,特别是如果几何图形在渲染循环更新期间保持静态。如果主机内存是动态分配的,也可以删除它。
Vertex buffer objects, often called VBOs, represent the primary mechanism with modern OpenGL to store vertex and vertex attributes in the graphics memory. For efficiency purposes, the initial setup of a VBO and the transfer of vertex-related data mostly happens prior to entering the display loop. As an example, to create a VBO for this triangle, the following code could be used:
顶点缓冲区对象(通常称为 VBO)是现代 OpenGL 在图形内存中存储顶点和顶点属性的主要机制。出于效率考虑,VBO 的初始设置和顶点相关数据的传输大多发生在进入显示循环之前。例如,要为这个三角形创建 VBO,可以使用以下代码:
GLuint triangleVBO[1];
glGenBuffers(1, triangleVBO);
glBindBuffer(GL_ARRAY_BUFFER, triangleVBO[0]);
glBufferData(GL_ARRAY_BUFFER, 9 * sizeof(GLfloat), vertices, GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
Three OpenGL calls are required to create and allocate the vertex buffer object. The first, glGenBuffers, creates a handle that can be used to refer to the VBO once it is stored on the device. Multiple handles to VBOs (stored in arrays) can be created in a single glGenBuffers call; the array form is used here, although only one handle is requested. Note that when a buffer object is generated, the actual allocation of space on the device is not yet performed.
需要三个 OpenGL 调用来创建和分配顶点缓冲区对象。第一个, glGenBuffers创建一个句柄,一旦 VBO 存储在设备上,该句柄可用于引用它。可以在单个glGenBuffers调用中创建多个 VBO 句柄(存储在数组中),如图所示,但这里没有使用。请注意,当生成缓冲区对象时,尚未执行设备上的实际空间分配。
With OpenGL, objects, such as vertex buffer objects, are primary targets for computation and processing. Objects must be bound to a known OpenGL state when used and unbound when not in use. Examples of OpenGL’s use of objects include the vertex buffer objects, framebuffer objects, texture objects, and shader programs, to name a few. In the current example, the GL_ARRAY_BUFFER state of OpenGL is bound to the triangle VBO handle that was generated previously. This essentially makes the triangle VBO the active vertex buffer object. Any operations that affect vertex buffers that follow the glBindBuffer(GL_ARRAY_BUFFER, triangleVBO[0]) command will use the triangle data in the VBO either by reading the data or writing to it.
使用 OpenGL,对象(例如顶点缓冲区对象)是计算和处理的主要目标。使用时,对象必须绑定到已知的 OpenGL 状态,不使用时则必须解除绑定。OpenGL 使用对象的示例包括顶点缓冲区对象、帧缓冲区对象、纹理对象和着色器程序等等。在当前示例中,OpenGL 的GL_ARRAY_BUFFER状态绑定到先前生成的三角形 VBO 句柄。这实际上使三角形 VBO 成为活动的顶点缓冲区对象。在glBindBuffer(GL_ARRAY_BUFFER,triangleVBO[0])命令之后影响顶点缓冲区的任何操作都将通过读取或写入数据来使用 VBO 中的三角形数据。
Vertex data is copied from the host (the vertices array) to the device (the buffer currently bound to GL_ARRAY_BUFFER) using the
顶点数据从主机(vertices 数组)复制到设备(当前绑定到 GL_ARRAY_BUFFER 的缓冲区),使用
glBufferData(GL_ARRAY_BUFFER, 9 * sizeof(GLfloat), vertices, GL_STATIC_DRAW);
call. The arguments represent the type of target, the size in bytes of the buffer to be copied, the pointer to the host buffer, and an enumerated type that indicates how the buffer will be used. In the current example, the target is GL_ARRAY_BUFFER, the size of the data is 9 * sizeof(GLfloat), and the last argument is GL_STATIC_DRAW, indicating to OpenGL that the vertices will not change over the course of the rendering. Finally, when the VBO no longer needs to be an active target for reading or writing, it is unbound with the glBindBuffer(GL_ARRAY_BUFFER, 0) call. In general, binding handle 0 to one of OpenGL’s object or buffer targets unbinds whatever was bound there, so that it no longer affects subsequent operations.
调用。参数表示目标的类型、要复制的缓冲区的大小(以字节为单位)、指向主机缓冲区的指针,以及指示如何使用缓冲区的枚举类型。在当前示例中,目标是GL_ARRAY_BUFFER ,数据的大小是 9* sizeof(GLfloat) ,最后一个参数是GL_STATIC_DRAW ,它向 OpenGL 指示顶点在渲染过程中不会改变。最后,当 VBO 不再需要作为读写的活动目标时,可以使用glBindBuffer(GL_ARRAY_BUFFER, 0)调用解除其绑定。通常,将任何 OpenGL 的对象或缓冲区绑定到句柄0 ,都会解除绑定或禁用该缓冲区以使其不再影响后续功能。
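The size argument need not be written by hand. As a small sketch, and assuming that plain float can stand in for GLfloat so that the code compiles without OpenGL headers, the byte count and the vertex count can both be derived from the array itself. The constant names are hypothetical.

```cpp
#include <cstddef>

// The triangle from the text; float stands in for GLfloat (an assumption).
const float vertices[] = {-0.5f, -0.5f, 0.0f,
                           0.5f, -0.5f, 0.0f,
                           0.0f,  0.5f, 0.0f};

// Each vertex position has three components: x, y, z.
constexpr std::size_t kComponentsPerVertex = 3;

// Size in bytes of the whole array, as needed for the buffer upload.
constexpr std::size_t kNumBytes = sizeof(vertices);

// Number of floats and number of vertices stored in the array.
constexpr std::size_t kNumFloats = sizeof(vertices) / sizeof(vertices[0]);
constexpr std::size_t kNumVertices = kNumFloats / kComponentsPerVertex;

// Compile-time check that this matches the hand-written 9 * sizeof(GLfloat).
static_assert(kNumBytes == 9 * sizeof(float), "expected nine floats");
```

Note that sizeof only reports the full array size for a true array; it would not work on a pointer to dynamically allocated vertices, where the count must be tracked separately.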
While vertex buffer objects are the storage containers for vertices (and vertex attributes), vertex array objects represent OpenGL’s mechanism to bundle vertex buffers together into a consistent vertex state that can be communicated and linked with shaders in the graphics hardware. Recall that the fixed function pipeline of the past no longer exists and therefore, per-vertex state, such as normals or even vertex colors, must be stored in hardware buffers and then referenced in shaders, using input variables (e.g., in).
虽然顶点缓冲区对象是顶点(和顶点属性)的存储容器,但顶点数组对象代表 OpenGL 将顶点缓冲区捆绑在一起形成一致顶点状态的机制,该机制可以与图形硬件中的着色器进行通信和链接。回想一下,过去的固定功能管道已不复存在,因此,每个顶点的状态(例如法线甚至顶点颜色)必须存储在硬件缓冲区中,然后使用输入变量(例如用in声明的变量)在着色器中引用。
As with vertex buffer objects, vertex array objects, or VAOs, must be created and allocated with any necessary state being set while the vertex array object is bound. For instance, the following code shows how to create a VAO to contain the triangle VBO previously defined:
与顶点缓冲区对象一样,顶点数组对象(或 VAO)必须在绑定顶点数组对象时创建和分配,并设置所有必要的状态。例如,以下代码显示了如何创建一个 VAO 来包含先前定义的三角形 VBO:
GLuint VAO;
glGenVertexArrays(1, &VAO);
glBindVertexArray(VAO);
glEnableVertexAttribArray(0);
glBindBuffer(GL_ARRAY_BUFFER, triangleVBO[0]);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat), 0);
glBindVertexArray(0);
When defining a vertex array object, specific vertex buffer objects can be bound to specific vertex attributes (or inputs) in shader code. Recall the use of
定义顶点数组对象时,可以在着色器代码中将特定顶点缓冲区对象绑定到特定顶点属性(或输入)。回想一下
layout(location=0) in vec3 in_Position;
in the passthrough vertex shader. This syntax indicates that the shader variable will receive its data from attribute index 0 in the bound vertex array object. In host code, the mapping is created using the
在直通顶点着色器中。此语法表示着色器变量将从绑定顶点数组对象中的属性索引 0 接收其数据。在主机代码中,使用
glEnableVertexAttribArray(0);
glBindBuffer(GL_ARRAY_BUFFER, triangleVBO[0]);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat), 0);
calls. The first call enables the vertex attribute index (in this case, 0). The next two calls connect the previously defined vertex buffer object that holds the vertices to the vertex attribute itself. Because glVertexAttribPointer uses the currently bound VBO, it is important that glBindBuffer is issued before assigning the vertex attribute pointer. These function calls create a mapping that binds the vertices in our vertex buffer to the in_Position variable within the vertex shader. The glVertexAttribPointer call seems complicated, but it basically sets attribute index 0 to hold three components (e.g., x, y, z) of type GLfloat (the second and third arguments) that are not normalized (the fourth argument). The fifth argument is the stride: it tells OpenGL that the starts of consecutive vertices are separated by three float values, expressed in bytes as 3 * sizeof(GLfloat). In other words, the vertices are tightly packed in memory, one after the other. The final argument has a pointer type, but because a vertex buffer has been bound prior to this call, it is interpreted as a byte offset into that buffer; here, 0 means the data starts at the beginning of the buffer.
调用。第一个调用启用顶点属性索引(在本例中为 0)。接下来的两个调用将先前定义的保存顶点的顶点缓冲区对象连接到顶点属性本身。由于glVertexAttribPointer使用当前绑定的 VBO,因此在分配顶点属性指针之前发出glBindBuffer非常重要。这些函数调用创建一个映射,将顶点缓冲区中的顶点绑定到顶点着色器中的in_Position变量。glVertexAttribPointer 调用看起来很复杂,但它基本上设置属性索引 0 来保存GLfloats的三个组件(例如x、y、z )(第二和第三个参数)未规范化(第四个参数)。第五个参数指示 OpenGL 三个浮点值将每个顶点集的起点分开。换句话说,顶点紧密地一个接一个地打包在内存中。最后一个参数是指向数据的指针,但由于在此调用之前已经绑定了顶点缓冲区,因此数据将与顶点缓冲区相关联。
The previous steps that initialize and construct the vertex array object, the vertex buffer objects, and the shaders should all be executed prior to entering the display loop. All memory from the vertex buffer will have been transferred to the GPU and the vertex array objects will make the connection between the data and shader input variable indexes. In the display loop, the following calls will trigger the processing of the vertex array object:
初始化和构造顶点数组对象、顶点缓冲区对象和着色器的上述步骤都应在进入显示循环之前执行。顶点缓冲区的所有内存都将被传输到 GPU,并且顶点数组对象将在数据和着色器输入变量索引之间建立连接。在显示循环中,以下调用将触发顶点数组对象的处理:
glBindVertexArray(VAO);
glDrawArrays(GL_TRIANGLES, 0, 3);
glBindVertexArray(0);
Note again that a bind call makes the vertex array object active. The call to glDrawArrays initiates the pipeline for this geometry, specifying that the geometry should be interpreted as a series of triangle primitives, starting at offset 0 and rendering only three vertices. In this example, there are only three vertices in the array and the primitive is a triangle, so a single triangle will be rendered.
再次注意,绑定调用使顶点数组对象处于活动状态。对glDrawArrays的调用启动了此几何图形的管道,描述了几何图形应被解释为一系列三角形图元,从偏移量 0 开始,并且仅渲染三个索引。在此示例中,数组中只有三个元素,图元是一个三角形,因此将渲染一个三角形。
Combining all of these steps, the assembled code for the triangle would resemble the following, assuming that shader and vertex data loading are contained in external functions:
结合所有这些步骤,假设着色器和顶点数据加载包含在外部函数中,三角形的汇编代码将类似于以下内容:
// Set the viewport once
int nx, ny;
glfwGetFramebufferSize(window, &nx, &ny);
glViewport(0, 0, nx, ny);
// Set clear color state
glClearColor( 0.0f, 0.0f, 0.0f, 1.0f );
// Create the Shader programs, VBO, and VAO
GLuint shaderID = loadPassthroughShader();
GLuint VAO = loadVertexData();
while (!glfwWindowShouldClose(window))
{
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glUseProgram( shaderID );
glBindVertexArray(VAO);
glDrawArrays(GL_TRIANGLES, 0, 3);
glBindVertexArray(0);
glUseProgram( 0 );
// Swap front and back buffers
glfwSwapBuffers(window);
// Poll for events
glfwPollEvents();
if (glfwGetKey( window, GLFW_KEY_ESCAPE ) == GLFW_PRESS)
glfwSetWindowShouldClose(window, 1);
}
Figure 17.3 shows the result of using the shaders and vertex state to render the canonical view volume triangle.
图 17.3展示了使用着色器和顶点状态渲染规范视点三角形的结果。
Figure 17.3. The canonical triangle rendered using the simple vertex and fragment shaders.
图 17.3.使用简单的顶点和片段着色器渲染的标准三角形。
Current versions of OpenGL have removed the matrix stacks that were once used to reference the projection and modelview matrices from the hardware. Because these matrix stacks no longer exist, the programmer must write matrix code that can be transferred to vertex shaders where the transformations will occur. That initially may seem challenging. However, several libraries and toolkits have been developed to assist with cross-platform development of OpenGL code. One of these libraries, GLM, or OpenGL Mathematics, has been developed to track the OpenGL and GLSL specifications closely so that interoperation between GLM and the hardware will work seamlessly.
OpenGL 的当前版本已删除了曾经用于从硬件引用投影和模型视图矩阵的矩阵堆栈。由于这些矩阵堆栈不再存在,程序员必须编写可以传输到将发生转换的顶点着色器的矩阵代码。这最初似乎很有挑战性。但是,已经开发了多个库和工具包来协助跨平台开发 OpenGL 代码。其中一个库 GLM 或 OpenGL Mathematics 已经开发出来以密切跟踪 OpenGL 和 GLSL 规范,以便 GLM 和硬件之间的互操作可以无缝运行。
GLM provides several basic math types useful to computer graphics. For our purposes, we will focus on just a few types and a handful of functions that make use of matrix transforms within the shaders easy. A few types that will be used include the following:
GLM 提供了几种对计算机图形有用的基本数学类型。为了便于理解,我们将只关注几种类型和一些函数,这些函数可轻松在着色器中使用矩阵变换。将使用的类型包括:
glm::vec3—a compact array of 3 floats that can be accessed using the same component-wise access found in the shaders;
glm::vec3 — 一个由 3 个浮点数组成的紧凑数组,可以使用着色器中相同的组件访问方式进行访问;
glm::vec4—a compact array of 4 floats that can be accessed using the same component-wise access found in the shaders;
glm::vec4 — 一个由 4 个浮点数组成的紧凑数组,可以使用着色器中相同的组件访问方式进行访问;
glm::mat4—a 4 × 4 matrix storage represented as 16 floats. The matrix is stored in column-major format.
glm::mat4 — 一个 4 × 4 矩阵存储,表示为 16 个浮点数。该矩阵以列主格式存储。
Similarly, GLM provides functions for creating the projection matrices, $\mathbf{M}_{\text{orth}}$ and $\mathbf{M}_{p}$, as well as functions for generating the view matrix, $\mathbf{M}_{\text{cam}}$:
类似地,GLM 提供了创建投影矩阵M orth和M p 的函数,以及生成视图矩阵M cam 的函数:
glm::ortho creates a 4 × 4 orthographic projection matrix.
glm::ortho创建一个 4 × 4 正交投影矩阵。
glm::perspective creates the 4 × 4 perspective matrix.
glm::perspective创建 4×4 透视矩阵。
glm::lookAt creates the 4 × 4 homogeneous transform that translates and orients the camera.
glm::lookAt创建 4×4 齐次变换,用于平移和调整相机方向。
A simple extension to the previous example would be to place the triangle vertices into a more flexible coordinate system and render the scene using an orthographic projection. The vertices in the previous example could become:
对上一个示例的一个简单扩展是将三角形顶点放入更灵活的坐标系中,并使用正交投影渲染场景。上一个示例中的顶点可以变成:
GLfloat vertices[] = {-3.0f, -3.0f, 0.0f, 3.0f, -3.0f, 0.0f, 0.0f, 3.0f, 0.0f};
Using GLM, an orthographic projection can be created easily on the host. For instance,
使用 GLM,可以在主机上轻松创建正交投影。例如,
glm::mat4 projMatrix = glm::ortho(-5.0f, 5.0f, -5.0f, 5.0f, -10.0f, 10.0f);
The projection matrix can then be applied to each vertex transforming it into clip coordinates. The vertex shader will be modified to perform this operation:
然后可以将投影矩阵应用于每个顶点,将其转换为裁剪坐标。顶点着色器将被修改以执行此操作:
$\mathbf{v}_{\text{canon}} = \mathbf{M}_{\text{orth}}\,\mathbf{v}.$
v canon = M orth v 。
This computation will occur in a modified vertex shader that uses uniform variables to communicate data from the host to the device. Uniform variables represent data that is invariant across one execution of a shader program: the value is the same for every vertex or fragment processed and does not change while the shader runs. However, uniform variables can be modified by an application between executions of a shader. This is the primary mechanism by which the host application communicates changes to shader computations. Uniform data often represent the graphics state associated with an application. For instance, the projection, view, or model matrices can be set and accessed through uniform variables. Information about light sources within a scene may also be obtained through uniform variables.
此计算将在经过修改的顶点着色器中进行,该着色器使用统一变量将数据从主机传递到设备。统一变量表示在着色器程序执行过程中保持不变的静态数据。所有元素的数据都是相同的,并且保持静态。但是,应用程序可以在着色器执行之间修改统一变量。这是主机应用程序中的数据可以将更改传达给着色器计算的主要机制。统一数据通常表示与应用程序相关的图形状态。例如,可以通过统一变量设置和访问投影、视图或模型矩阵。也可以通过统一变量获取有关场景内光源的信息。
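As a CPU-side check of the product of the orthographic matrix and a vertex, the sketch below builds an orthographic matrix by hand and applies it to one of the triangle's vertices. This is not GLM's implementation. It is a hand-written matrix following the convention commonly used for OpenGL (near and far given as distances in front of the viewer, canonical volume [-1, 1] in each axis), which is assumed here to match the parameters used above. The names Vec4, Mat4, ortho, and mul are hypothetical.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
// 4x4 matrix stored column-major: element (row, col) is at index col * 4 + row.
using Mat4 = std::array<float, 16>;

// Hand-written orthographic projection that maps the box [l, r] x [b, t]
// with near/far distances n and f onto the canonical cube [-1, 1]^3.
Mat4 ortho(float l, float r, float b, float t, float n, float f) {
    Mat4 m{};  // all elements start at zero
    m[0]  = 2.0f / (r - l);       // x scale
    m[5]  = 2.0f / (t - b);       // y scale
    m[10] = -2.0f / (f - n);      // z scale (viewer looks down -z)
    m[12] = -(r + l) / (r - l);   // x translation
    m[13] = -(t + b) / (t - b);   // y translation
    m[14] = -(f + n) / (f - n);   // z translation
    m[15] = 1.0f;
    return m;
}

// Matrix-vector product m * v for a column-major matrix.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[row] += m[col * 4 + row] * v[col];
    return out;
}
```

With the bounds -5 to 5 in x and y, the vertex (3, -3, 0, 1) maps to approximately (0.6, -0.6, 0, 1), which lies inside the canonical volume as expected.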
Modifying the vertex shader requires adding a uniform variable to hold the projection matrix. We can use GLSL’s mat4 type to store this data. The projection matrix can then be used naturally to transform the incoming vertices into the canonical coordinate system:
修改顶点着色器需要添加一个统一变量来保存投影矩阵。我们可以使用 GLSL 的mat4类型来存储此数据。然后可以自然地使用投影矩阵将传入的顶点转换为规范坐标系:
#version 330 core
layout(location=0) in vec3 in_Position;
uniform mat4 projMatrix;
void main(void)
{
gl_Position = projMatrix * vec4(in_Position, 1.0);
}
The application code need only transfer the uniform variable from the host memory (a GLM mat4) into the device’s shader program (a GLSL mat4). This is easy enough, but requires that the host side of the application acquire a handle to the uniform variable after the shader program has been linked. For instance, to obtain a handle to the projMatrix variable, the following call would be issued once, after shader program linking is complete:
应用程序代码只需将统一变量从主机内存(GLM mat4)传输到设备的着色器程序(GLSL mat4)中。这很容易,但要求应用程序的主机端在链接着色器程序后获取统一变量的句柄。例如,要获取projMatrix变量的句柄,在着色器程序链接完成后,将发出一次以下调用:
GLint pMatID = glGetUniformLocation(shaderID, "projMatrix");
The first argument is the shader program object handle and the second argument is the character string of the variable name in the shader. The returned location can then be used with a variety of OpenGL glUniform function calls to transfer data from host memory to the device. However, a shader program must first be bound before the value of one of its uniform variables is set. Also, because GLM is used to store the projection matrix on the host, a GLM helper function will be used to obtain a pointer to the underlying matrix, and allow the copy to proceed.
第一个参数是着色器程序对象句柄,第二个参数是着色器中变量名称的字符串。然后可以使用 id 和各种 OpenGL glUniform函数调用将主机上的内存传输到设备中。但是,在设置与统一变量相关的值之前,必须先绑定着色器程序。此外,由于 GLM 用于在主机上存储投影矩阵,因此将使用 GLM 辅助函数来获取指向底层矩阵的指针,并允许复制继续进行。
glUseProgram( shaderID );
glUniformMatrix4fv(pMatID, 1, GL_FALSE, glm::value_ptr(projMatrix));
glBindVertexArray(VAO);
glDrawArrays(GL_TRIANGLES, 0, 3);
glBindVertexArray(0);
glUseProgram( 0 );
Notice the form that glUniform takes. The function name ends with characters that help define how it is used. In this case, a single 4 × 4 matrix of floats is being transferred into the uniform variable. The v indicates that the data is passed as an array (through a pointer) rather than by value. The second argument is the number of matrices being set (here, 1), the third argument lets OpenGL know whether the matrix should be transposed (a potentially handy feature), and the last argument is a pointer to the memory where the matrix resides.
注意glUniform所采用的形式。函数名称以有助于定义其用法的字符结尾。在本例中,单个 4 × 4 浮点矩阵被传输到 uniform 变量中。v 表示数组包含数据,而不是按值传递。第三个参数让 OpenGL 知道矩阵是否应该转置(一个可能很方便的功能),最后一个参数是指向矩阵所在内存的指针。
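The transpose flag matters when the host stores matrices differently from what OpenGL expects. With the flag set to GL_FALSE, the 16 floats are read in column-major order, which is how GLM stores its matrices, so no conversion is needed in the example above. If a host matrix were stored row-major instead, it would either be uploaded with the flag set to GL_TRUE or converted first. The sketch below shows the conversion on the CPU; Mat4 and toColumnMajor are hypothetical names.

```cpp
#include <array>

using Mat4 = std::array<float, 16>;

// Convert a 4x4 matrix from row-major storage (index row * 4 + col) to
// column-major storage (index col * 4 + row). This is a storage transpose.
Mat4 toColumnMajor(const Mat4& rowMajor) {
    Mat4 out{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[col * 4 + row] = rowMajor[row * 4 + col];
    return out;
}
```

For a translation by (1, 2, 3), the row-major array holds the offsets at indices 3, 7, and 11, while the column-major result holds them at indices 12, 13, and 14.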
By this section of the chapter, you should have a sense of the role that shaders and vertex data play in rendering objects with OpenGL. Shaders, in particular, play a very important role in modern OpenGL. The remaining sections will further explore the role of shaders in rendering scenes, attempting to build upon the role that shaders play in other rendering styles presented in this book.
通过本章的这一部分,您应该了解着色器和顶点数据在使用 OpenGL 渲染对象时所起的作用。着色器在现代 OpenGL 中起着非常重要的作用。其余部分将进一步探讨着色器在渲染场景中的作用,并尝试以着色器在本书介绍的其他渲染样式中所起的作用为基础。
The previous examples specified a single triangle with no additional data. Vertex attributes, such as normal vectors, texture coordinates, or even colors, can be interleaved with the vertex data in a vertex buffer. The memory layout is straightforward. Below, the color of each vertex is set after each vertex in the array. Three components are used to represent the red, green, and blue channels. Allocating the vertex buffer is identical with the exception being that the size of the array is now 18 GLfloats instead of 9.
前面的示例指定了一个三角形,没有其他数据。顶点属性(例如法线向量、纹理坐标甚至颜色)可以与顶点缓冲区中的顶点数据交错。内存布局很简单。下面,每个顶点的颜色在数组中的每个顶点之后设置。三个组件用于表示红色、绿色和蓝色通道。分配顶点缓冲区是相同的,只是数组的大小现在是 18 个GLfloat,而不是 9 个。
GLfloat vertexData[] = {0.0f, 3.0f, 0.0f, 1.0f, 1.0f, 0.0f, -3.0f, -3.0f, 0.0f, 0.0f, 1.0f, 1.0f, 3.0f, -3.0f, 0.0f, 1.0f, 0.0f, 1.0f};
The vertex array object specification is different. Because the color data is interleaved between vertices, the vertex attribute pointers must stride across the data appropriately. The second vertex attribute index must also be enabled. Building off the previous examples, we construct the new VAO as follows:
顶点数组对象规范有所不同。由于颜色数据在顶点之间交错,因此顶点属性指针必须适当地跨过数据。还必须启用第二个顶点属性索引。基于前面的示例,我们构造新的 VAO 如下:
glBindBuffer(GL_ARRAY_BUFFER, triangleVBO[0]);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(GLfloat),
0);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(GLfloat),
(const GLvoid *)12);
A single VBO is used and bound prior to setting the attributes since both vertex and color data are contained within the VBO. The first vertex attribute is enabled at index 0, which will represent the vertices in the shader. Note that the stride (the fifth argument) is different, as the starts of consecutive vertices are now separated by six floats (e.g., the x, y, z of the vertex followed by the r, g, b of the color). The second vertex attribute index is enabled and will represent the vertex color attributes in the shader at location 1. It has the same stride, but the last argument now represents the byte offset of the start of the first color value. While 12 is used in the above example, this is identical to stating 3 * sizeof(GLfloat) when a GLfloat occupies four bytes. In other words, we need to jump across the three floats representing the vertex x, y, z values to locate the first color attribute in the array.
由于顶点和颜色数据都包含在 VBO 中,因此在设置属性之前,会使用并绑定单个 VBO。第一个顶点属性在索引 0 处启用,它将表示着色器中的顶点。请注意,步幅(第 5 个参数)不同,因为顶点由六个浮点数分隔(例如,顶点的x、y、z后跟颜色的r、g、b )。第二个顶点属性索引已启用,它将表示着色器中位置 1 处的顶点颜色属性。它具有相同的步幅,但最后一个参数现在表示第一个颜色值的起始指针偏移量。虽然在上面的例子中使用了 12,但这与声明3 * sizeof(GLfloat)相同。换句话说,我们需要跳过表示顶点x、y、z值的三个浮点数来定位数组中的第一个颜色属性。
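The stride and offset arithmetic can be emulated on the CPU to see which floats each attribute index would pick up. The following is only a sketch of the addressing rule, not of what the driver does internally. It assumes float stands in for GLfloat, and fetchAttribute and the constant names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// The interleaved array from the text: position (x, y, z) then color (r, g, b).
const float vertexData[] = {
     0.0f,  3.0f, 0.0f,  1.0f, 1.0f, 0.0f,
    -3.0f, -3.0f, 0.0f,  0.0f, 1.0f, 1.0f,
     3.0f, -3.0f, 0.0f,  1.0f, 0.0f, 1.0f};

// Byte distance between the starts of consecutive vertices.
constexpr std::size_t kStride = 6 * sizeof(float);
// Byte offsets of each attribute within one vertex.
constexpr std::size_t kPositionOffset = 0;
constexpr std::size_t kColorOffset = 3 * sizeof(float);

// Emulate locating a three-float attribute for vertex `index`:
// start address = buffer + offset + index * stride.
std::array<float, 3> fetchAttribute(const void* buffer, std::size_t stride,
                                    std::size_t offset, std::size_t index) {
    const unsigned char* bytes = static_cast<const unsigned char*>(buffer);
    std::array<float, 3> out{};
    std::memcpy(out.data(), bytes + offset + index * stride, 3 * sizeof(float));
    return out;
}
```

Calling fetchAttribute(vertexData, kStride, kColorOffset, 1) yields the color of the second vertex, (0, 1, 1), while using kPositionOffset with index 2 yields the third position, (3, -3, 0).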
The shaders for this example are only slightly modified. The primary differences in the vertex shader (shown below) are (1) the second attribute, color, is at location 1 and (2) vColor is an output variable that is set in the main body of the vertex shader.
本例中的着色器仅做了轻微修改。顶点着色器(如下所示)的主要区别在于:(1) 第二个属性 color 位于位置 1;(2) vColor是在顶点着色器主体中设置的输出变量。
#version 330 core
layout(location=0) in vec3 in_Position;
layout(location=1) in vec3 in_Color;
out vec3 vColor;
uniform mat4 projMatrix;
void main(void)
{
vColor = in_Color;
gl_Position = projMatrix * vec4(in_Position, 1.0);
}
Recall that the keywords in and out refer to the flow of data between shaders. Data that flows out of the vertex shader becomes input data in the connected fragment shader, provided that the variable names match up. Moreover, out variables that are passed to fragment shaders are interpolated across the fragments using barycentric interpolation. Some modification of the interpolation can be achieved with additional keywords, but this detail will be left to the reader. In this example, three vertices are specified, each with a specific color value. Within the fragment shader, the colors will be interpolated across the face of the triangle.
回想一下,关键字in和out 指的是着色器之间的数据流。如果变量名称匹配,从顶点着色器流出的数据将成为所连接片段着色器中的输入数据。此外,传递给片段着色器的输出变量使用重心插值在片段之间进行插值。可以使用其他关键字对插值进行一些修改,但这个细节将留给读者。在此示例中,指定了三个顶点,每个顶点都有特定的颜色值。在片段着色器中,颜色将在三角形的面上进行插值。
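The interpolation that produces the per-fragment color can be written out directly. The sketch below blends three vertex colors with barycentric weights that sum to one. It is a simplified CPU illustration of the idea rather than a description of the hardware rasterizer, and Vec3 and interpolate are hypothetical names.

```cpp
#include <array>

using Vec3 = std::array<float, 3>;

// Blend three per-vertex colors with barycentric weights (a, b, c),
// where a + b + c = 1 and each weight belongs to one vertex.
Vec3 interpolate(const Vec3& c0, const Vec3& c1, const Vec3& c2,
                 float a, float b, float c) {
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        out[i] = a * c0[i] + b * c1[i] + c * c2[i];
    return out;
}
```

With the vertex colors of this example, (1, 1, 0), (0, 1, 1), and (1, 0, 1), weights of (1, 0, 0) reproduce the first vertex color exactly, and equal weights of one third give a gray of roughly (0.667, 0.667, 0.667) at the centroid.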
The fragment shader changes are simple. The vColor variable that was set and passed out of the vertex shader now becomes an in variable. As fragments are processed, the vColor vec3 will contain the correctly interpolated values based on the location of the fragment within the triangle.
片段着色器的变化很简单。设置并从顶点着色器中传出的vColor变量现在变成了输入变量。在处理片段时, vColor vec3将包含基于片段在三角形内的位置的正确插值。
#version 330 core
layout(location=0) out vec4 fragmentColor;
in vec3 vColor;
void main(void)
{
fragmentColor = vec4(vColor, 1.0);
}
The image that results from running this shader with the triangle data is shown in Figure 17.4.
使用三角形数据运行此着色器得到的图像如图 17.4所示。
Figure 17.4. Setting the colors of each vertex in the vertex shader and passing the data to the fragment shader results in barycentric interpolation of the colors.
图 17.4.在顶点着色器中设置每个顶点的颜色并将数据传递给片段着色器会导致颜色的重心插值。
The previous example illustrates the interleaving of data in an array. Vertex buffers can be used in a variety of ways, including separate vertex buffers for different model attributes. Interleaving data has advantages as the attributes associated with a vertex are near the vertex in memory and can likely take advantage of memory locality when operating in the shaders. While the use of these interleaved arrays is straightforward, it can become cumbersome to manage large models in this way, especially as data structures are used for building robust (and sustainable) software infrastructure for graphics (see Chapter 12). It is rather simple to store vertex data as vectors of structs that contain the vertex and any related attributes. When done this way, the structure need only be mapped into the vertex buffer. For instance, the following structure contains the vertex position and vertex color, using GLM’s vec3 type:
上例说明了数组中数据的交错。顶点缓冲区有多种使用方式,包括为不同的模型属性使用单独的顶点缓冲区。交错数据具有优势,因为与顶点相关的属性在内存中靠近顶点,并且在着色器中操作时可能利用内存局部性。虽然使用这些交错数组很简单,但以这种方式管理大型模型可能会变得很麻烦,尤其是当数据结构用于构建强大(且可持续)的图形软件基础架构时(参见第 12 章)。将顶点数据存储为包含顶点和任何相关属性的结构向量相当简单。以这种方式完成时,只需将结构映射到顶点缓冲区中。例如,以下结构包含顶点位置和顶点颜色,使用 GLM 的vec3类型:
struct vertexData
{
glm::vec3 pos;
glm::vec3 color;
};
std::vector< vertexData > modelData;
The STL vector will hold all vertices related to all the triangles in the model. We will continue to use the same layout for triangles as in previous examples, which is a basic list of independent triangles: every three vertices represent one triangle in the list. There are other data organizations that can be used with OpenGL, and Chapter 12 presents other options for organizing data more efficiently.
STL 向量将保存与模型中所有三角形相关的所有顶点。我们将继续使用与前面的示例相同的三角形布局,即基本三角形带。每三个顶点代表列表中的一个三角形。还有其他可用于 OpenGL 的数据组织,第 12 章介绍了更有效地组织数据的其他选项。
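The layout in which every three vertices form one triangle corresponds to drawing with GL_TRIANGLES, as in the glDrawArrays calls above. To make the difference from a triangle strip concrete, the sketch below lists which vertex indices form each triangle under the two organizations. It is a simplified illustration: the function names are hypothetical, and the alternating winding order that strips use is ignored.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Triangle = std::array<std::size_t, 3>;

// Independent triangles (as with GL_TRIANGLES): every three consecutive
// vertices form one triangle; leftover vertices are ignored.
std::vector<Triangle> assembleTriangles(std::size_t first, std::size_t count) {
    std::vector<Triangle> tris;
    for (std::size_t i = 0; i + 2 < count; i += 3)
        tris.push_back(Triangle{first + i, first + i + 1, first + i + 2});
    return tris;
}

// Triangle strip (as with GL_TRIANGLE_STRIP): each vertex after the first
// two forms a triangle with the two vertices before it.
std::vector<Triangle> assembleStrip(std::size_t first, std::size_t count) {
    std::vector<Triangle> tris;
    for (std::size_t i = 0; i + 2 < count; ++i)
        tris.push_back(Triangle{first + i, first + i + 1, first + i + 2});
    return tris;
}
```

Six vertices therefore produce two triangles as a list but four triangles as a strip, which is why strips can describe connected surfaces with fewer vertices.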
Once the data is loaded into the vector, the same calls as before transfer it into the vertex buffer object:
一旦将数据加载到向量中,就会使用与将数据加载到顶点缓冲区对象之前相同的调用:
int numBytes = modelData.size() * sizeof(vertexData);
glBufferData(GL_ARRAY_BUFFER, numBytes, modelData.data(), GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
STL vectors store data contiguously. The vertexData struct used above has a flat memory layout (it does not contain pointers to other data elements), so the contents of the vector form one contiguous block. However, the STL vector is an abstraction, and the pointer to the underlying memory must be obtained using the data() member. That pointer is provided to the call to glBufferData. Attribute assignment in the vertex array object is identical to the previous example, as the memory layout of the vertex attributes remains the same.
STL 向量连续存储数据。上面使用的vertexData结构由平面内存布局表示(它不包含指向其他数据元素的指针),并且是连续的。但是,STL 向量是一种抽象,引用底层内存的指针必须使用data()成员进行查询。该指针提供给对glBufferData的调用。顶点数组对象中的属性分配是相同的,因为顶点属性的局部性保持不变。
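The claim that the struct has the same layout as the interleaved float array can be checked at compile time. The sketch below uses a hypothetical Vec3 struct of three floats as a stand-in for glm::vec3, which is an assumption made so the code compiles without GLM. The stride and offsets are taken from the struct itself rather than written as literal numbers.

```cpp
#include <cstddef>
#include <vector>

// Assumption: a stand-in for glm::vec3 (three tightly packed floats).
struct Vec3 { float x, y, z; };

// Same shape as the struct in the text.
struct vertexData {
    Vec3 pos;
    Vec3 color;
};

// Stride and attribute offsets derived from the struct layout.
constexpr std::size_t kStride = sizeof(vertexData);
constexpr std::size_t kPosOffset = offsetof(vertexData, pos);
constexpr std::size_t kColorOffset = offsetof(vertexData, color);

// Compile-time checks that the layout matches six interleaved floats.
static_assert(kStride == 6 * sizeof(float), "vertexData should have no padding");
static_assert(kPosOffset == 0, "position comes first");
static_assert(kColorOffset == 3 * sizeof(float), "color follows the position");

// Number of bytes to upload for a whole model.
std::size_t bufferBytes(const std::vector<vertexData>& model) {
    return model.size() * sizeof(vertexData);
}
```

Using kStride and kColorOffset in place of the literal 6 * sizeof(GLfloat) and 12 keeps the attribute setup correct if fields are later added to the struct.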
The graphics pipeline chapter (Chapter 8) and the surface shading chapter (Chapter 10) describe and illustrate the effects of per-vertex and per-fragment shading as they relate to rasterization and shading in general. With modern graphics hardware, applying shading algorithms in the fragment processor produces better visual results and more accurately approximates lighting. Shading that is computed on a per-vertex basis is often subject to visual artifacts related to the underlying geometry tessellation. In particular, per-vertex shading often fails to approximate the appropriate intensities across the face of the triangle since the lighting is only calculated at each vertex. For example, when the distance to the light source is small compared with the size of the face being shaded, the illumination on the face will be incorrect. Figure 17.5 illustrates this situation. The center of the triangle will not be illuminated brightly, despite being very close to the light source, since the lighting computed at the vertices, which are far from the light source, is used to interpolate the shading across the face. Of course, increasing the tessellation of the geometry can improve the visuals. However, this solution is of limited use in real-time graphics, as the added geometry required for more accurate illumination can result in slower rendering.
图形管道章节(第 8 章)和表面着色章节(第 10 章)很好地描述和说明了逐顶点和逐片段着色的效果,因为它们与光栅化和着色总体相关。借助现代图形硬件,在片段处理器中应用着色算法可以产生更好的视觉效果,并更准确地近似照明。基于逐顶点计算的着色通常会受到与底层几何镶嵌相关的视觉伪影的影响。特别是,基于逐顶点的着色通常无法近似三角形面上的适当强度,因为只在每个顶点处计算照明。例如,当与被着色面的大小相比,到光源的距离较小时,面上的照明将不正确。图 17.5说明了这种情况。尽管三角形中心非常靠近光源,但它不会被照亮,因为远离光源的顶点上的光照用于对整个面进行阴影插值。当然,增加几何体的镶嵌可以改善视觉效果。然而,这种解决方案在实时图形中的用途有限,因为更精确的照明所需的额外几何体可能会导致渲染速度变慢。
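The situation can be sketched numerically on the CPU. The fragment below is a simplified illustration, not shader code: it evaluates only the Lambertian term max(0, n · l) for a point light, once by lighting the three vertices and averaging (the interpolated value at the centroid) and once by lighting the centroid directly. The Vec3 type and all function names are assumptions introduced for this example.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 normalize(Vec3 a)
{
    float len = std::sqrt(dot(a, a));
    return {a.x / len, a.y / len, a.z / len};
}

// Lambertian term max(0, n . l) at point p for a point light at lightPos.
float diffuse(Vec3 p, Vec3 n, Vec3 lightPos)
{
    return std::max(0.0f, dot(n, normalize(sub(lightPos, p))));
}

// Per-vertex shading: light the three vertices, then interpolate.
// At the centroid all three barycentric weights equal 1/3.
float perVertexAtCentroid(const Vec3 tri[3], Vec3 n, Vec3 lightPos)
{
    return (diffuse(tri[0], n, lightPos) +
            diffuse(tri[1], n, lightPos) +
            diffuse(tri[2], n, lightPos)) / 3.0f;
}

// Per-fragment shading: evaluate the lighting at the centroid itself.
float perFragmentAtCentroid(const Vec3 tri[3], Vec3 n, Vec3 lightPos)
{
    Vec3 c = {(tri[0].x + tri[1].x + tri[2].x) / 3.0f,
              (tri[0].y + tri[1].y + tri[2].y) / 3.0f,
              (tri[0].z + tri[1].z + tri[2].z) / 3.0f};
    return diffuse(c, n, lightPos);
}
```

For a large triangle with the light hovering one unit above its centroid, the per-fragment value is close to 1 while the interpolated per-vertex value stays well under 0.1, which is the dark center described above.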
Figure 17.5. The distance to the light source is small relative to the size of the triangle.
图 17.5.相对于三角形的大小,到光源的距离较小。
Fragment shaders operate on the fragments that emerge from rasterization after vertices have been transformed and clipped. Generally speaking, fragment shaders must output a value that is written to a framebuffer. Often, this is the color of the pixel. If the depth test is enabled, the fragment’s depth value will be used to control whether the color and its depth are written to the framebuffer memory. The data that fragment shaders use for computation comes from various sources:
片段着色器对顶点变换和裁剪后从光栅化中出现的片段进行操作。一般来说,片段着色器必须输出一个写入帧缓冲区的值。通常,这是像素的颜色。如果启用了深度测试,片段的深度值将用于控制是否将颜色及其深度写入帧缓冲区内存。片段着色器用于计算的数据来自各种来源:
Built-in OpenGL variables. These variables are provided by the system. Examples of fragment shader variables include gl_FragCoord or gl_FrontFacing. These variables can change based on revisions to OpenGL and GLSL, so it is advised that you check the specification for the version of OpenGL and GLSL that you are targeting.
内置 OpenGL 变量。这些变量由系统提供。片段着色器变量的示例包括gl_FragCoord或gl_FrontFacing 。这些变量可能会根据 OpenGL 和 GLSL 的修订而发生变化,因此建议您检查目标 OpenGL 和 GLSL 版本的规范。
Uniform variables. Uniform variables are transferred from the host to the device and can change as needed based on user input or changing simulation state in the application. These variables are declared and defined by the programmer for use within both vertex and fragment shaders. The projection matrix in the previous vertex shader examples was communicated to the shader via a uniform variable. If needed, the same uniform variable names can be used within both vertex and fragment shaders.
统一变量。统一变量从主机传输到设备,可以根据用户输入或应用程序中模拟状态的变化而根据需要进行更改。这些变量由程序员声明和定义,供顶点和片段着色器使用。前面顶点着色器示例中的投影矩阵通过统一变量传达给着色器。如果需要,可以在顶点和片段着色器中使用相同的统一变量名称。
Input variables. Input variables are specified in the fragment shader with the prefixed keyword in. Recall that data can flow into and out of shaders. Vertex shaders can output data to the next shader stage using the out keyword (e.g., out vec3 vColor, in a previous example). The outputs are linked to inputs when the next stage uses an in keyword followed by the same type and name qualifiers (e.g., in vec3 vColor in the previous example’s corresponding fragment shader).
输入变量。输入变量在片段着色器中用前缀关键字in指定。回想一下,数据可以流入和流出着色器。顶点着色器可以使用out关键字将数据输出到下一个着色器阶段(例如,上例中的out vec3 vColor )。当下一个阶段使用in关键字后跟相同类型和名称限定符时,输出将链接到输入(例如,上例中相应的片段着色器中的 in vec3 vColor )。
Any data that is passed to a fragment shader through the in-out linking mechanism will vary on a per-fragment basis using barycentric interpolation. The interpolation is computed outside of the shader by the graphics hardware. Within this infrastructure, fragment shaders can be used to perform per-fragment shading algorithms that evaluate specific equations across the face of the triangle. Vertex shaders provide support computations, transforming vertices and staging intermediate per-vertex values that will be interpolated for the fragment code.
通过输入输出链接机制传递给片段着色器的任何数据都将使用重心插值根据每个片段而变化。插值由图形硬件在着色器外部计算。在此基础结构中,片段着色器可用于执行每个片段着色算法,该算法评估三角形面上的特定方程式。顶点着色器提供支持计算、转换顶点和暂存将为片段代码插值的中间每个顶点值。
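The interpolation itself is simple to sketch. The C++ fragment below is an illustrative CPU version, with hypothetical names, of blending three per-vertex vectors with barycentric weights. Note that graphics hardware performs perspective-correct interpolation by default, which this plain linear blend does not model. The sketch also shows why a fragment shader usually re-normalizes interpolated direction vectors: a blend of unit vectors is generally shorter than unit length.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Weighted sum of three per-vertex values. The weights (a, b, c) are the
// fragment's barycentric coordinates and are assumed to sum to one.
Vec3 interpolate(Vec3 v0, Vec3 v1, Vec3 v2, float a, float b, float c)
{
    return {a * v0.x + b * v1.x + c * v2.x,
            a * v0.y + b * v1.y + c * v2.y,
            a * v0.z + b * v1.z + c * v2.z};
}

float length(Vec3 v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Rescale v to unit length, as a fragment shader does with normalize().
Vec3 normalize(Vec3 v)
{
    float len = length(v);
    return {v.x / len, v.y / len, v.z / len};
}
```

Blending the three coordinate axes with equal weights, for instance, gives a vector of length of about 0.58, so it must be normalized before it is used in a dot product.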
The following shader program code implements per-fragment, Blinn-Phong shading. It brings together much of what has been presented in this chapter thus far and binds it to the shader descriptions from Chapter 4. An interleaved vertex buffer is used to contain the vertex position and normal vectors. These values manifest in the vertex shader as vertex array attributes for index 0 and index 1. The shading computations that occur in the fragment shader code are performed in camera coordinates (sometimes referred to as eye-space).
以下着色器程序代码实现了逐片段 Blinn-Phong 着色。它汇集了本章迄今为止介绍的大部分内容,并将其与第 4 章中的着色器描述绑定在一起。交错顶点缓冲区用于包含顶点位置和法线向量。这些值在顶点着色器中显示为索引 0 和索引 1 的顶点数组属性。片段着色器代码中发生的着色计算是在相机坐标(有时称为眼空间)中执行的。
The vertex shader stage of our program is used to transform the incoming vertices using the $M_{\text{model}}$ and $M_{\text{cam}}$ matrices into camera coordinates. It also uses the normal matrix, $(M^{-1})^T$, to appropriately transform the incoming normal vector attribute. The vertex shader outputs three variables to the fragment stage:
我们程序的顶点着色器阶段用于使用M模型和M相机矩阵将传入的顶点转换为相机坐标。它还使用法线矩阵 ( M –1 ) T来适当地转换传入的法线矢量属性。顶点着色器向片段阶段输出三个变量:
• normal. The vertex’s normal vector as transformed into the camera coordinate system.
•法线. 顶点的法线向量转换到相机坐标系中。
• h. The half-vector needed for Blinn-Phong shading.
• h.Blinn -Phong 着色所需的半向量。
• l. The light direction transformed into the camera coordinate system.
• l . 光线方向转换到相机坐标系中。
Each of these variables will then be available for fragment computation, after applying barycentric interpolation across the three vertices in the triangle.
在三角形的三个顶点上应用重心插值之后,每个变量都将可用于片段计算。
A single point light is used with this shader program. The light position and intensity are communicated to both the vertex and fragment shaders using a uniform variable. The light data is declared using GLSL’s struct keyword, which allows variables to be grouped together in meaningful ways. Although not presented here, GLSL supports arrays and for-loop control structures, so additional lights could easily be added to this example.
此着色器程序使用单点光源。光源位置和强度使用统一变量传达给顶点和片段着色器。光源数据使用 GLSL 的结构限定符声明,允许以有意义的方式将变量组合在一起。虽然这里没有介绍,但 GLSL 支持数组和 for 循环控制结构,因此可以轻松地将其他光源添加到此示例中。
All matrices are also provided to the vertex shader using uniform variables. For now, we will imagine that the model (or local transform) matrix will be set to the identity matrix. In the following section, more detail will be provided to expand on how the model matrix can be specified on the host using GLM.
所有矩阵也使用统一变量提供给顶点着色器。现在,我们假设模型(或局部变换)矩阵将被设置为单位矩阵。在下一节中,将提供更多细节,以扩展如何使用 GLM 在主机上指定模型矩阵。
#version 330 core
//
// Blinn-Phong Vertex Shader
//
layout(location=0) in vec3 in_Position;
layout(location=1) in vec3 in_Normal;
out vec4 normal;
out vec3 halfVec;   // half vector ("half" itself is a reserved word in GLSL)
out vec3 lightdir;
struct LightData {
    vec3 position;
    vec3 intensity;
};
uniform LightData light;
uniform mat4 projMatrix;
uniform mat4 viewMatrix;
uniform mat4 modelMatrix;
uniform mat4 normalMatrix;
void main(void)
{
    // Calculate lighting in eye space: transform the local
    // position to world and then camera coordinates.
    vec4 pos = viewMatrix * modelMatrix * vec4(in_Position, 1.0);
    vec4 lightPos = viewMatrix * vec4(light.position, 1.0);
    normal = normalMatrix * vec4(in_Normal, 0.0);
    vec3 v = normalize( -pos.xyz );
    lightdir = normalize( lightPos.xyz - pos.xyz );
    halfVec = normalize( v + lightdir );
    gl_Position = projMatrix * pos;
}
The vertex shader’s main function first transforms the position and light position into camera coordinates using vec4 types to correspond with the 4 × 4 matrices of GLSL’s mat4. We then transform the normal vector and store it in the out vec4 normal variable. The view (or eye) vector and light direction vector are then calculated, which leads to the computation of the half vector needed for Blinn-Phong shading. The final computation completes the calculation of
顶点着色器的主函数首先使用vec4类型将位置和光线位置转换为相机坐标,以与 GLSL 的mat4的 4×4 矩阵相对应。然后我们转换法线向量并将其存储在out vec4 normal变量中。然后计算视图(或眼睛)向量和光线方向向量,从而计算 Blinn-Phong 着色所需的半向量。最终计算完成
$$\mathbf{v}_{\text{canon}} = M_{\text{proj}}\, M_{\text{cam}}\, M_{\text{model}}\, \mathbf{v}$$
by applying the projection matrix. It then assigns the canonical coordinates of the vertex to the built-in GLSL vertex shader output variable gl_Position. After this, the vertex is in clip coordinates and is ready for rasterization.
通过应用投影矩阵。然后它将顶点的标准坐标设置为内置 GLSL 顶点着色器输出变量gl_Position 。此后,顶点处于剪辑坐标中并准备好进行光栅化。
The fragment shader computes the Blinn-Phong shading model. It receives barycentric interpolated values for the vertex normal, half vector, and light direction. Note that these variables are specified using the in keyword as they come in from the vertex processing stage. The light data is also shared with the fragment shader using the same uniform specification that was used in the vertex shader. The matrices are not required so no uniform matrix variables are declared. The material properties for the geometric model are communicated through uniform variables to specify $k_a$, $k_d$, $k_s$, $I_a$, and $p$. Together, the data allow the fragment shader to compute Equation 4.3:
片段着色器计算 Blinn-Phong 着色模型。它接收顶点法线、半矢量和光线方向的重心插值。请注意,这些变量是使用in关键字指定的,因为它们来自顶点处理阶段。光数据也使用与顶点着色器中相同的统一规范与片段着色器共享。矩阵不是必需的,因此无需声明统一矩阵变量。几何模型的材料属性通过统一变量传递,以指定 $k_a$、$k_d$、$k_s$、$I_a$ 和 $p$ 。这些数据共同允许片段着色器计算公式 4.3:
$$L = k_a I_a + k_d\, I \max(0, \mathbf{n} \cdot \mathbf{l}) + k_s\, I \max(0, \mathbf{n} \cdot \mathbf{h})^p$$
at each fragment.
在每个片段。
#version 330 core
//
// Blinn-Phong Fragment Shader
//
in vec4 normal;
in vec3 halfVec;    // must match the vertex shader's output name
in vec3 lightdir;
layout(location=0) out vec4 fragmentColor;
struct LightData {
    vec3 position;
    vec3 intensity;
};
uniform LightData light;
uniform vec3 Ia;
uniform vec3 ka, kd, ks;
uniform float phongExp;
void main(void)
{
    // Interpolated vectors are generally not unit length, so re-normalize.
    vec3 n = normalize(normal.xyz);
    vec3 h = normalize(halfVec);
    vec3 l = normalize(lightdir);
    vec3 intensity = ka * Ia
        + kd * light.intensity * max( 0.0, dot(n, l) )
        + ks * light.intensity
        * pow( max( 0.0, dot(n, h) ), phongExp );
    fragmentColor = vec4( intensity, 1.0 );
}
The fragment shader writes the computed intensity to the fragment color output buffer. Figure 17.6 illustrates several examples that show the effect of per-fragment shading across varying degrees of tessellation on a geometric model. This fragment shader introduces the use of structures for holding uniform variables. It should be noted that they are user-defined structures, and in this example, the LightData type holds only the light position and its intensity. In host code, the uniform variables in structures are referenced using the fully qualified variable name when requesting the handle to the uniform variable, as in:
片段着色器将计算出的强度写入片段颜色输出缓冲区。图 17.6说明了几个示例,这些示例显示了对几何模型上不同程度的细分进行逐片段着色的效果。此片段着色器引入了使用结构来保存统一变量。应该注意的是,它们是用户定义的结构,在此示例中, LightData类型仅保存光位置及其强度。在主机代码中,在请求统一变量的句柄时,使用完全限定的变量名引用结构中的统一变量,如下所示:
lightPosID = shader.createUniform( "light.position" );
lightIntensityID = shader.createUniform( "light.intensity" );
Figure 17.6. Per-fragment shading applied across increasing tessellation of a subdivision sphere. The specular highlight is apparent with lower tessellations.
图 17.6。逐片段着色应用于细分球体的不断增加的镶嵌。镜面高光在较低的镶嵌中很明显。
Once you have a working shader program, such as the Blinn-Phong one presented here, it is easy to expand your ideas and develop new shaders. It may also be helpful to develop a set of very specific shaders for debugging. One such shader is the normal shader program. Normal shading is often helpful to understand whether the incoming geometry is organized correctly or whether the computations are correct. In this example, the vertex shader remains the same. Only the fragment shader changes:
一旦你有了一个可以工作的着色器程序,比如这里介绍的 Blinn-Phong 程序,你就很容易扩展你的想法并开发新的着色器。开发一组非常具体的着色器用于调试也可能会有所帮助。一个这样的着色器就是普通着色器程序。普通着色通常有助于了解传入的几何图形是否组织正确或计算是否正确。在此示例中,顶点着色器保持不变。只有片段着色器发生变化:
#version 330 core
in vec4 normal;
layout(location=0) out vec4 fragmentColor;
void main(void)
{
// Notice the use of swizzling here to access
// only the xyz values to convert the normal vec4
// into a vec3 type!
vec3 intensity = normalize(normal.xyz) * 0.5 + 0.5;
fragmentColor = vec4( intensity, 1.0 );
}
Whichever shaders you start building, be sure to comment them! The GLSL specification allows comments to be included in shader code, so leave yourself some details that can guide you later.
无论您开始构建哪个着色器,请务必对其进行注释!GLSL 规范允许在着色器代码中包含注释,因此请留下一些细节,以便以后为您提供指导。
Once basic shaders are working, it’s interesting to start creating more complex scenes. Some 3D model files are simple to load and others require more effort. One simple 3D object file representation is the OBJ format. OBJ is a widely used format and several libraries are available to load these types of files. The array of structs mechanism presented earlier works well for containing the OBJ data on the host. It can then easily be transferred into a VBO and referenced by a vertex array object.
一旦基本着色器开始工作,开始创建更复杂的场景就很有趣了。一些 3D 模型文件很容易加载,而另一些则需要更多努力。一种简单的 3D 对象文件表示是 OBJ 格式。OBJ 是一种广泛使用的格式,有几种代码可用于加载这些类型的文件。前面介绍的结构体数组机制非常适合在主机上包含 OBJ 数据。然后可以轻松地将其传输到 VBO 和顶点数组对象中。
Many 3D models are defined in their own local coordinate systems and need various transformations to align them with the OpenGL coordinate system. For instance, when the Stanford Dragon’s OBJ file is loaded into the OpenGL coordinate system, it appears lying on its side at the origin. Using GLM, we can create the model transformations to place objects within our scenes. For the dragon model, this means rotating –90 degrees about $\vec{x}$, and then translating up in $\vec{y}$. The effective model transform becomes
许多 3D 模型都是在自己的局部坐标系中定义的,需要进行各种变换才能与 OpenGL 坐标系对齐。例如,当将斯坦福龙的 OBJ 文件加载到 OpenGL 坐标系中时,它看起来是侧躺在原点。使用 GLM,我们可以创建模型变换以将对象放置在场景中。对于龙模型,这意味着绕 $\vec{x}$ 旋转 -90 度,然后沿 $\vec{y}$ 向上平移。有效模型变换变为
$$M_{\text{model}} = M_{\text{translate}}\, M_{\text{rotX}},$$
and the dragon is presented upright and above the ground plane, as shown in Figure 17.7. To do this we utilize several functions from GLM for generating local model transforms:
龙直立在地面上方,如图 17.7所示。为此,我们利用 GLM 中的几个函数来生成局部模型变换:
glm::translate creates a translation matrix.
glm::translate创建一个翻译矩阵。
glm::rotate creates a rotation matrix for a given angle about a specific axis. Recent GLM releases expect the angle in radians, while older releases also accepted degrees.
glm::rotate创建一个围绕特定轴的旋转矩阵,以度数或弧度为单位。
glm::scale creates a scale matrix.
glm::scale创建一个比例矩阵。
Figure 17.7. Images are described from left to right. The default local orientation of the dragon, lying on its side. After a –90 degree rotation about $\vec{x}$, the dragon is upright but still centered about the origin. Finally, after applying a translation of 1.0 in $\vec{y}$, the dragon is ready for instancing.
图 17.7。图像从左到右描述。龙的默认局部方向,侧卧。绕 $\vec{x}$ 旋转 -90 度后,龙是直立的,但仍然以原点为中心。最后,在沿 $\vec{y}$ 应用 1.0 的平移后,龙已准备好实例化。
We can apply these functions to create the model transforms and pass the model matrix to the shader using uniform variables. The Blinn-Phong vertex shader contains instructions that apply the local transform to the incoming vertex. The following code shows how the dragon model is rendered:
我们可以应用这些函数来创建模型变换,并使用统一变量将模型矩阵传递给着色器。Blinn-Phong 顶点着色器包含将局部变换应用于传入顶点的指令。以下代码显示了如何渲染龙模型:
glUseProgram( BlinnPhongShaderID );
// Describe the Local Transform Matrix
glm::mat4 modelMatrix = glm::mat4(1.0f); // Identity Matrix
modelMatrix = glm::translate(modelMatrix, glm::vec3(0.0f, 1.0f, 0.0f));
float rot = glm::radians(-90.0f); // rotation angle in radians
modelMatrix = glm::rotate(modelMatrix, rot, glm::vec3(1.0f, 0.0f, 0.0f));
// Set the Normal Matrix
glm::mat4 normalMatrix = glm::transpose( glm::inverse( viewMatrix * modelMatrix ) );
// Pass the matrices to the GPU memory
glUniformMatrix4fv(nMatID, 1, GL_FALSE, glm::value_ptr(normalMatrix));
glUniformMatrix4fv(pMatID, 1, GL_FALSE, glm::value_ptr(projMatrix));
glUniformMatrix4fv(vMatID, 1, GL_FALSE, glm::value_ptr(viewMatrix));
glUniformMatrix4fv(mMatID, 1, GL_FALSE, glm::value_ptr(modelMatrix));
// Set material for this object
glm::vec3 kd( 0.2f, 0.2f, 1.0f );
glm::vec3 ka = kd * 0.15f;
glm::vec3 ks( 1.0f, 1.0f, 1.0f );
float phongExp = 32.0f;
glUniform3fv(kaID, 1, glm::value_ptr(ka));
glUniform3fv(kdID, 1, glm::value_ptr(kd));
glUniform3fv(ksID, 1, glm::value_ptr(ks));
glUniform1f(phongExpID, phongExp);
// Process the object and note that modelData.size() holds
// the number of vertices, not the number of triangles!
glBindVertexArray(VAO);
glDrawArrays(GL_TRIANGLES, 0, modelData.size());
glBindVertexArray(0);
glUseProgram( 0 );
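The order of the two GLM calls above deserves a note. Each call post-multiplies the matrix it is given, so translating first and rotating second builds the product of the translation and rotation matrices in that order, and a vertex is therefore rotated before it is translated. The plain C++ sketch below, which uses hypothetical helper names and no GLM, applies the two steps to a single point in both orders to show that the order matters.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Rotate p about the x-axis by the given angle (right-handed convention).
Vec3 rotateX(Vec3 p, float radians)
{
    float c = std::cos(radians);
    float s = std::sin(radians);
    return {p.x, c * p.y - s * p.z, s * p.y + c * p.z};
}

Vec3 translate(Vec3 p, Vec3 t)
{
    return {p.x + t.x, p.y + t.y, p.z + t.z};
}

const float kMinus90Degrees = -90.0f * 3.14159265358979f / 180.0f;

// Translation matrix times rotation matrix: rotate first, then translate.
Vec3 applyModel(Vec3 p)
{
    return translate(rotateX(p, kMinus90Degrees), {0.0f, 1.0f, 0.0f});
}

// The reversed product, for contrast: translate first, then rotate.
Vec3 applyReversed(Vec3 p)
{
    return rotateX(translate(p, {0.0f, 1.0f, 0.0f}), kMinus90Degrees);
}
```

A point one unit along the local z-axis ends up two units up the y-axis under applyModel, while applyReversed swings the translated point around the x-axis and leaves it at a different location.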
Instancing with OpenGL is implemented differently than instancing with the ray tracer. With the ray tracer, rays are inversely transformed into the local space of the object using the model transform matrix. With OpenGL, instancing is performed by loading a single copy of the object as a vertex array object (with associated vertex buffer objects), and then reusing the geometry as needed. Like the ray tracer, only a single object is loaded into memory, but many may be rendered.
OpenGL 中的实例化与光线追踪器中的实例化实现方式不同。使用光线追踪器时,光线会使用模型变换矩阵逆变换到对象的局部空间中。使用 OpenGL 时,实例化是通过将对象的单个副本加载为顶点数组对象(带有关联的顶点缓冲区对象)然后根据需要重复使用几何图形来执行的。与光线追踪器一样,只有一个对象会加载到内存中,但可能会渲染多个对象。
Modern OpenGL nicely supports this style of instancing because vertex shaders can (and must) compute the necessary transformations to transform vertices into clip coordinates. By writing generalized shaders that embed these transformations, such as the one presented with the Blinn-Phong vertex shader, models can be rerendered with the same underlying local geometry. Different material types and transforms can be queried from higher-level class structures to populate the uniform variables passed from host to device each frame. Animations and interactive control are also easily created as the model transforms can change over time across iterations of the display loop. Figures 17.8 and 17.9 use the memory footprint of one dragon, yet render three different dragon models to the screen.
现代 OpenGL 很好地支持这种实例化样式,因为顶点着色器可以(并且必须)计算将顶点转换为剪辑坐标所需的变换。通过编写嵌入这些变换的通用着色器(例如使用 Blinn-Phong 顶点着色器呈现的),可以使用相同的底层局部几何体重新渲染模型。可以从更高级别的类结构查询不同的材质类型和变换,以填充每帧从主机传递到设备的统一变量。动画和交互式控制也很容易创建,因为模型变换可以在显示循环迭代中随时间变化。图 17.8和17.9使用一条龙的内存占用,但将三个不同的龙模型渲染到屏幕上。
Figure 17.8. The results of running the Blinn-Phong shader program on the three dragons using uniform variables to specify material properties and transformations.
图 17.8.使用统一变量指定材料属性和变换,在三条龙上运行 Blinn-Phong 着色器程序的结果。
Figure 17.9. Setting the uniform variable ks = (0, 0, 0) in the Blinn-Phong shader program produces Lambertian shading.
图 17.9。在 Blinn-Phong 着色器程序中设置统一变量k s = (0, 0, 0) 会产生 Lambertian 着色。
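The effect of setting the specular coefficient to zero can be checked with a small CPU mirror of Equation 4.3. This is a sketch for experimentation, not the shader itself: the Vec3 type, the operators, and the blinnPhong function are assumptions, and the caller is expected to pass unit-length n, l, and h.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 a, Vec3 b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// L = ka*Ia + kd*I*max(0, n.l) + ks*I*max(0, n.h)^p, evaluated per channel.
Vec3 blinnPhong(Vec3 ka, Vec3 kd, Vec3 ks, Vec3 Ia, Vec3 I, float p,
                Vec3 n, Vec3 l, Vec3 h)
{
    float diff = std::max(0.0f, dot(n, l));
    float spec = std::pow(std::max(0.0f, dot(n, h)), p);
    return ka * Ia + kd * I * diff + ks * I * spec;
}
```

Passing ks = (0, 0, 0) removes the last term entirely, leaving only the ambient and diffuse contributions, which is the Lambertian result shown in Figure 17.9.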
Textures are an effective means to manipulate visual effects with OpenGL shaders. They are used extensively with many hardware-based graphics algorithms and OpenGL supports them natively with texture objects. Like the previous OpenGL concepts, texture objects must be allocated and initialized by copying data on the host to the GPU memory and setting OpenGL state. Texture coordinates are often integrated into the vertex buffer objects and passed as vertex attributes to shader programs. Fragment shaders typically perform the texture lookup function using interpolated texture coordinates passed from the vertex shaders.
纹理是使用 OpenGL 着色器操纵视觉效果的有效手段。它们广泛用于许多基于硬件的图形算法,OpenGL 通过纹理对象原生支持它们。与之前的 OpenGL 概念一样,必须通过将主机上的数据复制到 GPU 内存并设置 OpenGL 状态来分配和初始化纹理对象。纹理坐标通常集成到顶点缓冲区对象中,并作为顶点属性传递给着色器程序。片段着色器通常使用从顶点着色器传递的插值纹理坐标执行纹理查找功能。
Textures are rather simple to add to your code if you already have working shader and vertex array objects. The standard OpenGL techniques for creating objects on the hardware are used with textures. However, the source of the texture data must first be determined. Data can either be loaded from a file (e.g., PNG, JPG, EXR, or HDR image file formats) or generated procedurally on the host (and even on the GPU). After the data is loaded into host memory, the data is copied to GPU memory, and optionally, OpenGL state associated with textures can be set. OpenGL texture data is loaded as a linear buffer of memory containing the data used for textures. Texture lookups on the hardware can be 1D, 2D, or 3D queries. Regardless of the texture dimension query, the data is loaded onto the memory in the same way, using linearly allocated memory on the host. In the following example, the process of loading data from an image file (or generating it procedurally) is left to the reader, but variable names are provided that match what might be present if an image is loaded (e.g., imgData, imgWidth, imgHeight).
如果您已经有可用的着色器和顶点数组对象,那么将纹理添加到代码中相当简单。在硬件上创建对象的标准 OpenGL 技术与纹理一起使用。但是,必须首先确定纹理数据的来源。数据可以从文件(例如 PNG、JPG、EXR 或 HDR 图像文件格式)加载,也可以在主机(甚至在 GPU 上)上程序化生成。将数据加载到主机内存后,数据将复制到 GPU 内存,并且可以选择设置与纹理相关的 OpenGL 状态。OpenGL 纹理数据作为包含用于纹理的数据的内存线性缓冲区加载。硬件上的纹理查找可以是 1D、2D 或 3D 查询。无论纹理维度查询如何,数据都以相同的方式加载到内存中,使用主机上线性分配的内存。在下面的例子中,从图像文件加载数据(或在程序中生成数据)的过程留给读者,但提供的变量名与加载图像时可能存在的变量名相匹配(例如,imgData、imgWidth、imgHeight)。
float *imgData = new float[ imgHeight * imgWidth * 3 ];
...
GLuint texID;
glGenTextures(1, &texID);
glBindTexture(GL_TEXTURE_2D, texID);
// GL_CLAMP_TO_EDGE is used here; GL_CLAMP is not available in the core profile.
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
// Copy the host data to the GPU; the host copy can be released afterward.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, imgWidth, imgHeight, 0,
             GL_RGB, GL_FLOAT, imgData);
glBindTexture(GL_TEXTURE_2D, 0);
delete [] imgData;
The example presented here highlights how to set up and use basic 2D OpenGL textures with shader programs. The process for creating OpenGL objects should be familiar by now. A handle (or ID) must be generated on the device to refer to the texture object (e.g., in this case, texID). The ID is then bound so that any subsequent texture state operations affect the state of that texture. A fairly extensive set of OpenGL texture state and parameters exists that affects texture coordinate interpretation and texture lookup filtering. Various texture targets exist with graphics hardware. In this case, the texture target is specified as GL_TEXTURE_2D and will appear as the first argument in the texture-related functions. For OpenGL this particular texture target implies that texture coordinates will be specified in a device normalized manner (i.e., in the range of [0, 1]). Moreover, older OpenGL versions required texture data to be allocated so that the width and height dimensions are powers of two (e.g., 512 × 512, 1024 × 512, etc.); OpenGL 2.0 and later accept other sizes for this target, although power-of-two dimensions remain a common choice. Texture parameters are set for the currently bound texture by calling glTexParameter. The signature of this function takes on a variety of forms depending on the types of data being set. In this case, texture coordinates will be clamped by the hardware to the explicit range [0, 1]. The minifying and magnifying filters of OpenGL texture objects are set to use linear filtering (rather than nearest neighbor, GL_NEAREST) when performing texture lookups. Chapter 11 provides substantial details on texturing, including details about the filtering that can occur with texture lookups. Graphics hardware can perform many of these operations automatically by setting the associated texture state.
这里介绍的示例重点介绍了如何设置和使用着色器程序中的 2D OpenGL 基本纹理。现在您应该已经熟悉了创建 OpenGL 对象的过程。必须在设备上生成一个句柄(或 ID)来引用纹理对象(例如,在本例中为texID )。然后绑定 id 以允许任何后续纹理状态操作影响纹理的状态。存在一组相当广泛的 OpenGL 纹理状态和参数,它们会影响纹理坐标解释和纹理查找过滤。图形硬件存在各种纹理目标。在本例中,纹理目标被指定为GL_TEXTURE_2D ,并将作为纹理相关函数中的第一个参数出现。对于 OpenGL,这个特定的纹理目标意味着纹理坐标将以设备规范化的方式指定(即在 [0, 1] 范围内)。此外,必须分配纹理数据,使宽度和高度尺寸为 2 的幂(例如,512 × 512、1024 × 512 等)。通过调用glTexParameter来设置当前绑定纹理的纹理参数。此函数的签名根据设置的数据类型采用多种形式。在这种情况下,纹理坐标将由硬件限制在明确的范围 [0, 1] 内。在执行纹理查找时,OpenGL 纹理对象的缩小和放大过滤器自动设置为使用线性过滤(而不是最近邻 - GL_NEAREST )。第 11 章提供了有关纹理的大量细节,包括有关纹理查找中可能发生的过滤的细节。 图形硬件可以通过设置相关的纹理状态自动执行许多这些操作。
Finally, the call to glTexImage2D performs the host-to-device copy for the texture. There are several arguments to this function, but the overall operation is to allocate space on the graphics card for imgWidth × imgHeight texels and to copy the linear texture data (e.g., the imgData pointer) to the hardware. The 7th and 8th arguments (GL_RGB and GL_FLOAT) describe that host data as three floats per texel. The remaining arguments deal with setting the mipmap level of detail (2nd argument), specifying the internal format (e.g., 3rd argument’s GL_RGB) and whether the texture has a border or not (6th argument). When learning OpenGL textures it is safe to keep these as the defaults listed here. However, the reader is advised to learn more about mipmaps and the potential internal formats of textures as more advanced graphics processing is required.
最后,对glTexImage2D的调用执行纹理从主机到设备的复制。此函数有几个参数,但总体操作是在显卡上分配三个浮点数(第 7 和第 8 个参数: GL_RGB和GL_FLOAT )的空间(例如, imageWidth X imgHeight ),并将线性纹理数据复制到硬件(例如, imgData指针)。其余参数用于设置 mipmap 细节级别(第 2 个参数)、指定内部格式(例如,第 3 个参数的GL_RGB )以及纹理是否有边框(第 6 个参数)。学习 OpenGL 纹理时,可以安全地将这些保留为此处列出的默认值。但是,建议读者了解有关 mipmap 和纹理的潜在内部格式的更多信息,因为需要更高级的图形处理。
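The procedural generation that was left to the reader could look something like the following sketch, which fills a linear RGB float buffer with a checkerboard. The function name makeCheckerboard and its parameters are assumptions made for this illustration, and a std::vector is used in place of the raw array from the listing above.

```cpp
#include <cstddef>
#include <vector>

// Fill a width x height RGB float image with a checkerboard whose squares
// are cell x cell texels. Texel (x, y), channel c is stored at index
// (y * width + x) * 3 + c of the linear buffer.
std::vector<float> makeCheckerboard(int width, int height, int cell)
{
    std::vector<float> img(static_cast<std::size_t>(width) * height * 3);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            bool lightSquare = ((x / cell) + (y / cell)) % 2 == 0;
            float value = lightSquare ? 1.0f : 0.2f;
            std::size_t base = (static_cast<std::size_t>(y) * width + x) * 3;
            img[base + 0] = value;  // red
            img[base + 1] = value;  // green
            img[base + 2] = value;  // blue
        }
    }
    return img;
}
```

The pointer returned by the vector's data() member can then be passed as the last argument of glTexImage2D in place of imgData.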
Texture object allocation and initialization happens with the code above. Additional modifications must be made to vertex buffers and vertex array objects to link in the correct texture coordinates with the geometric description. Following the previous examples, the storage for texture coordinates is a straightforward modification to the vertex data structure:
纹理对象分配和初始化发生在上述代码中。必须对顶点缓冲区和顶点数组对象进行额外的修改,以将正确的纹理坐标与几何描述链接起来。按照前面的示例,纹理坐标的存储是对顶点数据结构的直接修改:
struct vertexData
{
glm::vec3 pos;
glm::vec3 normal;
glm::vec2 texCoord;
};
As a result, the vertex buffer object will increase in size and the interleaving of texture coordinates will require a change to the stride in the vertex attribute specification for the vertex array objects. Figure 17.10 illustrates the basic interleaving of data within the vertex buffer.
因此,顶点缓冲区对象的大小将会增加,而纹理坐标的交错将需要更改顶点数组对象的顶点属性规范中的步幅。图 17.10说明了顶点缓冲区内数据的基本交错。
Figure 17.10. Data layout after adding the texture coordinate to the vertex buffer. Each block represents a GLfloat, which is 4 bytes. The position is encoded as a white block, the normals as purple, and the texture coordinates as orange.
图 17.10。将纹理坐标添加到顶点缓冲区后的数据布局。每个块代表一个 GLfloat,占 4 个字节。位置编码为白色块,法线编码为紫色,纹理坐标编码为橙色。
glBindBuffer(GL_ARRAY_BUFFER, m_triangleVBO[0]);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 8 * sizeof(GLfloat), 0);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 8 * sizeof(GLfloat), (const GLvoid *)12);
glEnableVertexAttribArray(2);
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, 8 * sizeof(GLfloat), (const GLvoid *)24);
glBindVertexArray(0);
With the code snippet above, the texture coordinates are placed at vertex attribute location 2. Note the change in the size argument for the texture coordinates (the 2nd argument of glVertexAttribPointer is 2, to coincide with the vec2 type in the structure). At this point, all initialization will have been completed for the texture object.
使用上面的代码片段,纹理坐标被放置在顶点属性位置 2。注意纹理坐标的大小变化(例如, glVertexAttribPointer的第二个参数为 2,以使纹理坐标与结构中的vec2类型一致)。此时,纹理对象的所有初始化都将完成。
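Rather than hard-coding the stride and the byte offsets 12 and 24, they can be derived from the struct itself. The sketch below uses a hypothetical plain-float mirror of the vertexData struct (the chapter's version uses GLM types) and assumes 4-byte floats; the constant names are illustrative.

```cpp
#include <cstddef>

// Plain-float mirror of the interleaved vertex layout:
// 3 floats of position, 3 of normal, 2 of texture coordinate.
struct VertexData {
    float pos[3];
    float normal[3];
    float texCoord[2];
};

// The interleaved layout must be tightly packed for these values to hold.
static_assert(sizeof(VertexData) == 8 * sizeof(float),
              "unexpected padding in VertexData");

// Stride between consecutive vertices, and the byte offset of each
// attribute from the start of a vertex.
constexpr std::size_t kStride = sizeof(VertexData);
constexpr std::size_t kNormalOffset = offsetof(VertexData, normal);
constexpr std::size_t kTexCoordOffset = offsetof(VertexData, texCoord);
```

These constants correspond to the fifth and sixth arguments of glVertexAttribPointer, so adding another attribute to the struct later would not require recomputing the numbers by hand.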
The texture object must be enabled (or bound) prior to rendering the vertex array object with your shaders. In general, graphics hardware allows the use of multiple texture objects when executing a shader program. In this way, shader programs can apply sophisticated texturing and visual effects. Thus, to bind a texture for use with a shader, it must be associated to one of potentially many texture units. Texture units represent the mechanism by which shaders can use multiple textures. In the sample below, only one texture is used so texture unit 0 will be made active and bound to our texture.
在使用着色器渲染顶点数组对象之前,必须先启用(或绑定)纹理对象。通常,图形硬件允许在执行着色器程序时使用多个纹理对象。这样,着色器程序就可以应用复杂的纹理和视觉效果。因此,要绑定纹理以供着色器使用,必须将其与可能的许多纹理单元之一相关联。纹理单元表示着色器可以使用多个纹理的机制。在下面的示例中,仅使用了一个纹理,因此纹理单元 0 将处于活动状态并绑定到我们的纹理。
The function that activates a texture unit is glActiveTexture. Its only argument is the texture unit to make active. It is set to GL_TEXTURE0 below, but it could be GL_TEXTURE1 or GL_TEXTURE2, for instance, if multiple textures were needed in the shader. Once a texture unit is made active, a texture object can be bound to it using the glBindTexture call.
激活纹理单元的函数是glActiveTexture 。它的唯一参数是要激活的纹理单元。它在下面设置为GL_TEXTURE0 ,但它可以是GL_TEXTURE1或GL_TEXTURE2 ,例如,如果着色器中需要多个纹理。一旦纹理单元处于活动状态,就可以使用glBindTexture调用将纹理对象绑定到它。
glUseProgram(shaderID);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texID);
glUniform1i(texUnitID, 0);
glBindVertexArray(VAO);
glDrawArrays(GL_TRIANGLES, 0, 3);
glBindVertexArray(0);
glBindTexture(GL_TEXTURE_2D, 0);
glUseProgram(0);
Most of the code above should be logical extensions to what you’ve developed thus far. Note the call to glUniform prior to rendering the vertex array object. In modern graphics hardware programming, shaders perform the work of texture lookups and blending, and therefore, must have data about which texture units hold the textures used in the shader. The active texture units are supplied to shaders using uniform variables. In this case, 0 is set to indicate that the texture lookups will come from texture unit 0. This will be expanded upon in the following section.
上面的大部分代码应该是您迄今为止开发的代码的逻辑扩展。请注意在渲染顶点数组对象之前对glUniform的调用。在现代图形硬件编程中,着色器执行纹理查找和混合的工作,因此必须具有关于哪些纹理单元保存着色器中使用的纹理的数据。使用统一变量将活动纹理单元提供给着色器。在这种情况下,设置为 0 以指示纹理查找将来自纹理单元 0。这将在下一节中展开。
Shader programs perform the lookup and any blending that may be required. The bulk of that computation typically goes into the fragment shader, but the vertex shader often stages the fragment computation by passing the texture coordinate out to the fragment shader. In this way, the texture coordinates will be interpolated and afford per-fragment lookup of texture data.
着色器程序执行查找和可能需要的任何混合。大部分计算通常进入片段着色器,但顶点着色器通常通过将纹理坐标传递给片段着色器来分阶段进行片段计算。这样,纹理坐标将被插值并提供纹理数据的每个片段查找。
Simple changes are required to use texture data in shader programs. Using the Blinn-Phong vertex shader provided previously, only three changes are needed:
在着色器程序中使用纹理数据需要进行一些简单的更改。使用前面提供的 Blinn-Phong 顶点着色器,只需要进行三处更改:
The texture coordinates are a per-vertex attribute stored within the vertex array object. They are associated with vertex attribute index 2 (or location 2).
纹理坐标是存储在顶点数组对象内的每个顶点属性。它们与顶点属性索引 2(或位置 2)相关联。
layout(location=2) in vec2 in_TexCoord;
The fragment shader will perform the texture lookup and will need an interpolated texture coordinate. This variable will be added as an output variable that gets passed to the fragment shader.
片段着色器将执行纹理查找,并需要插值纹理坐标。此变量将作为输出变量添加,并传递给片段着色器。
out vec2 tCoord;
Copy the incoming vertex attribute to the output variable in the main function.
将传入的顶点属性复制到主函数中的输出变量。
// Pass the texture coordinate to the fragment shader
tCoord = in_TexCoord;
The fragment shader also requires simple changes. First, the incoming interpolated texture coordinates passed from the vertex shader must be declared. Also recall that a uniform variable should store the texture unit to which the texture is bound. This must be communicated to the shader as a sampler type. Samplers are a shading language type that allows the lookup of data from a single texture object. In this example, only one sampler is required, but in shaders in which multiple texture lookups are used, multiple sampler variables will be used. There are also multiple sampler types depending upon the type of texture object. In the example presented here, a GL_TEXTURE_2D type was used to create the texture state. The associated sampler within the fragment shader is of type sampler2D. The following two variable declarations must be added to the fragment shader:
片段着色器也需要进行一些简单的更改。首先,必须声明从顶点着色器传递过来的传入插值纹理坐标。另外回想一下,统一变量应该存储纹理绑定到的纹理单元。这必须作为采样器类型传达给着色器。采样器是一种着色语言类型,允许从单个纹理对象查找数据。在此示例中,只需要一个采样器,但在使用多个纹理查找的着色器中,将使用多个采样器变量。根据纹理对象的类型,还有多种采样器类型。在此处介绍的示例中,使用GL_TEXTURE_2D类型来创建纹理状态。片段着色器中关联的采样器类型为sampler2D 。必须将以下两个变量声明添加到片段着色器:
in vec2 tCoord;
uniform sampler2D textureUnit;
The final modification goes into the main function of the fragment shader code. The texture is sampled using the GLSL texture lookup function and (in this case), replaces the diffuse coefficient of the geometry. The first argument to texture takes the sampler type which holds the texture unit to which the texture is bound. The second argument is the texture coordinate. The function returns a vec4 type. In the code snippet below, no alpha values are utilized in the final computation so the resulting texture lookup value is component-wise selected to only the RGB components. The diffuse coefficient from the texture lookup is set to a vec3 type that is used in the illumination equation.
最后的修改进入片段着色器代码的主函数。使用 GLSL纹理查找函数对纹理进行采样,并(在本例中)替换几何体的漫反射系数。纹理的第一个参数采用采样器类型,该类型保存纹理绑定到的纹理单元。第二个参数是纹理坐标。该函数返回vec4类型。在下面的代码片段中,最终计算中未使用任何 alpha 值,因此生成的纹理查找值是按组件选择的,仅选择 RGB 组件。纹理查找中的漫反射系数设置为照明方程中使用的vec3类型。
vec3 kdTexel = texture(textureUnit, tCoord).rgb;
vec3 intensity = ka * Ia + kdTexel * light.intensity
* max( 0.0, dot(n, l) ) + ks * light.intensity
* pow( max( 0.0, dot(n, h) ), phongExp );
Figure 17.11 illustrates the results of using these shader modifications. The right-most image in the figure extends the example code by enabling texture tiling with the OpenGL state. Note that these changes are only done in host code and the shaders do not change. To enable this tiling, which allows for texture coordinates outside of the device normalized ranges, the texture parameters for GL_TEXTURE_WRAP_S and GL_TEXTURE_WRAP_T are changed from the clamping mode to GL_REPEAT. Additionally, the host code that sets the texture coordinates now uses values in the range [0, 5].
图 17.11说明了使用这些着色器修改的结果。图中最右边的图像通过启用 OpenGL 状态下的纹理平铺来扩展示例代码。请注意,这些更改仅在主机代码中完成,着色器不会更改。为了启用此平铺(允许纹理坐标超出设备规范化范围), GL_TEXTURE_WRAP_S和GL_TEXTURE_WRAP_T的纹理参数从钳制模式更改为GL_REPEAT 。此外,设置纹理坐标的主机代码现在使用 [0, 5] 范围内的值。
Figure 17.11. The left-most image shows the texture, a 1024 × 1024 pixel image. The middle image shows the scene with the texture applied using texture coordinates in the range [0, 1], so that only one copy of the image is mapped onto the ground plane. The right-most image modifies the texture parameters so that GL_REPEAT is used for GL_TEXTURE_WRAP_S and GL_TEXTURE_WRAP_T and the texture coordinates range over [0, 5]. The result is a tiled texture repeated five times in both texture dimensions.
图 17.11。最左侧的图像显示纹理,即 1024 × 1024 像素的图像。中间的图像显示使用 [0, 1] 范围内的纹理坐标应用纹理的场景,因此只有一个图像平铺到地面上。最右侧的图像修改了纹理参数,以便GL_REPEAT用于GL_TEXTURE_WRAP_S和GL_TEXTURE_WRAP_T ,纹理坐标范围为 [0, 5]。结果是平铺纹理在两个纹理维度上重复五次。
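To make the effect of the wrap mode concrete, the following C++ sketch mimics on the CPU what happens to a single texture coordinate under repeat-style and clamp-style wrapping. The function names wrapRepeat and wrapClamp are hypothetical and are not part of OpenGL, and the clamp shown is a simplification: the actual OpenGL clamping modes also account for texel centers and borders.

```cpp
#include <algorithm>
#include <cmath>

// Repeat-style wrapping (the behavior selected by GL_REPEAT): discard the
// integer part of the coordinate so the texture tiles once per unit interval.
inline float wrapRepeat(float t) {
    return t - std::floor(t);
}

// Simplified clamp-style wrapping: coordinates outside [0, 1] stick to the
// nearest edge. Real OpenGL clamp modes also consider texel centers/borders.
inline float wrapClamp(float t) {
    return std::min(1.0f, std::max(0.0f, t));
}
```

With coordinates spanning [0, 5], wrapRepeat maps 3.25 to 0.25, so the image repeats once per unit interval, whereas wrapClamp maps every coordinate above 1 to the texture edge.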
As a side note, another texture target that may be useful for various applications is GL_TEXTURE_RECTANGLE. Texture rectangles are special texture objects that are not constrained by the power-of-two width and height image requirements and that use non-normalized texture coordinates. Furthermore, they do not allow repeated tiling. If texture rectangles are used, shaders must reference them using the special sampler type sampler2DRect.
顺便提一下,另一个可能对各种应用程序有用的纹理目标是GL_TEXTURE_RECTANGLE 。纹理矩形是唯一的纹理对象,不受 2 的幂宽度和高度图像要求的限制,并使用非规范化的纹理坐标。此外,它们不允许重复平铺。如果使用纹理矩形,着色器必须使用特殊采样器类型来引用它们: sampler2DRect 。
As your familiarity with OpenGL increases, it becomes wise to encapsulate most of what is described in this chapter into class structures that can contain the model specific data and afford rendering of a variety of objects within the scene. For instance, in Figure 17.12, a single sphere is instanced six times to create the three ellipsoids and three spheres. Each model uses the same underlying geometry yet has different material properties and model transforms. If you’ve followed through the book and implemented the ray tracer, as detailed in Chapter 4, then it is likely that your implementation is based on a solid object-oriented design. That design can be leveraged to make developing a graphics hardware program with OpenGL easier. A typical ray tracer software architecture will include several classes that map directly into graphics hardware as well as software rasterization applications. The abstract base classes in the ray tracer that represent surfaces, materials, lights, shaders, and cameras can be adapted to initialize the graphics hardware state, update that state, and if necessary render the class data to the framebuffer. The interfaces to these virtual functions will likely need to be adapted to your specific implementation, but a first pass that extends the surface class design might resemble the following:
随着您对 OpenGL 的熟悉程度不断提高,将本章中描述的大部分内容封装到类结构中是明智之举,这些类结构可以包含模型特定数据并支持渲染场景中的各种对象。例如,在图 17.12中,单个球体被实例化六次以创建三个椭圆体和三个球体。每个模型使用相同的底层几何体,但具有不同的材料属性和模型变换。如果您按照本书的说明并实现了光线跟踪器(如第 4 章所述),那么您的实现很可能基于可靠的面向对象设计。可以利用该设计使使用 OpenGL 开发图形硬件程序变得更容易。典型的光线跟踪器软件架构将包括几个直接映射到图形硬件以及软件光栅化应用程序的类。光线跟踪器中表示表面、材料、灯光、着色器和相机的抽象基类可以适应初始化图形硬件状态、更新该状态,并在必要时将类数据渲染到帧缓冲区。这些虚拟函数的接口可能需要根据您的具体实现进行调整,但扩展表面类设计的第一步可能类似于以下内容:
class surface {
public:
    // ... existing ray tracer interface ...
    virtual bool initializeOpenGL();  // allocate and fill GPU state for this surface
    virtual bool renderOpenGL(glm::mat4& Mp, glm::mat4& Mcam);  // draw using the projection and camera matrices
};
Figure 17.12. On the left, a single tessellated sphere is instanced six times using different model transforms to create this scene using the per-fragment shader program. The image on the right is rendered using a basic Whitted ray tracer. Notice the effect that shadows have on the perception of the scene. Per-fragment shading allows the specular highlight to be similar in both rendering styles.
图 17.12。左侧,单个镶嵌球体使用不同的模型变换实例化六次,以使用逐片段着色器程序创建此场景。右侧的图像使用基本的 Whitted 光线跟踪器渲染。注意阴影对场景感知的影响。逐片段着色允许镜面高光在两种渲染样式中相似。
Passing the projection and view matrices to the render functions adds a level of indirection in how these matrices are managed. The matrices would come from the camera classes, which may be manipulated by interpreting keyboard, mouse, or joystick input. The initialization functions (at least for the classes derived from surface) would contain the vertex buffer object and vertex array object allocation and initialization code. Aside from issuing the draw-arrays call for any vertex array objects, the render function would also need to activate shader programs and pass the necessary matrices to the shaders, as illustrated previously in the dragon model example. As you work to integrate the image-order and object-order (hardware and software) algorithms into the same underlying data framework, a few software design challenges will pop up, mostly related to data access and organization. However, this is a highly useful exercise for becoming adept at software engineering for graphics programming and for eventually gaining solid experience hybridizing your rendering algorithms.
将投影和视图矩阵传递给渲染函数为这些矩阵的管理提供了一种间接方式。这些矩阵将来自相机类,可以通过解释键盘、鼠标或操纵杆输入来操纵它们。初始化函数(至少对于表面导数)将包含顶点缓冲区对象和顶点数组对象分配和初始化代码。除了为任何顶点数组对象启动绘制数组之外,渲染函数还需要激活着色器程序并将必要的矩阵传入着色器,如前面龙模型示例中所示。当您努力将图像顺序和对象顺序(硬件和软件)算法集成到相同的底层数据框架中时,会出现一些软件设计挑战,主要与数据访问和组织有关。然而,这是一个非常有用的练习,可以让您熟练掌握图形编程的软件工程,并最终获得混合渲染算法的丰富经验。
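As an illustration of this design, the following C++ sketch shows how a scene might own a list of surfaces and drive their initialization and rendering through the virtual interface. It is a sketch under stated assumptions rather than the book's implementation: Mat4 is a hypothetical stand-in for glm::mat4 so that the code compiles without GLM, StubSphere is a hypothetical class that only records calls instead of touching OpenGL, and initializeScene and renderScene are illustrative helper names.

```cpp
#include <array>
#include <memory>
#include <vector>

// Hypothetical stand-in for glm::mat4 so the sketch compiles without GLM.
struct Mat4 {
    std::array<float, 16> m{};
};

// Abstract interface following the surface class outlined in the text.
class surface {
public:
    virtual ~surface() = default;
    virtual bool initializeOpenGL() = 0;
    virtual bool renderOpenGL(const Mat4& Mp, const Mat4& Mcam) = 0;
};

// Hypothetical stub: a real class would allocate VBO/VAO state in
// initializeOpenGL() and bind a shader and issue a draw call in renderOpenGL().
class StubSphere : public surface {
public:
    bool initializeOpenGL() override {
        initialized = true;
        return true;
    }
    bool renderOpenGL(const Mat4& /*Mp*/, const Mat4& /*Mcam*/) override {
        if (!initialized) return false;  // refuse to draw without GPU state
        ++drawCount;
        return true;
    }
    bool initialized = false;
    int drawCount = 0;
};

using Scene = std::vector<std::unique_ptr<surface>>;

// Initialize every surface once; report failure if any surface fails.
inline bool initializeScene(Scene& scene) {
    bool ok = true;
    for (auto& s : scene) ok = s->initializeOpenGL() && ok;
    return ok;
}

// Render every surface with the camera's projection and view matrices.
inline bool renderScene(Scene& scene, const Mat4& Mp, const Mat4& Mcam) {
    bool ok = true;
    for (auto& s : scene) ok = s->renderOpenGL(Mp, Mcam) && ok;
    return ok;
}
```

Keeping the matrices as parameters means the scene code never needs to know which camera class produced them, which is the indirection described above.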
This chapter was designed to provide an introductory glimpse into graphics hardware programming, influenced by the OpenGL API. There are many directions that your continued learning could go. Many topics, such as framebuffer objects, render to texture, environment mapping, geometry shaders, compute shaders, and advanced illumination shaders were not covered. These areas represent the next stages in learning about graphics hardware, but even within the areas covered, there are many directions that one could go to develop stronger graphics hardware understanding. Graphics hardware programming will continue to evolve and change. Interested readers should expect these changes and look to the specification documents for OpenGL and the OpenGL Shading Language for many more details about what OpenGL is capable of doing and how the hardware relates to those computations.
本章旨在提供受 OpenGL API 影响的图形硬件编程的入门介绍。您可以朝许多方向继续学习。许多主题,如帧缓冲区对象、渲染到纹理、环境映射、几何着色器、计算着色器和高级照明着色器均未涵盖。这些领域代表了学习图形硬件的下一阶段,但即使在涵盖的领域内,也有许多方向可以加深对图形硬件的理解。图形硬件编程将继续发展和变化。感兴趣的读者应该期待这些变化,并查看 OpenGL 和 OpenGL 着色语言的规范文档,了解有关 OpenGL 能够做什么以及硬件与这些计算的关系的更多详细信息。
How do I debug shader programs?
如何调试着色器程序?
On most platforms, debugging both vertex shaders and fragment shaders is not simple. However, more and more support is available through various drivers, operating system extensions, and IDEs to provide pertinent information to the developer. It can still be challenging, so use the shaders to visually debug your code. If nothing comes up on the screen, try rendering the normal vectors, the half vector, or anything that gives you a sense for where the error might be (or not be). Figure 17.13 illustrates a normal shader in operation. If images do appear in your window, make sure they are what you expect (refer to Figure 17.14)!
在大多数平台上,调试顶点着色器和片段着色器并不简单。但是,通过各种驱动程序、操作系统扩展和 IDE 提供的支持越来越多,可以为开发人员提供相关信息。这仍然具有挑战性,因此请使用着色器以可视化方式调试代码。如果屏幕上没有显示任何内容,请尝试渲染法线向量、半向量或任何可以让您了解错误可能出现(或不出现)的位置的内容。图 17.13说明了正在运行的正常着色器。如果图像确实出现在您的窗口中,请确保它们是您期望的(参见图 17.14 )!
Figure 17.13. Applying the normal shader to a complex model for debugging purposes.
图 17.13.将普通着色器应用于复杂模型以进行调试。
Figure 17.14. Visual debugging is important! Can you figure out what is wrong from the image or where to start debugging? When the incorrect stride is applied to the vertex array object, rendering goes awry.
图 17.14。视觉调试很重要!您能从图像中找出问题所在或从哪里开始调试吗?当对顶点数组对象应用错误的步幅时,渲染就会出错。
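One common way to render normal vectors for debugging is to remap each component from [-1, 1] to a color channel in [0, 1]. The C++ sketch below shows that arithmetic on the CPU; in a fragment shader the same mapping would be written as 0.5 * n + 0.5. The name normalToColor and the Vec3 alias are hypothetical, and this particular mapping is assumed here as a convention rather than taken from the chapter's shader code.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

// Map a unit normal with components in [-1, 1] to an RGB color in [0, 1].
inline Vec3 normalToColor(const Vec3& n) {
    // Guard against accidentally passing an unnormalized vector.
    const float len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    assert(std::fabs(len - 1.0f) < 1e-3f);
    return Vec3{{0.5f * n[0] + 0.5f, 0.5f * n[1] + 0.5f, 0.5f * n[2] + 0.5f}};
}
```

With this mapping, a surface facing the +z axis shows up as (0.5, 0.5, 1.0), so an unexpected flat gray or black region on the model is a hint that the normals are zero, unnormalized, or read with the wrong stride.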
There are many good resources available to learn more about the technical details involved with programming graphics hardware. A good starting point might be the OpenGL and GLSL specification documents. They are available for free online at the opengl.org website. These documents will provide complete details for all the different and emerging versions of OpenGL.
有许多很好的资源可用于了解有关编程图形硬件所涉及的技术细节。OpenGL 和 GLSL 规范文档可能是一个很好的起点。它们可在opengl.org网站上免费在线获取。这些文档将提供所有不同版本和新兴版本的 OpenGL 的完整详细信息。
The sections of this chapter are roughly organized to step students through the process of creating a modern OpenGL application. Some extra effort will be required to understand the details relating to setting up windows and OpenGL contexts. However, it should be possible to follow the sections as a set of weekly one-hour labs:
本章的各部分大致组织为指导学生完成创建现代 OpenGL 应用程序的过程。需要付出一些额外的努力才能理解与设置窗口和 OpenGL 上下文有关的细节。但是,应该可以按照这些部分进行每周一小时的实验:
1. Lab 1: Basic code setup for OpenGL applications. This includes installing the necessary drivers and related software such as GLM and GLFW. Students can then write code to open a window and clear the color buffers.
1.实验 1:OpenGL 应用程序的基本代码设置。这包括安装必要的驱动程序和相关软件,例如 GLM 和 GLFW。然后学生可以编写代码来打开窗口并清除颜色缓冲区。
2. Lab 2: Creating a shader. Since a rudimentary shader is necessary to visualize the output in modern OpenGL, starting with efforts to create a very basic shader will go a long way. In this lab, or labs, students could build (or use provided) classes to load, compile, and link shaders into shader programs.
2.实验 2:创建着色器。由于基本的着色器对于在现代 OpenGL 中可视化输出必不可少,因此从创建非常基本的着色器开始将大有裨益。在这个实验或多个实验中,学生可以构建(或使用提供的)类来加载、编译和链接着色器到着色器程序中。
3. Lab 3: Create a clip coordinate triangle and shade. Using the shader classes from the previous lab, students will add the passthrough shader and create simple geometry to render.
3.实验 3:创建裁剪坐标三角形并着色。使用上一个实验中的着色器类,学生将添加直通着色器并创建要渲染的简单几何体。
4. Lab 4: Introduce GLM. Start using GLM to generate projection matrices and viewing matrices for viewing more generalized, yet simple, scenes.
4.实验 4:介绍 GLM。开始使用 GLM 生成投影矩阵和查看矩阵,以查看更通用但更简单的场景。
5. Lab 5: Use GLM for local transformations. Students can expand their working shader program to use local transforms, perhaps applying animations based on changing transforms.
5.实验 5:使用 GLM 进行局部变换。学生可以扩展他们的工作着色器程序以使用局部变换,或许可以根据变换的变化应用动画。
6. Lab 6: Shader development. Develop the Lambertian or Blinn-Phong shaders.
6.实验 6:着色器开发。开发 Lambertian 或 Blinn-Phong 着色器。
7. Lab 7: Work with materials. Students can explore additional material properties and rendering styles with different shader programs.
7.实验 7:使用材料。学生可以使用不同的着色器程序探索其他材料属性和渲染样式。
8. Lab 8: Load 3D models. Using code to load OBJ files, students can further explore the capabilities of their graphics hardware including the limits of hardware processing for real-time applications.
8.实验 8:加载 3D 模型。使用代码加载 OBJ 文件,学生可以进一步探索图形硬件的功能,包括实时应用程序硬件处理的极限。
9. Lab 9: Textures. Using PNG (or other formats), students can load images onto the hardware and practice a variety of texture-mapping strategies.
9.实验 9:纹理。使用 PNG(或其他格式),学生可以将图像加载到硬件上并练习各种纹理映射策略。
10. Lab 10: Integration with rendering code. If scene files are used to describe scenes for the ray tracer (or rasterizer), students’ OpenGL code can be integrated into a complete rendering framework using common structures and classes to build a complete system.
10.实验 10:与渲染代码集成。如果使用场景文件来描述光线追踪器(或光栅化器)的场景,则可以使用通用结构和类将学生的 OpenGL 代码集成到完整的渲染框架中,以构建完整的系统。
This list is only a guide. In labs for my computer graphics course, students are provided material to get them started on the week’s idea. After they get the basic idea working, the lab is completed once they add their spin or a creative exploration of the idea to their code. As students get familiar with graphics hardware programming, they can explore additional areas of interest, such as textures, render to texture, or more advanced shaders and graphics algorithms.
此列表仅供参考。在我的计算机图形学课程的实验室中,学生会获得材料,帮助他们开始实施本周的创意。在他们掌握基本创意后,只要他们在代码中添加自己的创意或创意探索,实验室就完成了。随着学生熟悉图形硬件编程,他们可以探索其他感兴趣的领域,例如纹理、渲染到纹理或更高级的着色器和图形算法。
Erik Reinhard and Garrett Johnson
Photons are the carriers of optical information. They propagate through media taking on properties associated with waves. At surface boundaries they interact with matter, behaving more as particles. They can also be absorbed by the retina, where the information they carry is transcoded into electrical signals that are subsequently processed by the brain. It is only there that a sensation of color is generated.
光子是光学信息的载体。它们通过具有与波相关的属性的介质传播。在表面边界,它们与物质相互作用,表现得更像粒子。它们也可以被视网膜吸收,它们携带的信息在视网膜上被转换成电信号,随后由大脑处理。只有在那里,才会产生色彩的感觉。
As a consequence, the study of color in all its guises touches upon several different fields: physics for the propagation of light through space, chemistry for its interaction with matter, and neuroscience and psychology for aspects relating to perception and cognition of color (Reinhard et al., 2008).
因此,对各种色彩的研究都涉及到多个不同的领域:物理学研究光在空间中的传播,化学研究光与物质的相互作用,神经科学和心理学研究与色彩感知和认知有关的方面(Reinhard 等,2008)。
In computer graphics, we traditionally take a simplified view of how light propagates through space. Photons travel along straight paths until they hit a surface boundary and are then reflected according to a reflection function of some sort. A single photon will carry a certain amount of energy, which is represented by its wavelength. Thus, a photon will have only one wavelength. The relationship between its wavelength λ and the amount of energy it carries (ΔE) is given by
在计算机图形学中,我们通常采用简化的视图来观察光在空间中的传播方式。光子沿着直线路径传播,直到它们到达表面边界,然后根据某种反射函数进行反射。单个光子将携带一定量的能量,该能量由其波长表示。因此,一个光子只有一个波长。其波长 λ 与其携带的能量 ( ΔE ) 之间的关系为
$$\lambda\,\Delta E = hc \approx 1239.8\ \text{eV nm},$$
where ΔE is measured in electron volts (eV).
其中 Δ E的单位是电子伏特(eV)。
In computer graphics, it is not very efficient to simulate single photons; instead large collections of them are simulated at the same time. If we take a very large number of photons, each carrying a possibly different amount of energy, then together they represent a spectrum. A spectrum can be thought of as a graph where the number of photons is plotted against wavelength. Because two photons of the same wavelength carry twice as much energy as a single photon of that wavelength, this graph can also be seen as a plot of energy against wavelength. An example of a spectrum is shown in Figure 18.1. The range of wavelengths to which humans are sensitive is roughly between 380 and 800 nanometers (nm).
在计算机图形学中,模拟单个光子效率不高,而是同时模拟大量的光子。如果我们取大量的光子,每个光子携带的能量可能不同,那么它们一起就代表了一个光谱。光谱可以看作是光子数量与波长的关系图。因为两个相同波长的光子携带的能量是单个该波长的光子的两倍,所以该图也可以看作是能量与波长的关系图。图 18.1显示了一个光谱示例。人类敏感的波长范围大约在 380 到 800 纳米 (nm) 之间。
Figure 18.1. A spectrum describes how much energy is available at each wavelength λ, here measured as relative radiant power. This specific spectrum represents average daylight.
图 18.1。光谱描述了每个波长 λ 有多少能量,这里以相对辐射功率来衡量。这个特定的光谱代表平均日光。
When simulating light, it would therefore be possible to trace rays that each carry a spectrum. A renderer that accomplishes this is normally called a spectral renderer. From preceding chapters, it should be clear that we are not normally going through the expense of building spectral renderers. Instead, we replace spectra with representations that typically use red, green, and blue components. The reason that this is possible at all has to do with human vision and will be discussed in Section 18.1.
因此,在模拟光时,可以追踪每条带有光谱的光线。实现此功能的渲染器通常称为光谱渲染器。从前面的章节中可以清楚地看出,我们通常不会花费大量金钱来构建光谱渲染器。相反,我们用通常使用红色、绿色和蓝色成分的表示来代替光谱。这完全可能的原因与人类视觉有关,将在第 18.1 节中讨论。
Simulating light by tracing rays takes care of the physics of light, although it should be noted that several properties of light, including, for instance, polarization, diffraction, and interference, are not modeled in this manner.
通过追踪光线来模拟光可以解决光的物理问题,但需要注意的是,光的几种特性,例如偏振、衍射和干涉,并不能以这种方式建模。
At surface boundaries, we normally model what happens with light by means of a reflectance function. These functions can be measured directly by means of gonioreflectometers, leading to a large amount of tabulated data, which can be represented more compactly by various functions. Nonetheless, these reflectance functions are empirical in nature, i.e., they abstract away the chemistry that happens when a photon is absorbed and re-emitted by an electron. Thus, reflectance functions are useful for modeling in computer graphics, but do not offer an explanation as to why certain wavelengths of light are absorbed and others are reflected. We can therefore not use reflectance functions to explain why the light reflected off a banana has a spectral composition that appears to us as yellow. For that, we would have to study molecular orbital theory, a topic beyond the scope of this book.
在表面边界,我们通常通过反射函数来模拟光的发生情况。这些函数可以通过以下方式直接测量:测角反射计,产生大量的表格数据,这些数据可以用各种不同的函数更紧凑地表示。尽管如此,这些反射函数本质上是经验性的,也就是说,它们抽象出了光子被电子吸收和重新发射时发生的化学反应。因此,反射函数对于计算机图形学中的建模很有用,但不能解释为什么某些波长的光被吸收而其他波长的光被反射。因此,我们不能用反射函数来解释为什么从香蕉反射的光的光谱组成在我们看来是黄色的。为此,我们必须研究分子轨道理论,而这超出了本书的范围。
Finally, when light reaches the retina, it is transcoded into electrical signals that are propagated to the brain. A large part of the brain is devoted to processing visual signals, part of which gives rise to the sensation of color. Thus, even if we know the spectrum of light that is reflected off a banana, we do not know yet why humans associate the term “yellow” with it. Moreover, as we will find out in the remainder of this chapter, our perception of color is vastly more complicated than it would seem at first glance. It changes with illumination, varies between observers, and varies within an observer over time.
最后,当光线到达视网膜时,它会被转换成电信号,然后传播到大脑。大脑的很大一部分用于处理视觉信号,其中一部分会产生颜色的感觉。因此,即使我们知道香蕉反射的光谱,我们也不知道为什么人类会将“黄色”一词与香蕉联系起来。此外,正如我们将在本章的其余部分发现的那样,我们对颜色的感知比乍一看要复杂得多。它会随着照明而变化,因观察者而异,并且会随着观察者的时间而变化。
In other words, the spectrum of light coming off a banana is perceived in the context of an environment. To predict how an observer perceives a “banana spectrum” requires knowledge of the environment that contains the banana as well as the observer’s environment. In many instances, these two environments are the same. However, when we are displaying a photograph of a banana on a monitor, these two environments will be different. Because human visual perception depends on the environment the observer is in, an observer may perceive the banana in the photograph differently from how an observer looking directly at the banana would perceive it. This has a significant impact on how we should deal with color and illustrates the complexities associated with color.
换句话说,香蕉发出的光谱是在环境背景下感知的。要预测观察者如何感知“香蕉光谱”,需要了解包含香蕉的环境以及观察者的环境。在许多情况下,这两个环境是相同的。但是,当我们在显示器上显示香蕉的照片时,这两个环境就会有所不同。由于人类的视觉感知取决于观察者所处的环境,因此它对照片中香蕉的感知可能与直接看香蕉的观察者对香蕉的感知不同。这对我们如何处理颜色有着重大影响,并说明了与颜色相关的复杂性。
To emphasize the crucial role that human vision plays, we only have to look at the definition of color: “Color is the aspect of visual perception by which an observer may distinguish differences between two structure-free fields of view of the same size and shape, such as may be caused by differences in the spectral composition of the radiant energy concerned in the observation” (Wyszecki & Stiles, 2000). In essence, without a human observer there is no color.
为了强调人类视觉所起的关键作用,我们只需看看颜色的定义:“颜色是视觉感知的一个方面,观察者可以通过颜色区分两个大小和形状相同的无结构视野之间的差异,这些差异可能是由观察中涉及的辐射能的光谱成分差异引起的”(Wyszecki & Stiles,2000)。从本质上讲,没有人类观察者就没有颜色。
Luckily, much of what we know about color can be quantified, so that we can carry out computations to correct for the idiosyncrasies of human vision and thereby display images that will appear to observers the way the designer of those images intended. This chapter contains the theory and mathematics required to do so.
幸运的是,我们对颜色的了解大部分都可以量化,因此我们可以进行计算来纠正人类视觉的特性,从而向观察者显示图像,使其看起来与图像设计者的意图一致。本章包含实现此目的所需的理论和数学。
Colorimetry is the science of color measurement and description. Since color is ultimately a human response, color measurement should begin with human observation. The photodetectors in the human retina consist of rods and cones. The rods are highly sensitive and come into play in low-light conditions. Under normal lighting conditions, the cones are operational, mediating human vision. There are three cone types and together they are primarily responsible for color vision.
比色法是一门测量和描述颜色的科学。由于颜色最终是人类的反应,因此颜色测量应从人类观察开始。人类视网膜中的光电探测器由视杆细胞和视锥细胞组成。视杆细胞非常敏感,在弱光条件下发挥作用。在正常照明条件下,视锥细胞可以运作,调节人类视觉。视锥细胞有三种类型,它们共同主要负责颜色视觉。
Although it may be possible to directly record the electrical output of cones while some visual stimulus is being presented, such a procedure would be invasive, while at the same time ignoring the sometimes substantial differences between observers. Moreover, much of the measurement of color was developed well before such direct recording techniques were available.
尽管在呈现某些视觉刺激时,可能直接记录视锥细胞的电输出,但这种方法具有侵入性,同时忽略了观察者之间有时存在的巨大差异。此外,许多颜色测量方法早在此类直接记录技术出现之前就已开发出来。
The alternative is to measure color by means of measuring the human response to patches of color. This leads to color matching experiments, which will be described later in this section. Carrying out these experiments has resulted in several standardized observers, which can be thought of as statistical approximations of actual human observers. First, however, we need to describe some of the assumptions underlying the possibility of color matching, which are summarized by Grassmann’s laws.
另一种方法是通过测量人类对色块的反应来测量颜色。这导致了颜色匹配实验,本节后面将对此进行描述。进行这些实验产生了几个标准化观察者,可以将其视为实际人类观察者的统计近似值。但是,首先,我们需要描述一些颜色匹配可能性背后的假设,这些假设由格拉斯曼定律总结。
Given that humans have three different cone types, the experimental laws of color matching can be summed up as the trichromatic generalization (Wyszecki & Stiles, 2000), which states that any color stimulus can be matched completely with an additive mixture of three appropriately modulated color sources. This feature of color is often used in practice, for instance by televisions and monitors which reproduce many different colors by adding a mixture of red, green, and blue light for each pixel. It is also the reason that renderers can be built using only three values to describe each color.
鉴于人类有三种不同的视锥细胞类型,颜色匹配的实验定律可以总结为三色泛化(Wyszecki & Stiles,2000),即任何颜色刺激都可以与三种适当调制的颜色源的加法混合完全匹配。颜色的这一特性经常在实践中使用,例如电视和显示器通过为每个像素添加红、绿和蓝光的混合来再现许多不同的颜色。这也是渲染器可以仅使用三个值来描述每种颜色的原因。
The trichromatic generalization allows us to make color matches between any given stimulus and an additive mixture of three other color stimuli. Hermann Grassmann was the first to describe the algebraic rules to which color matching adheres. They are known as Grassmann’s laws of additive color matching (Grassmann, 1853) and are the following:
三色概括使我们能够在任何给定刺激物和三种其他颜色刺激物的加性混合物之间进行颜色匹配。赫尔曼·格拉斯曼 (Hermann Grassmann) 是第一个描述颜色匹配所遵循的代数规则的人。它们被称为格拉斯曼加性颜色匹配定律 (Grassmann, 1853),具体如下:
Symmetry law. If color stimulus A matches color stimulus B, then B matches A.
对称定律。如果颜色刺激物A与颜色刺激物B匹配,则B与A匹配。
Transitive law. If A matches B and B matches C, then A matches C.
传递律。如果A与B匹配,且B与C匹配,则A与 C匹配。
Proportionality law. If A matches B, then αA matches αB, where α is a positive scale factor.
比例定律。如果A与B匹配,则αA与αB匹配,其中α是正比例因子。
Additivity law. If A matches B, C matches D, and A + C matches B + D, then it follows that A + D matches B + C.
可加性定律。如果A与B匹配, C与D匹配,且A + C与B + D匹配,则A + D与B + C匹配。
The additivity law forms the basis for color matching and colorimetry as a whole.
加性定律构成了配色和整个比色法的基础。
Each cone type is sensitive to a range of wavelengths, spanning most of the full visible range. However, sensitivity to wavelengths is not evenly distributed, but contains a peak wavelength at which sensitivity is greatest. The location of this peak wavelength is different for each cone type. The three cone types are classified as S, M, and L cones, where the letters stand for short, medium, and long, indicating where in the visible spectrum the peak sensitivity is located.
每种视锥细胞类型都对一系列波长敏感,涵盖了整个可见光范围的大部分。但是,对波长的敏感度并不是均匀分布的,而是包含一个峰值波长,在该波长处敏感度最高。每种视锥细胞类型的峰值波长位置都不同。这三种视锥细胞类型分为 S、M 和 L 视锥细胞,其中字母代表短、中、长,表示峰值敏感度位于可见光谱中的位置。
The response of a given cone is then the magnitude of the electrical signal it outputs, as a function of the spectrum of wavelengths incident upon the cone. The cone response functions for each cone type as a function of wavelength λ are then given by L(λ), M(λ), and S(λ). They are plotted in Figure 18.2.
给定视锥细胞的响应是其输出电信号的幅度,是入射到视锥细胞上的波长谱的函数。每种视锥细胞类型的视锥细胞响应函数作为波长 λ 的函数,由L (λ)、 M (λ) 和S (λ) 给出。它们绘制在图 18.2中。
Figure 18.2. The cone response functions for L, M, and S cones.
图 18.2. L、M 和 S 视锥细胞的视锥细胞响应函数。
The actual response to a stimulus with a given spectral composition Φ(λ) is then given for each cone type by
对于给定光谱成分 Φ(λ) 的刺激,每种视锥细胞的实际响应由下式给出:
$$L = \int_\lambda \Phi(\lambda)\,L(\lambda)\,d\lambda, \qquad M = \int_\lambda \Phi(\lambda)\,M(\lambda)\,d\lambda, \qquad S = \int_\lambda \Phi(\lambda)\,S(\lambda)\,d\lambda.$$
These three integrated responses are known as tristimulus values.
这三种综合反应被称为三刺激值。
Given that tristimulus values are created by integrating the product of two functions over the visible range, it is immediately clear that the human visual system does not act as a simple wavelength detector. Rather, our photo-receptors act as approximately linear integrators. As a result, it is possible to find two different spectral compositions, say Φ1(λ) and Φ2(λ), that after integration yield the same response (L, M, S). This phenomenon is known as metamerism, an example of which is shown in Figure 18.3.
鉴于三刺激值是通过在可见光范围内对两个函数的乘积进行积分而产生的,因此可以立即看出,人类视觉系统并不是一个简单的波长检测器。相反,我们的光感受器充当近似线性积分器。因此,可以找到两个不同的光谱成分,例如 Φ 1 (λ) 和 Φ 2 (λ),它们在积分后产生相同的响应 ( L, M, S )。这种现象称为同色异谱,图 18.3显示了一个例子。
Figure 18.3. Two stimuli Φ1(λ) and Φ2(λ) leading to the same tristimulus values after integration.
图 18.3.两个刺激 Φ 1 (λ) 和 Φ 2 (λ) 经过整合后产生相同的三刺激值。
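Metamerism can be demonstrated with a small discrete example. The C++ sketch below uses four wavelength samples and toy sensitivity tables (entirely hypothetical values, not measured cone data) and replaces the integral with a plain sum. Two visibly different spectra then produce identical responses because their difference is invisible to all three sensitivities.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kSamples = 4;
using Samples = std::array<double, kSamples>;

struct Tristimulus {
    double L, M, S;
};

// Discrete stand-in for integrating the product of two sampled functions.
inline double sampledIntegral(const Samples& a, const Samples& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < kSamples; ++i) sum += a[i] * b[i];
    return sum;
}

// Toy sensitivities, ordered from short to long wavelength (hypothetical values).
const Samples kS = {{1.0, 1.0, 0.0, 0.0}};
const Samples kM = {{0.0, 1.0, 1.0, 0.0}};
const Samples kL = {{0.0, 0.0, 1.0, 1.0}};

// Responses of the three toy cone types to a sampled spectrum.
inline Tristimulus respond(const Samples& spectrum) {
    return {sampledIntegral(spectrum, kL),
            sampledIntegral(spectrum, kM),
            sampledIntegral(spectrum, kS)};
}
```

With these tables, the spectra (2, 2, 2, 2) and (3, 1, 3, 1) both give L = M = S = 4. Their difference (1, -1, 1, -1) sums to zero against each sensitivity, which is the discrete analogue of two stimuli integrating to the same tristimulus values.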
Metamerism is the key feature of human vision that allows the construction of color reproduction devices, including the color figures in this book and anything reproduced on printers, televisions, and monitors.
同色异谱是人类视觉的主要特征,它使得构建色彩再现设备成为可能,包括本书中的彩色图形以及打印机、电视和显示器上再现的任何内容。
Color matching experiments also rely on the principle of metamerism. Suppose we have three differently colored light sources, each with a dial to alter its intensity. We call these three light sources primaries. We should now be able to adjust the intensity of each in such a way that when mixed together additively, the resulting spectrum integrates to a tristimulus value that matches the perceived color of a fourth unknown light source. When we carry out such an experiment, we have essentially matched our primaries to an unknown color. The positions of our three dials are then a representation of the color of the fourth light source.
颜色匹配实验也依赖于同色异谱的原理。假设我们有三个不同颜色的光源,每个光源都有一个刻度盘来改变其强度。我们将这三个光源称为原色。现在我们应该能够调整每个光源的强度,这样当它们混合在一起时,得到的光谱就会积分为一个三刺激值,与第四个未知光源的感知颜色相匹配。当我们进行这样的实验时,我们基本上将原色与未知颜色相匹配。那么,我们三个刻度盘的位置就代表了第四个光源的颜色。
In such an experiment, we have used Grassmann’s laws to add the three spectra of our primaries. We have also used metamerism, because the combined spectrum of our three primaries is almost certainly different from the spectrum of the fourth light source. However, the tristimulus values computed from these two spectra will be identical, having produced a color match.
在这样的实验中,我们利用格拉斯曼定律将原色的三个光谱相加。我们还利用了同色异谱,因为三原色的组合光谱几乎肯定不同于第四个光源的光谱。然而,从这两个光谱计算出的三刺激值将相同,从而产生颜色匹配。
Note that we do not actually have to know the cone response functions to carry out such an experiment. As long as we use the same observer under the same conditions, we are able to match colors and record the positions of our dials for each color. However, it is quite inconvenient to have to carry out such experiments every time we want to measure colors. For this reason, we do want to know the spectral cone response functions and average those for a set of different observers to eliminate interobserver variability.
请注意,我们实际上不必知道视锥细胞反应函数即可进行此类实验。只要我们在相同条件下使用相同的观察者,我们就能匹配颜色并记录每种颜色的刻度盘位置。但是,每次我们想要测量颜色时都必须进行此类实验,这非常不方便。因此,我们确实需要了解光谱视锥细胞反应函数,并针对一组不同的观察者计算平均数,以消除观察者之间的差异。
If we perform a color matching experiment for a large range of colors, carried out by a set of different observers, it is possible to generate an average color matching dataset. If we specifically use monochromatic light sources against which to match our primaries, we can repeat this experiment for all visible wavelengths. The resulting tristimulus values are then called spectral tristimulus values, and can be plotted against wavelength λ, shown in Figure 18.4.
如果我们由一组不同的观察者对大量颜色进行配色实验,就有可能生成一个平均配色数据集。如果我们专门使用单色光源来匹配我们的原色,我们可以对所有可见波长重复此实验。然后将得到的三刺激值称为光谱三刺激值,并可绘制为波长λ 的关系,如图 18.4所示。
Figure 18.4. Spectral tristimulus values averaged over many observers. The primaries were monochromatic light sources with wavelengths of 435.8, 546.1, and 700 nm.
图 18.4。多位观察者的平均光谱三刺激值。原色为波长为 435.8、546.1 和 700 nm 的单色光源。
By using a well-defined set of primary light sources, the spectral tristimulus values lead to three color matching functions. The Commission Internationale d’Eclairage (CIE) has defined three such primaries to be monochromatic light sources of 435.8, 546.1, and 700 nm, respectively. With these three monochromatic light sources, all other visible wavelengths can be matched by adding different amounts of each. The amount of each required to match a given wavelength λ is encoded in color matching functions, given by $\bar{r}(\lambda)$, $\bar{g}(\lambda)$, and $\bar{b}(\lambda)$ and plotted in Figure 18.4. Tristimulus values associated with these color matching functions are termed R, G, and B.
通过使用一组定义明确的原色光源,光谱三刺激值可产生三个配色函数。国际照明委员会 (CIE) 已将这三个原色定义为单色光源,分别为 435.8、546.1 和 700 nm。利用这三个单色光源,可以通过添加不同数量的每种光源来匹配所有其他可见波长。匹配给定波长 λ 所需的每种光源的数量编码在配色函数中,由 $\bar{r}(\lambda)$、$\bar{g}(\lambda)$ 和 $\bar{b}(\lambda)$ 给出,并绘制在图 18.4 中。与这些颜色匹配函数相关的三刺激值称为 R、G 和 B。
Given that we are adding light, and light cannot be negative, you may have noticed an anomaly in Figure 18.4: to create a match for some wavelengths, it is necessary to subtract light. Although there is no such thing as negative light, we can use Grassmann’s laws once more, and instead of subtracting light from the mixture of primaries, we can add the same amount of light to the color that is being matched.
考虑到我们要添加光,而光不能为负,您可能已经注意到图 18.4中的一个异常:要匹配某些波长,必须减去光。虽然没有负光这样的东西,但我们可以再次使用格拉斯曼定律,而不是从原色混合物中减去光,而是将相同数量的光添加到要匹配的颜色中。
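The CIE $\bar{r}(\lambda)$, $\bar{g}(\lambda)$, and $\bar{b}(\lambda)$ color matching functions allow us to determine if a spectral distribution Φ1 matches a second spectral distribution Φ2 by simply comparing the resulting tristimulus values obtained by integrating with these color matching functions:
CIE $\bar{r}(\lambda)$、$\bar{g}(\lambda)$ 和 $\bar{b}(\lambda)$ 颜色匹配函数使我们只需比较用这些颜色匹配函数积分得到的三刺激值,就能确定光谱分布 Φ1 是否与第二个光谱分布 Φ2 匹配:
$$
\begin{aligned}
\int_\lambda \Phi_1(\lambda)\,\bar{r}(\lambda)\,d\lambda &= \int_\lambda \Phi_2(\lambda)\,\bar{r}(\lambda)\,d\lambda,\\
\int_\lambda \Phi_1(\lambda)\,\bar{g}(\lambda)\,d\lambda &= \int_\lambda \Phi_2(\lambda)\,\bar{g}(\lambda)\,d\lambda,\\
\int_\lambda \Phi_1(\lambda)\,\bar{b}(\lambda)\,d\lambda &= \int_\lambda \Phi_2(\lambda)\,\bar{b}(\lambda)\,d\lambda.
\end{aligned}
$$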
Of course, a color match is only guaranteed if all three tristimulus values match.
当然,只有所有三个三刺激值都匹配,才能保证颜色匹配。
The importance of these color matching functions lies in the fact that we are now able to communicate and describe colors compactly by means of tristimulus values. For a given spectral function, the CIE color matching functions provide a precise way in which to calculate tristimulus values. As long as everybody uses the same color matching functions, it should always be possible to generate a match.
这些配色函数的重要性在于,我们现在能够通过三刺激值简洁地传达和描述颜色。对于给定的光谱函数,CIE 配色函数提供了一种计算三刺激值的精确方法。只要每个人都使用相同的配色函数,就应该总能生成匹配。
If the same color matching functions are not available, then it is possible to transform one set of tristimulus values into a different set of tristimulus values appropriate for a corresponding set of primaries. The CIE has defined one such transform for two specific reasons. First, in the 1930s numerical integrations were difficult to perform, and even more so for functions that can be both positive and negative. Second, the CIE had already developed the photopic luminance response function, CIE V(λ). It became desirable to have three integrating functions, of which V(λ) is one, with all three being positive over the visible range.
如果没有相同的配色函数,那么可以将一组三刺激值转换为适合相应原色组的另一组三刺激值。CIE 出于两个具体原因定义了一种这样的变换。首先,在 20 世纪 30 年代,数值积分很难执行,对于既可以为正也可以为负的函数来说更是如此。其次,CIE 已经开发了明视亮度响应函数 CIE V (λ)。人们希望有三个积分函数,其中V (λ) 为 1,并且所有三个在可见光范围内均为正。
To create a set of positive color matching functions, it is necessary to define imaginary primaries. In other words, to reproduce any color in the visible spectrum, we need light sources that cannot be physically realized. The color matching functions that were settled upon by the CIE are named $\bar{x}(\lambda)$, $\bar{y}(\lambda)$, and $\bar{z}(\lambda)$ and are shown in Figure 18.5. Note that $\bar{y}(\lambda)$ is equal to the photopic luminance response function V(λ) and that each of these functions is indeed positive. Together they are known as the CIE 1931 standard observer.
要创建一组正的颜色匹配函数,必须定义虚原色。换句话说,要重现可见光谱中的任何颜色,我们需要无法物理实现的光源。CIE 确定的配色函数称为 $\bar{x}(\lambda)$、$\bar{y}(\lambda)$ 和 $\bar{z}(\lambda)$,如图 18.5 所示。请注意 $\bar{y}(\lambda)$ 等于明视亮度响应函数 V(λ),并且这些函数确实都是正值。它们被称为 CIE 1931 标准观察者。
Figure 18.5. The CIE $\bar{x}(\lambda)$, $\bar{y}(\lambda)$, and $\bar{z}(\lambda)$ color matching functions.
图 18.5。CIE $\bar{x}(\lambda)$、$\bar{y}(\lambda)$ 和 $\bar{z}(\lambda)$ 颜色匹配函数。
The corresponding tristimulus values are termed X, Y, and Z, to avoid confusion with the R, G, and B tristimulus values that are normally associated with realizable primaries. The conversion from (R, G, B) tristimulus values to (X, Y, Z) tristimulus values is defined by a simple 3 × 3 transform:
相应的三刺激值称为X 、 Y和Z ,以避免与通常与可实现原色相关的R 、 G和B三刺激值混淆。从 ( R, G, B ) 三刺激值到 ( X, Y, Z ) 三刺激值的转换由简单的 3 × 3 变换定义:
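In its commonly quoted form, the transform reads as follows. The numbers are stated here from the standard CIE 1931 definition, on the assumption that this is the matrix the text refers to:

```latex
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= \frac{1}{0.17697}
\begin{bmatrix}
0.4900  & 0.3100  & 0.2000  \\
0.17697 & 0.81240 & 0.01063 \\
0.0000  & 0.0100  & 0.9900
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
```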
To calculate tristimulus values, we typically integrate the standard observer color matching functions directly with the spectrum of interest Φ(λ), rather than going through the CIE $\bar{r}(\lambda)$, $\bar{g}(\lambda)$, and $\bar{b}(\lambda)$ color matching functions first and then applying the above transformation. This allows us to calculate consistent color measurements and to determine when two colors match each other.
为了计算三刺激值,我们通常直接将标准观察者配色函数与感兴趣的光谱 Φ(λ) 积分,而不是先通过 CIE $\bar{r}(\lambda)$、$\bar{g}(\lambda)$ 和 $\bar{b}(\lambda)$ 配色函数,再进行上述转换。这使我们能够计算一致的颜色测量值,并确定两种颜色何时相互匹配。
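As a minimal sketch of this direct integration, the snippet below approximates X, Y, and Z with a Riemann sum over uniformly spaced wavelength samples. The function name `tristimulus` and the three-sample table `toy_cmf` are hypothetical placeholders chosen only so that the example runs; a real computation would substitute tabulated CIE 1931 color matching data.

```python
def tristimulus(spectrum, cmf, dlam):
    """Approximate X, Y, Z as sums of Phi(lambda) * cmf(lambda) * dlam.

    spectrum: samples of Phi at uniformly spaced wavelengths
    cmf:      (xbar, ybar, zbar) samples at the same wavelengths
    dlam:     wavelength spacing in nm
    """
    if len(spectrum) != len(cmf):
        raise ValueError("spectrum and cmf must share the same sampling")
    X = sum(p * c[0] for p, c in zip(spectrum, cmf)) * dlam
    Y = sum(p * c[1] for p, c in zip(spectrum, cmf)) * dlam
    Z = sum(p * c[2] for p, c in zip(spectrum, cmf)) * dlam
    return X, Y, Z


# Toy data (NOT real CIE values): three samples spaced 100 nm apart.
toy_cmf = [(0.3, 0.0, 1.5), (0.4, 1.0, 0.1), (0.6, 0.3, 0.0)]
flat_spectrum = [1.0, 1.0, 1.0]
X, Y, Z = tristimulus(flat_spectrum, toy_cmf, 100.0)
```

Two spectra that yield the same (X, Y, Z) under this computation are predicted to match, even if the spectra themselves differ.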
Every color can be represented by a set of three tristimulus values (X, Y, Z). We could define an orthogonal coordinate system with X, Y, and Z axes and plot each color in the resulting 3D space. This is called a color space. The spatial extent of the volume in which colors lie is then called the color gamut.
每种颜色都可以用一组三个三刺激值 ( X、Y、Z ) 表示。我们可以用X 、 Y和Z轴定义一个正交坐标系,并在得到的 3D 空间中绘制每种颜色。这称为颜色空间。颜色所在的体积的空间范围称为色域。
Visualizing colors in a 3D color space is fairly difficult. Moreover, the Y value of any color corresponds to its luminance, by virtue of the fact that $\bar{y}(\lambda)$ equals V(λ). We could therefore project tristimulus values to a 2D space which approximates chromatic information, i.e., information which is independent of luminance. This projection is called a chromaticity diagram and is obtained by normalization while at the same time removing luminance information:
在 3D 颜色空间中可视化颜色相当困难。此外,任何颜色的 Y 值都与其亮度相对应,因为 $\bar{y}(\lambda)$ 等于 V(λ)。因此,我们可以将三刺激值投射到二维空间,该空间近似于色度信息,即与亮度无关的信息。此投影称为色度图,通过标准化同时去除亮度信息获得:
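The normalization in question is the standard definition of the CIE chromaticity coordinates:

```latex
x = \frac{X}{X + Y + Z}, \qquad
y = \frac{Y}{X + Y + Z}, \qquad
z = \frac{Z}{X + Y + Z}
```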
Given that x + y + z equals 1, the z-value is redundant, allowing us to plot the x and y chromaticities against each other in a chromaticity diagram. Although x and y by themselves are not sufficient to fully describe a color, we can use these two chromaticity coordinates and one of the three tristimulus values, traditionally Y, to recover the other two tristimulus values:
假设x + y + z等于 1, z值是多余的,这样我们就可以在色度图中绘制x和y 的色度。虽然x和y本身不足以完全描述一种颜色,但我们可以使用这两个色度坐标和三个三刺激值之一(传统上是Y )来恢复另外两个三刺激值:
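Since z = 1 − x − y, the recovery from (x, y, Y) takes the form X = (x / y) Y and Z = ((1 − x − y) / y) Y. A minimal sketch of both directions follows; the function names are our own, and the sample value is the BT.709 white point quoted later in this chapter.

```python
def xyz_to_xyY(X, Y, Z):
    """Project tristimulus values to chromaticity (x, y) plus luminance Y."""
    s = X + Y + Z
    if s == 0.0:
        raise ValueError("chromaticity is undefined when X + Y + Z = 0")
    return X / s, Y / s, Y


def xyY_to_xyz(x, y, Y):
    """Recover (X, Y, Z) from chromaticity (x, y) and luminance Y."""
    if y == 0.0:
        raise ValueError("X and Z cannot be recovered when y = 0")
    X = x / y * Y
    Z = (1.0 - x - y) / y * Y
    return X, Y, Z


white = (95.05, 100.00, 108.90)
x, y, Y = xyz_to_xyY(*white)
restored = xyY_to_xyz(x, y, Y)
```

The guard on y = 0 matters in practice: chromaticities on the x-axis carry no luminance, so the other two tristimulus values cannot be scaled from Y.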
By plotting all monochromatic (spectral) colors in a chromaticity diagram, we obtain a horseshoe-shaped curve. The points on this curve are called spectrum loci. All other colors will generate points lying inside this curve. The spectrum locus for the 1931 standard observer is shown in Figure 18.6. The purple line joining the two ends of the horseshoe does not represent monochromatic colors, but rather combinations of short and long wavelength stimuli.
通过在色度图中绘制所有单色(光谱)颜色,我们得到一条马蹄形曲线。该曲线上的点称为光谱轨迹。所有其他颜色都会产生位于该曲线内的点。1931 年标准观察者的光谱轨迹如图 18.6所示。马蹄两端之间的紫线并不代表单色,而是短波长和长波长刺激的组合。
Figure 18.6. The spectrum locus for the CIE 1931 standard observer.
图 18.6.CIE 1931 标准观察者的光谱轨迹。
A (non-monochromatic) primary can be integrated over all visible wavelengths, leading to (X, Y, Z) tristimulus values, and subsequently to an (x, y) chromaticity coordinate, i.e., a point on a chromaticity diagram. Repeating this for two or more primaries yields a set of points on a chromaticity diagram that can be connected by straight lines. The region enclosed in this manner (a line segment for two primaries, a triangle for three) represents the range of chromaticities that can be reproduced by the additive mixture of these primaries. Examples of three-primary systems are shown in Figure 18.7.
可以将 (非单色) 原色积分到所有可见波长,得到 ( X, Y, Z ) 三刺激值,然后得到 ( x, y ) 色度坐标,即色度图上的一个点。对两个或更多原色重复此操作,将在色度图上产生一组可以用直线连接的点。以这种方式跨越的体积表示可以通过这些原色的加法混合再现的颜色范围。图 18.7显示了三原色系统的示例。
Figure 18.7. The chromaticity boundaries of the CIE RGB primaries at 435.8, 546.1, and 700 nm (solid) and a typical HDTV (dashed).
图 18.7. CIE RGB 原色在 435.8、546.1 和 700 nm(实线)以及典型 HDTV(虚线)的色度边界。
Chromaticity diagrams provide insight into additive color mixtures. However, they should be used with care. First, the interior of the horseshoe should not be colored, as any color reproduction system will have its own primaries and can only reproduce some parts of the chromaticity diagram. Second, as the CIE color matching functions do not represent human cone sensitivities, the distance between any two points on a chromaticity diagram is not a good indicator of how differently these colors will be perceived.
色度图可以深入了解加色混合。但是,应谨慎使用。首先,马蹄铁的内部不应着色,因为任何色彩再现系统都有自己的原色,只能再现色度图的某些部分。其次,由于 CIE 色彩匹配函数不代表人类视锥细胞的敏感度,因此色度图上任意两点之间的距离并不能很好地指示这些颜色的感知差异。
A more uniform chromaticity diagram was developed to at least in part address the second of these problems. The CIE u′ v′ chromaticity diagram provides a perceptually more uniform spacing and is therefore generally preferred over (x, y) chromaticity diagrams. It is computed from (X, Y, Z) tristimulus values by applying a different normalization,
为了至少部分解决这些问题中的第二个问题,人们开发了一种更均匀的色度图。CIE u ′ v ′ 色度图提供了感知上更均匀的间距,因此通常比 ( x, y ) 色度图更受欢迎。它是通过应用不同的归一化方法从 ( X, Y, Z ) 三刺激值计算得出的,
and can alternatively be computed directly from (x, y) chromaticity coordinates:
也可以直接从( x,y )色度坐标计算:
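The two routes to (u′, v′) can be sketched as follows, using the standard CIE 1976 definitions u′ = 4X / (X + 15Y + 3Z) and v′ = 9Y / (X + 15Y + 3Z), or equivalently u′ = 4x / (−2x + 12y + 3) and v′ = 9y / (−2x + 12y + 3). The function names are our own.

```python
def uv_from_xyz(X, Y, Z):
    """CIE 1976 u', v' from tristimulus values."""
    d = X + 15.0 * Y + 3.0 * Z
    return 4.0 * X / d, 9.0 * Y / d


def uv_from_xy(x, y):
    """CIE 1976 u', v' from (x, y) chromaticity coordinates."""
    d = -2.0 * x + 12.0 * y + 3.0
    return 4.0 * x / d, 9.0 * y / d


# The BT.709 white point chromaticity, and tristimulus values with the
# same chromaticity (Y normalized to 1).
x, y = 0.3127, 0.3290
X, Y, Z = x / y, 1.0, (1.0 - x - y) / y
u1, v1 = uv_from_xy(x, y)
u2, v2 = uv_from_xyz(X, Y, Z)
```

Both routes agree because substituting X = x(X + Y + Z), and likewise for Y and Z, turns the first denominator into (X + Y + Z)(−2x + 12y + 3).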
A CIE u' v' chromaticity diagram is shown in Figure 18.8.
图 18.8显示了CIE u'v '色度图。
Figure 18.8. The CIE u' v' chromaticity diagram.
图 18.8. CIE u'v '色度图。
As explained above, each color can be represented by three numbers, for instance defined by (X, Y, Z) tristimulus values. However, the primaries of the XYZ system are imaginary, meaning that it is not possible to construct a device that has three light sources (all positive) that can reproduce all colors in the visible spectrum.
如上所述,每种颜色可以用三个数字表示,例如由 ( X, Y, Z ) 三刺激值定义。但是,其原色是虚数,这意味着不可能构建一个具有三个光源(均为正极)的设备,该设备可以再现可见光谱中的所有颜色。
For the same reason, image encoding and computations on images may not be practical. There is, for instance, a large number of possible XYZ values that do not correspond to any physical color. This would lead to inefficient use of available bits for storage and to a higher requirement for bit-depth to preserve visual integrity after image processing. Although it may be possible to build a capture device that has primaries that are close to the CIE XYZ color matching functions, the cost of hardware and image processing makes this an unattractive option. It is not possible to build a display that corresponds to CIE XYZ. Other color spaces are therefore designed with one or more of the following goals in mind: physical realizability, efficient encoding, perceptual uniformity, and intuitive color specification.
出于同样的原因,图像编码和图像计算可能不切实际。例如,存在大量不对应于任何物理颜色的可能的XYZ值。这将导致可用存储位的使用效率低下,并对位深度提出更高的要求以在图像处理后保持视觉完整性。尽管可以构建一个具有接近 CIE XYZ颜色匹配函数的原色的捕获设备,但是硬件和图像处理的成本使这成为一个没有吸引力的选择。构建与 CIE XYZ相对应的显示器是不可能的。出于这些原因,有必要设计其他颜色空间:物理可实现性、高效编码、感知一致性和直观的颜色规范。
The CIE XYZ color space is still actively used, mostly for the conversion between other color spaces. It can be seen as a device-independent color space. Other color spaces can then be defined in terms of their relationship to CIE XYZ, which is often specified by a specific transform. For instance, linear and additive trichromatic display devices can be transformed to and from CIE XYZ by means of a simple 3 × 3 matrix. An additional nonlinear transform may also be specified, for instance to minimize perceptual errors when data is stored with a limited bit-depth, or to enable display directly on devices that have a nonlinear relationship between input signal and the amount of light emitted.
CIE XYZ颜色空间仍在积极使用,主要用于其他颜色空间之间的转换。它可以被视为与设备无关的颜色空间。然后可以根据它们与 CIE XY Z的关系来定义其他颜色空间,这通常由特定的变换指定。例如,线性和加法三色显示设备可以通过简单的 3 × 3 矩阵转换为 CIE XY Z或从 CIE XY Z 转换为 CIE XY Z。还可以指定一些非线性附加变换,例如,当数据以有限的位深度存储时,为了最大限度地减少感知错误,或者为了在输入信号和发光量之间存在非线性关系的设备上直接显示。
For a display device with three primaries, say red, green, and blue, we can measure the spectral composition of the emitted light by sending the color vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1). These vectors represent the three cases in which one of the primaries is fully on and the other two are off. From the measured spectral output, we can then compute the corresponding chromaticity coordinates $(x_R, y_R)$, $(x_G, y_G)$, and $(x_B, y_B)$.
对于具有三原色(例如红、绿、蓝)的显示设备,我们可以通过发送颜色向量 (1 0 0)、(0 1 0) 和 (0 0 1) 来测量发射光的光谱成分。这些向量代表三种情况,即其中一个原色完全打开,而其他两个原色关闭。根据测量的光谱输出,我们可以计算相应的色度坐标 ( x R ,y R )、( x G ,y G ) 和 ( x B ,y B )。
The white point of a display is defined as the spectrum emitted when the color vector (1, 1, 1) is sent to the display. Its corresponding chromaticity coordinate is $(x_W, y_W)$. The three primaries and the white point characterize the display and are each required to construct a transformation matrix between the display’s color space and CIE XYZ.
显示器的白点定义为将颜色向量 (1 1 1) 发送到显示器时发出的光谱。其对应的色度坐标为 ( x W ,y W )。三原色和白点表征显示器,并且每个原色和白点都需要在显示器的色彩空间和 CIE XY Z之间构建一个变换矩阵。
These four chromaticity coordinates can be extended to chromaticity triplets by reconstructing the z-coordinate from z = 1 − x − y, leading to the triplets $(x_R, y_R, z_R)$, $(x_G, y_G, z_G)$, $(x_B, y_B, z_B)$, and $(x_W, y_W, z_W)$. If we know the maximum luminance of the white point, we can compute its corresponding tristimulus value $(X_W, Y_W, Z_W)$ and then solve the following set of equations for the luminance ratio scalars $S_R$, $S_G$, and $S_B$:
这四个色度坐标可以扩展为色度三元组,从z = 1— x — y重构z坐标,得到三元组 ( x R ,y R ,z R ) ( x G ,y G ,z G ),( x B ,y B ,z B ) 和 ( x W ,y W ,z W )。如果我们知道白点的最大亮度,我们可以计算其对应的三刺激值 ( X W ,Y W ,Z W ),然后求解以下方程组以获得亮度比标量S R 、 S G和S B :
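Written out under the assumption that white is the sum of the three primaries at full intensity, the system reads:

```latex
\begin{aligned}
X_W &= x_R S_R + x_G S_G + x_B S_B \\
Y_W &= y_R S_R + y_G S_G + y_B S_B \\
Z_W &= z_R S_R + z_G S_G + z_B S_B
\end{aligned}
```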
The conversion between RGB and XYZ is then given by
RGB 和 XYZ 之间的转换如下
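With the scalars $S_R$, $S_G$, and $S_B$ in hand, each column of the matrix is one primary's chromaticity triplet scaled by its scalar. A reconstruction consistent with the definitions above is:

```latex
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
x_R S_R & x_G S_G & x_B S_B \\
y_R S_R & y_G S_G & y_B S_B \\
z_R S_R & z_G S_G & z_B S_B
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
```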
The luminance of any given color can be computed by evaluating the middle row of a matrix constructed in this manner:
任何给定颜色的亮度都可以通过评估以这种方式构建的矩阵的中间行来计算:
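Since the middle row of such a matrix holds the y chromaticities scaled by $S_R$, $S_G$, and $S_B$, the luminance takes the form:

```latex
Y = y_R S_R \, R + y_G S_G \, G + y_B S_B \, B
```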
To convert between XYZ and RGB of a given device, the above matrix can simply be inverted.
为了在给定设备的 XYZ 和 RGB 之间进行转换,只需简单地反转上述矩阵即可。
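The construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names are our own, the 3 × 3 system is solved with Cramer's rule to keep the example dependency-free, and the white point luminance is normalized to 1. The sample input is the set of ITU-R BT.709 chromaticities.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m * s = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("primaries are degenerate (singular matrix)")
    sol = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        sol.append(det3(mc) / d)
    return sol


def rgb_to_xyz_matrix(prims, white_xy, white_Y=1.0):
    """Build the RGB -> XYZ matrix from xy primaries and an xy white point."""
    # Extend each (x, y) to (x, y, z) with z = 1 - x - y.
    trip = [(x, y, 1.0 - x - y) for (x, y) in prims]
    # Rows are x, y, z; columns are the R, G, B primaries.
    chroma = [[trip[k][r] for k in range(3)] for r in range(3)]
    # White tristimulus values from chromaticity and luminance.
    xw, yw = white_xy
    white = [xw / yw * white_Y, white_Y, (1.0 - xw - yw) / yw * white_Y]
    S = solve3(chroma, white)  # S_R, S_G, S_B
    # Scale column k of the chromaticity matrix by S[k].
    return [[chroma[r][k] * S[k] for k in range(3)] for r in range(3)]


bt709 = rgb_to_xyz_matrix(
    [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)], (0.3127, 0.3290)
)
luminance_weights = bt709[1]  # middle row: Y = wR * R + wG * G + wB * B
```

For these inputs the middle row comes out close to the familiar luminance weights (0.2126, 0.7152, 0.0722).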
If an image is represented in an RGB color space for which the primaries and white point are unknown, then the next best thing is to assume that the image was encoded in a standard RGB color space. A reasonable choice is then to assume that the image was specified according to ITU-R BT.709, which is the specification used for encoding and broadcasting of HDTV. Its primaries and white point are specified in Table 18.1. Note that the same primaries and white point are used to define the well-known sRGB color space. The transformations from this RGB color space to CIE XYZ and vice versa are given by
如果图像以 RGB 颜色空间表示,但其原色和白点未知,那么下一个最佳选择就是假设该图像是在标准 RGB 颜色空间中编码的。然后,一个合理的选择是假设该图像是根据 ITU-R BT.709 指定的,这是用于编码和广播 HDTV 的规范。其原色和白点在表 18.1中指定。请注意,相同的原色和白点用于定义众所周知的 sRGB 颜色空间。此 RGB 颜色空间与 CIE XYZ 之间的转换以及反之亦然,由以下公式给出
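The matrices usually quoted for ITU-R BT.709 and sRGB are as follows. They are given here to four decimal places from the published standard values, on the assumption that these are the matrices the text intends:

```latex
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
0.4124 & 0.3576 & 0.1805 \\
0.2126 & 0.7152 & 0.0722 \\
0.0193 & 0.1192 & 0.9505
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix},
\qquad
\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix}
 3.2405 & -1.5371 & -0.4985 \\
-0.9693 &  1.8760 &  0.0416 \\
 0.0556 & -0.2040 &  1.0572
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
```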
| | R | G | B | White |
|---|---|---|---|---|
| x | 0.6400 | 0.3000 | 0.1500 | 0.3127 |
| y | 0.3300 | 0.6000 | 0.0600 | 0.3290 |
By substituting the maximum RGB values of the device, we can compute the white point. For ITU-R BT.709, the maximum values are $(R_W, G_W, B_W) = (100, 100, 100)$, leading to a white point of $(X_W, Y_W, Z_W) = (95.05, 100.00, 108.90)$.
通过代入设备的最大 RGB 值,我们可以计算出白点。对于 ITU-R BT.709,最大值为 $(R_W, G_W, B_W) = (100, 100, 100)$,从而得出白点为 $(X_W, Y_W, Z_W) = (95.05, 100.00, 108.90)$。
In addition to a linear transformation, the sRGB color space is characterized by a subsequent nonlinear transform. The nonlinear encoding is given by
除了线性变换之外,sRGB 颜色空间还具有随后的非线性变换的特征。非线性编码由以下公式给出
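A minimal sketch of this encoding and its inverse is given below, using the constants of the sRGB standard (a linear segment with slope 12.92 below 0.0031308, and a power segment with exponent 1/2.4 above it). The function names are our own, and inputs are assumed to be linear values already scaled to [0, 1].

```python
def srgb_encode(c):
    """Linear value in [0, 1] -> nonlinear sRGB value in [0, 1]."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def srgb_decode(v):
    """Nonlinear sRGB value in [0, 1] -> linear value in [0, 1]."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


mid_gray = srgb_encode(0.5)          # about 0.735
round_trip = srgb_decode(mid_gray)   # about 0.5
```

The encoding is applied to each of the R, G, and B channels independently after the linear matrix transform.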
This nonlinear encoding helps minimize perceptual errors due to quantization errors in digital applications.
这种非线性编码有助于最大限度地减少由于数字应用中的量化误差而导致的感知误差。
As each device typically has its own set of primaries and white point, we call the associated RGB color spaces device-dependent. It should be noted that even if all these devices operate in an RGB space, they may have very different primaries and white points. An image specified in some RGB space may therefore appear very different to us, depending on the device on which we display it.
由于每台设备通常都有自己的一组原色和白点,因此我们将相关的 RGB 颜色空间称为设备相关。需要注意的是,即使所有这些设备都在 RGB 空间中运行,它们的原色和白点也可能非常不同。因此,如果我们在某个 RGB 空间中指定了图像,它看起来可能会非常不同,具体取决于我们在哪个设备上显示它。
This is clearly an undesirable situation, resulting from a lack of color management. However, if the image is specified in a known RGB color space, it can first be converted to XYZ, which is device independent, and then subsequently it can be converted to the RGB space of the device on which it will be displayed.
这显然是一种不理想的情况,因为缺乏色彩管理。但是,如果图像是在已知的 RGB 颜色空间中指定的,则可以先将其转换为与设备无关的 XYZ,然后再将其转换为要在其上显示的设备的 RGB 空间。
There are several other RGB color spaces that are well defined. They each consist of a linear matrix transform followed by a nonlinear transform, akin to the aforementioned sRGB color space. The nonlinear transform can be parameterized as follows:
还有其他几个定义明确的 RGB 颜色空间。它们每个都由线性矩阵变换和非线性变换组成,类似于前面提到的 sRGB 颜色空间。非线性变换可以参数化如下:
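A common way to write this parameterization, consistent with the sRGB encoding and with the parameters s, f, t, and γ, is the following. It is given here as a likely reconstruction rather than a verbatim quotation, and applies equally to G and B:

```latex
R_{\text{nonlinear}} =
\begin{cases}
(1 + f)\, R^{\gamma} - f, & t < R \le 1 \\
s\, R, & 0 \le R \le t
\end{cases}
```

For sRGB, this corresponds to s = 12.92, f = 0.055, t = 0.0031308, and γ = 1/2.4.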
The parameters s, f, t, and γ, together with primaries and white point, specify a class of RGB color spaces that are used in various industries. Several common transformations are listed in Table 18.2.
参数s 、 f 、 t和 γ 与原色和白点一起指定了用于各个行业的一类 RGB 颜色空间。表 18.2列出了几种常见的转换。
| Color space | XYZ to RGB matrix | RGB to XYZ matrix | Nonlinear transform |
|---|---|---|---|
| sRGB | | | |
| Adobe RGB (1998) | | | |
| HDTV (HD-CIF) | | | |
| NTSC (1953)/ITU-R BT.601-4 | | | |
| PAL/SECAM | | | |
| SMPTE-C | | | |
| SMPTE-240M | | | |
| Wide Gamut | | | |
The aforementioned cone signals can be expressed in terms of the CIE XYZ color space. The matrix transforms to compute LMS signals from XYZ and vice versa are given by
上述视锥细胞信号可以用 CIE XYZ 颜色空间来表示。从XY Z计算 LM S 信号和从 XY Z 计算LM S信号的矩阵变换如下:
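The matrices commonly published for this transform are shown below. They are quoted to five decimal places from the usual statement of the transform, and are our assumption about what the text intends:

```latex
\begin{bmatrix} L \\ M \\ S \end{bmatrix} =
\begin{bmatrix}
 0.38971 & 0.68898 & -0.07868 \\
-0.22981 & 1.18340 &  0.04641 \\
 0.00000 & 0.00000 &  1.00000
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix},
\qquad
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
1.91019 & -1.11214 & 0.20195 \\
0.37095 &  0.62905 & 0.00000 \\
0.00000 &  0.00000 & 1.00000
\end{bmatrix}
\begin{bmatrix} L \\ M \\ S \end{bmatrix}
```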
This transform is known as the Hunt-Pointer-Estevez transform (Hunt, 2004) and is used in chromatic adaptation transforms as well as in color appearance modeling.
这种变换被称为 Hunt-Pointer-Estevez 变换(Hunt, 2004),用于色度适应变换以及色彩外观建模。
Color opponent spaces are characterized by one achromatic channel (luminance), as well as two channels encoding color opponency. These are frequently red-green and yellow-blue channels. Each color opponent channel thus encodes two chromaticities along one axis, which can take both positive and negative values. For instance, a red-green channel encodes red for positive values and green for negative values. The value zero encodes a special case: neutral, which is neither red nor green. The yellow-blue channel works in much the same way.
颜色对立空间的特点是,一个通道代表非彩色通道(亮度),以及两个通道编码颜色对立。这些通道通常是红绿通道和黄蓝通道。因此,这些颜色对立通道沿一个轴编码两个色度,它们可以具有正值和负值。例如,红绿通道将红色编码为正值,将绿色编码为负值。值零编码一种特殊情况:中性,既不是红色也不是绿色。黄蓝通道的工作方式大致相同。
As at least two colors are encoded on each of the two chromatic axes, it is not possible to encode a mixture of red and green. Neither is it possible to encode yellow and blue simultaneously. While this may seem a disadvantage, it is known that the human visual system computes similar attributes early in the visual pathway. As a result, humans are not able to perceive colors that are simultaneously red and green, or yellow and blue. We do not see anything resembling reddish-green, or yellowish-blue. We are, however, able to perceive mixtures of colors such as yellowish-red (orange) or greenish-blue, as these are encoded across the chromatic channels.
由于两个色度轴上至少编码了两种颜色,因此不可能对红色和绿色的混合进行编码。也不可能同时对黄色和蓝色进行编码。虽然这似乎是一个缺点,但众所周知,人类视觉系统在视觉通路的早期就计算了类似的属性。因此,人类无法感知同时为红色和绿色或黄色和蓝色的颜色。我们看不到任何类似于红绿色或黄蓝色的东西。然而,我们能够感知黄红色(橙色)或绿蓝色等颜色的混合,因为这些是在色度通道上编码的。
The most relevant color opponent system for computer graphics is the CIE 1976 L*a*b* color model. It is a perceptually more or less uniform color space, useful, among other things, for the computation of color differences. It is also known as CIELAB.
计算机图形学中最相关的色彩对照系统是 CIE 1976 L * a * b * 颜色模型。这是一个感知上或多或少均匀的色彩空间,除其他用途外,还可用于计算颜色差异。它也被称为 CIELAB。
The inputs to CIELAB are the tristimulus values of the stimulus, (X, Y, Z), as well as the tristimulus values of a diffuse white reflecting surface that is lit by a known illuminant, $(X_n, Y_n, Z_n)$. CIELAB therefore goes beyond being an ordinary color space, as it takes into account a patch of color in the context of a known illumination. It can thus be seen as a rudimentary color appearance space.
CIELAB 的输入是刺激 ( X, Y, Z ) 三刺激值以及由已知光源 ( X n ,Y n ,Z n ) 照亮的漫反射白色反射表面的三刺激值。因此,CIELAB 不仅仅是一个普通的色彩空间,因为它考虑了已知照明环境下的色块。因此,它可以被视为一个基本的色彩外观空间。
The three channels defined in CIELAB are L*, a*, and b*. The L* channel encodes the lightness of the color, i.e., the perceived reflectance of a patch with tristimulus value (X, Y, Z). The a* and b* channels are chromatic opponent channels. The transform between XYZ and CIELAB is given by
CIELAB 中定义的三个通道是L *、 a * 和b *。L * 通道编码颜色的亮度,即具有三刺激值 ( X、Y、Z )的色块的感知反射率。a *和b * 是色度对立通道。XYZ 和 CIELAB 之间的变换由以下公式给出
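In the standard CIE 1976 formulation, with $(X_n, Y_n, Z_n)$ the tristimulus values of the reference white, the forward transform is:

```latex
\begin{aligned}
L^* &= 116\, f\!\left(\frac{Y}{Y_n}\right) - 16 \\
a^* &= 500 \left[ f\!\left(\frac{X}{X_n}\right) - f\!\left(\frac{Y}{Y_n}\right) \right] \\
b^* &= 200 \left[ f\!\left(\frac{Y}{Y_n}\right) - f\!\left(\frac{Z}{Z_n}\right) \right]
\end{aligned}
```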
The function f is defined as
函数f定义为
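In its usual form, f is a cube root with a linear segment near zero, which avoids an infinite slope at the origin:

```latex
f(r) =
\begin{cases}
r^{1/3}, & r > 0.008856 \\
7.787\, r + \dfrac{16}{116}, & r \le 0.008856
\end{cases}
```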
As can be seen from this formulation, the chromatic channels do depend on the luminance Y. Although this is perceptually accurate, it means that we cannot plot the values of a* and b* in a chromaticity diagram. The lightness L* is normalized to lie between 0 (black) and 100 (white). Although the a* and b* channels are not explicitly constrained, they are typically in the range [−128, 128].
从该公式可以看出,色度通道确实依赖于亮度 Y。虽然这在感知上是准确的,但这意味着我们无法在色度图中绘制 a* 和 b* 的值。明度 L* 被归一化到 0(黑色)到 100(白色)之间。虽然 a* 和 b* 通道没有明确限制,但它们通常在 [−128, 128] 范围内。
As CIELAB is approximately perceptually linear, it is possible to take two colors, convert them to CIELAB, and then estimate the perceived color difference by computing the Euclidean distance between them. This leads to the following color difference formula:
由于 CIELAB 在感知上近似线性,因此可以取两种颜色,将其转换为 CIELAB,然后通过计算它们之间的欧几里得距离来估计感知色差。这可以得出以下色差公式:
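Written out, the Euclidean distance between two colors in CIELAB is:

```latex
\Delta E^*_{ab} = \sqrt{ (\Delta L^*)^2 + (\Delta a^*)^2 + (\Delta b^*)^2 }
```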
The letter E stands for difference in sensation (in German, Empfindung) (Judd, 1932).
字母E代表感觉差异(德语,Empfindung)(Judd,1932)。
Finally, the inverse transform between CIELAB and XYZ is given by
最后,CIELAB 和 XYZ 之间的逆变换由下式给出
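The forward transform, its inverse, and the color difference can be sketched together as follows. This is a minimal illustration with function names of our own choosing. The inverse shown here inverts f channel by channel, which is the standard approach but may differ in form from the piecewise expression the text refers to. The default white is the BT.709 white point quoted earlier, used as an assumed reference white.

```python
import math

EPS = 0.008856    # threshold on the ratio, roughly (6/29)**3
KAPPA = 7.787     # slope of the linear segment
T0 = 6.0 / 29.0   # value of f at the threshold

D65 = (95.05, 100.00, 108.90)  # assumed reference white


def f(r):
    """Cube root with a linear segment near zero."""
    return r ** (1.0 / 3.0) if r > EPS else KAPPA * r + 16.0 / 116.0


def f_inv(t):
    """Inverse of f on both branches."""
    return t ** 3 if t > T0 else (t - 16.0 / 116.0) / KAPPA


def xyz_to_lab(xyz, white=D65):
    fx, fy, fz = (f(c / w) for c, w in zip(xyz, white))
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def lab_to_xyz(lab, white=D65):
    L, a, b = lab
    fy = (L + 16.0) / 116.0
    fx = fy + a / 500.0
    fz = fy - b / 200.0
    return tuple(w * f_inv(t) for t, w in zip((fx, fy, fz), white))


def delta_e(lab1, lab2):
    """Euclidean distance between two CIELAB colors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


lab_white = xyz_to_lab(D65)                  # (100, 0, 0) by construction
lab_red = xyz_to_lab((41.24, 21.26, 1.93))   # the BT.709 red primary
difference = delta_e(lab_white, lab_red)
```

Because both inputs are divided by the reference white before f is applied, the reference white itself always maps to L* = 100 and a* = b* = 0.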
The CIELAB color space just described takes as input both a tristimulus value of the stimulus and the tristimulus value of light reflected off a white diffuse patch. As such, it forms the beginnings of a system in which the viewing environment is taken into account.
刚刚描述的 CIELAB 颜色空间将刺激物的三刺激值和白色漫反射斑块反射光的三刺激值作为输入。因此,它构成了考虑观察环境的系统的开端。
The environment in which we observe objects and images has a large influence on how we perceive those objects. The range of viewing environments that we encounter in daily life is very large, from sunlight to starlight and from candlelight to fluorescent light. The lighting conditions not only constitute a very large range in the amount of light that is present, but also vary greatly in the color of the emitted light.
我们观察物体和图像的环境对我们感知这些物体的方式有很大影响。我们在日常生活中遇到的观察环境范围非常广泛,从阳光到星光,从烛光到荧光灯。照明条件不仅构成了光量范围非常大的变化,而且发射光的颜色也有很大差异。
The human visual system accommodates these changes in the environment through a process called adaptation. Three different types of adaptation can be distinguished, namely light adaptation, dark adaptation, and chromatic adaptation. Light adaptation refers to the changes that occur when we move from a very dark to a very light environment. When this happens, at first we are dazzled by the light, but soon we adapt to the new situation and begin to distinguish objects in our environment. Dark adaptation refers to the opposite—when we go from a light environment to a dark environment. At first, we see very little, but after a given amount of time, details will start to emerge. The time needed to adapt to the dark is generally much longer than for light adaptation.
人类视觉系统通过一种称为适应的过程来适应环境的变化。适应可分为三种类型,即光适应、暗适应和色适应。光适应是指我们从非常黑暗的环境进入非常明亮的环境时发生的变化。当这种情况发生时,我们一开始会被光线弄得眼花缭乱,但很快我们就会适应新情况并开始辨别环境中的物体。暗适应则相反,当我们从明亮的环境进入黑暗的环境时。一开始,我们几乎看不到什么,但经过一段时间后,细节就会开始显现。适应黑暗所需的时间通常比适应光的时间长得多。
Chromatic adaptation refers to our ability to adapt to, and largely ignore, variations in the color of the illumination. Chromatic adaptation is, in essence, the biological equivalent of the white balancing operation that is available on most modern cameras. The human visual system effectively normalizes the viewing conditions to present a visual experience that is fairly consistent. Thus, we exhibit a certain amount of color constancy: object reflectances appear relatively constant despite variations in illumination.
色彩适应是指我们适应并在很大程度上忽略照明色彩变化的能力。色彩适应本质上相当于大多数现代相机上都具有的白平衡操作的生物学等价物。人类视觉系统有效地规范了观看条件,以呈现相当一致的视觉体验。因此,我们表现出一定程度的色彩恒常性:尽管照明发生变化,但物体的反射率看起来相对恒定。
Although we are able to largely ignore changes in viewing environment, we are not able to do so completely. For instance, colors appear much more colorful on a sunny day than they do on a cloudy day. Although the appearances have changed, we do not assume that the objects themselves have changed their physical properties. We thus understand that the lighting conditions have influenced the overall color appearance.
虽然我们能够在很大程度上忽略观察环境的变化,但无法完全忽略。例如,晴天的颜色比阴天的颜色更加鲜艳。虽然外观发生了变化,但我们并不认为物体反射率本身实际上改变了其物理属性。因此,我们理解照明条件影响了整体色彩外观。
Nonetheless, color constancy does apply to chromatic content. Chromatic adaptation allows white objects to appear white for a large number of lighting conditions, as shown in Figure 18.9.
尽管如此,色彩恒常性确实适用于色彩内容。色彩适应性允许白色物体在大量照明条件下呈现白色,如图 18.9所示。
Figure 18.9. A series of light sources plotted in the CIE u' v' chromaticity diagram. A white piece of paper illuminated by any of these light sources maintains a white color appearance.
图 18.9。 CIE u'v '色度图中绘制的一系列光源。用这些光源中的任何一个照亮的白纸都会保持白色的外观。
Computational models of chromatic adaptation tend to focus on the gain control mechanism in the cones. One of the simplest models assumes that each cone adapts independently to the energy that it absorbs. This means that different cone types adapt differently dependent on the spectrum of the light being absorbed. Such adaptation can then be modeled as an adaptive and independent rescaling of the cone signals:
色觉适应的计算模型往往侧重于视锥细胞中的增益控制机制。最简单的模型之一假设每个视锥细胞独立适应其吸收的能量。这意味着不同类型的视锥细胞会根据所吸收光的光谱以不同的方式适应。这种适应性可以建模为视锥细胞信号的自适应和独立重新缩放:
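Written per cone type, with (L, M, S) denoting the unadapted cone signals, the rescaling is:

```latex
\begin{aligned}
L_a &= \alpha\, L \\
M_a &= \beta\, M \\
S_a &= \gamma\, S
\end{aligned}
```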
where $(L_a, M_a, S_a)$ are the chromatically adapted cone signals, and α, β, and γ are the independent gain controls which are determined by the viewing environment. This type of independent adaptation is also known as von Kries adaptation. An example is shown in Figure 18.10.
其中 ( L a ,M a ,S a ) 是色度适应的视锥细胞信号, α 、 β和γ是独立增益控制,由观看环境决定。这种独立适应也称为 von-Kries 适应。图 18.10显示了一个例子。
Figure 18.10. An example of von Kries–style independent photoreceptor gain control. The relative cone responses (solid line) and the relative adapted cone responses to CIE illuminant A (dashed) are shown. The separate patch of color represents CIE illuminant A rendered into the sRGB color space.
图 18.10. von Kries 式独立感光器增益控制示例。图中显示了相对视锥细胞反应(实线)和对 CIE 光源 A 的相对适应性视锥细胞反应(虚线)。单独的色块表示 CIE 光源 A 在 sRGB 颜色空间中的渲染效果。
The adapting illumination can be measured off a white surface in the scene. In the ideal case, this would be a Lambertian surface. In a digital image, the adapting illumination can also be approximated as the maximum tristimulus values of the scene. The light measured or computed in this manner is the adapting white, given by $(L_w, M_w, S_w)$. Von Kries adaptation is then simply a scaling by the reciprocal of the adapting white, carried out in cone response space:
适应照明可以通过场景中的白色表面测量。在理想情况下,这将是朗伯表面。在数字图像中,适应照明也可以近似为场景的最大三刺激值。以这种方式测量或计算的光是适应白色,由 ( L w ,M w ,S w ) 给出。冯·克里适应只是通过适应白色的倒数进行缩放,在锥体响应空间中进行:
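In matrix form, the scaling by the reciprocal of the adapting white is a diagonal transform:

```latex
\begin{bmatrix} L_a \\ M_a \\ S_a \end{bmatrix} =
\begin{bmatrix}
\dfrac{1}{L_w} & 0 & 0 \\
0 & \dfrac{1}{M_w} & 0 \\
0 & 0 & \dfrac{1}{S_w}
\end{bmatrix}
\begin{bmatrix} L \\ M \\ S \end{bmatrix}
```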
In many cases, we are interested in what stimulus should be generated under one illumination to match a given color under a different illumination. For example, if we have a colored patch illuminated by daylight, we may ask ourselves what tristimulus values should be generated to create a matching color patch that will be illuminated by incandescent light.
在许多情况下,我们感兴趣的是,在一种照明下应产生什么样的刺激才能与另一种照明下的特定颜色相匹配。例如,如果我们有一个由日光照射的彩色块,我们可能会问自己应该产生什么三色刺激值来创建一个由白炽灯照射的匹配颜色块。
We are thus interested in computing corresponding colors, which can be achieved by cascading two chromatic adaptation calculations. In essence, the previously mentioned von Kries transform divides out the adapting illuminant (in our example, the daylight illumination). If we subsequently multiply in the incandescent illuminant, we have computed a corresponding color. If the two illuminants are given by $(L_{w,1}, M_{w,1}, S_{w,1})$ and $(L_{w,2}, M_{w,2}, S_{w,2})$, the corresponding color $(L_c, M_c, S_c)$ is given by
因此,我们感兴趣的是计算相应的颜色,这可以通过级联两个色度适应计算来实现。本质上,前面提到的冯·克里变换会除以适应光源——在我们的例子中是日光照明。如果我们随后乘以白炽光源,我们就计算出了相应的颜色。如果两个光源分别为 ( L w, 1 ,M w, 1 ,S w, 1 ) 和 ( L w, 2 ,M w, 2 ,S w, 2 ),则相应的颜色 ( L c ,M c ,S c ) 由以下公式给出
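Dividing out the first white and multiplying in the second gives $L_c = L \, L_{w,2} / L_{w,1}$, and likewise for M and S. The sketch below applies this in cone space and converts back to XYZ. It is a minimal illustration under stated assumptions: the function names are our own, the cone space is reached with the commonly published Hunt-Pointer-Estevez matrices, and the two whites are standard XYZ values for daylight (D65) and CIE illuminant A.

```python
# Hunt-Pointer-Estevez matrices as commonly published (assumed values).
XYZ_TO_LMS = [
    [0.38971, 0.68898, -0.07868],
    [-0.22981, 1.18340, 0.04641],
    [0.00000, 0.00000, 1.00000],
]
LMS_TO_XYZ = [
    [1.91019, -1.11214, 0.20195],
    [0.37095, 0.62905, 0.00000],
    [0.00000, 0.00000, 1.00000],
]


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def corresponding_color(xyz, white1_xyz, white2_xyz):
    """Von Kries corresponding color: divide out white 1, multiply in white 2."""
    lms = mat_vec(XYZ_TO_LMS, xyz)
    w1 = mat_vec(XYZ_TO_LMS, white1_xyz)
    w2 = mat_vec(XYZ_TO_LMS, white2_xyz)
    lms_c = [lms[i] / w1[i] * w2[i] for i in range(3)]
    return mat_vec(LMS_TO_XYZ, lms_c)


D65 = [95.05, 100.00, 108.90]           # daylight white
ILLUMINANT_A = [109.85, 100.00, 35.58]  # incandescent white

patch = [41.24, 21.26, 1.93]
matched = corresponding_color(patch, D65, ILLUMINANT_A)
white_mapped = corresponding_color(D65, D65, ILLUMINANT_A)
```

As a sanity check, the first white maps onto the second, and using the same white on both sides leaves a color unchanged up to the rounding of the two matrices.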
There are several more complicated, and more accurate, chromatic adaptation transforms in existence (Reinhard et al., 2008). However, the simple von Kries model remains remarkably effective in modeling chromatic adaptation and can thus be used to achieve white balancing in digital images.
目前存在几种更复杂、更准确的色彩适应变换(Reinhard 等,2008)。然而,简单的 von Kries 模型在色彩适应建模方面仍然非常有效,因此可用于实现数字图像中的白平衡。
The importance of chromatic adaptation in the context of rendering is that we have moved one step closer to taking into account the viewing environment of the observer, without having to correct for it by adjusting the scene and rerendering our imagery. Instead, we can model and render our scenes, and then, as an image postprocess, correct for the illumination of the viewing environment. To ensure that white balancing does not introduce artifacts, however, it is important that the image is rendered to a floating-point format. If rendered to traditional 8-bit image formats, the chromatic adaptation transform may amplify quantization errors.
在渲染中,色度适应的重要性在于,我们离考虑观察者的观看环境又近了一步,而不必通过调整场景和重新渲染图像来纠正它。相反,我们可以对场景进行建模和渲染,然后作为图像后期处理,纠正观看环境的照明。然而,为了确保白平衡不会引入伪影,重要的是确保将图像渲染为浮点格式。如果渲染为传统的 8 位图像格式,色度适应变换可能会放大量化误差。
While colorimetry allows us to accurately specify and communicate color in a device-independent manner, and chromatic adaptation allows us to predict color matches across changes in illumination, these tools are still insufficient to describe what colors actually look like.
虽然比色法使我们能够以与设备无关的方式准确地指定和传达颜色,并且色彩适应使我们能够预测照明变化时的颜色匹配,但这些工具仍然不足以描述颜色的实际样子。
To predict the actual perception of an object, we need to know more information about the environment and take that information into account. The human visual system is constantly adapting to its environment, which means that the perception of color will be strongly influenced by such changes. Color appearance models take into account measurements of the stimulus itself, as well as the viewing environment. This means that the resulting description of color is independent of viewing condition.
为了预测对物体的实际感知,我们需要了解更多有关环境的信息,并将这些信息考虑在内。人类视觉系统不断适应环境,这意味着对颜色的感知将受到此类变化的强烈影响。颜色外观模型考虑了刺激本身以及观看环境的测量。这意味着最终的颜色描述与观看条件无关。
The importance of color appearance modeling can be seen in the following example. Consider an image being displayed on an LCD screen. When making a print of the same image and viewing it in a different context, more often than not the image will look markedly different. Color appearance models can be used to predict the changes required to generate an accurate cross-media color reproduction (Fairchild, 2005).
以下示例显示了色彩外观建模的重要性。考虑在 LCD 屏幕上显示的图像。当打印同一幅图像并在不同的环境中查看时,图像通常会看起来明显不同。色彩外观模型可用于预测生成准确的跨媒体色彩再现所需的变化(Fairchild,2005 年)。
Although color appearance modeling offers important tools for color reproduction, actual implementations tend to be relatively complicated and cumbersome in practical use. It can be anticipated that this situation may change over time. However, until then, we leave their description to more specialized textbooks (Fairchild, 2005).
尽管色彩外观建模为色彩再现提供了重要的工具,但实际实施在实际使用中往往相对复杂和繁琐。可以预见这种情况可能会随着时间的推移而改变。然而,在此之前,我们将其描述留给更专业的教科书(Fairchild,2005)。
Of all the books on color theory, Reinhard et al.’s work (Reinhard et al., 2008) is most directly geared toward engineering disciplines, including computer graphics, computer vision, and image processing. Other general introductions to color theory are given by Berns (Berns, 2000) and Stone (Stone, 2003). Wyszecki and Stiles have produced a comprehensive volume of data and formulae, forming an indispensable reference work (Wyszecki & Stiles, 2000). For color reproduction, we recommend Hunt’s book (Hunt, 2004). Color appearance models are comprehensively described in Fairchild’s book (Fairchild, 2005). For color issues related to video and HDTV, Poynton’s book is essential (Poynton, 2003).
在所有关于色彩理论的书籍中,Reinhard 等人的著作 (Reinhard 等人,2008) 最直接面向工程学科,包括计算机图形学、计算机视觉和图像处理。Berns (Berns,2000) 和 Stone (Stone,2003) 给出了其他关于色彩理论的一般介绍。Wyszecki 和 Stiles 制作了大量数据和公式,形成了不可或缺的参考书 (Wyszecki & Stiles,2000)。对于色彩再现,我们推荐 Hunt 的书 (Hunt,2004)。Fairchild 的书 (Fairchild,2005) 全面描述了色彩外观模型。对于与视频和高清电视相关的色彩问题,Poynton 的书是必不可少的 (Poynton,2003)。
William B. Thompson
The ultimate purpose of computer graphics is to produce images for viewing by people. Thus, the success of a computer graphics system depends on how well it conveys relevant information to a human observer. The intrinsic complexity of the physical world and the limitations of display devices make it impossible to present a viewer with the identical patterns of light that would occur when looking at a natural environment. When the goal of a computer graphics system is physical realism, the best we can hope for is that the system be perceptually effective: displayed images should “look” as intended. For applications such as technical illustration, it is often desirable to visually highlight relevant information and perceptual effectiveness becomes an explicit requirement.
计算机图形的最终目的是生成图像供人观看。因此,计算机图形系统的成功取决于它如何向人类观察者传达相关信息。物理世界的内在复杂性和显示设备的局限性使得不可能向观察者呈现与观察自然环境时相同的光图案。当计算机图形系统的目标是物理真实感时,我们所能期望的最好结果是系统具有感知效果:显示的图像应该“看起来”符合预期。对于技术插图等应用,通常希望在视觉上突出相关信息,感知效果成为一项明确的要求。
Artists and illustrators have developed empirically a broad range of tools and techniques for effectively conveying visual information. One approach to improving the perceptual effectiveness of computer graphics is to utilize these methods in our automated systems. A second approach builds directly on knowledge of the human vision system by using perceptual effectiveness as an optimization criterion in the design of computer graphics systems. These two approaches are not completely distinct. Indeed, one of the first systematic examinations of visual perception is found in the notebooks of Leonardo da Vinci.
艺术家和插画家已经凭经验开发出各种工具和技术,用于有效地传达视觉信息。提高计算机图形感知效果的一种方法是在我们的自动化系统中利用这些方法。第二种方法直接建立在人类视觉系统的知识之上,使用感知效果作为计算机图形系统设计中的优化标准。这两种方法并不完全不同。事实上,最早对视觉感知进行系统检查的文献之一是列奥纳多·达·芬奇的笔记本。
The remainder of this chapter provides a partial overview of what is known about visual perception in people. The emphasis is on aspects of human vision that are most relevant to computer graphics. The human visual system is extremely complex in both its operation and its architecture. A chapter such as this can at best provide a summary of key points, and it is important to avoid overgeneralizing from what is presented here. More in-depth treatments of visual perception can be found in Wandell (1995) and Palmer (1999); Gregory (1997) and Yantis (2000) provide additional useful information. A good computer vision reference such as Forsyth and Ponce (2002) is also helpful. It is important to note that despite over 150 years of intensive research, our knowledge of many aspects of vision is still very limited and imperfect.
本章的其余部分将对人类视觉感知的已知知识进行部分概述。重点是与计算机图形学最相关的人类视觉方面。人类视觉系统在操作和架构方面都极其复杂。像这样的章节充其量只能提供要点的总结,重要的是避免过度概括这里介绍的内容。在 Wandell (1995) 和 Palmer (1999) 中可以找到对视觉感知的更深入的论述;Gregory (1997) 和 Yantis (2000) 提供了更多有用信息。像 Forsyth 和 Ponce (2002) 这样的优秀计算机视觉参考资料也很有帮助。值得注意的是,尽管经过了 150 多年的深入研究,我们对视觉许多方面的了解仍然非常有限且不完善。
Vision is generally agreed to be the most powerful of the senses in humans. Vision produces more useful information about the world than does hearing, touch, smell, or taste. This is a direct consequence of the physics of light (Figure 19.1). Illumination is pervasive, especially during the day but also at night due to moonlight, starlight, and artificial sources. Surfaces reflect a substantial portion of incident illumination and do so in ways that are idiosyncratic to particular materials and that are dependent on the shape of the surface. The fact that light (mostly) travels in straight lines through the air allows vision to acquire information from distant locations.
人们普遍认为视觉是人类最强大的感官。视觉比听觉、触觉、嗅觉或味觉能提供更多关于世界的有用信息。这是光的物理特性的直接结果(图 19.1 )。照明无处不在,尤其是在白天,但在夜晚,由于月光、星光和人造光源,照明也无处不在。表面会反射相当一部分入射光,反射方式因特定材料而异,取决于表面的形状。光线(大部分)在空气中沿直线传播,这一事实使视觉能够从远处获取信息。
Figure 19.1. The nature of light makes vision a powerful sense.
The study of vision has a long and rich history. Much of what we know about the eye traces back to the work of philosophers and physicists in the 1600s. Starting in the mid-1800s, there was an explosion of work by perceptual psychologists exploring the phenomenology of vision and proposing models of how vision might work. The mid-1900s saw the start of modern neuroscience, which investigates both the fine-scale workings of individual neurons and the large-scale architectural organization of the brain and nervous system. A substantial portion of neuroscience research has focused on vision. More recently, computer science has contributed to the understanding of visual perception by providing tools for precisely describing hypothesized models of visual computations and by allowing empirical examination of computer vision programs. The term vision science was coined to refer to the multidisciplinary study of visual perception involving perceptual psychology, neuroscience, and computational analysis.
Vision science views the purpose of vision as producing information about objects, locations, and events in the world from imaged patterns of light reaching the viewer. Psychologists use the term distal stimulus to refer to the physical world under observation and proximal stimulus to refer to the retinal image.¹ Using this terminology, the function of vision is to generate a description of aspects of the distal stimulus given the proximal stimulus. Visual perception is said to be veridical when the description that is produced accurately reflects the real world. In practice, it makes little sense to think of these descriptions of objects, locations, and events in isolation. Rather, vision is better understood in the context of the motor and cognitive functions that it serves.
¹ In computer vision, the term scene is often used to refer to the external world, while the term image is used to refer to the projection of the scene onto a sensing plane.
Vision systems create descriptions of the visual environment based on properties of the incident illumination. As a result, it is important to understand what properties of incident illumination the human vision system can actually detect. One critical observation about the human vision system is that it is primarily sensitive to patterns of light rather than being sensitive to the absolute magnitude of light energy. The eye does not operate as a photometer. Instead, it detects spatial, temporal, and spectral patterns in the light imaged on the retina, and information about these patterns of light forms the basis for all of visual perception.
There is a clear ecological utility to the vision system’s sensitivity to variations in illumination over space and time. Being able to accurately sense changes in the environment is crucial to our survival.² A system that measures changes in light energy rather than the magnitude of the energy itself also makes engineering sense, since it makes it easier to detect patterns of light over large ranges in light intensity. It is a good thing for computer graphics that vision operates in this manner. Display devices are physically limited in their ability to project light with the power and dynamic range typical of natural scenes. Graphical displays would not be effective if they needed to produce the identical patterns of light as the corresponding physical world. Fortunately, all that is required is that displays be able to produce similar patterns of spatial and temporal change to the real world.
In bright light, the human visual system is capable of distinguishing gratings consisting of high-contrast parallel light and dark bars as fine as 50–60 cycles/degree. (In this case, a “cycle” consists of an adjacent pair of light and dark bars.) For comparison, the best currently available LCD computer monitor, at a normal viewing distance, can display patterns as fine as about 20 cycles/degree. The minimum contrast difference at an edge detectable by the human visual system in bright light is about 1% of the average luminance across the edge. In most 8-bit displays, differences of a single gray level are often noticeable over at least a portion of the range of intensities due to the nature of the mapping from gray levels to actual display luminance.
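The comparison between the eye's 50–60 cycles/degree and a monitor's roughly 20 cycles/degree can be checked with a little trigonometry. The sketch below is only an illustration: the function name, the 0.25 mm pixel pitch, and the 600 mm viewing distance are assumptions chosen for the example, not values from the text. It treats one cycle as one light pixel plus one dark pixel.

```python
import math


def display_cycles_per_degree(pixel_pitch_mm, viewing_distance_mm):
    """Finest grating, in cycles/degree, that a display can show at a distance."""
    # Visual angle subtended by a single pixel, in degrees.
    pixel_angle_deg = math.degrees(
        2.0 * math.atan(pixel_pitch_mm / (2.0 * viewing_distance_mm))
    )
    # One cycle is a light bar plus a dark bar, so it needs at least two pixels.
    return 1.0 / (2.0 * pixel_angle_deg)


# Hypothetical monitor: 0.25 mm pixels viewed from 600 mm (assumed values).
cpd = display_cycles_per_degree(0.25, 600.0)
print(f"{cpd:.1f} cycles/degree")
```

With these assumed numbers the result is roughly 21 cycles/degree, well below the 50–60 cycles/degree that the visual system can resolve in bright light. Halving the pixel pitch or doubling the viewing distance roughly doubles the figure.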
² It is sometimes said that the primary goals of vision are to support eating, avoiding being eaten, reproduction, and avoidance of catastrophe while moving. Thinking about vision as a goal-directed activity is often useful, but it needs to be done at a more detailed level.
Characterizing the ability of the visual system to detect fine scale patterns (visual acuity) and to detect changes in brightness is considerably more complicated than for cameras and similar image acquisition devices. As shown in Figure 19.2, there is an interaction between contrast and acuity in human vision. In the figure, the scale of the pattern decreases from left to right while the contrast increases from top to bottom. If you view the figure at a normal viewing distance, it will be clear that the lowest contrast at which a pattern is visible is a function of the spatial frequency of the pattern.
Figure 19.2. The contrast between stripes increases in a constant manner from top to bottom, yet the threshold of visibility varies with frequency.
There is a linear relationship between the intensity of light L reaching the eye from a particular surface point in the world, the intensity of light I illuminating that surface point, and the reflectivity R of the surface at the point being observed:

$$L = \alpha \, I \cdot R,$$
where α is dependent on the relationship between the surface geometry, the pattern of incident illumination, and the viewing direction. While the eye is only able to directly measure L, human vision is much better at estimating R than L. To see this, view Figure 19.3 in bright direct light. Use your hand to shadow one of the patterns, leaving the other directly illuminated. While the light reflected off of the two patterns will be significantly different, the apparent brightness of the two center squares will seem nearly the same. The term lightness is often used to describe the apparent brightness of a surface, as distinct from its actual luminance. In many situations, lightness is invariant to large changes in illumination, a phenomenon referred to as lightness constancy.
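The hand-shadow experiment can be mimicked numerically with the linear relationship between L, I, and R. The sketch below is not a model of how the visual system achieves lightness constancy, which the text notes is not well understood. It only shows that shadowing changes every luminance while leaving the ratio between neighboring patches untouched. The reflectances, illumination levels, and the choice of α = 1 are assumptions for illustration.

```python
def luminance(alpha, illumination, reflectance):
    """Luminance reaching the eye from one surface point: L = alpha * I * R."""
    return alpha * illumination * reflectance


ALPHA = 1.0       # geometry factor, assumed equal for both patches
R_CENTER = 0.6    # hypothetical reflectance of the center square
R_SURROUND = 0.2  # hypothetical reflectance of the surround


def center_surround_ratio(illumination):
    """Ratio of center luminance to surround luminance under one illumination."""
    l_center = luminance(ALPHA, illumination, R_CENTER)
    l_surround = luminance(ALPHA, illumination, R_SURROUND)
    return l_center / l_surround


for label, illumination in (("direct light", 1000.0), ("hand shadow", 100.0)):
    print(label,
          luminance(ALPHA, illumination, R_CENTER),
          center_surround_ratio(illumination))
```

The center luminance drops by a factor of ten in the shadow, yet the center-to-surround ratio is the same in both cases because I cancels out of the ratio.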
Figure 19.3. Lightness constancy. Cast a shadow over one of the patterns with your hand and notice that the apparent brightness of the two center squares remains nearly the same.
The mechanisms by which the human visual system achieves lightness constancy are not well understood. As shown in Figure 19.2, the vision system is relatively insensitive to slowly varying patterns of light, which may serve to discount the effects of slowly varying illumination. Apparent brightness is affected by the brightness of surrounding regions (Figure 19.4). This can aid lightness constancy when regions are illuminated dissimilarly. While this simultaneous contrast effect is often described as a modification of the perceived lightness of one region based on contrasting brightness in the surrounding region, it is actually much more complicated than that (Figures 19.5 and 19.6). For more on lightness perception, see (Gilchrist et al., 1999) and (Adelson, 1999).
Figure 19.4. (a) Simultaneous contrast: the apparent brightness of the center bar is affected by the brightness of the surrounding area; (b) The same bar without a variable surround.
Figure 19.5. The Munker-White illusion shows the complexity of simultaneous contrast. In Figure 19.4, the central region looked lighter when the surrounding area was darker. In (a), the gray strips on the left look lighter than the gray strips on the right, even though they are nearly surrounded by regions of white; (b) shows the gray strips without the black lines.
Figure 19.6. The perception of lightness is affected by the perception of 3D structure. The two surfaces marked (a) have the same brightness, as do the two surfaces marked (b) (after Adelson (1999)).
While the visual system largely ignores slowly varying intensity patterns, it is extremely sensitive to edges consisting of lines of discontinuity in brightness. Edges in imaged light intensity often correspond to surface boundaries or other important features in the environment (Figure 19.7). The vision system can also detect localized differences in motion, stereo disparity, texture, and several other image properties. The vision system has very little ability, however, to detect spatial discontinuities in color when not accompanied by differences in one of these other properties.
Figure 19.7. (a) Original gray scale image, (b) image edges, which are lines of high spatial variability in some direction.
Perception of edges seems to interact with perception of form. While edges give the visual system the information it needs to recognize shapes, slowly varying brightness can appear as a sharp edge if the resulting edge creates a more complete form (Figure 19.8). Figure 19.9 shows a subjective contour, an extreme form of this effect in which a closed contour is seen even though no such contour exists in the actual image. Finally, the vision system’s sensitivity to edges also appears to be part of the mechanism involved in lightness perception. Note that the region enclosed by the subjective contour in Figure 19.9 appears a bit brighter than the surrounding area of the page. Figure 19.10 shows a different interaction between edges and lightness. In this case, a particular brightness profile at the edge has a dramatic effect on the apparent brightness of the surfaces to either side of the edge.
Figure 19.8. The visual system sometimes sees “edges” even when there are no sharp discontinuities in brightness, as is the case at the right side of the central pattern in this image.
Figure 19.9. Sometimes, the visual system will “see” subjective contours without any associated change in brightness.
Figure 19.10. Perceived lightness depends more on local contrast at edges than on brightness across surfaces. Try covering the vertical edge in the middle of the figure with a pencil. This figure is an instance of the Craik-O’Brien-Cornsweet illusion.
As indicated above, people can detect differences in the brightness between two adjacent regions if the difference is at least 1% of the average brightness. This is an example of Weber’s law, which states that there is a constant ratio between the just noticeable differences (jnd) in a stimulus and the magnitude of the stimulus:

$$\frac{\Delta I}{I} = k_1,$$
where I is the magnitude of the stimulus, ΔI is the magnitude of the just noticeable difference, and k1 is a constant particular to the stimulus. Weber’s law was postulated in 1846 and still remains a useful characterization of many perceptual effects. Fechner’s law, proposed in 1860, generalized Weber’s law in a way that allowed for the description of the strength of any sensory experience, not just jnds:

$$S = k_2 \log(I),$$
where S is the perceptual strength of the sensory experience, I is the physical magnitude of the corresponding stimulus, and k2 is a scaling constant specific to the stimulus. Current practice is to model the association between perceived and actual strength of a stimulus using a power function (Stevens’s law):

$$S = k_3 I^{b},$$
where S and I are as before, k3 is another scaling constant, and b is an exponent specific to the stimulus. For a large number of perceptual quantities involving vision, b < 1. The CIE L*a*b* color space, described elsewhere, uses a modified Stevens’s law representation to characterize perceptual differences between brightness values. Note that in the first two characterizations of the perceptual strength of a stimulus and in Stevens’s Law when b < 1, changes in the stimulus when it has a small average magnitude create larger perceptual effects than do the same physical change in the stimulus when it has a larger magnitude.
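The closing observation, that a fixed physical change matters more perceptually when the stimulus is weak, is easy to confirm numerically. The sketch below writes the three laws as small functions. The 1% Weber fraction comes from the text. The function names, the scaling constants k2 = k3 = 1, and the exponent b = 0.5 are assumptions chosen only for illustration.

```python
import math


def weber_jnd(intensity, k1):
    """Weber's law: delta_I / I = k1, so the jnd is k1 * I."""
    return k1 * intensity


def fechner_strength(intensity, k2):
    """Fechner's law: S = k2 * log(I)."""
    return k2 * math.log(intensity)


def stevens_strength(intensity, k3, b):
    """Stevens's law: S = k3 * I**b."""
    return k3 * intensity ** b


K1 = 0.01          # 1% brightness threshold quoted in the text
K2, K3 = 1.0, 1.0  # hypothetical scaling constants
B = 0.5            # hypothetical exponent, chosen so that b < 1


def perceptual_change(base, step):
    """Perceived change for the same physical step under Fechner and Stevens."""
    d_fechner = fechner_strength(base + step, K2) - fechner_strength(base, K2)
    d_stevens = stevens_strength(base + step, K3, B) - stevens_strength(base, K3, B)
    return d_fechner, d_stevens


for base in (10.0, 1000.0):
    print(base, weber_jnd(base, K1), perceptual_change(base, 10.0))
```

Under all three characterizations, the same step of 10 units has a larger effect at a base intensity of 10 than at 1000. The jnd itself grows in proportion to the base intensity, so the step is many jnds at the low level and exactly one at the high level.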
The “laws” described above are not physical constraints on how perception operates. Rather, they are generalizations about how the perceptual system responds to particular physical stimuli. In the field of perceptual psychology, the quantitative study of the relationships between physical stimuli and their perceptual effects is called psychophysics. While psychophysical laws are empirically derived observations rather than mechanistic accounts, the fact that so many perceptual effects are well modeled by simple power functions is striking and may provide insights into the mechanisms involved.
In 1666, Isaac Newton used prisms to show that apparently white sunlight could be decomposed into a spectrum of colors and that these colors could be recombined to produce light that appeared white. We now know that light energy is made up of a collection of photons, each with a particular wavelength. The spectral distribution of light is a measure of the average energy of the light at each wavelength. For natural illumination, the spectral distribution of light reflected off of surfaces varies significantly depending on the surface material. Characterizations of this spectral distribution can therefore provide visual information about the nature of surfaces in the environment.
Most people have a pervasive sense of color when they view the world. Color perception depends on the frequency distribution of light, with the visible spectrum for humans ranging from a wavelength of about 370 nm to a wavelength of about 730 nm (see Figure 19.11). The manner in which the visual system derives a sense of color from this spectral distribution was first systematically examined in 1801 and remained extremely controversial for 150 years. The problem is that the visual system responds to patterns of spectral distribution very differently from the way it responds to patterns of luminance distribution.
Figure 19.11. The visible spectrum. Wavelengths are in nanometers.
Even accounting for phenomena such as lightness constancy, distinctly different spatial distributions almost always look distinctly different. More importantly given that the purpose of the visual system is to produce descriptions of the distal stimulus given the proximal stimulus, perceived patterns of lightness correspond at least approximately to patterns of brightness over surfaces in the environment. The same is not true of color perception. Many quite different spectral distributions of light can produce a sense of any specific color. Correspondingly, the sense that a surface is a specific color provides little direct information about the spectral distribution of light coming from the surface. For example, a spectral distribution consisting of a combination of light at wavelengths of 700 nm and 540 nm, with appropriately chosen relative strengths, will look indistinguishable from light at the single wavelength of 580 nm. (Perceptually indistinguishable colors with different spectral compositions are referred to as metamers.) If we see the color “yellow,” we have no way of knowing if it was generated by one or the other of these distributions or an infinite family of other spectral distributions. For this reason, in the context of vision the term color refers to a purely perceptual quality, not a physical property.
“The history of the investigation of colour vision is remarkable for its acrimony.”
—Richard Gregory (1997)
There are two classes of photoreceptors in the human retina. Cones are involved in color perception, while rods are sensitive to light energy across the visible range and do not provide information about color. There are three types of cones, each with a different spectral sensitivity (Figure 19.12). S-cones respond to short wavelengths in the blue range of the visible spectrum. M-cones respond to wavelengths in the middle (greenish) region of the visible spectrum. L-cones respond to somewhat longer wavelengths covering the green and red portions of the visible spectrum.
Figure 19.12. Spectral sensitivity of the short, medium, and long cones in the human retina.
While it is common to describe the three types of cones as red, green, and blue, this is neither correct terminology nor does it accurately reflect the cone sensitivities shown in Figure 19.12. The L-cones and M-cones are broadly tuned, meaning that they respond to a wide range of frequencies. There is also substantial overlap between the sensitivity curves of the three cone types. Taken together, these two properties mean that it is not possible to reconstruct an approximation to the original spectral distribution given the responses of the three cone types. This is in contrast to spatial sampling in the retina (and in digital cameras), where the receptors are narrowly tuned in their spatial sensitivity in order to be able to detect fine detail in local contrast.
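A toy calculation makes it concrete why three broadly tuned, overlapping receptor types cannot recover a spectrum, and why metamers exist. In the sketch below, each cone response is modeled as a weighted sum of the spectrum over four wavelength bins. The sensitivity values are invented for illustration and are not measured cone data; the linear-sum model and all names are likewise assumptions.

```python
# Toy cone sensitivities over four wavelength bins, ordered short to long.
# These numbers are hypothetical, not the measured curves of Figure 19.12.
CONE_SENSITIVITY = {
    "S": [1.0, 1.0, 0.0, 0.0],
    "M": [0.0, 1.0, 1.0, 0.0],
    "L": [0.0, 0.0, 1.0, 1.0],
}


def cone_responses(spectrum):
    """Response of each cone type: sensitivity-weighted sum of the spectrum."""
    return {
        name: sum(s * p for s, p in zip(sensitivity, spectrum))
        for name, sensitivity in CONE_SENSITIVITY.items()
    }


# Two physically different spectra (energy per wavelength bin).
spectrum_a = [2.0, 2.0, 2.0, 2.0]
spectrum_b = [3.0, 1.0, 3.0, 1.0]

print(cone_responses(spectrum_a))
print(cone_responses(spectrum_b))
```

Both spectra produce identical S, M, and L responses, so in this toy model they are metamers: anything downstream of the cones has no way to tell them apart. With more wavelength bins than receptor types, there are always many such spectra for any given triple of responses.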
The fact that there are only three types of color-sensitive photoreceptors in the human retina greatly simplifies the task of displaying colors on computer monitors and in other graphical displays. Computer monitors display colors as a weighted combination of three fixed-color distributions. Most often, the three colors are a distinct red, a distinct green, and a distinct blue. As a result, in computer graphics, color is often represented by a red-green-blue (RGB) triple, representing the intensities of red, green, and blue primaries needed to display a particular color. Three basis colors are sufficient to display most perceptible colors, since appropriately weighted combinations of three appropriately chosen colors can produce metamers for these perceptible colors.
There are at least two significant problems with the RGB color representation. The first is that different monitors have different spectral distributions for their red, green, and blue primaries. As a result, perceptually correct color rendition involves remapping RGB values for each monitor. This is, of course, only possible if the original RGB values satisfy some well-defined standard, which is often not the case. (See Chapter 18 for more information on this issue.) The second problem is that RGB values do not define a particular color in a way that corresponds to subjective perception. When we see the color “yellow,” we do not have the sense that it is made up of equal parts of red and green light. Rather, it looks like a single color, with additional properties involving brightness and the “amount” of color. Representing color as the output of the S-cones, M-cones, and L-cones is no help either, since we have no more phenomenological sense of color as characterized by these properties than we do as characterized by RGB display properties.
There are two different approaches to characterizing color in a way that more closely reflects human perception. The various CIE color spaces aim to be “perceptually uniform” so that the magnitude of the difference in the represented values of two colors is proportional to the perceived difference in color (Wyszecki & Stiles, 2000). This turns out to be a difficult goal to accomplish, and there have been several modifications to the CIE model over the years. Furthermore, while one of the dimensions of the CIE color spaces corresponds to perceived brightness, the other two dimensions that specify chromaticity have no intuitive meaning.
The second approach to characterizing color in a more natural manner starts with the observation that there are three distinct and independent properties that dominate the subjective sense of color. Lightness, the apparent brightness of a surface, has already been discussed. Saturation refers to the purity or vividness of a color. Colors can range from totally unsaturated gray to partially saturated pastels to fully saturated “pure” colors. The third property, hue, corresponds most closely to the informal sense of the word “color” and is characterized in a manner similar to colors in the visible spectrum, ranging from dark violet to dark red. Figure 19.13 shows a plot of the hue-saturation-lightness (HSV) color space. Since the relationship between brightness and lightness is both complex and not well understood, HSV color spaces almost always use brightness instead of attempting to estimate lightness. Unlike wavelengths in the spectrum, however, hue is usually represented in a manner that reflects the fact that the extremes of the visible spectrum are actually similar in appearance (Figure 19.14). Simple transformations exist between RGB and HSV representations of a particular color value. As a result, while the HSV color space is motivated by perceptual considerations, it contains no more information than does an RGB representation.
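The claim that HSV carries no more information than RGB can be checked with a round trip. The sketch below uses Python's standard `colorsys` module, which is one common formulation of the RGB-to-HSV transformation and is not necessarily the one plotted in Figure 19.13. The helper `circular_hue_distance` and the particular RGB triples are assumptions added for illustration.

```python
import colorsys


def circular_hue_distance(h1, h2):
    """Distance between two hues on a circle where 0.0 and 1.0 coincide."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)


# Round trip: RGB -> HSV -> RGB returns the original triple.
yellow_rgb = (1.0, 1.0, 0.0)
h, s, v = colorsys.rgb_to_hsv(*yellow_rgb)
round_trip = colorsys.hsv_to_rgb(h, s, v)
print((h, s, v), round_trip)

# Hue as an angle: compare red with green and with a violet.
hue_red = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)[0]
hue_green = colorsys.rgb_to_hsv(0.0, 1.0, 0.0)[0]
hue_violet = colorsys.rgb_to_hsv(0.5, 0.0, 1.0)[0]
print(circular_hue_distance(hue_red, hue_green),
      circular_hue_distance(hue_red, hue_violet))
```

Because the mapping is invertible, the HSV triple is only a re-parameterization of the RGB triple. The second part shows the effect of treating hue as circular: the violet ends up closer to red than green does, even though violet and red sit at opposite ends of the visible spectrum.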
Figure 19.13. HSV color space. Hue varies around the circle, saturation varies with radius, and value varies with height.
Figure 19.14. Which color is closer to red: green or violet?
The hue-saturation-lightness approach to describing color is based on the spectral distribution at a single point and so only approximates the perceptual response to spectral distributions of light distributed over space. Color perception is subject to constancy and simultaneous contrast effects similar to those for lightness/brightness. Neither effect is captured in the RGB representation, and as a result neither is captured in the HSV representation. For an example of color constancy, look at a piece of white paper indoors under incandescent light and outdoors under direct sunlight. The paper will look “white” in both cases, even though incandescent light has a distinctly yellow hue and so the light reflected off of the paper will also have a yellow hue, while sunlight has a much more uniform color spectrum.
Another aspect of color perception not captured by either the CIE color spaces or HSV encoding is the fact that we see a small number of distinct colors when looking at a continuous spectrum of visible light (Figure 19.11) or in a naturally occurring rainbow. For most people, the visible spectrum appears to be divided into four to six distinct colors: red, yellow, green, and blue, plus perhaps light blue and purple. Considering non-spectral colors as well, there are only 11 basic color terms commonly used in English: red, green, blue, yellow, black, white, gray, orange, purple, brown, and pink. The partitioning of the intrinsically continuous space of spectral distributions into a relatively small set of perceptual categories associated with well-defined linguistic terms seems to be a basic property of perception, not just a cultural artifact (Berlin & Kay, 1969). The exact nature of the process, however, is not well understood.
Natural illumination varies in intensity over 6 orders of magnitude (Figure 19.15). The human vision system is able to operate over this full range of brightness levels. However, at any one point in time, the visual system is only able to detect variations in light intensity over a much smaller range. As the average brightness to which the visual system is exposed changes over time, the range of discriminable brightnesses changes in a corresponding manner. This effect is most obvious if we move rapidly from a brightly lit outdoor area to a very dark room. At first, we are able to see little. After a while, however, details in the room start to become apparent. The dark adaptation that occurs involves a number of physiological changes in the eye. It takes several minutes for significant dark adaptation to occur and 40 minutes or so for complete dark adaptation. If we then move back into the bright light, not only is vision difficult but it can actually be painful. Light adaptation is required before it is again possible to see clearly. Light adaptation occurs much more quickly than dark adaptation, typically requiring less than a minute.
Figure 19.15. Approximate luminance level of a white surface under different types of illumination in candelas per meter squared (cd/m²) (Wandell, 1995).
The two classes of photoreceptors in the human retina are sensitive to different ranges of brightness. The cones provide visual information over most of what we consider normal lighting conditions, ranging from bright sunlight to dim indoor lighting. The rods are only effective at very low light levels. Photopic vision involves bright light in which only the cones are effective. Scotopic vision involves dark light in which only the rods are effective. There is a range of intensities within which both cones and rods are sensitive to changes in light, which is referred to as mesopic conditions (see Chapter 21).
Each eye in the human visual system has a field-of-view of approximately 160° horizontal by 135° vertical. With binocular viewing, there is only partial overlap between the fields-of-view of the two eyes. This results in a wider overall field-of-view (approximately 200° horizontal by 135° vertical), with the region of overlap being approximately 120° horizontal by 135° vertical.
With normal or corrected-to-normal vision, we usually have the subjective experience of being able to see relatively fine detail wherever we look. This is an illusion, however. Only a small portion of the visual field of each eye is actually sensitive to fine detail. To see this, hold a piece of paper covered with normal-sized text at arm’s length, as shown in Figure 19.16. Cover one eye with the hand not holding the paper. While staring at your thumb and not moving your eye, note that the text immediately above your thumb is readable while the text to either side is not. High acuity vision is limited to a visual angle slightly larger than your thumb held at arm’s length. We do not normally notice this because the eyes usually move frequently, allowing different regions of the visual field to be viewed at high resolution. The visual system then integrates this information over time to produce the subjective experience of the whole visual field being seen at high resolution.
视力正常或矫正至正常时,我们通常会主观地觉得无论看向何处,都能看到相对精细的细节。然而,这只是一种错觉。每只眼睛的视野中只有一小部分实际上对精细细节敏感。要看到这一点,请将一张写满正常大小文本的纸张拿在一臂之远,如图 19.16所示。用不拿纸的手遮住一只眼睛。在盯着拇指而不移动眼睛的情况下,请注意拇指正上方的文本可读,而两侧的文本则不可读。高敏锐度视力仅限于比一臂之远的拇指稍大的视角。我们通常不会注意到这一点,因为眼睛通常频繁移动,从而可以高分辨率查看视野的不同区域。然后,视觉系统会随着时间的推移整合这些信息,产生以高分辨率看到整个视野的主观体验。
Figure 19.16. If you hold a page of text at arm’s length and stare at your thumb, only the text near your thumb will be readable. Photo by Peter Shirley.
图 19.16。如果你拿着一页文本,距离手臂很远,盯着拇指看,那么只有靠近拇指的文本才可读。照片由 Peter Shirley 拍摄。
There is not enough bandwidth in the human visual cortex to process the information that would result if image intensity were sampled densely over the whole of the retina. The combination of variable density photoreceptor packing in the retina and a mechanism for rapid eye movements to point at areas of interest provides a way to simultaneously optimize acuity and field-of-view. Other animals have evolved different ways of balancing acuity and field-of-view that are not dependent on rapid eye movements. Some have only high acuity vision, but limited to a narrow field-of-view. Others have wide field-of-view vision, but limited ability to see detail.
如果对整个视网膜的图像强度进行密集采样,人类视觉皮层中的带宽不足以处理所产生的信息。视网膜中可变密度的光感受器填充与快速眼球运动指向感兴趣区域的机制相结合,提供了一种同时优化敏锐度和视野的方法。其他动物已经进化出不同的平衡敏锐度和视野的方法,这些方法不依赖于快速眼球运动。有些动物只有高敏锐度视力,但视野很窄。其他动物有宽视野,但看细节的能力有限。
The eye motions which focus areas of interest in the environment on the fovea are called saccades. Saccades occur very quickly. The time from a triggering stimulus to the completion of the eye movement is 150–200 ms. Most of this time is spent in the vision system planning the saccade. The actual motion takes 20 ms or so on average. The eyes are moving very quickly during a saccade, with the maximum rotational velocity often exceeding 500°/second. Between saccades, the eyes point toward an area of interest (fixate), taking 300 ms or so to acquire fine detail visual information. The mechanism by which multiple fixations are integrated to form an overall subjective sense of fine detail over a wide field of view is not well understood.
将环境中感兴趣区域聚焦在中央凹上的眼球运动称为扫视。扫视发生得非常快。从触发刺激到眼球运动完成的时间是 150-200 毫秒。其中大部分时间都花在视觉系统规划扫视上。实际运动平均需要 20 毫秒左右。眼球在扫视过程中移动非常快,最大旋转速度往往超过 500°/秒。在扫视之间,眼睛指向感兴趣的区域(注视),大约需要 300 毫秒来获取精细的细节视觉信息。多个注视点如何整合起来形成对宽视野内精细细节的整体主观感受的机制尚不清楚。
Figure 19.17 shows the variable packing density of cones and rods in the human retina. The cones, which are responsible for vision under normal lighting, are packed most closely at the fovea of the retina (Figure 19.17). When the eye is fixated at a particular point in the environment, the image of that point falls on the fovea. The higher packing density of cones at the fovea results in a higher sampling frequency of the imaged light (see Chapter 9) and hence greater detail in the sampled pattern. Foveal vision encompasses about 1.7°, which is the same visual angle as the width of your thumb held at arm’s length.
图 19.17显示了人眼视网膜中视锥细胞和视杆细胞的不同密度。视锥细胞负责正常照明下的视觉,它们最密集地聚集在视网膜的中央凹处(图 19.17 )。当眼睛注视环境中的某个特定点时,该点的图像就会落在中央凹上。中央凹处视锥细胞的密度越高,成像光的采样频率就越高(参见第 9 章),因此采样模式的细节就越多。中央凹视觉大约涵盖 1.7°,这与手臂伸直时拇指的宽度相同。
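The thumb comparison can be checked with the standard visual-angle formula for an object viewed frontally. A minimal sketch; the thumb width (2 cm) and arm length (70 cm) are assumed round figures, not values from the text, and `visual_angle_deg` is a hypothetical helper:

```python
import math


def visual_angle_deg(size: float, distance: float) -> float:
    """Angle subtended at the eye by an object of width `size` at `distance` (same units)."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


# Assumed illustrative values: thumb about 2 cm wide, held about 70 cm from the eye.
angle = visual_angle_deg(size=2.0, distance=70.0)
print(f"{angle:.2f} degrees")
```

With these assumed dimensions the result is roughly 1.6°, close to the 1.7° quoted for foveal vision.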
Figure 19.17. Density of rods and cones in the human retina (after Osterberg (1935)).
图 19.17.人类视网膜中视杆细胞和视锥细胞的密度(Osterberg (1935) 之后)。
While a version of Figure 19.17 appears in most introductory texts on human visual perception, it provides only a partial explanation for the neurophysiological limitations on visual acuity. The output of individual rods and cones is pooled in various ways by neural interconnects in the eye, before the information is shipped along the optic nerve to the visual cortex. 3 This pooling filters the signal provided by the pattern of incident illumination in ways that have important impacts on the patterns of light that are detectable. In particular, the farther away from the fovea, the larger the area over which brightness is averaged. As a consequence, spatial acuity drops sharply away from the fovea. Most figures showing rod and cone packing density indicate the location of the retinal blind spot, where the nerve bundle carrying optical information from the eye to the brain passes through the retina, and there is no sensitivity to light. By and large, the only practical impact of the blind spot on real-world perception is its use as an illusion in introductory perception texts, since normal eye movements otherwise compensate for the temporary loss of information.
尽管大多数关于人类视觉感知的入门教材中都会出现图 19.17的版本,但它只能部分解释视觉敏锐度的神经生理限制。各个视杆细胞和视锥细胞的输出通过眼睛中的神经互连以各种方式汇集,然后信息沿着视神经传送到视觉皮层。3这种汇集会过滤由入射光模式提供的信号,而这种过滤方式对可检测到的光模式有重要影响。具体而言,距离中央凹越远,平均亮度的面积就越大。因此,远离中央凹后空间敏锐度会急剧下降。大多数显示视杆细胞和视锥细胞填充密度的图表都表明了视网膜盲点的位置,即传递光学信息从眼睛到大脑的神经束穿过视网膜时,该神经束对光不敏感。总的来说,盲点对现实世界感知的唯一实际影响是在介绍性感知文本中作为一种幻觉,因为正常的眼球运动可以弥补信息的暂时丢失。
3 All of the cells in the optic nerve and almost all cells in the visual cortex have an associated retinal receptive field. Patterns of light hitting the retina outside of a cell’s receptive field have no effect on the firing rate of that cell.
3视神经中的所有细胞和视觉皮层中的几乎所有细胞都具有相关的视网膜感受野。照射到细胞感受野之外的视网膜上的光模式不会影响该细胞的放电率。
As shown in Figure 19.17, the packing density of rods drops to zero at the center of the fovea. Away from the fovea, the rod density first increases and then decreases. One result of this is that there is no foveal vision when illumination is very low. The lack of rods in the fovea can be demonstrated by observing a night sky on a moonless night, well away from any city lights. Some stars will be so dim that they will be visible if you look at a point in the sky slightly to the side of the star, but they will disappear if you look directly at them. This occurs because when you look directly at these features, the image of the features falls only on the cones in the retina, which are not sufficiently light sensitive to detect the feature. Looking slightly to the side causes the image to fall on the more light-sensitive rods. Scotopic vision is also limited in acuity, in part because of the lower density of rods over much of the retina and in part because greater pooling of signals from the rods occurs in the retina in order to increase the light sensitivity of the visual information passed back to the brain.
如图 19.17所示,视杆细胞的密度在中央凹处降至零。远离中央凹的地方,视杆细胞的密度先增加然后减少。这样做的一个结果是,当光照很低时,就没有中央凹视觉。在没有月亮的夜晚,远离城市灯光的情况下观察夜空可以证明中央凹缺乏视杆细胞。有些星星非常暗淡,如果你看天空中星星稍微侧面的某个点,它们是可见的,但是如果你直视它们,它们就会消失。发生这种情况的原因是,当你直视这些特征时,这些特征的图像只会落在视网膜中的视锥细胞上,而视锥细胞对光的敏感度不足以检测到这些特征。稍微向侧面看会使图像落在对光更敏感的视杆细胞上。暗视视觉的敏锐度也有限,部分原因是视网膜大部分区域的视杆细胞密度较低,部分原因是视网膜中汇集了更多的视杆细胞信号,以增加传回大脑的视觉信息的光敏度。
When reading about visual perception and looking at static figures on a printed page, it is easy to forget that motion is pervasive in our visual experience. The patterns of light that fall on the retina are constantly changing due to eye and body motion and the movement of objects in the world. This section covers our ability to detect visual motion. Section 19.3.4 describes how visual motion can be used to determine geometric information about the environment. Section 19.4.3 deals with the use of motion to guide our movement through the environment.
当阅读有关视觉感知的文章并查看印刷页面上的静态图形时,很容易忘记运动在我们的视觉体验中无处不在。由于眼睛和身体的运动以及世界上物体的运动,落在视网膜上的光图案不断变化。本节介绍我们检测视觉运动的能力。第 19.3.4 节描述了如何使用视觉运动来确定有关环境的几何信息。第 19.4.3 节讨论了如何使用运动来引导我们在环境中的运动。
The detectability of motion in a particular pattern of light falling on the retina is a complex function of speed, direction, pattern size, and contrast. The issue is further complicated because simultaneous contrast effects occur for motion perception in a manner similar to that observed in brightness perception. In the extreme case of a single small pattern moving against a contrasting, homogeneous background, perceivable motion requires a rate of motion corresponding to 0.2°–0.3°/second of visual angle. Motion of the same pattern against a textured background is detectable at about a tenth this speed.
落在视网膜上的特定光图案的运动可检测性是速度、方向、图案大小和对比度的复杂函数。这个问题更加复杂,因为运动感知中同时发生的对比效应与亮度感知中观察到的相似。在单个小图案在对比均匀的背景下移动的极端情况下,可感知的运动需要对应于 0.2°-0.3°/秒视角的运动速率。相同图案在纹理图案上移动的运动可以以大约十分之一的速度被检测到。
With this sensitivity to retinal motion, combined with the frequency and velocity of saccadic eye movements, it is surprising that the world usually appears stable and stationary when we view it. The vision system accomplishes this in three ways. Contrast sensitivity is reduced during saccades, reducing the visual effects generated by these rapid changes in eye position. Between saccades, a variety of sophisticated and complex mechanisms adjust eye position to compensate for head and body motion and the motion of objects of interest in the world. Finally, the visual system exploits information about the position of the eyes to assemble a mosaic of small patches of high-resolution imagery from multiple fixations into a single, stable whole.
由于视网膜运动敏感,再加上眼球扫视运动的频率和速度,令人惊讶的是,我们看到的世界通常看起来是稳定和静止的。视觉系统通过三种方式实现这一点。在扫视过程中,对比敏感度会降低,从而降低眼位快速变化产生的视觉效果。在扫视之间,各种精密复杂的机制会调整眼位,以补偿头部和身体的运动以及世界中感兴趣物体的运动。最后,视觉系统利用有关眼睛位置的信息,将来自多个注视点的高分辨率图像小块拼接成一个稳定的整体。
The motion of straight lines and edges is ambiguous if no endpoints or corners are visible, a phenomenon referred to as the aperture problem (Figure 19.18). The aperture problem arises because the component of motion parallel to the line or edge does not produce any visual changes. The geometry of the real world is sufficiently complex that this rarely causes difficulties in practice, except for intentional illusions such as barber poles. The simplified geometry and texturing found in some computer graphics renderings, however, have the potential to introduce inaccuracies in perceived motion.
如果没有可见的端点或角,直线和边缘的运动就会变得模糊,这种现象被称为光圈问题(图 19.18 )。光圈问题的出现是因为平行于线或边缘的运动分量不产生任何视觉变化。现实世界的几何形状足够复杂,这在实践中很少造成困难,除非是故意制造的错觉,例如理发店的招牌。然而,在某些计算机图形渲染中发现的简化几何形状和纹理可能会导致感知运动不准确。
Figure 19.18. The aperture problem: (a) If a straight line or edge moves in such a way that its endpoints are hidden, the visual information is not sufficient to determine the actual motion of the line. (b) 2D motion of a line is unambiguous if there are any corners or other distinctive markings on the line.
图 19.18光圈问题:(a) 如果直线或边缘的移动方式使其端点被隐藏,则视觉信息不足以确定该线的实际运动。(b) 如果线上有任何角或其他独特标记,则线的 2D 运动是明确的。
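The ambiguity can be made concrete by splitting a 2D image velocity into components parallel and perpendicular to the edge: only the perpendicular part changes what is seen through the aperture. A minimal sketch in assumed 2D image coordinates; `visible_motion` is a hypothetical helper, not a model of how the visual system computes motion:

```python
import math


def visible_motion(velocity, edge_angle_deg):
    """Return the part of a 2D velocity that is perpendicular to a straight edge.

    The component parallel to the edge slides the edge along itself and
    produces no visible change when the endpoints are hidden.
    """
    a = math.radians(edge_angle_deg)
    normal = (-math.sin(a), math.cos(a))  # unit vector perpendicular to the edge
    vx, vy = velocity
    speed_along_normal = vx * normal[0] + vy * normal[1]
    return (speed_along_normal * normal[0], speed_along_normal * normal[1])


# Two different true motions of a vertical edge (edge direction at 90 degrees).
slow_up = visible_motion((1.0, 0.0), 90.0)
fast_up = visible_motion((1.0, 5.0), 90.0)
print(slow_up, fast_up)
```

Both calls give the same visible component up to floating-point rounding, even though the true motions differ, which is the ambiguity shown in part (a) of Figure 19.18.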
Real-time computer graphics, film, and video would not be possible without an important perceptual phenomenon: discontinuous motion, in which a series of static images are visible for discrete intervals in time and then move by discrete intervals in space, can be nearly indistinguishable from continuous motion. The effect is called apparent motion to highlight that the appearance of continuous motion is an illusion.
如果没有一个重要的感知现象,实时计算机图形、电影和视频就不可能实现:不连续运动,即一系列静态图像在时间上以离散间隔可见,然后在空间中以离散间隔移动,几乎无法与连续运动区分开来。这种效果被称为假运动,以强调连续运动的出现是一种错觉。
Figure 19.19 illustrates the difference between continuous motion, which is typical of the real world, and apparent motion, which is generated by almost all dynamic image display devices. The motion plotted in Figure 19.19 (b) consists of an average motion comparable to that shown in Figure 19.19 (a), modulated by a high space-time frequency that accounts for the alternation between a stationary pattern and one that moves discontinuously to a new location. Apparent perception of continuous motion occurs because the visual system is insensitive to the high-frequency component of the motion.
图 19.19说明了连续运动(现实世界的典型特征)与表观运动(几乎所有动态图像显示设备都会产生)之间的区别。图 19.19 (b) 中绘制的运动由与图 19.19 (a) 中所示的运动相当的平均运动组成,该运动受高时空频率调制,该频率解释了静止模式与不连续移动到新位置的模式之间的交替。由于视觉系统对运动的高频分量不敏感,因此会出现连续运动的表观感知。
Figure 19.19. (a) Continuous motion. (b) Discontinuous motion with the same average velocity. Under some circumstances, the perception of these two motion patterns may be similar.
图 19.19。 (a)连续运动。(b)平均速度相同的不连续运动。在某些情况下,这两种运动模式的感知可能相似。
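The relationship between the two motion patterns can be sketched numerically: a display that holds each image until the next update shows a staircase whose average velocity matches the continuous motion, and the difference between the two is a bounded, rapidly repeating sawtooth. A minimal sketch, assuming constant velocity and an idealized sample-and-hold display; all names are hypothetical:

```python
import math


def continuous_position(t: float, velocity: float) -> float:
    """Position of a point moving at constant velocity."""
    return velocity * t


def displayed_position(t: float, velocity: float, update_hz: float) -> float:
    """Position shown by a display that holds each image until the next update."""
    frames_shown = math.floor(t * update_hz)
    return velocity * frames_shown / update_hz


def position_error(t: float, velocity: float, update_hz: float) -> float:
    """High-frequency difference between continuous and displayed motion."""
    return continuous_position(t, velocity) - displayed_position(t, velocity, update_hz)


# Assumed illustrative values: 10 degrees/second, updated 24 times per second.
for t in (0.010, 0.030, 0.050, 0.070, 0.090):
    print(t, displayed_position(t, 10.0, 24.0), position_error(t, 10.0, 24.0))
```

The error never exceeds the distance moved in one update interval (`velocity / update_hz`), so raising the update rate shrinks the component that the visual system has to ignore.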
A compelling sense of apparent motion occurs when the rate at which individual images appear is above about 10 Hz, as long as the positional changes between successive images are not too great. This rate is not fast enough, however, to produce a satisfying sense of continuous motion for most image display devices. Almost all such devices introduce brightness variation as one image is switched to the next. In well-lit conditions, the human visual system is sensitive to this varying brightness for rates of variation up to about 80 Hz. In lower light, detectability is present up to about 40 Hz. When the rate of alternating brightness is sufficiently high, flicker fusion occurs and the variation is no longer visible.
当单个图像出现的速率高于 10 Hz 时,只要连续图像之间的位置变化不是太大,就会产生明显的运动感。但是,对于大多数图像显示设备来说,这个速率还不够快,无法产生令人满意的连续运动感。几乎所有此类设备在切换图像时都会引入亮度变化。在光线充足的条件下,人类视觉系统对这种变化的亮度很敏感,变化率最高可达 80 Hz。在光线较弱的情况下,可检测性最高可达 40 Hz。当亮度交替率足够高时,发生闪烁融合,变化不再可见。
To produce a compelling sense of visual motion, an image display must therefore satisfy two separate constraints:
为了产生引人注目的视觉运动感,图像显示必须满足两个独立的约束:
images must be updated at a rate ≥ 10 Hz;
图像必须以≥10 Hz的速率更新;
any flicker introduced in the process of updating images must occur at a rate ≥ 60–80 Hz.
更新图像过程中引入的任何闪烁都必须以≥60–80 Hz 的速率发生。
One solution is to require that the image update rate be greater than or equal to 60–80 Hz. In many situations, however, this is simply not possible. For computer graphics displays, the frame computation time is often substantially greater than 12–15 msec. Transmission bandwidth and limitations of older monitor technologies limit normal broadcast television to 25–30 images per second. (Some HDTV formats operate at 60 images/sec.) Movies update images at 24 frames/second due to exposure time requirements and the mechanical difficulties of physically moving film any faster than that.
一种解决方案是要求图像更新率大于或等于 60–80 Hz。然而,在许多情况下,这根本是不可能的。对于计算机图形显示器,帧计算时间通常远大于 12–15 毫秒。传输带宽和旧显示器技术的限制将普通广播电视限制为每秒 25–30 张图像。(某些高清电视格式以 60 张图像/秒的速度运行。)由于曝光时间要求以及以更快的速度物理移动胶片的机械困难,电影以 24 帧/秒的速度更新图像。
Different display technologies solve this problem in different ways. Computer displays refresh the displayed image at ~70–80 Hz, regardless of how often the contents of the image change. The term frame rate is ambiguous for such displays, since two values are required to characterize them: refresh rate, which indicates the rate at which the image is redisplayed, and frame update rate, which indicates the rate at which new images are generated for display. Standard non-HDTV broadcast television uses a refresh rate of 60 Hz (NTSC, used in North America and some other locations) or 50 Hz (PAL, used in most of the rest of the world). The frame update rate is half the refresh rate. Instead of displaying each new image twice, the display is interlaced by dividing alternating horizontal image lines into even and odd fields and alternating the display of these even and odd fields. Flicker is avoided in movies by using a mechanical shutter to blink each frame of the film three times before moving to the next frame, producing a refresh rate of 72 Hz while maintaining the frame update rate of 24 Hz.
不同的显示技术以不同的方式解决这个问题。无论图像内容的变化频率如何,计算机显示器都会以约 70-80 Hz 的频率刷新显示的图像。术语帧速率对于此类显示器来说是不明确的,因为需要两个值来表征这种显示器:刷新率,表示重新显示图像的速率,帧更新率,表示生成新图像以供显示的速率。标准非高清电视广播电视使用 60 Hz(NTSC,用于北美和其他一些地区)或 50 Hz(PAL,用于世界其他大部分地区)的刷新率。帧更新率是刷新率的一半。显示不是将每个新图像显示两次,而是通过将交替的水平图像线分成偶数场和奇数场并交替显示这些偶数场和奇数场来进行隔行扫描。电影中通过使用机械快门使影片的每一帧闪烁三次再移动到下一帧来避免闪烁,产生 72 Hz 的刷新率,同时保持 24 Hz 的帧更新率。
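The arithmetic behind these rates is simple enough to write down. A minimal sketch relating update rate, refresh rate, and the time available per image; the function names are hypothetical, and treating an interlaced frame as two refreshes is a simplification used only for the rate calculation:

```python
def frame_budget_ms(rate_hz: float) -> float:
    """Time available to produce one image at the given rate, in milliseconds."""
    return 1000.0 / rate_hz


def refresh_rate_hz(frame_update_hz: float, refreshes_per_frame: int) -> float:
    """Refresh rate when each new image is presented several times."""
    return frame_update_hz * refreshes_per_frame


print(frame_budget_ms(80.0), frame_budget_ms(60.0))  # budget if updates had to match 60-80 Hz
print(refresh_rate_hz(24.0, 3))  # film: each frame shown three times
print(refresh_rate_hz(30.0, 2))  # NTSC: two interlaced fields per frame
print(refresh_rate_hz(25.0, 2))  # PAL: two interlaced fields per frame
```

Decoupling the two rates lets the flicker constraint be met by the refresh rate alone, while the update rate only needs to stay above the much lower threshold for apparent motion.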
The use of apparent motion to simulate continuous motion occasionally produces undesirable artifacts. Best known of these is the wagon wheel illusion in which the spokes of a rotating wheel appear to revolve in the opposite direction from what would be expected given the translational motion of the wheel. The wagon wheel illusion is an example of temporal aliasing. Spokes, or other spatially periodic patterns on a rotating disk, produce a temporally periodic signal for viewing locations that are fixed with respect to the center of the wheel or disk. Fixed frame update rates have the effect of sampling this temporally periodic signal in time. If the temporal frequency of the sampled pattern is too high, undersampling results in an aliased, lower temporal frequency appearing when the image is displayed. Under some circumstances, this distortion of temporal frequency causes a spatial distortion in which the wheel appears to move backwards. Wagon wheel illusions are more likely to occur with movies than with video, since the temporal sampling rate is lower.
使用视运动来模拟连续运动有时会产生不良伪影。其中最著名的是车轮错觉,即旋转车轮的辐条似乎以与车轮平移运动预期相反的方向旋转。车轮错觉是时间混叠的一个例子。轮辐或旋转盘上的其他空间周期性图案会产生时间周期性信号,用于观看相对于车轮或盘中心固定的位置。固定帧更新率具有及时采样此时间周期性信号的效果。如果采样图案的时间频率过高,则欠采样会导致在显示图像时出现混叠的较低时间频率。在某些情况下,这种时间频率的扭曲会导致空间扭曲,使车轮似乎向后移动。车轮错觉在电影中比在视频中更容易出现,因为时间采样率较低。
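Temporal aliasing of this kind follows the usual sampling rule: a periodic signal sampled at a fixed rate is indistinguishable from one whose frequency has been shifted by a whole multiple of the sampling rate. A minimal sketch, assuming ideal point sampling in time; the function name and the example frequencies are illustrative assumptions:

```python
def apparent_frequency_hz(true_hz: float, sample_hz: float) -> float:
    """Frequency seen after sampling, folded into the range around zero.

    A negative result means the pattern appears to move backwards.
    """
    return true_hz - sample_hz * round(true_hz / sample_hz)


# Assumed example: spokes pass a fixed point 22 times per second.
print(apparent_frequency_hz(22.0, 24.0))  # film rate: -2.0, slow backward motion
print(apparent_frequency_hz(22.0, 60.0))  # video field rate: 22.0, no aliasing
```

The same 22 Hz pattern aliases at 24 samples per second but not at 60, which matches the observation that the illusion is more common in movies than in video.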
Problems can also occur when apparent motion imagery is converted from one medium to another. This is of particular concern when 24 Hz movies are transferred to video. Not only does a non-interlaced format need to be translated to an interlaced format, but there is no straightforward way to move from 24 frames per second to 50 or 60 fields per second. Some high-end display devices have the ability to partially compensate for the artifacts introduced when film is converted to video.
当将视运动图像从一种介质转换为另一种介质时,也会出现问题。当将 24 Hz 电影转换为视频时,这个问题尤其令人担忧。不仅需要将非隔行格式转换为隔行格式,而且没有直接的方法将每秒 24 帧转换为每秒 50 或 60 场。一些高端显示设备能够部分补偿将电影转换为视频时引入的伪影。
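The difficulty is visible in the ratio itself: 60 fields per second divided by 24 frames per second is 2.5 fields per frame, which is not a whole number. One widely used workaround, commonly called 3:2 (or 2:3) pulldown and not described in the text above, alternates between holding a film frame for two fields and for three. A minimal sketch of that cadence only; it ignores even/odd field parity, and the function name is hypothetical:

```python
def pulldown_fields(num_frames: int, cadence=(2, 3)):
    """Return the film frame index shown in each successive video field."""
    fields = []
    for frame in range(num_frames):
        # Alternate between the hold lengths in the cadence.
        fields.extend([frame] * cadence[frame % len(cadence)])
    return fields


print(60 / 24)                   # 2.5 fields per frame
print(pulldown_fields(4))        # [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
print(len(pulldown_fields(24)))  # 60 fields from one second of film
```

Because frames are held for unequal durations, smooth motion on film becomes slightly uneven on video, which is one source of the artifacts mentioned above.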
One of the critical operations performed by the visual system is the estimation of geometric properties of the visible environment, since these are central to determining information about objects, locations, and events. Vision has sometimes been described as inverse optics, to emphasize that one function of the visual system is to invert the image formation process in order to determine the geometry, materials, and lighting in the world that produced a particular pattern of light on the retina. The central problem for a vision system is that properties of the visible environment are confounded in the patterns of light imaged on the retina. Brightness is a function of both illumination and reflectance, and can depend on environmental properties across large regions of space due to the complexities of light transport. Image locations of a projected environmental location at best can be used to constrain the position of that location to a half-line. As a consequence, it is rarely possible to uniquely determine the nature of the world that produced a particular imaged pattern of light.
视觉系统执行的关键操作之一是估计可见环境的几何特性,因为这些特性对于确定有关物体、位置和事件的信息至关重要。视觉有时被描述为逆光学,强调视觉系统的功能之一是反转图像形成过程,以确定在视网膜上产生特定光图案的世界的几何形状、材料和照明。视觉系统的核心问题是可见环境的属性在视网膜上成像的光图案中是混乱的。亮度是照明和反射的函数,并且由于光传输的复杂性,可能取决于大片空间区域的环境属性。投影环境位置的图像位置最多可用于将该位置的位置限制为半线。因此,很少可能唯一地确定产生特定光成像图案的世界的性质。
Determining surface layout—the location and orientation of visible surfaces in the environment—is thought to be a key step in human vision. Most discussions of how the vision system extracts information about surface layout from the patterns of light it receives divide the problem into a set of visual cues, with each cue describing a particular visual pattern which can be used to infer properties of surface layout along with the needed rules of inference. Since surface layout can rarely be determined accurately and unambiguously from vision alone, the process of inferring surface layout usually requires additional, nonvisual information. This can come from other senses or assumptions about what is likely to occur in the real world.
确定表面布局(环境中可见表面的位置和方向)被认为是人类视觉的关键步骤。关于视觉系统如何从接收到的光模式中提取有关表面布局的信息的大多数讨论都将问题分为一组视觉线索,每个线索描述一个特定的视觉模式,可用于推断表面布局的属性以及所需的推理规则。由于仅凭视觉很少能准确无误地确定表面布局,因此推断表面布局的过程通常需要额外的非视觉信息。这些信息可能来自其他感官或对现实世界中可能发生的事情的假设。
Visual cues are typically grouped into four categories. Ocularmotor cues involve information about the position and focus of the eyes. Disparity cues involve information extracted from viewing the same surface point with two eyes, beyond that available just from the positioning of the eyes. Motion cues provide information about the world that arises from either the movement of the observer or the movement of objects. Pictorial cues result from the process of projecting 3D surface shapes onto a 2D pattern of light that falls on the retina. This section deals with the visual cues relevant to the extraction of geometric information about individual points on surfaces. More general extraction of location and shape information is covered in Section 19.4.
视觉提示通常分为四类。眼球运动线索涉及眼睛位置和焦点的信息。视差线索涉及用两只眼睛观察同一表面点时提取的信息,而不仅仅是眼睛定位所能提供的信息。运动线索提供有关世界的信息,这些信息源自观察者的运动或物体的运动。图形提示是将 3D 表面形状投射到落在视网膜上的 2D 光图案上的过程的结果。本节讨论与提取表面上各个点的几何信息相关的视觉提示。第 19.4 节介绍了更一般的位置和形状信息提取。
Descriptions of the location and orientation of points on a visible surface must be done within the context of a particular frame of reference that specifies the origin, orientation, and scaling of the coordinate system used in representing the geometric information. The human vision system uses multiple frames of reference, partly because of the different sorts of information available from different visual cues and partly because of the different purposes to which the information is put (Klatzky, 1998). Egocentric representations are defined with respect to the viewer’s body. They can be subdivided into coordinate systems fixed to the eyes, head, or body. Allocentric representations, also called exocentric representations, are defined with respect to something external to the viewer. Allocentric frames of reference can be local to some configuration of objects in the environment or can be globally defined in terms of distinctive locations, gravity, or geographic properties.
可见表面上点的位置和方向的描述必须在特定参考框架的背景下进行,该参考框架指定用于表示几何信息的坐标系的原点、方向和缩放比例。人类视觉系统使用多个参考框架,部分是因为不同的视觉线索可提供不同类型的信息,部分是因为信息的用途不同(Klatzky,1998)。自我中心表征是相对于观察者的身体定义的。它们可以细分为固定在眼睛、头部或身体上的坐标系。他心表征,也称为外心表征,是相对于观察者外部的某个事物定义的。他心参考框架可以是环境中某些物体配置的局部,也可以是根据独特位置、重力或地理属性进行全局定义的。
The distance from the viewer to a particular visible location in the environment, expressed in an egocentric representation, is often referred to as depth in the perception literature. Surface orientation can be represented in either egocentric or allocentric coordinates. In egocentric representations of orientation, the term slant is used to refer to the angle between the line of sight to the point and the surface normal at the point, while the term tilt refers to the orientation of the projection of the surface normal onto a plane perpendicular to the line of sight.
观察者与环境中特定可见位置之间的距离,以自我中心表示法表示,在感知文献中通常称为深度。表面方向可以用自我中心坐标或他心坐标表示。在方向的自我中心表示法中,术语倾斜度用于指视线与该点的表面法线之间的角度,而术语倾斜度指表面法线在垂直于视线的平面上的投影的方向。
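Slant and tilt can be computed directly from a surface normal once a viewing frame is fixed. A minimal sketch, assuming an eye-centered frame in which the direction from the surface point back toward the eye is the +z axis and the x–y plane is perpendicular to the line of sight; this frame, the choice of measuring tilt from the x axis, and the function name are assumptions made for illustration:

```python
import math


def slant_tilt_deg(normal):
    """Return (slant, tilt) in degrees for a surface normal given in viewer coordinates.

    Slant is the angle between the line of sight (+z here) and the normal.
    Tilt is the direction of the normal's projection onto the x-y plane,
    measured from the x axis; it is meaningless when the slant is zero.
    """
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    cos_slant = max(-1.0, min(1.0, nz / length))  # clamp against rounding error
    slant = math.degrees(math.acos(cos_slant))
    tilt = math.degrees(math.atan2(ny, nx))
    return slant, tilt


# A surface facing the viewer but leaning back 30 degrees about the horizontal axis.
slant, tilt = slant_tilt_deg((0.0, 0.5, math.sqrt(3.0) / 2.0))
print(round(slant, 3), round(tilt, 3))
```

For this normal the slant is about 30° and the tilt 90°, meaning the surface recedes in the vertical image direction.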
Distance and orientation can be expressed in a variety of measurement scales. Absolute descriptions are specified using a standard that is not part of the sensed information itself. These can be culturally defined standards (e.g., meters), or standards relative to the viewer’s body (e.g., eye height, the width of one’s shoulders). Relative descriptions relate one perceived geometric property to another (e.g., point a is twice as far away as point b). Ordinal descriptions are a special case of relative measure in which the sign, but not the magnitude, of the relation is all that is represented. Table 19.1 provides a list of the most commonly considered visual cues, along with a characterization of the sorts of information they can potentially provide.
距离和方向可以用各种测量尺度来表示。绝对描述使用不属于感知信息本身的标准来指定。这些可以是文化定义的标准(例如米),也可以是相对于观看者身体的标准(例如眼高、肩宽)。相对描述将一种感知到的几何属性与另一种几何属性联系起来(例如点a距离b点的两倍远)。序数描述是相对测量的一个特例,其中只表示关系的符号,而不是大小。表 19.1列出了最常考虑的视觉线索,以及它们可能提供的信息类型的特征。
| Cue | Absolute (a) | Relative (r) | Ordinal (o) | Requirements for Absolute Depth |
|---|---|---|---|---|
| Accommodation | x | x | x | very limited range |
| Binocular convergence | x | x | x | limited range |
| Binocular disparity | - | x | x | limited range |
| Linear perspective, height in picture, horizon ratio | x | x | x | requires viewpoint height |
| Familiar size | x | x | x | |
| Relative size | - | x | x | |
| Aerial perspective | ? | x | x | adaptation to local conditions |
| Absolute motion parallax | ? | x | x | requires viewpoint velocity |
| Relative motion parallax | - | - | x | |
| Texture gradients | - | x | | |
| Shading | - | x | | |
| Occlusion | - | - | x | |
Ocularmotor information about depth results directly from the muscular control of the eyes. There are two distinct types of ocularmotor information. Accommodation is the process by which the eye optically focuses at a particular distance. Convergence (often referred to as vergence) is the process by which the two eyes are pointed toward the same point in three-dimensional space. Both accommodation and convergence have the potential to provide absolute information about depth.
关于深度的眼球运动信息直接来自眼睛的肌肉控制。眼球运动信息有两种不同类型。调节是眼睛在特定距离处光学聚焦的过程。会聚(通常称为会聚)是两只眼睛指向三维空间中同一点的过程。调节和会聚都有可能提供有关深度的绝对信息。
Physiologically, focusing in the human eye is accomplished by distorting the shape of the lens at the front of the eye. The vision system can infer depth from the amount of this distortion. Accommodation is a relatively weak cue to distance and is ineffective beyond about 2 m. Most people have increasing difficulty in focusing over a range of distances as they get beyond about 45 years old. For them, accommodation becomes even less effective.
从生理学上讲,人眼聚焦是通过扭曲眼睛前部晶状体的形状来实现的。视觉系统可以根据这种扭曲程度推断深度。调节是距离的一个相对较弱的线索,在 2 米以外无效。大多数人在超过 45 岁后,在一定距离范围内聚焦的难度越来越大。对他们来说,调节变得更加无效。
Those not familiar with the specifics of visual perception sometimes confuse depth estimation from accommodation with depth information arising out of the blur associated with limited depth-of-field in the eye. The accommodation depth cue provides information about the distance to that portion of the visual field that is in focus. It does not depend on the degree to which other portions of the visual field are out of focus, other than that blur is used by the visual system to adjust focus. Depth-of-field does seem to provide a degree of ordinal depth information (Figure 19.20), though this effect has received only limited investigation.
那些不熟悉视觉感知细节的人有时会将来自调节的深度估计与眼睛有限景深相关的模糊产生的深度信息相混淆。调节深度线索提供有关视野中聚焦部分的距离的信息。它不依赖于视野其他部分失焦的程度,只是视觉系统使用模糊来调整焦点。景深似乎确实提供了一定程度的序数深度信息(图 19.20 ),尽管这种影响仅得到有限的研究。
Figure 19.20. Does the central square appear in front of the pattern of circles or is it seen as appearing through a square hole in the pattern of circles? The only difference in the two images is the sharpness of the edge between the line and circle patterns (Marshall, Burbeck, Arely, Rolland, and Martin (1999), used by permission).
图 19.20。中央正方形出现在圆圈图案的前面,还是看起来像是穿过圆圈图案中的方形孔出现?两幅图像的唯一区别是线条和圆圈图案之间边缘的清晰度(Marshall、Burbeck、Arely、Rolland 和 Martin (1999),经许可使用)。
If two eyes fixate on the same point in space, trigonometry can be used to determine the distance from the viewer to the viewed location (Figure 19.21). For the simplest case, in which the point of interest is directly in front of the viewer,

$$z = \frac{\mathit{ipd}/2}{\tan\theta},$$
如果两只眼睛注视空间中的同一点,则可以使用三角学来确定从观看者到观看位置的距离(图 19.21 )。对于最简单的情况,兴趣点就在观看者的正前方,
Figure 19.21. The vergence of the two eyes provides information about the distance to the point on which the eyes are fixated.
图 19.21.两只眼睛的会聚度提供了到眼睛注视点的距离的信息。
where z is the distance to a point in the world, ipd is the interpupillary distance indicating the distance between the eyes, and θ is the vergence angle indicating the orientation of the eyes relative to straight ahead. For small θ, which is the case for the geometric configuration of human eyes, tan θ ≈ θ when θ is expressed in radians. Thus, differences in vergence angle specify differences in depth by the following relationship:

$$|\Delta\theta| \approx \frac{\mathit{ipd}}{2}\cdot\frac{|\Delta z|}{z^{2}}. \tag{19.5}$$
其中z是到世界上某一点的距离, ipd是瞳孔间距表示两眼之间的距离,θ 是辐辏角,表示眼睛相对于直视前方的方向。对于较小的 θ(人眼的几何结构),当 θ 以弧度表示时,tan θ ≈ θ 。因此,辐辏角的差异通过以下关系指定深度的差异:
As θ → 0 in uniform steps, Δz gets increasingly larger. This means that stereo vision is less sensitive to changes in depth as the overall depth increases. Convergence in fact only provides information on absolute depth for distances out to a few meters. Beyond that, changes in distance produce changes in vergence angle that are too small to be useful.
随着θ均匀→0, Δz变得越来越大。这意味着随着整体深度的增加,立体视觉对深度变化的敏感度降低。事实上,会聚仅提供几米以内的绝对深度信息。除此之外,距离的变化会导致会聚角的变化太小而无用。
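The loss of sensitivity with distance can be checked numerically. A minimal sketch, assuming the geometry of Figure 19.21 for a point straight ahead, so that tan θ = (ipd/2)/z, and an interpupillary distance of 0.064 m; that value and all function names are assumptions, not figures from the text:

```python
import math

IPD_M = 0.064  # assumed interpupillary distance in meters


def vergence_angle(z: float, ipd: float = IPD_M) -> float:
    """Vergence angle (radians, per eye, relative to straight ahead) for a point at distance z."""
    return math.atan((ipd / 2.0) / z)


def vergence_distance(theta: float, ipd: float = IPD_M) -> float:
    """Distance to the fixated point recovered from the vergence angle."""
    return (ipd / 2.0) / math.tan(theta)


def vergence_change(z: float, dz: float, ipd: float = IPD_M) -> float:
    """Exact change in vergence angle when the fixated point moves from z to z + dz."""
    return vergence_angle(z, ipd) - vergence_angle(z + dz, ipd)


def vergence_change_small_angle(z: float, dz: float, ipd: float = IPD_M) -> float:
    """Small-angle approximation of the same change: (ipd / 2) * dz / z**2."""
    return (ipd / 2.0) * dz / (z * z)


for z in (0.5, 1.0, 2.0, 5.0, 10.0):
    change = math.degrees(vergence_change(z, 0.1))
    print(f"z = {z:4.1f} m: a 10 cm step changes vergence by {change:.4f} degrees")
```

The same 10 cm step in depth produces a rapidly shrinking change in vergence angle as z grows, which is why convergence is informative only out to a few meters.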
There is an interaction between accommodation and convergence in the human visual system: accommodation is used to help determine the appropriate vergence angle, while vergence angle is used to help set the focus distance. Normally, this helps the visual system when there is uncertainty in setting either accommodation or vergence. However, stereographic computer displays break the relationship between focus and convergence that occurs in the real world, leading to a number of perceptual difficulties (Wann, Rushton, & Mon-Williams, 1995).
在人类视觉系统中,调节和会聚之间存在相互作用:调节用于帮助确定适当的会聚角,而会聚角用于帮助设置焦距。通常,当不确定设置调节或会聚时,这会帮助视觉系统。然而,立体计算机显示器打破了现实世界中焦点和会聚之间的关系,导致了许多感知困难(Wann、Rushton 和 Mon-Williams,1995 年)。
The vergence angle of the eyes, when fixated on a common point in space, is only one of the ways that the visual system is able to determine depth from binocular stereo. A second mechanism involves a comparison of the retinal images in the two eyes and does not require information about where the eyes are pointed. A simple example demonstrates the effect. Hold your arm straight out in front of you, with your thumb pointed up. Stare at your thumb and then close one eye. Now, simultaneously open the closed eye and close the open eye. Your thumb will appear to be more or less stationary, while the more distant surfaces seen behind your thumb will appear to move from side to side (Figure 19.22). The change in retinal position of points in the scene between the left and right eyes is called disparity.
当双眼注视空间中的共同点时,双眼的聚散角只是视觉系统根据双目立体视觉确定深度的方式之一。第二种机制涉及比较两只眼睛的视网膜图像,并且不需要有关眼睛指向何处的信息。一个简单的例子可以演示这种效果。将手臂伸直放在前面,拇指向上。盯着拇指,然后闭上一只眼睛。现在,同时睁开闭着的眼睛并闭上睁着的眼睛。您的拇指看起来或多或少是静止的,而拇指后面看到的较远的表面似乎在左右移动(图 19.22 )。左右眼之间场景中点的视网膜位置的变化称为视差。
Figure 19.22. Binocular disparity. The view from the left and right eyes shows an offset for surface points at depths different from the point of fixation. Images courtesy Peter Shirley.
图 19.22。双眼视差。左眼和右眼的视图显示,与注视点深度不同的表面点存在偏移。图片由 Peter Shirley 提供。
The binocular disparity cue requires that the vision system be able to match the image of points in the world in one eye with the imaged locations of those points in the other eye, a process referred to as the correspondence problem. This is a relatively complicated process and is only partially understood. Once correspondences have been established, the relative positions at which particular points in the world project onto the left and right retinas indicate whether the points are closer than or farther away than the point of fixation. Crossed disparity occurs when the corresponding points are displaced outward relative to the fovea and indicates that the surface point is closer than the point of fixation. Uncrossed disparity occurs when the corresponding points are displaced inward relative to the fovea and indicates that the surface point is farther away than the point of fixation (Figure 19.23). 4 Binocular disparity is a relative depth cue, but it can provide information about absolute depth when scaled by convergence. Equation (19.5) applies to binocular disparity as well as binocular convergence. As with convergence, the sensitivity of binocular disparity to changes in depth decreases with depth.
双眼视差线索要求视觉系统能够将一只眼睛中世界中的点的图像与另一只眼睛中这些点的成像位置相匹配,这个过程称为对应问题。这是一个相对复杂的过程,而且只有一部分人能够理解。一旦建立了对应关系,世界中特定点投射到左右视网膜上的相对位置将表明这些点是比注视点更近还是更远。当对应点相对于中央凹向外移动时,就会发生交叉视差,这表明表面点比注视点更近。当对应点相对于中央凹向内移动时,就会发生非交叉视差,这表明表面点比注视点更远(图 19.23 )。4双眼视差是一种相对深度线索,但通过会聚缩放后,它可以提供有关绝对深度的信息。公式(19.5)适用于双眼视差和双眼会聚。与会聚一样,双眼视差对深度变化的敏感度随着深度的增加而降低。
Figure 19.23. Near the line of sight, surface points nearer than the fixation point produce disparities in the opposite direction from those associated with surface points more distant than the fixation point.
图 19.23.在视线附近,比注视点近的表面点与比注视点远的表面点相关的视差在相反方向上产生。
4 Technically, crossed and uncrossed disparities indicate that the surface point generating the disparity is closer to or farther away from the horopter. The horopter is not a fixed distance away from the eyes but rather it is a curved surface passing through the point of fixation.
4从技术上讲,交叉视差和非交叉视差表示产生视差的表面点距离单视界更近或更远。单视界与眼睛的距离不是固定的,而是通过注视点的曲面。
Relative motion between the eyes and visible surfaces will produce changes in the image of those surfaces on the retina. Three-dimensional relative motion between the eye and a surface point produces two-dimensional motion of the projection of the surface point on the retina. This retinal motion is given the name optic flow. Optic flow serves as the basis for several types of depth cues. In addition, optic flow can be used to determine information about how a person is moving in the world and whether or not a collision is imminent (Section 19.4.3).
眼睛和可见表面之间的相对运动会导致这些表面在视网膜上的图像发生变化。眼睛和表面点之间的三维相对运动会导致表面点在视网膜上的投影发生二维运动。这种视网膜运动被称为光流。光流是几种深度线索的基础。此外,光流还可用于确定一个人在世界中如何移动以及是否即将发生碰撞的信息(第 19.4.3 节)。
If a person moves to the side while continuing to fixate on some surface point, then optic flow provides information about depth similar to stereo disparity. This is referred to as motion parallax. For other surface points that project to retinal locations near the fixation point, zero optic flow indicates a depth equivalent to the fixation point; flow in the opposite direction to head translation indicates nearer points, equivalent to crossed disparity; and flow in the same direction as head translation indicates farther points, equivalent to uncrossed disparity (Figure 19.24). Motion parallax is a powerful cue to relative depth. In principle, motion parallax can provide absolute depth information if the visual system has access to information about the velocity of head motion. In practice, motion parallax appears at best to be a weak cue for absolute depth.
如果某人向侧面移动同时继续注视某个表面点,那么光流提供的深度信息类似于立体视差。这被称为运动视差。对于投射到注视点附近视网膜位置的其他表面点,零光流表示深度相当于注视点;与头部平移方向相反的流表示较近的点,相当于交叉视差;与头部平移方向相同的流表示较远的点,相当于非交叉视差(图 19.24 )。运动视差是相对深度的有力线索。原则上,如果视觉系统能够获得有关头部运动速度的信息,运动视差就可以提供绝对深度信息。实际上,运动视差充其量只是绝对深度的一个弱线索。
In addition to egocentric depth information due to motion parallax, visual motion can also provide information about the three-dimensional shape of objects moving relative to the viewer. In the perception literature, this is known as the kinetic depth effect. In computer vision, it is referred to as structure-from-motion. The kinetic depth effect presumes that one component of object motion is rotation in depth, meaning that there is a component of rotation around an axis perpendicular to the line of sight.
除了由于运动视差而产生的以自我为中心的深度信息外,视觉运动还可以提供有关相对于观看者移动的物体的三维形状的信息。在感知文献中,这被称为动态深度效应(kinetic depth effect)。在计算机视觉中,它被称为运动恢复结构(structure-from-motion)。动态深度效应假定物体运动的一个分量是深度旋转,这意味着有一个绕垂直于视线的轴的旋转分量。
Figure 19.24. (a) Motion parallax generated by sideways movement to the right while looking at an extended ground plane. (b) The same motion, with eye tracking of the fixation point.
图 19.24。 (a) 在注视延伸的地面时向右侧向移动产生的运动视差。 (b) 相同的运动,眼球跟踪注视点。
Optic flow can also provide information about the shape and location of surface boundaries, as shown in Figure 19.25. Spatial discontinuities in optic flow almost always either correspond to depth discontinuities or result from independently moving objects. Simple comparisons of the magnitude of optic flow are insufficient to determine the sign of depth changes, except in the special case of a viewer moving through an otherwise static world. Even when independently moving objects are present, however, the sign of the change in depth across surface boundaries can often be determined by other means. Motion often changes the portion of the more distant surface visible at surface boundaries. The appearance (accretion) or disappearance (deletion) of surface texture occurs because the nearer, occluding surface progressively uncovers or covers portions of the more distant, occluded surface. Comparisons of the motion of surface texture to either side of a boundary can also be used to infer ordinal depth, even in the absence of accretion or deletion of the texture. Discontinuities in optic flow and accretion/deletion of surface texture are referred to as dynamic occlusion cues and are another powerful source of visual information about the spatial structure of the environment.
光流还可以提供有关表面边界的形状和位置的信息,如图 19.25所示。光流中的空间不连续性几乎总是与深度不连续相对应或由独立移动的物体引起。简单地比较光流的大小不足以确定深度变化的符号,除非在观察者在一个原本静止的世界中移动的特殊情况。然而,即使存在独立移动的物体,跨表面边界的深度变化符号也常常可以通过其他方式确定。运动通常会改变在表面边界处可见的较远表面的部分。表面纹理的出现(增生)或消失(删除)是因为较近的遮挡表面逐渐揭示或覆盖较远的被遮挡表面的部分。即使在没有纹理增生或删除的情况下,比较边界两侧表面纹理的运动也可以用于推断序数深度。光流的不连续性和表面纹理的增生/删除被称为动态遮挡线索,是有关环境空间结构的另一个强大的视觉信息来源。
Figure 19.25. Discontinuities in optic flow signal surface boundaries. In many cases, the sign of the depth change (i.e., the ordinal depth) can be determined.
图 19.25.光流中的不连续性标示出表面边界。在许多情况下,可以确定深度变化的符号(即序数深度)。
The speed at which a viewer is traveling relative to points in the world cannot be determined from visual motion alone (see Section 19.4.3). Despite this limitation, it is possible to use visual information to determine the time it will take to reach a visible point in the world, even when speed cannot be determined. When velocity is constant, time-to-contact (often referred to as time-to-collision) is given by the retinal size of an entity toward which the observer is moving, divided by the rate at which that image size is increasing. 5 In the biological vision literature, this is often called the τ function (Lee & Reddish, 1981). If distance information to the structure in the world on which the time-to-collision estimate is based is available, then this can be used to determine speed.
仅从视觉运动无法确定观察者相对于世界上各点的行进速度(请参见第 19.4.3 节)。尽管存在此限制,但是即使在无法确定速度的情况下,也可以使用视觉信息来确定到达世界上某个可见点所需的时间。当速度恒定时,接触时间(通常称为碰撞时间)等于观察者移动的实体的视网膜大小除以该图像大小增加的速率。5 在生物视觉文献中,这通常称为τ函数(Lee & Reddish,1981)。如果可以获得碰撞时间估计所基于的世界上结构的距离信息,则可以使用该信息来确定速度。
5 The terms time-to-collision and time-to-contact are misleading, since contact will only occur if the viewer’s trajectory actually passes through or near the entity under view.
5碰撞时间和接触时间这两个术语具有误导性,因为只有当观察者的轨迹实际穿过或靠近所观察的实体时,才会发生接触。
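The τ relationship can be checked numerically. The following is a minimal sketch, assuming an object viewed head-on and approached at constant speed; the function names, the two-sample finite difference, and the scenario values are illustrative assumptions, not part of the text.

```python
import math

def visual_angle(size, distance):
    """Visual angle (radians) subtended by an object of a given size at a given distance."""
    return 2.0 * math.atan(size / (2.0 * distance))

def time_to_contact(theta_prev, theta_curr, dt):
    """Estimate tau = (retinal size) / (rate of increase of retinal size)
    from two samples of the visual angle taken dt seconds apart."""
    rate = (theta_curr - theta_prev) / dt
    return theta_curr / rate

# Hypothetical scenario: a 0.5 m object, 20 m away, approached at 5 m/s.
size, distance, speed, dt = 0.5, 20.0, 5.0, 0.01
theta_prev = visual_angle(size, distance + speed * dt)  # one time step earlier
theta_curr = visual_angle(size, distance)
tau = time_to_contact(theta_prev, theta_curr, dt)

# The estimate uses only the two angles, yet it approximates distance / speed.
# If the distance were known from another cue, speed could be recovered as distance / tau.
print(tau, distance / speed)
```

Note that neither the physical size nor the distance enters `time_to_contact`; only the image measurements do, which is the point of the τ function.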
An image can contain much information about the spatial structure of the world from which it arose, even in the absence of binocular stereo or motion. As evidence for this, note that the world still appears three-dimensional even if we close one eye, hold our head stationary, and nothing moves in the environment. (As discussed in Section 19.5, the situation is more complicated in the case of photographs and other displayed images.) There are three classes of such pictorial depth cues. The best known of these involve linear perspective. There are also a number of occlusion cues that provide information about ordinal depth even in the absence of perspective. Finally, illumination cues involving shading, shadows and interreflections, and aerial perspective also provide visual information about spatial layout.
即使没有双目立体视觉或运动,图像也可以包含大量有关其所处世界的空间结构的信息。作为证据,请注意,即使我们闭上一只眼睛,保持头部静止,并且环境中没有任何移动,世界仍然看起来是三维的。(如第 19.5 节所述,照片和其他显示图像的情况更为复杂。)此类图像深度线索有三类。其中最著名的是线性透视。还有许多遮挡线索即使在没有透视的情况下也能提供有关序数深度的信息。最后,涉及阴影、影子和相互反射的照明线索以及空中透视也提供有关空间布局的视觉信息。
The term linear perspective is often used to refer to properties of images involving object size in the image scaled by distance, the convergence of parallel lines, the ground plane extending to a visible horizon, and the relationship between the distance to objects on the ground plane and the image location of those objects relative to the horizon (Figure 19.26). More formally, linear perspective cues are those visual cues which exploit the fact that under perspective projection, the image location onto which points in the world are projected is scaled by 1/z, where z is the distance from the point of projection to the point in the environment. Direct consequences of this relationship are that points that are farther away are projected to points closer to the center of the image (convergence of parallel lines) and that the spacing between the image of points in the world decreases for more distant world points (object size in the image is scaled by distance). 6 The fact that the image of an infinite flat surface in the world ends at a finite horizon is explained by examining the perspective projection equation as z → ∞.
线性透视这个术语通常用于指代图像的属性,包括图像中物体的大小按距离缩放、平行线的汇聚、延伸到可见地平线的地面,以及地面上物体的距离与这些物体相对于地平线的图像位置之间的关系(图 19.26 )。更正式地说,线性透视线索是那些利用以下事实的视觉线索:在透视投影下,世界上的点投影到的图像位置按 1/z 缩放,其中z是从投影点到环境中点的距离。这种关系的直接后果是,较远的点被投影到更靠近图像中心的点(平行线会聚),并且对于较远的世界点,世界点的图像之间的间距会减小(图像中的物体大小按距离缩放) 。6通过考察 z → ∞ 时的透视投影方程,可以解释世界中无限平面的图像在有限的地平线处结束的事实。
Figure 19.26. The classical linear perspective effects include object size scaled by distance, the convergence of parallel lines, the ground plane extending to a visible horizon, and position on the ground plane relative to the horizon. Image courtesy Sam Pullara.
图 19.26。经典的线性透视效果包括物体大小按距离缩放、平行线会聚、地平面延伸到可见的地平线以及地平面相对于地平线的位置。图片由 Sam Pullara 提供。
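The 1/z scaling behind these effects can be illustrated with a small planar pinhole projection. This is a sketch under stated assumptions: the `project` helper, the unit focal length, and the rail geometry are hypothetical choices for illustration, and (as footnote 6 notes) a planar projection is only an approximation of biological eyes.

```python
def project(x, y, z, focal_length=1.0):
    """Planar perspective projection: image coordinates are scaled by 1/z."""
    return (focal_length * x / z, focal_length * y / z)

# Two parallel rails at x = -1 and x = +1 lying on a ground plane
# 1.5 units below the eye (y = -1.5), sampled at increasing depth z.
for z in (2.0, 4.0, 8.0, 1000.0):
    left = project(-1.0, -1.5, z)
    right = project(1.0, -1.5, z)
    gap = right[0] - left[0]  # image spacing shrinks as 1/z
    print(f"z={z:7.1f}  image gap={gap:.4f}  image height={left[1]:.4f}")
```

As z grows, the gap between the rails shrinks toward zero (convergence of parallel lines) and the image height of the ground points rises toward 0, the image row of the horizon.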
With the exception of size-related effects described in Section 19.4.2, most pictorial depth cues involving linear perspective depend on objects of interest being in contact with a ground plane. In effect, these cues estimate not the distance to the objects but, instead, the distance to the contact point on the ground plane. Assuming observer and object are both on top of a horizontal ground plane, then locations on the ground plane lower in the view will be closer. Figure 19.27 illustrates this effect quantitatively. For a viewpoint h above the ground and an angle of declination θ between the horizon and a point of interest on the ground, the point in question is a distance d = h cot θ from the point at which the observer is standing. The angle of declination provides relative depth information for arbitrary fixed viewpoints and can provide absolute depth when scaling by eye height (h) is possible.
除了第 19.4.2 节中描述的与尺寸相关的效果之外,大多数涉及线性透视的图形深度线索都依赖于感兴趣的物体与地面的接触。实际上,这些线索不是估计到物体的距离,而是估计到地面接触点的距离。假设观察者和物体都位于水平地面之上,则地面上在视图中位置较低的点距离更近。图 19.27定量说明了这种效应。对于高于地面的视点h和地平线与地面上的兴趣点之间的偏角 θ ,所讨论的点与观察者站立点的距离为d = h cot θ 。偏角为任意固定视点提供相对深度信息,当可以按眼睛高度 ( h ) 缩放时,可以提供绝对深度。
Figure 19.27. Absolute distance to locations on the ground plane can be determined based on declination angle from the horizon and eye height.
图 19.27.可以根据与地平线的偏角和眼睛的高度来确定到地面上位置的绝对距离。
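The relation d = h cot θ is easy to evaluate directly. Here is a minimal sketch; the function name and the 1.6 m eye height are assumptions chosen only for illustration.

```python
import math

def ground_distance(eye_height, declination):
    """Distance along the ground to a point seen at the given angle of
    declination (radians) below the horizon: d = h * cot(theta)."""
    return eye_height / math.tan(declination)

eye_height = 1.6  # hypothetical eye height in meters
for degrees in (45.0, 20.0, 10.0, 5.0):
    d = ground_distance(eye_height, math.radians(degrees))
    print(f"declination {degrees:4.1f} deg -> distance {d:6.2f} m")
```

Larger declination angles (points lower in the view) give smaller distances, and the result scales linearly with eye height, which is why the angle alone yields only relative depth.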
6 The actual mathematics for analyzing the specifics of biological vision are different, since eyes are not well approximated by the planar projection formulation used in computer graphics and most other imaging applications.
6分析生物视觉细节的实际数学是不同的,因为眼睛不能很好地近似于计算机图形学和大多数其他成像应用中使用的平面投影公式。
While the human visual system almost certainly makes use of angle of declination as a depth cue, the exact mechanisms used to acquire the needed information are not clear. The angle θ could be obtained relative to either gravity or the visible horizon. There is some evidence that both are used in human vision. Eye height h could be based on posture, visually determined by looking at the ground at one’s feet, or learned by experience and presumed to be constant. While a number of researchers have investigated this issue, if and how these values are determined is not yet known with certainty.
虽然人类视觉系统几乎肯定会利用倾斜角作为深度线索,但获取所需信息的确切机制尚不清楚。角度 θ 可以相对于重力或可见地平线获得。有证据表明,两者都用于人类视觉。眼高h可能基于姿势,通过观察脚下的地面进行视觉确定,或者通过经验学习并假定为恒定的。虽然许多研究人员已经调查了这个问题,但这些值是否以及如何确定尚不确定。
Shadows provide a variety of types of information about three-dimensional spatial layout. Attached shadows indicate that an object is in contact with another surface, often consisting of the ground plane. Detached shadows indicate that an object is close to some surface, but not in contact with that surface. Shadows can serve as an indirect depth cue by causing an object to appear at the depth of the location of the shadow on the ground plane (Yonas, Goldsmith, & Hallstrom, 1978). When utilizing this cue, the visual system seems to make the assumption that light is coming from directly above (Figure 19.28).
阴影提供了有关三维空间布局的各种信息。附着阴影表示物体与另一个表面接触,通常由地面组成。分离阴影表示物体靠近某个表面,但未与该表面接触。阴影可以作为间接深度提示,使物体出现在地面上阴影位置的深度处(Yonas、Goldsmith 和 Hallstrom,1978 年)。当利用这个提示时,视觉系统似乎假设光线来自正上方(图 19.28 )。
Figure 19.28. Shadows can indirectly function as a depth cue by associating the depth of an object with a location on the ground plane (after Kersten, Mamassian, and Knill (1997)).
图 19.28.阴影可以通过将物体的深度与地面上的位置联系起来,间接地充当深度线索(根据 Kersten、Mamassian 和 Knill (1997))。
Vision provides information about surface orientation as well as distance. It is convenient to represent visually determined surface orientation in terms of tilt, defined as the orientation in the image of the projection of the surface normal, and slant, defined as the angle between the surface normal and the line of sight.
视觉提供关于表面方向和距离的信息。用倾斜度(定义为表面法线投影图像中的方向)和斜度(定义为表面法线和视线之间的角度)来表示视觉确定的表面方向非常方便。
A visible surface horizon can be used to find the orientation of an (effectively infinite) surface relative to the viewer. Determining tilt is straightforward, since the tilt of the surface is the orientation of the visible horizon. Slant can be recovered as well, since the lines of sight from the eye point to the horizon define a plane parallel to the surface. In many situations, either the surface horizon is not visible or the surface is small enough that its far edge does not correspond to an actual horizon. In such cases, visible texture can still be used to estimate orientation.
可见表面地平线可用于确定(实际上无限的)表面相对于观察者的方向。确定倾斜度很简单,因为表面的倾斜度就是可见地平线的方向。倾斜度也可以恢复,因为从眼睛指向地平线的视线定义了一个与表面平行的平面。在许多情况下,要么表面地平线不可见,要么表面太小,以至于其远边缘与实际地平线不对应。在这种情况下,可见纹理仍可用于估计方向。
In the context of perception, the term texture refers to visual patterns consisting of sub-patterns replicated over a surface. The sub-patterns and their distribution can be fixed and regular, as for a checkerboard, or consistent in a more statistical sense, as in the view of a grassy field. 7 When a textured surface is viewed from an oblique angle, the projected view of the texture is distorted relative to the actual markings on the surface. Two quite distinct types of distortions occur (Knill, 1998), both affected by the amount of slant. The position and size of texture elements are subject to the linear perspective effects described above. This produces a texture gradient (Gibson, 1950) due to both element size and spacing decreasing with distance (Figure 19.29(a)). Both the image of individual texture elements and the distribution of elements are foreshortened under oblique viewing (Figure 19.29(b)). This produces a compression in the direction of tilt. For example, an obliquely viewed circle appears as an ellipse, with the ratio of the minor to major axes equal to the cosine of the slant. Note that foreshortening itself is not a result of linear perspective, though in practice both linear perspective and foreshortening provide information about slant. 8
在感知的背景下,纹理一词是指由在表面上复制的子图案组成的视觉图案。子图案及其分布可以是固定的和规则的,如棋盘,也可以是更具统计意义的一致,如草地的视图。7从斜角观察纹理表面时,纹理的投影视图会相对于表面上的实际标记发生扭曲。会发生两种截然不同的扭曲(Knill,1998),两者都受倾斜量的影响。纹理元素的位置和大小受上述线性透视效应的影响。这会产生纹理梯度(Gibson,1950),因为元素大小和间距都会随着距离的增加而减小(图 19.29(a) )。在斜视下,单个纹理元素的图像和元素的分布都会发生透视缩短(图 19.29(b) )。这会在倾斜方向上产生压缩。例如,斜视的圆呈现为椭圆,其短轴与长轴之比等于倾斜角的余弦。请注意,透视缩短本身不是线性透视的结果,尽管在实践中线性透视和透视缩短都提供了有关倾斜的信息。8
Figure 19.29. Texture cues for slant. (a) Near surface exhibiting compression and texture gradient; (b) distant surface exhibiting only compression; (c) variability in appearance of near surface with regular geometric variability.
图 19.29。倾斜的纹理线索。(a)近表面表现出压缩和纹理梯度;(b)远表面仅表现出压缩;(c)近表面外观的变化具有规则的几何变化。
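The foreshortening relation for a circle (minor-to-major axis ratio equal to the cosine of the slant) can be inverted to estimate slant. The sketch below assumes the texture element really is a circle on the surface, which is exactly the kind of isotropy assumption the cue depends on; the function name is hypothetical.

```python
import math

def slant_from_ellipse(minor_axis, major_axis):
    """Slant (radians) of a surface carrying a circular texture element
    whose image is an ellipse with the given axis lengths."""
    return math.acos(minor_axis / major_axis)

# A circle imaged as an ellipse half as tall as it is wide.
slant = slant_from_ellipse(0.5, 1.0)
print(math.degrees(slant))
```

If the element were already elongated on the surface, the same computation would report a slant that is not there.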
7 In computer graphics, the term texture has a different meaning, referring to any image that is applied to a surface as part of the rendering process.
7在计算机图形学中,术语“纹理”具有不同的含义,指的是作为渲染过程的一部分应用于表面的任何图像。
8 A third form of visual distortion occurs when surfaces with distinct 3D surface relief are viewed obliquely (Leung & Malik, 1997), as shown in Figure 19.29(c). Nothing is currently known about if or how this effect might be used by the human vision system to determine slant.
8第三种视觉扭曲形式发生在倾斜观看具有明显 3D 表面浮雕的表面时(Leung & Malik, 1997),如图 19.29(c)所示。目前尚不清楚人类视觉系统是否或如何使用这种效应来确定倾斜度。
For texture gradients to serve as a cue to surface slant, the average size and spacing of texture elements must be constant over the textured surface. If spatial variability in size and spacing in the image is not due entirely to the projection process, then attempts to invert the effects of projection will produce incorrect inferences about surface orientation. Likewise, the foreshortening cue fails if the shape of texture elements is not isotropic, since then asymmetric texture element image shapes would occur in situations not associated with oblique viewing. These are examples of the assumptions often required in order for spatial visual cues to be effective. Such assumptions are reasonable to the degree that they reflect commonly occurring properties of the world.
要使纹理梯度成为表面倾斜的线索,纹理元素的平均大小和间距必须在纹理表面上保持不变。如果图像中大小和间距的空间变化并非完全归因于投影过程,那么试图反转投影的影响将产生关于表面方向的错误推断。同样,如果纹理元素的形状不是各向同性的,则透视缩短线索也会失败,因为不对称的纹理元素图像形状会出现在与斜视无关的情况下。这些是通常需要的假设的例子,以使空间视觉线索有效。这些假设的合理程度取决于它们在多大程度上反映了世界上普遍存在的属性。
Shading also provides information about surface shape (Figure 19.30). The brightness of viewed points on a surface depends on the surface reflectance and the orientation of the surface with respect to directional light sources and the observation point. When the relative position of an object, viewing direction, and illumination direction remain fixed, changes in brightness over a constant reflectance surface are indications of changes in the orientation of the surface of the object. Shape-from-shading is the process of recovering surface shape from these variations in observed brightness. It is almost never possible to recover the actual orientation of surfaces from shading alone, though shading can often be combined with other cues to provide an effective indication of surface shape. For surfaces with fine-scale geometric variability, shading can provide a compelling three-dimensional appearance, even for an image rendered on a two-dimensional surface (Figure 19.31).
阴影还能提供关于表面形状的信息(图 19.30 )。表面上观察点的亮度取决于表面反射率以及表面相对于定向光源和观察点的方向。当物体的相对位置、观察方向和照明方向保持不变时,恒定反射率表面上的亮度变化表明物体表面方向发生了变化。从阴影恢复形状是从观察到的亮度变化中恢复表面形状的过程。仅凭阴影几乎不可能恢复表面的实际方向,不过阴影通常可以与其他线索结合使用,以有效地指示表面形状。对于具有精细几何变化的表面,即使对于在二维表面上渲染的图像,阴影也可以提供引人注目的三维外观(图 19.31 )。
Figure 19.30. Shape-from-shading. The images in (a) and (b) appear to have different 3D shapes because of differences in the rate of change of brightness over their surfaces.
图 19.30.由阴影生成形状。(a) 和 (b) 中的图像看起来有不同的 3D 形状,因为它们表面的亮度变化率不同。
Figure 19.31. Shading can generate a strong perception of three-dimensional shape. In this figure, the effect is stronger if you view the image from several meters away using one eye. It becomes yet stronger if you place a piece of cardboard in front of the figure with a hole cut out slightly smaller than the picture (see Section 19.5). Image courtesy Albert Yonas.
图 19.31。阴影可以产生强烈的三维形状感知。在此图形中,如果您用一只眼睛从几米外观看图像,效果会更明显。如果您在图形前面放一块纸板,纸板上切出的洞比图片略小,效果会更加明显(参见第 19.5 节)。图片由 Albert Yonas 提供。
There are a number of pictorial cues that yield ordinal information about depth, without directly indicating actual distance. In line drawings, different types of junctions provide constraints on the 3D geometry that could have generated the drawing (Figure 19.32). Many of these effects occur in more natural images as well. Most perceptually effective of the junction cues are T-junctions, which are strong indicators that the surface opposite the stem of the T is occluding at least one more distant surface. T-junctions often generate a sense of amodal completion, in which one surface is seen to continue behind a nearer, occluding surface (Figure 19.33).
有许多图形线索可以提供有关深度的序数信息,而无需直接指示实际距离。在线图中,不同类型的连接点对可能生成图形的 3D 几何图形提供了约束(图 19.32 )。其中许多效果也出现在更自然的图像中。在连接点线索中,感知上最有效的是T 形连接点,它们强烈表明 T 字干对面的表面遮挡了至少一个更远的表面。T 形连接点通常会产生一种非模态完成感,其中一个表面被视为延续到较近的遮挡表面后面(图 19.33 )。
Figure 19.32. (a) Junctions provide information about occlusion and the convexity or concavity of corners. (b) Common junction types for planar surface objects.
图 19.32。 (a)连接点提供有关遮挡和角的凸度或凹度的信息。(b)平面物体的常见连接点类型。
Figure 19.33. T-junctions cause the left disk to appear to be continuing behind the rectangle, while the right disk appears in front of the rectangle, which is seen to continue behind the disk.
图 19.33。T型连接导致左侧圆盘看起来在矩形后面延伸,而右侧圆盘出现在矩形前面,矩形看起来在圆盘后面延伸。
Atmospheric effects cause visual changes that can provide information about depth, particularly outdoors over long distances. Leonardo da Vinci was the first to describe aerial perspective (also called atmospheric perspective), in which scattering reduces the contrast of distant portions of the scene and causes them to appear more bluish than if they were nearer (da Vinci, 1970) (see Figure 19.34). Aerial perspective is predominately a relative depth cue, though there is some speculation that it may affect perception of absolute distance as well. While many people believe that more distant objects look blurrier due to atmospheric effects, atmospheric scattering actually causes little blur.
大气效应会引起视觉变化,从而提供有关深度的信息,尤其是在户外远距离的情况下。列奥纳多·达芬奇是第一个描述空气透视(也称为大气透视)的人,其中散射降低了场景远处部分的对比度,并使它们看起来比近处更蓝(达芬奇,1970 年)(见图19.34 )。空气透视主要是相对深度线索,尽管有人猜测它可能也会影响对绝对距离的感知。虽然许多人认为由于大气效应,远处的物体看起来更模糊,但大气散射实际上几乎不会造成模糊。
Figure 19.34. Aerial perspective, in which atmospheric effects reduce contrast and shift colors toward blue, provides a depth cue over long distances.
图 19.34.空中透视,其中大气效应降低对比度并使颜色向蓝色偏移,提供了远距离的深度提示。
While there is fairly wide agreement among current vision scientists that the purpose of vision is to extract information about objects, locations, and events, there is little consensus on the key features of what information is extracted, how it is extracted, or how the information is used to perform tasks. Significant controversies exist about the nature of object recognition and the potential interactions between object recognition and other aspects of perception. Most of what we know about location involves low-level spatial vision, not issues associated with spatial relationships between complex objects or the visual processes required to navigate in complex environments. We know a fair amount about how people perceive their speed and heading as they move through the world, but have only a limited understanding of actual event perception. Visual attention involves aspects of the perception of objects, locations, and events. While there is much data about the phenomenology of visual attention for relatively simple and well-controlled stimuli, we know much less about how visual attention serves high-level perceptual goals.
虽然目前视觉科学家普遍认为视觉的目的是提取有关物体、位置和事件的信息,但对于提取什么信息、如何提取信息或如何使用信息执行任务的关键特征,几乎没有达成共识。关于物体识别的性质以及物体识别与感知其他方面之间的潜在相互作用存在重大争议。我们对位置的了解大部分涉及低级空间视觉,而不是与复杂物体之间的空间关系或在复杂环境中导航所需的视觉过程相关的问题。我们对人们在穿越世界时如何感知速度和方向了解很多,但对实际事件感知的理解却有限。视觉注意力涉及物体、位置和事件感知的各个方面。虽然有很多关于相对简单且控制良好的刺激的视觉注意力现象学的数据,但我们对视觉注意力如何服务于高级感知目标的了解却少得多。
Object recognition involves segregating an image into constituent parts corresponding to distinct physical entities and determining the identity of those entities. Figure 19.35 illustrates a few of the complexities associated with this process. We have little difficulty recognizing that the image on the left is some sort of vehicle, even though we have never before seen this particular view of a vehicle nor do most of us typically associate vehicles with this context. The image on the right is less easily recognizable until the page is turned upside down, indicating an orientational preference in human object recognition.
物体识别涉及将图像划分为与不同物理实体相对应的组成部分,并确定这些实体的身份。图 19.35说明了与此过程相关的一些复杂性。我们很容易就能识别出左侧的图像是某种车辆,即使我们以前从未见过这种车辆的特定视图,而且我们大多数人通常也不会将车辆与此上下文联系起来。右侧的图像不太容易识别,除非将页面翻转过来,这表明人类在物体识别中存在方向偏好。
Figure 19.35. The complexities of object recognition. (a) We recognize a vehicle-like object even though we have likely never seen this particular view of a vehicle before. (b) The image is hard to recognize based on a quick view. It becomes much easier to recognize if the book is turned upside down.
图 19.35。物体识别的复杂性。(a)尽管我们以前可能从未见过这种车辆的特定视图,但我们仍能识别出类似车辆的物体。(b)仅凭快速查看很难识别图像。如果将书倒过来,识别起来就会容易得多。
Object recognition is thought to involve two, fairly distinct steps. The first step organizes the visual field into groupings likely to correspond to objects and surfaces. These grouping processes are very powerful (see Figure 19.36), though there is little or no conscious awareness of the low-level image features that generate the grouping effect. 9 Grouping is based on the complex interaction of proximity, similarities in the brightness, color, shape, and orientation of primitive structures in the image, common motion, and a variety of more complex relationships.
物体识别被认为涉及两个相当不同的步骤。第一步将视野组织成可能与物体和表面相对应的组。这些分组过程非常强大(见图19.36 ),尽管人们很少或根本没有意识到产生分组效果的低级图像特征。9分组基于接近度的复杂相互作用、图像中原始结构的亮度、颜色、形状和方向的相似性、共同运动以及各种更复杂的关系。
Figure 19.36. Images are perceptually organized into groupings based on a complex set of similarity and organizational criteria. (a) Similarity in brightness results in four horizontal groupings. (b) Proximity resulting in three vertical groupings.
图 19.36.根据一组复杂的相似性和组织标准,图像在感知上被组织成组。(a)亮度相似导致四个水平分组。(b)接近度导致三个垂直分组。
The second step in object recognition is to interpret groupings as identified objects. A computational analysis suggests that there are a number of distinctly different ways in which an object can be identified. The perceptual data is unclear as to which of these are actually used in human vision. Object recognition requires that the vision system have available to it descriptions of each class of object sufficient to discriminate each class from all others. Theories of object recognition differ in the nature of the information describing each class and the mechanisms used to match these descriptions to actual views of the world.
物体识别的第二步是将分组解释为已识别的物体。计算分析表明,识别物体存在多种截然不同的方式。感知数据并不清楚人类视觉中实际使用了哪些方式。物体识别要求视觉系统拥有每类物体的描述,足以将每类物体与其他所有物体区分开来。物体识别理论在描述每类物体的信息性质以及将这些描述与对世界的实际视图相匹配的机制方面有所不同。
9 The most common form of visual camouflage involves adding visual textures that fool the perceptual grouping processes so that the view of the world cannot be organized in a way that separates out the object being camouflaged.
9最常见的视觉伪装形式是添加视觉纹理,以欺骗感知分组过程,使得世界视图无法以分离出被伪装的物体的方式组织起来。
Three general types of descriptions are possible. Templates represent object classes in terms of prototypical views of objects in each class. Figure 19.37 shows a simple example. Structural descriptions represent object classes in terms of distinctive features of each class likely to be easily detected in views of the object, along with information about the geometric relationships between the features. Structural descriptions can be represented in either 2D or 3D. For 2D models of object types, there must be a separate description for each distinctly different potential view of the object. For 3D models, two distinct forms of matching strategies are possible. In one, the three-dimensional structure of the viewed object is determined prior to classification using whatever spatial cues are available, and then this 3D description of the view is matched to 3D prototypes of known objects. The other possibility is that some mechanism allows the determination of the orientation of the yet-to-be identified object under view. This orientation information is used to rotate and project potential 3D descriptions in a way that allows a 2D matching of the description and the viewed object. Finally, the last option for describing the properties of object classes involves invariant features which describe classes of objects in terms of more generic geometric properties, particularly those that are likely to be insensitive to different views of the object.
可能存在三种一般类型的描述。模板用每个类中对象的原型视图来表示对象类。图 19.37显示了一个简单的例子。结构描述用每个类中可能很容易在对象的视图中检测到的独特特征来表示对象类,以及有关特征之间几何关系的信息。结构描述可以用 2D 或 3D 表示。对于对象类型的 2D 模型,必须对对象的每个截然不同的潜在视图进行单独的描述。对于 3D 模型,可能存在两种不同的匹配策略。在一种情况下,使用任何可用的空间线索在分类之前确定所观察对象的三维结构,然后将该视图的 3D 描述与已知对象的 3D 原型进行匹配。另一种可能性是某种机制允许确定所观察的尚未识别的对象的方向。此方向信息用于旋转和投影潜在的 3D 描述,从而允许对描述和所观察的对象进行 2D 匹配。最后,描述对象类别属性的最后一个选项涉及不变特征,这些特征用更通用的几何属性来描述对象类别,特别是那些可能对对象的不同视图不敏感的属性。
Figure 19.37. Template matching. The bright spot in the right image indicates the best match location to the template in the left image. Image courtesy National Archives and Records Administration.
图 19.37。模板匹配。右图中的亮点表示与左图中模板的最佳匹配位置。图片由美国国家档案和记录管理局提供。
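As a computational illustration of the template idea, the sketch below slides a small template over an image and reports where the sum of squared differences is smallest. This is a generic technique chosen for brevity, not necessarily the method used to produce Figure 19.37, and it makes no claim about how human vision performs recognition; the tiny arrays and the function name are made up.

```python
def best_match(image, template):
    """Return the (row, col) of the top-left corner at which the template
    differs least from the image, using the sum of squared differences."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, float("inf")
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos

# Hypothetical 4x5 grayscale image containing one copy of the 2x2 template.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 1], [1, 9]]
position = best_match(image, template)
print(position)
```

A template of this kind matches only views that look like the stored prototype, which is why purely template-based descriptions need many prototypes per class.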
In the absence of more definitive information about depth, objects which project onto a larger area of the retina are seen as closer compared with objects which project to a smaller retinal area, an effect called relative size. A more powerful cue involves familiar size, which can provide information for absolute distance to recognizable objects of known size. The strength of familiar size as a depth cue can be seen in illusions such as Figure 19.38, in which it is put in conflict with ground-plane, perspective-based depth cues. Familiar size is one part of the size-distance relationship, relating the physical size of an object, the optical size of the same object projected onto the retina, and the distance of the object from the eye (Figure 19.39).
在缺乏更明确的深度信息的情况下,投射到视网膜较大区域的物体比投射到视网膜较小区域的物体看起来更近,这种效应称为相对大小。一个更有力的线索涉及熟悉的大小,它可以提供到已知大小的可识别物体的绝对距离信息。熟悉的大小作为深度线索的强度可以在诸如图 19.38 的错觉中看到,其中熟悉的大小与地平面、基于透视的深度线索相冲突。熟悉的大小是大小-距离关系的一部分,它与物体的物理大小、投射到视网膜上的同一物体的光学大小以及物体与眼睛的距离有关(图 19.39 )。
Figure 19.38. Left: perspective and familiar size cues are consistent. Right: perspective and familiar size cues are inconsistent. Images courtesy Peter Shirley, Scott Kuhl, and J. Dylan Lacewell.
图 19.38。左图:透视和熟悉大小提示一致。右图:透视和熟悉大小提示不一致。图片由 Peter Shirley、Scott Kuhl 和 J. Dylan Lacewell 提供。
Figure 19.39. The size-distance relationship allows the distance to objects of known size to be determined based on the visual angle subtended by the object. Likewise, the size of an object at a known distance can be determined based on the visual angle subtended by the object.
图 19.39。大小距离关系允许根据物体所对的视角来确定与已知大小的物体之间的距离。同样,可以根据物体所对的视角来确定已知距离处的物体的大小。
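The size-distance relationship can be written out with elementary trigonometry. The sketch below assumes an object centered on and perpendicular to the line of sight, so that its visual angle is 2 atan(size / (2 · distance)); the function names and the example values are illustrative assumptions.

```python
import math

def distance_from_size(known_size, visual_angle):
    """Distance to an object of known physical size, given the visual
    angle (radians) it subtends."""
    return known_size / (2.0 * math.tan(visual_angle / 2.0))

def size_from_distance(known_distance, visual_angle):
    """Physical size of an object at a known distance, given the visual
    angle (radians) it subtends."""
    return 2.0 * known_distance * math.tan(visual_angle / 2.0)

# Hypothetical example: a 1.8 m tall person subtending 2 degrees.
angle = math.radians(2.0)
d = distance_from_size(1.8, angle)
print(d, size_from_distance(d, angle))
```

The same visual angle is consistent with a small nearby object or a large distant one, so one of the two quantities must come from another source, such as familiarity with the object.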
When objects are sitting on top of a flat-ground plane, additional sources for depth information become available, particularly when the horizon is either visible or can be derived from other perspective information. The angle of declination to the contact point on the ground is a relative depth cue and provides absolute egocentric distance when scaled by eye height, as previously shown in Figure 19.27. The horizon ratio, in which the total visible height of an object is compared with the visible extent of that portion of the object appearing below the horizon, can be used to determine the actual size of objects, even when the distance to the objects is not known (Figure 19.40). Underlying the horizon ratio is the fact that for a flat-ground plane, the line of sight to the horizon intersects objects at a position that is exactly an eye height above the ground.
当物体位于平坦的地面上时,就可以获得额外的深度信息源,特别是当地平线可见或者可以从其他透视信息中得出时。 与地面接触点的倾斜角是一个相对深度线索,当按眼睛高度缩放时,可以提供绝对的自我中心距离,如前图 19.27所示。地平线比,即将物体的总可见高度与地平线以下物体部分的可见范围进行比较,可用于确定物体的实际大小,即使不知道物体的距离(图 19.40 )。 地平线比的根本原因是,对于平坦的地面而言,到地平线的视线与物体相交的位置恰好是地面上方眼睛的高度。
Figure 19.40. (a) The horizon ratio can be used to determine depth by comparing the visible portion of an object below the horizon to the total vertical visible extent of the object. (b) A real-world example.
图 19.40. (a)可以通过比较地平线以下物体的可见部分与物体的总垂直可见范围来使用地平线比来确定深度。(b)现实世界的例子。
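Because the line of sight to the horizon cuts every upright object at exactly eye height, the horizon ratio gives physical height without knowing distance. The following is a minimal sketch, assuming a planar projection with a level line of sight and image heights measured upward; all names and numbers are hypothetical.

```python
def object_height(eye_height, y_base, y_top, y_horizon):
    """Physical height of an upright object on flat ground, from the image
    heights of its base, its top, and the horizon. The portion of the
    object below the horizon corresponds to one eye height."""
    total_extent = y_top - y_base
    below_horizon = y_horizon - y_base
    return eye_height * total_extent / below_horizon

# Simulated view: eye 1.6 m above the ground, a 4 m pole 10 m away,
# projected with unit focal length (image height = world height offset / depth).
eye, pole, depth = 1.6, 4.0, 10.0
y_base = (0.0 - eye) / depth
y_top = (pole - eye) / depth
y_horizon = 0.0
print(object_height(eye, y_base, y_top, y_horizon))
```

Changing `depth` rescales all three image measurements by the same factor, so the recovered height is unaffected.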
The human visual system is sufficiently able to determine the absolute size of most viewed objects; our perception of size is dominated by the actual physical size, and we have almost no conscious awareness of the corresponding retinal size of objects. This is similar to lightness constancy, discussed earlier, in that our perception is dominated by inferred properties of the world, not the low-level features actually sensed by photoreceptors in the retina. Gregory (1997) describes a simple example of size constancy. Hold your two hands out in front of you, one at arm’s length and the other at half that distance away from you (Figure 19.41(a)). Your two hands will look almost the same size, even though the retinal sizes differ by a factor of two. The effect is much less strong if the nearer hand partially occludes the more distant hand, particularly if you close one eye (Figure 19.41(b)). The visual system also exhibits shape constancy, where the perception of geometric structure is closer to actual object geometry than might be expected given the distortions of the retinal image due to perspective (Figure 19.42).
人类视觉系统足以确定大多数被观察物体的绝对大小;我们对大小的感知主要由实际的物理大小决定,我们几乎没有意识到物体在视网膜上的对应大小。这类似于前面讨论过的亮度恒常性,因为我们的感知主要由推断的世界属性决定,而不是视网膜中的光感受器实际感知到的低级特征。Gregory (1997) 描述了一个简单的例子大小恒常性。将双手伸到身前,一只手与身体保持一臂之长,另一只手与身体保持一半距离(图 19.41(a) )。即使视网膜大小相差两倍,你的两只手看起来大小几乎一样。如果较近的手部分遮挡较远的手,效果会小得多,特别是当你闭上一只眼睛时(图 19.41(b) )。视觉系统还表现出形状恒常性,其中几何结构的感知比由于透视造成的视网膜图像的扭曲所预期的更接近实际物体的几何形状(图 19.42 )。
Figure 19.41. (a) Size constancy makes hands positioned at different distances from the eye appear to be nearly the same size for real-world viewing, even though the retinal sizes are quite different. (b) The effect is less strong when one hand is partially occluded by the other, particularly when one eye is closed. Images courtesy Peter Shirley and Pat Moulis.
图 19.41。 (a)尺寸恒常性使得距离眼睛不同距离的手在现实世界中看起来几乎相同大小,即使视网膜尺寸完全不同。(b)当一只手被另一只手部分遮挡时,效果会减弱,尤其是当一只眼睛闭上时。图片由 Peter Shirley 和 Pat Moulis 提供。
Figure 19.42. Shape constancy—the table looks rectangular even though its shape in the image is an irregular four-sided polygon.
图 19.42。形状恒常性——尽管桌子在图像中的形状是不规则的四边形,但它看起来是矩形的。
Most aspects of event perception are beyond the scope of this chapter, since they involve complex nonvisual cognitive processes. Three types of event perception are primarily visual, however, and are also of clear relevance to computer graphics. Vision is capable of providing information about how a person is moving in the world, the existence of independently moving objects in the world, and the potential for collisions either due to observer motion or due to objects moving toward the observer.
事件感知的大多数方面超出了本章的范围,因为它们涉及复杂的非视觉认知过程。然而,三种类型的事件感知主要是视觉的,并且也与计算机图形学有明显的相关性。视觉能够提供有关一个人如何在世界中移动、世界中独立移动的物体的存在以及由于观察者运动或由于物体向观察者移动而发生碰撞的可能性的信息。
Vision can be used to determine rotation and the direction of translation relative to the environment. The simplest case involves movement toward a flat surface oriented perpendicularly to the line of sight. Presuming that there is sufficient surface texture to enable the recovery of optic flow, the flow field will form a symmetric pattern as shown in Figure 19.43(a). The location in the field of view of the focus of expansion of the flow field will have an associated line of sight corresponding to the direction of translation. While optic flow can be used to visually determine the direction of motion, it does not contain enough information to determine speed. To see this, consider the situation in which the world is made twice as large and the viewer moves twice as fast. The decrease in the magnitude of flow values due to the doubling of distances is exactly compensated for by the increase in the magnitude of flow values due to the doubling of velocity, resulting in an identical flow field.
视觉可用于确定相对于环境的旋转和平移方向。最简单的情况是朝与视线垂直的平面移动。假设有足够的表面纹理来恢复光流,流场将形成如图 19.43(a)所示的对称图案。视野中的位置流场扩展焦点将具有与平移方向相对应的相关视线。虽然光流可用于直观地确定运动方向,但它不包含足够的信息来确定速度。为了理解这一点,请考虑将世界放大一倍并且观察者移动速度提高一倍的情况。由于距离加倍而导致的流值幅度的降低恰好被由于速度加倍而导致的流值幅度的增加所补偿,从而产生相同的流场。
Figure 19.43. (a) Movement toward a flat, textured surface produces an expanding flow field, with the focus of expansion indicating the line of sight corresponding to the direction of motion. (b) The flow field resulting from rotation around the vertical axis while viewing a flat surface oriented perpendicularly to the line of sight. (c) The flow field resulting from translation parallel to a flat, textured surface.
图 19.43。 (a) 向平坦、有纹理的表面移动会产生一个扩展的流场,扩展的焦点指示与运动方向相对应的视线。(b) 观察与视线垂直的平坦表面时绕垂直轴旋转产生的流场。(c) 平行于平坦、有纹理的表面平移产生的流场。
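The scale ambiguity described above can be checked numerically. The sketch below assumes a pinhole model with unit focal length and a purely translating observer; the function name and the sample points are illustrative assumptions. It computes the image-plane flow of a few static points, then doubles both the size of the world and the observer's velocity.

```python
def translational_flow(point, velocity):
    """Image-plane flow (focal length 1) of a static point at (X, Y, Z)
    seen by an observer translating with velocity (Tx, Ty, Tz)."""
    X, Y, Z = point
    Tx, Ty, Tz = velocity
    x, y = X / Z, Y / Z                      # perspective projection
    # Differentiating x = X/Z and y = Y/Z with the point moving at -T
    # relative to the observer gives:
    return ((-Tx + x * Tz) / Z, (-Ty + y * Tz) / Z)


points = [(1.0, 0.5, 4.0), (-2.0, 1.0, 8.0), (0.5, -1.5, 3.0)]
velocity = (0.0, 0.0, 1.0)                   # moving straight ahead

scale = 2.0                                  # world twice as large, viewer twice as fast
scaled_points = [(scale * X, scale * Y, scale * Z) for X, Y, Z in points]
scaled_velocity = tuple(scale * t for t in velocity)

flow = [translational_flow(p, velocity) for p in points]
scaled_flow = [translational_flow(p, scaled_velocity) for p in scaled_points]

for original, scaled in zip(flow, scaled_flow):
    print(original, scaled)
```

The two flow fields agree point for point, so the flow alone cannot distinguish a small, slow scene from a large, fast one.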
Figure 19.43(b) shows the optic flow field resulting from the viewer (or more accurately, the viewer’s eyes) rotating around the vertical axis. Unlike the situation with respect to translational motion, optic flow provides sufficient information to determine both the axis of rotation and the (angular) speed of rotation. The practical problem in exploiting this is that the flow resulting from pure rotational motion around an axis perpendicular to the line of sight is quite similar to the flow resulting from pure translation in the direction that is perpendicular to both the line of sight and this rotational axis, making it difficult to visually discriminate between the two very different types of motion (Figure 19.43(c)). Figure 19.44 shows the optic flow patterns generated by movement through a more realistic environment.
图 19.43(b)显示了观看者(或者更准确地说,观看者的眼睛)绕垂直轴旋转产生的光流场。与平移运动的情况不同,光流提供了足够的信息来确定旋转轴和旋转(角)速度。利用这一点的实际问题是,绕垂直于视线的轴的纯旋转运动产生的流与垂直于视线和该旋转轴的方向的纯平移产生的流非常相似,因此很难在视觉上区分这两种非常不同的运动类型(图 19.43(c) )。图 19.44显示了在更现实的环境中运动产生的光流模式。
Figure 19.44. The optic flow generated by moving through an otherwise static environment provides information about both the motion relative to the environment and the distances to points in the environment. In this case, the direction of view is depressed from the horizon, but as indicated by the focus of expansion, the motion is parallel to the ground plane.
图 19.44。在静态环境中移动时产生的光流提供了有关相对于环境的运动以及到环境中各点的距离的信息。在这种情况下,视线方向从地平线向下,但如扩展焦点所示,运动与地面平行。
If a viewer is completely stationary, visual detection of moving objects is easy, since such objects will be associated with the only nonzero optic flow in the field of view. The situation is considerably more complicated when the observer is moving, since the visual field will be dominated by nonzero flow, most or all of which is due to relative motion between the observer and the static environment (Thompson & Pong, 1990). In such cases, the visual system must be sensitive to patterns in the optic flow field that are inconsistent with flow fields associated with observer movement relative to a static environment (Figure 19.45).
如果观察者完全静止,则视觉检测移动物体很容易,因为此类物体将与视野中唯一的非零光流相关联。当观察者移动时,情况要复杂得多,因为视野将由非零流主导,其中大部分或全部是由于观察者与静态环境之间的相对运动造成的(Thompson & Pong,1990)。在这种情况下,视觉系统必须对光流场中的模式敏感,这些模式与观察者相对于静态环境的运动所关联的流场不一致(图 19.45 )。
Figure 19.45. Visual detection of moving objects from a moving observation point requires recognizing patterns in the optic flow that cannot be associated with motion through a static environment.
图 19.45.从移动观察点对移动物体进行视觉检测需要识别光流中的模式,而这些模式在静态环境中无法与运动相关联。
Section 19.3.4 described how vision can be used to determine time to contact with a point in the environment even when the speed of motion is not known. Assuming a viewer moving with a straight, constant-speed trajectory and no independently moving objects in the world, contact will be made with whatever surface is in the direction of the line of sight corresponding to the focus of expansion at a time indicated by the τ relationship. An independently moving object complicates the matter of determining if a collision will in fact occur. Sailors use a method for detecting potential collisions that may also be employed in the human visual system: for non-accelerating straight-line motion, collisions will occur with objects that are visually expanding but otherwise remain visually stationary in the egocentric frame of reference.
第 19.3.4 节描述了如何使用视觉来确定接触环境中某个点的时间,即使不知道运动速度。假设观察者以直线、恒定速度的轨迹移动,并且世界上没有独立移动的物体,则在τ关系指示的时间,将与与扩展焦点相对应的视线方向上的任何表面接触。独立移动的物体使确定是否确实会发生碰撞变得复杂。水手使用一种检测潜在碰撞的方法,该方法也可以在人类视觉系统中使用:对于非加速直线运动,碰撞将与视觉上正在扩展但在自我中心参考系中保持视觉静止的物体发生。
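The sailors' rule can be illustrated with a small simulation. This is only a sketch under stated assumptions: motion is confined to a 2D plane, both observer and object move at constant velocity, the object is treated as a disc of a made-up radius, and the function names, sample positions, and tolerance are all hypothetical.

```python
import math


def bearing_and_size(observer_pos, observer_vel, object_pos, object_vel, radius, t):
    """Egocentric bearing (radians) and approximate angular size at time t."""
    rx = (object_pos[0] + object_vel[0] * t) - (observer_pos[0] + observer_vel[0] * t)
    ry = (object_pos[1] + object_vel[1] * t) - (observer_pos[1] + observer_vel[1] * t)
    heading = math.atan2(observer_vel[1], observer_vel[0])
    # Bearing is not wrapped to (-pi, pi]; adequate for the examples below.
    bearing = math.atan2(ry, rx) - heading
    size = 2.0 * math.atan2(radius, math.hypot(rx, ry))
    return bearing, size


def looks_like_collision(observer_pos, observer_vel, object_pos, object_vel,
                         radius=0.5, times=(0.0, 2.0, 4.0, 6.0, 8.0), tol=1e-6):
    """True if the object expands visually while its bearing stays fixed."""
    samples = [bearing_and_size(observer_pos, observer_vel,
                                object_pos, object_vel, radius, t) for t in times]
    bearings = [b for b, _ in samples]
    sizes = [s for _, s in samples]
    steady = max(bearings) - min(bearings) < tol
    expanding = all(later > earlier for earlier, later in zip(sizes, sizes[1:]))
    return steady and expanding


# Observer heads along +x; the object crosses from the side.
on_course = looks_like_collision((0, 0), (1, 0), (10, 10), (0, -1))
near_miss = looks_like_collision((0, 0), (1, 0), (10, 12), (0, -1))
print(on_course, near_miss)
```

In the first case the object stays at a fixed bearing while growing, and the two paths meet; in the second the object also grows but drifts across the field of view, and the paths do not meet.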
One form of more complex event perception merits discussion here, since it is so important in interactive computer graphics. People are particularly sensitive to motion corresponding to human movement. Locomotion can be recognized when the only features visible are lights on the walker’s joints (Johansson, 1973). Such moving light displays are often even sufficient to recognize properties such as the sex of the walker and the weight of the load that the walker may be carrying. In computer graphics renderings, viewers will notice even small inaccuracies in animated characters, particularly if they are intended to mimic human motion.
这里值得讨论一种更复杂的事件感知形式,因为它在交互式计算机图形学中非常重要。人们对与人体运动相对应的运动特别敏感。当唯一可见的特征是步行者关节上的灯光时,就可以识别运动(Johansson,1973 年)。这种移动灯光显示通常甚至足以识别步行者的性别和步行者可能携带的负载重量等属性。在计算机图形渲染中,观众会注意到动画角色中哪怕是很小的不准确之处,尤其是当它们旨在模仿人类运动时。
The term visual attention covers a range of phenomena from where we point our eyes to cognitive effects involving what we notice in a complex scene and how we interpret what we notice (Pashler, 1998). Figure 19.46 provides an example of how attentional processes affect vision, even for very simple images. In the left two panels, the one pattern differing in shape or color from the rest immediately “pops out” and is easily noticed. In the panel on the right, the one pattern differing in both shape and color is harder to find. The reason for this is that the visual system can do a parallel search for items distinguished by individual properties, but requires more cognitive, sequential search when looking for items that are indicated by the simultaneous presence of two distinguishing features. Graphically based human-computer interfaces should be (but often are not!) designed with an understanding of how to take advantage of visual attention processes in people so as to communicate important information quickly and effectively.
“视觉注意力”一词涵盖一系列现象,从我们将目光投向何处到认知效应,包括我们在复杂场景中注意到什么以及我们如何解释我们注意到的东西(Pashler,1998)。图 19.46提供了一个注意过程如何影响视觉的例子,即使对于非常简单的图像也是如此。在左侧两个面板中,形状或颜色与其他图案不同的一种图案立即“跳出”并很容易被注意到。在右侧面板中,形状和颜色都不同的一种图案更难找到。原因是视觉系统可以并行搜索由单个属性区分的项目,但在寻找由同时存在两个区别特征指示的项目时,需要更多的认知、顺序搜索。基于图形的人机界面应该(但通常不是!)在设计时了解如何利用人们的视觉注意力过程,以便快速有效地传达重要信息。
Figure 19.46. In (a) and (b), visual attention is quickly drawn to the item of different shape or color. In (c), sequential search appears to be necessary in order to find the one item that differs in both shape and color.
图 19.46。在 (a) 和 (b) 中,视觉注意力很快被不同形状或颜色的物品吸引。在 (c) 中,顺序搜索似乎是必要的,以便找到形状和颜色都不同的物品。
So far, this chapter has dealt with the visual perception that occurs when the world is directly imaged by the human eye. When we view the results of computer graphics, of course, we are looking at rendered images and not the real world. This has important perceptual implications. In principle, it should be possible to generate computer graphics that appear indistinguishable from the real world, at least for monocular viewing without either object or observer motion. Imagine looking out at the world through a glass window. Now, consider coloring each point on the window to exactly match the color of the world originally seen at that point.¹⁰ The light reaching the eye is unchanged by this operation, meaning that perception should be the same whether the painted glass is viewed or the real world is viewed through the window. The goal of computer graphics can be thought of as producing the colored window without actually having the equivalent real-world view available.
到目前为止,本章讨论了当世界直接被人眼成像时产生的视觉感知。当然,当我们查看计算机图形的结果时,我们看到的是渲染的图像,而不是现实世界。这具有重要的感知意义。原则上,应该可以生成看起来与现实世界难以区分的计算机图形,至少对于没有物体或观察者运动的单眼观看而言。想象一下透过玻璃窗看世界。现在,考虑将窗户上的每个点着色,使其与最初在该点看到的世界的颜色完全匹配。10 到达眼睛的光线不会因此操作而改变,这意味着无论是观看彩绘玻璃还是通过窗户观看现实世界,感知都应该相同。计算机图形学的目标可以被认为是在没有实际可用的等效现实世界视图的情况下生成彩色窗口。
The problem for computer graphics and other visual arts is that we can’t in practice match a view of the real world by coloring a flat surface. The brightness and dynamic range of light in the real world are impossible to re-create using any current display technology. Resolution of rendered images is also often less than the finest detail perceivable by human vision. Lightness and color constancy are much less apparent in pictures than in the real world, likely because the visual system attempts to compensate for variability in the brightness and color of the illumination based on the ambient illumination in the viewing environment, rather than the illumination associated with the rendered image. This is why the realistic appearance of color in photographs depends on film color balanced for the nature of the light source present when the photograph was taken and why realistic color in video requires a white-balancing step. While much is known about how limitations in resolution, brightness, and dynamic range affect the detectability of simple patterns, almost nothing is known about how these display properties affect spatial vision or object identification.
计算机图形学和其他视觉艺术面临的问题是,我们无法通过给平面着色来实际匹配现实世界的视图。使用任何当前的显示技术都无法重现现实世界中光的亮度和动态范围。渲染图像的分辨率通常也低于人类视觉可以感知的最精细细节。亮度和色彩恒常性在图片中比在现实世界中要不明显得多,这可能是因为视觉系统试图根据观看环境中的环境照明而不是与渲染图像相关的照明来补偿照明亮度和颜色的变化。这就是为什么照片中色彩的真实外观取决于胶片色彩平衡,以平衡拍摄照片时存在的光源的性质,以及为什么视频中的真实色彩需要白平衡步骤。虽然人们对分辨率、亮度和动态范围的限制如何影响简单图案的可检测性了解很多,但几乎没有人知道这些显示属性如何影响空间视觉或物体识别。
10 This idea was first described by the painter Leon Battista Alberti in 1435 and is now known as Alberti’s Window. It is closely related to the camera obscura.
10这个想法最早由画家莱昂·巴蒂斯塔·阿尔伯蒂于 1435 年提出,现在被称为“阿尔伯蒂之窗” 。它与暗箱密切相关。
We have a better understanding of other aspects of this problem, which psychologists refer to as the perception of pictorial space (S. Rogers, 1995). One difference between viewing images and viewing the real world is that accommodation, binocular stereo, motion parallax, and perhaps other depth cues may indicate that the distance to the surface under view is much different from the distances in the world that it is intended to represent. The depths that are seen in such a situation tend to be somewhere between the depths indicated by the pictorial cues in the image and the distance to the image itself. When looking at a photograph or computer display, this often results in a sense of scale smaller than intended. On the other hand, seeing a movie in a big-screen theater produces a more compelling sense of spaciousness than does seeing the same movie on television, even if the distance to the TV is such that the visual angles are the same, since the movie screen is farther away.
我们对这个问题的其他方面有了更好的理解,心理学家将其称为对图像空间的感知(S. Rogers,1995)。观看图像和观看现实世界之间的一个区别是,调节、双目立体视觉、运动视差以及其他深度线索可能表明,所看到的表面与它想要表示的世界距离有很大不同。在这种情况下看到的深度往往介于图像中图像线索所指示的深度和到图像本身的距离之间。当看照片或计算机显示器时,这通常会导致比预期更小的尺度感。另一方面,在大屏幕影院观看电影比在电视上观看同一部电影更能产生令人信服的空间感,即使到电视的距离使得视角相同,因为电影屏幕更远。
Computer graphics rendered using perspective projection has a viewpoint, specified as a position and direction in model space, and a view frustum, which specifies the horizontal and vertical field of view and several other aspects of the viewing transform. If the rendered image is not viewed from the correct location, the visual angles to the borders of the image will not match the frustum used in creating the image. All visual angles within the image will be distorted as well, causing a distortion in all of the pictorial depth and orientation cues based on linear perspective. This effect occurs frequently in practice, when a viewer is positioned either too close or too far away from a photograph or display surface. If the viewer is too close, the perspective cues for depth will be compressed, and the cues for surface slant will indicate that the surface is closer to perpendicular to the line of sight than is actually the case. The situation is reversed if the viewer is too far from the photograph or screen. The situation is even more complicated if the line of sight does not go through the center of the viewing area, as is commonly the case in a wide variety of viewing situations.
使用透视投影渲染的计算机图形具有视点(指定为模型空间中的位置和方向)和视锥体(指定水平和垂直视野以及查看变换的其他几个方面)。如果未从正确的位置查看渲染的图像,则图像边界的视角将与创建图像时使用的视锥体不匹配。图像内的所有视角也将被扭曲,导致基于线性透视的所有图像深度和方向提示发生扭曲。这种效果在实践中经常发生,当观看者距离照片或显示表面太近或太远时。如果观看者太近,深度的透视提示将被压缩,并且表面倾斜的提示将指示表面比实际情况更接近垂直于视线。如果观看者离照片或屏幕太远,情况就会相反。如果视线不穿过观看区域的中心,情况会更加复杂,这在各种观看情况下都很常见。
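The correct viewing location follows directly from the frustum. The sketch below assumes a symmetric frustum and a line of sight that passes perpendicularly through the center of the display; the function names and the display dimensions are hypothetical. It computes the distance at which the displayed image subtends the same horizontal angle that was used to render it, and the angle actually subtended from some other distance.

```python
import math


def correct_viewing_distance(image_width, horizontal_fov_degrees):
    """Distance at which an image of the given physical width subtends
    the horizontal field of view used when rendering it."""
    half_angle = math.radians(horizontal_fov_degrees) / 2.0
    return (image_width / 2.0) / math.tan(half_angle)


def subtended_angle_degrees(image_width, viewing_distance):
    """Horizontal visual angle of the image from a centered viewpoint."""
    return math.degrees(2.0 * math.atan2(image_width / 2.0, viewing_distance))


# Hypothetical setup: a 0.6 m wide display showing an image rendered
# with a 60 degree horizontal field of view.
width = 0.6
ideal = correct_viewing_distance(width, 60.0)
print(f"ideal distance: {ideal:.3f} m")
print(f"angle at 1.0 m: {subtended_angle_degrees(width, 1.0):.1f} degrees")
```

A viewer sitting at 1.0 m sees the image under a much smaller angle than the 60 degrees assumed by the renderer, which is the "too far" case described above.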
The human visual system is able to partially compensate for perspective distortions arising from viewing an image at the wrong location, which is why we are able to sit in different seats at a movie theater and experience a similar sense of the depicted space. When controlling viewing position is particularly important, viewing tubes can be used. These are appropriately sized tubes, mounted in a fixed position relative to the display, and through which the viewer sees the display. The viewing tube constrains the observation point to the (hopefully) correct position. Viewing tubes are also quite effective at reducing the conflict in depth information between the pictorial cues in the image and the actual display surface. They eliminate both stereo and motion parallax, which, if present, would correspond to the display surface, not the rendered view. If they are small enough in diameter, they also reduce other cues to the location of the display surface by hiding the picture frame or edge of the display device. Exotic visually immersive display devices such as head-mounted displays (HMDs) go further in attempting to hide visual cues to the position of the display surface while adding binocular stereo and motion parallax consistent with the geometry of the world being rendered.
人类视觉系统能够部分补偿因在错误位置观看图像而产生的透视失真,这就是为什么我们坐在电影院的不同座位上都能体验到相似的空间感。当控制观看位置特别重要时,可以使用观察管。观察管是尺寸合适的管子,安装在相对于显示器的固定位置,观看者可以通过它观看显示器。观察管将观察点限制在(希望)正确的位置。观察管在减少图像中的图像提示和实际显示表面之间的深度信息冲突方面也非常有效。它们消除了立体和运动视差,如果存在,这些视差将对应于显示表面,而不是渲染视图。如果它们的直径足够小,它们还会通过隐藏显示设备的画框或边缘来减少显示表面位置的其他提示。头戴式显示器 (HMD) 等奇特的视觉沉浸式显示设备进一步尝试将视觉提示隐藏到显示表面的位置,同时添加与正在渲染的世界的几何形状一致的双目立体和运动视差。
Erik Reinhard
As discussed in Chapter 19, the human visual system adapts to a wide range of viewing conditions. Under normal viewing, we may discern a range of around 4 to 5 log units of illumination, i.e., the ratio between brightest and darkest areas where we can see detail may be as large as 100,000:1. Through adaptation processes, we may adapt to an even larger range of illumination. We call images that are matched to the capabilities of the human visual system high dynamic range.
如第 19 章所述,人类视觉系统可适应各种观察条件。在正常观察下,我们可以辨别出大约 4 到 5 个对数单位的照明范围,即我们可以看到细节的最亮和最暗区域之间的比率可能高达 100000:1。通过适应过程,我们可以适应更大的照明范围。我们将与人类视觉系统能力相匹配的图像称为高动态范围。
Visual simulations routinely produce images with a high dynamic range (Ward Larson & Shakespeare, 1998). Recent developments in image-capturing techniques allow multiple exposures to be aligned and recombined into a single high dynamic range image (Debevec & Malik, 1997). Multiple exposure techniques are also available for video. In addition, we expect future hardware to be able to photograph or film high dynamic range scenes directly. In general, we may think of each pixel as a triplet of floating-point numbers.
视觉模拟通常会产生具有高动态范围的图像(Ward Larson 和 Shakespeare,1998)。图像捕捉技术的最新发展允许将多个曝光对齐并重新组合成单个高动态范围图像(Debevec 和 Malik,1997)。多重曝光技术也可用于视频。此外,我们期望未来的硬件能够直接拍摄或录制高动态范围场景。一般来说,我们可以将每个像素视为三个浮点数的三元组。
As it is becoming easier to create high dynamic range imagery, the need to display such data is rapidly increasing. Unfortunately, most current display devices (monitors and printers) are only capable of displaying around 2 log units of dynamic range. We consider such devices to be of low dynamic range. Most images in existence today are represented with one byte per pixel per color channel, which is matched to current display devices rather than to the scenes they represent.
随着创建高动态范围图像变得越来越容易,显示此类数据的需求也在迅速增加。不幸的是,大多数当前的显示设备、显示器和打印机只能显示大约 2 个对数单位的动态范围。我们认为此类设备的动态范围较低。当今存在的大多数图像都以字节/像素/颜色通道表示,这与当前的显示设备相匹配,而不是与它们所表示的场景相匹配。
Typically, low dynamic range images are not able to represent scenes without loss of information. A common example is an indoor room with an outdoor area visible through the window. Humans are easily able to see details of both the indoor part and the outside part. A conventional photograph typically does not capture this full range of information: the photographer has to choose whether the indoor or the outdoor part of the scene is properly exposed (see Figure 20.1). These decisions may be avoided by using high dynamic range imaging and preparing these images for display using techniques described in this chapter (see Figure 20.2).
通常,低动态范围图像无法在不丢失信息的情况下呈现场景。一个常见的例子是室内房间,透过窗户可以看到室外区域。人类很容易看到室内和室外的细节。传统照片通常无法捕捉到这种全方位的信息——摄影师必须选择场景的室内或室外部分是否正确曝光(见图 20.1 )。通过使用高动态范围成像并使用本章中描述的技术准备这些图像以供显示,可以避免这些决定(见图 20.2 )。
Figure 20.1. With conventional photography, some parts of the scene may be under- or over-exposed. To visualize the snooker table, the view through the window is burned out in the left image. On the other hand, the snooker table will be too dark if the outdoor part of this scene is properly exposed. Compare with Figure 20.2, which shows a high dynamic range image prepared for display using a tone reproduction algorithm.
图 20.1。使用传统摄影,场景的某些部分可能会曝光不足或过度。为了使斯诺克台球桌清晰可见,左侧图像中透过窗户看到的景色被过度曝光。另一方面,如果此场景的室外部分曝光适当,斯诺克台球桌将太暗。与图 20.2相比,该图显示了使用色调再现算法准备显示的高动态范围图像。
Figure 20.2. A high dynamic range image tonemapped for display using a recent tone reproduction operator (Reinhard & Devlin, 2005). In this image, both the indoor part and the view through the window are properly exposed.
图 20.2。使用最近的色调再现运算符 (Reinhard & Devlin, 2005) 进行色调映射以供显示的高动态范围图像。在此图像中,室内部分和透过窗户看到的景色都得到了正确的曝光。
There are two strategies available to display high dynamic range images. First, we may develop display devices which can directly accommodate a high dynamic range (Seetzen, Whitehead, & Ward, 2003; Seetzen et al., 2004). Second, we may prepare high dynamic range images for display on low dynamic range display devices (Upstill, 1985). This is currently the more common approach and the topic of this chapter. We foresee that high dynamic range display devices will become widely used in the (near) future; the need to compress the dynamic range of an image may then diminish, but it will not disappear. In particular, printed media such as this book are, by their very nature, low dynamic range.
有两种策略可用于显示高动态范围图像。首先,我们可以开发能够直接适应高动态范围的显示设备(Seetzen、Whitehead 和 Ward,2003;Seetzen 等,2004)。其次,我们可以准备高动态范围图像以在低动态范围显示设备上显示(Upstill,1985)。这是目前更常见的方法,也是本章的主题。虽然我们预见到高动态范围显示设备将在(不久的)将来得到广泛应用,但压缩图像动态范围的需求可能会减少,但不会消失。特别是,像本书这样的印刷媒体本质上就是低动态范围的。
Compressing the range of values of an image for the purpose of display on a low dynamic range display device is called tonemapping or tone reproduction. A simple compression function would be to normalize an image (see Figure 20.3 (left)). This constitutes a linear scaling which tends to be sufficient only if the dynamic range of the image is only marginally higher than the dynamic range of the display device. For images with a higher dynamic range, small intensity differences will be quantized to the same display value such that visible details are lost. In Figure 20.3 (middle) all pixel values larger than a user-specified maximum are set to this maximum (i.e., they are clamped). This makes the normalization less dependent on noisy outliers, but here we lose information in the bright areas of the image. For comparison, Figure 20.3 (right) is a tonemapped version showing detail in both the dark and the bright regions.
为了在低动态范围显示设备上显示而压缩图像的值范围称为色调映射或色调再现。一个简单的压缩函数是对图像进行标准化(见图20.3 (左))。这构成了一个线性缩放,只有当图像的动态范围仅略高于显示设备的动态范围时,这种缩放才足够。对于动态范围较高的图像,较小的强度差异将被量化为相同的显示值,从而导致可见细节丢失。在图 20.3 (中)中,所有大于用户指定最大值的像素值都设置为此最大值(即,它们被限制)。这使得标准化不太依赖于嘈杂的异常值,但在这里我们会丢失图像明亮区域的信息。为了进行比较,图 20.3 (右)是一个色调映射版本,显示了暗区和亮区的细节。
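The two simple mappings just described, normalization and clamping, can be sketched in a few lines. The function name and the sample luminances below are illustrative assumptions, and the sketch ignores display gamma; it only shows how linear scaling quantizes a wide range of values into 8-bit display codes.

```python
def normalize_to_bytes(luminances, maximum=None):
    """Linearly scale luminances to display codes 0..255.

    If `maximum` is given, larger values are clamped to it first;
    otherwise the largest luminance in the image is used.
    """
    if maximum is None:
        maximum = max(luminances)
    return [round(255 * min(value, maximum) / maximum) for value in luminances]


# Hypothetical scene spanning more than five orders of magnitude.
scene = [0.01, 0.02, 0.25, 1.0, 50.0, 2000.0]

linear = normalize_to_bytes(scene)                # plain normalization
clamped = normalize_to_bytes(scene, maximum=1.0)  # user-specified maximum

print("linear :", linear)
print("clamped:", clamped)
```

With plain normalization the four darkest values collapse onto the same display code, so dark detail is lost; with clamping the dark values are separated, but everything at or above the chosen maximum maps to the same code, so bright detail is lost instead.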
Figure 20.3. Linear scaling of high dynamic range images to fit a given display device may cause significant detail to be lost (left and middle). The left image is linearly scaled. In the middle image high values are clamped. For comparison, the right image is tonemapped, allowing details in both bright and dark regions to be visible.
图 20.3。为适应给定的显示设备而对高动态范围图像进行线性缩放可能会导致大量细节丢失(左图和中图)。左图是线性缩放的。中图的高值被限制。为了进行比较,右图是色调映射的,因此明亮和黑暗区域的细节都清晰可见。
In general, linear scaling will not be appropriate for tone reproduction. The key issue in tone reproduction is then to compress an image while at the same time preserving one or more attributes of the image. Different tone reproduction algorithms focus on different attributes such as contrast, visible detail, brightness, or appearance.
一般来说,线性缩放不适合色调再现。色调再现的关键问题是压缩图像,同时保留图像的一个或多个属性。不同的色调再现算法侧重于不同的属性,例如对比度、可见细节、亮度或外观。
Ideally, displaying a tonemapped image on a low dynamic range display device would create the same visual response in the observer as the original scene. Given the limitations of display devices, this will not be achievable, although we could aim for approximating this goal as closely as possible.
理想情况下,在低动态范围显示设备上显示色调映射图像将在观察者中产生与原始场景相同的视觉响应。考虑到显示设备的局限性,这无法实现,尽管我们可以尽可能接近这一目标。
As an example, we created the high dynamic range image shown in Figure 20.4. This image was then tonemapped and displayed on a display device. The display device itself was then placed in the scene such that it displays its own background (Figure 20.5). In the ideal case, the display should appear transparent. Dependent on the quality of the tone reproduction operator, as well as the nature of the scene being depicted, this goal may be more or less achievable.
作为示例,我们创建了如图 20.4所示的高动态范围图像。然后对该图像进行色调映射,并将其显示在显示设备上。然后将显示设备本身放置在场景中,使其显示自己的背景(图 20.5 )。在理想情况下,显示应看起来是透明的。根据色调再现运算符的质量以及所描绘场景的性质,此目标可能或多或少可以实现。
Figure 20.4. Image used for demonstrating the goal of tone reproduction in Figure 20.5.
图 20.4.图 20.5中用于演示色调再现目标的图像。
Figure 20.5. After tonemapping the image in Figure 20.4 and displaying it on a monitor, the monitor is placed in the scene approximately at the location where the image was taken. Dependent on the quality of the tone reproduction operator, the result should appear as if the monitor is transparent.
图 20.5。对图 20.4中的图像进行色调映射并将其显示在显示器上后,将显示器放置在场景中拍摄图像的大致位置。根据色调再现运算符的质量,结果应该看起来就像显示器是透明的一样。
Although it would be possible to classify tone reproduction operators by which attribute they aim to preserve, or for which task they were developed, we classify algorithms according to their general technique. This will enable us to show the differences and similarities between a significant number of different operators, and so, hopefully, contribute to the meaningful selection of specific operators for given tone reproduction tasks.
虽然可以根据色调再现算子所要保留的属性或为哪项任务而开发来对它们进行分类,但我们根据算法的一般技术对它们进行分类。这将使我们能够展示大量不同算子之间的差异和相似之处,因此,希望有助于为给定的色调再现任务有意义地选择特定算子。
The main classification scheme we follow hinges upon the realization that tone reproduction operators are based on insights gained from various disciplines. In particular, several operators are based on knowledge of human visual perception.
我们遵循的主要分类方案取决于这样的认识:色调再现算子基于从各个学科获得的见解。具体而言,一些算子基于人类视觉感知的知识。
The human visual system detects light using photoreceptors located in the retina. Light is converted to an electrical signal which is partially processed in the retina and then transmitted to the brain. Except for the first few layers of cells in the retina, the signal derived from detected light is transmitted using impulse trains. The information-carrying quantity is the frequency with which these electrical pulses occur.
人类视觉系统利用视网膜中的光感受器检测光线。光线被转换成电信号,该电信号在视网膜中经过部分处理,然后传输到大脑。除了视网膜中前几层细胞外,从检测到的光线产生的信号都是通过脉冲序列传输的。信息承载量是这些电脉冲发生的频率。
The range of light that the human visual system can detect is much larger than the range of frequencies employed by the human brain to transmit information. Thus, the human visual system effortlessly solves the tone reproduction problem—a large range of luminances is transformed into a small range of frequencies of impulse trains. Emulating relevant aspects of the human visual system is therefore a worthwhile approach to tone reproduction; this approach is explained in more detail in Section 20.7.
人类视觉系统能够检测到的光的范围比人脑传递信息所使用的频率范围大得多。因此,人类视觉系统毫不费力地解决了色调再现问题——将大范围的亮度转换为小范围的脉冲序列频率。因此,模拟人类视觉系统的相关方面是色调再现的一种有价值的方法;第 20.7 节将更详细地解释这种方法。
A second class of operators is grounded in physics. Light interacts with surfaces and volumes before being absorbed by the photoreceptors. In computer graphics, light interaction is generally modeled by the rendering equation. For purely diffuse surfaces, this equation may be simplified to the product between light incident upon a surface (illuminance), and this surface’s ability to reflect light (reflectance) (Oppenheim, Schafer, & Stockham, 1968).
第二类算子基于物理学。光在被光感受器吸收之前会与表面和体积相互作用。在计算机图形学中,光相互作用通常由渲染方程建模。对于纯漫反射表面,该方程可以简化为入射到表面上的光(照度)与该表面反射光的能力(反射率)之间的乘积(Oppenheim、Schafer 和 Stockham,1968 年)。
Since reflectance is a passive property of surfaces, for diffuse surfaces it is, by definition, low dynamic range—typically between 0.005 and 1 (Stockham, 1972). The reflectance of a surface cannot be larger than 1, since then it would reflect more light than was incident upon the surface. Illuminance, on the other hand, can produce arbitrarily large values and is limited only by the intensity and proximity of the light sources.
由于反射率是表面的被动属性,因此对于漫反射表面,根据定义,其动态范围较低 - 通常介于 0.005 和 1 之间(Stockham,1972 年)。表面的反射率不能大于 1,因为这样它反射的光会比入射到表面上的光多。另一方面,照度可以产生任意大的值,并且仅受光源的强度和接近度的限制。
The dynamic range of an image is thus predominantly governed by the illuminance component. In the face of diffuse scenes, a viable approach to tone reproduction may therefore be to separate reflectance from illuminance, compress the illuminance component, and then recombine the image.
因此,图像的动态范围主要由照度分量决定。面对漫射场景,色调再现的可行方法可能是将反射与照度分离,压缩照度分量,然后重新组合图像。
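The separate, compress, and recombine idea can be sketched for an idealized diffuse scene. The example below assumes the illuminance and reflectance of every pixel are already known, which is not the case for real images (there the separation has to be estimated), and it uses a simple power-law compression chosen purely for illustration. Working in the logarithmic domain turns the product of illuminance and reflectance into a sum, so the illuminance term can be scaled on its own.

```python
import math


def compress_illuminance(illuminance, reflectance, exponent=0.4):
    """Compress only the illuminance term of luminance = illuminance * reflectance."""
    result = []
    for e, r in zip(illuminance, reflectance):
        log_e = exponent * math.log(e)   # scale illuminance in the log domain
        log_r = math.log(r)              # reflectance is left untouched
        result.append(math.exp(log_e + log_r))
    return result


def log10_range(values):
    """Dynamic range in log10 units."""
    return math.log10(max(values) / min(values))


# Hypothetical diffuse scene: a dark and a bright surface, each seen
# in shadow (illuminance 1) and in strong light (illuminance 1000).
illuminance = [1.0, 1.0, 1000.0, 1000.0]
reflectance = [0.05, 0.8, 0.05, 0.8]
luminance = [e * r for e, r in zip(illuminance, reflectance)]

compressed = compress_illuminance(illuminance, reflectance)
print(f"before: {log10_range(luminance):.2f} log units")
print(f"after : {log10_range(compressed):.2f} log units")
```

The overall range shrinks, while the contrast between the two surfaces under the same lighting is preserved, because only the illuminance term was altered.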
However, the assumption that all surfaces in a scene are diffuse is generally incorrect. Many high dynamic range images depict highlights and/or directly visible light sources (Figure 20.3). The luminance reflected by a specular surface may be almost as high as the light source it reflects.
然而,场景中所有表面都是漫反射的假设通常是不正确的。许多高动态范围图像描绘了高光和/或直接可见的光源(图 20.3 )。镜面反射的亮度可能几乎与其反射的光源一样高。
Various tone reproduction operators currently in use split the image into a high dynamic range base layer and a low dynamic range detail layer. These layers would represent illuminance and reflectance if the depicted scene were entirely diffuse. For scenes containing directly visible light sources or specular highlights, separation into base and detail layers still allows the design of effective tone reproduction operators, although no direct meaning can be attached to the separate layers. Such operators are discussed in Section 20.5.
目前使用的各种色调再现算子将图像分为高动态范围基础层和低动态范围细节层。如果所描绘的场景完全是漫反射的,这些层将表示照度和反射率。对于包含直接可见光源或镜面高光的场景,分离为基础层和细节层仍然可以设计有效的色调再现算子,尽管单独的层没有直接的含义。此类算子将在第 20.5 节中讨论。
Conventional images are stored with one byte per pixel for each of the red, green and blue components. The dynamic range afforded by such an encoding depends on the ratio between smallest and largest representable value, as well as the step size between successive values. Thus, for low dynamic range images, there are only 256 different values per color channel.
传统图像以每个像素一个字节来存储红、绿、蓝分量。这种编码提供的动态范围取决于最小和最大可表示值之间的比率,以及连续值之间的步长。因此,对于低动态范围图像,每个颜色通道只有 256 个不同的值。
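As a rough illustration of the ratio part of this statement, the sketch below assumes a purely linear integer encoding and takes the smallest nonzero code as the smallest representable value. Real display encodings usually apply a nonlinear transfer curve, which this sketch ignores, and the function name is hypothetical.

```python
import math


def linear_encoding_range(bits):
    """Ratio and log10 range between the largest and the smallest
    nonzero code of a linear integer encoding with the given bit depth."""
    largest = 2 ** bits - 1
    smallest = 1
    ratio = largest / smallest
    return ratio, math.log10(ratio)


ratio, log_units = linear_encoding_range(8)
print(f"8 bits: {ratio:.0f}:1, about {log_units:.1f} log units")
```

Under these assumptions an 8-bit channel covers a little over 2 log units, which is of the same order as the display range quoted earlier in this chapter.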
High dynamic range images encode a significantly larger set of possible values; the maximum representable value may be much larger and the step size between successive values may be much smaller. The file size of high dynamic range images is therefore generally larger as well, although at least one standard (the OpenEXR high dynamic range file format (Kainz, Bogart, & Hess, 2003)) includes a very capable compression scheme.
高动态范围图像编码了一组明显更大的可能值;最大可表示值可能大得多,而连续值之间的步长可能小得多。因此,高动态范围图像的文件大小通常也更大,尽管至少有一个标准(OpenEXR 高动态范围文件格式 (Kainz、Bogart 和 Hess,2003))包含非常强大的压缩方案。
Figure 20.6. Dynamic range of 2.65 log₂ units.
图 20.6。动态范围为 2.65 log₂ 单位。
A different approach to limit file sizes is to apply a tone reproduction operator to the high dynamic range data. The result may then be encoded in JPEG format. In addition, the input image may be divided pixel-wise by the tonemapped image.
限制文件大小的另一种方法是将色调再现运算符应用于高动态数据。然后可以将结果编码为 JPEG 格式。此外,输入图像可以按像素划分为色调映射图像。
Figure 20.7. Dynamic range of 3.96 log₂ units.
图 20.7。动态范围为 3.96 log₂ 单位。
The result of this division can then be subsampled and stored as a small amount of data in the header of the same JPEG image (G. Ward & Simmons, 2004). The file size of such sub-band encoded images is of the same order as conventional JPEG encoded images. Display programs can display the JPEG image directly or may reconstruct the high dynamic range image by multiplying the tonemapped image with the data stored in the header.
然后可以对该除法的结果进行二次采样,并将其作为少量数据存储在同一 JPEG 图像的标头中(G. Ward & Simmons,2004)。此类子带编码图像的文件大小与传统 JPEG 编码图像的大小相同。显示程序可以直接显示 JPEG 图像,也可以通过将色调映射图像与标头中存储的数据相乘来重建高动态范围图像。
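The divide-and-multiply idea behind this scheme can be sketched without any image file handling. The code below is not the published encoder: it omits JPEG compression and the subsampling of the ratio data, uses a simple placeholder curve in place of a real tone reproduction operator, and all names are hypothetical. It only shows that storing the pixel-wise ratio alongside the tonemapped values is enough to recover the original luminances.

```python
import math


def tonemap_stand_in(luminance):
    """Placeholder compression curve mapping [0, inf) to [0, 1)."""
    return luminance / (1.0 + luminance)


def encode(hdr):
    """Return 8-bit tonemapped values and the per-pixel ratio data."""
    # Codes are kept at 1 or above so that the division below is defined.
    ldr = [max(1, round(255 * tonemap_stand_in(v))) for v in hdr]
    ratio = [v / code for v, code in zip(hdr, ldr)]
    return ldr, ratio


def decode(ldr, ratio):
    """Rebuild the high dynamic range values by multiplication."""
    return [code * r for code, r in zip(ldr, ratio)]


hdr = [0.02, 0.5, 3.0, 120.0, 4000.0]
ldr, ratio = encode(hdr)
restored = decode(ldr, ratio)

print("displayable codes:", ldr)
print("restored values  :", restored)
```

A viewer that ignores the ratio data can show the 8-bit codes directly, while one that understands it can multiply to reconstruct the original values.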
Figure 20.8. Dynamic range of 4.22 log₂ units.
图 20.8。动态范围为 4.22 log₂ 单位。
In general, the combination of smallest step size and ratio of the smallest and largest representable values determines the dynamic range that an image encoding scheme affords. For computer-generated imagery, an image is typically stored as a triplet of floating point values before it is written to file or displayed on screen, although more efficient encoding schemes are possible (Reinhard, Ward, Debevec, & Pattanaik, 2005). Since most display devices are still fitted with eight-bit D/A converters, we may think of tone reproduction as the mapping of floating point numbers to bytes such that the result is displayable on a low dynamic range display device.
一般来说,最小步长和可表示的最小值与最大值之比的组合决定了图像编码方案所能提供的动态范围。对于计算机生成的图像,图像在写入文件或显示在屏幕上之前通常以三元组浮点值的形式存储,尽管可能存在更高效的编码方案(Reinhard、Ward、Debevec 和 Pattanaik,2005 年)。由于大多数显示设备仍配备 8 位 D/A 转换器,我们可以将色调再现视为浮点数到字节的映射,以便结果可以在低动态范围显示设备上显示。
Figure 20.9. Dynamic range of 5.01 log2 units.
图 20.9。动态范围为 5.01 log 2单位。
The dynamic range of individual images is generally smaller, and is determined by the smallest and largest luminances found in the scene. A simplistic approach to measure the dynamic range of an image may therefore compute the ratio between the largest and smallest pixel value of an image. Sensitivity to outliers may be reduced by ignoring a small percentage of the darkest and brightest pixels.
单个图像的动态范围通常较小,由场景中发现的最小和最大亮度决定。因此,测量图像动态范围的一种简单方法是计算图像最大和最小像素值之间的比率。通过忽略一小部分最暗和最亮的像素,可以降低对异常值的敏感度。
Figure 20.10. Dynamic range of 6.56 log2 units.
图 20.10。动态范围为 6.56 log 2单位。
Alternatively, the same ratio may be expressed as a difference in the logarithmic domain. This measure is less sensitive to outliers. The images shown in the margin on this page are examples of images with different dynamic ranges. Note that the night scene in this case does not have a smaller dynamic range than the day scene. While all the values in the night scene are smaller, the ratio between largest and smallest values is not.
或者,相同的比率可以表示为对数域中的差异。此度量对异常值不太敏感。本页边距中显示的图像是具有不同动态范围的图像示例。请注意,在这种情况下,夜景的动态范围并不小于日景。虽然夜景中的所有值都较小,但最大值和最小值之间的比率却不是。
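The two measurements described above (a ratio of extreme pixel values, optionally after discarding outliers, expressed as a difference of logarithms) can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `dynamic_range_log2` and the `clip_fraction` parameter are hypothetical, and the input is assumed to be a flat list of luminance values.

```python
import math

def dynamic_range_log2(luminances, clip_fraction=0.01):
    """Dynamic range in log2 units after discarding the darkest and
    brightest `clip_fraction` of the pixels (an illustrative default)."""
    values = sorted(v for v in luminances if v > 0.0)  # log needs positive values
    if not values:
        raise ValueError("need at least one positive luminance")
    skip = int(len(values) * clip_fraction)
    lo = values[skip]
    hi = values[len(values) - 1 - skip]
    # A ratio hi / lo becomes a difference in the logarithmic domain.
    return math.log2(hi) - math.log2(lo)

pixels = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 1e-6, 1e6]
print(dynamic_range_log2(pixels, clip_fraction=0.0))  # both outliers included
print(dynamic_range_log2(pixels, clip_fraction=0.1))  # one value dropped at each end
```

With the two outliers ignored, the reported range reflects the bulk of the pixel values rather than a single very dark and a single very bright sample.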
However, the recording device or rendering algorithm may introduce noise which will lower the useful dynamic range. Thus, a measurement of the dynamic range of an image should factor in noise. A better measure of dynamic range would therefore be a signal-to-noise ratio, expressed in decibels, as used in signal processing.
然而,录制设备或渲染算法可能会引入噪声,从而降低有用的动态范围。因此,测量图像的动态范围时应考虑噪声。因此,动态范围的更好测量方法是信噪比,以分贝表示,用于信号处理。
Tone reproduction operators normally compress luminance values, rather than work directly on the red, green, and blue components of a color image. After these luminance values have been compressed into display values $L_d(x, y)$, a color image may be reconstructed by keeping the ratios between color channels the same as they were before compression (using $s = 1$) (Schlick, 1994b):
$$R_d = L_d \left(\frac{R}{L_v}\right)^{s}, \quad G_d = L_d \left(\frac{G}{L_v}\right)^{s}, \quad B_d = L_d \left(\frac{B}{L_v}\right)^{s}.$$
色调再现算子通常会压缩亮度值,而不是直接作用于彩色图像的红色、绿色和蓝色分量。将这些亮度值压缩为显示值L d ( x, y ) 后,可以通过保持颜色通道之间的比率与压缩前相同(使用s = 1)来重建彩色图像(Schlick,1994b):
The results frequently appear over-saturated, because human color perception is nonlinear with respect to overall luminance level. This means that if we view an image of a bright outdoor scene on a monitor in a dim environment, our eyes are adapted to the dim environment rather than the outdoor lighting. By keeping color ratios constant, we do not take this effect into account.
结果经常显得过于饱和,因为人类对色彩的感知与整体亮度水平呈非线性关系。这意味着,如果我们在昏暗的环境中通过显示器观看明亮的室外场景图像,我们的眼睛会适应昏暗的环境,而不是室外照明。通过保持色彩比率不变,我们不会考虑这种影响。
Alternatively, the saturation constant s may be chosen smaller than one. Such per-channel gamma correction may desaturate the results to an appropriate level, as shown in Figure 20.11 (Fattal, Lischinski, & Werman, 2002). A more comprehensive solution is to incorporate ideas from the field of color appearance modeling into tone reproduction operators (Pattanaik, Ferwerda, Fairchild, & Greenberg, 1998; Fairchild & Johnson, 2004; Reinhard & Devlin, 2005).
或者,饱和度常数s可以选得小于 1。这种每通道伽马校正可以将结果的饱和度降低到适当的水平,如图20.11所示(Fattal、Lischinski 和 Werman,2002 年)。更全面的解决方案是将色彩外观建模领域的思想融入色调再现运算符(Pattanaik、Ferwerda、Fairchild 和 Greenberg,1998 年;Fairchild 和 Johnson,2004 年;Reinhard 和 Devlin,2005 年)。
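A small sketch may make the role of the saturation constant concrete. It assumes the reconstruction takes the form $C_d = L_d \, (C / L_v)^s$ for each channel $C$; the function name `reconstruct_color` and the example values are hypothetical.

```python
def reconstruct_color(rgb, l_v, l_d, s=1.0):
    """Return display RGB given scene RGB, scene luminance l_v,
    compressed display luminance l_d, and saturation exponent s."""
    if l_v <= 0.0:
        return (0.0, 0.0, 0.0)  # black pixel: no color ratios to preserve
    return tuple(l_d * (c / l_v) ** s for c in rgb)

pixel = (2.0, 1.0, 0.5)
print(reconstruct_color(pixel, l_v=1.0, l_d=0.5, s=1.0))  # channel ratios unchanged
print(reconstruct_color(pixel, l_v=1.0, l_d=0.5, s=0.5))  # channel ratios pulled toward 1
```

Choosing `s` below one moves the channel ratios toward one, which is what reduces the apparent saturation.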
Figure 20.11. Per-channel gamma correction may desaturate the image. The left image was desaturated with a value of s = 0.5. The right image was not desaturated (s = 1).
图 20.11。每通道伽马校正可能会降低图像的饱和度。左侧图像的饱和度降低, s值为 0.5。右侧图像的饱和度未降低( s =1)。
Finally, if an example image with a representative color scheme is already available, this color scheme may be applied to a new image. Such a mapping of colors between images may be used for subtle color correction, such as saturation adjustment or for more creative color mappings. The mapping proceeds by converting both source and target images to a decorrelated color space. In such a color space, the pixel values in each color channel may be treated independently without introducing too many artifacts (Reinhard, Ashikhmin, Gooch, & Shirley, 2001).
最后,如果已经有具有代表性配色方案的示例图像,则可以将此配色方案应用于新图像。图像之间的这种颜色映射可用于细微的颜色校正,例如饱和度调整或用于更有创意的颜色映射。映射通过将源图像和目标图像都转换为去相关的颜色空间来进行。在这样的颜色空间中,可以独立处理每个颜色通道中的像素值,而不会引入太多伪影(Reinhard、Ashikhmin、Gooch 和 Shirley,2001 年)。
Mapping colors from one image to another in a decorrelated color space is then straightforward: compute the mean and standard deviation of all pixels in the source and target images for the three color channels separately. Then, shift and scale the target image so that in each color channel its mean and standard deviation match those of the source image. The resulting image is then obtained by converting from the decorrelated color space to RGB and clamping negative pixel values to zero. The dynamic range of the image may have changed as a result of applying this algorithm. It is therefore recommended to apply this algorithm to high dynamic range images and to apply a conventional tone reproduction algorithm afterward. A suitable decorrelated color space is the opponent space from Section 18.2.4.
在去相关颜色空间中将颜色从一张图像映射到另一张图像非常简单:分别计算源图像和目标图像中三个颜色通道的所有像素的平均值和标准差。然后,移动并缩放目标图像,使得每个颜色通道中目标图像的平均值和标准差与源图像相同。然后通过将去相关颜色空间转换为 RGB 并将负像素钳位为零来获得结果图像。应用此算法后,图像的动态范围可能会发生变化。因此,建议将此算法应用于高动态范围图像,然后应用传统的色调再现算法。合适的去相关颜色空间是第 18.2.4 节中的对手空间。
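The shift-and-scale step can be sketched for a single channel as below. The sketch assumes both images have already been converted to a decorrelated color space and that each channel is available as a flat list; the conversion to and from RGB and the final clamping are not shown. The name `match_channel` is hypothetical.

```python
import statistics

def match_channel(target, source):
    """Shift and scale `target` so its mean and standard deviation
    equal those of `source` (one decorrelated channel at a time)."""
    t_mean, s_mean = statistics.fmean(target), statistics.fmean(source)
    t_std, s_std = statistics.pstdev(target), statistics.pstdev(source)
    scale = s_std / t_std if t_std > 0.0 else 1.0  # guard against a flat channel
    return [(v - t_mean) * scale + s_mean for v in target]

adjusted = match_channel([0.0, 1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0])
print(statistics.fmean(adjusted), statistics.pstdev(adjusted))
```

The same function would be applied to each of the three channels independently, which is only reasonable because the channels are decorrelated.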
The result of applying such a color transform to the image in Figure 20.12 is shown in Figure 20.13.
对图 20.12中的图像应用此类颜色变换的结果如图 20.13所示。
Figure 20.12. Image used for demonstrating the color transfer technique. Results are shown in Figures 20.13 and 20.31.
图 20.12。用于演示颜色转移技术的图像。结果如图 21.13和21.31所示。
Figure 20.13. The image on the left is used to adjust the colors of the image shown in Figure 20.12. The result is shown on the right.
图 20.13。左侧图像用于调整图 20.12所示图像的颜色。结果显示在右侧。
For now, we assume that an image is formed as the result of light being diffusely reflected off surfaces. In Sections 20.5 and 20.6, we relax this constraint to scenes directly depicting light sources and highlights. The luminance $L_v$ of each pixel is then approximated by the following product:
$$L_v = r \, E_v.$$
现在,我们假设图像是由光从表面漫反射而形成的。在20.5和20.6节中,我们将此约束放宽到直接描绘光源和高光的场景。然后,每个像素的亮度L v可由以下乘积近似:
Here, $r$ denotes the reflectance of a surface, and $E_v$ denotes the illuminance. The subscript $v$ indicates that we are using photometrically weighted quantities. Alternatively, we may write this expression in the logarithmic domain (Oppenheim et al., 1968):
$$D = \log(L_v) = \log(r \, E_v) = \log(r) + \log(E_v).$$
这里, r表示表面的反射率, E v表示照度。下标v表示我们使用光度加权量。或者,我们可以在对数域中写出此表达式(Oppenheim 等人,1968 年):
Photographic transparencies record images by varying the density of the material. In traditional photography, this variation has a logarithmic relation with luminance. Thus, in analogy with common practice in photography, we will use the term density representation ( D) for log luminance. When represented in the log domain, reflectance and illuminance become additive. This facilitates separation of these two components, despite the fact that isolating either reflectance or illuminance is an under-constrained problem. In practice, separation is possible only to a certain degree and depends on the composition of the image. Nonetheless, tone reproduction could be based on disentangling these two components of image formation, as shown in the following two sections.
摄影胶片通过改变材料的密度来记录图像。在传统摄影中,这种变化与亮度呈对数关系。因此,与摄影中的常见做法类似,我们将使用术语对数亮度的密度表示( D )。当在对数域中表示时,反射率和照度变为可加的。这有助于分离这两个成分,尽管分离反射率或照度是一个约束不足的问题。实际上,分离只能在一定程度上进行,并且取决于图像的组成。尽管如此,色调再现可以基于解开图像形成的这两个成分,如以下两节所示。
For typical diffuse scenes, the reflectance component tends to exhibit high spatial frequencies due to textured surfaces as well as the presence of surface edges. On the other hand, illuminance tends to be a slowly varying function over space.
对于典型的漫反射场景,由于表面纹理以及表面边缘的存在,反射分量往往表现出较高的空间频率。另一方面,照度往往是一个随空间缓慢变化的函数。
Since reflectance is low dynamic range and illuminance is high dynamic range, we may try to separate the two components. The frequency-dependence of both reflectance and illuminance provides a solution. We may, for instance, compute the Fourier transform of an image and attenuate only the low frequencies. This compresses the illuminance component while leaving the reflectance component largely unaffected—the very first digital tone reproduction operator known to us takes this approach (Oppenheim et al., 1968).
由于反射率属于低动态范围,而照度属于高动态范围,我们可以尝试将这两个分量分开。反射率和照度的频率依赖性提供了一种解决方案。例如,我们可以计算图像的傅里叶变换并仅衰减低频。这会压缩照度分量,而反射率分量基本不受影响——我们所知的第一个数字色调再现运算符就采用了这种方法(Oppenheim 等人,1968 年)。
More recently, other operators have also followed this line of reasoning. In particular, bilateral and trilateral filters were used to separate an image into base and detail layers (Durand & Dorsey, 2002; Choudhury & Tumblin, 2003). Both filters are edge-preserving smoothing operators which may be used in a variety of different ways. Applying an edge-preserving smoothing operator to a density image results in a blurred image in which sharp edges remain present (Figure 20.14 (left)). We may view such an image as a base layer. If we then pixel-wise divide the high dynamic range image by the base layer, we obtain a detail layer which contains all the high-frequency detail (Figure 20.14 (right)).
最近,其他运算符也遵循了这种推理。特别是,双边和三边滤波器用于将图像分离为基础层和细节层(Durand & Dorsey,2002;Choudhury & Tumblin,2003)。这两个滤波器都是边缘保留平滑运算符,可以以多种不同的方式使用。将边缘保留平滑运算符应用于密度图像会产生模糊图像,但其中仍存在尖锐边缘(图 20.14 (左))。我们可以将这样的图像视为基础层。如果我们随后将高动态范围图像逐像素除以基础层,我们将获得包含所有高频细节的细节层(图 20.14 (右))。
Figure 20.14. Bilateral filtering removes small details but preserves sharp gradients (left). The associated detail layer is shown on the right.
图 20.14。双边滤波移除小细节但保留尖锐梯度(左)。相关细节层显示在右侧。
For diffuse scenes, base and detail layers are similar to representations of illuminance and reflectance. For images depicting highlights and light sources, this parallel does not hold. However, separation of an image into base and detail layers is possible regardless of the image’s content. By compressing the base layer before recombining into a compressed density image, a low dynamic range density image may be created (Figure 20.15). After exponentiation, a displayable image is obtained.
对于漫反射场景,基础层和细节层类似于照度和反射率的表示。对于描绘高光和光源的图像,这种相似性并不成立。但是,无论图像的内容如何,都可以将图像分离为基础层和细节层。通过压缩基础层然后再重新组合成压缩密度图像,可以创建低动态范围密度图像(图 20.15 )。指数化后,得到可显示的图像。
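The base/detail pipeline can be sketched on a one-dimensional scanline. This is a simplified, brute-force illustration under stated assumptions, not the published operators: the filter widths, the `compression` factor, and all function names are hypothetical, and the detail layer is obtained by subtraction because the data are in the log (density) domain, where a pixel-wise division becomes a difference.

```python
import math

def bilateral_1d(signal, sigma_space=2.0, sigma_range=0.5):
    """Edge-preserving smoothing of a 1D signal (brute-force bilateral filter)."""
    radius = int(3 * sigma_space)
    out = []
    for i, center in enumerate(signal):
        total = weight_sum = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            # Weight falls off with spatial distance and with value difference.
            w = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2)
                         - ((signal[j] - center) ** 2) / (2 * sigma_range ** 2))
            total += w * signal[j]
            weight_sum += w
        out.append(total / weight_sum)
    return out

def tonemap_base_detail(luminances, compression=0.4):
    """Compress the base layer of the density signal, keep the detail
    layer unchanged, and exponentiate back to luminance."""
    density = [math.log10(v) for v in luminances]
    base = bilateral_1d(density)
    detail = [d - b for d, b in zip(density, base)]
    top = max(base)  # anchor the brightest base value at log10 = 0
    return [10 ** ((b - top) * compression + d) for b, d in zip(base, detail)]

scanline = [1.0, 1.2] * 5 + [1000.0, 1200.0] * 5
print(tonemap_base_detail(scanline))
```

Only the large step between the dark and bright halves is reduced; the small alternation within each half is carried by the detail layer and survives the compression.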
Figure 20.15. An image tonemapped using bilateral filtering. The base and detail layers shown in Figure 20.14 are recombined after compressing the base layer.
图 20.15.使用双边滤波进行色调映射的图像。图 20.14中所示的基础层和细节层在压缩基础层后重新组合。
Edge-preserving smoothing operators may also be used to compute a local adaptation level for each pixel, which may be used in a spatially varying or local tone reproduction operator. We describe this use of bilateral and trilateral filters in Section 20.7.
边缘保持平滑算子也可用于计算每个像素的局部适应水平,该算子可用于空间变化或局部色调再现算子。我们将在第 20.7 节中描述双边和三边滤波器的这种用法。
The arguments made for the frequency-based operators in the preceding section also hold for the gradient field. Assuming that no light sources are directly visible, the reflectance component will be a piecewise constant function, which produces sharp spikes in the gradient field at reflectance edges. The illuminance component, in contrast, will cause small gradients everywhere.
上一节中对基于频率的算子的论证也适用于梯度场。假设没有光源直接可见,反射率分量将是一个常数函数,梯度场中会出现尖峰。同样,照度分量将在各处引起小的梯度。
Humans are generally able to separate illuminance from reflectance in typical scenes. The perception of surface reflectance after discounting the illuminant is called lightness. To assess the lightness of an image depicting only diffuse surfaces, B. K. P. Horn was the first to separate reflectance and illuminance using a gradient field (Horn, 1974). He used simple thresholding to remove all small gradients and then integrated the image, which involves solving a Poisson equation using the Full Multigrid Method (Press, Teukolsky, Vetterling, & Flannery, 1992).
人类通常能够在典型场景中区分照度和反射。在忽略光源后,表面反射的感知称为亮度。为了评估仅描绘漫反射表面的图像的亮度,BKP Horn 首次使用梯度场将反射和照度分开(Horn,1974 年)。他使用简单的阈值去除所有小梯度,然后对图像进行积分,这涉及使用全多重网格方法求解泊松方程(Press、Teukolsky、Vetterling 和 Flannery,1992 年)。
The result is similar to that of an edge-preserving smoothing filter. This is as expected, since Oppenheim’s frequency-based operator works under the same assumptions about scene reflectivity and image formation. In particular, Horn’s work was directly aimed at “mini-worlds of Mondrians,” simplified diffuse scenes that resemble the abstract paintings of the famous Dutch painter Piet Mondrian.
结果类似于边缘保留平滑滤波器。这是意料之中的,因为 Oppenheim 的频率相关算子在场景反射率和图像形成的相同假设下工作。特别是,Horn 的工作直接针对“蒙德里安的迷你世界”,它们是弥散场景的简化版本,类似于著名荷兰画家 Piet Mondrian 的抽象画。
Horn’s work cannot be employed directly as a tone reproduction operator, since most high dynamic range images depict light sources. However, a relatively small variation will turn this work into a suitable tone reproduction operator. If light sources or specular surfaces are depicted in the image, then large gradients will be associated with the edges of light sources and highlights. These cause the image to have a high dynamic range. An example is shown in Figure 20.16, where the highlights on the snooker balls cause sharp gradients.
Horn 的工作不能直接用作色调再现运算符,因为大多数高动态范围图像都描绘了光源。但是,相对较小的变化将使这项工作成为合适的色调再现运算符。如果图像中描绘了光源或镜面,则较大的渐变将与光源和高光的边缘相关联。这导致图像具有高动态范围。图 20.16显示了一个例子,其中斯诺克球上的高光导致尖锐的渐变。
Figure 20.16. The image on the left (tonemapped using gradient-domain compression) shows a scene with highlights. These highlights show up as large gradients on the right, where the magnitude of the gradients is mapped to a grayscale (black is a gradient of 0, white is the maximum gradient in the image).
图 20.16。左侧图像(使用梯度域压缩进行色调映射)显示了一个具有高光的场景。这些高光在右侧显示为较大的渐变,其中渐变的幅度被映射到灰度(黑色是 0 的渐变,白色是图像中的最大渐变)。
We could therefore compress a high dynamic range image by attenuating large gradients, rather than thresholding the gradient field. This approach was taken by Fattal et al., who showed that high dynamic range imagery may be successfully compressed by integrating a compressed gradient field (Figure 20.17) (Fattal et al., 2002). Fattal’s gradient-domain compression is not limited to diffuse scenes.
因此,我们可以通过衰减大梯度而不是对梯度场进行阈值处理来压缩高动态范围图像。Fattal 等人采用了这种方法,他们展示了通过积分压缩梯度场(图 20.17 )可以成功压缩高动态范围图像(Fattal 等人,2002 年)。Fattal 的梯度域压缩不仅限于漫射场景。
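The idea of attenuating large gradients and then integrating can be sketched in one dimension, where integration reduces to a running sum (a two-dimensional image would instead require solving a Poisson equation). The attenuation rule used here, scaling each gradient $g$ by $(|g| / \alpha)^{\beta - 1}$, and the parameter values are assumptions chosen for illustration; the function name is hypothetical.

```python
import math

def compress_gradients_1d(luminances, alpha=0.1, beta=0.8):
    """Attenuate large log-luminance gradients and integrate the result.

    Each gradient g is scaled by (|g| / alpha) ** (beta - 1), which shrinks
    gradients larger than alpha and slightly magnifies smaller ones.
    """
    density = [math.log(v) for v in luminances]
    out = [density[0]]
    for prev, cur in zip(density, density[1:]):
        g = cur - prev
        if g != 0.0:
            g *= (abs(g) / alpha) ** (beta - 1.0)
        out.append(out[-1] + g)  # 1D integration is a running sum
    top = max(out)
    return [math.exp(d - top) for d in out]  # normalize so the maximum is 1

scanline = [1.0, 1.1, 1.2, 100.0, 110.0, 120.0]
print(compress_gradients_1d(scanline))
```

Because every gradient keeps its sign, the ordering of pixel values is preserved while the single large step is reduced far more than the small ones.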
Figure 20.17. An image tonemapped using gradient-domain compression.
图 20.17.使用梯度域压缩进行色调映射的图像。
In the following sections, we discuss tone reproduction operators which apply compression directly on pixels without transformation to other domains. A distinction is often made between global and local operators. Tone reproduction operators in the former class change each pixel’s luminance values according to a compressive function which is the same for each pixel. The term global stems from the fact that many such functions need to be anchored to some values determined by analyzing the full image. In practice, most operators use the geometric average $\bar{L}_v$ to steer the compression:
$$\bar{L}_v = \exp\left(\frac{1}{N} \sum_{x, y} \log\bigl(\delta + L_v(x, y)\bigr)\right), \qquad (20.1)$$
where $N$ is the number of pixels in the image.
在以下部分中,我们将讨论色调再现算子,它们直接对像素应用压缩,而无需转换到其他域。通常,全局和局部算子是不同的。前一类中的色调再现算子根据每个像素相同的压缩函数改变每个像素的亮度值。术语“全局”源于这样一个事实,即许多此类函数需要锚定到通过分析整个图像确定的某些值。在实践中,大多数算子使用几何平均值 $\bar{L}_v$ 控制压缩:
In Equation (20.1), a small constant δ is introduced to prevent the average from becoming zero in the presence of black pixels. The geometric average is normally mapped to a predefined display value. The effect of mapping the geometric average to different display values is shown in Figure 20.18. Alternatively, sometimes the minimum or maximum image luminance is used. The main challenge faced in the design of a global operator lies in the choice of the compressive function.
在公式 (20.1) 中,引入了一个小常数δ ,以防止在存在黑色像素的情况下平均值变为零。几何平均值通常映射到预定义的显示值。将几何平均值映射到不同显示值的效果如图 20.18所示。或者,有时使用最小或最大图像亮度。全局算子设计面临的主要挑战在于压缩函数的选择。
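As a minimal sketch, the geometric average can be computed as the exponential of the mean log luminance, with the small constant δ added inside the logarithm. The function name and the default value of `delta` are assumptions made for illustration.

```python
import math

def geometric_average(luminances, delta=1e-6):
    """Log-average luminance; delta keeps log() finite for black pixels."""
    log_sum = sum(math.log(delta + v) for v in luminances)
    return math.exp(log_sum / len(luminances))

print(geometric_average([1.0, 100.0], delta=0.0))  # close to 10, unlike the arithmetic mean 50.5
print(geometric_average([0.0, 1.0]))               # stays positive despite the black pixel
```

Averaging in the log domain keeps a few very bright pixels from dominating the result, which is why this average is a convenient anchor for the compressive function.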
Figure 20.18. Spatial tonemapping operator applied after mapping the geometric average to different display values (left: 0.12, right: 0.38).
图 20.18.将几何平均值映射到不同的显示值后应用的空间色调映射运算符(左:0.12,右:0.38)。
On the other hand, local operators compress each pixel according to a specific compression function which is modulated by information derived from a selection of neighboring pixels, rather than the full image. The rationale is that a bright pixel in a bright neighborhood may be perceived differently than a bright pixel in a dim neighborhood. Design challenges in the development of a local operator involve choosing the compressive function, the size of the local neighborhood for each pixel, and the manner in which local pixel values are used. In general, local operators achieve better compression than global operators (Figure 20.19), albeit at a higher computational cost.
另一方面,局部算子根据特定的压缩函数压缩每个像素,该函数由从邻近像素的选择(而不是整个图像)获得的信息调制。其基本原理是明亮邻域中的明亮像素可能与暗淡邻域中的明亮像素看起来不同。局部算子开发中的设计挑战包括选择压缩函数、每个像素的局部邻域的大小以及使用局部像素值的方式。一般来说,局部算子比全局算子实现更好的压缩(图 20.19 ),尽管计算成本更高。
Figure 20.19. A global tone reproduction operator (left) and a local tone reproduction operator (right) (Reinhard, Stark, Shirley, & Ferwerda, 2002) of each image. The local operator shows more detail; for example, the metal badge on the right shows better contrast and the highlights are crisper.
图 20.19。每幅图像的全局色调再现算子(左)和局部色调再现算子(右)(Reinhard、Stark、Shirley 和 Ferwerda,2002)。局部算子显示更多细节;例如,右侧的金属徽章显示更好的对比度,高光更清晰。
Both global and local operators are often inspired by the human visual system. Most operators employ one of two distinct compressive functions, a choice that is orthogonal to the distinction between local and global operators. Display values $L_d(x, y)$ are most commonly derived from image luminances $L_v(x, y)$ by one of the following two functional forms:
$$L_d(x, y) = \frac{L_v(x, y)}{f(x, y)}, \qquad (20.2)$$
$$L_d(x, y) = \frac{L_v^n(x, y)}{L_v^n(x, y) + f^n(x, y)}. \qquad (20.3)$$
全局和局部算子通常都受到人类视觉系统的启发。大多数算子采用两种不同的压缩函数之一,这与局部和全局算子之间的区别正交。显示值L d ( x, y ) 最常见的是从图像亮度L v ( x, y ) 中得出的,其函数形式如下:
In these equations, f (x, y) may either be a constant or a function which varies per pixel. In the former case, we have a global operator, whereas a spatially varying function f (x, y) results in a local operator. The exponent n is usually a constant which is fixed for a particular operator.
在这些方程中, f ( x, y ) 可以是常数,也可以是随像素变化的函数。前一种情况下,我们有一个全局算子,而空间变化函数f ( x, y ) 则会产生一个局部算子。指数n通常是一个常数,对于特定算子而言是固定的。
Equation (20.2) divides each pixel’s luminance by a value derived from either the full image or a local neighborhood. Equation (20.3) has an S-shaped curve on a log-linear plot and is called a sigmoid for that reason. This functional form fits data obtained from measuring the electrical response of photoreceptors to flashes of light in various species. In the following sections, we discuss both functional forms.
方程 (20.2) 将每个像素的亮度除以从整个图像或局部邻域得出的值。方程 (20.3) 在对数线性图上具有 S 形曲线,因此被称为 S 形。此函数形式适合通过测量不同物种的光感受器对闪光的电响应而获得的数据。在以下部分中,我们将讨论这两种函数形式。
Each pixel may be divided by a constant to bring the high dynamic range image within a displayable range. Such a division essentially constitutes linear scaling, as shown in Figure 20.3. While Figure 20.3 shows ad-hoc linear scaling, this approach may be refined by employing psychophysical data to derive the scaling constant $f(x, y) = k$ in Equation (20.2) (G. J. Ward, 1994; Ferwerda, Pattanaik, Shirley, & Greenberg, 1996).
每个像素可以除以一个常数,以使高动态范围图像处于可显示范围内。这种划分本质上构成了线性缩放,如图 20.3所示。虽然图 20.3显示了临时的线性缩放,但这种方法可以通过使用心理物理数据来推导公式 (20.2) 中的缩放常数f ( x, y ) = k来改进 (GJ Ward,1994 年;Ferwerda、Pattanaik、Shirley 和 Greenberg,1996 年)。
Alternatively, several approaches exist that compute a spatially varying divisor. In each of these cases, $f(x, y)$ is a blurred version of the image, i.e., $f(x, y) = L_v^{\mathrm{blur}}(x, y)$. The blur is achieved by convolving the image with a Gaussian filter (Chiu et al., 1993; Rahman, Jobson, & Woodell, 1996). In addition, the computation of $f(x, y)$ by blurring the image may be combined with a shift in white point for the purpose of color appearance modeling (Fairchild & Johnson, 2002; G. M. Johnson & Fairchild, 2003; Fairchild & Johnson, 2004).
或者,存在几种计算空间变化除数的方法。在每种情况下, f ( x, y ) 都是图像的模糊版本,即 f(xy)=Lυblur(xy)。模糊是通过将图像与高斯滤波器卷积来实现的 (Chiu et al., 1993; Rahman, Jobson, & Woodell, 1996)。此外,通过模糊图像计算f ( x, y ) 可以与白点偏移相结合,以进行颜色外观建模 (Fairchild & Johnson, 2002; GM Johnson & Fairchild, 2003; Fairchild & Johnson, 2004)。
The size and the weight of the Gaussian filter have a profound impact on the resulting displayable image. The Gaussian filter has the effect of selecting a weighted local average. Tone reproduction is then a matter of dividing each pixel by its associated weighted local average. If the size of the filter kernel is chosen too small, then haloing artifacts will occur (Figure 20.20 (left)). Haloing is a common problem with local operators and is particularly evident when tone mapping relies on division.
高斯滤波器的大小和权重对最终可显示的图像有很大影响。高斯滤波器具有选择加权局部平均值的效果。色调再现就是将每个像素除以其相关的加权局部平均值。如果滤波器核的大小选择得太小,则会出现光晕伪影(图 20.20 (左))。光晕是局部算子的一个常见问题,当色调映射依赖于除法时尤其明显。
Figure 20.20. Images tonemapped by dividing by Gaussian-blurred versions. The size of the filter kernel is 64 pixels for the left image and 512 pixels for the right image. For division-based algorithms, halo artifacts are minimized by choosing large filter kernels.
图 20.20。通过除以高斯模糊版本对图像进行色调映射。左图的滤波器内核大小为 64 像素,右图的滤波器内核大小为 512 像素。对于基于除法的算法,通过选择较大的滤波器内核可以最大限度地减少光晕伪影。
In general, haloing artifacts may be minimized in this approach by making the filter kernel large (Figure 20.20 (right)). Reasonable results may be obtained by choosing a filter size of at least one quarter of the image. Sometimes even larger filter kernels are desirable to minimize artifacts. Note that in the limit, the filter size becomes as large as the image itself. In that case, the local operator becomes global, and the extra compression normally afforded by a local approach is lost.
一般来说,通过使滤波器内核变大(图 20.20 (右)),可以最小化光晕伪影。通过选择至少为图像四分之一的滤波器大小,可以获得合理的结果。有时甚至需要更大的滤波器内核来最小化伪影。请注意,在极限情况下,滤波器大小会变得与图像本身一样大。在这种情况下,局部算子变为全局算子,并且通常由局部方法提供的额外压缩会丢失。
The functional form whereby each pixel is divided by a Gaussian-blurred pixel at the same spatial position thus requires an undesirable tradeoff between amount of compression and severity of artifacts.
因此,将每个像素除以相同空间位置的高斯模糊像素的函数形式需要在压缩量和伪影严重程度之间进行不良的权衡。
Equation (20.3) follows a different functional form from simple division, and, therefore, affords a different tradeoff between amount of compression, presence of artifacts, and speed of computation.
方程 (20.3) 遵循与简单除法不同的函数形式,因此,在压缩量、伪影的存在和计算速度之间提供了不同的权衡。
Sigmoids have several desirable properties. For very small luminance values, the mapping is approximately linear, so that contrast is preserved in dark areas of the image. The function has an asymptote at one, which means that the output mapping is always bounded between 0 and 1.
Sigmoid 函数具有几个理想的特性。对于非常小的亮度值,映射近似为线性,因此图像的暗区对比度得以保留。该函数在 1 处有渐近线,这意味着输出映射始终在 0 和 1 之间。
In Equation (20.3), the function f (x, y) may be computed as a global constant or as a spatially varying function. Following common practice in electro-physiology, we call f (x, y) the semi-saturation constant. Its value determines which values in the input image are optimally visible after tonemapping. In particular, if we assume that the exponent n equals 1, then luminance values equal to the semi-saturation constant will be mapped to 0.5. The effect of choosing different semi-saturation constants is shown in Figure 20.21.
在公式 (20.3) 中,函数f ( x, y ) 可以计算为全局常数或空间变化函数。按照电生理学中的惯例,我们将f ( x, y ) 称为半饱和常数。它的值决定了输入图像中的哪些值在色调映射后最佳可见。具体而言,如果我们假设指数n等于 1,则等于半饱和常数的亮度值将被映射到 0.5。选择不同半饱和常数的效果如图 20.21所示。
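The properties of the sigmoid described above can be checked numerically. The sketch assumes Equation (20.3) has the form $L_d = L_v^n / (L_v^n + f^n)$; the function name `sigmoid_compress` and the sample values are hypothetical.

```python
def sigmoid_compress(l_v, f, n=1.0):
    """Sigmoidal compression of luminance l_v with semi-saturation value f."""
    return l_v ** n / (l_v ** n + f ** n)

# Sweep six orders of magnitude of input luminance with f = 1.
for l_v in (0.001, 0.1, 1.0, 10.0, 1000.0):
    print(l_v, sigmoid_compress(l_v, f=1.0))
```

For inputs far below `f` the output is nearly proportional to the input, an input equal to `f` maps to one half when `n` is 1, and very large inputs approach but never reach one.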
Figure 20.21. The choice of semi-saturation constant determines how input values are mapped to display values.
图 20.21.半饱和常数的选择决定了输入值如何映射到显示值。
The function $f(x, y)$ may be computed in several different ways (Reinhard et al., 2005). In its simplest form, $f(x, y)$ is set to $\bar{L}_v / k$, so that the geometric average is mapped to user parameter $k$ (Figure 20.22) (Reinhard et al., 2002). In this case, a good initial value for $k$ is 0.18, although for particularly bright or dark scenes this value may be raised or lowered. Its value may be estimated from the image itself (Reinhard, 2003). The exponent $n$ in Equation (20.3) may be set to 1.
函数f ( x, y ) 可以用几种不同的方式计算 (Reinhard 等人,2005)。最简单的形式是, f ( x, y ) 设置为 $\bar{L}_v / k$,这样几何平均值就映射到用户参数k (图 20.22 )(Reinhard 等人,2002 年)。在这种情况下, k的初始值最好为 0.18,尽管对于特别明亮或黑暗的场景,此值可能会升高或降低。其值可以从图像本身估算出来(Reinhard,2003 年)。公式 (20.3) 中的指数n可以设置为 1。
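Putting the pieces together, a global sigmoidal operator of this kind might be sketched as follows, assuming the semi-saturation value is the geometric average divided by $k$ and that Equation (20.3) has the form $L_v^n / (L_v^n + f^n)$. The function name, the flat-list input, and the `delta` default are assumptions.

```python
import math

def tonemap_global(luminances, k=0.18, n=1.0, delta=1e-6):
    """Global sigmoidal tone mapping anchored to the geometric average."""
    log_sum = sum(math.log(delta + v) for v in luminances)
    l_avg = math.exp(log_sum / len(luminances))  # geometric average
    f = l_avg / k                                # semi-saturation value
    return [v ** n / (v ** n + f ** n) for v in luminances]

print(tonemap_global([0.01, 1.0, 100.0]))
```

With `n = 1`, a pixel at the geometric average maps to `k / (1 + k)`, which is close to `k` when `k` is small; raising `k` brightens the result and lowering it darkens the result.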
Figure 20.22. A linearly scaled image (left) and an image tonemapped using sigmoidal compression (right).
图 20.22.线性缩放的图像(左)和使用 S 形压缩进行色调映射的图像(右)。
In this approach, the semi-saturation constant is a function of the geometric average, and the operator is therefore global. A variation of this global operator computes the semi-saturation constant by linearly interpolating between the geometric average and each pixel’s luminance:
$$f(x, y) = a \, L_v(x, y) + (1 - a) \, \bar{L}_v.$$
在这种方法中,半饱和常数是几何平均值的函数,因此该算子是全局的。该全局算子的变体通过在几何平均值和每个像素的亮度之间进行线性插值来计算半饱和常数:
The interpolation is governed by user parameter a which has the effect of varying the amount of contrast in the displayable image (Figure 20.23) (Reinhard & Devlin, 2005). More contrast means less visible detail in the light and dark areas and vice versa. This interpolation may be viewed as a halfway house between a fully global and a fully local operator by interpolating between the two extremes without resorting to expensive blurring operations.
插值由用户参数a控制,该参数可以改变可显示图像的对比度(图 20.23 )(Reinhard & Devlin,2005)。对比度越高,明暗区域中可见的细节越少,反之亦然。这种插值可以看作是完全全局和完全局部运算符之间的折衷方案,它在两个极端之间进行插值,而无需诉诸昂贵的模糊操作。
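The effect of the interpolation parameter can be illustrated with a short sketch. It assumes the interpolation is $f = a \, L_v + (1 - a) \, \bar{L}_v$ (the direction of $a$ is an assumption) and the sigmoid form $L_v^n / (L_v^n + f^n)$; the function name is hypothetical and the geometric average is passed in directly.

```python
def interpolated_sigmoid(luminances, l_avg, a, n=1.0):
    """Sigmoid whose semi-saturation value blends the pixel's own
    luminance (weight a) with the global average l_avg (weight 1 - a)."""
    out = []
    for l_v in luminances:
        f = a * l_v + (1.0 - a) * l_avg  # semi-saturation value for this pixel
        out.append(l_v ** n / (l_v ** n + f ** n))
    return out

samples = [1.0, 10.0, 100.0]
for a in (0.0, 0.5, 1.0):
    print(a, interpolated_sigmoid(samples, l_avg=10.0, a=a))
```

Under these assumptions, `a = 0` gives the purely global operator, while increasing `a` pulls every pixel's output toward one half; with `n = 1` and `a = 1` all pixels map to exactly 0.5, the extreme case of reduced contrast.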
Figure 20.23. Linear interpolation varies contrast in the tonemapped image. The parameter a is set to 0.0 in the left image, and to 1.0 in the right image.
图 20.23。线性插值改变色调映射图像的对比度。左图中的参数a设置为 0.0,右图中的参数 a 设置为 1.0。
Although operators typically compress luminance values, this particular operator may be extended to include a simple form of chromatic adaptation. It thus presents an opportunity to adjust the level of saturation normally associated with tonemapping, as discussed at the beginning of this chapter.
尽管操作符通常压缩亮度值,但此特定操作符可以扩展为包含简单形式的色度适应。因此,它提供了一个调整通常与色调映射相关的饱和度级别的机会,如本章开头所述。
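Rather than compress the luminance channel only, sigmoidal compression is applied to each of the three color channels:
$$I_{r,d}(x, y) = \frac{I_r^n(x, y)}{I_r^n(x, y) + f_r^n(x, y)}, \quad I_{g,d}(x, y) = \frac{I_g^n(x, y)}{I_g^n(x, y) + f_g^n(x, y)}, \quad I_{b,d}(x, y) = \frac{I_b^n(x, y)}{I_b^n(x, y) + f_b^n(x, y)}.$$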
不仅仅压缩亮度通道,还对三个颜色通道分别应用 S 形压缩:
The computation of $f(x, y)$ is also modified to bilinearly interpolate between the geometric average luminance and pixel luminance and between each independent color channel and the pixel’s luminance value. We therefore compute the geometric average luminance value $\bar{L}_v$, as well as the geometric average of the red, green, and blue channels ($\bar{I}_r$, $\bar{I}_g$, and $\bar{I}_b$). From these values, we compute $f(x, y)$ for each pixel and for each color channel independently. We show the equation for the red channel ($f_r(x, y)$):
$$\begin{aligned} G_r(x, y) &= c \, \bar{I}_r + (1 - c) \, \bar{L}_v, \\ L_r(x, y) &= c \, I_r(x, y) + (1 - c) \, L_v(x, y), \\ f_r(x, y) &= a \, L_r(x, y) + (1 - a) \, G_r(x, y). \end{aligned}$$
f ( x, y ) 的计算也经过修改,在几何平均亮度和像素亮度之间以及每个独立颜色通道和像素亮度值之间进行双线性插值。因此,我们计算几何平均亮度值大号¯ υ以及红、绿、蓝通道的几何平均值(I¯rI¯g,和我¯ b )。根据这些值,我们为每个像素和每个颜色通道独立计算f ( x, y )。我们给出红色通道 ( f r ( x, y ) 的方程:
The interpolation parameter a steers the amount of contrast as before, and the new interpolation parameter c allows a simple form of color correction (Figure 20.24).
插值参数a与以前一样控制对比度的数量,新的插值参数c允许一种简单形式的色彩校正(图 20.24 )。
Figure 20.24. Linear interpolation for color correction. The parameter c is set to 0.0 in the left image, and to 1.0 in the right image.
图 20.24.用于颜色校正的线性插值。左图中的参数c设置为 0.0,右图中的参数 c 设置为 1.0。
So far we have not discussed the value of the exponent $n$ in Equation (20.3). Studies in electrophysiology report values between $n = 0.2$ and $n = 0.9$ (Hood, Finkelstein, & Buckingham, 1979). While the exponent may be user-specified, for a wide variety of images we may estimate a reasonable value from the geometric average luminance $\bar{L}_v$ and the minimum and maximum luminance in the image ($L_{\min}$ and $L_{\max}$) with the following empirical equation:
$$n = 0.3 + 0.7 \left(\frac{L_{\max} - \bar{L}_v}{L_{\max} - L_{\min}}\right)^{1.4}.$$
到目前为止,我们还没有讨论公式 (20.3) 中指数n的值。电生理学研究报告的值介于n = 0.2 和n = 0.9 之间 (Hood、Finkelstein 和 Buckingham,1979)。虽然指数可能是用户指定的,但对于各种各样的图像,我们可以根据几何平均亮度 $\bar{L}_v$ 以及图像中的最小和最大亮度( L min和L max )估计一个合理的值,其经验方程如下:
The several variants of sigmoidal compression shown so far are all global in nature. This has the advantage that they are fast to compute, and they are very suitable for medium to high dynamic range images. For very high dynamic range images, it may be necessary to resort to a local operator, since this may give some extra compression. A straightforward method to extend sigmoidal compression replaces the global semi-saturation constant by a spatially varying function, which may be computed in several different ways.
到目前为止展示的几种 S 形压缩变体本质上都是全局的。这样做的好处是它们计算速度快,非常适合中高动态范围图像。对于非常高动态范围的图像,可能需要使用局部算子,因为这可能会产生一些额外的压缩。一种扩展 S 形压缩的直接方法是用空间变化函数代替全局半饱和常数,可以用几种不同的方式计算。
In other words, the function $f(x, y)$ is so far assumed to be constant, but may also be computed as a spatially localized average. Perhaps the simplest way to accomplish this is to once more use a Gaussian-blurred image. Each pixel in a blurred image represents a locally averaged value which may be viewed as a suitable choice for the semi-saturation constant (see footnote 1).
换句话说,函数f ( x, y ) 目前被认为是常数,但也可以计算为空间局部平均值。实现这一点的最简单方法可能是再次使用高斯模糊图像。模糊图像中的每个像素都代表一个局部平均值,可以将其视为半饱和常数1的合适选择。
As with division-based operators discussed in the previous section, we have to consider haloing artifacts. However, when an image is divided by a Gaussian-blurred version of itself, the size of the Gaussian filter kernel needs to be large in order to minimize halos. If sigmoids are used with a spatially variant semi-saturation constant, the Gaussian filter kernel needs to be made small in order to minimize artifacts. This is a significant improvement, since small amounts of Gaussian blur may be efficiently computed directly in the spatial domain. In other words, there is no need to resort to expensive Fourier transforms. In practice, filter kernels of only a few pixels width are sufficient to suppress significant artifacts while at the same time producing more local contrast in the tonemapped images.
与上一节讨论的基于除法的运算符一样,我们必须考虑光晕伪影。但是,当图像被其自身的高斯模糊版本除时,高斯滤波器核的大小需要很大,以尽量减少光晕。如果将 S 型函数与空间变量半饱和常数一起使用,则需要将高斯滤波器核做得很小,以尽量减少伪影。这是一个重大的改进,因为可以在空间域中直接有效地计算少量的高斯模糊。换句话说,无需求助于昂贵的傅里叶变换。实际上,只有几个像素宽度的滤波器核足以抑制明显的伪影,同时在色调映射图像中产生更多的局部对比度。
One potential issue with Gaussian blur is that the filter blurs across sharp contrast edges in the same way that it blurs small details. In practice, if there is a large contrast gradient in the neighborhood of the pixel under consideration, this causes the Gaussian-blurred pixel to be significantly different from the pixel itself. This is the direct cause for halos. By using a very large filter kernel in a division-based approach, such large contrasts are averaged out.
高斯模糊的一个潜在问题是,滤镜模糊鲜明对比边缘的方式与模糊小细节的方式相同。实际上,如果所考虑像素的邻域中存在较大的对比度梯度,则会导致高斯模糊像素与像素本身存在明显差异。这是光晕的直接原因。通过在基于除法的方法中使用非常大的滤镜内核,可以平均化这种较大的对比度。
In sigmoidal compression schemes, a small Gaussian filter minimizes the chances of overlapping with a sharp contrast gradient. In that case, halos still occur, but their size is such that they usually go unnoticed and instead are perceived as enhancing contrast.
在 S 形压缩方案中,小型高斯滤波器可最大程度地减少与强烈对比度梯度重叠的可能性。在这种情况下,光晕仍然会出现,但它们的大小通常不会被注意到,而是被视为增强了对比度。
Another way to blur an image, while minimizing the negative effects of nearby large contrast steps, is to avoid blurring over such edges. A simple, but computationally expensive way, is to compute a stack of Gaussian-blurred images with different kernel sizes. For each pixel, we may choose the largest Gaussian that does not overlap with a significant gradient.
模糊图像的另一种方法是避免模糊此类边缘,同时尽量减少附近较大对比度步骤的负面影响。一种简单但计算量大的方法是计算具有不同核大小的高斯模糊图像堆栈。对于每个像素,我们可以选择不与显著梯度重叠的最大高斯。
In a relatively uniform neighborhood, the value of a Gaussian-blurred pixel should be the same regardless of the filter kernel size. Thus, the difference between a pixel filtered with two different Gaussians should be approximately zero. This difference will only change significantly if the wider filter kernel overlaps with a neighborhood containing a sharp contrast step, whereas the smaller filter kernel does not.
在相对均匀的邻域中,无论滤波器内核大小如何,高斯模糊像素的值都应该相同。因此,用两个不同的高斯滤波的像素之间的差异应该近似为零。只有当较宽的滤波器内核与包含鲜明对比步骤的邻域重叠时,此差异才会发生显着变化,而较小的滤波器内核则不会发生显着变化。
1 Although f (x, y) is now no longer a constant, we continue to refer to it as the semi-saturation constant.
1尽管f ( x, y ) 现在不再是一个常数,但我们仍然将其称为半饱和常数。
It is possible, therefore, to find the largest neighborhood around a pixel that does not contain sharp edges by examining differences of Gaussians at different kernel sizes. For the image shown in Figure 20.25, the scale selected for each pixel is shown in Figure 20.26 (left). Such a scale selection mechanism is employed by the photographic tone reproduction operator (Reinhard et al., 2002) as well as in Ashikhmin’s operator (Ashikhmin, 2002).
因此,通过检查不同核大小下高斯分布的差异,可以找到像素周围不包含尖锐边缘的最大邻域。对于图 20.25所示的图像,为每个像素选择的比例如图 20.26 (左)所示。摄影色调再现算子(Reinhard 等人,2002 年)以及 Ashikhmin 算子(Ashikhmin,2002 年)采用了这种比例选择机制。
Figure 20.25. Example image used to demonstrate the scale selection mechanism shown in Figure 20.26.
图 20.25.用于演示图 20.26中所示的比例选择机制的示例图像。
Figure 20.26. Scale selection mechanism: the left image shows the scale selected for each pixel of the image shown in Figure 20.25; the darker the pixel, the smaller the scale. A total of eight different scales were used to compute this image. The right image shows the local average computed for each pixel on the basis of the neighborhood selection mechanism.
图 20.26。尺度选择机制:左图显示了图 20.25中所示图像中每个像素所选择的尺度;像素越暗,尺度越小。共使用了 8 个不同的尺度来计算此图像。右图显示了根据邻域选择机制为每个像素计算的局部平均值。
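The scale selection idea can be sketched in one dimension. This is a minimal illustration rather than the published operator: the function names, the list of kernel widths, the normalized difference test, and the threshold of 0.05 are all assumptions chosen for clarity.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3.0 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Blur a 1D signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for x in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(x + k - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def select_scales(signal, sigmas, threshold=0.05):
    """For each sample, return (index of chosen sigma, local average).

    The scale grows for as long as two successive blurs agree; a large
    normalized difference signals that the wider kernel crossed an edge.
    """
    stack = [blur(signal, s) for s in sigmas]
    result = []
    for x in range(len(signal)):
        chosen = 0
        for i in range(len(sigmas) - 1):
            narrow, wide = stack[i][x], stack[i + 1][x]
            # Small constant avoids division by zero (illustrative choice).
            if abs(narrow - wide) / (abs(narrow) + 1e-6) > threshold:
                break
            chosen = i + 1
        result.append((chosen, stack[chosen][x]))
    return result

# A dark region next to a bright region, with a sharp step at index 40.
step = [1.0] * 40 + [100.0] * 40
scales = select_scales(step, sigmas=[1.0, 2.0, 4.0, 8.0])
```

Far from the step, the widest kernel is selected; next to the step, the narrowest one is, which matches the behavior shown on the left of Figure 20.26.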
Once the appropriate neighborhood for each pixel is known, the Gaussian-blurred average Lblur for this neighborhood (shown on the right of Figure 20.26) may be used to steer the semi-saturation constant, as is done, for instance, in the photographic tone reproduction operator:
一旦知道了每个像素的适当邻域,就可以使用该邻域的高斯模糊平均L模糊(如图 20.26右侧所示)来控制半饱和常数,例如由摄影色调再现运算符使用:
An alternative, and arguably better, approach is to employ edge-preserving smoothing operators, which are designed specifically to remove small details while keeping sharp contrasts intact. Several such filters are suitable, including the bilateral filter (Figure 20.27), the trilateral filter, the SUSAN filter, the LCIS algorithm, and the mean shift algorithm, although some of them are expensive to compute (Durand & Dorsey, 2002; Choudhury & Tumblin, 2003; Pattanaik & Yee, 2002; Tumblin & Turk, 1999; Comaniciu & Meer, 2002).
另一种可能更好的方法是采用边缘保留平滑算子,这种算子专门用于去除小细节,同时保持鲜明的对比。一些这样的滤波器,如双边滤波器(图 20.27 )、三边滤波器、Susan 滤波器、LCIS 算法和均值偏移算法都是合适的,尽管其中一些滤波器的计算成本很高(Durand & Dorsey,2002;Choudhury & Tumblin,2003;Pattanaik & Yee,2002;Tumblin & Turk,1999;Comaniciu & Meer,2002)。
Figure 20.27. Sigmoidal compression (left) and sigmoidal compression using bilateral filtering to compute the semi-saturation constant (right). Note the improved contrast in the sky in the right image.
图 20.27。S形压缩(左)和使用双边滤波计算半饱和常数的 S 形压缩(右)。请注意右图中天空的对比度有所改善。
Although the previous sections together discuss most tone reproduction operators to date, two further approaches do not fit directly into the above categories. The simpler of these is a family of variations on logarithmic compression; the other is a histogram-based approach.
虽然前面几节讨论了迄今为止的大多数色调再现运算符,但有一两个运算符并不直接属于上述类别。其中最简单的是对数压缩的变体,另一个是基于直方图的方法。
Dynamic range reduction may be accomplished by taking the logarithm of a value, provided that the value is greater than 1. Any positive number may then be nonlinearly scaled between 0 and 1 using the following equation:
动态范围的减少可以通过取对数来实现,只要该数大于 1。然后可以使用以下公式在 0 和 1 之间非线性缩放任何正数:
While the base b of the logarithm above is not specified, any choice of base will do. This freedom to choose the base of the logarithm may be used to vary the base with input luminance, and thus achieve an operator that is better matched to the image being compressed (Drago, Myszkowski, Annen, & Chiba, 2003). This method uses Perlin and Hoffert’s bias function which takes user parameter p (Perlin & Hoffert, 1989):
虽然上述对数的底数b没有指定,但任何底数都可以。这种选择对数底数的自由可用于根据输入亮度改变底数,从而实现与被压缩图像更匹配的运算符(Drago、Myszkowski、Annen 和 Chiba,2003 年)。此方法使用 Perlin 和 Hoffert 的偏差函数,该函数采用用户参数p (Perlin 和 Hoffert,1989 年):
Making the base b dependent on luminance and smoothly interpolating bases between 2 and 10, the logarithmic mapping above may be refined:
使底数b依赖于亮度,并在 2 到 10 之间平滑插入底数,上面的对数映射可以得到细化:
For user parameter p, an initial value of around 0.85 tends to yield plausible results (Figure 20.28 (right)).
对于用户参数p ,初始值约为 0.85 往往会产生合理的结果(图 20.28 (右))。
Figure 20.28. Logarithmic compression using base 10 logarithms (left) and logarithmic compression with varying base (right).
图 20.28.使用以 10 为底的对数的对数压缩(左)和使用不同底数的对数压缩(右)。
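Both logarithmic mappings can be sketched as follows. The fixed-base version assumes the common form L_d = log_b(1 + L) / log_b(1 + L_max); the varying-base version follows the formulation usually attributed to Drago et al. (2003), in which the bias function steers a base between 2 and 10. The function and parameter names are illustrative, and the exact expressions are assumptions rather than a transcription of the equations in this section.

```python
import math

def log_compress(luminance, max_luminance, base=10.0):
    """Map a non-negative luminance to [0, 1] using a fixed logarithm base."""
    return math.log(1.0 + luminance, base) / math.log(1.0 + max_luminance, base)

def bias(x, p):
    """Perlin and Hoffert's bias function; bias(0.5, p) equals p."""
    return x ** (math.log(p) / math.log(0.5))

def adaptive_log_compress(luminance, max_luminance, p=0.85):
    """Logarithmic mapping whose effective base varies with luminance.

    The term 2 + 8 * bias(...) runs from 2 (dark pixels) to 10 (the
    brightest pixel), so the brightest pixel still maps to 1.
    """
    base_term = 2.0 + 8.0 * bias(luminance / max_luminance, p)
    return math.log(1.0 + luminance) / (
        math.log10(1.0 + max_luminance) * math.log(base_term))

fixed_base_2 = log_compress(50.0, 1000.0, base=2.0)
fixed_base_10 = log_compress(50.0, 1000.0, base=10.0)
adaptive = adaptive_log_compress(50.0, 1000.0)
```

In the fixed-base form the base cancels between numerator and denominator, which is why any base gives the same result; only the varying-base form changes the shape of the curve.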
Alternatively, tone reproduction may be based on histogram equalization. Traditional histogram equalization aims to give each luminance value equal probability of occurrence in the output image. Greg Ward refines this method in a manner that preserves contrast (Ward Larson, Rushmeier, & Piatko, 1997).
或者,色调再现可以基于直方图均衡。传统的直方图均衡旨在使每个亮度值在输出图像中出现的可能性相等。Greg Ward 以保留对比度的方式改进了这种方法(Ward Larson、Rushmeier 和 Piatko,1997 年)。
First, a histogram is computed from the luminances in the high dynamic range image. From this histogram, a cumulative histogram is computed such that each bin contains the number of pixels that have a luminance value less than or equal to the luminance value that the bin represents. The cumulative histogram is a monotonically increasing function. Plotting the values in each bin against the luminance values represented by each bin therefore yields a function which may be viewed as a luminance mapping function. Scaling this function, such that the vertical axis spans the range of the display device, yields a tone reproduction operator. This technique is called histogram equalization.
首先,根据高动态范围图像中的亮度计算直方图。根据该直方图,计算累积直方图,使得每个箱包含亮度值小于或等于该箱所代表的亮度值的像素数。累积直方图是单调递增函数。因此,将每个箱中的值与每个箱所代表的亮度值作图可得到一个可视为亮度映射函数的函数。缩放该函数,使得垂直轴跨越显示设备的范围,可得到色调再现算子。该技术称为直方图均衡化。
Ward further refined this method by ensuring that the gradient of this function never exceeds 1. This means that if the difference between neighboring values in the cumulative histogram is too large, this difference is clamped to 1. This avoids the problem that small changes in luminance in the input may yield large differences in the output image. In other words, by limiting the gradient of the cumulative histogram to 1, contrast is never exaggerated. The resulting algorithm is called histogram adjustment (see Figure 20.29).
Ward 进一步完善了该方法,确保该函数的梯度永远不会超过 1。这意味着,如果累积直方图中相邻值之间的差异太大,则该差异将被限制为 1。这避免了输入亮度的微小变化可能导致输出图像产生较大差异的问题。换句话说,通过将累积直方图的梯度限制为 1,对比度永远不会被夸大。由此产生的算法称为直方图调整(见图20.29 )。
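The construction of a tone curve from a cumulative histogram can be sketched as below. This is a simplified illustration under stated assumptions: the histogram is taken over log luminance, the bin count is arbitrary, and the `ceiling` parameter is a hypothetical one-pass stand-in for the gradient limit of histogram adjustment, not Ward's actual procedure.

```python
import math

def tone_curve(luminances, bins=64, ceiling=None):
    """Return (log_min, bin_width, curve) built from a cumulative histogram.

    curve[i] is the display value in [0, 1] assigned to bin i.
    `ceiling` (hypothetical) caps the pixel count of any single bin, which
    limits how steeply the cumulative curve can rise across that bin.
    """
    logs = [math.log(v) for v in luminances]      # assumes positive input
    log_min, log_max = min(logs), max(logs)
    width = (log_max - log_min) / bins or 1.0     # guard against a flat image
    counts = [0] * bins
    for v in logs:
        counts[min(int((v - log_min) / width), bins - 1)] += 1
    if ceiling is not None:
        counts = [min(c, ceiling) for c in counts]
    total = sum(counts)
    curve, running = [], 0
    for c in counts:
        running += c                              # cumulative histogram
        curve.append(running / total)             # scaled to display range
    return log_min, width, curve

def display_value(luminance, log_min, width, curve):
    """Look up the display value for one luminance."""
    index = int((math.log(luminance) - log_min) / width)
    return curve[min(max(index, 0), len(curve) - 1)]

# Many dark pixels and a single very bright one.
lums = [0.01 * (i + 1) for i in range(100)] + [1000.0]
log_min, width, curve = tone_curve(lums, bins=32)
_, _, capped = tone_curve(lums, bins=32, ceiling=5)
```

Without a ceiling this is plain histogram equalization; with one, heavily populated bins can no longer claim a disproportionate share of the display range.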
Figure 20.29. A linearly scaled image (left) and a histogram adjusted image (right). Image created with the kind permission of the Albin Polasek museum, Winter Park, Florida.
图 20.29。线性缩放图像(左)和直方图调整图像(右)。图片由佛罗里达州温特帕克 Albin Polasek 博物馆友情提供。
The tone reproduction operators discussed so far nearly all assume that the image represents a scene under photopic viewing conditions, i.e., as seen at normal light levels. For scotopic scenes, i.e., very dark scenes, the human visual system exhibits distinctly different behavior. In particular, perceived contrast is lower, visual acuity (i.e., the smallest detail that we can distinguish) is lower, and everything has a slightly blue appearance.
到目前为止讨论的色调再现运算符几乎都假设图像代表明视条件下的场景,即在正常光照水平下看到的场景。对于暗视场景,即非常暗的场景,人类视觉系统表现出明显不同的行为。特别是,感知对比度较低,视觉敏锐度(即我们能分辨的最小细节)较低,并且所有事物都略带蓝色。
To allow such images to be viewed correctly on monitors placed in photopic lighting conditions, we may preprocess the image such that it appears as if we were adapted to a very dark viewing environment. Such preprocessing frequently takes the form of a reduction in brightness and contrast, desaturation of the image, blue shift, and a reduction in visual acuity (Thompson, Shirley, & Ferwerda, 2002).
为了使此类图像能够在明视照明条件下的显示器上正确显示,我们可能会对图像进行预处理,使其看起来就像我们适应了非常暗的观看环境一样。此类预处理通常表现为降低亮度和对比度、降低图像饱和度、蓝移和降低视觉敏锐度(Thompson、Shirley 和 Ferwerda,2002 年)。
A typical approach starts by converting the image from RGB to XYZ. Then, scotopic luminance V may be computed for each pixel:
典型的方法首先将图像从 RGB 转换为 XYZ。然后,可以计算每个像素的暗视亮度V :
This single channel image may then be scaled and multiplied by an empirically chosen bluish gray. An example is shown in Figure 20.30. If some pixels are in the photopic range, then the night image may be created by linearly blending the bluish-gray image with the input image. The fraction to use for each pixel depends on V.
然后可以缩放此单通道图像并将其乘以经验选择的蓝灰色。图 20.30显示了一个例子。如果某些像素处于明视范围内,则可以通过将蓝灰色图像与输入图像线性混合来创建夜间图像。每个像素使用的分数取决于V 。
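The night-image steps can be sketched per pixel as follows. The scotopic luminance is assumed to take the form given by Thompson et al. (2002), V = Y [1.33 (1 + (Y + Z) / X) - 1.68]; the tint and the two blending thresholds are purely illustrative values, not taken from the text.

```python
def scotopic_luminance(x, y, z):
    """Approximate scotopic luminance V from CIE XYZ (x must be non-zero)."""
    return y * (1.33 * (1.0 + (y + z) / x) - 1.68)

def blend_night(rgb, v, tint=(1.05, 0.97, 1.27), v_dark=0.01, v_bright=1.0):
    """Blend a bluish-gray night pixel with the input pixel.

    tint, v_dark, and v_bright are hypothetical parameters: below v_dark the
    pixel is fully scotopic, above v_bright it is left unchanged, and in
    between the two are mixed linearly according to V.
    """
    s = min(max((v - v_dark) / (v_bright - v_dark), 0.0), 1.0)
    night = tuple(v * t for t in tint)            # scaled bluish gray
    return tuple((1.0 - s) * n + s * c for n, c in zip(night, rgb))

v_white = scotopic_luminance(1.0, 1.0, 1.0)
bright_pixel = blend_night((0.2, 0.3, 0.4), v=5.0)
dark_pixel = blend_night((0.2, 0.3, 0.4), v=0.005)
```

A pixel with large V passes through untouched, while a pixel with very small V is replaced by the tinted gray alone.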
Figure 20.30. Simulated night scene using the image shown in Figure 20.12.
图 20.30.使用图 20.12所示的图像模拟夜景。
Loss of visual acuity may be modeled by low-pass filtering the night image, although this would give an incorrect sense of blurriness. A better approach is to apply a bilateral filter to retain sharp edges while blurring smaller details (Tomasi & Manduchi, 1998).
视力下降可以通过对夜间图像进行低通滤波来模拟,尽管这会给人一种错误的模糊感。更好的方法是应用双边滤波器来保留清晰的边缘,同时模糊较小的细节(Tomasi & Manduchi,1998)。
Finally, the color transfer technique outlined in Section 20.3 may also be used to transform a day-lit image into a night scene. The effectiveness of this approach depends on the availability of a suitable night image from which to transfer colors. As an example, the image in Figure 20.12 is transformed into a night image in Figure 20.31.
最后,第 20.3 节中概述的色彩转换技术也可用于将日光图像转换为夜景。此方法的有效性取决于是否有合适的夜景图像可供转换颜色。例如,图 20.12中的图像转换为图 20.31中的夜景图像。
Figure 20.31. The image on the left is used to transform the image of Figure 20.12 into a night scene, shown here on the right.
图 20.31.左侧的图像用于将图 20.12的图像转换为夜景,如右侧所示。
Since global illumination algorithms naturally produce high dynamic range images, direct display of the resulting images is not possible. Rather than resort to linear scaling or clamping, a tone reproduction operator should be used. Any tone reproduction operator is better than using no tone reproduction. Depending on the requirements of the application, one of several operators may be suitable.
由于全局照明算法自然会产生高动态范围图像,因此无法直接显示生成的图像。应使用色调再现算子,而不是采用线性缩放或限制。任何色调再现算子都比不使用色调再现要好。根据应用的要求,几种算子中的一种可能合适。
For instance, real-time rendering applications should probably resort to simple sigmoidal compression, since such operators are fast enough to run in real time and their visual quality is often good enough. The histogram adjustment technique (Ward Larson et al., 1997) may also be fast enough for real-time operation.
例如,实时渲染应用程序可能应该采用简单的 S 形压缩,因为这些压缩速度足够快,也可以实时运行。此外,它们的视觉质量通常足够好。直方图调整技术(Ward Larson 等,1997)也可能足够快,可以进行实时操作。
For scenes containing a very high dynamic range, better compression may be achieved with a local operator. However, the computational cost is frequently substantially higher, leaving these operators suitable only for noninteractive applications. Among the fastest of the local operators is the bilateral filter due to the optimizations afforded by this technique (Durand & Dorsey, 2002).
对于包含非常高动态范围的场景,使用局部算子可以实现更好的压缩。但是,计算成本通常要高得多,因此这些算子仅适用于非交互式应用程序。由于这种技术提供的优化,最快的局部算子是双边滤波器(Durand & Dorsey,2002)。
This filter is interesting as a tone reproduction operator by itself, or it may be used to compute a local adaptation level for use in a sigmoidal compression function. In either case, the filter respects sharp contrast changes and smoothes over smaller contrasts. This is an important feature that helps minimize halo artifacts, which are a common problem with local operators.
此滤波器本身作为色调再现运算符非常有趣,或者可用于计算局部自适应级别,以用于 S 形压缩函数。无论哪种情况,滤波器都会考虑强烈的对比度变化并平滑较小的对比度。这是一个重要的功能,有助于最大限度地减少光晕伪影,这是局部运算符的常见问题。
An alternative approach to minimize halo artifacts is the scale selection mechanism used in the photographic tone reproduction operator (Reinhard et al., 2002), although this technique is slower to compute.
减少光晕伪影的另一种方法是使用摄影色调再现算子 (Reinhard et al., 2002) 中的比例选择机制,尽管这种技术的计算速度较慢。
In summary, while a large number of tone reproduction operators is currently available, only a small number of fundamentally different approaches exist. Fourier-domain and gradient-domain operators are both rooted in knowledge of image formation. Spatial-domain operators are either spatially variant (local) or global in nature. These operators are usually based on insights gained from studying the human visual system (and the visual system of many other species).
总之,虽然目前有大量的色调再现算子可用,但只有少数根本不同的方法存在。傅里叶域算子和梯度域算子都植根于图像形成的知识。空间域算子要么是空间变量(局部),要么是全局变量。这些算子通常基于研究人类视觉系统(以及许多其他物种的视觉系统)获得的见解。
Brian Wyvill
Implicit modeling (also known as implicit surfaces) in computer graphics covers many different methods for defining models. These include skeletal implicit modeling, offset surfaces, level sets, variational surfaces, and algebraic surfaces. In this chapter, we briefly touch on these methods and describe how to build skeletal implicit models in more detail. Curves can be defined by implicit equations of the form
计算机图形学中的隐式建模(也称为隐式曲面)涵盖了许多定义模型的不同方法。这些方法包括骨架隐式建模、偏移曲面、水平集、变分曲面和代数曲面。在本章中,我们简要介绍这些方法,并更详细地描述如何构建骨架隐式模型。曲线可以通过以下形式的隐式方程来定义
If we consider a closed curve, such as a circle, with radius r, then the implicit equation can be written as
如果我们考虑一个闭合曲线,比如一个圆,半径为r ,那么隐式方程可以写成
The value of f(x, y) can be positive (outside the circle), negative (inside the circle), or zero for points precisely on the circle. The equivalent in three dimensions is a closed surface around a set of points that occupy a given volume or region of space. The volume forms a scalar field; that is, we can compute a value for every point, and, as the circle shows, the negative values are bounded by the implicit curve or surface. The surface can be visualized as a contour in the field, connecting points with a particular value such as zero (see Equation (21.1)). Computing such a surface implies searching through space to find the points that satisfy the implicit equation; this method is unlikely to lead to an efficient algorithm for circle drawing (and is even less likely to do so in three dimensions). This was perhaps the reason that algorithmic methods for modeling with parametric curves and surfaces were investigated before implicit methods; however, there are some good reasons to develop algorithms to visualize implicit surfaces. In this chapter we explore the implications of deriving the data from a modeling process rather than from a scanner.
f ( x, y ) 的值可以是正数(在圆外)、负数(在圆内)或零(对于圆上的点)。三维空间中的等价物是围绕占据给定体积或空间区域的一组点的封闭曲面。体积形成标量场,即我们可以为每个点计算一个值,并且如圆所示,负值由隐式曲线或曲面界定。曲面可以可视化为场中的轮廓,连接具有特定值(例如零)的点(参见公式 (21.1))。计算这样的曲面意味着在空间中搜索以找到满足隐式方程的点;这种方法不太可能导致有效的圆绘制算法(在三维空间中更不可能)。这也许是使用参数曲线和曲面建模的算法方法在隐式方法之前被研究的原因;但是,开发可视化隐式曲面的算法有一些很好的理由。在本章中,我们将探讨从建模过程而不是从扫描仪中获取数据的含义。
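The sign convention for the circle can be made concrete with a small sketch. It assumes the usual implicit form f(x, y) = x² + y² − r², which is consistent with the signs described above; the function names and the tolerance are illustrative.

```python
def circle_field(x, y, r):
    """Implicit function of a circle of radius r centered at the origin."""
    return x * x + y * y - r * r

def classify(x, y, r, tol=1e-9):
    """Report whether a point is inside, on, or outside the circle."""
    value = circle_field(x, y, r)
    if abs(value) <= tol:
        return "on"
    return "outside" if value > 0.0 else "inside"

labels = [classify(0.0, 0.0, 1.0), classify(1.0, 0.0, 1.0), classify(2.0, 0.0, 1.0)]
```

The inside/outside test is a single function evaluation, whereas finding the points that lie on the curve requires a search over space; this is the trade-off described in the paragraph above.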
Despite the computational overhead of finding the implicit surface, designing with implicit modeling techniques offers some advantages over other modeling methods. Many geometric operations are simplified using implicit methods including:
尽管寻找隐式曲面的计算开销很大,但使用隐式建模技术进行设计比其他建模方法具有一些优势。使用隐式方法可以简化许多几何操作,包括:
the definition of blends;
混合物的定义;
the standard set operations (union, intersection, difference, etc.) of constructive solid geometry (CSG);
构造立体几何(CSG)的标准集合运算(并集、交集、差集等);
functional composition with other implicit functions (e.g., R-functions, Barthe blends, Ricci blends, and warping);
与其他隐函数(例如 R 函数、Barthe 混合、Ricci 混合和扭曲)的函数组合;
inside/outside tests (e.g., for collision detection).
内部/外部测试(例如,碰撞检测)。
Visualizing the surfaces can be done either by direct ray tracing using an algorithm as described in (Kalra & Barr, 1989; Mitchell, 1990; Hart & Baker, 1996; deGroot & Wyvill, 2005) or by first converting to polygons (Wyvill, McPheeters, & Wyvill, 1986).
可视化表面可以通过直接光线追踪使用如 (Kalra & Barr, 1989; Mitchell, 1990; Hart & Baker, 1996; deGroot & Wyvill, 2005) 中描述的算法来完成,或者可以通过先转换为多边形 (Wyvill, McPheeters, & Wyvill, 1986) 来完成。
One of the first methods was proposed by Ricci as far back as 1973 (Ricci, 1973), who also introduced CSG in the same paper. Jim Blinn’s algorithm for finding contours in electron density fields, known as Blobby molecules (J. Blinn, 1982), Nishimura’s Metaballs (Nishimura et al., 1985) and Wyvills’ Soft Objects (Wyvill et al., 1986) were all early examples of implicit modeling methods. Jim Blinn’s Blobby Man (see Figure 21.1) was the first rendering of a non-algebraic implicit model.
最早的方法之一是由 Ricci 于 1973 年提出的(Ricci, 1973),他还在同一篇论文中介绍了 CSG。Jim Blinn 的用于在电子密度场中寻找轮廓的算法(称为Blobby 分子)(J. Blinn, 1982)、Nishimura 的Metaballs (Nishimura 等人,1985)和 Wyvills 的Soft Objects (Wyvill 等人,1986)都是隐式建模方法的早期示例。Jim Blinn 的Blobby Man (见图21.1 )是第一个非代数隐式模型的渲染。
Figure 21.1. Blinn’s Blobby Man 1980. Image courtesy Jim Blinn.
图 21.1。 Blinn 的 Blobby Man 1980。图片由 Jim Blinn 提供。
In the context of modeling, an implicit function is defined as a function f applied to a point p ∈ E³ yielding a scalar value in ℝ.
在建模中,隐函数被定义为将函数f应用于某个点 p ∈ E³ 产生标量值 ∈ ℝ 。
The implicit function fi(x, y, z) may be split into a distance function di(x, y, z) and a fall-off filter function¹ gi(r), where r stands for the distance from the skeleton and the subscript i refers to the ith skeletal element.
隐式函数 f( x, y, z ) 可以分解为距离函数 d( x, y, z ) 和衰减滤波函数1 g( r ),其中r代表与骨架的距离,下标表示第 i个骨架元素。
1 These functions have been given many names by researchers in the past, e.g., filter, potential, radial-basis, kernel, but we use fall-off filter as a simple term to describe their appearance.
1过去的研究人员曾给这些函数起过很多名字,例如,滤波器、势能、径向基、核,但我们使用衰减滤波器作为简单的术语来描述它们的外观。
We will use the following notation:
我们将使用以下符号:
A simple example is a point primitive, and we take the analogy of a star radiating heat into space. The field value (temperature in this example) may be measured at any point p and can be found by taking the distance from p to the center of the star and supplying the value to a fall-off filter function similar to one of those given in Figure 21.2. In these sample functions, the field is given a value of 1 at the center of the star; the value falls off with distance. The surface of a model may be derived from the implicit function f (x, y, z) as the points of space whose values are equal to some desired iso-value (iso); in the star example, a spherical shell for values of iso ∈ (0, 1).
一个简单的例子是点基元,我们以恒星向太空辐射热量来类比。可以在任意点p测量场值(本例中为温度),可以通过获取从p到恒星中心的距离并将该值提供给类似于图 21.2中给出的衰减滤波函数之一来找到。在这些示例函数中,场在恒星中心的值是 1;该值随着距离的增加而衰减。模型的表面可以从隐函数f ( x, y, z ) 中推导出来,即空间中的点,其值等于某个所需的iso 值(iso);在恒星的例子中,当 iso ∈ (0, 1) 时为球壳。
Figure 21.2. Fall-off filter functions (0 ≤ r ≤ 1). (a) Blinn’s Gaussian or “Blobby” function; (b) Nishimura’s “Metaball” function; (c) Wyvill et al.’s “soft objects” function; (d) the Wyvill function.
图 21.2。衰减滤波函数(0 ≤ r ≤ 1)。(a)Blinn 的高斯或“Blobby”函数;(b)Nishimura 的“Metaball”函数;(c)Wyvill 等人的“软物体”函数;(d)Wyvill 函数。
In general, filter functions (gi) are chosen so that the field values are maximized on the skeleton and fall off to zero at some chosen distance from the skeleton. In the simple case where the resulting surfaces are blended together, the global field f(x, y, z) of an object, the implicit function, may be defined as
一般来说,选择过滤函数 (g) 时,场值在骨架上最大化,并在距骨架的某个选定距离处降至零。在将生成的表面混合在一起的简单情况下,对象的全局场f ( x, y, z )(隐函数)可以定义为
where n skeletal elements contribute to the resulting field value. An example is shown in Figure 21.3 in which the field at any point (x, y, z) is calculated as in Equation (21.3).
其中n 个骨架元素对最终的场值有贡献。图 21.3给出了一个示例,其中任意点 ( x, y, z ) 处的场按照公式 (21.3) 进行计算。
Figure 21.3. Each column shows two point primitives approaching each other. From left to right: the fall-off filter functions used are Blobby, Metaball, soft objects, and Wyvill. Image courtesy Erwin DeGroot.
图 21.3。每列显示两个点基元相互接近。从左到右:使用的衰减过滤函数是 Blobby、Metaball、软对象和 Wyvill。图片由 Erwin DeGroot 提供。
In this case, two point primitives are placed in close proximity. As the two points are brought together, the surfaces bulge and then blend together. The term filter function is used because the function causes the primitives to be blurred together somewhat akin to a filter function for images. The summation blend is the most compact and efficient blending operation that can be applied to implicit surfaces (see Equation (21.3)).
在这种情况下,两个点图元被放置在非常接近的位置。当两个点被放在一起时,表面会凸起然后混合在一起。使用术语过滤函数是因为该函数导致图元模糊在一起,有点类似于图像的过滤函数。求和混合是可以应用于隐式表面的最紧凑和最有效的混合操作(参见公式 (21.3))。
One advantage of using filter functions with finite support is that primitives that are far from p will have zero contribution and thus need not be considered (Wyvill et al., 1986).
使用具有有限支持的过滤函数的一个优点是,远离p的基元将具有零贡献,因此无需考虑(Wyvill 等,1986)。
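The summation blend of point primitives can be sketched as follows. The fall-off used here is the form commonly given for the Wyvill function, g(r) = (1 − r²)³ on 0 ≤ r ≤ 1 and zero beyond; treating every primitive as having the same radius of influence is a simplifying assumption, and the function names are illustrative.

```python
import math

def wyvill(r):
    """Fall-off filter with finite support: 1 at r = 0, 0 for r >= 1."""
    if r >= 1.0:
        return 0.0
    t = 1.0 - r * r
    return t * t * t

def field(p, centers, radius=1.0):
    """Sum the contributions of all point primitives at position p."""
    total = 0.0
    for c in centers:
        d = math.dist(p, c)           # distance to the point skeleton
        total += wyvill(d / radius)   # zero when p is outside the support
    return total

centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
at_center = field((0.0, 0.0, 0.0), centers[:1])
far_away = field((2.0, 0.0, 0.0), centers[:1])
midpoint = field((0.5, 0.0, 0.0), centers)
```

At the midpoint each primitive contributes 0.421875, so the blended field is 0.84375: above an iso-value of 0.5 even though neither primitive alone reaches it there. This is the bulge-and-blend behavior shown in Figure 21.3.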
The most basic form of continuity is C0 continuity, which ensures that there are no “jumps” in a function. Higher-order continuity is defined in terms of derivatives of functions (see Chapter 15).
连续性的最基本形式是C 0连续性,它确保函数中没有“跳跃”。高阶连续性是根据函数的导数来定义的(参见第 15 章)。
In the case of a 3D scalar field f, the first derivative is a vector function known as the gradient, written ∇f and defined as
对于三维标量场f ,一阶导数是一个矢量函数,称为梯度,写为 ▽ f ,定义为
If ∇f is defined at all points, and the three one-dimensional partial derivatives are each C0, then f is C1. Informally, C1 surface continuity means that the surface normal varies smoothly over the surface. The surface normal is the unit vector perpendicular to the surface. If no unique surface normal can be defined, as on the edge of a cube, for example, then the surface is not C1. For points on an implicit surface, the surface normal can be computed by normalizing the gradient vector ∇f. In the example of the circle, points inside have a negative value and those on the outside have a positive one. For many types of implicit surfaces, the sense of inside and outside is inverted, and since the normal vector must always point outward, it can be opposite to the gradient direction.
如果 ▽ f在所有点上都有定义,且三个一维偏导数均为C 0 ,则f为C 1 。通俗地说, C 1曲面连续性意味着曲面法线在曲面上平滑变化。曲面法线是垂直于曲面的单位向量。例如,如果在立方体的边缘无法定义唯一的曲面法线,则该曲面不是C 1 。对于隐式曲面上的点,可以通过对梯度向量 ▽ f进行归一化来计算曲面法线。在圆的例子中,内部的点具有负值,而外部的点具有正值。对于许多类型的隐式曲面,内部和外部的意义是相反的,并且由于法线向量必须始终指向外,因此它可以与梯度方向相反。
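When an analytic gradient is not available, the normal can be estimated numerically. The sketch below approximates the gradient, the vector of partial derivatives (∂f/∂x, ∂f/∂y, ∂f/∂z), with central differences and flips it when the field is larger inside the surface. The step size `h` and the flag name `inside_is_positive` are assumptions made for illustration.

```python
import math

def gradient(f, p, h=1e-5):
    """Central-difference estimate of the gradient of f at point p."""
    x, y, z = p
    return (
        (f((x + h, y, z)) - f((x - h, y, z))) / (2.0 * h),
        (f((x, y + h, z)) - f((x, y - h, z))) / (2.0 * h),
        (f((x, y, z + h)) - f((x, y, z - h))) / (2.0 * h),
    )

def surface_normal(f, p, inside_is_positive=False):
    """Unit outward normal: the normalized gradient, flipped if needed."""
    g = gradient(f, p)
    length = math.sqrt(sum(c * c for c in g))
    if length == 0.0:
        raise ValueError("gradient vanishes; normal is undefined here")
    sign = -1.0 if inside_is_positive else 1.0
    return tuple(sign * c / length for c in g)

# Negative inside, as in the circle example.
sphere = lambda p: p[0] ** 2 + p[1] ** 2 + p[2] ** 2 - 1.0
# Positive inside, as with a field that falls off away from a skeleton.
blob = lambda p: 1.0 - (p[0] ** 2 + p[1] ** 2 + p[2] ** 2)

n_sphere = surface_normal(sphere, (1.0, 0.0, 0.0))
n_blob = surface_normal(blob, (1.0, 0.0, 0.0), inside_is_positive=True)
```

Both calls give an outward normal close to (1, 0, 0), even though the two gradients point in opposite directions.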
Skeletal implicit primitives are created by applying a fall-off filter function to an unsigned distance field as in Equation (21.2). Although the distance field is never C1 at the skeleton, these discontinuities can be removed by using a suitable fall-off function (Akleman & Chen, 1999). If an operator, g, combines implicit functions, f1 and f2, where all points are C1, then g(f1,f2) is not necessarily C1. For example, it is possible to make a sharp CSG junction using the min and max operators. The combination is not C1 continuous because the min and max operators don’t have that property (see Section 21.5).
骨架隐式基元是通过将衰减过滤函数应用于无符号距离场来创建的,如公式 (21.2) 所示。尽管骨架处的距离场永远不会是C 1 ,但可以使用合适的衰减函数消除这些不连续性(Akleman & Chen,1999)。如果运算符g组合了隐式函数f 1和f 2 ,其中所有点都是C 1 ,则g ( f 1 ,f 2 ) 不一定是C 1 。例如,可以使用 min 和 max 运算符制作尖锐的 CSG 连接。该组合不是C 1连续的,因为 min 和 max 运算符不具备该属性(参见第 21.5 节)。
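The sharp junction produced by min and max can be illustrated with two point primitives. The sketch assumes a field that is larger inside the object, with the surface at an iso-value of 0.5, and uses the fall-off g(r) = (1 − r²)³ for illustration; the helper names are hypothetical.

```python
import math

def point_field(center, radius):
    """Return the field function of one point primitive (larger inside)."""
    def f(p):
        r = math.dist(p, center) / radius
        return 0.0 if r >= 1.0 else (1.0 - r * r) ** 3
    return f

def union(f1, f2):
    """Sharp union: a point is inside if it is inside either object."""
    return lambda p: max(f1(p), f2(p))

def intersection(f1, f2):
    """Sharp intersection: a point is inside only if inside both objects."""
    return lambda p: min(f1(p), f2(p))

ISO = 0.5
a = point_field((0.0, 0.0, 0.0), 1.0)
b = point_field((1.0, 0.0, 0.0), 1.0)

in_a_only = (-0.3, 0.0, 0.0)
between = (0.5, 0.0, 0.0)
union_value = union(a, b)(in_a_only)
intersection_value = intersection(a, b)(in_a_only)
union_between = union(a, b)(between)
```

At the point between the two primitives the union field is 0.421875, below the iso-value, so no material is added there; max simply switches from one field to the other, which leaves a crease where the two fields are equal.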
The analysis of operators is complicated by the fact that it is sometimes desirable to create a C1 discontinuity. This case occurs whenever a crease in the surface is desired. For example, a cube is not C1 because tangent discontinuities occur at each edge. To create creases using C1 primitives, the operator must introduce C1 discontinuities, and hence cannot be C1 itself.
由于有时需要创建C 1不连续性,因此运算符的分析变得复杂。每当需要在表面上产生折痕时就会发生这种情况。例如,立方体不是C 1 ,因为每个边缘都会发生切线不连续性。要使用C 1基元创建折痕,运算符必须引入C 1不连续性,因此不能是C 1本身。
The distance field is defined with respect to some geometric object T:
距离场是相对于某个几何对象T定义的:
Visually, F(T, p) is the shortest distance from p to T. Hence, when p lies on T, F(T, p) = 0 and the surface created by the implicit function is the object T. Outside of T, a nonzero distance is returned. The object T can be any geometric entity embedded in 3D: a point, curve, surface, or solid. Procedural modeling with distance fields started with Ricci (Ricci, 1973); R-functions (Rvachev, 1963) were first applied to shape modeling more than 20 years later (see (Shapiro, 1994) and (A. Pasko, Adzhiev, Sourin, & Savchenko, 1995)).
从视觉上看, F ( T , p ) 是从p到T的最短距离。因此,当p位于T上时, F ( T , p ) = 0,且隐式函数创建的表面为对象T。在T之外,将返回非零距离。函数T可以是任何嵌入 3D 中的几何实体 — 点、曲线、曲面或立体。使用距离场进行程序建模始于 Ricci (Ricci, 1973); R 函数(Rvachev, 1963) 在 20 多年后首次应用于形状建模(参见 (Shapiro, 1994) 和 (A. Pasko, Adzhiev, Sourin, & Savchenko, 1995))。
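For simple choices of T the distance field has a closed form. The sketch below evaluates the shortest distance from p to a point and to a line segment; the function names are illustrative, and the segment case clamps the projection parameter so that points beyond the ends measure their distance to the nearest endpoint.

```python
import math

def distance_to_point(p, q):
    """Distance field of a single point q."""
    return math.dist(p, q)

def distance_to_segment(p, a, b):
    """Distance field of the line segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Parameter of the projection of p onto the line through a and b.
    t = 0.0 if denom == 0.0 else sum(u * v for u, v in zip(ap, ab)) / denom
    t = min(max(t, 0.0), 1.0)                 # stay within the segment
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)

a, b = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
beside = distance_to_segment((0.5, 2.0, 0.0), a, b)
beyond_end = distance_to_segment((4.0, 0.0, 0.0), a, b)
on_segment = distance_to_segment((0.25, 0.0, 0.0), a, b)
```

Offsetting this segment distance by a constant gives a cylinder with hemispherical end caps, since every point at a fixed distance from the segment lies on that shape.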
An R-function or Rvachev function is a function whose sign can change if and only if the sign of one of its arguments changes; that is, its sign is determined solely by its arguments. R-functions provide a robust theoretical framework for boolean composition of real functions, permitting the construction of Cn CSG operators (Shapiro, 1988). These CSG operators can be used to create blending operators simply by adding a fixed offset to the result (A. Pasko et al., 1995). Although these blending functions are no longer technically R-functions, they have most of the desirable properties and can be mixed freely with R-functions to create complex hierarchical models (Shapiro, 1988). These R-function-based blending and CSG operators are referred to as R-operators (see Section 21.4). The Hyperfun system (Adzhiev et al., 1999) is based on F-reps (function representation), another name for an implicit surface. The system uses a procedural C-like language to describe many types of implicit surfaces.
R 函数或 Rvachev 函数是一种函数,当且仅当其参数之一的符号发生变化时,其符号才会发生变化;也就是说,其符号仅由其参数决定。R 函数为实函数的布尔组合提供了一个强大的理论框架,允许构造C n CSG 运算符(Shapiro,1988 年)。这些 CSG 运算符可用于创建混合运算符,只需在结果中添加一个固定偏移量即可(A. Pasko 等人,1995 年)。虽然这些混合函数在技术上不再是 R 函数,但它们具有大多数理想的属性,并且可以与 R 函数自由混合以创建复杂的分层模型(Shapiro,1988 年)。这些基于 R 函数的混合和 CSG 运算符称为R 运算符(参见第 21.4 节)。Hyperfun 系统(Adzhiev 等人,1999 年)基于F-reps (函数表示),这是隐式曲面的另一个名称。该系统使用类似于 C 的过程语言来描述多种类型的隐式曲面。
It is useful to represent an implicit field discretely via a regular grid (Barthe, Mora, Dodgson, & Sabin, 2002) or an adaptive grid (Frisken, Perry, Rockwood, & Jones, 2000). This is exactly what the polygonization algorithm does in the case of level sets; moreover, the grid can be used for various other purposes besides building polygons. Discrete representations of f are commonly obtained by sampling a continuous function at regular intervals. For example, the sampled function may be defined by other volume model representations (V. V. Savchenko, Pasko, Sourin, & Kunii, 1998). The data may also be a physical object sampled using three-dimensional imaging techniques. Discrete volume data has most often been used in conjunction with the level sets method (Osher & Sethian, 1988), which defines a means for dynamically modifying the data structure using curvature-dependent speed functions. Interactive modeling environments based on level sets have been defined (Museth, Breen, Whitaker, & Barr, 2002), although level sets are only one method employing a discrete representation of the implicit field. Methods for interactively defining discrete representations using standard implicit surfaces techniques have also been explored (Baerentzen & Christensen, 2002).
通过规则网格 (Barthe、Mora、Dodgson 和 Sabin,2002) 或自适应网格 (Frisken、Perry、Rockwood 和 Jones,2000) 离散地表示隐式场很有用。这正是多边形化算法在水平集的情况下所做的;此外,除了构建多边形之外,网格还可用于各种其他目的。f 的离散表示通常是通过以规则间隔对连续函数进行采样而获得的。例如,采样函数可以由其他体积模型表示 (VV Savchenko、Pasko、Sourin 和 Kunii,1998) 定义。数据也可以是使用三维成像技术采样的物理对象。离散体积数据最常与水平集方法 (Osher 和 Sethian,1988) 结合使用,该方法定义了一种使用曲率相关速度函数动态修改数据结构的方法。虽然水平集只是采用隐式场离散表示的一种方法,但基于水平集的交互式建模环境已经得到定义(Museth、Breen、Whitaker 和 Barr,2002 年)。还探索了使用标准隐式表面技术交互式定义离散表示的方法(Baerentzen 和 Christensen,2002 年)。
A key advantage to employing a discrete data structure is its ability to act as a unifying approach for all of the various volume models defined by potential fields (discrete or not) (V. V. Savchenko et al., 1998). The conversion of any continuous function to a discrete representation introduces the problem of how to reconstruct a continuous function, needed for the combined purposes of additional modeling operations and visualization of the resulting potential field. A well-known solution to this problem is to apply a filter g using the convolution operator (see Chapter 9). The choice of a filter is guided by the desired properties of the reconstruction, and many filters have been explored (Marschner & Lobb, 1994). The salient point is that there is typically a tradeoff between the efficiency of the chosen filter and the smoothness of the resulting reconstruction; see also Section 21.9.
采用离散数据结构的一个关键优势是它能够作为所有由势场(离散或非离散)定义的各种体积模型的统一方法(VV Savchenko 等人,1998)。将任何连续函数转换为离散表示都会引入如何重建连续函数的问题,这对于其他建模操作和可视化所得势场的综合目的必不可少。该问题的一个众所周知的解决方案是使用卷积算子应用滤波器g (参见第 9 章)。滤波器的选择取决于重建的所需属性,并且已经探索了许多滤波器(Marschner & Lobb,1994)。突出的一点是,所选滤波器的效率和所得重建的平滑度之间通常存在权衡;另请参见第 21.9 节。
To be interactive, a discrete system must restrict the size of the grid relative to the available computing power. This, in turn, limits the ability of the modeler to include high-frequency details. Additionally, the smoothing triquadratic filter makes it impossible to include sharp edges, should they be desired. A partial solution to this problem is the use of adaptive grids, although with any discrete representation there will be limitations. A discrete grid is used in (Schmidt, Wyvill, & Galin, 2005) to act as a cache representing a BlobTree node. The grid in this work is used for fast prototyping and uses trilinear interpolation for position and the slower, more accurate triquadratic interpolation to calculate gradient values, because the eye is more discerning in observing gradient errors than position errors.
为了实现交互,离散系统必须限制网格相对于可用计算能力的大小。这反过来又限制了建模者包含高频细节的能力。此外,平滑三二次滤波器使得包含尖锐边缘(如果需要)成为不可能。此问题的部分解决方案是使用自适应网格,尽管任何离散表示都会有局限性。在(Schmidt、Wyvill 和 Galin,2005)中,离散网格用作表示BlobTree节点的缓存。本文中的网格用于快速原型设计,并使用三线性插值进行定位,使用速度较慢但更准确的三二次插值来计算梯度值,因为眼睛在观察梯度误差方面比观察位置误差更敏锐。
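Reconstructing a continuous field from grid samples with trilinear interpolation can be sketched as follows. The grid layout (a nested list indexed as `grid[i][j][k]`, with samples at integer coordinates) and the function name are assumptions for illustration; this is not the caching scheme of the cited work.

```python
import math

def trilinear(grid, x, y, z):
    """Interpolate a sampled field at (x, y, z), given in grid index units."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # Index of the cell's lower corner, clamped so i + 1 stays in range.
    i = min(max(int(math.floor(x)), 0), nx - 2)
    j = min(max(int(math.floor(y)), 0), ny - 2)
    k = min(max(int(math.floor(z)), 0), nz - 2)
    u, v, w = x - i, y - j, z - k

    def lerp(a, b, t):
        return a + (b - a) * t

    # Interpolate along x on the four cell edges, then along y, then z.
    c00 = lerp(grid[i][j][k], grid[i + 1][j][k], u)
    c10 = lerp(grid[i][j + 1][k], grid[i + 1][j + 1][k], u)
    c01 = lerp(grid[i][j][k + 1], grid[i + 1][j][k + 1], u)
    c11 = lerp(grid[i][j + 1][k + 1], grid[i + 1][j + 1][k + 1], u)
    c0 = lerp(c00, c10, v)
    c1 = lerp(c01, c11, v)
    return lerp(c0, c1, w)

# Sample the linear field f = x + 2y + 3z on a 3 x 3 x 3 grid.
grid = [[[i + 2.0 * j + 3.0 * k for k in range(3)]
         for j in range(3)] for i in range(3)]
value = trilinear(grid, 0.5, 1.25, 1.75)
corner = trilinear(grid, 2.0, 2.0, 2.0)
```

Trilinear interpolation reproduces a linear field exactly, but its gradient is discontinuous across cell faces, which is consistent with the use of a smoother filter for gradient values described above.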
It is often required to convert sampled data to an implicit representation. Variational implicit surfaces interpolate or approximate a set of points using a weighted sum of globally supported basis functions (V. Savchenko, Pasko, Okunev, & Kunii, 1995; Turk & O’Brien, 1999; Carr et al., 2001; Turk & O’Brien, 2002). These radially symmetric basis functions are applied at each sample point. The continuity of such a surface depends on the choice of basis function. The C2 thin-plate spline is most commonly used (Turk & O’Brien, 2002; Carr et al., 2001). Like Blinn’s exponential function (see Figure 21.2), this function is unbounded as is the resulting variational implicit surface.
通常需要将采样数据转换为隐式表示。变分隐式曲面使用全局支持的基函数的加权和来插值或近似一组点 (V. Savchenko、Pasko、Okunev 和 Kunii,1995 年;Turk 和 O'Brien,1999 年;Carr 等人,2001 年;Turk 和 O'Brien,2002 年)。这些径向对称的基函数应用于每个采样点。这种曲面的连续性取决于基函数的选择。最常用的是C 2薄板样条 (Turk 和 O'Brien,2002 年;Carr 等人,2001 年)。与 Blinn 的指数函数(见图21.2 )一样,此函数和得到的变分隐式曲面一样无界。
If the field is globally C2, creases cannot be defined;² however, anisotropic basis functions can be used to produce fields which change more rapidly and may appear to have creases (Dinh, Slabaugh, & Turk, 2001). At the appropriate scale, the surface is still smooth. The smooth field implies that self-intersections do not occur, and hence volumes are always well-defined. The thin-plate spline guarantees that global curvature is minimized (Duchon, 1977). Variational interpolation has many properties which are desirable for 3D modeling; however, controlling the resulting surfaces can be difficult.
如果场是全局C 2 ,则无法定义折痕; 2但是,可以使用各向异性基函数来生成变化更快且可能看起来有折痕的场(Dinh、Slabaugh 和 Turk,2001)。在适当的尺度下,表面仍然是光滑的。光滑场意味着不会发生自相交,因此体积总是定义明确的。薄板样条保证全局曲率最小化(Duchon,1977)。变分插值具有许多适合 3D 建模的属性;但是,控制生成的表面可能很困难。
2 Except see Section 15.2.
2除第 15.2 节外。
Variational implicit surfaces can also be based on compactly supported radial basis functions (CS-RBFs) to reduce the computational cost of variational interpolation techniques (Morse, Yoo, Rheingans, Chen, & Subramanian, 2001). Each CS-RBF only influences a local region, so computing f(p) requires only evaluation of basis functions within some small neighborhood of p. As with the globally supported counterpart, the resulting field is Ck, creases are not supported, and self-intersections cannot occur.³ The local support of each basis function results in a bounded global field. This also guarantees that additional iso-contours will be present, as noted by various researchers (Ohtake, Belyaev, & Pasko, 2003; Reuter, 2003).
变分隐式曲面还可以基于紧支撑径向基函数 (CS-RBF),以降低变分插值技术的计算成本 (Morse、Yoo、Rheingans、Chen 和 Subramanian,2001)。每个 CS-RBF 仅影响局部区域,因此计算f ( p ) 只需要评估p某个小邻域内的基函数。与全局支持的对应项一样,结果场为C k ,不支持折痕,并且不会发生自相交。3每个基函数的局部支持都会产生一个有界的全局场。这也保证了将存在额外的等高线,正如多位研究人员所指出的那样 (Ohtake、Belyaev 和 Pasko,2003;Reuter,2003)。
Convolution surfaces, introduced by Bloomenthal and Shoemake (Bloomenthal & Shoemake, 1991), are produced by convolving a geometric skeleton S with a kernel function h. Hence, the value at any position in space is defined by an integral over the skeleton:
卷积曲面由 Bloomenthal 和 Shoemake (Bloomenthal & Shoemake, 1991) 提出,它是通过将几何骨架S与核函数h进行卷积而生成的。因此,空间中任何位置的值都由骨架上的积分定义:
Any finitely supported function can be used as h; see (Sherstyuk, 1999) for a detailed analysis of different kernels.
任何有限支撑函数都可以用作h ;有关不同内核的详细分析,请参阅(Sherstyuk, 1999)。
Like skeletal primitives, convolution surfaces have bounded fields. Blinn’s “Blobby molecules” is the simplest form of a convolution surface (J. Blinn, 1982); in this case, the skeleton consists of points only. This idea was extended by Bloomenthal to include line, arc, triangle, and polygon skeletons (Bloomenthal & Shoemake, 1991). These represent 1D and 2D primitives; 3D primitives were later described by Bloomenthal (Bloomenthal, 1995).
与骨架基元一样,卷积曲面也有有界域。Blinn 的“Blobby 分子”是最简单的卷积曲面形式(J. Blinn,1982 年);在这种情况下,骨架仅由点组成。Bloomenthal 将这个想法扩展为包括线、弧、三角形和多边形骨架(Bloomenthal & Shoemake,1991 年)。这些代表 1 D和 2 D基元;Bloomenthal 后来描述了 3 D基元(Bloomenthal,1995 年)。
Combination of convolution surfaces is defined by composition of the underlying geometric skeletons and has the advantage of eliminating the bulges that tend to occur when composing multiple skeletal primitives with additive blending. The surface resulting from convolution of the combined skeleton does not have bulges, as in Figure 21.4, and the field is continuous even if the combined skeleton is nonconvex. Convolution surfaces are offset a fixed distance from convex portions of a skeleton, but produce a fillet along concave portions of a skeleton.
卷积曲面的组合由底层几何骨架的组合定义,其优点是可以消除使用加法混合组合多个骨架基元时容易出现的凸起。组合骨架的卷积产生的曲面没有凸起,如图 21.4所示,即使组合骨架是非凸的,场也是连续的。卷积曲面与骨架的凸起部分偏移固定距离,但沿骨架的凹陷部分产生圆角。
3 Note, k > 0 depending on the RBF (see Section 15.2).
3注意, k > 0 取决于 RBF(参见第 15.2 节)。
Figure 21.4. Two blended cylinders. Left: summation blend; right: convolution surface with barely discernible bulge (Bloomenthal, 1997). Image courtesy Erwin DeGroot.
图 21.4。两个混合圆柱体。左图:求和混合;右图:几乎看不清凸起的卷积曲面(Bloomenthal,1997 年)。图片由 Erwin DeGroot 提供。
An example of skeletal elements convolved to build a complex model is shown in Figure 21.5. The hand model contains fourteen primitives.
图 21.5显示了卷积骨骼元素构建复杂模型的示例。手部模型包含 14 个基元。
Figure 21.5. Skeletal elements convolved to build a hand model. Image courtesy Jules Bloomenthal.
图 21.5。卷积骨骼元素以构建手模型。图片由 Jules Bloomenthal 提供。
As we will see in the following sections, rendering implicit models requires finding the field value and gradient at a large number of points. We need the distance to supply to Equation (21.2), and the gradient is useful for root finding as well as for lighting calculations. Supplying the distance to the fall-off filter functions of Figure 21.2 is a matter of calculating the nearest distance to the skeletal primitive, which is simple for point primitives but a little trickier for more complex geometrical shapes. A line segment primitive (AB) can be defined as a cylinder around a line with hemispherical end caps (see Figure 21.6). Point P0 lies on the surface, where f(P0) = iso, whereas f(P1) = 0 since P1 lies outside the influence of the line primitive. The distance from some point Pi to the line is found by projecting Pi onto the line AB and calculating the perpendicular distance, e.g., |CP0|; the projected point C can be found from AC, since A, P0, and B are all known:
正如我们将在以下部分中看到的那样,渲染隐式模型需要找到大量点的场值和梯度。我们需要距离来提供给方程 (21.2),而梯度对于根查找以及照明计算都很有用。将距离提供给图 21.2的衰减过滤函数就是计算到骨架图元的最近距离,对于点图元来说很简单,但对于更复杂的几何形状来说有点棘手。线段图元 ( AB ) 可以定义为围绕线的圆柱体,端盖为半球形(参见图 21.6 )。点P 0位于f ( P 0 ) = iso 和f ( P 1 ) = 0 的表面上,因为它位于线图元的影响之外。从某个 P 到线的距离可以通过简单地投影到线AB上并计算垂直距离(例如|CP 0 |)来找到;这可以从AC中找到,因为A、P 0和B都是已知的:
Figure 21.6. Line primitive ab and example points p0, p1, p2 showing distance calculation.
图 21.6.线图元ab和示例点p 0 、 p 1 、 p 2显示距离计算。
In Figure 21.6, the field value at P2 is greater than zero, since P2 lies within the hemispherical end cap, a case that can be checked separately. Variations of this idea can define primitives with end caps of different radii, producing interesting cone shapes. An example is shown in Figure 21.7.
在图 21.6中, P 2的字段值为> 0,因为P 2位于半球形端盖中,因此可以单独检查。此想法的变体可以定义具有不同半径端盖的基元,从而产生有趣的圆锥形状。图 21.7显示了一个例子。
Figure 21.7. Cylinder primitive blended with a sphere. Image courtesy Erwin DeGroot.
图 21.7。圆柱体图元与球体混合。图片由 Erwin DeGroot 提供。
A great variety of geometrical skeletons have been described, and, in principle, it is simply a matter of defining the distance to the skeleton from some point p and also the gradient at p. For example, an offset surface of a triangle can be defined from the vertices of the triangle and a radius r. A simple way to implement this is to use line segment primitives to describe bounding cylinders of radius r connecting the vertices. For a point q within the triangle that does not fall within the bounding field of any of the line segment primitives, the distance returned is the perpendicular distance to the plane of the triangle. Other examples include an implicit disk, defined by a circle and a thickness parameter; a torus, defined by a circle and the radius of the cross section (or by inner and outer circle radii); a circular cone, defined by a disk and a height; and a cube with rounded corners (see Figure 21.8).
人们描述了各种各样的几何骨架,原则上,这只是定义从某个点p到骨架的距离以及p处的梯度的问题。例如,可以从三角形的顶点和半径r定义三角形的偏移表面。实现这一点的一种简单方法是使用线段基元来描述连接顶点(半径为r )的边界圆柱。从三角形内不属于某个线段基元边界场的点q到三角形平面的距离将作为垂直距离返回。其他示例包括由圆和厚度参数定义的隐式圆盘、也由圆和横截面半径(或内圆和外圆半径)定义的圆环、由圆盘和高度定义的圆锥、具有圆角的立方体等(见图21.8 )。
Figure 21.8. Implicit models from various skeletal primitives. Image courtesy Erwin DeGroot.
图 21.8。来自各种骨骼基元的隐式模型。图片由 Erwin DeGroot 提供。
Modeling methods, such as parametric surfaces, lend themselves to visualization, since it is easy to iterate over points on the surface that can be found directly from the defining equations; for example (x, y) = (cosθ, sinθ), θ∈ [0, 2π) produces a circle.
参数曲面等建模方法有利于可视化,因为它很容易迭代曲面上的点,而这些点可以直接从定义方程中找到;例如 ( x, y ) = (cos θ, sinθ) , θ ∈ [0, 2π) 可以得到一个圆。
There are two techniques that are commonly used to render implicit surfaces: ray tracing and surface tiling. In practice, a designer wants to visualize an implicit surface model quickly, sacrificing quality for speed for interaction purposes. Prototyping algorithms have been concerned with producing a polygon mesh that can be rendered in real time on modern workstations. Finding the polygonal mesh which best approximates the desired surface is referred to as polygonization or surface tiling. For animation or for a final visualization, where quality is preferred over speed, ray tracing implicit surfaces directly without first polygonizing produces excellent results.
有两种常用的技术来渲染隐式表面:光线追踪和表面平铺。实际上,设计师希望快速可视化隐式表面模型,为了交互目的而牺牲质量换取速度。原型算法一直致力于生成可在现代工作站上实时渲染的多边形网格。找到最接近所需表面的多边形网格称为多边形化或表面平铺。对于动画或最终可视化,质量比速度更重要,直接光线追踪隐式表面而无需先进行多边形化即可产生出色的效果。
Figure 21.9. A ray-traced dinosaur model showing the underlying skeletal primitives. Image courtesy Erwin DeGroot.
图 21.9。光线追踪的恐龙模型显示了底层的骨骼原件。图片由 Erwin DeGroot 提供。
As previously mentioned, finding an implicit surface requires searching through space to find the points that satisfy f(p) = 0. There are two main approaches to executing such a search: space partitioning, which divides space into manageable units such as cubes, and non-space-partitioning methods, e.g., marching triangles (Hartmann, 1998; Akkouche & Galin, 2001) and the shrinkwrap algorithm (van Overveld & Wyvill, 2004).
如前所述,寻找隐式曲面需要在空间中搜索以找到满足f ( p ) = 0 的点。执行此类搜索主要有两种方法:空间分区 - 将空间划分为可管理的单元(例如立方体)和非空间分区,例如行进三角形(Hartmann,1998;Akkouche & Galin,2001)和收缩包裹算法(van Overveld & Wyvill,2004)。
In this chapter, we describe the original space partitioning algorithm and leave it to the reader to explore the more advanced methods. This algorithm, together with postprocessing for mesh refinement (see Chapter 12) and caching, provides a method for interactive viewing of implicit models on modern workstations.
在本章中,我们描述了原始的空间分割算法,并让读者探索更高级的方法。该算法与网格细化的后处理(参见第 12 章)和缓存相结合,提供了一种在现代工作站上交互式查看隐式模型的方法。
The basic cubic space partitioning algorithm for tiling implicit surfaces was first published in (Wyvill et al., 1986); a similar algorithm oriented toward volume visualization, called marching cubes, was published in (Lorensen & Cline, 1987). Since then, there have been many refinements and extensions.
用于平铺隐式曲面的基本立方空间分割算法首次发表于 (Wyvill et al., 1986),而面向体积可视化的类似算法,称为行进立方体 (Lorensen & Cline, 1987)。从那时起,已经进行了许多改进和扩展。
A first approach to finding the implicit surface might be to subdivide space uniformly into a regular lattice of cubic cells and calculate a value for every vertex. Each cell is replaced with a set of polygons that best approximates the part of the surface contained within that cell. The problem with this method is that many of the cells will be completely outside or completely inside the volume; thus, many cells that contain no part of the surface are processed. For large grids of data this can be very time consuming and memory intensive.
寻找隐式表面的第一种方法可能是将空间均匀地细分为立方体单元的规则格子,并计算每个顶点的值。每个单元都被一组多边形替换,这些多边形最接近该单元内所含的表面部分。这种方法的问题是许多单元将完全在体积之外或完全在体积之内;因此,许多不包含表面部分的单元都会被处理。对于大型数据网格,这可能非常耗时且占用大量内存。
To avoid storing the whole grid, a hash table is used to store only the cubes that contain a piece of the surface, based on the data structures used in (Wyvill et al., 1986). Working software was published in Graphics Gems IV (Bloomenthal, 1990). The algorithm is based on numerical continuation; it starts with a seed cube that intersects part of the surface and builds neighboring cubes as necessary to follow the surface.
为了避免存储整个网格,使用哈希表仅存储包含表面一部分的立方体,这基于 (Wyvill et al., 1986) 中使用的数据结构。工作软件发表在Graphics Gems IV (Bloomenthal, 1990) 中。该算法基于数值连续性;它从与表面部分相交的种子立方体开始,并根据需要构建相邻立方体以跟随表面。
The algorithm has two parts. In the first part, cubic cells are found that contain the surface and in the second part, each cube is replaced by triangles. The first part of the algorithm is driven by a queue of cubes, each of which contains part of the surface; the second part of the algorithm is table-driven.
该算法分为两部分。第一部分是找到包含表面的立方体单元,第二部分是每个立方体被三角形取代。算法的第一部分由立方体队列驱动,每个立方体包含表面的一部分;算法的第二部分由表格驱动。
A brief overview of the algorithm is as follows:
该算法的快速概述如下:
divide space into cubic voxels;
将空间划分为立方体素;
search for surface, starting from a skeletal element;
从骨架元素开始搜索表面;
add voxel to queue, mark it visited;
将体素添加到队列,标记为已访问;
search neighbors;
搜索邻居;
when done, replace voxel with polygons.
完成后,用多边形替换体素。
First, space is subdivided into a cubic lattice, and the next task is to find a seed cube containing part of the surface. A cube vertex vi inside the surface will have a field value f(vi) >= iso, and a vertex outside the surface will have a field value f(vi) < iso; thus, an edge with one vertex of each type will intersect the surface. We call this an intersecting edge. The field value at the cube vertex nearest to the first primitive can be evaluated by summing the contributions of the primitives as per Equation (21.3), although other operators can also be used, as will be seen later. We will assume that f(v0) > iso, which indicates that v0 lies within the solid. The value of iso is chosen by the user; an example is iso = 0.5 when using the soft fall-off function, which has some symmetry properties that lead to nice blending (see Figure 21.3). The vertices along one axis are evaluated in turn until a vertex with f(vi) < iso is found. The cube containing the intersecting edge is the seed cube.
首先,将空间细分为立方格子,下一个任务是找到一个包含部分曲面的种子立方体。曲面内的立方体顶点 v 的域值 v>= iso,曲面外的顶点的域值 v< iso;因此,与曲面相交的边是每种类型顶点各一个的边。我们称之为相交边。距离第一个图元最近的立方体顶点的域值可以通过根据公式 (21.3) 对图元的贡献求和来计算,尽管也可以使用其他运算符,稍后会看到。我们假设f ( v0 ) > iso,它表示v0位于固体内。iso的值由用户选择;一个例子是使用软衰减函数时 iso = 0.5,它具有一些对称性质,可以实现很好的混合(参见图 21.3 )。依次评估沿一个轴的顶点,直到找到值 v < iso。包含相交边的立方体是种子立方体。
The neighbors of the seed cube are examined, and those that contain at least one intersecting edge are added to the queue ready for processing. To process a cube, we examine each face. If any of the bounding edges have oppositely signed vertices, the surface will pass through that face and the face neighbor must be processed. When this process has been completed for all the faces, the second phase of the algorithm is applied to the cube. If the surface is closed, eventually a cube will be revisited and no more unmarked neighbors found, and the search algorithm will terminate. Processing a cube involves marking it as processed and processing its unmarked neighbors. Those that contain intersecting edges are processed until the entire surface has been covered (see Figure 21.10).
检查种子立方体的邻居,并将那些包含至少一条相交边的邻居添加到准备处理的队列中。要处理立方体,我们检查每个面。如果任何边界边具有相反符号的顶点,则表面将穿过该面,并且必须处理面邻居。当所有面的这个过程完成后,算法的第二阶段将应用于立方体。如果表面是封闭的,最终将重新访问立方体并且不再找到未标记的邻居,搜索算法将终止。处理立方体包括将其标记为已处理并处理其未标记的邻居。处理那些包含相交边的立方体,直到覆盖整个表面(参见图 21.10 )。
Figure 21.10. A section through the cubic lattice. The + sign indicates a vertex inside the surface (f(vi) ≥ iso) and the - sign indicates a vertex outside the surface (f(vi) < iso).
图 21.10。立方晶格的截面。+ 号表示顶点位于表面内 (f(vi) ≥ iso),- 号表示顶点位于表面外 (f(vi) < iso)。
Each cube is indexed by an identifying vertex, which we define to be the lower-left far corner, i.e., the vertex with the lowest (x, y, z)-coordinate values (see Figure 21.11). For each of the cube's eight vertices that is inside the surface, the corresponding bit is set to form an 8-bit address into a table (see Figure 21.11 and Section 21.3.3).
每个立方体都由一个标识顶点索引,我们将其定义为左下角(即具有最低( x,y,z )坐标值的顶点(参见图 21.11 ))。对于表面内的每个顶点,将设置相应的位以形成 8 位表中的地址(参见图 21.11和第 21.3.3 节)。
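Forming the 8-bit address can be sketched as below. The function name is hypothetical, and the sketch assumes that the array index n matches the vertex numbering of Figure 21.11, so that bit n corresponds to vertex n.

```c
/* Build the 8-bit table address for one cube.
 * f[n] is the field value at vertex n; bit n of the result is set when
 * vertex n is inside the surface, i.e., f[n] >= iso. */
unsigned cube_index(const double f[8], double iso)
{
    unsigned index = 0;
    for (int n = 0; n < 8; ++n) {
        if (f[n] >= iso)
            index |= 1u << n;
    }
    return index;
}
```

With only vertex 2 inside the surface, the result is 4 (binary 00000100), the table entry used as the example later in this section.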
Figure 21.11. Vertex numbering.
图 21.11.顶点编号。
The identifying vertex is addressed by integers i, j, k, computed from the (x, y, z)-coordinate location of the cube such that x = side * i, etc., where side is the length of a cube edge. Each lattice vertex may be shared by as many as eight cubes, and it would be inefficient to store these vertices more than once. Thus, the vertices are stored uniquely in a chained hash table. Since most of the space does not contain any part of the surface, only those cubes that are visited will be stored. The implicit function value is found for each vertex as it is stored in the hash table.
标识顶点通过整数i、j、k来寻址,这些整数是根据立方体的 ( x, y, z ) 坐标位置计算得出的,即x = side * i等,其中 side 是立方体的大小。每个立方体的标识顶点可能出现在多达八个其他立方体中,并且多次存储这些顶点效率很低。因此,顶点以唯一方式存储在链式哈希表中。由于大多数空间不包含表面的任何部分,因此只会存储那些被访问的立方体。每个顶点的隐式函数值都是在哈希表中存储时找到的。
Nothing is known about the topology of the surface, so a search must be started from every primitive to avoid missing any disconnected parts of the surface. A scalar can be used to scale the influence of a primitive. If the scalar can be less than zero, then it is possible to search along an axis without finding an intersecting edge. In this case, a more sophisticated search must be done to find a seed cube (Galin & Akkouche, 1999).
由于对曲面的拓扑结构一无所知,因此必须从每个图元开始搜索,以避免遗漏曲面的任何不连续部分。标量可用于缩放图元的影响。如果标量小于零,则可以沿轴搜索而不会找到相交边。在这种情况下,必须进行更复杂的搜索才能找到种子立方体(Galin & Akkouche,1999)。
The hash table entry holds five values:
哈希表条目包含五个值:
the i, j, k lattice indices of the identifying vertex (see Figure 21.11);
识别顶点的i, j, k格子索引(见图21.11 );
f , the implicit function value of the identifying vertex;
f ,识别顶点的隐函数值;
a Boolean to indicate whether this cube has been visited.
布尔值表示该立方体是否已被访问过。
The hash function computes an address in the hash table by selecting a few bits out of each of i, j, k and combining them arithmetically. For example, taking the five least significant bits of each index produces a 15-bit address, so the table must have a length of 2^15. Such a hash function can be neatly implemented in the C preprocessor as follows:
哈希函数通过从i、j、k中选择几个位并对其进行算术组合来计算哈希表中的地址。例如,五个最低有效位为表生成一个 15 位地址,该表的长度必须为 2 15 。这样的哈希函数可以在 C 预处理器中巧妙地实现如下:
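#define NBITS 5                 /* bits kept from each lattice index */
#define BMASK 037               /* octal 037 = binary 11111, the low NBITS bits */
/* Concatenate the low NBITS bits of a, b, and c into one 15-bit address. */
#define HASH(a,b,c) ((((((a)&BMASK)<<NBITS)|((b)&BMASK))<<NBITS)|((c)&BMASK))
#define HSIZE (1<<(NBITS*3))    /* table length: 2^15 entries */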
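A chained hash table built on these macros might look like the following sketch. The `Entry` layout and the `lookup` and `store` functions are hypothetical names chosen for illustration, not the published implementation. The macros are repeated, fully parenthesized, so that the sketch compiles on its own.

```c
#include <stdlib.h>

#define NBITS 5
#define BMASK 037
#define HASH(a,b,c) ((((((a)&BMASK)<<NBITS)|((b)&BMASK))<<NBITS)|((c)&BMASK))
#define HSIZE (1<<(NBITS*3))

/* One stored lattice vertex: the five values listed above plus a chain link. */
typedef struct Entry {
    int i, j, k;          /* lattice indices of the identifying vertex */
    double f;             /* implicit function value at that vertex */
    int visited;          /* nonzero once the cube has been visited */
    struct Entry *next;   /* next entry with the same hash address */
} Entry;

static Entry *table[HSIZE];   /* zero-initialized: all chains start empty */

/* Return the entry for (i, j, k), or NULL if it has not been stored. */
Entry *lookup(int i, int j, int k)
{
    for (Entry *e = table[HASH(i, j, k)]; e != NULL; e = e->next)
        if (e->i == i && e->j == j && e->k == k)
            return e;
    return NULL;
}

/* Return the entry for (i, j, k), creating it with field value f if needed.
 * Returns NULL only if memory allocation fails. */
Entry *store(int i, int j, int k, double f)
{
    Entry *e = lookup(i, j, k);
    if (e != NULL)
        return e;
    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->i = i; e->j = j; e->k = k;
    e->f = f;
    e->visited = 0;
    unsigned h = HASH(i, j, k);
    e->next = table[h];       /* push onto the front of the chain */
    table[h] = e;
    return e;
}
```

Because only the low five bits of each index are hashed, vertices such as (1, 2, 3) and (33, 2, 3) share an address; the chain keeps them distinct by comparing the full indices.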
The queue (FIFO list) is used as temporary storage to identify the neighbors for processing. The algorithm begins with a seed cube that is marked as visited and placed on the queue. The first cube on the queue is dequeued and all its unvisited neighbors are added to the queue. Each cube is processed and passed to the second phase of the algorithm if it contains part of the surface. The queue is then processed until empty.
队列(FIFO 列表)用作临时存储,以识别要处理的邻居。算法从标记为已访问并放置在队列中的种子立方体开始。队列中的第一个立方体出队,其所有未访问的邻居都添加到队列中。如果每个立方体包含部分表面,则处理每个立方体并将其传递到算法的第二阶段。然后处理队列,直到队列为空。
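The seed search and the queue-driven first phase can be sketched together as follows. This is a simplified illustration under several assumptions: a single hypothetical point primitive at the origin, a small bounded lattice with a dense visited array standing in for the hash table, and a neighbor test that checks whether the whole neighboring cube straddles the surface rather than testing each shared face.

```c
#define N 16      /* cubes per axis in this bounded demo lattice */
#define ISO 0.5

/* Hypothetical field: one point primitive at the origin, influence radius 4. */
static double field(double x, double y, double z)
{
    double v = 1.0 - (x * x + y * y + z * z) / 16.0;
    return v > 0.0 ? v : 0.0;
}

/* Field at lattice vertex (i, j, k); lattice centered on the origin, side = 1. */
static double vertex_field(int i, int j, int k)
{
    return field(i - N / 2, j - N / 2, k - N / 2);
}

/* A cube contains part of the surface when its vertices are not all on one side. */
static int cube_intersects(int i, int j, int k)
{
    int inside = 0;
    for (int n = 0; n < 8; ++n)
        if (vertex_field(i + (n & 1), j + ((n >> 1) & 1), k + ((n >> 2) & 1)) >= ISO)
            ++inside;
    return inside != 0 && inside != 8;
}

static unsigned char visited[N][N][N];
static int queue[N * N * N][3];

/* Returns the number of surface cubes found, or 0 if no seed cube exists. */
int track_surface(void)
{
    /* Seed search: start at the vertex on the primitive and walk along +x
     * until the next vertex falls below ISO. */
    int si = N / 2, sj = N / 2, sk = N / 2;
    while (si < N && vertex_field(si + 1, sj, sk) >= ISO)
        ++si;
    if (si >= N)
        return 0;

    static const int step[6][3] = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}
    };
    int head = 0, tail = 0, count = 0;
    visited[si][sj][sk] = 1;
    queue[tail][0] = si; queue[tail][1] = sj; queue[tail][2] = sk; ++tail;

    while (head < tail) {
        int i = queue[head][0], j = queue[head][1], k = queue[head][2];
        ++head;
        ++count;   /* phase two (polygonizing this cube) would happen here */
        for (int d = 0; d < 6; ++d) {
            int ni = i + step[d][0], nj = j + step[d][1], nk = k + step[d][2];
            if (ni < 0 || nj < 0 || nk < 0 || ni >= N || nj >= N || nk >= N)
                continue;
            if (visited[ni][nj][nk])
                continue;
            visited[ni][nj][nk] = 1;   /* mark so it is examined only once */
            if (cube_intersects(ni, nj, nk)) {
                queue[tail][0] = ni; queue[tail][1] = nj; queue[tail][2] = nk;
                ++tail;
            }
        }
    }
    return count;
}
```

Marking a cube as visited before it is enqueued ensures each cube enters the queue at most once, so the queue never needs more than N * N * N slots.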
The second phase of the algorithm treats each cube independently. The cell is replaced by a set of triangles that best matches the shape of the part of the surface that passes through the cell. The algorithm must decide how to polygonize the cell given the implicit function values at each vertex. These values will be positive or negative (i.e., greater than or less than the iso-value), giving 256 combinations of positive or negative vertices for the eight vertices of the cube. A table of 256 entries provides the right vertices to use in each triangle (Figure 21.12). For example, entry 4 (00000100) points to a second table that records the vertices that bound the intersecting edges. In this example, vertex number 2 is inside the surface (f(V2) >= iso) and, therefore, we wish to draw a triangle that connects the points where the surface intersects the edges bounded by (V2, V0), (V2, V3), and (V2, V6), as shown in Figure 21.13.
算法的第二阶段独立处理每个立方体。单元格被一组三角形替换,这些三角形与穿过单元格的表面部分的形状最匹配。算法必须根据每个顶点的隐式函数值决定如何对单元格进行多边形化。这些值可以是正数或负数(即小于或大于等值),为立方体的八个顶点提供 256 种正或负顶点组合。一个包含 256 个条目的表提供了每个三角形中要使用的正确顶点(图 21.12 )。例如,条目 4(00000100)指向第二个表,该表记录了相交边的顶点。在这个例子中,顶点 2 位于曲面 ( f ( V 2 ) > = iso) 内部,因此,我们希望绘制一个三角形,连接曲面上与 ( V 2 ,V 0 )、( V 2 ,V 3 ) 和 ( V 2 ,V 6 ) 为界的边相交的点,如图 21.13所示。
Figure 21.12. Table 2 contains the edges intersected by the surface. Table 1 points to the appropriate entry in Table 2.
图 21.12。表 2 包含与表面相交的边。表 1 指向表 2 中的相应条目。
Figure 21.13 shows a cube where vertex V2 is inside the surface and all other vertices are outside. Intersections with the surface occur on three edges as shown. The surface intersects edge V2–V6 at the point A. The fastest, though least accurate, way to calculate A is to use linear interpolation:
图 21.13显示了一个立方体,其中顶点V 2在表面内,而其他所有顶点都在表面外。与表面的交点出现在三条边上,如图所示。表面与边V 2 – V 6相交于点A 。计算A 的最快但不准确的方法是使用线性插值:
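Linear interpolation along an intersecting edge can be sketched as below. The `Vec3` type and function name are hypothetical, and the sketch assumes the field varies linearly between the two edge vertices, which is exactly the approximation the text describes.

```c
typedef struct { double x, y, z; } Vec3;   /* hypothetical helper type */

/* Estimate where the iso-surface crosses the edge from v_in (field value
 * f_in >= iso) to v_out (field value f_out < iso). Because f_out < iso <= f_in,
 * the denominator f_in - f_out is strictly positive. */
Vec3 edge_intersection(Vec3 v_in, double f_in, Vec3 v_out, double f_out, double iso)
{
    double t = (f_in - iso) / (f_in - f_out);   /* 0 at v_in, 1 at v_out */
    Vec3 a = {
        v_in.x + t * (v_out.x - v_in.x),
        v_in.y + t * (v_out.y - v_in.y),
        v_in.z + t * (v_out.z - v_in.z)
    };
    return a;
}
```

For a unit edge with field values 0.75 and 0.25 at its ends and iso = 0.5, the estimated crossing is the midpoint of the edge.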
Figure 21.13. Finding the intersection of the surface with a cube edge.
图 21.13。寻找表面与立方体边缘的交点。
If the cube side is 1 and the iso-value sought for f(A) is 0.5, then
如果立方体的边长为 1,且f ( A ) 求得的等值是 0.5,则
This works well for a static image, but in animation, differences in error between frames will be very noticeable. A root-finding method such as regula falsi should be employed instead. This becomes more computationally costly as the gradient is needed to evaluate the point of intersection. The gradient is also needed at surface points for rendering. For many types of primitives it is simpler to find a numerical approximation using sample points around p, as in
这对于静态图像很有效,但在动画中,帧之间的误差差异将非常明显。应采用诸如regula falsi之类的求根方法。由于需要梯度来评估交点,因此这会变得更加耗费计算资源。在渲染时,表面点也需要梯度。对于许多类型的图元,使用p周围的样本点找到数值近似值更为简单,例如
A reasonable value for Δ has been found empirically to be 0.01 * side where side is the length of a cube edge.
根据经验发现,Δ 的合理值为 0.01 * 边,其中边是立方体边的长度。
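A numerical gradient of this kind can be sketched with forward differences, as below. The choice of forward (rather than central) differences is an assumption, and `Vec3`, `FieldFn`, `numerical_gradient`, and `example_field` are hypothetical names used only for illustration.

```c
typedef struct { double x, y, z; } Vec3;   /* hypothetical helper type */

/* Any scalar field f(x, y, z); the caller supplies it. */
typedef double (*FieldFn)(double x, double y, double z);

/* Forward-difference approximation of the gradient of f at p with step delta.
 * The text suggests delta = 0.01 * side, where side is the cube edge length. */
Vec3 numerical_gradient(FieldFn f, Vec3 p, double delta)
{
    double f0 = f(p.x, p.y, p.z);
    Vec3 g = {
        (f(p.x + delta, p.y, p.z) - f0) / delta,
        (f(p.x, p.y + delta, p.z) - f0) / delta,
        (f(p.x, p.y, p.z + delta) - f0) / delta
    };
    return g;
}

/* Hypothetical test field with a known gradient of (2, 3, -1) everywhere. */
double example_field(double x, double y, double z)
{
    return 2.0 * x + 3.0 * y - z;
}
```

Each gradient estimate costs four field evaluations, which is worth keeping in mind when the field is a sum over many primitives.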
To produce a mesh, as opposed to a set of independent triangles, a second hash table can maintain a list of all the intersecting edges. Since each cube edge is shared by up to four neighbors, the edge hash table prevents repetition of the surface-cube edge intersection calculation. The hash address can be derived from the same hash function as for vertices (applied to the edge endpoints).
为了制造网格,与一组独立的三角形不同,第二个哈希表可以维护所有相交边的列表。由于每个立方体边最多由四个邻居共享,因此边哈希表可防止重复进行表面立方体边相交计算。哈希地址可以从与顶点相同的哈希函数(应用于边端点)中得出。
Ambiguities occur when opposite corners of a face (or the cube) have the same sign and the other pair of vertices on the face have the opposite sign (see Figure 21.14). A sample taken in the center of the face will give a clue as to whether the cube represents the meeting of two surfaces or a saddle. It should be made clear that a spatial grid stores a sample of the implicit function at every vertex. If the function happens to vary considerably within a cell, the polygonal representation will not show such variations (see Figure 21.15). The surface cannot be resolved by sampling alone unless something is known about the curvature of the surface. A good discussion of this topic appears in (Kalra & Barr, 1989).
当面(或立方体)的对角具有相同的符号,而面上的另一对顶点具有相反的符号时,就会出现歧义(见图 21.14)。在面中心进行的采样将提供有关立方体是代表两个表面的交汇处还是鞍座的线索。应该明确的是,空间网格在每个顶点处存储隐函数的样本。如果函数恰好在单元格内变化很大,则多边形表示将不会显示这种变化(见图21.15 )。除非对表面的曲率有所了解,否则无法仅通过采样来解析表面。(Kalra & Barr,1989)对此主题进行了很好的讨论。
Figure 21.14. Examples of vertices inside (+) and outside (-) the surface. Note the extra sample gives a clue to avoid ambiguous cases.
图 21.14。表面内部 (+) 和外部 (-) 顶点的示例。请注意,额外的示例提供了避免歧义情况的线索。
Figure 21.15. Cube too large to capture small variation in implicit function.
图 21.15.立方体太大,无法捕捉隐函数中的微小变化。
This ambiguity problem (not the undersampling problem) is avoided by subdividing the cubic cell into tetrahedra. The tetrahedra can then be polygonized unambiguously. Since there are four vertices in each tetrahedron, a table of 16 entries will provide the correct triangle information. The disadvantage is that approximately twice the number of polygons will be generated.
通过将立方体单元细分为四面体,可以避免这种模糊问题(而不是欠采样问题)。然后可以明确地将四面体多边形化。由于每个四面体有四个顶点,因此包含 16 个条目的表将提供正确的三角形信息。缺点是将生成大约两倍数量的多边形。
Without requiring additional cell vertices, a cube may be decomposed into five or six tetrahedra, as shown in Figure 21.16. These decompositions introduce diagonals on the cube faces, and to maintain a consistent diagonal direction between neighbors, the decomposition into six tetrahedra is preferable. The introduction of diagonal edges produces a higher-resolution surface than replacing each cube directly with triangles. The decomposition into tetrahedra and the replacement of the tetrahedra with triangles are fast, table-driven algorithms, which produce topologically consistent meshes.
无需额外的单元顶点,立方体可以分解成五个或六个四面体,如图 21.16所示。这些分解在立方体面上引入了对角线,为了保持相邻面之间的对角线方向一致,最好进行六面分解。引入对角线边缘比直接用三角形替换每个立方体产生更高分辨率的表面。分解成四面体和用三角形替换四面体是快速的表驱动算法,可产生拓扑一致的网格。
Figure 21.16. Decomposing a cube into six tetrahedra. Image courtesy Erwin DeGroot.
图 21.16。将立方体分解为六个四面体。图片由 Erwin DeGroot 提供。
Two obvious problems emerge from the use of uniform space subdivision. The size of the triangles output by this algorithm does not adapt to the curvature of the surface, and a further sample is required to resolve the ambiguities that arise when cubic cells are replaced by polygons. A space subdivision algorithm based on an octree, which does adapt to the curvature of the surface, was developed by Bloomenthal (Bloomenthal, 1988). Cells are subdivided into eight octants, and cracks are avoided by using a restricted octree scheme, i.e., neighboring cells cannot differ by more than one level of subdivision. This indeed reduces the number of polygons generated, but full advantage of large cells can only be taken if the flat regions of the surface happen to fall entirely within the appropriate octants. The algorithm proves in practice to be considerably slower than the uniform voxel algorithm and is more complicated to implement.
使用均匀空间细分会产生两个明显的问题。此算法输出的三角形大小不适应表面的曲率,需要进一步采样来解决歧义问题,其中立方体单元被多边形取代。Bloomenthal(Bloomenthal,1988)开发了一种基于八叉树的空间细分算法,该算法可以适应表面的曲率。单元被细分为八个八分圆,并使用受限八叉树方案避免裂缝,即相邻单元的细分级别不能相差一个以上。这确实减少了生成的多边形数量,但只有当表面的平坦区域恰好完全落在适当的八分圆内时,才能充分利用大单元。实践证明,该算法比均匀体素算法慢得多,并且实施起来更复杂。
Section 21.1 showed that blending can be made to occur when field values are summed. Ricci, in his landmark paper (Ricci, 1973), describes super-elliptic blending. Given two functions FA and FB, previously we simply found the implicit value as Ftotal = FA + FB. We can denote this more general blending operator as A ◊ B. The Ricci blend is defined as:
第 21.1 节表明,当字段值相加时可以进行混合。Ricci 在他的里程碑式论文(Ricci,1973)中描述了超椭圆混合。给定两个函数F A和F B ,之前我们简单地发现隐式值是F total = F A + F B 。我们可以将这个更一般的混合算子表示为A ◊ B 。Ricci 混合定义为:
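A common way of writing Ricci's super-elliptic blend, consistent with the statement that n = 1 gives the standard sum, is shown below. The subscripted f notation and the assumption of non-negative field values are choices made here for illustration:

```latex
f_{A \diamond B} = \left( f_A^{\,n} + f_B^{\,n} \right)^{\frac{1}{n}}
```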
It is interesting to point out the following properties:
值得注意的是以下特性:
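For strictly positive field values, a blend of the form (f_A^n + f_B^n)^(1/n) has the following limiting behavior. This is stated as a standard result about that form, under the assumption that it is the form intended here:

```latex
\lim_{n \to +\infty} \left( f_A^{\,n} + f_B^{\,n} \right)^{\frac{1}{n}} = \max(f_A, f_B),
\qquad
\lim_{n \to -\infty} \left( f_A^{\,n} + f_B^{\,n} \right)^{\frac{1}{n}} = \min(f_A, f_B)
```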
Moreover, this generalized blending is associative, i.e., f_{(A◊B)◊C} = f_{A◊(B◊C)}. The standard blending operator + proves to be a special case of the super-elliptic blend with n = 1. When n varies from 1 to infinity, it creates a set of blends interpolating between blending A + B and union A ∪ B (see Figure 21.17). Figure 21.27 shows the nodes to be binary or unary; in fact, the binary nodes can easily be extended to n-ary nodes using the above formulation.
此外,这种广义混合具有结合性,即f (A◊B)◊C = f A◊(B◊C) 。标准混合算子 + 被证明是n = 1 的超椭圆混合的特例。当n从 1 变化到无穷大时,它会创建一组在混合A + B和并集A ∪ B之间进行插值的混合(见图21.17 )。图 21.27显示节点是二元的还是一元的;实际上,使用上述公式可以轻松地将二元节点扩展到 n 元节点。
Figure 21.17. By varying n, the Ricci blend may be made to change smoothly from blend to union. Image courtesy Erwin DeGroot.
图 21.17。通过改变n ,里奇混合可以平滑地从混合变为并集。图片由 Erwin DeGroot 提供。
The power of Ricci’s operators is that they are closed under the operations on the space of all possible implicit volumes, meaning that an application of an operator simply produces another scalar field defining another implicit volume. This new field can be composed with other fields, again using Ricci’s operators. Equation (21.4) will always produce the exact union of two implicit volumes, regardless of how complex they are. Compared with the difficulties involved in applying boolean CSG operations to B-rep surfaces, solid modeling with implicit volumes is incredibly simple.
Ricci 算子的强大之处在于,它们在所有可能的隐式体积空间上的运算下都是封闭的,这意味着一个算子的应用只会产生另一个定义另一个隐式体积的标量场。这个新场可以与其他场组合,同样使用 Ricci 算子。无论两个隐式体积有多复杂,方程 (21.4) 总是会产生两个隐式体积的精确并集。与将布尔 CSG 运算应用于 B-rep 曲面所涉及的困难相比,使用隐式体积进行实体建模非常简单。
Following Pasko’s functional representation (A. Pasko et al., 1995), another generalized blending function may be defined:
按照 Pasko 的函数表示(A. Pasko 等,1995),可以定义另一个广义混合函数:
When α ∈ [−1, 1] varies from −1 to 1, it creates a set of blends interpolating between the union and the intersection operators. However, this operator is no longer associative, which is incompatible with the definition of n-ary operators.
当α∈ [−1, 1] 从−1 变化到 1 时,它创建一组混合插值并集和交集运算符。但是,此运算符不再具有结合性,这与 n 元运算符的定义不兼容。
Implicit models are frequently termed implicit surfaces; however, they are inherently volume models and useful for solid modeling operations. Ricci introduced a constructive geometry for defining complex shapes from operations such as union, intersection, difference, and blend upon primitives (Ricci, 1973). The surface was considered as the boundary between the half spaces f(p) < 1, defining the inside, and f(p) > 1, defining the outside. This initial approach to solid modeling evolved into constructive solid geometry or CSG (Ricci, 1973; Requicha, 1980). CSG is typically evaluated bottom-up according to a binary tree, with low-degree polynomial primitives as the leaf nodes and internal nodes representing Boolean set operations. These methods are readily adapted for use in implicit modeling, and in the case of skeletal implicit surfaces, the Boolean set operations union (∪max), intersection (∩min), and difference (\minmax) are defined as follows (Wyvill, Galin, & Guy, 1999):
隐式模型通常被称为隐式曲面;然而,它们本质上是体积模型,可用于实体建模操作。Ricci 引入了一种构造性几何,用于通过对基元进行并集、交集、差集和混合等操作来定义复杂形状(Ricci,1973 年)。曲面被视为半空间f ( p ) < 1(定义内部)和f ( p ) > 1(定义外部)之间的边界。这种最初的实体建模方法演变为构造性实体几何或 CSG(Ricci,1973 年;Requicha,1980 年)。CSG 通常根据二叉树自下而上进行评估,低次多项式基元作为叶节点,内部节点表示布尔集运算。这些方法很容易适用于隐式建模,对于骨架隐式曲面,布尔集合运算 union ∪ max 、intersection ∩ min和 difference \ minmax定义如下 (Wyvill, Galin, & Guy, 1999):
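For skeletal fields, where a point is inside when its field value is at least iso, the three set operations can be sketched as below. Union as max and intersection as min follow directly from the operator names. The difference is written here as min(f_A, 2·iso − f_B), which is assumed to be the form the \minmax notation refers to; the function names are hypothetical.

```c
/* Boolean set operations on two field values, using the skeletal convention
 * that a point is inside when its field value is >= iso. */
double csg_union(double fa, double fb)
{
    return fa > fb ? fa : fb;
}

double csg_intersection(double fa, double fb)
{
    return fa < fb ? fa : fb;
}

/* A minus B: inside A and outside B. Reflecting fb about iso (2*iso - fb)
 * turns "outside B" into a value >= iso, so min() can be reused. */
double csg_difference(double fa, double fb, double iso)
{
    double not_b = 2.0 * iso - fb;
    return fa < not_b ? fa : not_b;
}
```

With iso = 0.5, a point with f_A = 0.8 and f_B = 0.2 stays inside the difference, whereas a point with f_A = 0.8 and f_B = 0.9 is carved away.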
The Ricci operators are illustrated in Figure 21.18 for point primitives A and B. For union (bottom left), the field at all points inside the union will be the greater of fA and fB. For intersection (center), points in the region marked P1 will have the value min(fA(P1), fB(P1)) = 0, since the contribution of B is zero outside its range of influence. Similarly, in the region marked P2 the influence of A is zero and is therefore the minimum, leaving only the intersection region with positive values. Difference works similarly, using the iso-value in the three marked regions (Pi) as follows:
图 21.18针对点基元A和B说明了 Ricci 算子。对于并集(左下角),并集内所有点的场都将是f A () 和f B () 中的较大者。对于交集(中心),标记为P 1的区域中的点将具有值 min ( f A ( P 1 ) ,f B ( P 1 )) = 0,因为B的贡献在其影响范围之外为零。类似地,对于标记为P 2 的区域( A的影响为零,即最小),仅使交集区域具有正值。差分使用三个标记区域(P)中的等值类似地工作,如下所示:
Figure 21.18. Ricci operators for CSG. Image courtesy Erwin DeGroot.
图 21.18。CSG的 Ricci 算子。图片由 Erwin DeGroot 提供。
CSG operators create creases, i.e., C1 discontinuities. For example, the min() operator (Equation (21.5)) creates C1 discontinuities at all points where f1(p) = f2(p), and the same is true of max(). When applied to two spheres, the discontinuities produced by the union operator result in a crease on the surface, as shown in Figure 21.18, which is the desired result. Unfortunately, the discontinuities also extend into the field outside of the surface, which is not visible in this image. If a blend is then applied to the result of the union, the C1-discontinuous plane in the field produces a shading discontinuity (Figure 21.19).
CSG 运算符会产生折痕,即C 1不连续性。例如,min() 运算符(公式 (21.5))会在f 1 ( p ) = f 2 ( p ) 的所有点处产生C 1不连续性。当应用于两个球体时,此并集运算符产生的不连续性会导致表面产生折痕,如图 21.18所示,这是所需的结果。不幸的是,不连续性延伸到表面外部的场中,这在本图中不可见。如果随后将混合应用于并集的结果,场中的C 1不连续平面会产生阴影不连续性(图 21.19 )。
Figure 21.19. Two point primitives on the left are connected by the Ricci union. A third primitive is blended to the result, creating an unwanted crease in the field. Image courtesy Erwin DeGroot.
图 21.19。左侧的两个点图元通过 Ricci 并集连接。第三个图元与结果混合,在视场中产生不必要的折痕。图片由 Erwin DeGroot 提供。
The problem can be avoided to an extent (G. Pasko, Pasko, Ikeda, & Kunii, 2002), and CSG operators have been developed that are C1 at all points except those where f1(p) =f2(p) = iso (Barthe, Dodgson, Sabin, Wyvill, & Gaildrat, 2003).
该问题可以在一定程度上避免(G. Pasko、Pasko、Ikeda & Kunii,2002),并且已经开发出CSG算子,除了f 1 ( p ) = f 2 ( p ) = iso 的点之外,所有点都是 C 1 (Barthe、Dodgson、Sabin、Wyvill & Gaildrat,2003)。
The ability to distort the shape of a surface by warping the space in its neighborhood is a useful modeling tool. A warp is a continuous function w(x, y, z) that maps ℝ³ onto ℝ³. Sederberg provides a good analogy for warping when describing free-form deformations (Sederberg & Parry, 1986). He suggests that the warped space can be likened to a clear, flexible, plastic parallelepiped in which the objects to be warped are embedded. A warped element may be defined by simply applying some warp function w(p) to the implicit equation:
通过扭曲邻域空间来扭曲曲面形状的能力是一种有用的建模工具。扭曲是一个连续函数w ( x, y, z ),它将 ℝ 3映射到 ℝ 3 。Sederberg 在描述自由形式变形时为扭曲提供了一个很好的类比(Sederberg & Parry,1986)。他认为扭曲空间可以比作一个透明、柔韧的塑料平行六面体,其中嵌入了要扭曲的物体。扭曲元素可以通过简单地将某个扭曲函数w ( p ) 应用于隐式方程来定义:
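Using the three ingredients named in the next paragraph (the distance d_i to the skeleton, the fall-off filter g_i, and the warp w_i), the warped field can be written as a composition. The exact notation below is an assumption:

```latex
f_i(x, y, z) = g_i \circ d_i \circ w_i(x, y, z)
```

Read from right to left: the query point is warped first, the distance from the warped point to the skeleton is measured next, and the fall-off filter converts that distance into a field value.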
A warped element may be fully characterized by the distance to its skeleton di(x, y, z), its fall-off filter function gi(r), and finally its warp function wi(x, y, z). To render or perform operations on an implicit surface, the implicit value f(P) must be found for many points P. First, P is transformed by the warp function to some new point Q, and f(Q) is returned in place of f(P). In Figure 21.20, instead of returning the implicit value of some point f(Q), the value for f(P) is returned. In this case, the iso-value is returned and the implicit surface (a curve in 2D) passes through Q instead of P. Thus, the circle is warped into an ellipse.
扭曲元素可以通过其到骨架的距离 d( x, y, z )、衰减过滤函数 g( r ) 以及最终的扭曲函数 w( x, y, z ) 来完全表征。要在隐式曲面上渲染或执行操作,必须找到许多点f ( P ) 的隐式值。首先,通过扭曲函数将P转换为某个新点Q ,并返回f ( Q ) 来代替f ( P )。在图 21.20中,不是返回某个点f ( Q ) 的隐式值,而是返回f ( P ) 的值。在这种情况下,返回等值,并且隐式曲面(二维曲线)穿过Q而不是P。因此,圆被扭曲成了椭圆。
Figure 21.20. Point Q returns the field value for point P.
图 21.20。点Q返回点P的字段值。
Barr introduced the notion of global and local deformations using the operations of twist, taper, and bend applied to parametric surfaces (Barr, 1984). The deformations can be nested to produce models such as the one shown in Figure 21.27. Conceptually, these are easy to apply to an implicit surface, as indicated in Equation (21.6).
Barr 引入了全局和局部变形的概念,将扭曲、锥化和弯曲操作应用于参数曲面(Barr,1984 年)。变形可以嵌套以生成如图 21.27所示的模型。从概念上讲,这些变形很容易应用于隐式曲面,如公式 (21.6) 所示。
Note that the normal cannot be calculated in a similar manner to warping a point. This problem is similar to the problem outlined in Section 13.2 on instancing. In this case, the normal can most easily be approximated using Equation (21.3.3) although the use of the Jacobian, as suggested in (Barr, 1984), yields precise results. The Barr warps are described in the following sections.
请注意,法线不能以与扭曲点类似的方式计算。此问题类似于第 13.2 节中关于实例化的问题。在这种情况下,最容易使用公式 (21.3.3) 近似法线,尽管使用雅可比矩阵(如 (Barr, 1984) 中建议的那样)可以得到精确的结果。Barr 扭曲将在以下章节中描述。
In this example, three blended implicit cylinders are twisted around the z-axis by an angle θ (see Figure 21.21).
在此示例中,对三个应用了扭曲变形的混合隐式圆柱体进行了绕z轴 θ 方向的扭曲(参见图 21.21 )。
Figure 21.21. Three blended implicit cylinders twisted together. Image courtesy Erwin DeGroot.
图 21.21。三个混合隐式圆柱体扭在一起。图片由 Erwin DeGroot 提供。
The twist around z is expressed as
绕z 的扭曲表示为
Taper is applied along one major axis. A linear taper has proved to be the most useful although quadratic and cubic tapers are easily implemented. For example, a linear taper along the y-axis involves changing both x- and z-coordinates. (See Figure 21.22.) A linear scale is applied to y between ymax and ymin:
锥度沿一个主轴应用。线性锥度已被证明是最有用的,尽管二次和三次锥度很容易实现。例如,沿y轴的线性锥度涉及改变x和z坐标。(见图21.22 。)在y max和y min之间对y应用线性比例:
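One linear taper consistent with this description scales x and z by s(y) = (y_max - y) / (y_max - y_min), which is 1 at y_min and 0 at y_max. The sketch below assumes that form; the function name and the choice not to clamp y outside the range are illustrative.

```python
def taper_y(p, y_min, y_max):
    """Scale x and z by a factor that falls linearly from 1 at y_min to 0 at y_max.

    Values of y outside [y_min, y_max] are not clamped in this sketch.
    """
    x, y, z = p
    s = (y_max - y) / (y_max - y_min)
    return (s * x, y, s * z)
```

A quadratic or cubic taper would change only the expression for `s`.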
Figure 21.22. Three blended implicit cylinders, twisted then tapered. Image courtesy Erwin DeGroot.
图 21.22。三个混合隐式圆柱体,先扭曲然后变细。图片由 Erwin DeGroot 提供。
Bend is also applied along one major axis. (See Figure 21.23.) For the bend example below, the bending rate is k, measured in radians per unit length, the axis of the bend is (x_0, 1/k), and the angle θ is defined as (x - x_0)k. The bend around z is
弯曲也沿一个主轴应用。(见图21.23 。)对于下面的弯曲示例,弯曲率为k ,以弧度/单位长度为单位,弯曲的轴为 ( x 0 1 /k ),角度 θ 定义为 ( x – x 0 ) * k 。绕z 的弯曲为
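A sketch of a bend with these parameters, assuming the commonly used form w(x, y, z) = (-sin θ (y - 1/k) + x_0, cos θ (y - 1/k) + 1/k, z) with θ = (x - x_0)k. The formula is an assumption based on the parameters named in the text, and the function name is illustrative.

```python
import math

def bend_z(p, k, x0=0.0):
    """Bend about an axis parallel to z through (x0, 1/k); k is in radians per unit length."""
    x, y, z = p
    theta = (x - x0) * k
    radius = y - 1.0 / k          # signed offset from the bend axis
    return (-math.sin(theta) * radius + x0,
            math.cos(theta) * radius + 1.0 / k,
            z)
```

Under this map the line y = 0 is wrapped onto a circle of radius 1/k around the bend axis, and the point at x = x_0 is left in place.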
Figure 21.23. Three blended implicit cylinders, twisted together, tapered and bent. Image courtesy Erwin DeGroot.
图 21.23。三个混合隐式圆柱体,扭曲在一起,逐渐变细并弯曲。图片由 Erwin DeGroot 提供。
Precise contact modeling (PCM) is a method of deforming implicit surface primitives in contact situations while maintaining a precise contact surface with C^1 continuity (Gascuel, 1993). PCM is important in that it is a simple and automatic way of showing how a model can react to its environment. This cannot be done so easily with non-implicit methods (see Figure 21.24).
精确接触建模(PCM) 是一种在接触情况下变形隐式表面基元的方法,同时保持具有C 1连续性的精确接触表面 (Gascuel, 1993)。PCM 的重要性在于它是一种简单而自动的方法来显示模型如何对其环境做出反应。使用非隐式方法无法如此轻松地做到这一点(见图21.24 )。
Figure 21.24. Sea anemone deforms to implicit rock. Image courtesy Mai Nur and X. Liang.
图 21.24。海葵变形为隐性岩石。图片由 Mai Nur 和 X. Liang 提供。
PCM is implemented by the inclusion of a deforming function s(p) that modifies the field value returned for each point. For each pair of objects, collision is first detected using a bounding-box test. Once it is established that a collision is likely, PCM is applied. A local, geometric deformation term s_i is computed and added to the implicit function f_i. The volume of the colliding objects is divided into an interpenetration region and a propagation region. The result of applying s_i is that the interpenetration region is compressed so that contact is maintained without interpenetration occurring (see Figure 21.25). The effect of s_i is attenuated to zero within the propagation region so that the volume outside of the two regions is not deformed.
PCM 是通过包含一个变形函数s ( p ) 来实现的,该函数会修改为每个点返回的字段值。对于每对物体,首先使用边界框测试检测碰撞。一旦确定可能发生碰撞,就应用 PCM。计算局部几何变形项 s 并将其添加到隐式函数 f 中。碰撞物体的体积被分为相互穿透区域和变形区域。应用 s 的结果是相互穿透区域被压缩,从而保持接触而不会发生相互穿透(见图21.25 )。s 的影响在传播区域内衰减到零,因此两个区域之外的体积不会变形。
Figure 21.25. A 2D slice through objects in collision showing the various regions and PCM deformation. Image courtesy Erwin DeGroot.
图 21.25。碰撞中物体的 2D 切片显示了各个区域和 PCM 变形。图片由 Erwin DeGroot 提供。
Given two skeletal elements generating fields f_1(p) and f_2(p), the surface around each one is calculated as
给定两个骨架元素生成场f 1 ( p ) 和f 2 ( p ),则每个骨架元素周围的表面计算为
We need to generate a surface common to both elements (dotted line in Figure 21.25), i.e., where they share a solution in the interpenetration region for some p in that region:
我们需要生成两个元素共同的曲面(图 21.25中的虚线),即它们在互穿区域中共享一个解,该解中p为该区域中的某个值:
Intuitively, the deeper within object 1 that object 2 penetrates, the higher the implicit value of object 1 and thus the more that object 2 will be compressed.
直观地看,对象 2 渗透到对象 1 的越深,对象 1 的隐含值就越高,因此对象 2 的压缩程度就越高。
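This intuition can be checked with a small Python sketch. It assumes a convention in which each field is positive inside its object and zero on the surface, and it assumes that, inside the interpenetration region, the deformation term of each object is the negated field of the other object (s_1 = -f_2 and s_2 = -f_1). Under that reading both deformed surfaces coincide where f_1 = f_2. The sphere primitive and all names are hypothetical.

```python
def sphere_field(center, radius):
    """Hypothetical primitive: positive inside, zero on the surface, negative outside."""
    cx, cy, cz = center

    def f(p):
        x, y, z = p
        return radius ** 2 - ((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)

    return f

def deformed_in_interpenetration(f_self, f_other):
    """f_self + s_self inside the interpenetration region, with the assumed
    deformation term s_self = -f_other."""
    def g(p):
        s_self = -f_other(p)
        return f_self(p) + s_self

    return g

f1 = sphere_field((-0.5, 0.0, 0.0), 1.0)
f2 = sphere_field((0.5, 0.0, 0.0), 1.0)
g1 = deformed_in_interpenetration(f1, f2)   # deformed field of object 1
g2 = deformed_in_interpenetration(f2, f1)   # deformed field of object 2
```

For these two equal spheres the shared contact surface is the plane x = 0, and `g2` becomes more negative the further a point lies inside object 1, which is the compression described above.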
The function s_i is defined to produce a smooth junction at the boundary of the interpenetration region, in other words where s_i = 0 but its derivative is greater than zero. From here to the boundary of the propagation region, s_i is used to attenuate the propagation to zero. The nearest point on the interpenetration region boundary, p_0, is found by following the gradient.
函数 s 被定义为在互穿区域边界处产生平滑连接,换句话说,其中 s = 0 但其导数大于零。从这里到传播区域的边界,s 用于将传播衰减到零。通过遵循梯度找到互穿区域边界上的最近点p 0 。
Within the propagation region s_i(p) = h_i(r), where p = (x, y, z) is the point whose implicit value is being calculated and r = ||p - p_0|| (see Figure 21.26). The value of r_i, set by the user, defines the size of the propagation region; no deformation occurs beyond this region. To control how much the objects inflate in the propagation region, the user provides a value for the parameter α_i. The maximum value of h_i is M_i. The current minimum of s_i is negative in the interpenetration region and is given as s_i^min, where M_i = -α_i s_i^min. Thus an object will be compressed in the interpenetration region and will inflate in the propagation region. The equation for h_i is formed in two parts by two cubic polynomials that are designed to join at r = r_i/2, where the slope is zero:
在传播区域内,s( p ) =h( r ),其中p = ( x, y, z ) 是正在计算其隐式值的点, r = || p – p 0 ||(参见图 21.26 )。r 的值由用户设置,定义了传播区域的大小;此区域之外不会发生任何变形。为了控制对象在传播区域中膨胀的程度,用户为参数α提供了一个值。h 的最大值是 M。s 的当前最小值在穿透区域中为负,并给出为 s min ,其中 M = –αs min 。因此,对象将在穿透区域中被压缩,并在传播区域中膨胀。h 的方程由两个三次多项式组成,它们被设计为在r =r/2 处连接,其中斜率为零:
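One pair of cubics that meets these conditions can be derived directly from the constraints h(0) = 0, h'(0) = k, h(r_i/2) = M_i, h'(r_i/2) = 0, together with the assumption that the deformation dies out smoothly at the edge of the propagation region, h(r_i) = h'(r_i) = 0. The Python sketch below implements that pair; here k is the slope at r = 0, and the coefficient names `c` and `d` are illustrative.

```python
def make_h(r_i, M_i, k):
    """Return h(r): two cubics joined at r_i / 2 with value M_i and zero slope there.

    First piece:  h(0) = 0, h'(0) = k.
    Second piece: h(r_i) = 0, h'(r_i) = 0 (assumed smooth fade-out).
    """
    c = 4.0 * (r_i * k - 4.0 * M_i) / r_i ** 3
    d = 4.0 * (3.0 * M_i - r_i * k) / r_i ** 2

    def h(r):
        if r < 0.0 or r >= r_i:
            return 0.0                      # no deformation outside the region
        if r <= 0.5 * r_i:
            return c * r ** 3 + d * r ** 2 + k * r
        return 4.0 * M_i / r_i ** 3 * (r - r_i) ** 2 * (4.0 * r - r_i)

    return h
```

For example, `make_h(2.0, 0.3, 0.2)` rises from 0 with slope 0.2, reaches 0.3 at r = 1, and returns to 0 at r = 2. For a large k relative to M_i / r_i the first cubic can overshoot M_i slightly before the join, so the parameters need to be chosen with that in mind.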
Figure 21.26. The function h_i(r) is the value of the deformation function s_i in the propagation region.
图 21.26.函数 h(r) 是传播区域中变形函数 w 的值。
It is desirable that we have C^1 continuity as we move from the interpenetration to the propagation region. Thus, h_i'(0) = k in Figure 21.26 is the directional derivative of s_i at the junction (marked as p_0 in Figure 21.25). As indicated in Equation (21.7), s_i = -f_i in the interpenetration region, thus:
我们希望从贯穿区移动到传播区时具有C 1连续性。因此,图 21.26中的 h'(0) = k是连接点处 s 的方向导数(图 21.25中标记为p 0 )。如公式 (21.7) 所示,贯穿区中 s = – f ,因此:
PCM is only an approximation to a properly deformed surface, but it is an attractive algorithm due to its simplicity.
PCM 只是对适当变形表面的近似,但由于其简单性,它是一种很有吸引力的算法。
The BlobTree is a method that employs a tree structure extending the CSG tree to include various blending operations using skeletal primitives (Wyvill et al., 1999). A system with similar capabilities, the Hyperfun project, used a specialized language to describe F-rep objects (Adzhiev et al., 1999).
BlobTree是一种采用树结构的方法,它扩展了 CSG 树以包含使用骨架基元的各种混合操作(Wyvill 等人,1999 年)。具有类似功能的系统Hyperfun项目使用专门的语言来描述 F-rep 对象(Adzhiev 等人,1999 年)。
In the BlobTree system, models are defined by expressions that combine implicit primitives and the operators ∪ (union), ∩ (intersection), − (difference), + (blend), ◊ (super-elliptic blend), and w (warp). The BlobTree is not only the data structure built from these expressions but also a way of visualizing the structure of the models. The operators listed above are binary with the exception of warp, which is a unary operator. In general it is more efficient to use n-ary rather than binary operators. The BlobTree incorporates affine transformations as nodes, so it is also a scene graph, and primitives (e.g., skeletons) form the leaf nodes.
在 BlobTree 系统中,模型由结合隐式基元和运算符 ∪(并集)、∩(交集)、—(差集)、+(混合)、◊(超椭圆混合)和w (扭曲)的表达式定义。BlobTree 不仅是由这些表达式构建的数据结构,而且还是可视化模型结构的一种方式。上面列出的运算符都是二进制的,但扭曲除外,它是一元运算符。通常,使用 n 元运算符比使用二元运算符更有效。BlobTree 将仿射变换合并为节点,因此它也是一个场景图,并且基元(例如骨架)形成叶节点。
An example of a BlobTree including the Barr warps and CSG operations is shown in Figure 21.27. Other nodes can include 2D texturing (Schmidt, Grimm, & Wyvill, 2006), precise contact modeling, as well as animation and other attributes. The traversal of the BlobTree is in essence very simple. All that is required to render the object, either by polygonizing or by ray tracing, is to find the implicit value of any point (and the corresponding gradient). This can be done by traversing the tree. Polygonization and ray-tracing algorithms need to evaluate the implicit field function at a large number of points in space. The function f(N, M) returns the field value for the node N at the point M, which depends on the type of the node. The values L and R indicate that the left or right branch of the tree is explored. The algorithm below is written (for simplicity) as if the tree were binary:
图 21.27显示了包含 Barr 扭曲和 CSG 操作的 BlobTree 示例。其他节点可以包括 2D 纹理(Schmidt、Grimm & Wyvill,2006)、精确接触建模以及动画和其他属性。BlobTree 的遍历本质上非常简单。通过多边形化或光线追踪渲染对象所需的只是找到任何点的隐式值(以及相应的渐变)。这可以通过遍历树来完成。多边形化和光线追踪算法需要在空间中的大量点处评估隐式场函数。函数f ( N,M ) 返回点M处节点N的场值,这取决于节点的类型。值L和R表示探索树的左分支还是右分支。下面的算法写成(为简单起见)好像树是二进制的:
Figure 21.27. BlobTree. The spiral staircase is built from a central textured cylinder to which the stairs and the railing are blended. The railing is comprised of a series of cylinders blended with two circle (torus) primitives, blended together and further blended with a vertical cylinder. The BlobTree is also a scene graph and instancing nodes repeat the various parts transformed by the appropriate matrices. Each stair is made from a tapered polygon primitive (that becomes an offset surface); intersection and union nodes combine the inflated disk with the stair.
图 21.27。BlobTree 。螺旋楼梯由中央纹理圆柱体构成,楼梯和栏杆与圆柱体混合。栏杆由一系列圆柱体与两个圆形(圆环)图元混合而成,这些图元混合在一起,然后与垂直圆柱体进一步混合。BlobTree 也是一个场景图,实例化节点重复由适当矩阵变换的各个部分。每个楼梯都由锥形多边形图元(成为偏移表面)构成;交点和并集节点将膨胀的圆盘与楼梯结合在一起。
function f(N, M):
函数f ( N,M ):
primitive: f(M);
原始: f ( M );
warp: f(L(N), w(M));
扭曲: f ( L ( N ) ,w ( M ));
blend: f(L(N), M) + f(R(N), M);
混合: f ( L ( N ) ,M )+ f ( R ( N ) ,M ));
union: max(f(L(N), M), f(R(N), M));
并集:max( f ( L ( N ) ,M ) ,f ( R ( N ) ,M ));
intersection: min(f(L(N), M), f(R(N), M));
交点:最小( f ( L ( N ) ,M ) ,f ( R ( N ) ,M ));
difference: min(f(L(N), M), -f(R(N), M)).
差异:最小值( f ( L ( N ) ,M ),- f ( R ( N ) ,M ))。
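The rules above translate almost line for line into a recursive function. The Python sketch below keeps the binary-tree simplification and the same node types; the `Node` class, the `sphere` primitive, and the convention that the field is positive inside an object are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Point = Tuple[float, float, float]

@dataclass
class Node:
    kind: str                                          # node type, as in the list above
    left: Optional["Node"] = None                      # L(N)
    right: Optional["Node"] = None                     # R(N)
    field: Optional[Callable[[Point], float]] = None   # primitives only
    warp: Optional[Callable[[Point], Point]] = None    # warp nodes only

def field_value(n: Node, m: Point) -> float:
    """Field value of node n at point m, following the listed rules."""
    if n.kind == "primitive":
        return n.field(m)
    if n.kind == "warp":
        return field_value(n.left, n.warp(m))          # unary: child evaluated at w(M)
    a = field_value(n.left, m)
    b = field_value(n.right, m)
    if n.kind == "blend":
        return a + b
    if n.kind == "union":
        return max(a, b)
    if n.kind == "intersection":
        return min(a, b)
    if n.kind == "difference":
        return min(a, -b)
    raise ValueError("unknown node kind: " + n.kind)

def sphere(center: Point, radius: float) -> Node:
    """Hypothetical primitive: positive inside, zero on the surface."""
    cx, cy, cz = center

    def f(p: Point) -> float:
        x, y, z = p
        return radius ** 2 - ((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)

    return Node("primitive", field=f)
```

A polygonizer or ray tracer would call `field_value(root, p)` once per sample point, which is why the cost of a full traversal matters so much in practice.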
A complex BlobTree model showing many of the features that have been integrated is shown in Figure 21.28.
图 21.28显示了一个复杂的 BlobTree 模型,其中显示了许多已集成的功能。
Figure 21.28. “Spiral Stairs.” A complex BlobTree implicit model created in Erwin DeGroot’s BlobTree.net system.
图 21.28。 “螺旋楼梯。”在 Erwin DeGroot 的 BlobTree.net 系统中创建的复杂 BlobTree 隐式模型。
Early sketch-based modeling systems, such as Teddy (Igarashi, Matsuoka, & Tanaka, 1999), used a few drawn strokes from the user to infer a polygonal model in 3-space. With better hardware and improved algorithms, sketch-based implicit modeling systems are now possible. Shapeshop uses implicit sweep surfaces to manufacture 3D strokes from 2D user strokes and also preserves the hierarchy of the BlobTree, unlike the early systems that produced homogeneous meshes (Schmidt, Wyvill, Sousa, & Jorge, 2005). This enables a user to produce complex models of arbitrary topology from a few simple strokes. The margin figures show a closed drawn stroke (Figure 21.29) inflated into an implicit sweep and a second sweep (Figure 21.30) that has a smaller sweep object subtracted using CSG.
早期基于草图的建模系统,例如 Teddy(Igarashi、Matsuoka & Tanaka,1999),使用用户的一些绘制笔触来推断三维空间中的多边形模型。随着硬件的改进和算法的改进,基于草图的隐式建模系统现已成为可能。Shapeshop 使用隐式扫描曲面从用户的 2D 笔触制造 3D 笔触,并且与生成同质网格的早期系统(Schmidt、Wyvill、Sousa & Jorge,2005)不同,它保留了 BlobTree 的层次结构。这使用户能够从一些简单的笔触中生成任意拓扑的复杂模型。边注中的图显示了一个闭合的绘制笔触(图 21.29 ),它膨胀为一个隐式扫描,第二个扫描(图 21.30 )具有一个使用 CSG 减去的较小扫描对象。
Figure 21.29. Outlines are inflated. Image courtesy Erwin DeGroot.
图 21.29。轮廓被夸大了。图片由 Erwin DeGroot 提供。
One of the improvements that made this possible is a caching system that uses a fixed 3D grid of implicit values at each node of the BlobTree representing the values found by traversing the tree below the node (Schmidt, Wyvill, & Galin, 2005). If the value of some point p is required at node N, a value may be returned without traversing the tree below N, provided that part of the tree is unaltered. Instead, an interpolation scheme (see Chapter 9) is used to find a value for p. This scheme speeds up traversal for complex BlobTrees and is one factor in enabling a system to run at interactive rates.
实现这一目标的改进之一是缓存系统,它在 BlobTree 的每个节点使用固定的 3D 隐式值网格,表示通过遍历节点下方的树找到的值(Schmidt、Wyvill 和 Galin,2005 年)。如果在节点N处需要某个点p的值,则可以返回一个值,而无需遍历N下方的树,前提是树的该部分未改变。相反,使用插值方案(参见第 9 章)来查找p的值。此方案加快了复杂 BlobTree 的遍历速度,是使系统能够以交互速率运行的一个因素。
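A much-simplified sketch of such a cache is shown below. It samples the subtree's field once on a fixed grid and answers later queries by interpolation. Trilinear interpolation is assumed here because it is the simplest choice; the interpolation scheme and all names (`CachedField`, `lo`, `hi`, `n`) are illustrative and not taken from the cited system.

```python
import math

class CachedField:
    """Samples a field once on an n x n x n grid over the box [lo, hi], then
    answers queries by trilinear interpolation instead of re-evaluating it."""

    def __init__(self, field, lo, hi, n=9):
        self.lo = lo
        self.n = n
        self.step = tuple((hi[a] - lo[a]) / (n - 1) for a in range(3))
        self.values = [[[field((lo[0] + i * self.step[0],
                                lo[1] + j * self.step[1],
                                lo[2] + k * self.step[2]))
                         for k in range(n)]
                        for j in range(n)]
                       for i in range(n)]

    def __call__(self, p):
        idx, frac = [], []
        for a in range(3):
            t = (p[a] - self.lo[a]) / self.step[a]
            i = min(max(math.floor(t), 0), self.n - 2)   # clamp to a valid cell
            idx.append(i)
            frac.append(min(max(t - i, 0.0), 1.0))       # clamp outside the grid
        (i, j, k), (u, v, w) = idx, frac
        g = self.values

        def lerp(a, b, t):
            return a + (b - a) * t

        x00 = lerp(g[i][j][k], g[i + 1][j][k], u)
        x10 = lerp(g[i][j + 1][k], g[i + 1][j + 1][k], u)
        x01 = lerp(g[i][j][k + 1], g[i + 1][j][k + 1], u)
        x11 = lerp(g[i][j + 1][k + 1], g[i + 1][j + 1][k + 1], u)
        return lerp(lerp(x00, x10, v), lerp(x01, x11, v), w)
```

The cache is only valid while the subtree below the node is unchanged; after an edit it has to be rebuilt, and the grid resolution trades memory for interpolation error.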
The next generation of implicit modeling systems will exploit hardware and software advances to be able to handle more and more complex hierarchical models interactively. A more complex Shapeshop example is shown in Figure 21.31.
下一代隐式建模系统将利用硬件和软件的进步,能够以交互方式处理越来越复杂的分层模型。图 21.31显示了一个更复杂的 Shapeshop 示例。
Figure 21.30. BlobTree operations can be applied, e.g., CSG difference. Image courtesy Erwin DeGroot.
图 21.30。可以应用 BlobTree 操作,例如 CSG 差异。图片由 Erwin DeGroot 提供。
Figure 21.31. “The Next Step.” A complex BlobTree implicit model created interactively in Ryan Schmidt’s Shapeshop by artist Corien Clapwijk (Andusan).
图 21.31。 “下一步。”艺术家 Corien Clapwijk (Andusan) 在 Ryan Schmidt 的 Shapeshop 中以交互方式创建的复杂 BlobTree 隐式模型。
1. In an implicit surface modeling system the fall-off filter function is defined as
1.在隐式曲面建模系统中,衰减滤波函数定义为
where R is a constant. A point primitive placed at (-1, 0) and another at (1, 0) are rendered to show the f = 0.5 iso-surface. The value R, the distance where the potential due to the point falls to zero in both cases, is 1.5.
其中R为常数。将放置在 (-1 0) 处的一个点基元和放置在 (1 0) 处的另一个点基元渲染以显示f = 05 等值面。值R (即在两种情况下由该点引起的电位降至零的距离)为 15。
Calculate the potential at the point (0, 0) and at +0.5 intervals until the point (2.5, 0). Sketch the 0.5 contour and the contour at which the field falls to zero.
计算点 (0 0) 处以及以 +05 为间隔直到点 (25 0) 处的电位。绘制 05 轮廓线和场降至零时的轮廓线。
2. Why are the ambiguous cases in the polygonization algorithm considered to be a sampling problem?
2.为什么多边形化算法中的模糊情况被认为是采样问题?
3. Calculate the error involved in using linear interpolation to estimate the intersection of an implicit surface and a cubic voxel.
3.计算使用线性插值估计隐式表面与立方体素的交点所涉及的误差。
4. Design an implicit primitive function using the skeleton of your choice. The function must take as input a point and return an implicit value and also the gradient at that point.
4.使用您选择的框架设计一个隐式原始函数。该函数必须将一个点作为输入并返回一个隐式值以及该点的梯度。
Naty Hoffman
Of all the applications of computer graphics, computer and video games attract perhaps the most attention. The graphics methods selected for a given game have a profound effect, not only on the game engine code, but also on the art asset creation, and even sometimes on the gameplay, or core game mechanics.
在计算机图形学的所有应用中,计算机和视频游戏也许是最受关注的。为特定游戏选择的图形方法不仅对游戏引擎代码有深远的影响,而且对艺术资产创作也有深远的影响,有时甚至对游戏玩法或核心游戏机制也有深远的影响。
Although game graphics rely on the material in all of the preceding chapters, two chapters are particularly germane. Games need to make highly efficient use of graphics hardware, so an understanding of the material in Chapter 17 is important.
虽然游戏图形依赖于前面所有章节的内容,但其中两章尤为重要。游戏需要高效利用图形硬件,因此理解第 17 章的内容非常重要。
In this chapter, I will detail the specific considerations that apply to graphics in game development, from the platforms on which games run to the game production process.
在本章中,我将详细介绍游戏开发中适用于图形的具体考虑因素,从游戏运行的平台到游戏制作过程。
Here, I use the term platform to refer to a specific combination of hardware, operating system, and API (application programming interface) for which a game is designed. Games run on a large variety of platforms, ranging from virtual machines used for browser-based games to dedicated game consoles using specialized hardware and APIs.
这里,我使用“平台”一词来指代游戏所针对的硬件、操作系统和 API(应用程序编程接口)的特定组合。游戏可以在各种各样的平台上运行,从用于基于浏览器的游戏的虚拟机到使用专用硬件和 API 的专用游戏机。
In the past, it was common for games to be designed for a single platform. The increasing cost of game development has made this rare; multiplatform game development is now the norm. The incremental increase in development cost to support multiple platforms is more than repaid by a potential doubling or tripling of the customer base.
过去,游戏通常只针对单一平台进行设计。但随着游戏开发成本的不断上升,这种情况已变得非常少见;如今,多平台游戏开发已成为常态。为支持多个平台而增加的开发成本,将通过客户群的潜在增长(翻倍或三倍)得到补偿。
Some platforms are quite loosely defined. For example, when developing a game for the Windows PC platform, the developer must account for a very large variety of possible hardware configurations. Games are even expected to run (and run well) on PC configurations that did not exist when the game was developed! This is only possible due to the abstractions afforded by the APIs defining the Windows platform.
有些平台的定义相当松散。例如,在为 Windows PC 平台开发游戏时,开发人员必须考虑各种可能的硬件配置。游戏甚至有望在开发游戏时不存在的 PC 配置上运行(并运行良好)!这只有通过定义 Windows 平台的 API 提供的抽象才有可能。
One way in which developers account for wide variance in graphics performance is by scaling: adjusting graphics quality in response to system capabilities. This can ensure reasonable performance on low-end systems, while still achieving competitive visuals on high-performance systems. This adjustment is sometimes done automatically by profiling the system performance, but more often this control is left in the hands of the user, who can best judge their personal preferences for quality versus speed. Display resolution is easiest to adjust, followed by antialiasing quality. It is also fairly common to offer several quality levels for visual effects such as shadows and motion blur, including the option of turning the effect off entirely.
开发人员解决图形性能差异的方法之一是缩放— 根据系统功能调整图形质量。这可以确保低端系统上的性能合理,同时在高性能系统上仍能实现具有竞争力的视觉效果。这种调整有时通过分析系统性能自动完成,但更多时候这种控制权掌握在用户手中,用户最能判断自己对质量和速度的个人偏好。显示分辨率最容易调整,其次是抗锯齿质量。为阴影和运动模糊等视觉效果提供多个质量级别也是相当常见的,包括完全关闭效果的选项。
Differences in graphics performance can be so large that some machines may not run the game at a playable frame rate, even with the lowest quality settings; for this reason PC game developers publish minimum and recommended machine specifications for each game.
图形性能的差异可能非常大,以至于某些机器可能无法以可玩的帧速率运行游戏,即使在最低质量设置下也是如此;因此,电脑游戏开发商会为每款游戏发布最低和推荐的机器规格。
As platforms, game consoles are strictly defined. When developing a game for, e.g., Nintendo’s Wii console, the developer knows exactly what hardware the game will run on. If the platform’s hardware implementation is changed (often done to reduce manufacturing costs), the console manufacturer must ensure that the new implementation behaves exactly like the previous one, including timing and performance. This is not to say that the console developer’s task is easy; console APIs tend to be much less abstract and closer to the underlying hardware. This gives console development its own set of difficulties. In some sense, multiplatform development (which commonly includes at least two different console platforms and often Windows as well) is the hardest of all, since the multiplatform game developer has neither the assurance of a fixed platform nor the convenience of a single high-level API.
作为平台,游戏机的定义十分严格。当为任天堂的 Wii 游戏机开发游戏时,开发人员会确切地知道游戏将在哪种硬件上运行。如果平台的硬件实现发生变化(通常是为了降低制造成本),游戏机制造商必须确保新实现的行为与之前的完全相同,包括时间和性能。这并不是说游戏机开发人员的任务很容易;游戏机 API 往往不那么抽象,更接近底层硬件。这给游戏机开发带来了一系列困难。从某种意义上说,多平台开发(通常至少包括两个不同的游戏机平台,通常还包括 Windows)是最难的,因为多平台游戏开发人员既没有固定平台的保证,也没有单一高级 API 的便利性。
Browser-based virtual machines such as Adobe Flash are an interesting class of game platforms. Although such virtual machines run on a wide class of hardware from personal computers to mobile phones, the high degree of abstraction provided by the virtual machine results in a stable and unified development platform. The relative ease of development for these platforms and the huge pool of potential customers make them increasingly attractive to game developers. However, these platforms are defined by the lowest common denominator of the supported hardware, and virtual machines have lower performance than native code on any given platform. For these reasons, such platforms are best suited to games with modest graphics requirements.
基于浏览器的虚拟机(例如 Adobe Flash)是一类有趣的游戏平台。尽管此类虚拟机可以在从个人电脑到手机等各种硬件上运行,但虚拟机提供的高度抽象性可带来稳定而统一的开发平台。这些平台的开发相对简单,并且拥有庞大的潜在客户群,因此对游戏开发者的吸引力越来越大。然而,这些平台是由所支持硬件的最低公分母定义的,虚拟机在任何给定平台上的性能都低于本机代码。出于这些原因,此类平台最适合图形要求适中的游戏。
Platforms can also be characterized by their openness to development, which is a business or legal distinction rather than a technical one. For example, Windows is open in the sense that development tools are widely available, and there are no gatekeepers controlling access to the marketplace of Windows games. Apple’s iPhone is a somewhat more restricted platform in that all applications need to pass a certification process and certain classes of applications are banned outright. Consoles are the most restrictive game platforms, where access to the development tools is tightly controlled. This is opening up somewhat with the introduction of online console game marketplaces, which tend to be more open. A particularly interesting example is Microsoft’s Xbox LIVE Community Games service, where the development tools are freely available and the “gatekeeping” is performed primarily by peer review. Games distributed through this service must use a virtual machine platform provided by Microsoft for security reasons.
平台还可以通过其对开发的开放性来加以区分,这是一种商业或法律区别,而非技术区别。例如,Windows 是开放的,因为开发工具随处可见,而且没有守门人控制对 Windows 游戏市场的访问。Apple 的 iPhone 是一个限制性更强的平台,因为所有应用程序都需要通过认证流程,并且某些类别的应用程序被彻底禁止。游戏机是限制性最强的游戏平台,对开发工具的访问受到严格控制。随着在线游戏机市场(往往更加开放)的推出,这种情况有所开放。一个特别有趣的例子是 Microsoft 的 Xbox LIVE 社区游戏服务,其中的开发工具是免费提供的,“守门人”主要由同行评审来执行。出于安全原因,通过此服务分发的游戏必须使用 Microsoft 提供的虚拟机平台。
The game platform determines many elements of the game experience. For example, PC gamers use keyboard and mouse, while console gamers use specialized game controllers. Many console games support multiple players on the same console, either sharing a screen or providing a window for each player. Due to the difficulty of sharing keyboard and mouse, this type of play is not found on PC. A handheld game system will have a different control scheme than a touch-screen phone, etc.
游戏平台决定了游戏体验的许多元素。例如,PC 游戏玩家使用键盘和鼠标,而主机游戏玩家使用专门的游戏控制器。许多主机游戏支持同一台主机上的多个玩家,要么共享一个屏幕,要么为每个玩家提供一个窗口。由于共享键盘和鼠标的难度,这种类型的游戏在 PC 上是找不到的。手持游戏系统的控制方案与触摸屏手机不同,等等。
Although game platforms vary widely, some common trends can be discerned. Most platforms have multiple processing cores, divided between general-purpose (CPU) and graphics-specific (GPU). Performance gains over time are due mostly to increases in core count; gains in individual core performance are modest. As GPU cores grow in generality, the lines between GPU and CPU cores are increasingly blurred. Storage capacity tends to increase at a slower rate than processing power, and communication bandwidth (between cores as well as between each core and storage) grows at a slower pace still.
尽管游戏平台千差万别,但还是可以看出一些共同趋势。大多数平台都有多个处理核心,分为通用核心 (CPU) 和图形专用核心 (GPU)。随着时间的推移,性能提升主要归因于核心数量的增加;单个核心性能的提升幅度不大。随着 GPU 核心的普遍增长,GPU 和 CPU 核心之间的界限越来越模糊。存储容量的增长速度往往低于处理能力,通信带宽(核心之间以及每个核心与存储之间的)的增长速度更慢。
One of the primary challenges of game graphics is the need to manage multiple pools of limited resources. Each platform imposes its own constraints on hardware resources such as processing time, storage, and memory bandwidth. At a higher level, development resources also need to be managed; there is a fixed-size team of programmers, artists, and game designers with limited time to complete the game, hopefully without working too much overtime! This needs to be taken into account when deciding which graphics techniques to adopt.
游戏图形的主要挑战之一是需要管理多个有限资源池。每个平台对硬件资源(如处理时间、存储和内存带宽)都有自己的限制。在更高层次上,开发资源也需要管理;有一个固定规模的程序员、艺术家和游戏设计师团队,他们只有有限的时间来完成游戏,但希望不要加班太多!在决定采用哪种图形技术时,需要考虑到这一点。
Early game developers only had to worry about budgeting a single processor. Current game platforms contain multiple CPU and GPU cores. These processors need to be carefully synchronized to avoid deadlocks or excessive stalls.
早期的游戏开发者只需要担心单个处理器的预算。当前的游戏平台包含多个 CPU 和 GPU 核心。这些处理器需要仔细同步,以避免死锁或过度停顿。
Since the time consumed by a single rendering command is highly variable, graphics processors are decoupled from the rest of the system via a command buffer. This buffer acts as a queue; commands are deposited on one end and the GPU reads rendering commands from the other. Increasing the size of this buffer decreases the chances of GPU starvation. It is fairly common for games to buffer an entire frame’s worth of rendering commands before sending them to the GPU; this guarantees that GPU starvation does not occur. However, this approach requires reserving enough storage space for two full frames’ worth of commands (the GPU works on one, while the CPU deposits commands in the other). It also increases the latency between the user’s input and the display, which can be problematic for fast-paced games.
由于单个渲染命令所花费的时间变化很大,因此图形处理器通过命令缓冲区与系统的其余部分分离。此缓冲区充当队列;命令存放在一端,GPU 从另一端读取渲染命令。增加此缓冲区的大小可降低 GPU 饥饿的可能性。游戏在将整帧的渲染命令发送到 GPU 之前对其进行缓冲是相当常见的;这可以保证不会发生 GPU 饥饿。但是,这种方法需要为两个整帧的命令保留足够的存储空间(GPU 处理其中一个命令,而 CPU 将命令存放在另一端)。它还会增加用户输入和显示之间的延迟,这对于快节奏的游戏来说可能会有问题。
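The double-buffered arrangement can be modeled with a toy Python class. This is a sketch of the queueing behavior only and assumes nothing about any real graphics API; the class and method names are hypothetical, and the "GPU" is simulated by a method call.

```python
from collections import deque

class DoubleBufferedCommands:
    """Toy model: the CPU fills one command list while the GPU drains the other."""

    def __init__(self):
        self.recording = deque()   # CPU appends here during the current frame
        self.executing = deque()   # GPU consumes these, recorded last frame

    def submit(self, command):
        """CPU side: deposit a command at the tail of the recording list."""
        self.recording.append(command)

    def end_frame(self):
        """Swap roles at the frame boundary; the GPU must have finished its list."""
        if self.executing:
            raise RuntimeError("GPU has not finished the previous frame")
        self.recording, self.executing = self.executing, self.recording

    def gpu_execute_all(self):
        """GPU side: read commands from the head, oldest first."""
        done = []
        while self.executing:
            done.append(self.executing.popleft())
        return done
```

Commands submitted during one frame are not visible to the simulated GPU until `end_frame` is called, which is the extra frame of latency mentioned above, and two lists must exist at once, which is the storage cost.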
Processing budgets are determined by the frame rate, which is the frequency at which the frame buffer is refreshed with new renderings of the scene. On fixed platforms (such as consoles), the frame rate experienced by the user is essentially the same one seen by the game developer, so fairly strict frame-rate limits can be imposed. Most games target a frame rate of 30 frames per second (fps); in games where response latency is especially important, the target is often 60 fps. On highly variable platforms (such as PCs), the frame-rate budgets are (by necessity) defined more loosely.
处理预算由帧速率决定,帧速率是帧缓冲区刷新场景新渲染的频率。在固定平台(如游戏机)上,用户体验到的帧速率与游戏开发者看到的帧速率基本相同,因此可以施加相当严格的帧速率限制。大多数游戏的目标帧速率为每秒 30 帧 (fps);在响应延迟特别重要的游戏中,目标通常为 60 fps。在高度可变的平台(如 PC)上,帧速率预算(必然)定义得更宽松。
The required frame rate gives the graphics programmer a fixed budget per frame to work with. In the case of a 30 fps target, the CPU cores have 33 milliseconds to gather inputs, process the game logic, perform any physical simulations, traverse the scene description, and send the rendering commands to the graphics hardware. In parallel, other tasks such as audio and network processing must be handled, with their own required response times. While this is happening, the GPU is typically executing the graphics commands submitted during the previous frame.
所需的帧速率为图形程序员提供了每帧固定的预算。在 30 fps 目标的情况下,CPU 核心有 33 毫秒的时间来收集输入、处理游戏逻辑、执行任何物理模拟、遍历场景描述以及将渲染命令发送到图形硬件。同时,还必须处理音频和网络处理等其他任务,这些任务有各自的响应时间要求。在此过程中,GPU 通常会执行上一帧期间提交的图形命令。
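The arithmetic behind the 33-millisecond figure is simply the reciprocal of the frame rate. The short Python sketch below makes it explicit and shows how a per-frame budget might be checked against a list of subsystem costs; the cost figures in the usage note are invented for illustration.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / target_fps

def remaining_ms(target_fps, costs_ms):
    """Budget left after the given per-frame costs (negative means over budget)."""
    return frame_budget_ms(target_fps) - sum(costs_ms)
```

For instance, `frame_budget_ms(30)` is about 33.3 ms and `frame_budget_ms(60)` about 16.7 ms, so hypothetical per-frame costs of 8 ms, 6 ms, and 5 ms would fit a 30 fps target with room to spare but would exceed a 60 fps target.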
In most cases, CPU cores are a homogeneous resource; all cores are the same, and any of them are equally well suited to a given workload (there are some exceptions, such as the Cell processor used in Sony’s PLAYSTATION 3 console).
在大多数情况下,CPU 核心是同质资源;所有核心都是相同的,并且其中任何一个都同样适合给定的工作负载(也有一些例外,例如 Sony 的 PLAYSTATION 3 控制台中使用的 Cell 处理器)。
In contrast, GPUs contain a heterogeneous mix of resources, each specialized to a certain set of tasks. Some of these resources consist of fixed-function hardware (for triangle rasterization, alpha blending, and texture sampling), and some are programmable cores. On older GPUs, programmable cores were further differentiated into vertex and pixel processing cores; newer GPU designs have unified shader cores which can execute any of the programmable shader types.
相比之下,GPU 包含各种资源,每种资源都专门用于一组特定的任务。其中一些资源由固定功能硬件组成(用于三角形光栅化、alpha 混合和纹理采样),一些则是可编程核心。在较旧的 GPU 上,可编程核心进一步分为顶点和像素处理核心;较新的 GPU 设计具有统一的着色器核心,可以执行任何可编程着色器类型。
Such heterogeneous resources are budgeted separately. Typically, at any point, only one resource type will be the bottleneck, and the others will have excess capacity. On the one hand, this is good, since this capacity can be leveraged to improve visual quality without decreasing performance. On the other hand, it makes it harder to improve performance, since decreasing usage of any of the non-bottleneck resources will have no effect. Even decreasing usage of the bottleneck resource may only improve performance slightly, depending on the degree of utilization of the “next bottleneck.”
这些异构资源是单独预算的。通常,在任何时候,只有一种资源类型会成为瓶颈,其他资源类型都有过剩的容量。一方面,这是好事,因为可以利用这种容量来提高视觉质量,而不会降低性能。另一方面,这会使提高性能变得更加困难,因为减少任何非瓶颈资源的使用都不会产生任何效果。即使减少瓶颈资源的使用也只能稍微提高性能,这取决于“下一个瓶颈”的利用程度。
Game platforms, like any modern computing system, possess multi-stage storage hierarchies, with smaller, faster memory types at the top and larger, slower storage at the bottom. This arrangement is born of engineering necessity, although it does complicate life for the developer. Most platforms include optical disc storage, which is extremely slow and is used mostly for delivery. On platforms such as Windows, a lengthy installation process is performed once to move all data from the optical disc onto the hard drive, which is significantly faster. The optical disc is never used again (except as an anti-piracy measure). On console platforms, this is less common, although it does sometimes happen when a hard drive is guaranteed to be present, as on Sony’s PLAYSTATION 3 console. More often, the hard drive (if present) is only used as a cache for the optical disc.
游戏平台与任何现代计算系统一样,具有多阶段存储层次结构,较小、较快的内存类型位于顶部,较大、较慢的存储位于底部。这种安排是工程上的需要,尽管它确实使开发人员的工作变得复杂。大多数平台都包含光盘存储,但它的速度极慢,主要用于交付。在 Windows 等平台上,需要执行一次漫长的安装过程才能将所有数据从光盘移动到硬盘上,这要快得多。光盘再也不会使用(除非作为反盗版措施)。在游戏机平台上,这种情况不太常见,尽管有时在保证有硬盘的情况下也会发生这种情况,例如在 Sony 的 PLAYSTATION 3 游戏机上。更常见的是,硬盘(如果有)仅用作光盘的缓存。
The next step up the memory hierarchy is RAM, which on many platforms is divided into general system RAM and VRAM (video RAM) which benefits from a high-speed interface to the graphics hardware. A game level may be too large to fit in RAM, in which case the game developer needs to manage moving the data in and out of RAM as needed. On platforms such as Windows, virtual memory is often used for this. On console platforms, custom data streaming and caching systems are typically employed.
内存层次的下一步是 RAM,在许多平台上,RAM 分为通用系统 RAM 和 VRAM(视频 RAM),后者得益于与图形硬件的高速接口。游戏级别可能太大而无法放入 RAM 中,在这种情况下,游戏开发人员需要根据需要管理数据进出 RAM。在 Windows 等平台上,通常使用虚拟内存来实现这一点。在控制台平台上,通常使用自定义数据流和缓存系统。
Finally, both the CPU and GPU boast various kinds of on-chip memory and caches. These are extremely small and fast and are usually managed by the graphics API.
最后,CPU 和 GPU 都拥有各种片上内存和缓存。这些内存和缓存非常小巧、速度快,通常由图形 API 管理。
Graphics resources take up a lot of memory, so they are a primary focus of storage budgets in game development. Textures are usually the greatest memory consumers, followed by geometry (vertex data), and finally other types of graphics data such as animations. Not all memory can be used for graphics—audio also takes up a fair bit, and game logic may use sizeable data structures. As in the case of processing time, budgeting tends to be somewhat looser on Windows, where the exact amount of memory present on the user’s system is unknown and virtual memory covers a multitude of sins. In contrast, memory budgeting on console platforms is quite strict—often the lead programmer keeps track of memory on a spreadsheet and a programmer requiring more memory for their system needs to beg, borrow, or steal it from someone else.
图形资源占用大量内存,因此它们是游戏开发中存储预算的主要重点。纹理通常是最大的内存消耗者,其次是几何图形(顶点数据),最后是其他类型的图形数据,例如动画。并非所有内存都可以用于图形 - 音频也占用了相当多的内存,游戏逻辑可能会使用相当大的数据结构。与处理时间的情况一样,Windows 上的预算往往有些宽松,因为用户系统上存在的确切内存量是未知的,虚拟内存可以弥补许多缺点。相比之下,控制台平台上的内存预算非常严格 - 首席程序员通常会在电子表格上跟踪内存,而需要更多内存的程序员需要向其他人请求、借用或窃取内存。
The various levels of the memory hierarchy differ not only in size, but also in access speed. This has two separate dimensions: latency and bandwidth.
内存层次结构的各个级别不仅大小不同,访问速度也不同。这有两个独立的维度:延迟和带宽。
Latency is the time that elapses between a storage access request and its final fulfillment. This varies from a few clock cycles (for on-chip cache) to millions of clock cycles (for data residing on optical disc). Latency is usually an issue for read access (although write latency can also be an issue if the result needs to be read back from memory soon after). In some cases, the read request is blocking, which means that the processor core that submitted the read can do nothing else until the request is fulfilled. In other cases, the read is non-blocking; the processing core can submit the read request, do other types of processing, and then use the results of the read after it has arrived. Texture accesses by the GPU are an example of non-blocking reads; an important aspect of GPU design is to find ways to “hide” texture read latency by performing unrelated computations while the texture read is being fulfilled.
延迟是指从存储访问请求到最终完成所经过的时间。延迟从几个时钟周期(对于片上缓存)到数百万个时钟周期(对于光盘上的数据)不等。延迟通常是读取访问的一个问题(尽管如果需要在之后不久从内存中读回结果,写入延迟也可能是一个问题)。在某些情况下,读取请求是阻塞的,这意味着提交读取的处理器核心在请求完成之前无法执行任何其他操作。在其他情况下,读取是非阻塞的;处理核心可以提交读取请求,执行其他类型的处理,然后在读取结果到达后使用读取结果。GPU 的纹理访问是非阻塞读取的一个例子;GPU 设计的一个重要方面是找到通过在执行纹理读取时执行不相关的计算来“隐藏”纹理读取延迟的方法。
For this latency hiding to work, there must be a sufficient amount of computation relative to texture accesses. This is an important consideration for the shader writer; the optimal mix of computation vs. texture access keeps changing (in favor of more computation) as memory fails to keep up with increases in processing power.
要使这种延迟隐藏发挥作用,必须有足够的计算量来访问纹理。这对于着色器编写者来说是一个重要的考虑因素;由于内存无法跟上处理能力的提高,计算与纹理访问的最佳组合不断变化(倾向于更多计算)。
Bandwidth refers to the maximum rate of transfer to and from storage. It is typically measured in gigabytes per second.
带宽是指与存储设备之间的最大传输速率。通常以 GB/秒为单位。
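As a rough illustration of how latency and bandwidth interact, the sketch below uses the common first-order model in which a transfer costs one latency period plus the payload size divided by the bandwidth. The model, the function name `transfer_time_s`, and the numbers (100 ns latency, 25 GB/s) are assumptions chosen for illustration, not measurements of any real platform.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate: fixed latency plus size divided by bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

# Hypothetical memory system: 100 ns latency, 25 GB/s bandwidth.
LATENCY_S = 100e-9
BANDWIDTH = 25e9

# A single 64-byte read is dominated by latency.
small = transfer_time_s(64, LATENCY_S, BANDWIDTH)
# A 64 MiB texture upload is dominated by bandwidth.
large = transfer_time_s(64 * 1024 * 1024, LATENCY_S, BANDWIDTH)

print(f"64 B read:     {small * 1e9:.1f} ns")
print(f"64 MiB upload: {large * 1e3:.3f} ms")
```

Under this model, small scattered reads are limited by latency while large streaming transfers are limited by bandwidth, which is why the two are treated as separate dimensions.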
Besides hardware resources, such as processing power and storage space, the game graphics programmer also has to contend with a different kind of limited resource: the time of their teammates! When selecting graphics techniques, the engineering resources needed to implement each technique must be taken into account, as well as any tools necessary to compute the input data (in many cases, building the tools can take significantly more time than implementing the technique itself). Perhaps most importantly, the impact on artist productivity must be taken into account. Most graphics techniques use assets created by game artists, who comprise by far the largest part of most modern game teams. The graphics programmer must foster the artists’ productivity and creativity, which will ultimately determine the visual quality of the game.
除了硬件资源(如处理能力和存储空间)之外,游戏图形程序员还必须应对另一种有限资源——队友的时间!在选择图形技术时,必须考虑实施每种技术所需的工程资源,以及计算输入数据所需的任何工具(在许多情况下,工具可能比实施技术本身花费更多的时间)。也许最重要的是,必须考虑对艺术家生产力的影响。大多数图形技术都使用游戏艺术家创建的资产,游戏艺术家是大多数现代游戏团队中最大的组成部分。图形程序员必须培养艺术家的生产力和创造力,这最终将决定游戏的视觉质量。
Making wise use of these limited resources is the primary challenge of the game graphics programmer. To this end, various optimization techniques are commonly employed.
合理利用这些有限的资源是游戏图形程序员面临的主要挑战。为此,通常会采用各种优化技术。
In many games, pixel shader processing is a primary bottleneck. Most GPUs contain hierarchical depth-culling hardware which can avoid executing pixel shaders on occluded surfaces. To make good use of this hardware, opaque objects can be rendered front-to-back, so that nearer surfaces fill the depth buffer first and the occluded surfaces behind them are rejected. Alternatively, optimal depth-culling usage can be achieved by performing a depth prepass, i.e., rendering all the opaque objects into the depth buffer (without any color output or pixel shaders) before rendering the scene normally. This does incur some overhead (due to the need to render every object twice), but in many cases the performance gain is worth it.
在许多游戏中,像素着色器处理是主要瓶颈。大多数 GPU 都包含分层深度剔除硬件,可以避免在被遮挡的表面上执行像素着色器。为了充分利用此硬件,可以从前向后渲染不透明物体,这样较近的表面会先写入深度缓冲区,其后被遮挡的表面就会被剔除。或者,可以通过执行深度预处理来实现最佳的深度剔除效果,即在正常渲染场景之前将所有不透明对象渲染到深度缓冲区中(没有任何颜色输出或像素着色器)。这确实会产生一些开销(因为需要渲染每个对象两次),但在许多情况下,性能提升是值得的。
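Drawing opaque geometry nearest-first lets depth-culling hardware reject occluded pixels before their shaders run. The sketch below shows one simple way such an ordering might be computed on the CPU, by sorting on distance along the view direction. The `Drawable` class, the helper names, and the scene contents are hypothetical; a real engine would typically sort coarser batches and balance this against state-change costs.

```python
from dataclasses import dataclass

@dataclass
class Drawable:
    name: str
    position: tuple  # world-space center (x, y, z)
    opaque: bool = True

def view_depth(obj, camera_pos, camera_forward):
    """Distance of the object's center along the unit view direction."""
    offset = [p - c for p, c in zip(obj.position, camera_pos)]
    return sum(o * f for o, f in zip(offset, camera_forward))

def front_to_back(objects, camera_pos, camera_forward):
    """Return only the opaque objects, ordered nearest-first."""
    opaque = [o for o in objects if o.opaque]
    return sorted(opaque, key=lambda o: view_depth(o, camera_pos, camera_forward))

# Hypothetical scene; the camera sits at the origin looking down -z.
scene = [
    Drawable("mountain", (0.0, 0.0, -500.0)),
    Drawable("crate", (1.0, 0.0, -3.0)),
    Drawable("window", (0.0, 1.0, -10.0), opaque=False),
    Drawable("house", (-4.0, 0.0, -40.0)),
]
order = front_to_back(scene, camera_pos=(0.0, 0.0, 0.0),
                      camera_forward=(0.0, 0.0, -1.0))
print([o.name for o in order])
```

Transparent objects are filtered out here because they are usually drawn in a separate pass with their own ordering requirements.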
The fastest way to render an object is to not render it at all; thus any method of discerning early on that an object is occluded can be useful. This saves not only pixel processing but also vertex processing and even CPU time that would be spent submitting the object to the graphics API. View frustum culling (see Section 8.4.1) is universally employed, but in many games it is not sufficient. High-level occlusion culling algorithms are often used, utilizing data structures such as PVS (potentially visible sets) or BSP (binary space partitioning) trees to quickly narrow down the pool of potentially visible objects.
渲染对象的最快方法是根本不渲染它;因此,任何能够尽早辨别对象是否被遮挡的方法都是有用的。这不仅可以节省像素处理时间,还可以节省顶点处理时间,甚至可以节省将对象提交给图形 API 所花费的 CPU 时间。视锥剔除(参见第 8.4.1 节)被广泛使用,但在许多游戏中它还不够用。通常使用高级遮挡剔除算法,利用 PVS(潜在可见集)或 BSP(二叉空间分割)树等数据结构来快速缩小潜在可见对象的范围。
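A minimal sketch of the bounding-sphere test often used for view frustum culling follows. It assumes each culling plane is stored as `(nx, ny, nz, d)` with a unit normal pointing into the visible volume, so that a point is inside when `n·p + d >= 0`. That representation, the function name, and the box-shaped test volume (used here in place of a real perspective frustum to keep the numbers easy to check) are all illustrative assumptions.

```python
def sphere_outside_frustum(center, radius, planes):
    """True if the sphere lies entirely behind at least one plane."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        signed_dist = nx * cx + ny * cy + nz * cz + d
        if signed_dist < -radius:
            return True  # fully outside this plane, safe to cull
    return False  # inside or straddling; must be submitted for rendering

# Hypothetical box-shaped view volume: |x| <= 10, |y| <= 10, -100 <= z <= -1.
planes = [
    (1.0, 0.0, 0.0, 10.0),    # x >= -10
    (-1.0, 0.0, 0.0, 10.0),   # x <= 10
    (0.0, 1.0, 0.0, 10.0),    # y >= -10
    (0.0, -1.0, 0.0, 10.0),   # y <= 10
    (0.0, 0.0, -1.0, -1.0),   # z <= -1   (near plane)
    (0.0, 0.0, 1.0, 100.0),   # z >= -100 (far plane)
]

print(sphere_outside_frustum((0.0, 0.0, -50.0), 1.0, planes))   # visible
print(sphere_outside_frustum((20.0, 0.0, -50.0), 1.0, planes))  # culled
```

The test is conservative: a sphere that straddles a plane is kept, so nothing visible is ever discarded, at the cost of occasionally submitting an object that turns out to be off-screen.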
Even if an object is visible, it may be at such a distance that most of its detail can be removed without apparent effect. LOD (level-of-detail) algorithms render different representations of an object based on distance (or other factors, such as screen coverage or importance). This can save significant processing, vertex processing in particular. Examples can be seen in Figure 22.1.
即使物体可见,也可能位于距离很远的地方,以至于其大部分细节可以被移除而不会产生明显影响。LOD(细节级别)算法根据距离(或其他因素,如屏幕覆盖范围或重要性)渲染物体的不同表示。这可以节省大量处理,尤其是顶点处理。示例见图 22.1 。
Figure 22.1. Two examples of game objects at a varying level of detail. The small inset images show the relative sizes at which the simplified models might be used. Upper row of images courtesy Crytek; lower row courtesy Valve Corp.
图 22.1。两个不同细节级别的游戏对象示例。小插图显示了简化模型可能使用的相对大小。上排图片由 Crytek 提供;下排图片由 Valve Corp 提供。
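Distance-based LOD selection can be sketched as a lookup into a sorted list of switch distances. The threshold values and the function name `select_lod` below are hypothetical; production engines often add hysteresis or use projected screen size instead of raw distance.

```python
import bisect

# Hypothetical switch distances in world units. LOD 0 is the most detailed
# model; each threshold crossed selects the next, coarser representation.
LOD_DISTANCES = [10.0, 40.0, 150.0]

def select_lod(distance, thresholds=LOD_DISTANCES):
    """Return the index of the representation to draw at this distance."""
    return bisect.bisect_right(thresholds, distance)

for d in (5.0, 25.0, 100.0, 1000.0):
    print(f"distance {d:7.1f} -> LOD {select_lod(d)}")
```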
In many cases, processing can be performed before the game even starts. The results of such preprocessing can be stored and used each frame, thus speeding up the game. This is most commonly employed for lighting, where global illumination algorithms are utilized to compute lighting throughout the scene and store it in lightmaps and other data structures for later use.
在许多情况下,处理甚至可以在游戏开始之前进行。这种预处理的结果可以存储并在每一帧中使用,从而加快游戏速度。这最常用于照明,其中全局照明算法用于计算整个场景的照明并将其存储在光照贴图和其他数据结构中以供以后使用。
Since game requirements vary widely, the selection of graphics techniques is driven by the exact type of game being developed.
由于游戏要求千差万别,图形技术的选择取决于正在开发的游戏的具体类型。
The allocation of processing time depends strongly on the frame rate. Currently, most console games tend to target 30 frames per second, since this enables much higher graphics quality. However, certain game types with fast gameplay require very low latency, and such games typically render at 60 frames per second. This includes music games such as Guitar Hero and first-person shooters such as Call of Duty.
处理时间的分配在很大程度上取决于帧速率。目前,大多数主机游戏倾向于以每秒 30 帧为目标,因为这可以实现更高的图形质量。但是,某些游戏速度快的游戏类型需要非常低的延迟,而这类游戏通常以每秒 60 帧的速度渲染。这包括音乐游戏(如《吉他英雄》)和第一人称射击游戏(如《使命召唤》) 。
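The link between frame rate and processing budget is simple arithmetic, sketched here with a hypothetical helper named `frame_budget_ms`. Doubling the frame rate halves the time available for everything done in a frame.

```python
def frame_budget_ms(frames_per_second):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / frames_per_second

for fps in (30, 60):
    print(f"{fps} fps -> {frame_budget_ms(fps):.2f} ms per frame")
```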
The frame rate determines the available time to render the scene. The composition of the scene itself also varies widely from game to game. Most games have a division between background geometry (scenery, mostly static) and foreground geometry (characters and dynamic objects). These are handled differently by the rendering engine. For example, background geometry will often have lightmaps containing precomputed lighting, which is not feasible for foreground objects. Precomputed lighting is typically applied to foreground objects via some type of volumetric representation which can take account of the changing position of each object over time.
帧速率决定了渲染场景的可用时间。场景本身的构成也因游戏而异。大多数游戏都有背景几何体(风景,大多是静态的)和前景几何体(角色和动态物体)的划分。渲染引擎以不同的方式处理它们。例如,背景几何体通常会有包含预计算照明的光照贴图,这对于前景物体来说是不可行的。预计算照明通常通过某种类型的体积表示应用于前景物体,这种体积表示可以考虑每个物体随时间变化的位置。
Some games have relatively enclosed environments, where the camera remains largely in place. The purest examples are fighting games such as the Street Fighter series, but this is also true to some extent for games such as Devil May Cry and God of War. These games have cameras that are not under direct player control, and the game play tends to move from one enclosed environment to another, spending a significant amount of playing time in each. This allows the game developer to lavish large amounts of resources (processing, storage, and artist time) on each room or enclosed environment, resulting in very high levels of graphics fidelity.
有些游戏具有相对封闭的环境,其中摄像机大部分保持在原位。最纯粹的例子是《街头霸王》系列等格斗游戏,但在某种程度上, 《鬼泣》和《战神》等游戏也是如此。这些游戏的摄像机不受玩家直接控制,游戏玩法往往会从一个封闭环境移动到另一个封闭环境,在每个环境中花费大量的游戏时间。这使得游戏开发者可以在每个房间或封闭环境中投入大量资源(处理、存储和艺术家时间),从而实现非常高的图形保真度。
Other games have extremely large worlds, where the player can move about freely. This is most true for “sandbox games” such as the Grand Theft Auto series and online role-playing games such as World of Warcraft. Such games pose great challenges to the graphics developer, since resource allocation is very difficult when during each frame the player can see a large extent of the world. Further complicating things, the player can freely go to some formerly distant part of the world and observe it from up close. Such games typically have changing time of day, which makes precomputation of lighting difficult at best, if not impossible.
其他游戏拥有非常大的世界,玩家可以在其中自由移动。对于“沙盒游戏”如《侠盗猎车手》系列和在线角色扮演游戏如《魔兽世界》来说,情况尤其如此。这类游戏对图形开发人员提出了巨大挑战,因为当玩家在每一帧中都可以看到世界的大部分范围时,资源分配非常困难。更复杂的是,玩家可以自由地前往世界某个以前遥远的地方,近距离观察它。这类游戏通常会随着时间的变化而变化,这使得预先计算光照非常困难,甚至不可能。
Most games, such as first-person shooters, are somewhere between the two extremes. The player can see a fair amount of scenery each frame, but movement through the game world is somewhat constrained. Many games also have a fixed time of day for each game level, for ease of lighting precomputation.
大多数游戏(例如第一人称射击游戏)都介于这两个极端之间。玩家每帧可以看到相当多的风景,但在游戏世界中的移动会受到一定限制。许多游戏还为每个游戏级别设置了固定的时间,以便于进行光照预计算。
The number of foreground objects rendered also varies widely between game types. Real-time strategy games such as the Command and Conquer series often have many dozens, if not hundreds, of units visible on screen. Other types of games have more limited quantities of visible characters, with fighting games at the opposite extreme, where only two characters are visible, each rendered with extremely high detail. A distinction must be drawn between the number of characters visible at any time (which affects budgeting of processing time) and the number of unique characters which can potentially be visible at short notice (which affects storage budgets).
不同游戏类型所渲染的前景对象的数量也存在很大差异。 《命令与征服》系列等实时战略游戏通常在屏幕上显示数十个甚至数百个单位。其他类型的游戏的可见角色数量则更为有限,而格斗游戏则恰恰相反,只有两个角色可见,每个角色都以极高的细节进行渲染。必须区分随时可见的角色数量(这会影响处理时间的预算)和可能在短时间内可见的独特角色数量(这会影响存储预算)。
The type or genre of game also determines audience expectations of the graphics. For example, first-person shooters have historically had very high levels of graphics fidelity, and this expectation drives the graphics design when developing new games in that genre; see Figure 22.2. On the other hand, puzzle games have typically had relatively simplistic graphics, so most game developers will not invest large amounts of programming or art resources into developing photorealistic graphics for such games.
类型或游戏类型也决定了玩家对图像的期望。例如,第一人称射击游戏历来具有非常高的图形保真度,这种期望推动了开发该类型新游戏时的图形设计;参见图 22.2 。另一方面,益智游戏通常具有相对简单的图像,因此大多数游戏开发者不会投入大量编程或艺术资源来为此类游戏开发逼真的图像。
Figure 22.2. Crysis exemplifies the realistic and detailed graphics expected of first-person shooters. Image courtesy Crytek.
图 22.2。 《孤岛危机》体现了第一人称射击游戏应有的逼真、细致的画面。图片由 Crytek 提供。
Although most games aim for a photorealistic look, a few do attempt more stylized rendering. One interesting example of this is Okami, which can be seen in Figure 22.3.
虽然大多数游戏都追求照片级的真实感,但也有少数游戏尝试了更风格化的渲染。一个有趣的例子是《大神》 ,如图 22.3所示。
Figure 22.3. An example of highly stylized, non-photorealistic rendering from the game Okami. Image courtesy Capcom Entertainment, Inc.
图 22.3。游戏《大神》中高度风格化、非真实感渲染的示例。图片由 Capcom Entertainment, Inc. 提供。
The management of development resources also differs by game type. Most games have a closed development cycle of one to two years, which ends after the game ships. Recently it has become common to have downloadable content (DLC), which can be purchased after the game ships, so some development resources need to be reserved for that. Persistent-world online games have a never-ending development process where new content is continually being generated, at least as long as the game is economically viable (which may be a period of decades).
开发资源的管理也因游戏类型而异。大多数游戏都有一到两年的封闭开发周期,游戏发布后开发周期就结束了。最近,游戏发布后可以购买可下载内容 (DLC) 的情况越来越普遍,因此需要为此保留一些开发资源。持久世界在线游戏有一个永无止境的开发过程,新内容不断生成,至少在游戏具有经济可行性的前提下(可能长达数十年)。
The creative exploitation of the specific requirements and restrictions of a particular game is the hallmark of a skilled game graphics programmer. A good example is the game LittleBigPlanet, which has a “two-and-a-half-dimensional” game world comprising a small number of two-dimensional layers, as well as a noninteractive background. The graphics quality of this game is excellent, driven by the use of unusual rendering techniques specialized to this type of environment; see Figure 22.4.
创造性地利用特定游戏的特定要求和限制是熟练的游戏图形程序员的标志。一个很好的例子是游戏LittleBigPlanet ,它有一个“二维半”游戏世界,由少量二维层以及非交互式背景组成。这款游戏的图形质量非常出色,这要归功于专门针对此类环境的不同寻常的渲染技术;见图22.4 。
Figure 22.4. The LittleBigPlanet developers took care to choose techniques that fit the game’s constraints, combining them in unusual ways to achieve stunning results. LittleBigPlanet © 2007 Sony Computer Entertainment Europe. Developed by Media Molecule. LittleBigPlanet is a trademark of Sony Computer Entertainment Europe.
图 22.4。LittleBigPlanet开发人员精心选择了适合游戏限制的技术,并以不同寻常的方式组合这些技术,以实现令人惊叹的效果。LittleBigPlanet © 2007 Sony Computer Entertainment Europe。由 Media Molecule 开发。LittleBigPlanet 是 Sony Computer Entertainment Europe 的商标。
The game production process starts with the basic game design or concept. In some cases (such as sequels), the basic gameplay and visual design is clear, and only incremental changes are made. In the case of a new game type, extensive prototyping is needed to determine gameplay and design. Most cases sit somewhere in the middle, where there are some new gameplay elements and the visual design is somewhat open. After this step there may be a greenlight stage where some early demo or concept is shown to the game publisher to get approval (and funding!) for the game.
游戏制作流程从基本游戏设计或概念开始。在某些情况下(例如续集),基本游戏玩法和视觉设计很清晰,只需进行增量更改。对于新游戏类型,需要进行大量原型设计来确定游戏玩法和设计。大多数情况处于中间位置,其中有一些新的游戏元素,而视觉设计则有些开放。在此步骤之后,可能会进入绿灯阶段,向游戏发行商展示一些早期演示或概念,以获得游戏的批准(和资金!)。
The next step is typically pre-production. While other teams are working on finishing up the last game, a small core team works on making any needed changes to the game engine and production tool chain, as well as working out the rough details of any new gameplay elements. This core team is working under a strict deadline. After the existing game ships and the rest of the team comes back from a well-deserved vacation, the entire tool chain and engine must be ready for them. If the core team misses this deadline, several dozen developers may be left idle—an extremely expensive proposition!
下一步通常是前期制作。当其他团队正在努力完成最后一款游戏时,一个小的核心团队正在努力对游戏引擎和制作工具链进行必要的更改,并制定任何新游戏元素的粗略细节。这个核心团队的工作时间非常严格。现有游戏发布后,团队的其他成员从应得的假期回来,整个工具链和引擎必须为他们做好准备。如果核心团队错过了这个最后期限,几十名开发人员可能会被闲置——这是一个极其昂贵的提议!
Full production is the next step, with the entire team creating art assets, designing levels, tweaking gameplay, and implementing further changes to the game engine. In a perfect world, everything done during this process would be used in the final game, but in reality there is an iterative nature to game development which will result in some work being thrown out and redone. The goal is to minimize this with careful planning and prototyping.
下一步是全面制作,整个团队将创建艺术资产、设计关卡、调整游戏玩法并对游戏引擎进行进一步更改。在理想情况下,在此过程中完成的所有工作都将用于最终游戏,但实际上游戏开发具有迭代性质,这会导致一些工作被丢弃并重新完成。目标是通过仔细规划和原型设计将这种情况最小化。
When the game is functionally complete, the final stage begins. The term alpha release usually refers to the version which marks the start of extensive internal testing, beta release to the one which marks the start of extensive external testing, and gold release to the final release submitted to the console manufacturer, but different companies have slightly varying definitions of these terms. In any case, testing, or quality assurance (QA), is an important part of this phase, and it involves testers at the game development studio, at the publisher, at the console manufacturer, and possibly external QA contractors as well. These various rounds of testing result in bug reports which are submitted back to the game developers and worked on until the next release.
当游戏功能完成时,最后阶段就开始了。术语alpha 版本通常指标志着广泛内部测试开始的版本, beta 版本指标志着广泛外部测试开始的版本, gold 版本指提交给游戏机制造商的最终版本,但不同公司对这些术语的定义略有不同。无论如何,测试或质量保证(QA) 是此阶段的重要组成部分,它涉及游戏开发工作室、发行商、游戏机制造商的测试人员,也可能涉及外部 QA 承包商。这些多轮测试会产生错误报告,这些报告将提交给游戏开发者并进行处理,直到下一个版本。
After the game ships, most of the developers go on vacation for a while, but a small team may have to stay to work on patches or downloadable content. In the meantime, a small core team has been working on pre-production for the next game.
游戏发布后,大多数开发人员都会休假一段时间,但一个小团队可能会留下来开发补丁或可下载内容。与此同时,一个小的核心团队一直在为下一款游戏进行前期制作。
Art asset creation is an aspect of game production that is particularly relevant to graphics development, so I will go into it in some detail.
艺术资产创作是游戏制作的一个方面,与图形开发特别相关,因此我将详细介绍它。
While the exact process of art asset creation varies from game to game, the outline I give here is fairly representative. In the past, a single artist would create an entire asset from start to finish, but this process is now much more specialized, involving people with different skill sets working on each asset at various times. Some of these stages have clear dependencies (for example, a character cannot be animated until it is rigged and cannot be rigged before it is modeled). Most game developers have well-defined approval processes, where the art director or a lead artist signs off on each stage before the asset is sent on to the next. Ideally an asset proceeds through each stage exactly once, but in practice changes may be made that require resubmission.
虽然艺术资产创建的具体过程因游戏而异,但我在此给出的概述相当具有代表性。过去,一位艺术家会从头到尾创建整个资产,但现在这个过程更加专业化,涉及具有不同技能的人在不同时间处理每个资产。其中一些阶段具有明确的依赖关系(例如,角色在装配之前无法动画化,在建模之前无法装配)。大多数游戏开发商都有明确的审批流程,艺术总监或首席艺术家在每个阶段签字后,资产才会进入下一个阶段。理想情况下,资产每个阶段只进行一次,但在实践中可能会进行需要重新提交的更改。
Typically the art asset creation process starts by modeling the object geometry. This step is performed in a general-purpose modeling package such as Maya, MAX or Softimage. The modeled geometry will be passed directly to the game engine, so it is important to minimize vertex count while preserving good silhouettes. Character meshes must also be constructed so as to be amenable to animation.
通常,艺术资产创建过程从建模对象几何体开始。此步骤在通用建模软件包(如 Maya、MAX 或 Softimage)中执行。建模的几何体将直接传递到游戏引擎,因此在保留良好轮廓的同时尽量减少顶点数量非常重要。角色网格也必须构建以适合动画。
In this stage, a two-dimensional surface parameterization for textures is usually created. It is important that this parameterization be highly continuous, since discontinuities require vertex duplication and may cause filtering artifacts. An example of a mesh with its associated texture parameterization is shown in Figure 22.5.
在此阶段,通常会创建纹理的二维表面参数化。重要的是,此参数化必须高度连续,因为不连续性需要顶点重复,并可能导致过滤伪影。图 22.5显示了网格及其相关纹理参数化的示例。
Figure 22.5. A mesh being modeled in Maya, with associated texture parameterization. Image courtesy Keith Bruns.
图 22.5.在 Maya 中建模的网格,带有相关的纹理参数化。图片由 Keith Bruns 提供。
In the past, texturing was a straightforward process of painting a color texture, typically in Photoshop. Now, specialized detail modeling packages such as ZBrush or Mudbox are commonly used to sculpt fine surface detail. Figures 22.6 and 22.7 show an example of this process.
过去,纹理处理是绘制彩色纹理的简单过程,通常在 Photoshop 中完成。现在,通常使用 ZBrush 或 Mudbox 等专门的细节建模软件来雕刻精细的表面细节。图 22.6和22.7显示了此过程的一个示例。
Figure 22.6. The mesh from Figure 22.5 has been brought into ZBrush for detail modeling. Image courtesy Keith Bruns.
图 22.6。图 22.5中的网格已导入 ZBrush 进行详细建模。图片由 Keith Bruns 提供。
Figure 22.7. The mesh from Figure 22.6, with fine detail added to it in ZBrush. Image courtesy Keith Bruns.
图 22.7。图 22.6中的网格,在 ZBrush 中添加了精细细节。图片由 Keith Bruns 提供。
If this additional detail were to be represented with actual geometry, millions of triangles would be needed. Instead, the detail is commonly “baked” into a normal map which is applied onto the original, coarse mesh, as shown in Figures 22.8 and 22.9.
如果要用实际几何体来表示这些额外的细节,则需要数百万个三角形。相反,细节通常被“烘焙”到法线贴图中,然后应用到原始的粗糙网格上,如图 22.8和22.9所示。
Figure 22.8. A visualization (in ZBrush) of the mesh from Figure 22.6, rendered with a normal map derived from the detailed mesh in Figure 22.7. The bottom of the figure shows the interface for ZBrush’s “Zmapper” tool, which was used to derive the normal map. Image courtesy Keith Bruns.
图 22.8。图 22.6中的网格的可视化(在 ZBrush 中),使用从图 22.7中的细节网格派生的法线贴图进行渲染。图的底部显示了 ZBrush 的“Zmapper”工具的界面,该工具用于派生法线贴图。图片由 Keith Bruns 提供。
Figure 22.9. The normal map used in Figure 22.8. In this image, the red, green, and blue channels of the texture contain the X, Y, and Z coordinates of the surface normals. Image courtesy Keith Bruns.
图 22.9。图 22.8中使用的法线贴图。在此图中,纹理的红色、绿色和蓝色通道包含表面法线的 X、Y 和 Z 坐标。图片由 Keith Bruns 提供。
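Storing a normal in a color texture requires mapping each signed component into the range a texel can hold. The sketch below assumes the widely used convention of remapping [-1, 1] to [0, 255] per channel. The text only states that the channels contain the X, Y, and Z coordinates, so this exact mapping and the function names are assumptions.

```python
def encode_normal(n):
    """Map a unit normal with components in [-1, 1] to 8-bit RGB."""
    return tuple(int((c * 0.5 + 0.5) * 255 + 0.5) for c in n)

def decode_normal(rgb):
    """Recover an approximate unit normal from 8-bit RGB."""
    v = [c / 255 * 2 - 1 for c in rgb]
    length = sum(c * c for c in v) ** 0.5
    return tuple(c / length for c in v)  # renormalize after quantization

flat = encode_normal((0.0, 0.0, 1.0))  # a normal pointing straight out of the surface
print(flat)
print(decode_normal(flat))
```

With this mapping, an undisturbed normal encodes to a mostly blue color, which would explain the bluish appearance typical of normal maps stored relative to the surface.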
Besides normal maps, multiple textures containing surface properties such as diffuse color, specular color, and smoothness (specular power) are also created. These are either painted directly on the surface in the detail modeling application, or in a two-dimensional application such as Photoshop. All of these texture maps use the surface parameterization defined in the initial modeling phase. When the texture is painted in a two-dimensional painting application, the artist must frequently switch between the painting application and some other application which can show a three-dimensional rendering of the object with the texture applied. This iterative process is illustrated in Figures 22.10, 22.11, 22.12, and 22.13.
除了法线贴图之外,还会创建包含表面属性(例如漫反射颜色、镜面反射颜色和平滑度(镜面反射强度))的多个纹理。这些纹理要么直接在细节建模应用程序中绘制在表面上,要么在 Photoshop 等二维应用程序中绘制。所有这些纹理贴图都使用在初始建模阶段定义的表面参数化。当在二维绘画应用程序中绘制纹理时,艺术家必须频繁在绘画应用程序和其他可以显示应用了纹理的对象的三维渲染的应用程序之间切换。图 22.10、22.11、22.12和22.13说明了此迭代过程。
Figure 22.10. An early version of a diffuse color texture for the mesh from Figure 22.8, shown in Photoshop. Image courtesy Keith Bruns.
图 22.10。图 22.8中网格的漫反射颜色纹理的早期版本,在 Photoshop 中显示。图片由 Keith Bruns 提供。
Figure 22.11. A rendering (in ZBrush) of the mesh with normal map and early diffuse color texture (from Figure 22.10) applied. Image courtesy Keith Bruns.
图 22.11。应用了法线贴图和早期漫反射颜色纹理(见图 22.10 )的网格渲染(在 ZBrush 中)。图片由 Keith Bruns 提供。
Figure 22.12. Final version of the color texture from Figure 22.10. Image courtesy Keith Bruns.
图 22.12。图 22.10中颜色纹理的最终版本。图片由 Keith Bruns 提供。
Figure 22.13. Rendering of the mesh with normal map and final color texture (from Figure 22.12) applied. Image courtesy Keith Bruns.
图 22.13。使用法线贴图和最终颜色纹理(来自图 22.12 )渲染网格。图片由 Keith Bruns 提供。
Shaders are typically applied in the same application used for initial modeling. In this process, a shader (from the set of shaders defined for that game) is applied to the mesh. The various textures resulting from the detail modeling stage are applied as inputs to this shader, using the surface parameterization defined during initial modeling. Various other shader inputs are set via visual experimentation (“tweaking”); see Figure 22.14.
着色器通常应用于用于初始建模的同一应用程序中。在此过程中,着色器(来自为该游戏定义的着色器集)应用于网格。细节建模阶段产生的各种纹理作为此着色器的输入,使用初始建模期间定义的表面参数化。各种其他着色器输入通过视觉实验(“调整”)进行设置;参见图 22.14 。
Figure 22.14. Shader configuration in Maya. The interface on the right is used to select the shader, assign textures to shader inputs, and set the values of non-texture shader inputs (such as the “Specular Color” and “Specular Power” sliders). The rendering on the left is updated dynamically while these properties are modified, enabling immediate visual feedback. Image courtesy Keith Bruns.
图 22.14。Maya中的着色器配置。右侧的界面用于选择着色器、将纹理分配给着色器输入以及设置非纹理着色器输入的值(例如“镜面颜色”和“镜面强度”滑块)。修改这些属性时,左侧的渲染会动态更新,从而实现即时的视觉反馈。图片由 Keith Bruns 提供。
In the case of background scenery, lighting artists will typically start their work after modeling, texturing, and shading have been completed. Light sources are placed and their effect computed in a preprocessing step. The results of this process are stored in lightmaps for later use by the rendering engine.
对于背景场景,光照艺术家通常会在建模、纹理和着色完成后开始工作。在预处理步骤中放置光源并计算其效果。此过程的结果存储在光照贴图中,以供渲染引擎稍后使用。
Character meshes undergo several additional steps related to animation. The primary method used to animate game characters is skinning. This requires a rig, consisting of a hierarchy of transform nodes that is attached to the character, a process known as rigging. The area of effect of each transform node is painted onto a subset of mesh vertices. Finally, animators create animations that move, rotate, and scale these transform nodes, “dragging” the mesh behind them.
角色网格需要经过几个与动画相关的额外步骤。为游戏角色制作动画的主要方法是蒙皮。这需要一个装备,由附加到角色的变换节点层次组成,这个过程称为装备。每个变换节点的效果区域被绘制到网格顶点的子集上。最后,动画师创建移动、旋转和缩放这些变换节点的动画,将网格“拖”到它们后面。
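One common way to realize this kind of skinning is linear blend skinning, where each vertex is transformed by every node that influences it and the results are averaged using the painted weights. The two-dimensional sketch below assumes that technique. The affine tuple layout `(a, b, c, d, tx, ty)`, the function names, and the example bones are hypothetical, and a real rig would use 3D matrices that also account for each node's rest pose.

```python
def apply_affine(m, p):
    """Apply a 2D affine transform (a, b, c, d, tx, ty) to point p."""
    a, b, c, d, tx, ty = m
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)

def skin_vertex(rest_pos, influences, bone_transforms):
    """Blend the vertex as moved by each influencing bone.

    influences: list of (bone_index, weight) pairs whose weights sum to 1.
    """
    x = y = 0.0
    for bone, weight in influences:
        px, py = apply_affine(bone_transforms[bone], rest_pos)
        x += weight * px
        y += weight * py
    return (x, y)

# Two hypothetical bones: one stays in place, one is raised by 2 units.
bones = [
    (1.0, 0.0, 0.0, 1.0, 0.0, 0.0),  # identity
    (1.0, 0.0, 0.0, 1.0, 0.0, 2.0),  # translate by (0, 2)
]

# A vertex near the joint, influenced equally by both bones.
print(skin_vertex((1.0, 0.0), [(0, 0.5), (1, 0.5)], bones))
```

The vertex ends up halfway between where each bone alone would have placed it, which is what lets the mesh bend smoothly around a joint.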
A typical game character will have many dozens of animations, corresponding to different modes of motion (walking, running, turning) as well as different actions such as attacks. In the case of a main character, the number of animations can be in the hundreds. Transitions between different animations also need to be defined.
典型的游戏角色会有几十个动画,对应不同的运动模式(行走、跑步、转身)以及不同的动作(如攻击)。对于主角来说,动画的数量可能多达数百个。还需要定义不同动画之间的过渡。
For facial animation, another technique, called morph targets, is sometimes employed. In this technique, the mesh vertices are directly manipulated to deform the mesh. Different copies of the deformed mesh are stored (e.g., for different facial expressions) and combined by the game engine at runtime. The creation of morph targets is shown in Figure 22.15.
对于面部动画,有时会采用另一种称为变形目标的技术。在此技术中,直接操纵网格顶点以使网格变形。变形网格的不同副本被存储(例如,用于不同的面部表情)并由游戏引擎在运行时组合。变形目标的创建如图 22.15所示。
Figure 22.15. Morph target interface in Maya. The bottom row shows four different morph targets, and the model at the top shows the effects of combining several morph targets together. The interface at the upper left is used to control the degree to which each morph target is applied. Image courtesy Keith Bruns.
图 22.15. Maya 中的变形目标界面。底部一行显示四个不同的变形目标,顶部的模型显示将多个变形目标组合在一起的效果。左上角的界面用于控制每个变形目标的应用程度。图片由 Keith Bruns 提供。
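Combining morph targets at runtime is often done by adding weighted per-vertex offsets from the neutral mesh. The sketch below assumes that additive formulation. The function name, the two-vertex "mesh", and the target names are made up for illustration.

```python
def blend_morph_targets(base, targets, weights):
    """Return base + sum_i weights[i] * (targets[i] - base), per vertex."""
    result = []
    for i, base_v in enumerate(base):
        v = list(base_v)
        for target, w in zip(targets, weights):
            for k in range(len(v)):
                v[k] += w * (target[i][k] - base_v[k])
        result.append(tuple(v))
    return result

# Hypothetical two-vertex mesh with two stored deformations.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
smile = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]   # moves only vertex 0
blink = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]   # moves only vertex 1

# Half a smile combined with a full blink.
blended = blend_morph_targets(neutral, [smile, blink], [0.5, 1.0])
print(blended)
```

The weights play the role of the sliders in an authoring interface: each one controls the degree to which its target is applied, and several can be active at once.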
There is a huge amount of information on real-time rendering and game programming available, both in books and online. Here are some resources I can recommend from personal familiarity.
书籍和网上都有很多关于实时渲染和游戏编程的信息。以下是我根据个人经验推荐的一些资源。
Game Developer Magazine is a good source of information on game development, as are slides from the talks given at the annual Game Developers Conference (GDC) and Microsoft’s Gamefest conference. The GPU Gems and ShaderX book series also contain good information—all of the former and the first two of the latter are also available online.
《游戏开发者杂志》是有关游戏开发的良好信息来源,年度游戏开发者大会(GDC) 和 Microsoft 的Gamefest大会上的演讲幻灯片也是如此。 《GPU Gems 》和《ShaderX 》丛书也包含很好的信息 - 前者的全部内容以及后者的前两本也可以在线获取。
Eric Lengyel’s Mathematics for 3D Game Programming & Computer Graphics, now in its second edition, is a good reference for the various types of math used in graphics and games. A specific area of game programming that is closely related to graphics is collision detection, for which Christer Ericson’s Real-Time Collision Detection is the definitive resource.
Eric Lengyel 的《3D 游戏编程与计算机图形学的数学》现已出到第二版,是图形学和游戏中使用的各种数学的良好参考资料。游戏编程中与图形学密切相关的一个特定领域是碰撞检测,而 Christer Ericson 的《实时碰撞检测》是该领域的权威资源。
Since its first edition in 1999, Eric Haines and Tomas Akenine-Möller’s Real-Time Rendering has endeavored to cover this fast-growing field in a thorough manner. As a longtime fan of this book, I was glad to have the opportunity to be a coauthor on the third edition, which came out in mid-2008.
自 1999 年出版第一版以来,Eric Haines 和 Tomas Akenine-Möller 合著的《实时渲染》一直致力于全面介绍这一快速发展的领域。作为这本书的长期粉丝,我很高兴有机会成为 2008 年年中出版的第三版的合著者。
Reading is not enough—make sure you play a variety of games regularly to get a good idea of the requirements of various game types, as well as the current state of the art.
光阅读是不够的——确保你定期玩各种游戏,这样才能充分了解各种游戏类型的要求以及当前的技术水平。
1. Examine the visuals of two dissimilar games. What differences can you deduce in the graphics requirements of these two games? Analyze the effect on rendering time, storage budgets, etc.
1.检查两款不同游戏的视觉效果。你能推断出这两款游戏的图形要求有哪些不同?分析对渲染时间、存储预算等的影响。
Tamara Munzner
A major application area of computer graphics is visualization, where computer-generated images are used to help people understand both spatial and nonspatial data. Visualization is used when the goal is to augment human capabilities in situations where the problem is not sufficiently well defined for a computer to handle algorithmically. If a totally automatic solution can completely replace human judgment, then visualization is not typically required. Visualization can be used to generate new hypotheses when exploring a completely unfamiliar dataset, to confirm existing hypotheses in a partially understood dataset, or to present information about a known dataset to another audience.
计算机图形学的一个主要应用领域是可视化,其中计算机生成的图像用于帮助人们理解空间和非空间数据。当目标是在问题定义不够明确而计算机无法通过算法处理的情况下增强人类能力时,就会使用可视化。如果完全自动化的解决方案可以完全取代人类判断,那么通常就不需要可视化了。可视化可用于在探索完全不熟悉的数据集时生成新假设,在部分理解的数据集中确认现有假设,或向其他受众展示有关已知数据集的信息。
Visualization allows people to offload cognition to the perceptual system, using carefully designed images as a form of external memory. The human visual system is a very high-bandwidth channel to the brain, with a significant amount of processing occurring in parallel and at the pre-conscious level. We can thus use external images as a substitute for keeping track of things inside our own heads. As an example, let us consider the task of understanding the relationships between a subset of the topics in the splendid book Gödel, Escher, Bach: An Eternal Golden Braid (Hofstadter, 1979); see Figure 23.1.
可视化使人们能够将认知转移到感知系统,使用精心设计的图像作为外部记忆的形式。人类视觉系统是通往大脑的带宽非常高的通道,其中大量的处理是并行进行的,并且是在前意识层面。因此,我们可以用外部图像来代替在我们自己的头脑中跟踪事物。例如,让我们考虑理解精彩书籍《哥德尔、埃舍尔、巴赫:永恒的金辫》 (Hofstadter,1979)中主题子集之间的关系的任务;见图23.1 。
Figure 23.1. Keeping track of relationships between topics is difficult using a text list.
图 23.1.使用文本列表来跟踪主题之间的关系很困难。
When we see the dataset as a text list, at the low level we must read words and compare them to memories of previously read words. It is hard to keep track of just these dozen topics using cognition and memory alone, let alone the hundreds of topics in the full book. The higher-level problem of identifying neighborhoods, for instance finding all the topics two hops away from the target topic Paradoxes, is very difficult.
当我们将数据集视为文本列表时,在低级层面上,我们必须阅读单词并将它们与之前阅读的单词的记忆进行比较。仅使用认知和记忆很难跟踪这十几个主题,更不用说整本书中的数百个主题了。识别邻域的高级问题(例如找到距离目标主题Paradoxes两跳的所有主题)非常困难。
Figure 23.2 shows an external visual representation of the same dataset as a node-link graph, where each topic is a node and the linkage between two topics is shown directly with a line. Following the lines by moving our eyes around the image is a fast low-level operation with minimal cognitive load, so higher-level neighborhood finding becomes possible. The placement of the nodes and the routing of the links between them were created automatically by the dot graph drawing program (Gansner, Koutsofios, North, & Vo, 1993).
图 23.2显示了同一数据集的外部视觉表示,即节点链接图,其中每个主题都是一个节点,两个主题之间的链接直接用一条线显示。通过在图像周围移动眼睛来跟随线条是一种快速的低级操作,认知负荷最小,因此可以进行更高级别的邻域查找。节点的放置和它们之间的链接的路由是由 dot 图绘制程序自动创建的(Gansner、Koutsofios、North 和 Vo,1993 年)。
Figure 23.2. Substituting perception for cognition and memory allows us to understand relationships between book topics quickly.
图 23.2.用感知代替认知和记忆使我们能够快速理解书籍主题之间的关系。
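For comparison with the perceptual approach, the neighborhood query mentioned above (all topics two hops away from a target) can also be answered algorithmically with a breadth-first search over the topic graph. The sketch below uses a small made-up edge list. Apart from the target name Paradoxes, the topic labels and links are placeholders and do not reproduce the dataset in the figures.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def topics_at_hops(graph, start, hops):
    """Return the topics whose shortest path from start has exactly `hops` links."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # no need to expand beyond the requested radius
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return sorted(t for t, d in dist.items() if d == hops)

# Placeholder links between topics (not the data shown in the figures).
edges = [
    ("Paradoxes", "A"), ("Paradoxes", "B"), ("A", "B"),
    ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
]
graph = build_graph(edges)
print(topics_at_hops(graph, "Paradoxes", 2))
```

The search makes explicit the bookkeeping (which topics have been seen, and at what distance) that a reader of the text list would otherwise have to hold in memory.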
We call the mapping of dataset attributes to a visual representation a visual encoding. One of the central problems in visualization is choosing appropriate encodings from the enormous space of possible visual representations, taking into account the characteristics of the human perceptual system, the dataset in question, and the task at hand.
我们将数据集属性到视觉表示的映射称为视觉编码。可视化的核心问题之一是从巨大的可能的视觉表示空间中选择合适的编码,同时考虑到人类感知系统、相关数据集和当前任务的特点。
People have a long history of conveying meaning through static images, dating back to the oldest known cave paintings from over thirty thousand years ago. We continue to visually communicate today in ways ranging from rough sketches on the back of a napkin to the slick graphic design of advertisements. For thousands of years, cartographers have studied the problem of making maps that represent some aspect of the world around us. The first visual representations of abstract, nonspatial datasets were created in the 18th century by William Playfair (Friendly, 2008).
人类通过静态图像传达意义的历史悠久,可以追溯到三万多年前已知最古老的洞穴壁画。今天,我们仍然以各种方式进行视觉交流,从餐巾纸背面的草图到广告的精美图形设计。数千年来,制图师一直在研究如何制作代表我们周围世界某些方面的地图。威廉·普莱费尔 (William Playfair) 在 18 世纪创建了第一个抽象的非空间数据集的视觉表示(Friendly,2008 年)。
Although we have had the power to create moving images for over one hundred and fifty years, creating dynamic images interactively is a more recent development only made possible by the widespread availability of fast computer graphics hardware and algorithms in the past few decades. Static visualizations of tiny datasets can be created by hand, but computer graphics enables interactive visualization of large datasets.
尽管我们有能力创建动态图像已有一百五十多年的历史,但以交互方式创建动态图像却是最近才取得的进展,这要归功于过去几十年来快速计算机图形硬件和算法的广泛普及。微小数据集的静态可视化可以手动创建,但计算机图形可以实现大型数据集的交互式可视化。
When designing a visualization system, we must consider three different kinds of limitations: computational capacity, human perceptual and cognitive capacity, and display capacity.
在设计可视化系统时,我们必须考虑三种不同的限制:计算能力、人类的感知和认知能力以及显示能力。
As with any application of computer graphics, computer time and memory are limited resources and we often have hard constraints. If the visualization system needs to deliver interactive response, then it must use algorithms that can run in a fraction of a second rather than minutes or hours.
与任何计算机图形应用一样,计算机时间和内存都是有限的资源,而且我们经常受到严格限制。如果可视化系统需要提供交互式响应,那么它必须使用可以在几分之一秒内而不是几分钟或几小时内运行的算法。
On the human side, memory and attention must be considered as finite resources. Human memory is notoriously limited, both for long-term recall and for shorter-term working memory. In Section 23.4, we discuss some of the power and limitations of the low-level visual attention mechanisms that carry out massively parallel processing of the visual field. We store surprisingly little information internally in visual working memory, leaving us vulnerable to change blindness, the phenomenon where even very large changes are not noticed if we are attending to something else in our view (Simons, 2000). Moreover, vigilance is also a highly limited resource; our ability to perform visual search tasks degrades quickly, with far worse results after several hours than in the first few minutes (Ware, 2000).
从人类的角度来看,记忆和注意力必须被视为有限的资源。人类的记忆是出了名的有限,无论是长期回忆还是短期工作记忆。在第 23.4 节中,我们讨论了对视野进行大规模并行处理的低级视觉注意机制的一些功能和局限性。我们在视觉工作记忆中存储的信息少得惊人,这让我们容易受到变化视盲的影响,这种现象是指如果我们注意视野中的其他东西,即使是很大的变化也不会被注意到 (Simons, 2000)。此外,警觉性也是一种非常有限的资源;我们执行视觉搜索任务的能力会迅速下降,几个小时后的结果比最初几分钟要差得多 (Ware, 2000)。
Display capacity is a third kind of limitation to consider. Visualization designers often “run out of pixels,” where the resolution of the screen is not large enough to show all desired information simultaneously. The information density of a particular frame is a measure of the amount of information encoded versus the amount of unused space. There is a tradeoff between the benefits of showing as much as possible at once, to minimize the need for navigation and exploration, and the costs of showing too much at once, where the user is overwhelmed by visual clutter.
显示容量是需要考虑的第三种限制。可视化设计师经常会“用尽像素”,即屏幕分辨率不够大,无法同时显示所有所需信息。特定帧的信息密度是编码信息量与未使用空间量的度量。在一次显示尽可能多的信息以最大限度地减少导航和探索需求的好处与一次显示太多信息的成本(用户会被视觉混乱所淹没)之间需要权衡。
Many aspects of a visualization design are driven by the type of the data that we need to look at. For example, is it a table of numbers, a set of relations between items, inherently spatial data such as locations on the Earth’s surface, or a collection of documents?
可视化设计的许多方面都取决于我们需要查看的数据类型。例如,它是数字表、项目之间的关系集,还是固有的空间数据(如地球表面的位置或文档集合)?
We start by considering a table of data. We call the rows items of data and the columns dimensions, also known as attributes. For example, the rows might represent people, and the columns might be names, age, height, shirt size, and favorite fruit.
我们首先考虑一个数据表。我们将行称为数据项,将列称为维度,也称为属性。例如,行可能代表人,列可能是姓名、年龄、身高、衬衫尺码和最喜欢的水果。
We distinguish between three types of dimensions: quantitative, ordered, and categorical. Quantitative data, such as age or height, is numerical and we can do arithmetic on it. For example, the quantity of 68 inches minus 42 inches is 26 inches. With ordered data, such as shirt size, we cannot do full-fledged arithmetic, but there is a well-defined ordering. For example, large minus medium is not a meaningful concept, but we know that medium falls between small and large. Categorical data, such as favorite fruit or names, does not have an implicit ordering. We can only distinguish whether two things are the same (apples) or different (apples vs. bananas).
我们区分三种类型的维度:定量、有序和分类。定量数据(例如年龄或身高)是数字,我们可以对其进行算术运算。例如,68 英寸减去 42 英寸等于 26 英寸。对于有序数据(例如衬衫尺寸),我们无法进行全面的算术运算,但存在明确定义的排序。例如,大号减中号不是一个有意义的概念,但我们知道中号介于小号和大号之间。分类数据(例如最喜欢的水果或名字)没有隐含的排序。我们只能区分两件东西是相同(苹果)还是不同(苹果与香蕉)。
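The distinction between the three dimension types can be made concrete by looking at which operations are meaningful for each. The following is a minimal sketch, not a standard API: the function names and the `SHIRT_SIZES` ordering are assumptions introduced for illustration.

```python
# Hypothetical ordering for an ordered dimension (shirt size).
SHIRT_SIZES = ["small", "medium", "large"]


def difference(a, b):
    """Quantitative data: arithmetic is meaningful."""
    return a - b


def is_between(value, low, high, ordering):
    """Ordered data: only the relative order is meaningful."""
    rank = ordering.index
    return rank(low) < rank(value) < rank(high)


def same(a, b):
    """Categorical data: only equality is meaningful."""
    return a == b


print(difference(68, 42))                                   # 26 (inches)
print(is_between("medium", "small", "large", SHIRT_SIZES))  # True
print(same("apple", "banana"))                              # False
```

Note that nothing stops a program from subtracting two shirt sizes stored as integers; the type of a dimension is a statement about its semantics, not about how it happens to be stored.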
Relational data, or graphs, are another data type where nodes are connected by links. One specific kind of graph is a tree, which is typically used for hierarchical data. Both nodes and edges can have associated attributes. The word graph is unfortunately overloaded in visualization. The node-link graphs we discuss here, following the terminology of graph drawing and graph theory, could also be called networks. In the field of statistical graphics, graph is often used for chart, as in the line charts for time-series data shown in Figure 23.10.
关系数据或图形是另一种数据类型,其中节点通过链接连接。一种特定的图形是树,通常用于分层数据。节点和边都可以具有关联属性。不幸的是, “图形”一词在可视化中被过度使用。我们在此讨论的节点链接图,按照图形绘制和图论的术语,也可以称为网络。在统计图形领域,图形通常用于图表,如图 23.10所示的时间序列数据折线图。
Some data is inherently spatial, such as geographic location or a field of measurements at positions in three-dimensional space as in the MRI or CT scans used by doctors to see the internal structure of a person’s body. The information associated with each point in space may be an unordered set of scalar quantities, or indexed vectors, or tensors. In contrast, nonspatial data can be visually encoded using spatial position, but that encoding is chosen by the designer rather than given implicitly in the semantics of the dataset itself. This choice is one of the most central and difficult problems of visualization design.
有些数据本质上是空间数据,例如地理位置或三维空间中位置的测量场,如医生用来查看人体内部结构的 MRI 或 CT 扫描。与空间中每个点相关的信息可能是一组无序的标量、索引向量或张量。相比之下,非空间数据可以使用空间位置进行视觉编码,但该编码由设计者选择,而不是在数据集本身的语义中隐含给出。这种选择是可视化设计中最核心和最困难的问题之一。
The number of data dimensions that need to be visually encoded is one of the most fundamental aspects of the visualization design problem. Techniques that work for a low-dimensional dataset with a few columns will often fail for very high-dimensional datasets with dozens or hundreds of columns. A data dimension may have hierarchical structure, for example with a time series dataset where there are interesting patterns at multiple temporal scales.
需要进行视觉编码的数据维度数量是可视化设计问题的最基本方面之一。适用于具有几列的低维数据集的技术通常不适用于具有数十或数百列的高维数据集。数据维度可能具有层次结构,例如时间序列数据集,其中在多个时间尺度上存在有趣的模式。
The number of data items is also important: a visualization that performs well for a few hundred items often does not scale to millions of items. In some cases the difficulty is purely algorithmic, where a computation would take too long; in others it is an even deeper perceptual problem that even an instantaneous algorithm could not solve, where visual clutter makes the representation unusable by a person. The range of possible values within a dimension may also be relevant.
数据项的数量也很重要:对于几百个项目来说,效果良好的可视化通常无法扩展到数百万个项目。在某些情况下,困难纯粹是算法上的,计算会花费太长时间;在其他情况下,这是一个更深层次的感知问题,即使是即时算法也无法解决,视觉混乱会使人无法使用表示。维度内可能值的范围也可能相关。
Data is often transformed from one type to another as part of a visualization pipeline for solving the domain problem. For example, an original data dimension might be made up of quantitative data: floating point numbers that represent temperature. For some tasks, like finding anomalies in local weather patterns, the raw data might be used directly. For another task, like deciding whether water is an appropriate temperature for a shower, the data might be transformed into an ordered dimension: hot, warm, or cold. In this transformation, most of the detail is aggregated away. In a third example, when making toast, an even more lossy transformation into a categorical dimension might suffice: burned or not burned.
作为解决领域问题的可视化流程的一部分,数据通常会从一种类型转换为另一种类型。例如,原始数据维度可能由定量数据组成:表示温度的浮点数。对于某些任务,如查找当地天气模式中的异常,原始数据可能会直接使用。对于另一项任务,如确定水温是否适合淋浴,数据可能会转换为有序维度:热、温或冷。在这种转换中,大部分细节都被聚合掉了。在第三个例子中,当烤面包时,将损失更大的维度转换为分类维度可能就足够了:烤焦或没烤焦。
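The temperature example can be sketched as two small functions, one producing an ordered dimension and one producing a categorical dimension. All threshold values below are assumptions chosen only for illustration; the text does not specify them.

```python
def shower_level(temp_c, cold_below=30.0, hot_above=42.0):
    """Quantitative -> ordered: collapse a temperature into cold < warm < hot.

    The two thresholds are hypothetical defaults, not values from the text.
    """
    if temp_c < cold_below:
        return "cold"
    if temp_c > hot_above:
        return "hot"
    return "warm"


def toast_state(temp_c, burn_above=200.0):
    """Quantitative -> categorical: an even more lossy transformation.

    The burn threshold is a hypothetical value.
    """
    return "burned" if temp_c > burn_above else "not burned"


readings = [18.5, 36.0, 47.2]
print([shower_level(t) for t in readings])  # ['cold', 'warm', 'hot']
print(toast_state(230.0))                   # burned
```

Both functions discard detail on purpose: many different raw readings map to the same derived value, which is exactly the aggregation described above.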
The principle of transforming data into derived dimensions, rather than simply visually encoding the data in its original form, is a powerful idea. In Figure 23.10, the original data was an ordered collection of time-series curves. The transformation was to cluster the data, reducing the amount of information to visually encode to a few highly meaningful curves.
将数据转化为派生的原理维度,而不是简单地以原始形式对数据进行视觉编码,这是一个很强大的想法。在图 23.10中,原始数据是时间序列曲线的有序集合。转换是对数据进行聚类,将要视觉编码的信息量减少为几条非常有意义的曲线。
The visualization design process can be split into a cascading set of layers, as shown in Figure 23.3. These layers all depend on each other; the output of the layer above is the input to the layer below.
可视化设计过程可以拆分为一组级联的层,如图 23.3所示。这些层都相互依赖;上一级的输出是下一级的输入。
Figure 23.3. Four nested layers of validation for visualization.
图 23.3.用于可视化的四个嵌套验证层。
A given dataset has many possible visual encodings. Choosing which visual encoding to use can be guided by the specific needs of some intended user. Different questions, or tasks, require very different visual encodings. For example, consider the domain of software engineering. The task of understanding the coverage of a test suite is well supported by the Tarantula interface shown in Figure 23.11. However, the task of understanding the modular decomposition of the software while refactoring the code might be better served by showing its hierarchical structure more directly as a node-link graph.
给定的数据集有许多可能的视觉编码。选择使用哪种视觉编码可以由某些目标用户的特定需求来指导。不同的问题或任务需要非常不同的视觉编码。例如,考虑软件工程领域。图 23.11所示的 Tarantula 界面很好地支持了理解测试套件覆盖率的任务。但是,在重构代码时理解软件的模块化分解的任务可能更适合将其层次结构更直接地显示为节点链接图。
Understanding the requirements of some target audience is a tricky problem. In a human-centered design approach, the visualization designer works with a group of target users over time (C. Lewis & Rieman, 1993). In most cases, users know they need to somehow view their data but cannot directly articulate their needs as clear-cut tasks in terms of operations on data types. The iterative design process includes gathering information from the target users about their problems through interviews and observation of them at work, creating prototypes, and observing how users interact with those prototypes to see how well the proposed solution actually works. The software engineering methodology of requirements analysis can also be useful (Kovitz, 1999).
了解某些目标受众的需求是一个棘手的问题。在以人为本的设计方法中,可视化设计师会与一组目标用户长期合作(C. Lewis & Rieman,1993)。在大多数情况下,用户知道他们需要以某种方式查看他们的数据,但无法直接将他们的需求表达为对数据类型的操作方面的明确任务。迭代设计过程包括通过采访和观察目标用户的工作情况来收集有关他们问题的信息,创建原型,并观察用户如何与这些原型交互,以了解所提出的解决方案的实际效果。需求分析的软件工程方法也很有用(Kovitz,1999)。
After the specific domain problem has been identified in the first layer, the next layer requires abstracting it into a more generic representation as operations on the data types discussed in the previous section. Problems from very different domains can map to the same visualization abstraction. These generic operations include sorting, filtering, characterizing trends and distributions, finding anomalies and outliers, and finding correlation (Amar, Eagan, & Stasko, 2005). They also include operations that are specific to a particular data type, for example following a path for relational data in the form of graphs or trees.
在第一层确定了特定领域问题之后,下一层需要将其抽象为更通用的表示,作为上一节讨论的数据类型的操作。来自不同领域的问题可以映射到相同的可视化抽象。这些通用操作包括排序、过滤、描述趋势和分布、查找异常和离群值以及查找相关性(Amar、Eagan 和 Stasko,2005 年)。它们还包括特定于特定数据类型的操作,例如以图形或树的形式跟踪关系数据的路径。
This abstraction step often involves data transformations from the original raw data into derived dimensions. These derived dimensions are often of a different type than the original data: a graph may be converted into a tree, tabular data may be converted into a graph by using a threshold to decide whether a link should exist based on the field values, and so on.
此抽象步骤通常涉及将原始数据转换为派生维度。这些派生维度通常与原始数据的类型不同:图形可以转换为树,表格数据可以使用阈值转换为图形,以根据字段值确定是否存在链接,等等。
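As an illustration of deriving a graph from tabular data, the sketch below links two items whenever their values for one field differ by no more than a threshold. The function name, the field names, and the particular linking rule are assumptions; many other rules based on field values are possible.

```python
from itertools import combinations


def table_to_graph(rows, key, field, threshold):
    """Derive graph edges from a table.

    Two rows are linked when their `field` values differ by at most
    `threshold`. Returns a list of (key, key) pairs.
    """
    edges = []
    for a, b in combinations(rows, 2):
        if abs(a[field] - b[field]) <= threshold:
            edges.append((a[key], b[key]))
    return edges


people = [
    {"name": "Ann", "height": 68},
    {"name": "Bob", "height": 70},
    {"name": "Cy", "height": 42},
]

print(table_to_graph(people, key="name", field="height", threshold=3))
# [('Ann', 'Bob')]
```

The threshold controls the density of the derived graph: a larger value yields more links and, eventually, more visual clutter when the graph is drawn.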
Once an abstraction has been chosen, the next layer is to design appropriate visual encoding and interaction techniques. Section 23.4 covers the principles of visual encoding, and we discuss interaction principles in Section 23.5. We present techniques that take these principles into account in Sections 23.6 and 23.7.
一旦选择了抽象,下一层就是设计适当的视觉编码和交互技术。第 23.4 节介绍了视觉编码的原理,第 23.5 节讨论了交互原理。第 23.6和23.7节介绍了考虑到这些原理的技术。
A detailed discussion of visualization algorithms is unfortunately beyond the scope of this chapter.
不幸的是,可视化算法的详细讨论超出了本章的范围。
Each of the four layers has different validation requirements.
四个层中的每一层都有不同的验证要求。
The first layer is designed to determine whether the problem is correctly characterized: is there really a target audience performing particular tasks that would benefit from the proposed tool? An immediate way to test assumptions and conjectures is to observe or interview members of the target audience, to ensure that the visualization designer fully understands their tasks. A measurement that can be made only after a tool has been built and deployed is its adoption rate within that community, although of course many factors other than utility affect adoption.
第一层旨在确定问题是否被正确描述:是否真的有目标受众执行特定任务并从所提出的工具中受益?检验假设和推测的直接方法是观察或采访目标受众的成员,以确保可视化设计者完全了解他们的任务。在构建和部署工具之前无法进行的测量是监控该工具在该社区中的采用率,尽管当然除了实用性之外还有许多其他因素会影响采用率。
The next layer is used to determine whether the abstraction from the domain problem into operations on specific data types actually solves the desired problem. After a prototype or finished tool has been deployed, a field study can be carried out to observe whether and how it is used by its intended audience. Also, images produced by the system can be analyzed both qualitatively and quantitatively.
下一层用于确定从领域问题抽象为特定数据类型的操作是否真正解决了所需的问题。在部署原型或成品工具后,可以进行实地研究,以观察其目标受众是否以及如何使用它。此外,可以定性和定量分析系统生成的图像。
The purpose of the third layer is to verify that the visual encoding and interaction techniques chosen by the designer effectively communicate the chosen abstraction to the users. An immediate test is to justify that individual design choices do not violate known perceptual and cognitive principles. Such a justification is necessary but not sufficient, since visualization design involves many tradeoffs between interacting choices. After a system is built, it can be tested through formal laboratory studies where many people are asked to do assigned tasks so that measurements of the time required for them to complete the tasks and their error rates can be statistically analyzed.
第三层的目的是验证设计师选择的视觉编码和交互技术是否有效地将所选的抽象传达给用户。一个直接的测试是证明个人设计选择不违反已知的感知和认知原则。这种证明是必要的,但还不够,因为可视化设计涉及交互选择之间的许多权衡。系统构建后,可以通过正式的实验室研究对其进行测试,在实验室研究中,许多人被要求完成分配的任务,以便可以统计分析他们完成任务所需的时间和错误率。
A fourth layer is employed to verify that the algorithm designed to carry out the encoding and interaction choices is faster or takes less memory than previous algorithms. An immediate test is to analyze the computational complexity of the proposed algorithm. After implementation, the actual time performance and memory usage of the system can be directly measured.
第四层用于验证所设计的用于执行编码和交互选择的算法是否比以前的算法更快或占用更少的内存。一个直接的测试是分析所提算法的计算复杂度。实施后,可以直接测量系统的实际时间性能和内存使用情况。
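Direct measurement of running time and memory use can be sketched with the Python standard library. The `measure` helper and the `layout` stand-in below are hypothetical names introduced for illustration; a real study would repeat the measurement many times and on realistic datasets.

```python
import time
import tracemalloc


def measure(func, *args):
    """Return (result, elapsed seconds, peak traced bytes) for one call.

    Time and memory are measured in two separate runs, because memory
    tracing slows execution down.
    """
    start = time.perf_counter()
    result = func(*args)
    seconds = time.perf_counter() - start

    tracemalloc.start()
    func(*args)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, seconds, peak_bytes


def layout(n):
    """Hypothetical stand-in for an encoding algorithm: order n values."""
    return sorted(range(n, 0, -1))


result, seconds, peak_bytes = measure(layout, 100_000)
print(f"{seconds:.4f} s, peak {peak_bytes / 1024:.0f} KiB")
```

Such measurements complement, rather than replace, the complexity analysis: they reveal constant factors and memory behavior that asymptotic notation hides.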
We can describe visual encodings as graphical elements, called marks, that convey information through visual channels. A zero-dimensional mark is a point, a one-dimensional mark is a line, a two-dimensional mark is an area, and a three-dimensional mark is a volume. Many visual channels can encode information, including spatial position, color, size, shape, orientation, and direction of motion. Multiple visual channels can be used to simultaneously encode different data dimensions; for example, Figure 23.4 shows the use of horizontal and vertical spatial position, color, and size to display four data dimensions. More than one channel can be used to redundantly code the same dimension, for a design that displays less information but shows it more clearly.
我们可以将视觉编码描述为通过视觉通道传递信息的图形元素,称为标记。零维标记是一个点,一维标记是一条线,二维标记是一个区域,三维标记是一个体积。许多视觉通道可以编码信息,包括空间位置、颜色、大小、形状、方向和运动方向。可以使用多个视觉通道同时编码不同的数据维度;例如,图 23.4显示了使用水平和垂直空间位置、颜色和大小来显示四个数据维度。可以使用多个通道对同一维度进行冗余编码,以实现显示信息较少但更清晰的设计。
Figure 23.4. The four visual channels of horizontal and vertical spatial position, color, and size are used to encode information in this scatterplot chart. Image courtesy George Robertson (Robertson, Fernandez, Fisher, Lee, & Stasko, 2008), © 2008 IEEE.
图 23.4。水平和垂直空间位置、颜色和大小四个视觉通道用于在此散点图中编码信息。图片由 George Robertson (Robertson, Fernandez, Fisher, Lee, & Stasko, 2008) 提供,© IEEE 2008。
Important characteristics of visual channels are distinguishability, separability, and popout.
视觉通道的重要特征是可区分性、可分离性和突出性。
Channels are not all equally distinguishable. Many psychophysical experiments have been carried out to measure the ability of people to make precise distinctions about information encoded by the different visual channels. Our abilities depend on whether the data type is quantitative, ordered, or categorical. Figure 23.5 shows the rankings of visual channels for the three data types. Figure 23.6 shows some of the default mappings for visual channels in the Tableau/Polaris system, which take into account the data type.
并非所有通道都具有同等的可区分性。已经进行了许多心理物理实验来测量人们准确区分不同视觉通道编码信息的能力。我们的能力取决于数据类型是定量的、有序的还是分类的。图 23.5显示了三种数据类型的视觉通道排名。图 23.6显示了 Tableau/Polaris 系统中一些默认的视觉通道映射,这些映射考虑了数据类型。
Figure 23.5. Our ability to perceive information encoded by a visual channel depends on the type of data used, from most accurate at the top to least at the bottom. Redrawn and adapted from (Mackinlay, 1986).
图 23.5。我们感知视觉通道编码信息的能力取决于所使用的数据类型,从顶部最准确到底部最不准确。重绘并改编自 (Mackinlay, 1986) 。
Figure 23.6. The Tableau/Polaris system default mappings for four visual channels according to data type. Image courtesy Chris Stolte (Stolte, Tang, & Hanrahan, 2008), © 2008 IEEE.
图 23.6。Tableau /Polaris 系统根据数据类型默认映射四个视觉通道。图片由 Chris Stolte (Stolte、Tang 和 Hanrahan,2008) 提供,© 2008 IEEE。
Spatial position is the most accurate visual channel for all three types of data, and it dominates our perception of a visual encoding. Thus, the two most important data dimensions are often mapped to horizontal and vertical spatial positions.
空间位置是这三类数据最准确的视觉通道,主导着我们对视觉编码的感知。因此,最重要的两个数据维度通常被映射到水平和垂直空间位置。
However, the other channels differ strongly between types. The channels of length and angle are highly discriminable for quantitative data but poor for ordered and categorical, while in contrast hue is very accurate for categorical data but mediocre for quantitative data.
然而,其他通道在不同类型之间差异很大。长度和角度通道对于定量数据具有高度可辨性,但对于有序和分类数据则较差,而色调对于分类数据非常准确,但对于定量数据则一般。
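A designer's tool could store such rankings as a lookup table and consult it when proposing an encoding. The sketch below records only the qualitative statements made in the two preceding paragraphs; the labels "high", "medium", and "low" and the function name are assumptions, and the table is far less complete than the rankings in Figure 23.5.

```python
# Qualitative ratings taken from the surrounding text only.
# Combinations the text does not rate are simply left out.
EFFECTIVENESS = {
    "position": {"quantitative": "high", "ordered": "high", "categorical": "high"},
    "length": {"quantitative": "high", "ordered": "low", "categorical": "low"},
    "angle": {"quantitative": "high", "ordered": "low", "categorical": "low"},
    "hue": {"quantitative": "medium", "categorical": "high"},
}


def channels_for(data_type, rating="high"):
    """List the channels given a particular rating for a data type."""
    return [
        channel
        for channel, ratings in EFFECTIVENESS.items()
        if ratings.get(data_type) == rating
    ]


print(channels_for("quantitative"))  # ['position', 'length', 'angle']
print(channels_for("categorical"))   # ['position', 'hue']
```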
We must always consider whether there is a good match between the dynamic range necessary to show the data dimension and the dynamic range available in the channel. For example, encoding with line width uses a one-dimensional mark and the size channel. There are a limited number of width steps that we can reliably use to visually encode information: a minimum thinness of one pixel is enforced by the screen resolution (ignoring antialiasing to simplify this discussion), and there is a maximum thickness beyond which the object will be perceived as a polygon rather than a line. Line width can work very well to show three or four different values in a data dimension, but it would be a poor choice for dozens or hundreds of values.
我们必须始终考虑显示数据维度所需的动态范围与通道中可用的动态范围之间是否匹配良好。例如,使用线宽进行编码使用一维标记和尺寸通道。我们可以可靠地使用有限数量的宽度步骤来对信息进行视觉编码:屏幕分辨率强制执行一个像素的最小厚度(忽略抗锯齿以简化此讨论),并且存在最大厚度,超过该厚度,对象将被视为多边形而不是线。线宽可以很好地显示数据维度中的三个或四个不同值,但对于数十个或数百个值来说,它将是一个糟糕的选择。
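The limited dynamic range of line width suggests quantizing a data dimension into a handful of widths rather than mapping it continuously. A minimal sketch follows, assuming four hypothetical width steps measured in pixels.

```python
def width_for(value, lo, hi, widths=(1, 2, 4, 6)):
    """Map a value in [lo, hi] to one of a few line widths, in pixels.

    The default widths are hypothetical; the smallest is one pixel and
    the largest should still read as a line rather than a polygon.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)                # clamp out-of-range values
    index = min(int(t * len(widths)), len(widths) - 1)
    return widths[index]


print([width_for(v, 0, 100) for v in (0, 30, 50, 100)])  # [1, 2, 4, 6]
```

With only four steps, every value in a quarter of the data range receives the same width, so the channel suits dimensions with few distinct levels.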
Some visual channels are integral, fused together at a pre-conscious level, so they are not good choices for visually encoding different data dimensions. Others are separable, without interactions between them during visual processing, and are safe to use for encoding multiple dimensions. Figure 23.7 shows two channel pairs. Color and position are highly separable. We can see that horizontal size and vertical size are not so easy to separate, because our visual system automatically integrates these together into a unified perception of area. Size interacts with many channels: as the size of an object grows smaller, it becomes more difficult to distinguish its shape or color.
一些视觉通道是不可分割的,在前意识层面融合在一起,因此它们不是对不同数据维度进行视觉编码的理想选择。其他视觉通道是可分离的,在视觉处理过程中它们之间不会发生相互作用,可以安全地用于编码多个维度。图 23.7显示了两个通道对。颜色和位置是高度可分离的。我们可以看到,水平尺寸和垂直尺寸不那么容易分离,因为我们的视觉系统会自动将它们整合在一起形成一个统一的面积感知。尺寸与许多通道相互作用:随着物体的尺寸变小,区分其形状或颜色变得越来越困难。
Figure 23.7. Color and location are separable channels well suited to encode different data dimensions, but the horizontal size and vertical size channels are automatically fused into an integrated perception of area. Redrawn after (Ware, 2000).
图 23.7。颜色和位置是可分离的通道,非常适合编码不同的数据维度,但水平尺寸和垂直尺寸通道会自动融合为对面积的综合感知。根据 (Ware, 2000) 重新绘制。
We can selectively attend to a channel so that items of a particular type “pop out” visually, as discussed in Section 19.4.3. An example of visual popout is when we immediately spot the red item amidst a sea of blue ones, or distinguish the circle from the squares. Visual popout is powerful and scalable because it occurs in parallel, without the need for conscious processing of the items one by one. Many visual channels have this popout property, including not only the list above but also curvature, flicker, stereoscopic depth, and even the direction of lighting. However, in general we can only take advantage of popout for one channel at a time. For example, a white circle does not pop out from a group of circles and squares that can be white or black, as shown in Figure 19.46. When we need to search across more than one channel simultaneously, the length of time it takes to find the target object depends linearly on the number of objects in the scene.
我们可以选择性地关注某个通道,以便某一类型的项目在视觉上“弹出”,如第 19.4.3 节所述。视觉弹出的一个例子是,我们立即在一片蓝色项目中发现红色项目,或者区分出圆圈和正方形。视觉弹出功能强大且可扩展,因为它是并行发生的,而无需有意识地逐一处理项目。许多视觉通道都具有这种弹出属性,不仅包括上面列出的,还包括曲率、闪烁、立体深度,甚至光照方向。然而,一般来说,我们一次只能利用一个通道的弹出功能。例如,一个白色的圆圈不会从一组可以是白色或黑色的圆圈和正方形中弹出,如图 19.46所示。当我们需要同时在多个通道中搜索时,找到目标物体所需的时间长度与场景中的物体数量线性相关。
Color can be a very powerful channel, but many people do not understand its properties and use it improperly. As discussed in Section 19.2.2, we can consider color in terms of three separate visual channels: hue, saturation, and lightness. Region size strongly affects our ability to sense color. Color in small regions is relatively difficult to perceive, and designers should use bright, highly saturated colors to ensure that the color coding is distinguishable. The inverse situation is true when colored regions are large, as in backgrounds, where low saturation pastel colors should be used to avoid blinding the viewer.
颜色可能是一个非常强大的通道,但许多人不了解其属性并滥用它。如第 19.2.2 节所述,我们可以从三个独立的视觉通道来考虑颜色:色调、饱和度和亮度。区域大小对我们感知颜色的能力有很大影响。小区域中的颜色相对难以感知,设计师应使用明亮、高饱和度的颜色来确保颜色编码可区分。当彩色区域较大时,情况正好相反,例如在背景中,应使用低饱和度的柔和颜色以避免使观看者眼花缭乱。
Hue is a very strong cue for encoding categorical data. However, the available dynamic range is very limited. People can reliably distinguish only around a dozen hues when the colored regions are small and scattered around the display. A good guideline for color coding is to keep the number of categories to fewer than eight, keeping in mind that the background and the neutral object color also count in the total.
色调是编码分类数据的有力线索。然而,可用的动态范围非常有限。当彩色区域很小且分散在显示屏周围时,人们只能可靠地区分大约十几种色调。颜色编码的一个好准则是将类别数量保持在 8 个以下,同时记住背景和中性物体颜色也计入总数。
For ordered data, lightness and saturation are effective because they have an implicit perceptual ordering. People can reliably order by lightness, always placing gray in between black and white. With saturation, people reliably place the less saturated pink between fully saturated red and zero-saturation white. However, hue is not as good a channel for ordered data because it does not have an implicit perceptual ordering. When asked to create an ordering of red, blue, green, and yellow, people do not all give the same answer. People can and do learn conventions, such as green-yellow-red for traffic lights, or the order of colors in the rainbow, but these constructions are at a higher level than pure perception. Ordered data is typically shown with a discrete set of color values.
对于有序数据,亮度和饱和度是有效的,因为它们具有隐含的感知顺序。人们可以可靠地按亮度排序,始终将灰色置于黑色和白色之间。对于饱和度,人们可以可靠地将饱和度较低的粉红色置于完全饱和的红色和零饱和度的白色之间。但是,色调对于有序数据来说并不是一个好的渠道,因为它没有隐含的感知顺序。当被要求创建红色、蓝色、绿色和黄色的顺序时,人们不会都给出相同的答案。人们可以而且确实会学习惯例,例如交通信号灯的绿黄红顺序,或彩虹中颜色的顺序,但这些构造比纯粹的感知处于更高的层次。有序数据通常用一组离散的颜色值显示。
Quantitative data is shown with a colormap, a range of color values that can be continuous or discrete. A very unfortunate default in many software packages is the rainbow colormap, as shown in Figure 23.8. The standard rainbow scale suffers from two problems. First, hue is used to indicate order. A better choice would be to use lightness because it has an implicit perceptual ordering. Even more importantly, the human eye responds most strongly to luminance. Second, the scale is not perceptually linear: equal steps in the continuous range are not perceived as equal steps by our eyes. Figure 23.8 shows an example, where the rainbow colormap obfuscates the data. While the range from –2000 to –1000 has three distinct colors (cyan, green, and yellow), a range of the same size from –1000 to 0 simply looks yellow throughout. The graphs on the right show that the perceived value is strongly tied to the luminance, which is not even monotonically increasing in this scale.
定量数据显示为颜色图,可以是连续的也可以是离散的颜色值范围。许多软件包中一个非常不幸的默认设置是彩虹色图,如图 23.8所示。标准彩虹色阶存在两个问题。首先,色相用于表示顺序。更好的选择是使用亮度,因为它具有隐含的感知顺序。更重要的是,人眼对亮度反应最强烈。其次,该尺度在感知上不是线性的:连续范围内的相等步长在我们眼睛看来并不是相等的步长。图 23.8显示了一个示例,其中彩虹色图混淆了数据。虽然从-2000到-1000的范围有三种不同的颜色(青色、绿色和黄色),但从 -1000 到 0 的相同大小的范围始终看起来都是黄色。右侧的图表显示感知值与亮度密切相关,而亮度在这个尺度上甚至不是单调递增的。
Figure 23.8. The standard rainbow colormap has two defects: it uses hue to denote ordering, and it is not perceptually isolinear. Image courtesy Bernice Rogowitz.
图 23.8。标准彩虹色图有两个缺陷:它使用色调来表示顺序,并且它在感知上不是等线性的。图片由 Bernice Rogowitz 提供。
In contrast, Figure 23.9 shows the same data with a more appropriate colormap, where the lightness increases monotonically. Hue is used to create a semantically meaningful categorization: the viewer can discuss structure in the dataset, such as the dark blue sea, the cyan continental shelf, the green lowlands, and the white mountains.
相比之下,图 23.9显示了具有更合适颜色图的相同数据,其中亮度单调增加。色调用于创建具有语义意义的分类:查看者可以讨论数据集中的结构,例如深蓝色的海洋、青色的大陆架、绿色的低地和白色的山脉。
Figure 23.9. The structure of the same dataset is far clearer with a colormap where monotonically increasing lightness is used to show ordering and hue is used instead for segmenting into categorical regions. Image courtesy Bernice Rogowitz.
图 23.9。使用颜色图可以更清晰地显示同一数据集的结构,其中单调增加的亮度用于显示排序,而色调则用于划分分类区域。图片由 Bernice Rogowitz 提供。
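Whether a colormap's luminance increases monotonically can be checked numerically. The sketch below makes several simplifying assumptions: the rainbow is approximated by a sweep of HSV hue from blue to red, which is not the exact colormap of Figure 23.8; RGB values are treated as linear; and luminance is computed with the Rec. 709 weights.

```python
import colorsys


def luminance(rgb):
    """Relative luminance of a linear RGB triple (Rec. 709 weights)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def rainbow(t):
    """Approximate rainbow: hue sweeps from blue (t=0) to red (t=1)."""
    return colorsys.hsv_to_rgb((1.0 - t) * 2.0 / 3.0, 1.0, 1.0)


def gray_ramp(t):
    """Simple lightness ramp from black (t=0) to white (t=1)."""
    return (t, t, t)


def is_monotonic_increasing(colormap, steps=64):
    """Sample the colormap and test that luminance never decreases."""
    lums = [luminance(colormap(i / (steps - 1))) for i in range(steps)]
    return all(a <= b for a, b in zip(lums, lums[1:]))


print(is_monotonic_increasing(rainbow))    # False
print(is_monotonic_increasing(gray_ramp))  # True
```

In this approximation the rainbow's luminance rises from blue to cyan, dips toward green, peaks at yellow, and then falls toward red, which is why a viewer cannot read order from it reliably.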
In both the discrete and continuous cases, colormaps should take into account whether the data is sequential or diverging. The ColorBrewer application (www.colorbrewer.org) is an excellent resource for colormap construction (Brewer, 1999).
无论是离散情况还是连续情况,颜色图都应考虑数据是连续的还是发散的。ColorBrewer 应用程序 ( www.colorbrewer.org ) 是构建颜色图的绝佳资源 (Brewer, 1999)。
Another important issue when encoding with color is that a significant fraction of the population, roughly 10% of men, is red-green color deficient. If a coding using red and green is chosen because of conventions in the target domain, redundantly coding lightness or saturation in addition to hue is wise. Tools such as the website http://www.vischeck.com should be used to check whether a color scheme is distinguishable to people with color deficient vision.
使用颜色进行编码时的另一个重要问题是,相当一部分人口(约 10% 的男性)患有红绿色盲。如果由于目标域中的惯例而选择使用红色和绿色进行编码,则除了色调之外,对亮度或饱和度进行冗余编码是明智的。应使用网站http://www.vischeck.com等工具来检查色觉缺陷者是否能够区分配色方案。
The question of whether to use two or three channels for spatial position has been extensively studied. When computer-based visualization began in the late 1980s, and interactive 3D graphics was a new capability, there was a lot of enthusiasm for 3D representations. As the field matured, researchers began to understand the costs of 3D approaches when used for abstract datasets (Ware, 2001).
关于是否使用两个或三个通道来表示空间位置的问题已经得到了广泛的研究。当基于计算机的可视化在 20 世纪 80 年代末开始出现,并且交互式 3D 图形是一项新功能时,人们对 3D 表示法非常热衷。随着该领域的成熟,研究人员开始了解 3D 方法用于抽象数据集时的成本(Ware,2001 年)。
Occlusion, where some parts of the dataset are hidden behind others, is a major problem with 3D. Although hidden surface removal algorithms such as z-buffers and BSP trees allow fast computation of a correct 2D image, people must still synthesize many of these images into an internal mental map. When people look at realistic scenes made from familiar objects, usually they can quickly understand what they see. However, when they see an unfamiliar dataset, where a chosen visual encoding maps abstract dimensions into spatial positions, understanding the details of its 3D structure can be challenging even when they can use interactive navigation controls to change their 3D viewpoint. The reason is once again the limited capacity of human working memory (Plumlee & Ware, 2006).
遮挡(数据集的某些部分隐藏在其他部分后面)是 3D 的一个主要问题。尽管隐藏表面消除算法(例如 z 缓冲区和 BSP 树)可以快速计算出正确的 2D 图像,但人们仍然必须将许多这些图像合成到内部思维导图中。当人们看到由熟悉物体构成的真实场景时,通常他们能够快速理解他们所看到的内容。然而,当他们看到一个不熟悉的数据集时,所选的视觉编码将抽象维度映射到空间位置,理解其 3D 结构的细节可能具有挑战性,即使他们可以使用交互式导航控件来改变他们的 3D 视点。原因再次是人类工作记忆的容量有限(Plumlee & Ware,2006)。
Another problem with 3D is perspective distortion. Although real-world objects do indeed appear smaller when they are further from our eyes, foreshortening makes direct comparison of object heights difficult (Tory, Kirkpatrick, Atkins, & Möller, 2006). Once again, although we can often judge the heights of familiar objects in the real world based on past experience, we cannot necessarily do so with completely abstract data that has a visual encoding where the height conveys meaning. For example, it is more difficult to judge bar heights in a 3D bar chart than in multiple horizontally aligned 2D bar charts.
3D 的另一个问题是透视失真。尽管现实世界中的物体确实在离我们的眼睛较远时看起来较小,但透视缩短使得直接比较物体高度变得困难(Tory、Kirkpatrick、Atkins 和 Möller,2006 年)。同样,虽然我们通常可以根据过去的经验判断现实世界中熟悉物体的高度,但我们不一定能通过完全抽象的数据做到这一点,因为这些数据具有高度传达含义的视觉编码。例如,判断 3D 条形图中的条形高度比判断多个水平对齐的 2D 条形图中的条形高度更困难。
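A short calculation shows why perspective makes heights hard to compare. The sketch assumes an idealized pinhole camera, where an object of height h at depth z projects to an image height of f·h/z for focal length f; the function name and the numbers are illustrative only.

```python
def projected_height(height, depth, focal_length=1.0):
    """Image-plane height of an upright object under pinhole projection."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_length * height / depth


near_bar = projected_height(height=3.0, depth=2.0)  # shorter bar, close by
far_bar = projected_height(height=4.0, depth=4.0)   # taller bar, far away

print(near_bar, far_bar)  # 1.5 1.0
```

The bar that is taller in the data appears shorter in the image, so a viewer must mentally undo the projection before comparing values.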
Another problem with unconstrained 3D representations is that text at arbitrary orientations in 3D space is far more difficult to read than text aligned in the 2D image plane (Grossman, Wigdor, & Balakrishnan, 2007).
不受约束的 3D 表示的另一个问题是,3D 空间中任意方向的文本比在 2D 图像平面上对齐的文本更难阅读(Grossman、Wigdor & Balakrishnan,2007)。
Figure 23.10 illustrates how carefully chosen 2D views of an abstract dataset can avoid the problems with occlusion and perspective distortion inherent in 3D views. The top view shows a 3D representation created directly from the original time-series data, where each cross-section is a 2D time-series curve showing power consumption for one day, with one curve for each day of the year along the extruded third axis. Although this representation is straightforward to create, we can only see large-scale patterns such as the higher consumption during working hours and the seasonal variation between winter and summer. To create the 2D linked views at the bottom, the curves were hierarchically clustered, and only aggregate curves representing the top clusters are drawn superimposed in the same 2D frame. Direct comparison between the curve heights at all times of the day is easy because there is no perspective distortion or occlusion. The same color coding is used in the calendar view, which is very effective for understanding temporal patterns.
图 23.10说明了如何精心选择抽象数据集的 2D 视图来避免 3D 视图固有的遮挡和透视失真问题。顶视图显示了直接从原始时间序列数据创建的 3D 表示,其中每个横截面都是一条 2D 时间序列曲线,显示一天的功耗,沿着拉伸的第三轴,一年中的每一天都有一条曲线。虽然这种表示很容易创建,但我们只能看到大规模的模式,例如工作时间内的较高消耗以及冬季和夏季之间的季节性变化。为了在底部创建 2D 链接视图,曲线是分层聚类的,并且只绘制代表顶部聚类的聚合曲线,并叠加在同一 2D 帧中。由于没有透视失真或遮挡,因此很容易直接比较一天中所有时间的曲线高度。日历视图中使用相同的颜色编码,这对于理解时间模式非常有效。
Figure 23.10. Top: A 3D representation of this time series dataset introduces the problems of occlusion and perspective distortion. Bottom: The linked 2D views of derived aggregate curves and the calendar allow direct comparison and show more fine-grained patterns. Image courtesy Jarke van Wijk (van Wijk & van Selow, 1999), © 1999 IEEE.
图 23.10。顶部:此时间序列数据集的 3D 表示引入了遮挡和透视失真问题。底部:导出的聚合曲线和日历的链接 2D 视图允许直接比较并显示更细粒度的模式。图片由 Jarke van Wijk(van Wijk & van Selow,1999 年)提供,© 1999 IEEE。
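The idea of reducing many daily curves to a few aggregate curves can be sketched with a small agglomerative clustering routine. This is a simplified illustration, not the method used to produce Figure 23.10: it merges the two clusters whose mean curves are closest in Euclidean distance, and the toy data has only four samples per day.

```python
def distance(a, b):
    """Euclidean distance between two curves of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_curve(curves):
    """Pointwise average of a list of curves."""
    n = len(curves)
    return [sum(samples) / n for samples in zip(*curves)]


def cluster_curves(curves, k):
    """Merge the closest pair of clusters until k remain.

    Returns one aggregate (mean) curve per remaining cluster.
    """
    clusters = [[c] for c in curves]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(mean_curve(clusters[i]), mean_curve(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean_curve(c) for c in clusters]


days = [
    [1, 5, 5, 1],  # hypothetical working day
    [1, 6, 6, 1],  # hypothetical working day
    [1, 2, 2, 1],  # hypothetical weekend day
    [1, 1, 1, 1],  # hypothetical weekend day
]
print(cluster_curves(days, k=2))
# [[1.0, 5.5, 5.5, 1.0], [1.0, 1.5, 1.5, 1.0]]
```

Each aggregate curve can then be drawn in the same 2D frame, and the cluster index of each day can drive the color coding of a calendar view.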
In contrast, if a dataset consists of inherently 3D spatial data, such as showing fluid flow over an airplane wing or a medical imaging dataset from an MRI scan, then the costs of a 3D view are outweighed by its benefits in helping the user construct a useful mental model of the dataset structure.
相反,如果数据集本质上由 3D 空间数据组成,例如显示飞机机翼上的流体流动或来自 MRI 扫描的医学成像数据集,那么 3D 视图的成本将被其在帮助用户构建数据集结构的有用心理模型方面带来的好处所抵消。
Text in the form of labels and legends is a very important factor in creating visualizations that are useful rather than simply pretty. Axes and tick marks should be labeled. Legends should indicate the meaning of colors, whether used as discrete patches or in continuous color ramps. Individual items in a dataset typically have meaningful text labels associated with them. In many cases showing all labels at all times would result in too much visual clutter, so labels can be shown for a subset of the items using label positioning algorithms that show labels at a desired density while avoiding overlap (Luboschik, Schumann, & Cords, 2008). A straightforward way to choose the best label to represent a group of items is to use a greedy algorithm based on some measure of label importance, but synthesizing a new label based on the characteristics of the group remains a difficult problem. A more interaction-centric approach is to only show labels for individual items based on an interactive indication from the user.
标签和图例形式的文本对于创建实用的可视化效果(而不仅仅是美观)非常重要。轴和刻度标记应带有标签。图例应指示颜色的含义,无论是用作离散色块还是连续色阶。数据集中的单个项目通常具有与之关联的有意义的文本标签。在许多情况下,始终显示所有标签会导致视觉混乱,因此可以使用标签定位算法显示项目子集的标签,该算法以所需的密度显示标签,同时避免重叠(Luboschik、Schumann 和 Cords,2008)。选择最佳标签来表示一组项目的直接方法是使用基于某种标签重要性度量的贪婪算法,但根据组的特征合成新标签仍然是一个难题。一种更以交互为中心的方法是仅根据用户的交互指示显示单个项目的标签。
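The greedy idea mentioned above can be illustrated with a minimal sketch. It is not the algorithm of Luboschik et al.; it simply assumes each candidate label has an axis-aligned bounding box and a hypothetical `importance` score, and keeps labels in descending importance while skipping any that would overlap one already placed. The `Label` class, its fields, and the city names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Label:
    text: str
    x: float           # left edge of the label's bounding box
    y: float           # bottom edge of the bounding box
    width: float
    height: float
    importance: float  # hypothetical score; higher values are placed first


def overlaps(a: Label, b: Label) -> bool:
    """True if the two axis-aligned label boxes intersect."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


def place_labels_greedy(candidates: list[Label]) -> list[Label]:
    """Keep labels in descending importance, skipping any that would overlap."""
    placed: list[Label] = []
    for label in sorted(candidates, key=lambda item: item.importance, reverse=True):
        if not any(overlaps(label, other) for other in placed):
            placed.append(label)
    return placed


labels = [
    Label("Toronto", 0, 0, 50, 10, importance=0.9),
    Label("Mississauga", 30, 5, 70, 10, importance=0.6),  # overlaps "Toronto"
    Label("Ottawa", 200, 40, 45, 10, importance=0.8),
]
shown = place_labels_greedy(labels)
print([label.text for label in shown])
```

A greedy pass like this is simple and fast, but it never revisits a decision, so a single important label can block several less important ones that would otherwise fit together.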
Several principles of interaction are important when designing a visualization. Low-latency visual feedback allows users to explore more fluidly, for example by showing more detail when the cursor simply hovers over an object rather than requiring the user to explicitly click. Selecting items is a fundamental operation when interacting with large datasets, as is visually indicating the selected set with highlighting. Color coding is a common form of highlighting, but other channels can also be used.
设计可视化时,交互的几个原则非常重要。低延迟视觉反馈让用户可以更流畅地探索,例如,当光标悬停在对象上时,无需用户明确点击即可显示更多细节。选择项目是与大型数据集交互时的基本操作,通过突出显示来直观地指示所选集也是如此。颜色编码是一种常见的突出显示形式,但也可以使用其他通道。
Many forms of interaction can be considered in terms of what aspect of the display they change. Navigation can be considered a change of viewport. Sorting is a change to the spatial ordering; that is, changing how data is mapped to the spatial position visual channel. The entire visual encoding can also be changed.
很多形式的交互都可以从改变显示内容的角度来考虑。导航可以看作是视口的变化。排序是空间顺序的变化;也就是说,改变数据如何映射到空间位置的视觉通道。整个视觉编码也可以改变。
The influential mantra “Overview first, zoom and filter, details on demand” (Shneiderman, 1996) elucidates the role of interaction and navigation in visualization design. Overviews help the user notice regions where further investigation might be productive, whether through spatial navigation or through filtering. As we discuss below, details can be presented in many ways: with popups from clicking or cursor hovering, in a separate window, and by changing the layout on the fly to make room to show additional information.
颇具影响力的口号“概览优先,缩放和过滤,按需显示详细信息”(Shneiderman,1996)阐明了交互和导航在可视化设计中的作用。概览可帮助用户注意到可能有助于进一步调查的区域,无论是通过空间导航还是通过过滤。正如我们下面所讨论的,详细信息可以通过多种方式呈现:通过点击或光标悬停弹出窗口、在单独的窗口中,以及通过动态更改布局以腾出空间来显示更多信息。
Interactivity has both power and cost. The benefit of interaction is that people can explore a larger information space than can be understood in a single static image. However, a cost to interaction is that it requires human time and attention. If the user must exhaustively check every possibility, use of the visualization system may degenerate into human-powered search. Automatically detecting features of interest to explicitly bring to the user’s attention via the visual encoding is a useful goal for the visualization designer. However, if the task at hand could be completely solved by automatic means, there would be no need for a visualization in the first place. Thus, there is always a tradeoff between finding automatable aspects and relying on the human in the loop to detect patterns.
交互既有功能,也有成本。交互的好处是人们可以探索比单个静态图像所能理解的更大的信息空间。然而,交互的代价是它需要人类的时间和注意力。如果用户必须详尽地检查每一种可能性,可视化系统的使用可能会退化为人工搜索。自动检测感兴趣的特征,通过视觉编码明确引起用户的注意,对于可视化设计师来说是一个有用的目标。然而,如果手头的任务可以完全通过自动化手段解决,那么就根本不需要可视化了。因此,在寻找可自动化的方面和依靠循环中的人类来检测模式之间总是存在权衡。
Animation shows change using time. We distinguish animation, where successive frames can only be played, paused, or stopped, from true interactive control. There is considerable evidence that animated transitions can be more effective than jump cuts, by helping people track changes in object positions or camera viewpoints (Heer & Robertson, 2007). Although animation can be very effective for narrative and storytelling, it is often used ineffectively in a visualization context (Tversky, Morrison, & Betrancourt, 2002). It might seem obvious to show data that changes over time by using animation, a visual modality that changes over time. However, people have difficulty in making specific comparisons between individual frames that are not contiguous when they see an animation consisting of many frames. The very limited capacity of human visual memory means that we are much worse at comparing memories of things that we have seen in the past than at comparing things that are in our current field of view. For tasks requiring comparison between up to several dozen frames, side-by-side comparison is often more effective than animation. Moreover, if the number of objects that change between frames is large, people will have a hard time tracking everything that occurs (Robertson et al., 2008). Narrative animations are carefully designed to avoid having too many actions occurring simultaneously, whereas a dataset being visualized has no such constraint. For the special case of just two frames with a limited amount of change, the very simple animation of flipping back and forth between the two can be a useful way to identify the differences between them.
动画通过时间显示变化。我们将动画与真正的交互式控制区分开来,动画中连续的帧只能播放、暂停或停止。有大量证据表明,动画过渡比跳跃切换更有效,因为它可以帮助人们跟踪物体位置或摄像机视点的变化(Heer & Robertson,2007)。尽管动画对于叙事和讲故事非常有效,但它在可视化环境中使用时通常效果不佳(Tversky、Morrison 和 Betrancourt,2002)。使用动画(一种随时间变化的视觉模式)显示随时间变化的数据似乎很明显。但是,当人们看到由许多帧组成的动画时,他们很难对不连续的单个帧进行具体比较。人类视觉记忆的容量非常有限,这意味着我们在比较过去见过的事物的记忆方面比比较当前视野中的事物要差得多。对于需要比较多达几十帧的任务,并排比较通常比动画更有效。此外,如果帧间变化的对象数量很大,人们将很难跟踪发生的所有事情(Robertson 等人,2008 年)。叙事动画经过精心设计,以避免同时发生太多动作,而可视化的数据集则没有这样的限制。对于只有两帧且变化量有限的特殊情况,在两帧之间来回翻转的非常简单的动画可以成为识别它们之间差异的有效方法。
A very fundamental visual encoding choice is whether to have a single composite view showing everything in the same frame or window, or to have multiple views adjacent to each other.
一个非常基本的视觉编码选择是是否使用单个复合视图在同一框架或窗口中显示所有内容,或者使用多个彼此相邻的视图。
When there are only one or two data dimensions to encode, horizontal and vertical spatial position are the obvious visual channels to use, because we perceive them most accurately and position has the strongest influence on our internal mental model of the dataset. The traditional statistical graphics displays of line charts, bar charts, and scatterplots all use spatial ordering of marks to encode information. These displays can be augmented with additional visual channels, such as color, size, and shape, as in the scatterplot shown in Figure 23.4.
当只有一两个数据维度需要编码时,水平和垂直空间位置显然是要使用的视觉通道,因为我们对它们的感知最准确,而且位置对数据集的内部心理模型影响最大。传统的统计图形显示,如折线图、条形图和散点图,都使用标记的空间顺序来编码信息。这些显示可以通过其他视觉通道(如颜色、大小和形状)来增强,如图 23.4所示的散点图。
The simplest possible mark is a single pixel. In pixel-oriented displays, the goal is to provide an overview of as many items as possible. These approaches use the spatial position and color channels at a high information density, but preclude the use of the size and shape channels. Figure 23.11 shows the Tarantula software visualization tool (Jones et al., 2002), where most of the screen is devoted to an overview of source code using one-pixel high lines (Eick, Steffen, & Sumner, 1992). The color and brightness of each line shows whether it passed, failed, or had mixed results when executing a suite of test cases.
最简单的标记是单个像素。在面向像素的显示中,目标是提供尽可能多的项目的概览。这些方法使用高信息密度的空间位置和颜色通道,但排除使用大小和形状通道。图 23.11显示了 Tarantula 软件可视化工具 (Jones 等人,2002),其中大部分屏幕都用于使用一像素高的线条 (Eick、Steffen 和 Sumner,1992) 概述源代码。每条线的颜色和亮度显示在执行一组测试用例时它是通过、失败还是结果好坏参半。
Figure 23.11. Tarantula shows an overview of source code using one-pixel lines color coded by execution status of a software test suite. Image courtesy John Stasko (Jones, Harrold, & Stasko, 2002).
图 23.11。Tarantula使用按软件测试套件的执行状态进行颜色编码的单像素线条显示源代码概览。图片由 John Stasko (Jones、Harrold 和 Stasko,2002 年) 提供。
Multiple items can be superimposed in the same frame when their spatial position is compatible. Several lines can be shown in the same line chart, and many dots in the same scatterplot, when the axes are shared across all items. One benefit of a single shared view is that comparing the position of different items is very easy. If the number of items in the dataset is limited, then a single view will often suffice. Visual layering can extend the usefulness of a single view when there are enough items that visual clutter becomes a concern. Figure 23.12 shows how a redundant combination of the size, saturation, and brightness channels serves to distinguish a foreground layer from a background layer when the user moves the cursor over a block of words.
当多个项目的空间位置兼容时,它们可以叠加在同一帧中。当所有项目共享轴时,可以在同一个折线图中显示多条线,在同一个散点图中显示许多点。单一共享视图的一个好处是比较不同项目的位置非常容易。如果数据集中的项目数量有限,那么单个视图通常就足够了。当项目太多以至于视觉混乱成为问题时,视觉分层可以扩展单个视图的实用性。图 23.12显示了当用户将光标移到一组单词上时,大小、饱和度和亮度通道的冗余组合如何区分前景层和背景层。
Figure 23.12. Visual layering with size, saturation, and brightness in the Constellation system (Munzner, 2000).
图 23.12.星座系统中大小、饱和度和亮度的视觉分层(Munzner,2000 年)。
We have been discussing the idea of visual encoding using simple marks, where a single mark can only have one value for each visual channel used. With more complex marks, which we will call glyphs, there is internal structure where sub-regions have different visual channel encodings.
我们一直在讨论使用简单标记进行视觉编码的想法,其中单个标记对于所使用的每个视觉通道只能有一个值。对于更复杂的标记(我们将其称为字形) ,存在内部结构,其中子区域具有不同的视觉通道编码。
Designing appropriate glyphs has the same challenges as designing visual encodings. Figure 23.13 shows a variety of glyphs, including the notorious faces originally proposed by Chernoff. The danger of using faces to show abstract data dimensions is that our perceptual and emotional response to different facial features is highly nonlinear in ways that are not fully understood, and the variability between features is greater than between the visual channels that we have discussed so far. We are probably far more attuned to features that indicate emotional state, such as eyebrow orientation, than to other features, such as nose size or face shape.
设计合适的字形与设计视觉编码面临同样的挑战。图 23.13显示了各种字形,包括 Chernoff 最初提出的臭名昭著的面孔。使用面孔来显示抽象数据维度的危险在于,我们对不同面部特征的感知和情感反应是高度非线性的,这种非线性目前尚不完全清楚,但这种变化比我们迄今为止讨论过的视觉通道之间的变化更大。我们可能对表示情绪状态的特征(如眉毛方向)的适应性远高于其他特征(如鼻子大小或脸型)。
Figure 23.13. Complex marks, which we call glyphs, have subsections that visually encode different data dimensions. Image courtesy Matt Ward (M. O. Ward, 2002).
图 23.13。复杂标记(我们称之为字形)具有子部分,这些子部分在视觉上对不同的数据维度进行编码。图片由 Matt Ward (MO Ward, 2002) 提供。
Complex glyphs require significant display area for each glyph, as shown in Figure 23.14, where miniature bar charts show the value of four different dimensions at many points along a spiral path. Simpler glyphs can be used to create a global visual texture: the glyph size is so small that individual values cannot be read out without zooming, but region boundaries can be discerned from the overview level. Figure 23.15 shows an example using stick figures of the kind in the upper right in Figure 23.13. Glyphs may be placed at regular intervals, or in data-driven spatial positions using an original or derived data dimension.
复杂的字形需要为每个字形提供足够的显示区域,如图 23.14所示,其中的微型条形图显示了螺旋路径上许多点的四个不同维度的值。可以使用更简单的字形来创建全局视觉纹理,字形尺寸非常小,以至于不缩放就无法读出单个值,但可以从概览级别辨别出区域边界。图 23.15显示了使用图 23.13右上角的那种火柴人的示例。字形可以按规则间隔放置,也可以使用原始或派生数据维度放置在数据驱动的空间位置。
Figure 23.14. Complex glyphs require significant display area so that the encoded information can be read. Image courtesy Matt Ward, created with the SpiralGlyphics software (M. O. Ward, 2002).
图 23.14。复杂的字形需要较大的显示区域才能读取编码信息。图片由 Matt Ward 提供,使用 SpiralGlyphics 软件创建(MO Ward,2002 年)。
Figure 23.15. A dense array of simple glyphs. Image courtesy Georges Grinstein (S. Smith, Grinstein, & Bergeron, 1991), © 1991 IEEE.
图 23.15。简单字形的密集阵列。图片由 Georges Grinstein (S. Smith, Grinstein, & Bergeron, 1991) 提供,© 1991 IEEE。
We now turn from approaches with only a single frame to those which use multiple views that are linked together. The most common form of linkage is linked highlighting, where items selected in one view are highlighted in all others. In linked navigation, movement in one view triggers movement in the others.
现在,我们从仅使用单个框架的方法转向使用多个链接在一起的视图的方法。最常见的链接形式是链接突出显示,其中一个视图中选定的项目会在所有其他视图中突出显示。在链接导航中,一个视图中的移动会触发其他视图中的移动。
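One way to think about linked highlighting is as a single shared selection that every view observes. The sketch below is only an illustration of that idea under assumed names (`SelectionModel`, `View`, and their methods are hypothetical, not taken from any particular toolkit): selecting items anywhere updates the highlighted set in all subscribed views.

```python
class SelectionModel:
    """Holds the currently selected item ids and notifies subscribed views."""

    def __init__(self):
        self.selected = set()
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def select(self, item_ids):
        self.selected = set(item_ids)
        for callback in self._listeners:
            callback(self.selected)


class View:
    """A stand-in for a real view; it only records which items to highlight."""

    def __init__(self, name, model):
        self.name = name
        self.highlighted = set()
        model.subscribe(self.on_selection_changed)

    def on_selection_changed(self, selected):
        # A real view would redraw here, using whatever channel it highlights with.
        self.highlighted = set(selected)


model = SelectionModel()
scatterplot = View("scatterplot", model)
table = View("table", model)

# A selection made in any one view goes through the shared model.
model.select({3, 7})
print(scatterplot.highlighted, table.highlighted)
```

Keeping the selection in one shared model rather than in each view means that adding another linked view requires only one more subscription.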
There are many kinds of multiple-view approaches. In what is usually called simply the multiple-view approach, the same data is shown in several views, each of which has a different visual encoding that shows certain aspects of the dataset most clearly. The power of linked highlighting across multiple visual encodings is that items that fall in a contiguous region in one view are often distributed very differently in the other views. In the small-multiples approach, each view has the same visual encoding for different datasets, usually with shared axes between frames so that comparison of spatial position between them is meaningful. Side-by-side comparison with small multiples is an alternative to the visual clutter of superimposing all the data in the same view, and to the human memory limitations of remembering previously seen frames in an animation that changes over time.
多视图方法有很多种。在通常简称为多视图的方法中,相同的数据显示在多个视图中,每个视图都有不同的视觉编码,可以最清楚地显示数据集的某些方面。跨多个视觉编码的链接突出显示的强大之处在于,在一个视图中位于连续区域的项目在其他视图中的分布通常非常不同。在小倍数方法中,每个视图对不同的数据集具有相同的视觉编码,通常在帧之间共享轴,以便比较它们之间的空间位置是有意义的。与小倍数并排比较是一种替代方法,可以避免在同一视图中叠加所有数据造成的视觉混乱,也可以避免人类记忆力有限,无法记住随时间变化的动画中之前看过的帧。
The overview-and-detail approach is to have the same data and the same visual encoding in two views, where the only difference between them is the level of zooming. In most cases, the overview uses much less display space than the detail view. The combination of overview and detail views is common outside of visualization in many tools ranging from mapping software to photo editing. With a detail-on-demand approach, another view shows more information about some selected item, either as a popup window near the cursor or in a permanent window in another part of the display.
概览和细节方法是指在两个视图中显示相同的数据和相同的视觉编码,它们之间的唯一区别是缩放级别。在大多数情况下,概览视图占用的显示空间比细节视图少得多。概览视图和细节视图的组合在可视化之外的很多工具中很常见,从地图软件到照片编辑。使用按需显示细节的方法,另一个视图会显示有关某些选定项目的更多信息,要么作为光标附近的弹出窗口,要么作为显示屏另一部分的永久窗口。
Determining the most appropriate spatial position of the views themselves with respect to each other can be as significant a problem as determining the spatial position of marks within a single view. In some systems, the location of the views is arbitrary and left up to the window system or the user. Aligning the views allows precise comparison between them, either vertically, horizontally, or with an array for both directions. Just as items can be sorted within a view, views can be sorted within a display, typically with respect to a derived variable measuring some aspect of the entire view as opposed to an individual item within it.
确定视图本身相对于彼此的最合适空间位置可能与确定单个视图内标记的空间位置一样重要。在某些系统中,视图的位置是任意的,由窗口系统或用户决定。对齐视图允许在垂直、水平或使用两个方向的数组之间进行精确比较。就像可以在视图中对项目进行排序一样,也可以在显示中对视图进行排序,通常是根据测量整个视图的某个方面的派生变量而不是其中的单个项目进行排序。
Figure 23.16 shows a visualization of census data that uses many views. In addition to geographic information, the demographic information for each county includes population, density, gender, median age, percent change since 1990, and proportions of major ethnic groups. The visual encodings used include geographic, scatterplot, parallel coordinate, tabular, and matrix views. The same color encoding is used across all the views, with a legend in the bottom middle. The scatterplot matrix shows linked highlighting across all views, where the blue items are close together in some views and scattered in others. The map in the upper-left corner is an overview for the large detail map in the center. The tabular views allow direct sorting by and selection within a dimension of interest.
图 23.16显示了使用多种视图的人口普查数据可视化。除了地理信息外,每个县的人口统计信息还包括人口、密度、性别、平均年龄、自 1990 年以来的百分比变化以及主要族群的比例。使用的视觉编码包括地理、散点图、平行坐标、表格和矩阵视图。所有视图都使用相同的颜色编码,中间底部有一个图例。散点图矩阵显示所有视图中的链接突出显示,其中蓝色项目在某些视图中靠得很近,而在其他视图中则分散。左上角的地图是中心大型详细地图的概览。表格视图允许直接按感兴趣的维度进行排序和选择。
Figure 23.16. The Improvise toolkit was used to create this multiple-view visualization. Image courtesy Chris Weaver.
图 23.16。Improvise工具包用于创建此多视图可视化。图片由 Chris Weaver 提供。
The visual encoding techniques that we have discussed so far show all of the items in a dataset. However, many datasets are so large that showing everything simultaneously would result in so much visual clutter that the visual representation would be difficult or impossible for a viewer to understand. The main strategies to reduce the amount of data shown are overviews and aggregation, filtering and navigation, the focus+context techniques, and dimensionality reduction.
到目前为止,我们讨论过的视觉编码技术显示了数据集中的所有项目。但是,许多数据集非常大,以至于同时显示所有内容会导致视觉混乱,以至于观看者很难或无法理解视觉表示。减少显示数据量的主要策略是概览和聚合、过滤和导航、焦点+上下文技术以及降维。
With tiny datasets, a visual encoding can easily show all data dimensions for all items. For datasets of medium size, an overview that shows information about all items can be constructed by showing less detail for each item. Many datasets have internal or derivable structure at multiple scales. In these cases, a multiscale visual representation can provide many levels of overview, rather than just a single level. Overviews are typically used as a starting point to give users clues about where to drill down to inspect in more detail.
对于小型数据集,视觉编码可以轻松显示所有项目的所有数据维度。对于中等大小的数据集,可以通过显示每个项目的较少细节来构建显示所有项目信息的概览。许多数据集在多个尺度上具有内部或可推导的结构。在这些情况下,多尺度视觉表示可以提供多个层次的概览,而不仅仅是一个层次。概览通常用作起点,为用户提供线索,让他们知道在哪里可以深入研究更详细的内容。
For larger datasets, creating an overview requires some kind of visual summarization. One approach to data reduction is to use an aggregate representation where a single visual mark in the overview explicitly represents many items.
对于较大的数据集,创建概览需要某种视觉总结。数据缩减的一种方法是使用聚合表示,其中概览中的单个视觉标记明确代表许多项目。
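As a minimal sketch of an aggregate representation, the following groups numeric items into fixed-width bins so that each bin can be drawn as a single mark standing for many items. The function name, the choice of fixed-width binning, and the summary fields (`count`, `mean`) are assumptions made for illustration; other aggregation schemes, such as clustering, would serve the same purpose.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_bin(values, bin_width):
    """Group values into fixed-width bins; return one summary per non-empty bin."""
    bins = defaultdict(list)
    for value in values:
        bins[int(value // bin_width)].append(value)
    return [
        {"start": index * bin_width, "count": len(items), "mean": mean(items)}
        for index, items in sorted(bins.items())
    ]


readings = [1.0, 2.0, 3.0, 11.0, 13.0, 27.0]
marks = aggregate_by_bin(readings, bin_width=10)
for mark in marks:
    print(mark)
```

Each summary could then drive one visual mark, for example with `count` mapped to size and `mean` mapped to position or color.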
The challenge of aggregation is to avoid eliminating the interesting signals in the dataset in the process of summarization. In the cartographic literature, the problem of creating maps at different scales while retaining the important distinguishing characteristics has been extensively studied under the name of cartographic generalization (Slocum, McMaster, Kessler, & Howard, 2008).
聚合的挑战在于避免在汇总过程中消除数据集中有趣的信号。在制图文献中,以不同比例创建地图同时保留重要区别特征的问题已在制图概括的名义下得到广泛研究(Slocum、McMaster、Kessler 和 Howard,2008 年)。
Another approach to data reduction is to filter the data, showing only a subset of the items. Filtering is often carried out by directly selecting ranges of interest in one or more of the data dimensions.
数据缩减的另一种方法是过滤数据,仅显示部分项目。过滤通常是通过直接选择一个或多个数据维度中感兴趣的范围来进行的。
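Range-based filtering of this kind can be sketched in a few lines. The example assumes items are dictionaries and that the user has chosen an inclusive `(low, high)` range for each dimension of interest; the function name and the county records are made up for illustration.

```python
def filter_items(items, ranges):
    """Keep items whose value lies in [low, high] for every filtered dimension."""
    return [
        item for item in items
        if all(low <= item[dim] <= high for dim, (low, high) in ranges.items())
    ]


# Hypothetical records, not real census data.
counties = [
    {"name": "Adams", "population": 12_000, "median_age": 41.0},
    {"name": "Baker", "population": 250_000, "median_age": 33.5},
    {"name": "Clark", "population": 90_000, "median_age": 38.2},
]

selected = filter_items(
    counties,
    {"population": (50_000, 300_000), "median_age": (30.0, 35.0)},
)
print([county["name"] for county in selected])
```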
Navigation is a specific kind of filtering based on spatial position, where changing the viewpoint changes the visible set of items. Both geometric and non-geometric zooming are used in visualization. With geometric zooming, the camera position in 2D or 3D space can be changed with standard computer graphics controls. In a realistic scene, items should be drawn at a size that depends on their distance from the camera, and only their apparent size changes based on that distance. However, in a visual encoding of an abstract space, nongeometric zooming can be useful. In semantic zooming, the visual appearance of an object changes dramatically based on the number of pixels available to draw it. For instance, an abstract visual representation of a text file could change from a tiny color-coded box with no label to a medium-sized box containing only the filename as a text label to a large rectangle containing a multi-line summary of the file contents. In realistic scenes, objects that are sufficiently far away from the camera are not visible in the images, for example, after they subtend less than one pixel of screen area. With guaranteed visibility, one of the original or derived data dimensions is used as a measure of importance, and objects of sufficient importance must have some kind of representation visible in the image plane at all times.
导航是一种基于空间位置的特殊过滤,其中改变视点会改变可见的项目集。几何和非几何缩放都用于可视化。使用几何缩放,可以使用标准计算机图形控件更改 2D 或 3D 空间中的相机位置。在现实场景中,项目的大小应取决于它们与相机的距离,并且只有它们的表观大小会根据该距离而变化。然而,在抽象空间的视觉编码中,非几何缩放可能很有用。在语义缩放中,对象的视觉外观会根据可用于绘制它的像素数而发生巨大变化。例如,文本文件的抽象视觉表示可以从一个没有标签的微小颜色编码框变为一个仅包含文件名作为文本标签的中等大小的框,再变为一个包含多行文件内容摘要的大矩形。在现实场景中,距离相机足够远的对象在图像中不可见,例如,当它们占屏幕面积小于一个像素时。在保证可见性的情况下,原始或派生数据维度之一被用作重要性的度量,并且足够重要性的物体必须始终在图像平面上具有某种可见的表示。
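The text-file example of semantic zooming can be sketched as a function that picks a representation from the screen space available. The pixel thresholds and the returned dictionary format are arbitrary assumptions chosen for illustration; a real system would tune them to its fonts and layout.

```python
def file_glyph(filename, summary_lines, width_px, height_px):
    """Choose what to draw for a file based on the pixels available."""
    if width_px < 40 or height_px < 12:
        # Too small for any text: a plain color-coded box.
        return {"kind": "box", "text": []}
    if height_px < 48:
        # Room for a single line: show only the filename.
        return {"kind": "labeled_box", "text": [filename]}
    # Plenty of room: show the filename plus a multi-line summary.
    return {"kind": "summary", "text": [filename, *summary_lines]}


summary = ["int main()", "return 0"]
for size in [(10, 10), (120, 20), (300, 120)]:
    print(size, file_glyph("main.c", summary, *size))
```

In contrast to geometric zooming, the representation here changes in kind, not just in scale, as the available size crosses each threshold.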
Focus+context techniques are another approach to data reduction. A subset of the dataset items is interactively chosen by the user to be the focus and is drawn in detail. The visual encoding also includes information about some or all of the rest of the dataset shown for context, integrated into the same view that shows the focus items. Many of these techniques use carefully chosen distortion to combine magnified focus regions and minified context regions into a unified view.
焦点+上下文技术是另一种数据缩减方法。用户以交互方式选择数据集项的子集作为焦点,并详细绘制。视觉编码还包括有关为上下文显示的其余数据集的部分或全部的信息,这些信息集成到显示焦点项的同一视图中。许多这些技术使用精心选择的失真将放大的焦点区域和缩小的上下文区域组合成统一的视图。
One common interaction metaphor is a moveable fisheye lens. Hyperbolic geometry provides an elegant mathematical framework for a single radial lens that affects all objects in the view. Another interaction metaphor is to use multiple lenses of different shapes and magnification levels that affect only local regions. Stretch and squish navigation uses the interaction metaphor of a rubber sheet where stretching one region squishes the rest, as shown in Figure 23.17. The borders of the sheet stay fixed so that all items are within the viewport, although many items may be compressed to subpixel size. The fisheye metaphor is not limited to a geometric lens used after spatial layout; it can be used directly on structured data, such as a hierarchical document where some sections are collapsed while others are left expanded.
一种常见的交互隐喻是可移动的鱼眼镜头。双曲几何为影响视图中所有对象的单个径向透镜提供了一个优雅的数学框架。另一种交互隐喻是使用不同形状和放大级别的多个透镜,这些透镜只影响局部区域。拉伸和挤压导航使用橡胶片的交互隐喻,拉伸一个区域会挤压其余区域,如图 23.17所示。橡胶片的边框保持固定,以便所有项目都在视口内,尽管许多项目可能被压缩为亚像素大小。鱼眼隐喻不仅限于空间布局后使用的几何透镜;它可以直接用于结构化数据,例如分层文档,其中一些部分被折叠而其他部分保持展开。
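To make the lens idea concrete, here is a minimal one-dimensional sketch. It does not use hyperbolic geometry; it assumes a simple, widely used fisheye transfer function of the form g(x) = (d + 1)x / (dx + 1), where x is the normalized distance from the focus to the view boundary and d is a distortion factor. Positions near the focus are spread apart, positions near the boundary are compressed, and the boundaries themselves stay fixed, which mirrors the fixed borders of the rubber-sheet metaphor. The function name and default parameter values are illustrative assumptions.

```python
def fisheye_1d(position, focus, distortion=3.0, lo=0.0, hi=1.0):
    """Map a position in [lo, hi] through a 1D fisheye lens centered at focus."""
    if position == focus:
        return position
    # Distort toward whichever boundary lies on this side of the focus.
    boundary = hi if position > focus else lo
    span = boundary - focus
    x = (position - focus) / span  # normalized distance in [0, 1]
    g = (distortion + 1) * x / (distortion * x + 1)
    return focus + g * span


for p in [0.0, 0.4, 0.5, 0.6, 1.0]:
    print(p, round(fisheye_1d(p, focus=0.5), 3))
```

With `distortion=0` the function is the identity, so the same code path can animate smoothly between an undistorted view and a magnified one.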
Figure 23.17. The TreeJuxtaposer system features stretch and squish navigation and guaranteed visibility of regions marked with colors (Munzner, Guimbretière, Tasiran, Zhang, & Zhou, 2003).
图 23.17.TreeJuxtaposer系统具有拉伸和挤压导航功能,并保证了用颜色标记的区域的可见性(Munzner、Guimbretière、Tasiran、Zhang 和 Zhou,2003 年)。
These distortion-based approaches are another example of nonliteral navigation in the same spirit as nongeometric zooming. When navigating within a large and unfamiliar dataset with realistic camera motion, users can become disoriented at high zoom levels when they can see only a small local region. These approaches are designed to provide more contextual information than a single undistorted view, in hopes that people can stay oriented if landmarks remain recognizable. However, these kinds of distortion can still be confusing or difficult for users to follow. The costs and benefits of distortion, as opposed to multiple views or a single realistic view, are not yet fully understood. Standard 3D perspective is a particularly familiar kind of distortion and was explicitly used as a form of focus+context in early visualization work. However, as the costs of 3D spatial layout discussed in Section 23.4 became better understood, this approach became less popular.
这些基于失真的方法是非文字导航的另一个例子,其精神与非几何缩放相同。当在大型且不熟悉的数据集中导航时,如果用户只能看到一小块局部区域,那么在高缩放级别下,他们可能会迷失方向。这些方法旨在提供比单个未失真视图更多的上下文信息,希望人们在地标仍然可识别的情况下能够保持方向感。然而,这些类型的失真仍然会让用户感到困惑或难以理解。与多个视图或单个真实视图相比,失真的成本和收益尚未完全了解。标准 3D 透视是一种特别常见的失真,在早期的可视化工作中明确用作焦点+上下文的形式。然而,随着第 23.4 节中讨论的 3D 空间布局的成本越来越被理解,这种方法变得不那么流行了。
Other approaches to providing context around focus items do not require distortion. For instance, the SpaceTree system shown in Figure 23.18 elides most nodes in the tree, showing the path between the interactively chosen focus node and the root of the tree for context.
其他提供焦点项上下文的方法不需要失真。例如,图 23.18所示的 SpaceTree 系统省略了树中的大多数节点,显示了交互选择的焦点节点与树根之间的路径,以提供上下文。
Figure 23.18. The SpaceTree system shows the path between the root and the interactively chosen focus node to provide context (Grosjean, Plaisant, & Bederson, 2002).
图 23.18.SpaceTree系统显示根和交互选择的焦点节点之间的路径以提供上下文(Grosjean、Plaisant 和 Bederson,2002)。
The data reduction approaches covered so far reduce the number of items to draw. When there are many data dimensions, dimensionality reduction can also be effective.
到目前为止介绍的数据缩减方法减少了要绘制的项目数。当数据维度很多时,降维也会很有效。
With slicing, a single value is chosen from the dimension to eliminate, and only the items matching that value for the dimension are extracted to include in the lower-dimensional slice. Slicing is particularly useful with 3D spatial data, for example when inspecting slices through a CT scan of a human head at different heights along the skull. Slicing can be used to eliminate multiple dimensions at once.
使用切片,从要消除的维度中选择一个值,然后仅提取与该维度的值匹配的项目以包含在较低维度的切片中。切片对于 3D 空间数据特别有用,例如,当检查通过 CT 扫描从头骨的不同高度对人体头部进行的切片时。切片可用于一次消除多个维度。
With projection, no information about the eliminated dimensions is retained; the values for those dimensions are simply dropped, and all items are still shown. A familiar form of projection is the standard graphics perspective transformation which projects from 3D to 2D, losing information about depth along the way. In mathematical visualization, the structure of higher-dimensional geometric objects can be shown by projecting from 4D to 3D before the standard projection to the image plane and using color to encode information from the projected-away dimension. This technique is sometimes called dimensional filtering when it is used for nonspatial data.
使用投影时,不会保留有关已消除维度的任何信息;这些维度的值只是被删除,所有项目仍会显示。一种熟悉的投影形式是标准图形透视变换,它从 3D 投影到 2D,在此过程中丢失有关深度的信息。在数学可视化中,可以通过在标准投影到图像平面之前从 4D 投影到 3D 并使用颜色对投影维度中的信息进行编码来显示高维几何对象的结构。这种技术用于非空间数据时,有时称为维度过滤。
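The difference between slicing and projection can be shown with a small sketch over items stored as dictionaries. The function names and the toy voxel records are assumptions for illustration: slicing keeps only the items that match a chosen value and then drops that dimension, whereas projection keeps every item and simply drops the dimension.

```python
def slice_items(items, dim, value):
    """Slicing: keep items whose `dim` equals `value`, then drop `dim`."""
    return [
        {key: val for key, val in item.items() if key != dim}
        for item in items
        if item[dim] == value
    ]


def project_items(items, dims_to_drop):
    """Projection: keep every item but drop the listed dimensions."""
    return [
        {key: val for key, val in item.items() if key not in dims_to_drop}
        for item in items
    ]


# Toy 3D samples, not real scan data.
voxels = [
    {"x": 0, "y": 0, "z": 0, "density": 0.1},
    {"x": 1, "y": 0, "z": 0, "density": 0.4},
    {"x": 0, "y": 0, "z": 1, "density": 0.9},
]

axial_slice = slice_items(voxels, "z", 0)  # two items remain
flattened = project_items(voxels, {"z"})   # all three items remain
print(axial_slice)
print(flattened)
```

Note that after projection the first and third items share the same (x, y) position, which is exactly the kind of ambiguity that losing a dimension introduces.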
In some datasets, there may be interesting hidden structure in a much lower-dimensional space than the number of original data dimensions. For instance, sometimes directly measuring the independent variables of interest is difficult or impossible, but a large set of dependent or indirect variables is available. The goal is to find a small set of dimensions that faithfully represent most of the structure or variance in the dataset. These dimensions may be the original ones, or synthesized new ones that are linear or nonlinear combinations of the originals. Principal component analysis is a fast, widely used linear method. Many nonlinear approaches have been proposed, including multidimensional scaling (MDS). These methods are usually used to determine whether there are large-scale clusters in the dataset; the fine-grained structure in the lower-dimensional plots is usually not reliable because information is lost in the reduction. Figure 23.19 shows a document collection in a single scatterplot. When the true dimensionality of the dataset is far higher than two, a matrix of scatterplots showing pairs of synthetic dimensions may be necessary.
在某些数据集中,可能存在比原始数据维数低得多的维数空间中有趣的隐藏结构。例如,有时直接测量感兴趣的独立变量很困难或不可能,但可以使用大量因变量或间接变量。目标是找到一小组维度,忠实地表示数据集中的大部分结构或方差。这些维度可能是原始维度,也可能是合成的新维度,它们是原始维度的线性或非线性组合。主成分分析是一种快速、广泛使用的线性方法。已经提出了许多非线性方法,包括多维缩放 (MDS)。这些方法通常用于确定数据集中是否存在大规模集群;低维图中的细粒度结构通常不可靠,因为信息在缩减过程中会丢失。图 23.19显示了单个散点图中的文档集合。当数据集的真实维数远高于 2 时,可能需要一个显示合成维度对的散点图矩阵。
Figure 23.19. Dimensionality reduction with the Glimmer multidimensional scaling approach shows clusters in a document dataset (Ingram, Munzner, & Olano, 2009), © 2009 IEEE.
图 23.19.使用 Glimmer 多维缩放方法进行降维显示了文档数据集中的聚类 (Ingram、Munzner & Olano,2009),© 2009 IEEE。
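For the linear case, the idea of synthesizing a new dimension as a combination of the originals can be sketched with principal component analysis. The following pure-Python example is only a minimal illustration, not the Glimmer approach shown in the figure: it assumes a small dense dataset, computes the sample covariance matrix, and finds the first principal component by power iteration. The function name and the toy data are assumptions, and a numerical library would normally be used instead.

```python
def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    direction = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[i][j] * direction[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in product) ** 0.5
        if norm == 0:
            break  # degenerate data with no variance along this direction
        direction = [x / norm for x in product]
    # Each item's coordinate along the new synthetic dimension.
    scores = [sum(c[j] * direction[j] for j in range(d)) for c in centered]
    return direction, scores


# Toy data: three measured dimensions, but all variation lies along one line.
data = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
direction, scores = first_principal_component(data)
print(direction)
print(scores)
```

Here the three original dimensions collapse to a single synthetic one without loss, because the toy data has only one true degree of freedom; real datasets rarely reduce this cleanly.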
We conclude this chapter with several examples of visualizing specific types of data using the techniques discussed above.
我们将通过几个使用上面讨论的技术可视化特定类型数据的例子来结束本章。
Tabular data is extremely common, as all spreadsheet users know. The goal in visualization is to encode this information through easily perceivable visual channels rather than forcing people to read through it as numbers and text. Figure 23.20 shows the Table Lens, a focus+context approach where quantitative values are encoded as the length of one-pixel high lines in the context regions, and shown as numbers in the focus regions. Each dimension of the dataset is shown as a column, and the rows of items can be re-sorted according to the values in that column with a single click in its header.
所有电子表格用户都知道,表格数据非常常见。可视化的目标是通过易于感知的视觉渠道对这些信息进行编码,而不是强迫人们将其作为数字和文本阅读。图 23.20显示了表格镜头,这是一种焦点+上下文方法,其中定量值在上下文区域中被编码为一像素高的线的长度,并在焦点区域中显示为数字。数据集的每个维度都显示为一列,只需单击其标题即可根据该列中的值对项目的行进行重新排序。
Figure 23.20. The Table Lens provides focus+context interaction with tabular data, immediately reorderable by the values in each dimension column. Image courtesy Stuart Card (Rao & Card, 1994), © 1994 ACM, Inc. Included here by permission.
图 23.20。表格镜头提供焦点+上下文与表格数据的交互,可立即根据每个维度列中的值重新排序。图片由 Stuart Card (Rao & Card, 1994) 提供,© 1994 ACM, Inc. 经许可包含在此处。
The traditional Cartesian approach of a scatterplot, where items are plotted as dots with respect to perpendicular axes, is only usable for two and three dimensions of data. Many tables contain far more than three dimensions of data, and the number of additional dimensions that can be encoded using other visual channels is limited. Parallel coordinates are an approach for visualizing more dimensions at once using spatial position, where the axes are parallel rather than perpendicular and an n-dimensional item is shown as a polyline that crosses each of the n axes once (Inselberg & Dimsdale, 1990; Wegman, 1990). Figure 23.21 shows an eight-dimensional dataset of 230,000 items at multiple levels of detail (Fua, Ward, & Rundensteiner, 1999), from a high-level view at the top to finer detail at the bottom. With hierarchical parallel coordinates, the items are clustered and an entire cluster of items is represented by a band of varying width and opacity, where the mean is in the middle and width at each axis depends on the values of the items in the cluster in that dimension. The coloring of each band is based on the proximity between clusters according to a similarity metric.
传统的笛卡尔散点图方法(将项目绘制为相对于垂直轴的点)仅适用于二维和三维数据。许多表格包含的数据远不止三维,而使用其他视觉通道可以编码的额外维度数量有限。平行坐标是一种使用空间位置一次可视化更多维度的方法,其中轴是平行的而不是垂直的,并且n维项目显示为与n 个轴中的每一个相交一次的折线(Inselberg & Dimsdale,1990;Wegman,1990)。图 23.21显示了包含 230,000 个项目的八维数据集,具有多个细节层次(Fua、Ward 和 Rundensteiner,1999),从顶部的高级视图到底部的更精细的细节。使用分层平行坐标,项目被聚类,整个项目集群由不同宽度和不透明度的带状表示,其中平均值位于中间,每个轴的宽度取决于集群中该维度的项目值。每个带状的颜色基于相似性度量中集群之间的接近度。
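The basic (non-hierarchical) parallel coordinates mapping can be sketched as follows. The sketch assumes evenly spaced vertical axes and a known `(low, high)` range per dimension; the function name, the drawing area size, and the example dimensions are illustrative assumptions. Each n-dimensional item becomes a polyline with one vertex on each of the n axes.

```python
def polyline_for_item(item, dims, ranges, width=700.0, height=300.0):
    """Return one (x, y) vertex per axis for an n-dimensional item."""
    n = len(dims)
    spacing = width / (n - 1) if n > 1 else 0.0
    points = []
    for i, dim in enumerate(dims):
        low, high = ranges[dim]
        t = (item[dim] - low) / (high - low)  # normalize the value to [0, 1]
        points.append((i * spacing, t * height))
    return points


dims = ["mpg", "horsepower", "weight"]
ranges = {"mpg": (10, 50), "horsepower": (50, 250), "weight": (1000, 5000)}
car = {"mpg": 30, "horsepower": 100, "weight": 2000}

print(polyline_for_item(car, dims, ranges))
```

Because the order of `dims` determines which axes are adjacent, reordering that list changes which pairwise relationships are easy to see.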
Figure 23.21. Hierarchical parallel coordinates show high-dimensional data at multiple levels of detail. Image courtesy Matt Ward (Fua et al., 1999), © 1999 IEEE.
图 23.21。分层平行坐标显示多层次细节的高维数据。图片由 Matt Ward (Fua 等,1999) 提供,© 1999 IEEE。
The field of graph drawing is concerned with finding a spatial position for the nodes in a graph in 2D or 3D space and routing the edges between these nodes (Di Battista, Eades, Tamassia, & Tollis, 1999). In many cases the edge-routing problem is simplified by using only straight edges, or by only allowing right-angle bends for the class of orthogonal layouts, but some approaches handle true curves. If the graph has directed edges, a layered approach can be used to show hierarchical structure through the horizontal or vertical spatial ordering of nodes, as shown in Figure 23.2.
图形绘制领域涉及在二维或三维空间中为图形中的节点找到空间位置,并对这些节点之间的边进行路由(Di Battista、Eades、Tamassia 和 Tollis,1999 年)。在许多情况下,边路由问题可以通过仅使用直边或仅允许直角弯曲(正交布局类)来简化,但有些方法可以处理真正的曲线。如果图形有向边,则可以使用分层方法通过节点的水平或垂直空间排序来显示层次结构,如图 23.2所示。
A suite of aesthetic criteria operationalizes human judgments about readable graphs as metrics that can be computed on a proposed layout (Ware, Purchase, Colpoys, & McGill, 2002). Figure 23.22 shows some examples. Some metrics should be minimized, such as the number of edge crossings, the total area of the layout, and the number of right-angle bends or curves. Others should be maximized, such as the angular resolution or symmetry. The problem is difficult because most of these criteria are individually NP-hard, and moreover they are mutually incompatible (Brandenburg, 1988).
一套美学标准将人类对可读图形的判断操作化为可在拟议布局上计算的指标(Ware、Purchase、Colpys 和 McGill,2002 年)。图 23.22显示了一些示例。一些指标应最小化,例如边交叉数、布局的总面积以及直角弯曲或曲线的数量。其他指标应最大化,例如角度分辨率或对称性。这个问题很困难,因为大多数这些标准都是 NP 难的,而且它们相互不兼容(Brandenburg,1988 年)。
Figure 23.22. Graph layout aesthetic criteria. Top: Edge crossings should be minimized. Middle: Angular resolution should be maximized. Bottom: Symmetry is maximized on the left, whereas crossings are minimized on the right, showing the conflict between the individually NP-hard criteria.
图 23.22。图形布局美学标准。顶部:应尽量减少边交叉。中间:应尽量增加角分辨率。底部:左侧尽量增加对称性,而右侧尽量减少交叉,显示了各个 NP 难题标准之间的冲突。
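Evaluating one of these metrics on a given layout is straightforward even though optimizing it is hard. The sketch below counts edge crossings for a straight-line drawing, assuming node positions are given as 2D points. It checks every pair of edges, so it is quadratic in the number of edges, and it deliberately ignores degenerate cases such as collinear overlapping edges. All function names are illustrative.

```python
from itertools import combinations


def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segment ab properly crosses segment cd."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0 and
            orientation(c, d, a) * orientation(c, d, b) < 0)


def count_edge_crossings(positions, edges):
    """Count pairs of straight edges that cross in the given layout."""
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if len({u1, v1, u2, v2}) < 4:
            continue  # edges that share a node are not counted as crossing
        if segments_cross(positions[u1], positions[v1],
                          positions[u2], positions[v2]):
            crossings += 1
    return crossings


edges = [("a", "c"), ("b", "d")]
crossed = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
untangled = {"a": (0, 0), "c": (1, 0), "b": (1, 1), "d": (0, 1)}

print(count_edge_crossings(crossed, edges))
print(count_edge_crossings(untangled, edges))
```

The two layouts draw the same graph; only the node positions differ, which is why the metric is a property of the layout and not of the graph.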
Many approaches to node-link graph drawing use force-directed placement, motivated by the intuitive physical metaphor of spring forces at the edges drawing together repelling particles at the nodes. Although naive approaches have high time complexity and are prone to being caught in local minima, much work has gone into developing more sophisticated algorithms such as GEM (Frick, Ludwig, & Mehldau, 1994) or IPSep-CoLa (Dwyer, Koren, & Marriott, 2006). Figure 23.23 shows an interactive system using the r-PolyLog energy model, where a focus+context view of the clustered graph is created with both geometric and semantic fisheye (van Ham & van Wijk, 2004).
许多节点链接图绘制方法都使用力导向布局,其动机是边缘处的弹簧力将节点处的排斥粒子拉到一起的直观物理隐喻。尽管简单的方法具有较高的时间复杂度并且容易陷入局部最小值,但人们已经投入了大量精力来开发更复杂的算法,例如 GEM(Frick、Ludwig 和 Mehldau,1994 年)或 IPSep-CoLa(Dwyer、Koren 和 Marriott,2006 年)。图 23.23显示了使用r -PolyLog 能量模型的交互式系统,其中使用几何和语义鱼眼创建了聚类图的焦点+上下文视图(van Ham 和 van Wijk,2004 年)。
Figure 23.23. Force-directed placement showing a clustered graph with both geometric and semantic fisheye. Image courtesy Jarke van Wijk (van Ham & van Wijk, 2004), © 2004 IEEE.
图 23.23。力导向放置显示具有几何和语义鱼眼的聚类图。图片由 Jarke van Wijk 提供(van Ham & van Wijk,2004 年),© 2004 IEEE。
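The spring-and-particle metaphor can be sketched as a naive force-directed loop. This is not GEM, IPSep-CoLa, or the r-PolyLog model mentioned above; it is an illustrative sketch that assumes Fruchterman-Reingold-style force laws (repulsion k²/d between every pair of nodes, attraction d²/k along each edge) and a linearly cooling step limit. The function name and all parameters are hypothetical.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, k=1.0, max_step=0.05, seed=0):
    """Naive O(n^2)-per-iteration force-directed placement in 2D."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for v in nodes}

    for t in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsion: every pair of nodes pushes apart with magnitude k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Attraction: each edge pulls its endpoints together with magnitude d^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each node along its net force, capped by a shrinking step limit.
        limit = max_step * (1.0 - t / iterations)
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1])
            if length > 0.0:
                move = min(length, limit)
                pos[v][0] += disp[v][0] / length * move
                pos[v][1] += disp[v][1] / length * move

    return {v: (p[0], p[1]) for v, p in pos.items()}
```

Under these force laws, two nodes joined by a single edge settle at a distance close to `k`, where attraction and repulsion balance. The all-pairs repulsion loop is the source of the high time complexity noted above, and the dependence of the result on the random starting positions is one way such a method can end in a local minimum.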
Graphs can also be visually encoded by showing the adjacency matrix, where all vertices are placed along each axis and the cell between two vertices is colored if there is an edge between them. The MatrixExplorer system uses linked multiple views to help social science researchers visually analyze social networks with both matrix and node-link representations (Henry & Fekete, 2006). Figure 23.24 shows the different visual patterns created by the same graph structure in these two views: A represents an actor connecting several communities; B is a community; and C is a clique, or a complete sub-graph. Matrix views do not suffer from cluttered edge crossings, but many tasks including path following are more difficult with this approach.
图形也可以通过显示邻接矩阵进行可视化编码,其中所有顶点都沿着每个轴放置,如果两个顶点之间有边,则两个顶点之间的单元格将被着色。MatrixExplorer 系统使用链接的多个视图来帮助社会科学研究人员以矩阵和节点链接表示形式直观地分析社交网络(Henry & Fekete,2006)。图 23.24显示了这两个视图中相同图形结构创建的不同视觉模式:A 表示连接多个社区的参与者;B 表示社区;C 表示小团体或完整子图。矩阵视图不会受到杂乱的边交叉的影响,但使用这种方法,包括路径跟踪在内的许多任务都更加困难。
Figure 23.24. Graphs can be shown with either matrix or node-link views. Image courtesy Jean-Daniel Fekete (Henry & Fekete, 2006), © 2006 IEEE.
图 23.24。图表可以采用矩阵或节点链接视图显示。图片由 Jean-Daniel Fekete (Henry & Fekete, 2006) 提供,© 2006 IEEE。
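The adjacency-matrix encoding can be sketched in a few lines. This is only an illustration of the data structure, not the MatrixExplorer system; the function names, the text rendering, and the small example graph are assumptions made for this sketch.

```python
def adjacency_matrix(vertices, edges, directed=False):
    """Build a 0/1 matrix whose rows and columns follow the order of `vertices`."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:
            matrix[index[v]][index[u]] = 1  # undirected graphs give a symmetric matrix
    return matrix


def render(matrix, filled="#", empty="."):
    """Render the matrix as text, one character per cell."""
    return "\n".join(
        "".join(filled if cell else empty for cell in row) for row in matrix
    )


# A three-node clique (a, b, c) with one extra node d attached to c.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
matrix = adjacency_matrix(vertices, edges)
```

Calling `render(matrix)` gives the four rows `.##.`, `#.#.`, `##.#`, and `..#.`. The clique appears as a filled block around the diagonal, which is the kind of visual pattern labeled C in Figure 23.24. The pattern depends on the vertex ordering along the axes, so reordering `vertices` changes how visible such structures are.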
Trees are a special case of graphs so common that a great deal of visualization research has been devoted to them. A straightforward algorithm to lay out trees in the two-dimensional plane works well for small trees (Reingold & Tilford, 1981), while a more complex but scalable approach runs in linear time (Buchheim, Jünger, & Leipert, 2002). Figures 23.17 and 23.18 also show trees with different approaches to spatial layout, but all four of these methods visually encode the relationship between parent and child nodes by drawing a link connecting them.
树是一种非常常见的特殊图形,因此有大量的可视化研究致力于此。在二维平面上布局树的简单算法对于小型树非常有效(Reingold & Tilford,1981),而更复杂但可扩展的方法则以线性时间运行(Buchheim、Jünger 和 Leipert,2002)。图 23.17 和 23.18 也显示了采用不同空间布局方法的树,但这四种方法都通过绘制连接父节点和子节点的链接来直观地编码父节点和子节点之间的关系。
Treemaps use containment rather than connection to show the hierarchical relationship between parent and child nodes in a tree (B. Johnson & Shneiderman, 1991). That is, treemaps show child nodes nested within the outlines of the parent node. Figure 23.25 shows a hierarchical filesystem of nearly one million files, where file size is encoded by rectangle size and file type is encoded by color (Fekete & Plaisant, 2002). The size of nodes at the leaves of the tree can encode an additional data dimension, but the size of nodes in the interior does not show the value of that dimension; it is dictated by the cumulative size of their descendants. Although tasks such as understanding the topological structure of the tree or tracing paths through it are more difficult with treemaps than with node-link approaches, tasks that involve understanding an attribute tied to leaf nodes are well supported. Treemaps are space-filling representations that are usually more compact than node-link approaches.
树形图使用包含而不是连接来显示树中父节点和子节点之间的层次关系(B. Johnson & Shneiderman,1991)。也就是说,树形图显示嵌套在父节点轮廓内的子节点。图 23.25显示了近一百万个文件的层次文件系统,其中文件大小由矩形大小编码,文件类型由颜色编码(Fekete & Plaisant,2002)。树叶节点的大小可以编码额外的数据维度,但内部节点的大小并不显示该维度的值;它由其后代的累积大小决定。虽然使用树形图比使用节点链接方法更难理解树的拓扑结构或跟踪树中的路径等任务,但涉及理解与叶节点绑定的属性的任务得到了很好的支持。树形图是空间填充表示,通常比节点链接方法更紧凑。
Figure 23.25. Treemap showing a filesystem of nearly one million files. Image courtesy Jean-Daniel Fekete (Fekete & Plaisant, 2002), © 2002 IEEE.
图 23.25。树状图显示了一个包含近一百万个文件的文件系统。图片由 Jean-Daniel Fekete (Fekete & Plaisant, 2002) 提供,© 2002 IEEE。
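The containment idea can be sketched with a simple slice-and-dice subdivision, in which each node's rectangle is split among its children in proportion to their cumulative leaf sizes and the split direction alternates with depth. This is an illustrative sketch under assumptions: the tree is a nested dictionary with hypothetical keys `name`, `size`, and `children`, and the layout is not the one used to produce Figure 23.25.

```python
def total_size(node):
    """Cumulative size: a leaf's own size, or the sum over an interior node's children."""
    children = node.get("children")
    if children:
        return sum(total_size(child) for child in children)
    return node.get("size", 0)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return a list of (name, x, y, w, h) rectangles, parents before children."""
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    total = total_size(node)
    offset = 0.0
    for child in node.get("children", []):
        frac = total_size(child) / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: split the rectangle left to right.
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            # Odd depth: split the rectangle top to bottom.
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


tree = {
    "name": "root",
    "children": [
        {"name": "a", "size": 1},
        {"name": "b", "children": [
            {"name": "c", "size": 2},
            {"name": "d", "size": 1},
        ]},
    ],
}
rects = {name: (x, y, w, h) for name, x, y, w, h in slice_and_dice(tree, 0.0, 0.0, 4.0, 2.0)}
```

In this example every leaf rectangle has an area equal to twice its `size`, while the rectangle for the interior node `b` simply encloses `c` and `d`, so its area reflects their combined size rather than a value of its own.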
Many kinds of analysis such as epidemiology require understanding both geographic and nonspatial data. Figure 23.26 shows a tool for the visual analysis of a cancer demographics dataset that combines many of the ideas described in this chapter (MacEachren, Dai, Hardisty, Guo, & Lengerich, 2003). The top matrix of linked views features small multiples of three types of visual encodings: geographic maps showing Appalachian counties at the lower left, histograms across the diagonal of the matrix, and scatterplots on the upper right. The bottom 2 × 2 matrix, linking scatterplots with maps, includes the color legend for both. The discrete bivariate sequential colormap has lightness increasing sequentially for each of two complementary hues and is effective for color-deficient people.
许多类型的分析(例如流行病学)都需要了解地理和非空间数据。图 23.26展示了一种用于可视化分析癌症人口统计数据集的工具,该工具结合了本章中描述的许多想法(MacEachren、Dai、Hardisty、Guo 和 Lengerich,2003 年)。顶部的链接视图矩阵包含三种类型的可视化编码的小倍数:左下方显示阿巴拉契亚县的地理地图、矩阵对角线上的直方图和右上方的散点图。底部的 2 × 2 矩阵将散点图与地图链接起来,并包含两者的颜色图例。离散二元顺序色图的亮度对于两种互补色调中的每一种都按顺序增加,对色盲人群很有效。
Figure 23.26. Two matrices of linked small multiples showing cancer demographic data (MacEachren et al., 2003), © 2003 IEEE.
图 23.26.两个链接的小倍数矩阵显示癌症人口统计数据 (MacEachren 等,2003),© 2003 IEEE。
Most nongeographic spatial data is modeled as a field, where there are one or more values associated with each point in 2D or 3D space. Scalar fields, such as CT or MRI medical imaging scans, are usually visualized by finding isosurfaces or using direct volume rendering. Vector fields, such as flows in water or air, are often visualized using arrows, streamlines (McLoughlin, Laramee, Peikert, Post, & Chen, 2009), and line integral convolution (LIC) (Laramee et al., 2004). Tensor fields, such as those describing the anisotropic diffusion of molecules through the human brain, are particularly challenging to display (Kindlmann, Weinstein, & Hart, 2000).
大多数非地理空间数据都建模为场,其中二维或三维空间中的每个点都有一个或多个值。标量场(例如 CT 或 MRI 医学成像扫描)通常通过查找等值面或使用直接体积渲染来可视化。矢量场(例如水或空气中的流动)通常使用箭头、流线(McLoughlin、Laramee、Peikert、Post 和 Chen,2009 年)和线积分卷积(LIC)(Laramee 等人,2004 年)来可视化。张量场(例如描述分子通过人脑的各向异性扩散的张量场)的显示尤其具有挑战性(Kindlmann、Weinstein 和 Hart,2000 年)。
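As an illustration of one of the vector-field techniques named above, a streamline is a curve that is everywhere tangent to the field, and it can be approximated by numerically integrating a seed point through the field. The sketch below assumes a steady 2D field given as a Python function and uses classical fourth-order Runge-Kutta integration with a fixed step; the function name, parameters, and the rotational test field are assumptions for this example, not a method taken from the cited surveys.

```python
import math


def trace_streamline(field, start, step=0.01, n_steps=1000):
    """Trace a streamline of a steady 2D vector field from `start` using RK4.

    `field(x, y)` must return the velocity (vx, vy) at that point.
    Returns the list of visited points, including the starting point.
    """
    def rk4(p):
        k1 = field(p[0], p[1])
        k2 = field(p[0] + 0.5 * step * k1[0], p[1] + 0.5 * step * k1[1])
        k3 = field(p[0] + 0.5 * step * k2[0], p[1] + 0.5 * step * k2[1])
        k4 = field(p[0] + step * k3[0], p[1] + step * k3[1])
        return (
            p[0] + step / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]),
            p[1] + step / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]),
        )

    points = [tuple(start)]
    for _ in range(n_steps):
        points.append(rk4(points[-1]))
    return points


def rotation(x, y):
    """Rigid counterclockwise rotation about the origin; streamlines are circles."""
    return (-y, x)


circle = trace_streamline(rotation, (1.0, 0.0), step=0.01, n_steps=628)
```

For the rotational field the traced points stay on the unit circle to within a very small integration error, which makes it a convenient sanity check. Real flow data is usually sampled on a grid, so a practical tracer would also need to interpolate the field between samples and stop at the domain boundary.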
What conferences and journals are good places to look for further information about visualization?
哪些会议和期刊是寻找有关可视化的更多信息的好地方?
The IEEE VisWeek conference comprises three subconferences: InfoVis (Information Visualization), Vis (Visualization), and VAST (Visual Analytics Science and Technology). There is also a European EuroVis conference and an Asian PacificVis venue. Relevant journals include IEEE TVCG (Transactions on Visualization and Computer Graphics) and Palgrave Information Visualization.
IEEE VisWeek 会议包括三个子会议:InfoVis(信息可视化)、Vis(可视化)和 VAST(可视化分析科学与技术)。此外,还有欧洲 EuroVis 会议和亚太 Vis 会议。相关期刊包括 IEEE TVCG(可视化与计算机图形学学报)和 Palgrave Information Visualization。
What software and toolkits are available for visualization?
有哪些软件和工具包可用于可视化?
The most popular toolkit for spatial data is vtk, a C/C++ codebase available at www.vtk.org. For abstract data, the Java-based prefuse (http://www.prefuse.org) and Processing (processing.org) toolkits are becoming widely used. The ManyEyes site from IBM Research (www.many-eyes.com) allows people to upload their own data, create interactive visualizations in a variety of formats, and carry on conversations about visual data analysis.
最流行的空间数据工具包是vtk ,这是一个 C/C++ 代码库,可从www.vtk.org获取。对于抽象数据,基于 Java 的prefuse ( http://www.prefuse.org ) 和 Processing ( processing.org ) 工具包正得到广泛使用。IBM Research 的 ManyEyes 网站 ( www.many-eyes.com ) 允许人们上传自己的数据、创建各种格式的交互式可视化效果并进行有关可视化数据分析的讨论。
Akenine-Möller, T., Haines, E., & Hoffman, N. (2008). Real-Time Rendering (Third ed.). Wellesley, MA: A K Peters.
Akenine-Möller, T.、Haines, E. 和 Hoffman, N. (2008)。 《实时渲染》 (第三版)。马萨诸塞州韦尔斯利:AK Peters。
Amanatides, J., & Woo, A. (1987). A Fast Voxel Traversal Algorithm for Ray Tracing. In Proceedings of Eurographics (pp. 1–10). Amsterdam: Elsevier Science Publishers.
Amanatides, J. 和 Woo, A. (1987)。用于光线追踪的快速体素遍历算法。 《欧洲图形学会会刊》 (第 1-10 页)。阿姆斯特丹:Elsevier Science 出版社。
American National Standard Institute. (1986). Nomenclature and Definitions for Illumination Engineering. ANSI Report (New York). (ANSI/IES RP-16-1986)
美国国家标准协会。(1986 年)。照明工程术语和定义。ANSI报告(纽约)。(ANSI/IES RP-16-1986)
Angel, E. (2002). Interactive Computer Graphics: A Top-Down Approach with OpenGL (Third ed.). Reading, MA: Addison-Wesley.
Angel, E. (2002)。交互式计算机图形学:采用 OpenGL 自上而下的方法(第三版)。马萨诸塞州雷丁:Addison-Wesley。
Appel, A. (1968). Some Techniques for Shading Machine Renderings of Solids. In Proceedings of the AFIPS Spring Joint Computing Conference (Vol. 32, pp. 37–45). AFIPS.
Appel, A. (1968)。《用于实体着色机渲染的一些技术》。 《AFIPS 春季联合计算会议论文集》 (第 32 卷,第 37-45 页)。AFIPS。
Arvo, J. (1995a). Analytic Methods for Simulated Light Transport (Unpublished doctoral dissertation).
Arvo, J. (1995a).模拟光传输的分析方法(未发表的博士论文)。
Arvo, J. (1995b). Stratified sampling of spherical triangles. In Proceedings SIGGRAPH (pp. 437–438).
Arvo, J. (1995b)。球面三角形的分层采样。在SIGGRAPH 论文集(第 437-438 页)中。
Ashikhmin, M., Premože, S., & Shirley, P. (2000). A Microfacet-Based BRDF Generator. In Proceedings of SIGGRAPH (pp. 65–74). Reading, MA: Addison-Wesley Longman.
Ashikhmin, M.、Premože, S. 和 Shirley, P. (2000)。基于微面片的 BRDF 生成器。SIGGRAPH论文集(第 65-74 页)。马萨诸塞州雷丁:Addison-Wesley Longman。
Baumgart, B. (1974, October). Geometric Modeling for Computer Vision (Tech. Rep. No. AIM-249). Palo Alto, CA: Stanford University AI Laboratory.
Baumgart,B.(1974 年 10 月)。计算机视觉的几何建模(技术报告编号 AIM-249)。加利福尼亚州帕洛阿尔托:斯坦福大学人工智能实验室。
Beck, K., & Andres, C. (2004). Extreme Programming Explained: Embrace Change (Second ed.). Reading, MA: Addison-Wesley.
Beck, K. 和 Andres, C. (2004)。 《极限编程解析:拥抱变化》(第二版)。马萨诸塞州雷丁:Addison-Wesley。
Blinn, J. (1996). Jim Blinn’s Corner. San Francisco, CA: Morgan Kaufmann.
布林,J.(1996)。吉姆·布林角。加利福尼亚州旧金山:摩根·考夫曼。
Blinn, J. F. (1976). Texture and Reflection in Computer Generated Images. Communications of the ACM, 19(10), 542–547.
Blinn, JF (1976)。计算机生成图像中的纹理和反射。ACM通讯, 19 (10),542–547。
Bresenham, J. E. (1965). Algorithm for Computer Control of a Digital Plotter. IBM Systems Journal, 4(1), 25–30.
Bresenham, JE (1965)。数字绘图仪的计算机控制算法。IBM系统杂志, 4 (1),25–30。
Burley, B. (2012). Physically-Based Shading at Disney. In Proceedings of SIGGRAPH (pp. 1–7).
Burley, B. (2012)。迪士尼的基于物理的着色。SIGGRAPH 论文集(第 1-7 页)。
Campagna, S., Kobbelt, L., & Seidel, H.-P. (1998). Directed Edges—A Scalable Representation for Triangle Meshes. Journal of Graphics Tools, 3(4), 1–12.
Campagna, S., Kobbelt, L., & Seidel, H.-P. (1998)。有向边——三角网格的可扩展表示。 《图形工具杂志》 , 3 (4),1-12。
Cleary, J., Wyvill, B., Birtwistle, G., & Vatti, R. (1983). A Parallel Ray Tracing Computer. In Proceedings of the Association of Simula Users Conference (pp. 77–80).
Cleary, J.、Wyvill, B.、Birtwistle, G. 和 Vatti, R. (1983)。并行射线追踪计算机。载于Simula 用户协会会议论文集(第 77-80 页)。
Cook, R. L., Carpenter, L., & Catmull, E. (1987). The Reyes Image Rendering Architecture. Proceedings of SIGGRAPH ’87 Computer Graphics, 21(4), 95–102.
Cook, RL, Carpenter, L., & Catmull, E. (1987)。Reyes 图像渲染架构。SIGGRAPH '87 计算机图形学论文集, 21 (4),95–102。
Cook, R. L., & Torrance, K. E. (1982). A Reflectance Model for Computer Graphics. ACM Transactions on Graphics, 1(1), 7–24.
Cook, RL, & Torrance, KE (1982)。计算机图形学的反射模型。ACM图形学汇刊, 1 (1),7-24。
Crow, F. C. (1978). The Use of Grayscale for Improved Raster Display of Vectors and Characters. In SIGGRAPH ’78: Proceedings of the 5th Annual Conference on Computer Graphics and Interactive Techniques (pp. 1–5). New York: ACM Press.
Crow, FC (1978)。《使用灰度改善矢量和字符的光栅显示》。《 SIGGRAPH '78:第五届计算机图形学和交互技术年会论文集》 (第 1-5 页)。纽约:ACM Press。
Crowe, M. J. (1994). A History of Vector Analysis. Mineola, NY: Dover.
Crowe, MJ (1994)。矢量分析的历史。纽约州米尼奥拉:多佛。
DeRose, T. (1989). A Coordinate-Free Approach to Geometric Programming (Tech. Rep. No. 89-09-16). Seattle, WA: University of Washington.
DeRose, T. (1989)。几何规划的无坐标方法(技术报告编号 89-09-16)。西雅图,华盛顿:华盛顿大学。
Dobkin, D. P., & Mitchell, D. P. (1993). Random-Edge Discrepancy of Supersampling Patterns. In Proceedings of Graphics Interface (pp. 62–69). Wellesley, MA: A K Peters & Canadian Human-Computer Communications Society.
Dobkin, DP 和 Mitchell, DP (1993)。超采样模式的随机边缘差异。载于《图形界面论文集》 (第 62-69 页)。马萨诸塞州韦尔斯利:AK Peters 和加拿大人机通信协会。
Doran, C., & Lasenby, A. (2003). Geometric Algebra for Physicists. Cambridge, UK: Cambridge University Press.
Doran, C. 和 Lasenby, A. (2003)。物理学家的几何代数。英国剑桥:剑桥大学出版社。
Duff, T., Burgess, J., Christensen, P., Hery, C., Kensler, A., Liani, M., & Villemin, R. (2017). Building an Orthonormal Basis, Revisited. Journal of Computer Graphics Techniques, 6(1).
Duff, T.、Burgess, J.、Christensen, P.、Hery, C.、Kensler, A.、Liani, M. 和 Villemin, R. (2017)。重新审视建立正交基础。计算机图形技术杂志,第 6 卷(1)。
Eberly, D. (2000). 3D Game Engine Design: A Practical Approach to Real-Time Computer Graphics. San Francisco, CA: Morgan Kaufmann.
Eberly, D. (2000)。3D游戏引擎设计:实时计算机图形的实用方法。加利福尼亚州旧金山:Morgan Kaufmann。
Ershov, S., Kolchin, K., & Myszkowski, K. (2001). Rendering Pearlescent Appearance Based on Paint-Composition Modelling. Computer Graphics Forum, 20(3), 227–238.
Ershov, S., Kolchin, K., & Myszkowski, K. (2001)。基于油漆成分建模呈现珠光外观。计算机图形学论坛, 20 (3),227–238。
Estevez, A. C., & Kulla, C. (2017). Production Friendly Microfacet Sheen BRDF (Tech. Rep.). Sony Pictures Imageworks.
Estevez, A. C. 和 Kulla, C. (2017)。适合生产的微面片光泽 BRDF(技术报告)。Sony Pictures Imageworks。
Farin, G., & Hansford, D. (2004). Practical Linear Algebra: A Geometry Toolbox. Wellesley, MA: A K Peters.
Farin, G. 和 Hansford, D. (2004)。实用线性代数:几何工具箱。马萨诸塞州韦尔斯利:AK Peters。
Foley, J. D., Van Dam, A., Feiner, S. K., & Hughes, J. F. (1990). Computer Graphics: Principles and Practice (Second ed.). Reading, MA: Addison-Wesley.
Foley, JD、Van Dam, A.、Feiner, SK 和 Hughes, JF (1990)。计算机图形学:原理与实践(第二版)。马萨诸塞州雷丁:Addison-Wesley。
Hill, F. S., Jr. (2000). Computer Graphics Using OpenGL (Second ed.). Englewood Cliffs, NJ: Prentice Hall.
Hill, F. S., Jr. (2000)。使用 OpenGL 的计算机图形学(第二版)。新泽西州恩格尔伍德克利夫斯:Prentice Hall。
Fujimoto, A., Tanaka, T., & Iwata, K. (1986). ARTS: Accelerated Ray-Tracing System. IEEE Computer Graphics & Applications, 6(4), 16–26.
Fujimoto, A., Tanaka, T., & Iwata, K. (1986)。ARTS:加速光线追踪系统。IEEE计算机图形学与应用, 6 (4),16-26。
Glassner, A. (1984). Space Subdivision for Fast Ray Tracing. IEEE Computer Graphics & Applications, 4(10), 15–22.
Glassner, A. (1984)。空间细分以实现快速光线追踪。IEEE计算机图形学与应用, 4 (10),15-22。
Glassner, A. (1995). Principles of Digital Image Synthesis. San Francisco, CA: Morgan Kaufmann.
Glassner, A. (1995)。数字图像合成原理。加利福尼亚州旧金山:Morgan Kaufmann。
Goldman, R. (1985). Illicit Expressions in Vector Algebra. ACM Transactions on Graphics, 4(3), 223–243.
Goldman, R. (1985)。向量代数中的非法表达式。ACM图形学学报, 4 (3),223–243。
Goldsmith, J., & Salmon, J. (1987). Automatic Creation of Object Hierarchies for Ray Tracing. IEEE Computer Graphics & Applications, 7(5), 14–20.
Goldsmith, J. 和 Salmon, J. (1987)。自动创建用于光线追踪的对象层次结构。IEEE计算机图形学与应用, 7 (5),14-20。
Gouraud, H. (1971). Continuous Shading of Curved Surfaces. IEEE Transactions on Computers, C-20(6), 623–629.
Gouraud, H. (1971)。曲面的连续着色。IEEE Transactions on Computers, C-20 (6),623–629。
Hammersley, J., & Handscomb, D. (1964). Monte-Carlo Methods. London: Methuen.
Hammersley, J. 和 Handscomb, D. (1964)。蒙特卡罗方法。伦敦:Methuen。
Hanrahan, P., & Lawson, J. (1990). A Language for Shading and Lighting Calculations. In SIGGRAPH ’90: Proceedings of the 17th Annual Conference on Computer Graphics and Interactive Techniques (pp. 289–298). New York: ACM Press.
Hanrahan, P. 和 Lawson, J. (1990)。一种用于着色和照明计算的语言。在SIGGRAPH '90:第 17 届计算机图形学和交互技术年会论文集(第 289-298 页)中。纽约:ACM Press。
Hanson, A. J. (2005). Visualizing Quaternions. San Francisco, CA: Morgan Kaufmann.
Hanson, AJ (2005)。可视化四元数。加利福尼亚州旧金山:Morgan Kaufmann。
Hausner, M. (1998). A Vector Space Approach to Geometry. Mineola, NY: Dover.
Hausner, M. (1998)。几何的向量空间方法。纽约州米尼奥拉:多佛。
Havran, V. (2000). Heuristic Ray Shooting Algorithms (Unpublished doctoral dissertation). Czech Technical University in Prague.
Havran, V. (2000)。启发式射线射击算法(未发表的博士论文)。布拉格捷克技术大学。
He, X. D., Heynen, P. O., Phillips, R. L., Torrance, K. E., Salesin, D. H., & Greenberg, D. P. (1992). A Fast and Accurate Light Reflection Model. Proceedings of SIGGRAPH ’92, Computer Graphics, 26(2), 253–254.
He, XD, Heynen, PO, Phillips, RL, Torrance, KE, Salesin, DH, & Greenberg, DP (1992). 一种快速准确的光反射模型。SIGGRAPH '92 会议论文集,计算机图形学, 26 (2),253–254。
Hearn, D., & Baker, M. P. (1986). Computer Graphics. Englewood Cliffs, NJ: Prentice Hall.
Hearn, D. 和 Baker, MP (1986)。计算机图形学。新泽西州恩格尔伍德克利夫斯:Prentice Hall。
Heidrich, W., & Seidel, H.-P. (1998). Ray-Tracing Procedural Displacement Shaders. In Proceedings of Graphics Interface (pp. 8–16). Wellesley, MA: A K Peters & Canadian Human-Computer Communications Society.
Heidrich, W. 和 Seidel, H.-P. (1998)。光线追踪程序位移着色器。载于《图形界面论文集》 (第 8-16 页)。马萨诸塞州韦尔斯利:AK Peters 和加拿大人机通信协会。
Heitz, E. (2014). Understanding the masking-shadowing function in microfacet-based BRDFs. Journal of Computer Graphics Techniques, 3(2), 32–91.
Heitz, E. (2014)。了解基于微面片的 BRDF 中的遮罩阴影函数。 《计算机图形技术杂志》 , 3 (2),32-91。
Heitz, E., & d’Eon, E. (2014). Importance Sampling Microfacet-Based BSDFs Using the Distribution of Visible Normals. In Computer Graphics Forum (Vol. 33, pp. 103–112).
Heitz, E. 和 d'Eon, E. (2014)。使用可见法线分布对基于微面片的重要性采样 bsdf。 《计算机图形学论坛》 (第 33 卷,第 103-112 页)。
Hoffmann, B. (1975). About Vectors. Mineola, NY: Dover.
Hoffmann, B. (1975)。关于向量。纽约州米尼奥拉:多佛。
Hoppe, H. (1994). Surface Reconstruction from Unorganized Points (Unpublished doctoral dissertation). University of Washington.
Hoppe, H. (1994)。无序点的表面重建(未发表的博士论文)。华盛顿大学。
Hughes, J. F., & Möller, T. (1999). Building an Orthonormal Basis from a Unit Vector. Journal of Graphics Tools, 4(4), 33–35.
Hughes, JF 和 Möller, T. (1999)。根据单位向量构建正交基。 《图形工具杂志》 , 4 (4),33–35。
IEEE Standards Association. (1985). IEEE Standard for Binary Floating-Point Arithmetic (Tech. Rep.). New York: IEEE Report. (ANSI/IEEE Std 754-1985)
IEEE 标准协会。(1985 年)。IEEE二进制浮点运算标准(技术报告)。纽约:IEEE 报告。(ANSI/IEEE 标准 754-1985)
Immel, D. S., Cohen, M. F., & Greenberg, D. P. (1986). A Radiosity Method for Non-Diffuse Environments. Proceedings of SIGGRAPH ’86, Computer Graphics, 20(4), 133–142.
Immel, DS, Cohen, MF, & Greenberg, DP (1986)。非漫反射环境的辐射度方法。SIGGRAPH '86 会议论文集,计算机图形学, 20 (4),133–142。
Jansen, F. W. (1986). Data Structures for Ray Tracing. In Proceedings of a Workshop Eurographics Seminars on Data Structures for Raster Graphics (pp. 57–73). New York: Springer-Verlag.
Jansen, FW (1986)。光线追踪的数据结构。欧洲图形学会光栅图形数据结构研讨会论文集(第 57-73 页)。纽约:Springer-Verlag。
Kajiya, J. T. (1986). The Rendering Equation. Proc SIGGRAPH ’86 Computer Graphics, 20(4), 143–150.
Kajiya, JT (1986). 渲染方程。Proc SIGGRAPH '86 计算机图形学, 20 (4),143–150。
Kalos, M., & Whitlock, P. (1986). Monte Carlo Methods, Basics. New York: Wiley-Interscience.
Kalos, M. 和 Whitlock, P. (1986)。蒙特卡罗方法基础。纽约:Wiley-Interscience。
Kay, D. S., & Greenberg, D. (1979). Transparency for Computer Synthesized Images. Proceedings of SIGGRAPH ’79 Computer Graphics, 13(2), 158–164.
Kay, DS, & Greenberg, D. (1979). 计算机合成图像的透明度。SIGGRAPH '79 计算机图形学论文集, 13 (2),158–164。
Kernighan, B. W., & Pike, R. (1999). The Practice of Programming. Reading, MA: Addison-Wesley.
Kernighan, BW 和 Pike, R. (1999)。编程实践。马萨诸塞州雷丁:Addison-Wesley。
Kirk, D., & Arvo, J. (1988). The Ray Tracing Kernel. In Proceedings of Ausgraph. Melbourne, Australia: Australian Computer Graphics Association.
Kirk, D., & Arvo, J. (1988). 光线追踪内核。载于Ausgraph 论文集。澳大利亚墨尔本:澳大利亚计算机图形学协会。
Kollig, T., & Keller, A. (2002). Efficient Multidimensional Sampling. Computer Graphics Forum, 21(3), 557–564.
Kollig, T., & Keller, A. (2002)。高效多维采样。计算机图形学论坛, 21 (3),557–564。
Lafortune, E. P. F., Foo, S.-C., Torrance, K. E., & Greenberg, D. P. (1997). Non-Linear Approximation of Reflectance Functions. In Proceedings of SIGGRAPH ’97 (pp. 117–126). Reading, MA: Addison-Wesley.
Lafortune, EPF、Foo, S.-C.、Torrance, KE 和 Greenberg, DP (1997)。反射函数的非线性近似。SIGGRAPH '97 论文集(第 117-126 页)。马萨诸塞州雷丁:Addison-Wesley。
Lawrence, J., Rusinkiewicz, S., & Ramamoorthi, R. (2004). Efficient BRDF Importance Sampling Using a Factored Representation. ACM Transactions on Graphics (Proceedings of SIGGRAPH ’04), 23(3), 496–505.
Lawrence, J.、Rusinkiewicz, S. 和 Ramamoorthi, R. (2004)。使用因子表示实现高效 BRDF 重要性采样。ACM Transactions on Graphics(SIGGRAPH '04 论文集) , 23 (3),496–505。
Lewis, R. R. (1994). Making Shaders More Physically Plausible. Computer Graphics Forum, 13(2), 109–120.
Lewis, RR (1994)。使着色器在物理上更合理。计算机图形学论坛, 13 (2),109–120。
Loop, C. (2000). Managing Adjacency in Triangular Meshes (Tech. Rep. No. MSR-TR-2000-24). Bellingham, WA: Microsoft Research.
Loop, C. (2000)。管理三角网格中的邻接关系(技术报告编号 MSR-TR-2000-24)。华盛顿州贝灵汉:微软研究院。
Matusik, W., Pfister, H., Brand, M., & McMillan, L. (2003). A Data-Driven Reflectance Model. ACM Transactions on Graphics (Proceedings of SIGGRAPH ’03), 22(3), 759–769.
Matusik, W.、Pfister, H.、Brand, M. 和 McMillan, L. (2003)。数据驱动的反射模型。ACM Transactions on Graphics(SIGGRAPH '03 论文集) , 22 (3),759–769。
McGuire, M., Dorsey, J., Haines, E., Hughes, J. F., Marschner, S., Pharr, M., & Shirley, P. (2020). A taxonomy of bidirectional scattering distribution function lobes for rendering engineers.
McGuire, M., Dorsey, J., Haines, E., Hughes, JF, Marschner, S., Pharr, M., & Shirley, P. (2020). 面向渲染工程师的双向散射分布函数叶分类法。
Meyers, S. (1995). More Effective C++: 35 New Ways to Improve Your Programs and Designs. Reading, MA: Addison-Wesley.
Meyers, S. (1995)。更有效的 C++:改进程序和设计的 35 种新方法。马萨诸塞州雷丁:Addison-Wesley。
Meyers, S. (1997). Effective C++: 50 Specific Ways to Improve Your Programs and Designs (Second ed.). Reading, MA: Addison-Wesley.
Meyers, S. (1997). Effective C++:改进程序和设计的 50 种具体方法(第二版)。马萨诸塞州雷丁:Addison-Wesley。
Mitchell, D. P. (1996). Consequences of Stratified Sampling in Graphics. In SIGGRAPH ’96: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (pp. 277–280). New York: ACM Press.
Mitchell, DP (1996)。分层抽样对图形的影响。SIGGRAPH '96:第 23 届计算机图形和交互技术年会论文集(第 277-280 页)。纽约:ACM 出版社。
Mitchell, D. P., & Netravali, A. N. (1988). Reconstruction filters in computer graphics. Computer Graphics (SIGGRAPH 1988 Proceedings), 22(4), 221–228. doi:10.1145/378456.378514.
Mitchell, DP, & Netravali, AN (1988)。计算机图形学中的重建滤波器。计算机图形学(SIGGRAPH 1988 会议录) ,22(4),221–228。doi: 10.1145/378456.378514 。
Munkres, J. (2000). Topology (Second ed.). Englewood Cliffs, NJ: Prentice Hall.
Munkres, J. (2000)。拓扑学(第二版)。新泽西州恩格尔伍德克利夫斯:Prentice Hall。
Muuss, M. J. (1995). Towards Real-Time Ray-Tracing of Combinatorial Solid Geometric Models. In Proceedings of BRL-CAD Symposium.
Muuss, MJ (1995)。面向组合固体几何模型的实时光线追踪。BRL -CAD 研讨会论文集。
Oren, M., & Nayar, S. K. (1994). Generalization of Lambert’s Reflectance Model. In SIGGRAPH ’94: Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques (pp. 239–246). New York: ACM Press.
Oren, M. 和 Nayar, SK (1994)。Lambert 反射模型的推广。SIGGRAPH '94:第 21 届计算机图形和交互技术年会论文集(第 239-246 页)。纽约:ACM Press。
Paeth, A. W. (1990). A Fast Algorithm for General Raster Rotation. In Graphics Gems (pp. 179–195). Boston, MA: Academic Press.
Paeth, AW (1990)。通用光栅旋转的快速算法。载于《Graphics Gems》 (第 179-195 页)。马萨诸塞州波士顿:Academic Press。
Parker, S., Martin, W., Sloan, P., Shirley, P., Smits, B., & Hansen, C. (1999). Interactive Ray Tracing. In ACM Symposium on Interactive 3D Graphics (pp. 119–126). New York: ACM Press.
Parker, S.、Martin, W.、Sloan, P.、Shirley, P.、Smits, B. 和 Hansen, C. (1999)。交互式光线追踪。在ACM 交互式 3D 图形研讨会上(第 119-126 页)。纽约:ACM 出版社。
Patterson, J., Hoggar, S., & Logie, J. (1991). Inverse Displacement Mapping. Computer Graphics Forum, 10(2), 129–139.
帕特森,J.,霍格,S.,&洛吉,J.(1991)。逆位移映射。计算机图形学论坛, 10 (2), 129–139。
Peachey, D. R. (1985). Solid Texturing of Complex Surfaces. Proceedings of SIGGRAPH ’85, Computer Graphics, 19(3), 279–286.
Peachey, DR (1985)。复杂表面的实体纹理。SIGGRAPH '85 论文集,计算机图形学, 19 (3),279–286。
Penna, M., & Patterson, R. (1986). Projective Geometry and Its Applications to Computer Graphics. Englewood Cliffs, NJ: Prentice Hall.
Penna, M. 和 Patterson, R. (1986)。射影几何及其在计算机图形学中的应用。新泽西州恩格尔伍德克利夫斯:普伦蒂斯霍尔出版社。
Perlin, K. (1985). An Image Synthesizer. Computer Graphics, 19(3), 287–296. (SIGGRAPH ’85)
Perlin, K. (1985)。图像合成器。Computer Graphics , 19 (3),287–296。(SIGGRAPH '85)
Pharr, M., & Hanrahan, P. (1996). Geometry Caching for Ray-Tracing Displacement Maps. In Proceedings of the Eurographics Workshop on Rendering Techniques ’96 (pp. 31–40). London, UK: Springer-Verlag.
Pharr, M. 和 Hanrahan, P. (1996)。光线追踪位移图的几何缓存。载于1996 年欧洲图形协会渲染技术研讨会论文集(第 31-40 页)。英国伦敦:Springer-Verlag。
Pharr, M., Jakob, W., & Humphreys, G. (2016). Physically based rendering: From theory to implementation. Morgan Kaufmann.
Pharr, M.、Jakob, W. 和 Humphreys, G. (2016)。基于物理的渲染:从理论到实现。Morgan Kaufmann。
Pharr, M., Kolb, C., Gershbein, R., & Hanrahan, P. (1997). Rendering Complex Scenes with Memory-Coherent Ray Tracing. In SIGGRAPH ’97: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques (pp. 101–108). Reading, MA: Addison-Wesley.
Pharr, M.、Kolb, C.、Gershbein, R. 和 Hanrahan, P. (1997)。使用内存相干光线追踪渲染复杂场景。SIGGRAPH '97:第 24 届计算机图形和交互技术年会论文集(第 101-108 页)。马萨诸塞州雷丁:Addison-Wesley。
Phong, B.-T. (1975). Illumination for Computer Generated Images. Communications of the ACM, 18(6), 311–317.
Phong, B.-T. (1975)。计算机生成图像的照明。ACM通讯, 18 (6),311-317。
Pineda, J. (1988). A Parallel Algorithm for Polygon Rasterization. Proceedings of SIGGRAPH ’88, Computer Graphics, 22(4), 17–20.
Pineda, J. (1988)。多边形光栅化的并行算法。SIGGRAPH '88 论文集,计算机图形学, 22 (4),17-20。
Pitteway, M. L. V. (1967). Algorithm for Drawing Ellipses or Hyperbolae with a Digital Plotter. Computer Journal, 10(3), 282–289.
Pitteway, MLV (1967)。使用数字绘图仪绘制椭圆或双曲线的算法。计算机杂志, 10 (3),282-289。
Plauger, P. J. (1991). The Standard C Library. Englewood Cliffs, NJ: Prentice Hall.
Plauger, PJ (1991)。标准 C 库。新泽西州恩格尔伍德克利夫斯:Prentice Hall。
Porter, T., & Duff, T. (1984). Compositing Digital Images. In SIGGRAPH ’84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques (pp. 253–259). New York: ACM Press.
Porter, T. 和 Duff, T. (1984)。合成数字图像。SIGGRAPH '84:第 11 届计算机图形和交互技术年会论文集(第 253-259 页)。纽约:ACM 出版社。
Reinhard, E., Khan, E. A., Akyüz, A. O., & Johnson, G. (2008). Color Imaging: Fundamentals and Applications. Wellesley, MA: A K Peters.
Reinhard,E.、Khan,EA、Akyüz,AO 和 Johnson,G. (2008)。彩色成像:基础和应用。马萨诸塞州韦尔斯利:AK Peters。
Riesenfeld, R. F. (1981, January). Homogeneous Coordinates and Projective Planes in Computer Graphics. IEEE Computer Graphics & Applications, 1(1), 50–55.
Riesenfeld, RF (1981 年 1 月)。计算机图形学中的齐次坐标和射影平面。IEEE计算机图形学与应用, 1 (1),50–55。
Roberts, L. (1965, May). Homogenous Matrix Representation and Manipulation of N-Dimensional Constructs (Tech. Rep. No. MS-1505). Lexington, MA: MIT Lincoln Laboratory.
Roberts, L. (1965 年 5 月)。N维结构的同质矩阵表示和操作(技术报告编号 MS-1505)。马萨诸塞州列克星敦:麻省理工学院林肯实验室。
Rogers, D. F. (1985). Procedural Elements for Computer Graphics. New York: McGraw Hill.
Rogers, DF (1985)。计算机图形学的程序元素。纽约:McGraw Hill。
Rogers, D. F. (1989). Mathematical Elements for Computer Graphics. New York: McGraw Hill.
Rogers, DF (1989).计算机图形学的数学元素.纽约: McGraw Hill。
Rubin, S. M., & Whitted, T. (1980). A 3-Dimensional Representation for Fast Rendering of Complex Scenes. Proceedings of SIGGRAPH ’80, Computer Graphics, 14(3), 110–116.
Rubin, SM 和 Whitted, T. (1980)。复杂场景快速渲染的三维表示。SIGGRAPH '80 会议论文集,计算机图形学, 14 (3),110-116。
Salomon, D. (1999). Computer Graphics and Geometric Modeling. New York: Springer-Verlag.
Salomon, D. (1999)。计算机图形学和几何建模。纽约:Springer-Verlag。
Sbert, M. (1997). The Use of Global Random Directions to Compute Radiosity. Global Monte Carlo Techniques (PhD. Thesis). Universitat Politècnica de Catalunya.
斯伯特,M.(1997)。使用全局随机方向来计算光能传递。全球蒙特卡罗技术(博士论文)。加泰罗尼亚理工大学。
Schlick, C. (1994). An Inexpensive BRDF Model for Physically-Based Rendering. Computer Graphics Forum, 13(3), 233–246.
Schlick, C. (1994)。基于物理的渲染的廉价 BRDF 模型。计算机图形学论坛, 13 (3),233–246。
Segal, M., Korobkin, C., van Widenfelt, R., Foran, J., & Haeberli, P. (1992). Fast Shadows and Lighting Effects Using Texture Mapping. Proceedings of SIGGRAPH ’92, Computer Graphics, 26(2), 249–252.
Segal, M.、Korobkin, C.、van Widenfelt, R.、Foran, J. 和 Haeberli, P. (1992)。使用纹理映射实现快速阴影和照明效果。SIGGRAPH '92 论文集,计算机图形学, 26 (2),249–252。
Shannon, C. E., & Weaver, W. (1964). The Mathematical Theory of Communication. Urbana, IL: University of Illinois Press.
Shannon, CE 和 Weaver, W. (1964)。 《通信的数学理论》 。伊利诺伊州厄巴纳:伊利诺伊大学出版社。
Shene, C.-K. (2003). CS 3621 Introduction to Computing with Geometry Notes. Available from World Wide Web. (http://www.cs.mtu.edu/shene/COURSES/cs3621/NOTES/notes.html)
Shene, C.-K. (2003)。CS 3621 几何计算入门笔记。可从万维网获取。( http://www.cs.mtu.edu/shene/COURSES/cs3621/NOTES/notes.html )
Shreiner, D., Neider, J., Woo, M., & Davis, T. (2004). OpenGL Programming Guide (Fourth ed.). Reading, MA: Addison-Wesley.
Shreiner, D.、Neider, J.、Woo, M. 和 Davis, T. (2004)。 《OpenGL 编程指南》 (第四版)。马萨诸塞州雷丁:Addison-Wesley。
Smith, A. R. (1995). A Pixel is Not a Little Square! (Technical Memo No. 6). Bellingham, WA: Microsoft Research.
Smith, AR (1995)。像素不是小方块! (技术备忘录第 6 号)。华盛顿州贝灵厄姆:微软研究院。
Smits, B. E., Shirley, P., & Stark, M. M. (2000). Direct Ray Tracing of Displacement Mapped Triangles. In Proceedings of the Eurographics Workshop on Rendering Techniques 2000 (pp. 307–318). London, UK: Springer-Verlag.
Smits, BE、Shirley, P. 和 Stark, MM (2000)。位移映射三角形的直接光线追踪。2000年欧洲图形协会渲染技术研讨会论文集(第 307-318 页)。英国伦敦:Springer-Verlag。
Snyder, J. M., & Barr, A. H. (1987). Ray Tracing Complex Models Containing Surface Tessellations. Proceedings of SIGGRAPH ’87, Computer Graphics, 21(4), 119–128.
Snyder, JM 和 Barr, AH (1987)。包含表面镶嵌的光线追踪复杂模型。SIGGRAPH '87 会议论文集,计算机图形学, 21 (4),119–128。
Sobel, I., Stone, J., & Messer, R. (1975). The Monte Carlo Method. Chicago, IL: University of Chicago Press.
Sobel, I.、Stone, J. 和 Messer, R. (1975)。蒙特卡罗方法。伊利诺伊州芝加哥:芝加哥大学出版社。
Solomon, H. (1978). Geometric Probability. Philadelphia, PA: SIAM Press.
Solomon, H. (1978)。几何概率。宾夕法尼亚州费城:SIAM Press。
Stam, J. (1999). Diffraction Shaders. In SIGGRAPH ’99: Proceedings of the 26th Annual Conference On Computer Graphics And Interactive Techniques (pp. 101–110). Reading, MA: Addison-Wesley.
Stam, J. (1999)。衍射着色器。SIGGRAPH '99:第 26 届计算机图形学和交互技术年会论文集(第 101-110 页)。马萨诸塞州雷丁:Addison-Wesley。
Stark, M. M., Arvo, J., & Smits, B. (2005). Barycentric Parameterizations for Isotropic BRDFs. IEEE Transactions on Visualization and Computer Graphics, 11(2), 126–138.
Stark, MM, Arvo, J. 和 Smits, B. (2005)。各向同性 BRDF 的重心参数化。IEEE可视化与计算机图形学学报, 11 (2),126–138。
Strang, G. (1988). Linear Algebra and Its Applications (Third ed.). Florence, KY: Brooks Cole.
Strang, G. (1988)。线性代数及其应用(第三版)。Florence, KY: Brooks Cole。
Turkowski, K. (1990). Properties of Surface-Normal Transformations. In Graphics Gems (pp. 539–547). Boston, MA: Academic Press.
Turkowski,K.(1990 年)。《表面法线变换的性质》。载于《Graphics Gems》 (第 539-547 页)。马萨诸塞州波士顿:Academic Press。
van Aken, J., & Novak, M. (1985). Curve-Drawing Algorithms for Raster Displays. ACM Transactions on Graphics, 4(2), 147–169.
van Aken, J.,& Novak, M. (1985)。光栅显示的曲线绘制算法。ACM Transactions on Graphics , 4 (2),147–169。
Veach, E., & Guibas, L. J. (1997). Metropolis light transport. In Proceedings of SIGGRAPH 1997 (pp. 65–76). doi:10.1145/258734.258775.
Veach, E. 和 Guibas, LJ (1997)。大都市轻型交通。SIGGRAPH 1997 论文集(第 65-76 页)。doi: 10.1145/258734.258775 。
Wald, I., Slusallek, P., Benthin, C., & Wagner, M. (2001). Interactive Distributed Ray Tracing of Highly Complex Models. In Proceedings of the 12th Eurographics Workshop on Rendering Techniques (pp. 277–288). London, UK: Springer-Verlag.
Wald, I.、Slusallek, P.、Benthin, C. 和 Wagner, M. (2001)。高度复杂模型的交互式分布式光线追踪。第 12 届欧洲图形渲染技术研讨会论文集(第 277-288 页)。英国伦敦:Springer-Verlag。
Walter, B., Marschner, S. R., Li, H., & Torrance, K. E. (2007). Microfacet Models for Refraction through Rough Surfaces. In Proceedings of the Eurographics Symposium on Rendering.
Walter, B.、Marschner, SR、Li, H. 和 Torrance, KE (2007)。粗糙表面折射的微表面模型。渲染研讨会。
Watt, A. (1991). Advanced Animation and Rendering Techniques. Reading, MA: Addison-Wesley.
Watt, A. (1993). 3D Computer Graphics. Reading, MA: Addison-Wesley.
Whitted, T. (1980). An Improved Illumination Model for Shaded Display. Communications of the ACM, 23(6), 343–349.
Williams, A., Barrus, S., Morley, R. K., & Shirley, P. (2005). An Efficient and Robust Ray-Box Intersection Algorithm. Journal of Graphics Tools, 10(1), 49–54.
Note: Bold page numbers refer to tables and italic page numbers refer to figures.
adaptation, chromatic 520–523, 521, 522
Akenine-Möller, T. 200, 381, 643
algebra 107–125; see also linear algebra
aliasing
analog-to-digital converter (A/D converter) 207, 207
animation; see also computer animation
antialiasing 199–200, 199, 232–233, 256
arrays
axis-aligned binary space partitioning 319–320
B-splines 416–422, 418, 419, 420, 421, 422, 424–425, 425
base spectrum 247–250, 247, 250, 251
Bézier curves 409–416, 410, 412, 413
bidirectional reflectance distribution function 102, 371–374
binary space partitioning (BSP) tree 319–329, 319, 322, 323, 324; see also BSP tree algorithm
blending functions 398, 596–606, 611–612, 612
Blinn-Phong shading 488–492, 493
BlobTree structure 618–620, 619, 620, 621
Boolean operation 90, 600, 607, 612–613
box filter 199, 212–213, 212, 217–218, 217, 223–224, 252
camera transformation 158, 162–164, 162, 163
chromatic adaptation 520–523, 521, 522
chromaticity coordinates 511–514, 513, 514; see also color
CIE color matching functions 509–514, 511
color
    chromatic adaptation 520–523, 521, 522
    chromaticity coordinates 511–514, 513, 514
    color spaces 514–520, 518, 535–537, 536
    colormaps 655–657, 656, 657, 677
    cone space 517–518, 534–535, 534
    RGB color 70–71, 74–75, 74, 75, 472, 517–518, 518, 534–537
    spectrum 503–504, 504, 533–534, 533, 537
color spaces 514–520, 518, 535–537, 536
color spectrum 503–504, 504, 533–534, 533, 537
colormaps 655–657, 656, 657, 677
computer animation 429–430; see also animation
computer graphics
constructive solid geometry (CSG) 612–614, 613, 614
convolution 209–230, 211; see also convolution filters
convolution filters
    box filter 212–213, 212, 217–218, 217, 223–224
    Gaussian filter 224, 224, 226, 228–229, 228
    separable filters 227–229, 228, 229
    tent filter 217–218, 217, 224, 224, 226, 228
coordinate system 27, 47, 47, 49, 152–153, 153
coordinate transformations 151–154, 151, 153
cube-surface intersections 608–609, 609, 610
cubemaps 262–263, 263, 283, 284
curves
    B-splines 416–422, 418, 419, 420, 421, 422, 424–425, 425
    Bézier curves 409–416, 410, 412, 413
    parametric curve 44–46, 384–388, 388
    subdivision scheme for 384, 413–415, 413
de Casteljau algorithm 414–415, 415
determinants 107–109, 107; see also linear algebra
digital-to-analog converter (D/A converter) 207, 207
Dirac impulse/Dirac delta function 218, 218, 245, 245
field-of-view 172–173, 173, 538–541, 538, 540
fisheye lens 81, 81, 669, 674–675, 674
forward kinematics (FK) 444–447, 444
Fourier transform 239–245, 240, 241, 242, 243, 244, 247
fragment shading 488–492, 493; see also per-fragment shading
Gaussian filter 224, 224, 226, 228–229, 228, 252, 582–583, 583, 586–588, 587
geometric transformations 127; see also transformation matrices
geometry
graphics data structures 291
instancing 491–493, 491, 499–500, 499
OpenGL programming 464–465, 468–469, 475
programming 464–465, 468–469, 475, 499–500
shaders 465, 471–474, 478, 501, 501
shading 478, 481–492, 484, 485, 493
graphics programs
images
implicit curves 35–36, 384; see also curves
implicit functions 35–36, 35, 40, 596–604, 597
implicit modeling 595–596
    blending techniques 596–606, 611–612, 612
    BlobTree method 618–620, 619, 620, 621
    convolution surfaces 602–603, 602, 603
    cube-surface intersections 608–609, 609, 610
    precise contact modeling 616–618, 616, 617
independent identically distributed (iid) variables 344
information visualization 4; see also visualization
instancing 491–493, 491, 499–500, 499
inverse kinematics (IK) 445–446, 446
lattice 206, 232, 285–286, 442–443, 443, 453, 605–607, 607
level of detail (LOD) 5, 609–610, 610
light; see also visual perception
line segments 46, 393–395, 400, 401–402, 403
lines 37–41, 38, 44–46, 44
matrices; see also transformation matrices
meshes 491–492, 491, 615–616, 616–620
modeling
Monte Carlo integration 57–59, 335, 344–346
motion capture 205–206, 430, 448–449, 449
multidimensional arrays, tiling 329–332, 330
object recognition 557–560, 559; see also visual perception
OpenGL programming 464–465, 468–469
optic flow 550–551, 551, 563–565, 564, 565
orthographic projection transformation 160–162, 480–481
orthographic views 84–85, 85, 160–161, 161
parallelogram 107–109, 107, 108, 109, 114–115, 115
parametric curves 44–46, 384–388, 388; see also curves
per-fragment shading 197, 197, 485–492, 485, 493
per-vertex shading 196–197, 197, 481–485, 484
perspective projection 82, 82, 167–171, 169
Phong exponent 93, 102; see also Blinn-Phong shading
physics-based rendering
pictorial cues 545, 552–553, 556–557, 557
pixels
precise contact modeling (PCM) 616–618, 616, 617
probability
projective transformations 158, 164–167, 164, 166, 167
random points, choosing 347–355, 349, 353, 354
ray tracing 79–80, 307–308, 308, 309
reconstruction 219, 220, 233–238, 234, 235, 236, 238, 249–250, 251
reflection
rendering
resampling 233–238, 234, 235, 236, 238, 249–250, 251
RGB color 70–75, 74, 75, 517–518, 518, 534–537; see also color
rotation 130–132, 131, 132, 142–144
sampling
scaling transformations 128–129, 128, 129
separable filters 227–229, 228, 229
shaders 465, 471–474, 478, 501, 501
shading
    Blinn-Phong shading 488–492, 493
    per-fragment shading 197, 197, 485–492, 485, 493
    per-vertex shading 196–197, 197, 481–485, 484
    textures and 493–499, 495, 498, 554–555, 555, 556
sigmoid functions 583–588, 584, 585
signal processing
singular value decomposition (SVD) 121–123, 138–140, 141
size-distance relationships 560–563, 561, 562
software
space partitioning 309, 310, 319–320, 319, 605–611
spatial layouts 657–658; see also visualization
spatial vision 544–557; see also visual perception
spectral tristimulus values 509–510, 509
spectrum 247–250, 247, 250, 251, 503–504, 504, 533, 537
spherical coordinate system 47, 47, 261, 268
symmetric eigenvalue decomposition 136–138, 136, 137, 138
tent filter 217–218, 217, 224, 224, 226, 228
texture coordinate functions
textures
tiling multidimensional arrays 329–332, 330
tone reproduction operators 569–572
    dynamic range 570, 571, 572, 573–575, 574, 588–590, 589, 590
    filters 578, 578, 582–583, 583, 586–588, 586, 587, 588, 589
    night tonemapping 591–592, 591, 592
    sigmoid functions 583–588, 584, 585
transformations
triangle-neighbor structure 299–301, 299
video games 3; see also game development
viewing transformations 157–164, 159; see also transformations
visual acuity 528, 528, 538–541, 538
visual channel characteristics 652–655, 653, 654; see also visualization
visual encoding principles 652–660; see also visualization
visualization 645–648, 660, 666–668