How Computers Create Realistic 3D Worlds
Spring 2024
Rivers’ computer science club is a new club we (Daniel Connelly ‘25 and I) started this year. Meeting every other Tuesday, we typically discuss computer-related topics like code debugging, data mining, ChatGPT (and how it works from a technical perspective), or 3D graphics rendering. These discussion-based meetings do not require prior experience with computer programming. In the future, we hope to do more project-focused work, where members use their programming skills to explore and develop things they are passionate about. This article explains the details of computer graphics, a topic we discussed in a February meeting.
In today’s world, video games contain stunning 3D worlds with realistic grass, trees, water, and interiors with reflections and immersive post-processing effects. Many of the processes used to create these worlds have existed since 3D rendering was invented in the 1970s, yet despite how many people interact with computer graphics every day, few actually know what these processes are and how they are used.
The first step to rendering shapes in 3D is defining the geometry of those shapes. A 3D shape, often referred to as a mesh, contains vertices (points in space) and faces that connect these vertices. Faces are almost always triangles because triangles only require the connection of three vertices and are the most efficient to render. Triangles are defined by the vertices that make them up – so if a triangle connects a mesh’s first, second, and third vertex, it would be defined as 0, 1, 2 (computers start counting at 0). A cube, for example, has eight vertices and twelve triangles – two for each side. More complicated shapes have more triangles and vertices and therefore take longer to render. It is important to note that vertex positions and triangles are not manually defined unless a mesh is extremely simple. Rather, meshes are produced in 3D modeling software (e.g. Blender, Maya, 3ds Max) and exported to files (often in the .obj or .fbx format) containing long lists of vertex positions and triangles that computers are able to read and process.
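To make this concrete, here is a minimal sketch, in plain Python, of how a cube mesh could be stored: eight vertex positions and twelve triangles, where each triangle is just three indices into the vertex list. The variable names and layout are illustrative, not taken from any particular engine or file format.

```python
# Eight corners of a unit cube centered at the origin, as (x, y, z) positions.
vertices = [
    (-0.5, -0.5, -0.5), ( 0.5, -0.5, -0.5),
    ( 0.5,  0.5, -0.5), (-0.5,  0.5, -0.5),
    (-0.5, -0.5,  0.5), ( 0.5, -0.5,  0.5),
    ( 0.5,  0.5,  0.5), (-0.5,  0.5,  0.5),
]

# Twelve triangles (two per face), each defined by three vertex indices.
triangles = [
    (0, 1, 2), (0, 2, 3),  # back face
    (4, 6, 5), (4, 7, 6),  # front face
    (0, 4, 5), (0, 5, 1),  # bottom face
    (3, 2, 6), (3, 6, 7),  # top face
    (0, 3, 7), (0, 7, 4),  # left face
    (1, 5, 6), (1, 6, 2),  # right face
]
```

A file like an .obj export contains essentially this same information, just written out as text for any program to read.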
If meshes were placed into a scene (meaning 3D world) straight from the lists containing their geometry, they would all overlap right at that scene’s origin. They must be transformed – translated (moved around), rotated, or scaled – in order to be in their proper positions. Transforming a mesh is done by multiplying each of its vertices’ positions by a matrix that corresponds to a specific translation, rotation, and scale. Matrices are grids of numbers that, in the context of graphics programming, are generally four by four in size and can be multiplied with vectors (quantities with both direction and magnitude, like a 3D coordinate) to modify them. While you might encounter matrices in a linear algebra class, professional programmers typically rely on math libraries to do the math for them.
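As a rough illustration, here is a short sketch using NumPy as the math library. The matrix values are arbitrary examples chosen so the result is easy to check by hand.

```python
import numpy as np

# A 4x4 matrix that scales everything by 2 and translates by (2, 0, -5).
transform = np.array([
    [2.0, 0.0, 0.0,  2.0],
    [0.0, 2.0, 0.0,  0.0],
    [0.0, 0.0, 2.0, -5.0],
    [0.0, 0.0, 0.0,  1.0],
])

# A vertex at (1, 1, 1), written with an extra 1 (a "homogeneous" coordinate)
# so it can be multiplied by a 4x4 matrix.
vertex = np.array([1.0, 1.0, 1.0, 1.0])

transformed = transform @ vertex
print(transformed[:3])  # [ 4.  2. -3.]: scaled by 2, then translated
```

In a real renderer the same multiplication is applied to every vertex of the mesh, which is why it is done on the graphics card.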
Creating the effect of perspective, where distant objects appear smaller than nearby objects, is done with more matrix math. The analogy of a camera in a scene with a position, orientation, and field of view is commonly used when creating perspective. All vertices within the camera’s frustum – the region of space it can see – are multiplied by a matrix derived from the camera’s properties. This transformation projects these vertices onto a 2D plane, and that 2D plane is the computer screen. The camera is just an analogy because only the scene’s vertex positions are changing: the viewpoint is technically stationary, and the scene is moving around it.
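Below is a hedged sketch of what such a projection matrix might look like, following the convention used by OpenGL-style APIs (the exact formula varies between APIs). The field of view, aspect ratio, and near/far distances are made-up example values.

```python
import numpy as np

def perspective(fov_degrees, aspect, near, far):
    # Build a 4x4 perspective projection matrix from the camera's properties.
    f = 1.0 / np.tan(np.radians(fov_degrees) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ])

proj = perspective(60.0, 16 / 9, 0.1, 100.0)

# Two points at the same height but different distances from the camera
# (the camera looks down the -z axis in this convention).
near_point = proj @ np.array([0.0, 1.0, -2.0, 1.0])
far_point  = proj @ np.array([0.0, 1.0, -20.0, 1.0])

# The "perspective divide": dividing by w is what makes distant things smaller.
print(near_point[1] / near_point[3])  # roughly 0.87 - higher on screen
print(far_point[1] / far_point[3])    # roughly 0.09 - much closer to center
```

The same point ends up nearer the middle of the screen the farther away it is, which is exactly the shrinking-with-distance effect described above.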
One of the final steps in simple rendering is called rasterization. Computers need to be able to turn mathematical representations of shapes into pixels that humans can see. A pixel is a tiny element of a screen that displays just one color. Rasterization means going through each pixel, determining which triangle in the perspective projection (the 2D image of a scene) it falls into, and coloring it accordingly. Fortunately, graphics APIs like OpenGL and Vulkan already implement rasterization algorithms and can do it for us.
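For the curious, here is a toy, unoptimized sketch of the idea in plain Python: test each pixel’s center against a single screen-space triangle and draw the result as ASCII art. Real APIs do this in specialized hardware, far faster and for millions of triangles.

```python
WIDTH, HEIGHT = 20, 10

def edge(a, b, p):
    # Signed area test: which side of the edge from a to b the point p is on.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def inside(tri, p):
    # The point is inside the triangle if it is on the same side of all three edges.
    a, b, c = tri
    e0, e1, e2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)

triangle = [(2, 1), (17, 3), (8, 9)]  # already projected to screen coordinates

for y in range(HEIGHT):
    row = ""
    for x in range(WIDTH):
        # Test the center of each pixel and "color" it if it lands in the triangle.
        row += "#" if inside(triangle, (x + 0.5, y + 0.5)) else "."
    print(row)
```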
Of course, there are more steps to this process. 3D objects have to be sorted by depth in order to determine what should be visible to the camera. Each pixel of a 3D object (sometimes called a fragment) is colored in a separate program called a fragment shader to give it realistic lighting. While there is always more to learn, we hope this article gives you a good introduction to this topic and inspires you to learn more about computer science in the future.
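As a closing illustration, here is a tiny sketch (with made-up values) of two of those later steps: a depth test that keeps only the closest fragment at a pixel, and a simple Lambertian lighting calculation of the kind a fragment shader might perform.

```python
import numpy as np

# Depth test: a new fragment only replaces the stored one if it is closer.
depth_buffer = {(5, 5): 0.8}            # pixel (5, 5) currently holds depth 0.8
new_fragment_depth = 0.3
if new_fragment_depth < depth_buffer[(5, 5)]:
    depth_buffer[(5, 5)] = new_fragment_depth   # the new fragment is visible

# Lambertian ("diffuse") lighting: brightness depends on the angle between
# the surface normal and the direction toward the light.
normal = np.array([0.0, 1.0, 0.0])                    # surface faces straight up
light_dir = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)  # light above and behind
brightness = max(np.dot(normal, light_dir), 0.0)      # about 0.707
print(brightness)
```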
Sources:
https://www.techspot.com/article/1851-3d-game-rendering-explained/