94 Matching Annotations

Nov 2022
mrxiao.net mrxiao.net

Understanding Copy-On-Write for Python Built-in Objects in Shared Memory: Notes on Debugging a PyTorch DataLoader Memory Leak - MrXiao

1
1. sherlockliao 10 Nov 2022
  
  in Public
  
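  In a fork-based multiprocessing implementation, each fork gives the child process its own virtual address space, but that space initially still maps to the parent's physical memory, so the child can read the parent's in-memory data efficiently. As soon as the child performs a write, the operating system's copy-on-write fault is triggered and the system copies the affected memory into a separate region for the child to use.
  
  When is copy-on-write triggered?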
  
  dataloader copy-on-write
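
  The behaviour described above can be observed from C++ on a POSIX system. This is only a minimal sketch: `parent_view_after_child_write` is a hypothetical helper, and it shows the visible effect (the child's write never reaches the parent) rather than the page-level mechanism. The trigger is any write to a page that is still shared, which in CPython can include the reference-count updates made when an object is merely read.

```cpp
#include <cassert>
#include <sys/wait.h>   // waitpid (POSIX)
#include <unistd.h>     // fork, _exit (POSIX)

// Hypothetical helper: returns the value the parent sees after the child
// has written to its own view of `value`.
int parent_view_after_child_write() {
    int value = 1;
    pid_t pid = fork();
    assert(pid >= 0);                  // a negative pid means fork failed
    if (pid == 0) {
        value = 42;                    // child write: served from a private copy
        _exit(value == 42 ? 0 : 1);
    }
    int status = 0;
    waitpid(pid, &status, 0);          // wait until the child has finished
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    return value;                      // parent still sees its own value
}
```

  Calling `parent_view_after_child_write()` in the parent yields 1, because the child's write went to its own copy.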

Tags

dataloader

copy-on-write

Annotators

sherlockliao

URL

mrxiao.net/do-not-use-built-in-pyobjects-in-python-multiprocessing.html
Jun 2022
www.cs.sfu.ca www.cs.sfu.ca

Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson)Global Edition.pdf

4
1. sherlockliao 10 Jun 2022
  
  in Public
  
  Once a compiler must resort to register spilling, any advantage of maintaining multiple accumulators will most likely be lost.
  
  What does register spilling refer to?
  
  register register spilling unrolling
2. sherlockliao 10 Jun 2022
  
  in Public
  
  a reassociation transformation can reduce the number of operations along the critical path in a computation, resulting in better performance by better utilizing the multiple functional units and their pipelining capabilities.
  
  What exactly does a reassociation transformation do?
  
  pipeline reassociation critical path
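
  A sketch of what reassociation looks like in source code, assuming a simple product over a vector (the function names are illustrative, not from the book). The only change is where the parentheses sit: in the second version the product `v[i] * v[i + 1]` does not depend on `acc`, so it can be computed while the previous update of `acc` is still in flight.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unrolled by 2: both multiplies sit on the dependency chain through acc.
long product_unrolled(const std::vector<long>& v) {
    long acc = 1;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2)
        acc = (acc * v[i]) * v[i + 1];
    for (; i < v.size(); ++i) acc *= v[i];   // leftover element, if any
    return acc;
}

// Reassociated: only one multiply per iteration depends on acc.
long product_reassociated(const std::vector<long>& v) {
    long acc = 1;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2)
        acc = acc * (v[i] * v[i + 1]);
    for (; i < v.size(); ++i) acc *= v[i];   // leftover element, if any
    return acc;
}
```

  For integers both versions return the same value as long as nothing overflows. For floating-point data the regrouping can change rounding, so it is not always a safe rewrite.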
3. sherlockliao 09 Jun 2022
  
  in Public
  
  Loop unrolling can improve performance in two ways. First, it reduces the number of operations that do not contribute directly to the program result, such as loop indexing and conditional branching. Second, it exposes ways in which we can further transform the code to reduce the number of operations in the critical paths of the overall computation.
  
  Why can loop unrolling improve performance?
  
  unrolling
4. sherlockliao 09 Jun 2022
  
  in Public
  
  For an operation with latency L and capacity C, this requires an unrolling factor k ≥ C · L
  
  How should the unrolling factor be set to keep the pipeline full?
  
  pipeline unrolling
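
  A minimal sketch of unrolling with multiple accumulators, assuming a sum over a vector of `long` (the name `sum_2x2` is illustrative). Two accumulators are used only to keep the example short; the quoted bound is what decides how many would be needed on a given machine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unroll by 2 with 2 independent accumulators: the two additions in each
// iteration do not depend on each other, so they can overlap in the pipeline.
long sum_2x2(const std::vector<long>& v) {
    long acc0 = 0;
    long acc1 = 0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {
        acc0 += v[i];        // even-indexed elements
        acc1 += v[i + 1];    // odd-indexed elements
    }
    for (; i < v.size(); ++i) acc0 += v[i];  // leftover element, if any
    return acc0 + acc1;
}
```

  The loop overhead (index update and branch) is also paid once per two elements instead of once per element.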

Tags

pipeline

unrolling

critical path

register

reassociation

register spilling

Annotators

sherlockliao

URL

cs.sfu.ca/~ashriram/Courses/CS295/assets/books/CSAPP_2016.pdf
Apr 2022
www.cs.sfu.ca www.cs.sfu.ca

Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson)Global Edition.pdf

9
1. sherlockliao 30 Apr 2022
  
  in Public
  
  are evaluated simultaneously, a phenomenon referred to as instruction-level parallelism.
  
  What is instruction-level parallelism?
  
  instruction-level parallelism
2. sherlockliao 29 Apr 2022
  
  in Public
  
  If a compiler cannot determine whether or not two pointers may be aliased, it must assume that either case is possible, limiting the set of possible optimizations.
  
  How should pointer aliasing be understood as an optimization blocker?
  
  memory aliasing pointer optimization block
3. sherlockliao 29 Apr 2022
  
  in Public
  
  The case where two pointers may designate the same memory location is known as memory aliasing.
  
  What is memory aliasing?
  
  pointer memory location memory aliasing
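
  A small sketch of why aliasing blocks optimization. The two functions look interchangeable, and they are when `xp` and `yp` point to different objects, but they give different results when both point to the same object. The `*_aliased` and `*_distinct` wrappers are hypothetical helpers added only to make the cases easy to call.

```cpp
#include <cassert>

// Adds *yp to *xp twice, re-reading *yp each time.
void twiddle1(long* xp, long* yp) {
    *xp += *yp;
    *xp += *yp;
}

// Looks like a cheaper equivalent: one read of *yp, one update of *xp.
void twiddle2(long* xp, long* yp) {
    *xp += 2 * *yp;
}

long twiddle1_distinct(long x, long y) { twiddle1(&x, &y); return x; }
long twiddle2_distinct(long x, long y) { twiddle2(&x, &y); return x; }
long twiddle1_aliased(long x) { twiddle1(&x, &x); return x; }  // x becomes 4x
long twiddle2_aliased(long x) { twiddle2(&x, &x); return x; }  // x becomes 3x
```

  Because the compiler generally cannot rule out the aliased case, it cannot replace the body of `twiddle1` with that of `twiddle2` on its own.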
4. sherlockliao 24 Apr 2022
  
  in Public
  
  ● Focus your attention on the inner loops, where the bulk of the computations and memory accesses occur.
  ● Try to maximize the spatial locality in your programs by reading data objects sequentially, with stride 1, in the order they are stored in memory.
  ● Try to maximize the temporal locality in your programs by using a data object as often as possible once it has been read from memory.
  
  What factors should be considered in order to write efficient programs?
  
  Memory spatial locality temporal locality stride
5. sherlockliao 21 Apr 2022
  
  in Public
  
  ● Repeated references to local variables are good because the compiler can cache them in the register file (temporal locality).
  ● Stride-1 reference patterns are good because caches at all levels of the memory hierarchy store data as contiguous blocks (spatial locality).
  
  Why are repeated use of local variables and stride-1 patterns cache-friendly?
  
  cache cache-friendly cache hits cache misses
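
  A sketch contrasting the two traversal orders, assuming a small fixed-size matrix stored in row-major order (the sizes and names are illustrative). Both functions compute the same sum; they differ only in the order in which memory is touched, and both keep the running total in a local variable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 4;
constexpr std::size_t kCols = 8;
using Matrix = std::array<std::array<long, kCols>, kRows>;

// Fills the matrix with 0, 1, 2, ... in storage order.
Matrix make_matrix() {
    Matrix m{};
    for (std::size_t i = 0; i < kRows; ++i)
        for (std::size_t j = 0; j < kCols; ++j)
            m[i][j] = static_cast<long>(i * kCols + j);
    return m;
}

// Stride-1: the inner loop walks consecutive elements of one row.
long sum_stride_1(const Matrix& m) {
    long sum = 0;  // local accumulator, a good candidate for a register
    for (std::size_t i = 0; i < kRows; ++i)
        for (std::size_t j = 0; j < kCols; ++j)
            sum += m[i][j];
    return sum;
}

// Stride-kCols: the inner loop jumps a whole row between accesses.
long sum_stride_n(const Matrix& m) {
    long sum = 0;
    for (std::size_t j = 0; j < kCols; ++j)
        for (std::size_t i = 0; i < kRows; ++i)
            sum += m[i][j];
    return sum;
}
```

  On a matrix this small the difference is not measurable; the access pattern only starts to matter once the data no longer fits in cache.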
6. sherlockliao 13 Apr 2022
  
  in Public
  
  we suggest adopting a mental model that assumes write-back, write-allocate caches.
  
  Which model is recommended for reasoning about write hits and write misses?
  
  cache write write hit write miss
7. sherlockliao 13 Apr 2022
  
  in Public
  
  fully associative caches are only appropriate for small caches
  
  What scenarios are fully associative caches suited to?
  
  cache fully associative cache
8. sherlockliao 10 Apr 2022
  
  in Public
  
  A copy of w is contained in the line if and only if the valid bit is set and the tag in the cache line matches the tag in the address of w.
  
  How do we determine whether the requested word w is in a cache line?
  
  cache hits cache misses
9. sherlockliao 10 Apr 2022
  
  in Public
  
  The process that a cache goes through of determining whether a request is a hit or a miss and then extracting the requested word consists of three steps: (1) set selection, (2) line matching, and (3) word extraction.
  
  What three steps does a cache go through to serve a memory request?
  
  cache cache hits cache misses Memory
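
  The three steps can be sketched as bit manipulation on the address. This assumes the usual layout of tag, set index, and block offset from high bits to low bits, with `s` set-index bits and `b` block-offset bits; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct CacheAddress {
    std::uint64_t tag;
    std::uint64_t set_index;
    std::uint64_t block_offset;
};

// Splits an address into the fields used for set selection (set_index),
// line matching (tag), and word extraction (block_offset).
CacheAddress split_address(std::uint64_t addr, unsigned s, unsigned b) {
    assert(s + b < 64);
    const std::uint64_t offset = addr & ((1ULL << b) - 1);
    const std::uint64_t set = (addr >> b) & ((1ULL << s) - 1);
    const std::uint64_t tag = addr >> (s + b);
    return CacheAddress{tag, set, offset};
}

struct CacheLine {
    bool valid;
    std::uint64_t tag;
};

// Line matching: a hit needs the valid bit set and matching tags.
bool is_hit(const CacheLine& line, std::uint64_t tag) {
    return line.valid && line.tag == tag;
}
```

  For example, with `s = 4` and `b = 4`, the address `0xABC` splits into tag `0xA`, set index `0xB`, and block offset `0xC`.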

Tags

write

write miss

instruction-level parallelism

optimization block

Memory

write hit

memory location

spatial locality

cache

fully associative cache

memory aliasing

stride

cache hits

pointer

temporal locality

cache-friendly

cache misses

Annotators

sherlockliao

URL

cs.sfu.ca/~ashriram/Courses/CS295/assets/books/CSAPP_2016.pdf
www.learncpp.com www.learncpp.com

M.8 — Circular dependency issues with std::shared_ptr, and std::weak_ptr – Learn C++

1
1. sherlockliao 18 Apr 2022
  
  in Public
  
  std::shared_ptr can be used when you need multiple smart pointers that can co-own a resource. The resource will be deallocated when the last std::shared_ptr goes out of scope. std::weak_ptr can be used when you want a smart pointer that can see and use a shared resource, but does not participate in the ownership of that resource.
  
  When to use weak_ptr
  
  smart pointer weak pointer shared pointer ownership
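
  A minimal sketch of the ownership difference, using only standard `<memory>` facilities (the function names are illustrative). A copied `std::shared_ptr` counts as an owner; a `std::weak_ptr` can reach the resource through `lock()` but does not keep it alive.

```cpp
#include <cassert>
#include <memory>

// Copying a shared_ptr adds a co-owner; a weak_ptr observes without owning.
long owners_after_copy_and_observe() {
    auto first = std::make_shared<int>(7);
    std::shared_ptr<int> second = first;   // copy: both own the int
    std::weak_ptr<int> observer = first;   // does not change the owner count
    return first.use_count();              // counts first and second only
}

// Once the last owner goes out of scope, the observer reports expiry.
bool observer_expires_with_last_owner() {
    std::weak_ptr<int> observer;
    {
        auto owner = std::make_shared<int>(7);
        observer = owner;
        auto locked = observer.lock();     // temporary owner while in use
        assert(locked && *locked == 7);
    }                                      // owner and locked are destroyed here
    return observer.expired();
}
```

  This is why a back-reference held as `std::weak_ptr` does not keep two objects alive in a cycle.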

Tags

shared pointer

ownership

smart pointer

weak pointer

Annotators

sherlockliao

URL

learncpp.com/cpp-tutorial/circular-dependency-issues-with-stdshared_ptr-and-stdweak_ptr/
www.learncpp.com www.learncpp.com

15.6 — std::shared_ptr | Learn C++

1
1. sherlockliao 18 Apr 2022
  
  in Public
  
  Always make a copy of an existing std::shared_ptr if you need more than one std::shared_ptr pointing to the same resource.
  
  What is the recommended way to create multiple shared_ptrs to the same resource?
  
  smart pointer shared pointer

Tags

shared pointer

smart pointer

Annotators

sherlockliao

URL

learncpp.com/cpp-tutorial/stdshared_ptr/
Local file Local file

Memory-efficient array redistribution through portable collective communication

2
1. sherlockliao 05 Apr 2022
  
  in Public
  
  Redistribution can easily become a bottleneck due to the bandwidth of cross-device links usually being magnitudes smaller than that of the on-device memory bus.
  
  What problem can array redistribution run into?
  
  redistribution communication bandwidth Memory
2. sherlockliao 05 Apr 2022
  
  in Public
  
  Modern large-scale deep learning workloads highlight the need for parallel execution across many devices in order to fit model data into hardware accelerator memories. In these settings, array redistribution may be required during a computation, but can also become a bottleneck if not done efficiently
  
  Why is array redistribution needed?
  
  autoparallel redistribution communication
Tags

Memory

communication

redistribution

bandwidth

autoparallel

Annotators

sherlockliao
www.learncpp.com www.learncpp.com

15.5 — std::unique_ptr | Learn C++

4
1. sherlockliao 03 Apr 2022
  
  in Public
  
  Second, don’t manually delete the resource out from underneath the std::unique_ptr.
  
  What are some ways std::unique_ptr can be misused?
  
  unique_ptr smart pointer deallocate memory
2. sherlockliao 03 Apr 2022
  
  in Public
  
  Use std::make_unique() instead of creating std::unique_ptr and using new yourself.
  
  What is the recommended way to create a std::unique_ptr, and what are its benefits?
  
  unique_ptr smart pointer
3. sherlockliao 03 Apr 2022
  
  in Public
  
  Favor std::array, std::vector, or std::string over a smart pointer managing a fixed array, dynamic array, or C-style string.
  
  Which types are preferred for fixed arrays, dynamic arrays, and strings?
  
  smart pointer unique_ptr array string
4. sherlockliao 03 Apr 2022
  
  in Public
  
  Because std::unique_ptr is designed with move semantics in mind, copy initialization and copy assignment are disabled. If you want to transfer the contents managed by std::unique_ptr, you must use move semantics.
  
  Can a std::unique_ptr be copy-initialized?
  
  smart pointer unique_ptr copy constructor move semantics move constructor
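
  A short sketch of the two points above: creating the pointer with `std::make_unique` (C++14 or later) and transferring ownership with `std::move`. `Resource` and the function name are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Resource {
    explicit Resource(int id) : id(id) {}
    int id;
};

bool ownership_moves_to_destination() {
    auto a = std::make_unique<Resource>(1);       // no explicit new or delete
    // std::unique_ptr<Resource> b = a;           // would not compile: copying is disabled
    std::unique_ptr<Resource> b = std::move(a);   // ownership is transferred to b
    // After the move, a holds nullptr and b owns the Resource.
    return a == nullptr && b != nullptr && b->id == 1;
}
```

  The resource is deleted exactly once, when `b` goes out of scope, which is why deleting it manually out from under the pointer is a misuse.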

Tags

string

smart pointer

move semantics

move constructor

deallocate memory

copy constructor

array

unique_ptr

Annotators

sherlockliao

URL

learncpp.com/cpp-tutorial/stdunique_ptr/
www.learncpp.com www.learncpp.com

M.5 — std::move_if_noexcept – Learn C++

1
1. sherlockliao 03 Apr 2022
  
  in Public
  
  std::move_if_noexcept will return a movable r-value if the object has a noexcept move constructor, otherwise it will return a copyable l-value. We can use the noexcept specifier in conjunction with std::move_if_noexcept to use move semantics only when a strong exception guarantee exists (and use copy semantics otherwise).
  
  How can an exception thrown during a move be handled?
  
  move semantics exception r-value l-value
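
  A sketch that makes the choice visible by counting which constructor runs. `Tracked` is a hypothetical type whose move constructor is `noexcept` or not depending on a template parameter; `transfer_used_move` is likewise illustrative.

```cpp
#include <cassert>
#include <utility>

// Counts how it was constructed; the move constructor is noexcept only
// when NoexceptMove is true.
template <bool NoexceptMove>
struct Tracked {
    int copies = 0;
    int moves = 0;
    Tracked() = default;
    Tracked(const Tracked& o) : copies(o.copies + 1), moves(o.moves) {}
    Tracked(Tracked&& o) noexcept(NoexceptMove)
        : copies(o.copies), moves(o.moves + 1) {}
};

// Returns true if std::move_if_noexcept selected the move constructor.
template <typename T>
bool transfer_used_move() {
    T src;
    T dst(std::move_if_noexcept(src));  // T&& if the move cannot throw, else const T&
    return dst.moves == 1;
}
```

  With `Tracked<true>` the move constructor is chosen; with `Tracked<false>` the copy constructor is chosen, so the source is left intact if construction throws.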

Tags

move semantics

r-value

exception

l-value

Annotators

sherlockliao

URL

learncpp.com/cpp-tutorial/stdmove_if_noexcept/
www.learncpp.com www.learncpp.com

M.4 — std::move – Learn C++

1
1. sherlockliao 03 Apr 2022
  
  in Public
  
  std::move can be used whenever we want to treat an l-value like an r-value for the purpose of invoking move semantics instead of copy semantics.
  
  When can std::move be used?
  
  move semantics l-value r-value copy semantics

Tags

move semantics

r-value

l-value

copy semantics

Annotators

sherlockliao

URL

learncpp.com/cpp-tutorial/stdmove/
www.learncpp.com www.learncpp.com

15.3 — Move constructors and move assignment | Learn C++

2
1. sherlockliao 03 Apr 2022
  
  in Public
  
  the goal of the move constructor and move assignment is to move ownership of the resources from one object to another (which is typically much less expensive than making a copy).
  
  What is the purpose of the move constructor and move assignment?
  
  constructor move constructor move assignment move semantics
2. sherlockliao 03 Apr 2022
  
  in Public
  
  By default, C++ will provide a copy constructor and copy assignment operator if one is not explicitly provided. These compiler-provided functions do shallow copies, which may cause problems for classes that allocate dynamic memory. So classes that deal with dynamic memory should override these functions to do deep copies.
  
  What kind of copy constructor does C++ provide by default, and what problems can it cause?
  
  cpp copy constructor constructor shallow copy dynamic memory
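
  A sketch of a class that owns dynamic memory and therefore defines its own deep copy and its own move. `Buffer` and the two check functions are hypothetical; the assignment operator uses copy-and-swap, which is one common way to cover both copy and move assignment, not the only one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}
    ~Buffer() { delete[] data_; }

    // Deep copy: allocate fresh storage and copy the elements.
    Buffer(const Buffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }
    // Move: take over the source's storage and leave the source empty.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }
    // Copy-and-swap: the by-value parameter is copy- or move-constructed.
    Buffer& operator=(Buffer other) noexcept {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
        return *this;
    }

    int& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
    std::size_t size() const { return size_; }
    const int* data() const { return data_; }

private:
    std::size_t size_;
    int* data_;
};

bool copy_is_deep() {
    Buffer a(3);
    Buffer b = a;          // deep copy
    b[0] = 9;              // must not affect a
    return a[0] == 0 && a.data() != b.data();
}

bool move_steals_storage() {
    Buffer a(3);
    const int* original = a.data();
    Buffer b = std::move(a);   // no allocation, pointer is handed over
    return b.data() == original && a.data() == nullptr && a.size() == 0;
}
```

  With the compiler-generated shallow copy, `a` and `b` would share one array and both destructors would try to delete it.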

Tags

move constructor

move semantics

shallow copy

dynamic memory

constructor

copy constructor

cpp

move assignment

Annotators

sherlockliao

URL

learncpp.com/cpp-tutorial/move-constructors-and-move-assignment/
www.learncpp.com www.learncpp.com

15.2 — R-value references | Learn C++

1
1. sherlockliao 03 Apr 2022
  
  in Public
  
  First, r-value references extend the lifespan of the object they are initialized with to the lifespan of the r-value reference (l-value references to const objects can do this too). Second, non-const r-value references allow you to modify the r-value!
  
  Which properties of r-value references are especially useful?
  
  r-value reference reference r-value

Tags

r-value

reference

r-value reference

Annotators

sherlockliao

URL

learncpp.com/cpp-tutorial/rvalue-references/
Mar 2022
www.learncpp.com www.learncpp.com

M.1 — Intro to smart pointers and move semantics – Learn C++

2
1. sherlockliao 22 Mar 2022
  
  in Public
  
  Move semantics means the class will transfer ownership of the object rather than making a copy.
  
  What does move semantics mean?
  
  cpp move semantics ownership copy
2. sherlockliao 22 Mar 2022
  
  in Public
  
  A Smart pointer is a composition class that is designed to manage dynamically allocated memory and ensure that memory gets deleted when the smart pointer object goes out of scope.
  
  What is a smart pointer, and what are its benefits?
  
  smart pointer cpp pointer memory

Tags

smart pointer

move semantics

copy

cpp

memory

pointer

ownership

Annotators

sherlockliao

URL

learncpp.com/cpp-tutorial/introduction-to-smart-pointers-move-semantics/
www.cs.cmu.edu www.cs.cmu.edu

ppt session 2

10
1. sherlockliao 20 Mar 2022
  
  in Public
  
  1. Multiple strong symbols are not allowed
     ○ Each item can be defined only once
  2. Given a strong symbol and multiple weak symbols, choose the strong symbol
     ○ References to the weak symbol resolve to the strong symbol
  3. If there are multiple weak symbols, pick an arbitrary one
  
  How does the linker resolve duplicate symbol definitions?
  
  strong symbols week symbols linking linker symbol definitions
2. sherlockliao 20 Mar 2022
  
  in Public
  
  ● Relocatable object file (.o file)
    ○ Code and data that can be combined with other relocatable object files to form executable object file
      ■ Each .o file is produced from exactly one source (.c) file
  ● Executable object file (a.out file)
    ○ Code and data that can be copied directly into memory and then executed
  ● Shared object file (.so file)
    ○ Special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run-time
  
  What types of object files are there after compilation?
  
  compile object file relocation shared object file
3. sherlockliao 20 Mar 2022
  
  in Public
  
  ● Static Linking
    ○ Executable files and running memory images contain only the library code they actually use
  ● Dynamic linking
    ○ Executable files contain no library code
    ○ During execution, single copy of library code can be shared across all executing processes
  
  What are static linking and dynamic linking?
  
  linking static linking dynamic linking
4. sherlockliao 20 Mar 2022
  
  in Public
  
  ● Modularity
    ○ Program can be written as a collection of smaller source files, rather than one monolithic mass.
  ● Efficiency
    ○ Time: Separate compilation
      ■ Change one source file, compile, and then relink. No need to recompile other source files.
    ○ Space: Libraries
      ■ Common functions can be aggregated into a single file...
  
  What are the benefits of a linker?
  
  linker compile
5. sherlockliao 20 Mar 2022
  
  in Public
  
  ● Global symbols
    ○ Symbols defined by module m that can be referenced by other modules.
      ■ e.g., non-static C functions and non-static global variables.
  ● External symbols
    ○ Global symbols that are referenced by module m but defined by some other module.
  ● Local symbols
    ○ Symbols that are defined and referenced exclusively by module m.
      ■ e.g., C functions and global variables defined with the static attribute.
    ○ Local linker symbols are not local program variables
  
  What kinds of linker symbols are there?
  
  linking linker symbol global symbols external symbols local symbols cpp
6. sherlockliao 20 Mar 2022
  
  in Public
  
  ● Symbol resolution
    ○ Programs define and reference symbols (global variables and functions)
    ○ Linker associates each symbol reference with exactly 1 symbol definition
  ● Relocation
    ○ Merges separate code and data sections into single sections
    ○ Relocates symbols from relative locations in .o files to final memory locations
    ○ Updates all references to symbols to reflect new positions
  
  What exactly does the linker do?
  
  linking cpp symbol resolution relocation
7. sherlockliao 20 Mar 2022
  
  in Public
  
  ● Aggregates multiple independently compiled files containing machine code
  ● Fills in those unknown addresses
  ● The goal is to create 1 file with all of the needed code to run the program
  
  What does the linking stage do?
  
  compile linking cpp
8. sherlockliao 20 Mar 2022
  
  in Public
  
  ○ This changes the format and structure of the code but preserves the semantics (what it does)
  ○ Can change lots of details for optimization, as long as the overall effect is the same
  
  What does the compiler stage do?
  
  compile cpp assemble-code
9. sherlockliao 20 Mar 2022
  
  in Public
  
  ● Processes #include, #define, #if, macros
    ○ Combines main source file with headers (textually)
    ○ Defines and expands macros (token-based shorthand)
    ○ Conditionally removes parts of the code (e.g. specialize for Linux, Mac, ...)
  ● Removes all comments
  
  What does the preprocessor stage do?
  
  compile cpp preprocessor
10. sherlockliao 20 Mar 2022
  
  in Public
  
  Four steps for C: preprocessing, compiling, assembling, linking
  
  What are the four steps of compiling code?
  
  cpp compile linking

Tags

symbol resolution

symbol definitions

linking

assemble-code

preprocessor

cpp

strong symbols

relocation

shared object file

dynamic linking

external symbols

linker

week symbols

compile

linker symbol

local symbols

global symbols

static linking

object file

Annotators

sherlockliao

URL

cs.cmu.edu/afs/cs/academic/class/15213-f21/www/bootcamps/lab3_slides.pdf
www.cs.sfu.ca www.cs.sfu.ca

Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson)Global Edition.pdf

13
1. sherlockliao 20 Mar 2022
  
  in Public
  
  Restrictive placement policies of this kind lead to a type of miss known as a conflict miss, in which the cache is large enough to hold the referenced data objects, but because they map to the same cache block, the cache keeps missing.
  
  How should a conflict miss be understood?
  
  cache misses cache conflict miss memory
2. sherlockliao 20 Mar 2022
  
  in Public
  
  When the size of the working set exceeds the size of the cache, the cache will experience what are known as capacity misses.
  
  What is a capacity miss?
  
  cache misses cache capacity miss memory
3. sherlockliao 20 Mar 2022
  
  in Public
  
  For caches high in the memory hierarchy (close to the CPU) that are implemented in hardware and where speed is at a premium, this policy is usually too expensive to implement because randomly placed blocks are expensive to locate.
  
  Why don't caches high in the memory hierarchy implement the most flexible placement policy?
  
  cache cache misses cache replacement cache placement policy
4. sherlockliao 20 Mar 2022
  
  in Public
  
  The decision about which block to replace is governed by the cache’s replacement policy.
  
  What has to be done when a cache miss occurs, and what approaches are there?
  
  cache misses cache replacement memory
5. sherlockliao 20 Mar 2022
  
  in Public
  
  a program needs a particular data object d from level k + 1, it first looks for d in one of the blocks currently stored at level k. If d happens to be cached at level k, then we have what is called a cache hit.
  
  What is a cache hit? What is a cache miss?
  
  cache Memory memory hierarchy cache hits cache misses
6. sherlockliao 20 Mar 2022
  
  in Public
  
  It is important to realize that while the block size is fixed between any particular pair of adjacent levels in the hierarchy, other pairs of levels can have different block sizes.
  
  What is notable about block sizes across the memory hierarchy?
  
  memory hierarchy cache block size
7. sherlockliao 20 Mar 2022
  
  in Public
  
  The central idea of a memory hierarchy is that for each k, the faster and smaller storage device at level k serves as a cache for the larger and slower storage device
  
  What is the central idea of a memory hierarchy, and how should it be understood?
  
  Memory memory hierarchy cache
8. sherlockliao 20 Mar 2022
  
  in Public
  
  ● Programs that repeatedly reference the same variables enjoy good temporal locality.
  ● For programs with stride-k reference patterns, the smaller the stride, the better the spatial locality. Programs with stride-1 reference patterns have good spatial locality. Programs that hop around memory with large strides have poor spatial locality.
  ● Loops have good temporal and spatial locality with respect to instruction fetches. The smaller the loop body and the greater the number of loop iterations, the better the locality.
  
  What are the key points about locality, in summary?
  
  locality spatial locality temporal locality instruction fetch
9. sherlockliao 20 Mar 2022
  
  in Public
  
  Visiting every kth element of a contiguous vector is called a stride-k reference pattern. Stride-1 reference patterns are a common and important source of spatial locality in programs. In general, as the stride increases, the spatial locality decreases.
  
  What is a stride-k reference pattern?
  
  stride locality spatial locality
10. sherlockliao 17 Mar 2022
  
  in Public
  
  Their alignment rule is based on the principle that any primitive object of K bytes must have an address that is a multiple of K.
  
  What is the principle behind data alignment?
  
  data alignment Memory
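
  The rule can be checked from C++ with `alignof` and `offsetof`. This is a sketch under the assumption of a typical ABI; `Mixed` and `fields_are_aligned` are hypothetical names. The check uses `alignof` rather than `sizeof`, since the two need not be equal on every platform.

```cpp
#include <cassert>
#include <cstddef>

// The compiler inserts padding so that each field starts at a multiple
// of its own alignment requirement.
struct Mixed {
    char c;
    int i;
    double d;
};

bool fields_are_aligned() {
    return offsetof(Mixed, c) % alignof(char) == 0 &&
           offsetof(Mixed, i) % alignof(int) == 0 &&
           offsetof(Mixed, d) % alignof(double) == 0 &&
           sizeof(Mixed) % alignof(Mixed) == 0;   // keeps array elements aligned
}
```

  On x86-64 Linux this layout typically places `i` at offset 4 and `d` at offset 8, but the exact offsets are ABI-specific.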
11. sherlockliao 02 Mar 2022
  
  in Public
  
  The disadvantage of the two-dimensional array organization is that addresses must be sent in two distinct steps, which increases the access time.
  
  What is the disadvantage of the two-dimensional array organization?
  
  RAM DRAM
12. sherlockliao 02 Mar 2022
  
  in Public
  
  One reason circuit designers organize DRAMs as two-dimensional arrays instead of linear arrays is to reduce the number of address pins on the chip.
  
  Why are DRAMs organized as two-dimensional arrays?
  
  RAM DRAM
13. sherlockliao 02 Mar 2022
  
  in Public
  
  The memory system must periodically refresh every bit of memory by reading it out and then rewriting it.
  
  DRAM cells are unstable; how does the computer keep their contents from changing?
  
  RAM Cache Main Memory Memory

Tags

RAM

DRAM

instruction fetch

cache replacement

Memory

memory

cache

spatial locality

cache capacity miss

Cache

memory hierarchy

cache conflict miss

data alignment

cache hits

stride

locality

block size

cache placement policy

temporal locality

Main Memory

cache misses

Annotators

sherlockliao

URL

cs.sfu.ca/~ashriram/Courses/CS295/assets/books/CSAPP_2016.pdf
github.com github.com

json-tutorial/tutorial03 at master · miloyip/json-tutorial

1
1. sherlockliao 17 Mar 2022
  
  in Public
  
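  The initial size is defined as the macro LEPT_PARSE_STACK_INIT_SIZE. The benefit of the #ifndef X #define X ... #endif pattern is that users can set the macro themselves in the compiler options; if they do not, the default value is used.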
  
  letjson c++ macro
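
  A minimal sketch of the pattern. The default value 256 and the accessor `parse_stack_init_size` are assumptions made for illustration; only the macro name comes from the quoted text.

```cpp
#include <cassert>
#include <cstddef>

// Overridable default: building with -DLEPT_PARSE_STACK_INIT_SIZE=1024
// skips this definition and uses the user's value instead.
#ifndef LEPT_PARSE_STACK_INIT_SIZE
#define LEPT_PARSE_STACK_INIT_SIZE 256   // assumed default for this sketch
#endif

// Hypothetical accessor that exposes the value chosen at compile time.
constexpr std::size_t parse_stack_init_size() {
    return LEPT_PARSE_STACK_INIT_SIZE;
}
```

  Without a `-D` option on the command line, `parse_stack_init_size()` returns the default.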

Tags

macro

letjson

c++

Annotators

sherlockliao

URL

github.com/miloyip/json-tutorial/blob/master/tutorial03/tutorial03.md
Feb 2022
cs.brown.edu cs.brown.edu

CS 131/CSCI 1310: Fundamentals of Computer Systems

1
1. sherlockliao 23 Feb 2022
  
  in Public
  
  The %rip register on x86-64 is a special-purpose register that always holds the memory address of the next instruction to execute in the program's code segment.
  
  What is the %rip register for?
  
  register assembly code

Tags

register

assembly code

Annotators

sherlockliao

URL

cs.brown.edu/courses/csci1310/
www.cs.cmu.edu www.cs.cmu.edu

recitation04-bomblab.pptx

1
1. sherlockliao 23 Feb 2022
  
  in Public
  
  • %rax: return value
  • %rsp: stack pointer
  • %rdi: 1st argument
  • %rsi: 2nd argument
  • %rdx: 3rd argument
  • %rcx: 4th argument
  • %r8: 5th argument
  • %r9: 6th argument
  
  Which registers are commonly used and important?
  
  register assembly code

Tags

register

assembly code

Annotators

sherlockliao

URL

cs.cmu.edu/afs/cs/academic/class/15213-f21/www/recitations/rec04_slides.pdf
www.cs.sfu.ca www.cs.sfu.ca

Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson)Global Edition.pdf

7
1. sherlockliao 20 Feb 2022
  
  in Public
  
  To manage a variable-size stack frame, x86-64 code uses register %rbp to serve as a frame pointer
  
  When is a frame pointer used?
  
  array variable-size register stack
2. sherlockliao 20 Feb 2022
  
  in Public
  
  The techniques we have outlined (randomization, stack protection, and limiting which portions of memory can hold executable code) are three of the most common mechanisms used to minimize the vulnerability of programs to buffer overflow attacks
  
  What techniques can protect programs from attack?
  
  stack memory attack
3. sherlockliao 05 Feb 2022
  
  in Public
  
  The array elements are ordered in memory in row-major order, meaning all elements of row 0, which can be written A[0], followed by all elements of row 1 (A[1]), and so on.
  
  In what order are array elements laid out in memory?
  
  array memory memory location
4. sherlockliao 05 Feb 2022
  
  in Public
  
  The final example shows that one can compute the difference of two pointers within the same data structure, with the result being data having type long and value equal to the difference of the two addresses divided by the size of the data type.
  
  How is the difference between two pointers computed?
  
  pointer memory location
5. sherlockliao 03 Feb 2022
  
  in Public
  
  if p is a pointer to data of type T, and the value of p is x_p, then the expression p+i has value x_p + L · i, where L is the size of data type T
  
  How is pointer arithmetic performed?
  
  pointer
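
  The three array and pointer rules quoted above can be checked with a few lines of C++. This is a sketch with hypothetical names; byte distances are measured through `char` pointers, and the array sizes are arbitrary.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 3;
constexpr std::size_t kCols = 4;

// Distance in bytes between two addresses inside the same object.
std::ptrdiff_t byte_distance(const void* from, const void* to) {
    return static_cast<const char*>(to) - static_cast<const char*>(from);
}

// Row-major layout: &a[i][j] sits (kCols * i + j) elements past &a[0][0].
bool row_major_formula_holds(std::size_t i, std::size_t j) {
    static int a[kRows][kCols];
    assert(i < kRows && j < kCols);
    const auto expected =
        static_cast<std::ptrdiff_t>((kCols * i + j) * sizeof(int));
    return byte_distance(&a[0][0], &a[i][j]) == expected;
}

// p + i advances the address by i * sizeof(T) bytes.
bool pointer_add_scales(std::size_t i) {
    long v[8] = {};
    assert(i < 8);
    return byte_distance(v, v + i) ==
           static_cast<std::ptrdiff_t>(i * sizeof(long));
}

// Pointer difference is measured in elements, not bytes.
std::ptrdiff_t element_distance() {
    int v[8] = {};
    int* p = &v[1];
    int* q = &v[6];
    return q - p;
}
```

  In C++ the result type of a pointer difference is `std::ptrdiff_t`, which corresponds to the `long` mentioned in the quote on x86-64 Linux.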
6. sherlockliao 02 Feb 2022
  
  in Public
  
  convention, registers %rbx, %rbp, and %r12–%r15 are classified as callee-saved registers. When procedure P calls procedure Q, Q must preserve the values of these registers, ensuring that they have the same values when Q returns to P as they did when Q was called
  
  What are callee-saved registers for, and how should they be understood?
  
  callee register
7. sherlockliao 01 Feb 2022
  
  in Public
  
  At times, however, local data must be stored in memory. Common cases of this include these:
  ● There are not enough registers to hold all of the local data.
  ● The address operator ‘&’ is applied to a local variable, and hence we must be able to generate an address for it.
  ● Some of the local variables are arrays or structures and hence must be accessed by array or structure references.
  
  When must local data be stored in memory?
  
  memory local variable

Tags

attack

variable-size

register

callee

memory

memory location

pointer

array

local variable

stack

Annotators

sherlockliao

URL

cs.sfu.ca/~ashriram/Courses/CS295/assets/books/CSAPP_2016.pdf
Jan 2022
www.cs.sfu.ca www.cs.sfu.ca

Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson)Global Edition.pdf

7
1. sherlockliao 26 Jan 2022
  
  in Public
  
  When an x86-64 procedure requires storage beyond what it can hold in registers, it allocates space on the stack. This region is referred to as the procedure’s
  
  What is a stack frame?
  
  stack
2. sherlockliao 17 Jan 2022
  
  in Public
  
  The advantage of using a jump table over a long sequence of if-else statements is that the time taken to perform the switch is independent of the number of switch cases.
  
  What advantage does a jump table have over if-else?
  
  jump table switch statement
3. sherlockliao 13 Jan 2022
  
  in Public
  
  If one of those two expressions could possibly generate an error condition or a side effect, this could lead to invalid behavior. Such is the case for our earlier example
  
  In which cases must branching be used rather than a conditional move?
  
  assembly-move conditional move assembly code conditional branch
4. sherlockliao 11 Jan 2022
  
  in Public
  
  The test instructions behave in the same manner as the and instructions, except that they set the condition codes without altering their destinations.
  
  What do the test instructions do?
  
  instruction set
5. sherlockliao 11 Jan 2022
  
  in Public
  
  The cmp instructions set the condition codes according to the differences of their two operands. They behave in the same way as the sub instructions, except that they set the condition codes without updating their destinations.
  
  What do the cmp instructions do?
  
  instruction set
6. sherlockliao 07 Jan 2022
  
  in Public
  
  By using a PC-relative encoding of the jump targets, the instructions can be compactly encoded (requiring just 2 bytes), and the object code can be shifted to different positions in memory without alteration.
  
  How is a PC-relative encoding computed, and what advantages does it have?
  
  jump instruction instruction set
7. sherlockliao 06 Jan 2022
  
  in Public
  
  It is important to recognize that the suffixes for these instructions denote different conditions and not different operand sizes. For example, instructions setl and setb denote “set less” and “set below,” not “set long word” or “set byte.”
  
  What do the suffixes of the set instructions mean?
  
  instruction set condition code set instruction

Tags

jump instruction

assembly-move

conditional branch

jump table

instruction set

set instruction

conditional move

condition code

switch statement

stack

assembly code

Annotators

sherlockliao

URL

cs.sfu.ca/~ashriram/Courses/CS295/assets/books/CSAPP_2016.pdf
Dec 2021
www.cs.sfu.ca www.cs.sfu.ca

Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson)Global Edition.pdf

2
1. sherlockliao 09 Dec 2021
  
  in Public
  
  one for unsigned (mulq) and one for two’s-complement (imulq) multiplication. For both of these instructions, one argument must be in register %rax, and the other is given as the instruction source operand.
  
  What do the mulq and imulq instructions do, and what are the requirements on their operands?
  
  instruction set arithmetic operations multiply operand
2. sherlockliao 08 Dec 2021
  
  in Public
  
  The different shift instructions can specify the shift amount either as an immediate value or with the single-byte register %cl.
  
  What operands can the shift instructions take?
  
  register instruction set shift operand

Tags

shift

instruction set

register

arithmetic operations

operand

multiply

Annotators

sherlockliao

URL

cs.sfu.ca/~ashriram/Courses/CS295/assets/books/CSAPP_2016.pdf
Nov 2021
www.cs.sfu.ca www.cs.sfu.ca

Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson)Global Edition.pdf

14
1. sherlockliao 30 Nov 2021
  
  in Public
  
  As with the mov instructions, the two operands cannot both be memory locations.
  
  Can both operands of a binary operation be memory locations?
  
  memory location binary operation instruction set register
2. sherlockliao 30 Nov 2021
  
  in Public
  
  This operand can be either a register or a memory location.
  
  What can the operand of a unary operation be?
  
  register memory unary operation
3. sherlockliao 24 Nov 2021
  
  in Public
  
  The destination operand must be a register.
  
  What must the destination of load effective address be?
  
  register instruction set assembly code load effective address
4. sherlockliao 24 Nov 2021
  
  in Public
  
  The ability of the leaq instruction to perform addition and limited forms of multiplication proves useful when compiling simple arithmetic expressions such as this example.
  
  When is leaq useful?
  
  register instruction set assembly code load effective address
5. sherlockliao 22 Nov 2021
  
  in Public
  
  local variables such as x are often kept in registers rather than stored in memory locations. Register access is much faster than memory access.
  
  Where are local variables usually kept, and why?
  
  assembly code register memory local variable
6. sherlockliao 22 Nov 2021
  
  in Public
  
  we see that what we call “pointers” in C are simply addresses. Dereferencing a pointer involves copying that pointer into a register, and then using this register in a memory reference.
  
  How is dereferencing a pointer implemented in assembly code?
  
  #assembly-code pointer deference pointer c++
7. sherlockliao 10 Nov 2021
  
  in Public
  
  One important feature is that memory references in x86-64 are always given with quad word registers, such as %rax, even if the operand is a byte, single word, or double word.
  
  What kind of register do memory references use?
  
  memory reference register assembly-move
8. sherlockliao 10 Nov 2021
  
  in Public
  
  logically be named movzlq, but this instruction does not exist. Instead, this type of data movement can be implemented using a movl instruction having a register as the destination. This technique takes advantage of the property that an instruction generating a 4-byte value with a register as the destination will fill the upper 4 bytes with zeros.
  
  Why is there no movzlq among the movz instructions?
  
  register assembly-move instruction set
9. sherlockliao 10 Nov 2021
  
  in Public
  
  in memory, to a register destination. Instructions in the movz class fill out the remaining bytes of the destination with zeros, while those in the movs class fill them out by sign extension, replicating copies of the most significant bit of the source operand.
  
  Which two classes of move instructions copy a smaller source to a larger destination, and how does each do it?
  
  assembly-move instruction set
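The difference between zero extension and sign extension can be sketched with C++ integer conversions. The function names are hypothetical, and the instruction names in the comments are what a compiler would plausibly choose, not a guarantee.

```cpp
#include <cstdint>

// Widening an unsigned value fills the upper bytes with zeros
// (the behaviour of the movz class, e.g. movzbq).
std::uint64_t zero_extend(std::uint8_t b) { return b; }

// Widening a signed value replicates the sign bit into the upper bytes
// (the behaviour of the movs class, e.g. movsbq).
std::int64_t sign_extend(std::int8_t b) { return b; }

// Widening a 32-bit unsigned value needs no dedicated movz instruction:
// writing a 4-byte value to a register already zeroes the upper 4 bytes,
// so a plain movl is enough.
std::uint64_t zero_extend32(std::uint32_t v) { return v; }
```

The same byte pattern 0xAA therefore widens to 0x00000000000000AA when treated as unsigned and to 0xFFFFFFFFFFFFFFAA when treated as signed.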
10. sherlockliao 10 Nov 2021
  
  in Public
  
  The source operand designates a value that is immediate, stored in a register, or stored in memory. The destination operand designates a location that is either a register or a memory address. x86-64 imposes the restriction that a move instruction cannot have both operands refer to memory locations. Copying a value from one memory location to another requires two instructions: the first to load the source value into a register, and the second to write this register value to the destination.
  
  Which operand types can the source and the destination of a move instruction be?
  
  assembly-move operand instruction set
11. sherlockliao 09 Nov 2021
  
  in Public
  
  The most general form is shown at the bottom of the table with syntax Imm(rb,ri,s). Such a reference has four components: an immediate offset Imm, a base register rb, an index register ri, and a scale factor s, where s must be 1, 2, 4, or 8. Both the base and index must be 64-bit registers. The effective address is computed as Imm + R[rb] + R[ri] · s.
  
  How is the address for a memory access of the form $$Imm(r_b, r_i, s)$$ computed, and what restrictions apply?
  
  instruction set operand assemble-code
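The address computation in the highlight can be modelled as a small function. This is a sketch only: `effective_address` and its parameters are hypothetical names, and the register contents are passed in as plain integers.

```cpp
#include <cstdint>
#include <stdexcept>

// Models the addressing mode Imm(rb, ri, s):
//   effective address = Imm + R[rb] + R[ri] * s
// 'base' and 'index' stand for the 64-bit contents of rb and ri.
std::uint64_t effective_address(std::int64_t imm, std::uint64_t base,
                                std::uint64_t index, unsigned scale) {
    // The scale factor is restricted to 1, 2, 4, or 8.
    if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
        throw std::invalid_argument("scale must be 1, 2, 4, or 8");
    }
    // Unsigned arithmetic wraps modulo 2^64, matching address arithmetic.
    return base + index * scale + static_cast<std::uint64_t>(imm);
}
```

For example, with Imm = 8, R[rb] = 0x1000, R[ri] = 3, and s = 4, the effective address is 0x1000 + 12 + 8 = 0x1014.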
12. sherlockliao 09 Nov 2021
  
  in Public
  
  C declaration Intel data type Assembly-code suffix Size (bytes)
  
  What are the sizes of the different data types, and what are their suffixes in assembly code?
  
  assemble-code c
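As a rough illustration of the table whose header is highlighted above, the sketch below pairs common C types with the assembly suffix usually associated with them on x86-64 Linux. The struct `TypeInfo` and the array `kTypes` are hypothetical names, and the sizes are whatever `sizeof` reports on the platform the code is compiled for, so they may differ on non-LP64 systems.

```cpp
#include <cstddef>

// One row per C declaration: the assembly-code suffix and the size in bytes.
struct TypeInfo {
    const char* c_decl;   // C declaration
    char suffix;          // assembly-code suffix
    std::size_t size;     // size in bytes on the compiling platform
};

const TypeInfo kTypes[] = {
    {"char",   'b', sizeof(char)},    // byte
    {"short",  'w', sizeof(short)},   // word
    {"int",    'l', sizeof(int)},     // double word
    {"long",   'q', sizeof(long)},    // quad word (8 bytes on LP64)
    {"char *", 'q', sizeof(char*)},   // quad word (8 bytes on 64-bit targets)
    {"float",  's', sizeof(float)},   // single precision
    {"double", 'l', sizeof(double)},  // double precision
};
```

The suffix `l` appears for both `int` and `double`; this is unambiguous in practice because floating-point code uses a different set of instructions and registers.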
13. sherlockliao 09 Nov 2021
  
  in Public
  
  A final difference is that we see two additional lines of code (lines 8–9). These instructions will have no effect on the program, since they occur after the return instruction (line 7). They have been inserted to grow the code for the function to 16 bytes, enabling a better placement of the next block of code in terms of memory system performance.
  
  Why does disassembled code sometimes contain extra nop padding after the ret instruction?
  
  assemble-code
14. sherlockliao 08 Nov 2021
  
  in Public
  
  Its main feature is that it is in a more readable textual format, as compared to the binary format of machine code.
  
  What is the main difference between assembly code and machine code?
  
  assemble-code machine-code
Visit annotations in context

Tags

assembly-move

c

instruction set

load effective address

register

c++

unary operation

assemble-code

memory

memory location

local variable

memory reference

binary operation

assembly code

deference pointer

#assembly-code

pointer

operand

machine-code

Annotators

sherlockliao

URL

cs.sfu.ca/~ashriram/Courses/CS295/assets/books/CSAPP_2016.pdf
zhuanlan.zhihu.com zhuanlan.zhihu.com

C++ Type Conversion: reinterpret_cast

1
1. sherlockliao 24 Nov 2021
  
  in Public
  
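  The reinterpret_cast operator does not change the value of the operand in parentheses; it reinterprets the object at the level of its bit pattern.
  
  How should reinterpret_cast be understood in C++?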
  
  c++ typecasting
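A minimal sketch of two uses of `reinterpret_cast` that stay within defined behaviour: round-tripping a pointer through an integer, and viewing an object's storage as raw bytes. The function names are hypothetical.

```cpp
#include <cstdint>

// The pointer's bit pattern is reinterpreted as an integer and back again;
// the value is not converted, only viewed as a different type.
bool round_trips(int* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    int* q = reinterpret_cast<int*>(bits);
    return q == p;
}

// Viewing an object's storage through unsigned char* is permitted by the
// aliasing rules. Which byte comes first depends on the machine's byte order.
unsigned char first_byte(const std::uint32_t& v) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&v);
    return bytes[0];
}
```

On a little-endian machine such as x86-64, `first_byte` applied to 0x12345678 would be expected to return 0x78; on a big-endian machine it would return 0x12.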
Visit annotations in context

Tags

typecasting

c++

Annotators

sherlockliao

URL

zhuanlan.zhihu.com/p/33040213
www.cnblogs.com www.cnblogs.com

Using C++ Namespaces - 王陸 - 博客园

1
1. sherlockliao 19 Nov 2021
  
  in Public
  
  A namespace is a scope. C++ provides namespaces to prevent name conflicts.
  
  What are namespaces for?
  
  namespace cpp
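A small sketch of how namespaces prevent name conflicts. The namespace names `audio` and `video` and the function `version` are made up for illustration.

```cpp
// Two functions share the name 'version' but live in different scopes,
// so they do not conflict.
namespace audio {
int version() { return 1; }
}  // namespace audio

namespace video {
int version() { return 2; }
}  // namespace video
```

Callers pick one with the scope resolution operator, for example `audio::version()` or `video::version()`.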
Visit annotations in context

Tags

cpp

namespace

Annotators

sherlockliao

URL

cnblogs.com/wkfvawl/p/10500594.html
www.learncpp.com www.learncpp.com

6.17 — Unnamed and inline namespaces | Learn C++

1
1. sherlockliao 19 Nov 2021
  
  in Public
  
  But the other effect of unnamed namespaces is that all identifiers inside an unnamed namespace are treated as if they had internal linkage, which means that the content of an unnamed namespace can’t be seen outside of the file in which the unnamed namespace is defined.
  
  What is the effect of an unnamed namespace?
  
  namespace cpp unnamed namespace internal linkage
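A sketch of an unnamed namespace used to hide file-local helpers. The names `call_count`, `bump`, and `record_call` are hypothetical. A single file cannot demonstrate the linkage effect directly, so the comments describe what would happen across files.

```cpp
namespace {
// These identifiers behave as if they had internal linkage: another
// translation unit cannot refer to them, and it may define its own
// 'call_count' or 'bump' without a link-time conflict.
int call_count = 0;
void bump() { ++call_count; }
}  // unnamed namespace

// This function has external linkage and forms the file's public interface.
int record_call() {
    bump();
    return call_count;
}
```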
Visit annotations in context

Tags

unnamed namespace

cpp

namespace

internal linkage

Annotators

sherlockliao

URL

learncpp.com/cpp-tutorial/unnamed-and-inline-namespaces/
www.learncpp.com www.learncpp.com

M.1 — Intro to smart pointers and move semantics – Learn C++

1
1. sherlockliao 12 Nov 2021
  
  in Public
  
  One of the best things about classes is that they contain destructors that automatically get executed when an object of the class goes out of scope. So if you allocate (or acquire) memory in your constructor, you can deallocate it in your destructor, and be guaranteed that the memory will be deallocated when the class object is destroyed (regardless of whether it goes out of scope, gets explicitly deleted, etc…).
  
  What is the principle behind smart pointers?
  
  cpp smart pointer pointer constructor destructor
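The idea in the highlight can be sketched with a deliberately minimal owning pointer. `ScopedPtr`, `Resource`, and `live_resources` are hypothetical names; in real code `std::unique_ptr` would be the usual choice.

```cpp
// Counts how many Resource objects currently exist, to make the
// automatic cleanup observable.
int live_resources = 0;

struct Resource {
    Resource() { ++live_resources; }
    ~Resource() { --live_resources; }
};

// Owns a heap object and deletes it in the destructor, so the memory is
// released whenever the ScopedPtr goes out of scope.
template <typename T>
class ScopedPtr {
public:
    explicit ScopedPtr(T* ptr = nullptr) : ptr_(ptr) {}
    ~ScopedPtr() { delete ptr_; }

    // Copying is disabled so that two owners never delete the same object.
    ScopedPtr(const ScopedPtr&) = delete;
    ScopedPtr& operator=(const ScopedPtr&) = delete;

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }

private:
    T* ptr_;
};
```

Creating a `ScopedPtr<Resource>` inside a block raises `live_resources` to 1, and leaving the block brings it back to 0 without any explicit `delete`.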
Visit annotations in context

Tags

constructor

cpp

smart pointer

destructor

pointer

Annotators

sherlockliao

URL

learncpp.com/cpp-tutorial/introduction-to-smart-pointers-move-semantics/
lilianweng.github.io lilianweng.github.io

How to Train Really Large Models on Many GPUs?

5
1. sherlockliao 09 Nov 2021
  
  in Public
  
  Three techniques to avoid losing critical information at half-precision: Full-precision master copy of weights. Maintain a full precision (FP32) copy of model weights that accumulates gradients. The numbers are rounded up to half-precision for forward & backward passes. The motivation is that each gradient update (i.e. gradient times the learning rate) might be too small to be fully contained within the FP16 range (i.e. $$2^{-24}$$ becomes zero in FP16). Loss scaling. Scale up the loss to better handle gradients with small magnitudes (See Fig. 16). Scaling up the gradients helps shift them to occupy a larger section towards the right section (containing larger values) of the representable range, preserving values that are otherwise lost. Arithmetic precision. For common network arithmetic (e.g. vector dot-product, reduction by summing up vector elements), we can accumulate the partial results in FP32 and then save the final output as FP16 before saving into memory. Point-wise operations can be executed in either FP16 or FP32.
  
  Which techniques does mixed-precision training use to avoid losing precision?
  
  amp
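The loss-scaling idea can be illustrated with a toy model. Standard C++ has no portable FP16 type before C++23, so `to_fp16_sim` below is a crude stand-in that only flushes magnitudes below $$2^{-24}$$ to zero; it ignores mantissa rounding and overflow. Both function names and the scale value of 1024 are assumptions for illustration.

```cpp
#include <cmath>

// Crude simulation of FP16 underflow: anything smaller in magnitude than
// 2^-24 is treated as unrepresentable and becomes zero.
float to_fp16_sim(float x) {
    const float smallest = std::ldexp(1.0f, -24);  // 2^-24
    return std::fabs(x) < smallest ? 0.0f : x;
}

// Loss scaling: multiply the gradient by a scale before it passes through
// half precision, then divide by the same scale in FP32 afterwards.
float scaled_gradient(float grad, float loss_scale) {
    float g16 = to_fp16_sim(grad * loss_scale);  // value as "stored" in FP16
    return g16 / loss_scale;                     // unscale in full precision
}
```

In this model a gradient of 1e-8 is lost when the scale is 1, but survives when the scale is 1024, because the scaled value lies above the underflow threshold.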
2. sherlockliao 09 Nov 2021
  
  in Public
  
  two major memory consumption of large model training: The majority is occupied by model states, including optimizer states (e.g. Adam momentums and variances), gradients and parameters. Mixed-precision training demands a lot of memory since the optimizer needs to keep a copy of FP32 parameters and other optimizer states, besides the FP16 version. The remaining is consumed by activations, temporary buffers and unusable fragmented memory (named residual states in the paper).
  
  What are the main sources of GPU memory consumption when training deep networks?
  
  GPU memory amp
3. sherlockliao 09 Nov 2021
  
  in Public
  
  It partitions optimizer state, gradients and parameters across multiple data parallel processes via a dynamic communication schedule to minimize the communication volume.
  
  What is the principle behind ZeRO-DP?
  
  ZeRO GPU memory Data Parallel
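A back-of-the-envelope sketch of why partitioning model states helps. It assumes the per-parameter breakdown described in the ZeRO paper for mixed-precision Adam: 2 bytes of FP16 parameters, 2 bytes of FP16 gradients, and 12 bytes of optimizer state (FP32 parameters, momentum, and variance). The struct and function names are hypothetical, and activations and buffers are not counted.

```cpp
// Per-device memory for model states, in bytes, under each partitioning level.
struct ZeroMemory {
    double baseline;  // plain data parallelism: everything replicated
    double stage1;    // optimizer states partitioned
    double stage2;    // optimizer states and gradients partitioned
    double stage3;    // optimizer states, gradients and parameters partitioned
};

ZeroMemory model_state_bytes(double params, double num_devices) {
    const double kParamBytes = 2.0;   // FP16 parameters
    const double kGradBytes = 2.0;    // FP16 gradients
    const double kOptimBytes = 12.0;  // FP32 params + momentum + variance
    ZeroMemory m;
    m.baseline = (kParamBytes + kGradBytes + kOptimBytes) * params;
    m.stage1 = (kParamBytes + kGradBytes) * params
             + kOptimBytes * params / num_devices;
    m.stage2 = kParamBytes * params
             + (kGradBytes + kOptimBytes) * params / num_devices;
    m.stage3 = (kParamBytes + kGradBytes + kOptimBytes) * params / num_devices;
    return m;
}
```

Under these assumptions, a 7.5-billion-parameter model needs about 120 GB per device when fully replicated, and about 1.9 GB per device when all three kinds of state are partitioned across 64 devices.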
4. sherlockliao 09 Nov 2021
  
  in Public
  
  Asynchronous parallel (ASP): Every GPU worker processes the data asynchronously, no waiting or stalling. However, it can easily lead to stale weights being used and thus lower the statistical learning efficiency. Even though it increases the computation time, it may not speed up training time to convergence.
  
  What is ASP, and what are its advantages and disadvantages?
  
  ASP Parallel
5. sherlockliao 09 Nov 2021
  
  in Public
  
  Bulk synchronous parallels (BSP): Workers sync data at the end of every minibatch. It prevents model weights staleness and good learning efficiency but each machine has to halt and wait for others to send gradients.
  
  What is BSP, and what are its advantages and disadvantages?
  
  BSP Parallel
Visit annotations in context

Tags

GPU memory

Data Parallel

Parallel

ASP

BSP

ZeRO

amp

Annotators

sherlockliao

URL

lilianweng.github.io/lil-log/2021/09/24/train-large-neural-networks.html

sherlockliao

Annotations: 94

Joined: September 9, 2021
