Biological and Machine Intelligence



2016.9.17 (updated 2016.9.19)


●The 21st century is a watershed in human evolution.  

We are solving the mystery of how the brain works and starting to build machines that work on the same principles as the brain.

We see this time as the beginning of the era of machine intelligence, which will enable an explosion of beneficial applications and scientific advances.

●Most people intuitively see the value in understanding how the human brain works.

It is easy to see how brain theory could lead to the cure and prevention of mental disease or how it could lead to better methods for educating our children.  

These practical benefits justify the substantial efforts underway to reverse engineer the brain.

However, the benefits go beyond the near-term and the practical.

The human brain defines our species.

In most aspects we are an unremarkable species, but our brain is unique. 

The large size of our brain, and its unique design, is the reason humans are the most successful species on our planet. 

Indeed, the human brain is the only thing we know of in the universe that can create and share knowledge.

Our brains are capable of discovering the past, foreseeing the future, and unraveling the mysteries of the present.

Therefore, if we want to understand who we are, if we want to expand our knowledge of the universe, and if we want to explore new frontiers, we need to have a clear understanding of how we know, how we learn, and how to build intelligent machines to help us acquire more knowledge.  

The ultimate promise of brain theory and machine intelligence is the acquisition and dissemination of new knowledge.  

Along the way there will be innumerable benefits to society.

The beneficial impact of machine intelligence in our daily lives will equal and ultimately exceed that of programmable computers.

●But exactly how will intelligent machines work and what will they do?  

If you suggest to a lay person that the way to build intelligent machines is to first understand how the human brain works and then build machines that work on the same principles as the brain, they will typically say, “That makes sense”.  

However, if you suggest this same path to artificial intelligence ("AI") and machine learning scientists, many will disagree.  

The most common rejoinder you hear is “airplanes don’t flap their wings”, suggesting that it doesn’t matter how brains work, or worse, that studying the brain will lead you down the wrong path, like building a plane that flaps its wings.

●This analogy is both misleading and a misunderstanding of history.  

The Wright brothers and other successful pioneers of aviation understood the difference between the principles of flight and the need for propulsion. 

Bird wings and airplane wings work on the same aerodynamic principles, and those principles had to be understood before the Wright brothers could build an airplane. 

Indeed, they studied how birds glided and tested wing shapes in wind tunnels to learn the principles of lift.  

Wing flapping is different; it is a means of propulsion, and the specific method used for propulsion is less important when it comes to building flying machines.

In an analogous fashion, we need to understand the principles of intelligence before we can build intelligent machines.  

Given that the only examples we have of intelligent systems are brains, and the principles of intelligence are not obvious, we must study brains to learn from them.  

However, like airplanes and birds, we don't need to do everything the brain does, nor do we need to implement the principles of intelligence in the same way as the brain.  

We have a vast array of resources in software and silicon to create intelligent machines in novel and exciting ways.

The goal of building intelligent machines is not to replicate human behavior, nor to build a brain, nor to create machines to do what humans do. 

The goal of building intelligent machines is to create machines that work on the same principles as the brain — machines that are able to learn, discover, and adapt in ways that computers can’t and brains can.

●Consequently, the machine intelligence principles we describe in this book are derived from studying the brain. 

We use neuroscience terms to describe most of the principles, and we describe how these principles are implemented in the brain.  

The principles of intelligence can be understood by themselves, without referencing the brain, but for the foreseeable future it is easiest to understand these principles in the context of the brain because the brain continues to offer suggestions and constraints on the solutions to many open issues.

●This approach to machine intelligence is different from that taken by classic AI and artificial neural networks.  

AI technologists attempt to build intelligent machines by encoding rules and knowledge in software and human-designed data structures.  

This AI approach has had many successes solving specific problems but has not offered a generalized approach to machine intelligence and, for the most part, has not addressed the question of how machines can learn. 

Artificial neural networks (ANNs) are learning systems built using networks of simple processing elements.  

In recent years ANNs, often called “deep learning networks”, have succeeded in solving many classification problems. 

However, despite the word “neural”, most ANNs are based on neuron models and network architectures that are incompatible with real biological tissue. 

More importantly, ANNs, by deviating from known brain principles, don't provide an obvious path to building truly intelligent machines.

●Classic AI and ANNs generally are designed to solve specific types of problems rather than proposing a general theory of intelligence.  

In contrast, we know that brains use common principles for vision, hearing, touch, language, and behavior. 

This remarkable fact was first proposed in 1978 by Vernon Mountcastle. 

He said there is nothing visual about visual cortex and nothing auditory about auditory cortex. 

Every region of the neocortex performs the same basic operations. 

What makes the visual cortex visual is that it receives input from the eyes; what makes the auditory cortex auditory is that it receives input from the ears. 

From decades of neuroscience research, we now know this remarkable conjecture is true. 

Some of the consequences of this discovery are surprising. 

For example, neuroanatomy tells us that every region of the neocortex has both sensory and motor functions. 

Therefore, vision, hearing, and touch are integrated sensory-motor senses; we can’t build systems that see and hear like humans do without incorporating movement of the eyes, body, and limbs.

●The discovery that the neocortex uses common algorithms for everything it does is both elegant and fortuitous.

It tells us that to understand how the neocortex works, we must seek solutions that are universal in that they apply to every sensory modality and capability of the neocortex. 

To think of vision as a “vision problem” is misleading. 

Instead we should think about vision as a “sensory motor problem” and ask how vision is the same as hearing, touch or language. 

Once we understand the common cortical principles, we can apply them to any sensory and behavioral systems, even those that have no biological counterpart. 

The theory and methods described in this book were derived with this idea in mind. 

Whether we build a system that sees using light or a system that “sees” using radar or a system that directly senses GPS coordinates, the underlying learning methods and algorithms will be the same.

●Today we understand enough about how the neocortex works that we can build practical systems that solve valuable problems. 

Of course, there are still many things we don’t understand about the brain and the neocortex. 

It is important to define our ignorance as clearly as possible so we have a roadmap of what needs to be done. 

This book reflects the state of our partial knowledge. 

The table of contents lists all the topics we anticipate we need to understand, but only some chapters have been written.  

Despite the many topics we don’t understand, we are confident that we have made enough progress in understanding some of the core principles of intelligence and how the brain works that the field of machine intelligence can now move forward more rapidly than in the past. 

Hierarchical Temporal Memory

●Hierarchical Temporal Memory, or HTM, is the name we use to describe the overall theory of how the neocortex functions.  

It also is the name we use to describe the technology used in machines that work on neocortical principles.  

HTM is therefore a theoretical framework for both biological and machine intelligence. 

●The term HTM incorporates three prominent features of the neocortex.  

First, it is best to think of the neocortex as a "memory" system. 

The neocortex must learn the structure of the world from the sensory patterns that stream into the brain. 

Each neuron learns by forming connections, and each region of the neocortex is best understood as a type of memory system.  

Second, the memory in the neocortex is primarily a memory of time-changing, or "temporal", patterns.  

The inputs and outputs of the neocortex are constantly in motion, usually changing completely several times a second.  

Each region of the neocortex learns a time-based model of its inputs: it learns to predict the changing input stream, and it learns to play back sequences of motor commands.  

And finally, the regions of the neocortex are connected in a logical "hierarchy". 

Because all the regions of the neocortex perform the same basic memory operations, the detailed understanding of one neocortical region leads us to understand how the rest of the neocortex works. 

These three principles, “hierarchy”, “temporal” patterns, and “memory”, are not the only essential principles of an intelligent system, but they suffice as a moniker to represent the overall approach.
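The "memory of temporal patterns" idea can be made concrete with a deliberately simplified sketch: a memory that learns which inputs follow which in a stream and uses those learned transitions to predict what comes next. This toy first-order model is far simpler than the sequence memory HTM actually uses; the class and method names are our own invention for illustration.

```python
from collections import defaultdict

class ToyTemporalMemory:
    """Learns which input tends to follow which: a first-order
    memory of temporal patterns (much simpler than HTM itself)."""

    def __init__(self):
        self.transitions = defaultdict(set)
        self.previous = None

    def observe(self, value):
        if self.previous is not None:
            self.transitions[self.previous].add(value)  # learn the transition
        self.previous = value

    def predict(self, value):
        """Return the set of inputs that have been seen to follow `value`."""
        return self.transitions[value]

tm = ToyTemporalMemory()
for token in ["A", "B", "C", "A", "B", "D"]:
    tm.observe(token)
print(sorted(tm.predict("B")))  # ['C', 'D']
```

A real HTM region learns high-order sequences using sparse activity across many neurons; this sketch only conveys the flavor of learning a time-based model of an input stream and predicting its continuation.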

●Although HTM is a biologically constrained theory, and is perhaps the most biologically realistic theory of how the neocortex works, it does not attempt to include all biological details. 

For example, the biological neocortex exhibits several types of rhythmic behavior in the firing of ensembles of neurons. 

There is no doubt that these rhythms are essential for biological brains. 

But HTM theory does not include these rhythms because we don’t believe they play an information-theoretic role. 

Our best guess is that these rhythms are needed in biological brains to synchronize action potentials, but this issue does not arise in software and hardware implementations of HTM.

If in the future we find that rhythms are essential for intelligence, and not just biological brains, then we would modify HTM theory to include them. 

There are many biological details that similarly are not part of HTM theory.  

Every feature included in HTM is there because it meets an information-theoretic need.

●HTM also is not a theory of an entire brain; it only covers the neocortex and its interactions with some closely related structures such as the thalamus and hippocampus. 

The neocortex is where most of what we think of as intelligence resides, but it is not in charge of emotions, homeostasis, and basic behaviors. 

Other, evolutionarily older, parts of the brain perform these functions.

These older parts of the brain have been under evolutionary pressure for a much longer time, and although they consist of neurons, they are heterogeneous in architecture and function. 

We are not interested in emulating entire brains or in making machines that are human-like, with human-like emotions and desires. 

Therefore intelligent machines, as we define them, are not likely to pass the Turing test or be like the humanoid robots seen in science fiction. 

This distinction does not suggest that intelligent machines will be of limited utility. 

Many will be simple, tirelessly sifting through vast amounts of data looking for unusual patterns. 

Others will be fantastically fast and smart, able to explore domains that humans are not well suited for. 

The variety we will see in intelligent machines will be similar to the variety we see in programmable computers. 

Some computers are tiny and embedded in cars and appliances, and others occupy entire buildings or are distributed across continents. 

Intelligent machines will have a similar diversity of size, speed, and applications, but instead of being programmed they will learn.

●HTM theory cannot be expressed succinctly in one or a few mathematical equations.  

HTM is a set of principles that work together to produce perception and behavior.  

In this regard, HTMs are like computers. 

Computers can’t be described purely mathematically.  

We can understand how they work, we can simulate them, and subsets of computer science can be described in formal mathematics, but ultimately we have to build them and test them empirically to characterize their performance.  

Similarly, some parts of HTM theory can be analyzed mathematically. 

For example, the chapter in this book on sparse distributed representations is mostly about the mathematical properties of sparse representations. 

But other parts of the HTM theory are less amenable to formalism. 

If you are looking for a succinct mathematical expression of intelligence, you won’t find it. 

In this way, brain theory is more like genetic theory and less like physics.

What is Intelligence?

●Historically, intelligence has been defined in behavioral terms.  

For example, if a system can play chess, or drive a car, or answer questions from a human, then it is exhibiting intelligence.  

The Turing Test is the most famous example of this line of thinking.  

We believe this approach to defining intelligence fails on two counts.  

First, there are many examples of intelligence in the biological world that differ from human intelligence and would fail most behavioral tests.  

For example, dolphins, monkeys, and humans are all intelligent, yet only one of these species can play chess or drive a car.  

Similarly, intelligent machines will span a range of capabilities from mouse-like to super-human and, more importantly, we will apply intelligent machines to problems that have no counterpart in the biological world.  

Focusing on human-like performance is limiting. 

●The second reason we reject behavior-based definitions of intelligence is that they don’t capture the incredible flexibility of the neocortex.  

The neocortex uses the same algorithms for all that it does, giving it flexibility that has enabled humans to be so successful. 

Humans can learn to perform a vast number of tasks that have no evolutionary precedent because our brains use learning algorithms that can be applied to almost any task. 

The way the neocortex sees is the same as the way it hears or feels.  

In humans, this universal method creates language, science, engineering, and art. 

When we define intelligence as solving specific tasks, such as playing chess, we tend to create solutions that also are specific. 

The program that can win a chess game cannot learn to drive.  

It is the flexibility of biological intelligence that we need to understand and embed in our intelligent machines, not the ability to solve a particular task. 

Another benefit of focusing on flexibility is network effects.

The neocortex may not always be best at solving any particular problem, but it is very good at solving a huge array of problems. 

Software engineers, hardware engineers, and application engineers naturally gravitate towards the most universal solutions. 

As more investment is focused on universal solutions, they will advance faster and get better relative to other more dedicated methods. 

Network effects have fostered adoption many times in the technology world; this dynamic will unfold in the field of machine intelligence, too.

●Therefore we define the intelligence of a system by the degree to which it exhibits flexibility: flexibility in learning and flexibility in behavior.  

Since the neocortex is the most flexible learning system we know of, we measure the intelligence of a system by how many of the neocortical principles that system includes.  

This book is an attempt to enumerate and understand these neocortical principles.  

Any system that includes all the principles we cover in this book will exhibit cortical-like flexibility, and therefore cortical-like intelligence.  

By making systems larger or smaller and by applying them to different sensors and embodiments, we can create intelligent machines of incredible variety.  

Many of these systems will be much smaller than a human neocortex and some will be much larger in terms of memory size, but they will all be intelligent.

About this Book

●The structure of this book may be different from that of books you have read in the past. 

First, it is a “living book”. 

We are releasing chapters as they are written, covering first the aspects of the theory that are best understood.

Some chapters may be published in draft form, whereas others will be more polished. 

For the foreseeable future this book will be a work in progress. 

We have a table of contents for the entire book, but even this will change as research progresses.

●Second, the book is intended for a technical but diverse audience. 

Neuroscientists should find the book helpful as it provides a theoretical framework to interpret many biological details and guide experiments. 

Computer scientists can use the material in the book to develop machine intelligence hardware, software, and applications based on neuroscience principles. 

Anyone with a deep interest in how brains work or machine intelligence will hopefully find the book to be the best source for these topics. 

Finally, we hope that academics and students will find this material to be a comprehensive introduction to an emerging and important field that offers opportunities for future research and study.

●The structure of the chapters in this book varies depending on the topic. 

Some chapters are overviews in nature. 

Some chapters include mathematical formulations and problem sets to exercise the reader’s knowledge. 

Some chapters include pseudo-code. 

Key citations will be noted, but we do not attempt to have a comprehensive set of citations to all work done in the field. 

As such, we gratefully acknowledge the many pioneers whose work we have built upon but who are not explicitly mentioned.

●We are now ready to jump into the details of biological and machine intelligence.


2016.9.22 (updated 2016.9.24)

Hierarchical Temporal Memory: Overview

●In the September 1979 issue of Scientific American, Nobel Prize-winning scientist Francis Crick wrote about the state of neuroscience. 

He opined that despite the great wealth of factual knowledge about the brain we had little understanding of how it actually worked. 

His exact words were, “What is conspicuously lacking is a broad framework of ideas within which to interpret all these different approaches” (Crick, 1979). 

Hierarchical Temporal Memory (HTM) is, we believe, the broad framework sought after by Dr. Crick. 

More specifically, HTM is a theoretical framework for how the neocortex works and how the neocortex relates to the rest of the brain to create intelligence. 

HTM is a theory of the neocortex and a few related brain structures; it is not an attempt to model or understand every part of the human brain. 

The neocortex comprises about 75% of the volume of the human brain, and it is the seat of most of what we think of as intelligence. 

It is what makes our species unique.

●HTM is a biological theory, meaning it is derived from neuroanatomy and neurophysiology and explains how the biological neocortex works. 

We sometimes say HTM theory is “biologically constrained,” as opposed to “biologically inspired,” which is a term often used in machine learning. 

The biological details of the neocortex must be compatible with the HTM theory, and the theory can’t rely on principles that can’t possibly be implemented in biological tissue. 

For example, consider the pyramidal neuron, the most common type of neuron in the neocortex. 

Pyramidal neurons have tree-like extensions called dendrites that connect via thousands of synapses. 

Neuroscientists know that the dendrites are active processing units, and that communication through the synapses is a dynamic, inherently stochastic process (Poirazi and Mel, 2001). 

The pyramidal neuron is the core information processing element of the neocortex, and synapses are the substrate of memory. 

Therefore, to understand how the neocortex works we need a theory that accommodates the essential features of neurons and synapses. 

Artificial Neural Networks (ANNs) usually model neurons with no dendrites and a few highly precise synapses, features which are incompatible with real neurons. 

This type of artificial neuron can’t be reconciled with biological neurons and is therefore unlikely to lead to networks that work on the same principles as the brain. 

This observation doesn’t mean ANNs aren’t useful, only that they don’t work on the same principles as biological neural networks. 

As you will see, HTM theory explains why neurons have thousands of synapses and active dendrites. 

We believe these and many other biological features are essential for an intelligent system and can’t be ignored.

Figure 1 Biological and artificial neurons.

●Figure 1a shows an artificial neuron typically used in machine learning and artificial neural networks. 

Often called a “point neuron,” this form of artificial neuron has relatively few synapses and no dendrites.

Learning in a point neuron occurs by changing the “weights” of the synapses, each of which is represented by a scalar value that can be positive or negative. 

A point neuron calculates a weighted sum of its inputs, which is applied to a non-linear function to determine the output value of the neuron. 
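As a concrete illustration of the point neuron just described, here is a minimal sketch in Python. The particular weights, inputs, and the choice of a sigmoid non-linearity are illustrative assumptions, not taken from any specific network.

```python
import math

def point_neuron(inputs, weights, bias=0.0):
    """Classic point neuron: a weighted sum of inputs passed
    through a non-linearity (here, a sigmoid)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# Three inputs with scalar weights that may be positive or negative
output = point_neuron([1.0, 0.0, 1.0], [0.5, -0.3, 0.8])
print(round(output, 3))  # 0.786
```

Learning in such a neuron means adjusting the scalar weights, typically by gradient descent; contrast this with the HTM neuron described below, which learns by forming and removing binary synapses.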

Figure 1b shows a pyramidal neuron, which is the most common type of neuron in the neocortex. 

Biological neurons have thousands of synapses arranged along dendrites. 

Dendrites are active processing elements allowing the neuron to recognize hundreds of unique patterns. 

Biological synapses are partly stochastic and therefore low precision. 

Learning in a biological neuron is mostly due to the formation of new synapses and the removal of unused synapses. 

Pyramidal neurons have multiple synaptic integration zones that receive input from different sources and have differing effects on the cell. 

Figure 1c shows an HTM artificial neuron. 

Similar to a pyramidal neuron, it has thousands of synapses arranged on active dendrites. 

It recognizes hundreds of patterns in multiple integration zones. 

The HTM neuron uses binary synapses and learns by modeling the growth of new synapses and the decay of unused synapses.

HTM neurons don’t attempt to model all aspects of biological neurons, only those that are essential to the information-theoretic operation of the neocortex.
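To make the contrast with the point neuron concrete, the HTM neuron's active dendrites can be sketched as segments of binary synapses, each segment acting as an independent pattern detector that becomes active when enough of its synapses overlap the current input. This is a simplified sketch with invented numbers, not the full HTM neuron model.

```python
def segment_active(synapses, active_inputs, threshold):
    """A dendritic segment detects its stored pattern when enough
    of its binary synapses connect to currently active inputs."""
    overlap = len(synapses & active_inputs)
    return overlap >= threshold

def htm_neuron_fires(segments, active_inputs, threshold=3):
    """The neuron recognizes a pattern if any segment is active."""
    return any(segment_active(seg, active_inputs, threshold) for seg in segments)

# Two segments, each storing one learned pattern as a set of synapses
segments = [{1, 4, 7, 9}, {2, 3, 5, 8}]
print(htm_neuron_fires(segments, {1, 4, 7, 20}, threshold=3))  # True: first segment matches
print(htm_neuron_fires(segments, {2, 9, 11}, threshold=3))     # False: no segment reaches threshold
```

Learning in this model consists of adding synapse indices to a segment (growth) and removing unused ones (decay), rather than adjusting scalar weights; a real HTM neuron has many such segments in multiple integration zones.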

●Although we want to understand how biological brains work, we don’t have to adhere to all the biological details. 

Once we understand how real neurons work and how biological networks of neurons lead to memory and behavior, we might decide to implement them in software or in hardware in a way that differs from the biology in detail, but not in principle. 

But we shouldn’t do that before we understand how biological neural systems work. 

People often ask, “How do you know which biological details matter and which don’t?” 

The answer is: once you know how the biology works, you can decide which biological details to include in your models and which to leave out, but you will know what you are giving up, if anything, if your software model leaves out a particular biological feature. 

Human brains are just one implementation of intelligence; yet today, humans are the only things everyone agrees are intelligent. 

Our challenge is to separate aspects of brains and neurons that are essential for intelligence from those aspects that are artifacts of the brain’s particular implementation of intelligence principles. 

Our goal isn’t to recreate a brain, but to understand how brains work in sufficient detail so that we can test the theory biologically and also build systems that, although not identical to brains, work on the same principles.

●Sometime in the future designers of intelligent machines may not care about brains and the details of how brains implement the principles of intelligence. 

The field of machine intelligence may by then be so advanced that it has departed from its biological origin. 

But we aren’t there yet. 

Today we still have much to learn from biological brains. Therefore, to understand HTM principles, to advance HTM theory, and to build intelligent machines, it is necessary to know neuroscience terms and the basics of the brain’s design.

●Bear in mind that HTM is an evolving theory. 

We do not yet have a complete theory of the neocortex, as will become obvious in the remainder of this chapter and the rest of the book. 

There are entire sections yet to be written, and some of what is written will be modified as new knowledge is acquired and integrated into the theory. 

The good news is that although we have a long way to go for a complete theory of neocortex and intelligence, we have made significant progress on the fundamental aspects of the theory. 

The theory includes the representation format used by the neocortex (“sparse distributed representations” or SDRs), the mathematical and semantic operations that are enabled by this representation format, and how neurons throughout the neocortex learn sequences and make predictions, which is the core component of all inference and behavior. 

We also understand how knowledge is stored by the formation of sets of new synapses on the dendrites of neurons. 

These are basic elements of biological intelligence analogous to how random access memory, busses, and instruction sets are basic elements of computers. 

Once you understand these basic elements, you can combine them in different ways to create full systems.
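One of those basic elements, the sparse distributed representation, rests on a simple core operation: comparing two SDRs by the overlap of their active bits. Here is a minimal sketch; the choice of 2048 bits with 40 active is an illustrative configuration, and the SDR chapter develops the actual mathematics.

```python
import random

def make_sdr(size=2048, active=40, seed=None):
    """Represent an SDR as the set of indices of its active bits
    (about 2% sparsity with these illustrative numbers)."""
    rng = random.Random(seed)
    return set(rng.sample(range(size), active))

def overlap(sdr_a, sdr_b):
    """Shared active bits: the basic similarity measure between SDRs."""
    return len(sdr_a & sdr_b)

a = make_sdr(seed=1)
b = make_sdr(seed=2)
# An SDR overlaps itself completely; two unrelated random SDRs share
# almost no bits, so a modest overlap threshold makes false matches
# extremely unlikely.
print(overlap(a, a), overlap(a, b) < 10)
```

This robustness to noise and the astronomically low false-match probability are what make SDRs a suitable representation format for memory and prediction.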

●The remainder of this chapter introduces the key concepts of Hierarchical Temporal Memory. 

We will describe an aspect of the neocortex and then relate that biological feature to one or more principles of HTM theory. 

In-depth descriptions and technical details of the HTM principles are provided in subsequent chapters.

Biological Observation: The Structure of the Neocortex

●The human brain comprises several components such as the brain stem, the basal ganglia, and the cerebellum.

These organs are loosely stacked on top of the spinal cord. 

The neocortex is just one more component of the brain, but it dominates the human brain, occupying about 75% of its volume. 

The evolutionary history of the brain is reflected in its overall design. 

Simple animals, such as worms, have the equivalent of the spinal cord and nothing else. 

The spinal cord of a worm and a human receives sensory input and generates useful, albeit simple, behaviors. 

Over evolutionary time scales, new brain structures were added such as the brainstem and basal ganglia. 

Each addition did not replace what was there before. 

Typically the new brain structure received input from the older parts of the brain, and the new structure’s output led to new behaviors by controlling the older brain regions. 

The addition of each new brain structure had to incrementally improve the animal’s behaviors. 

Because of this evolutionary path, the entire brain is physically and logically a hierarchy of brain regions.

Figure 2 a) real brain b) logical hierarchy (placeholder)

●The neocortex is the most recent addition to our brain. 

All mammals, and only mammals, have a neocortex. 

The neocortex first appeared about 200 million years ago in the early small mammals that emerged from their reptilian ancestors during the transition of the Triassic/Jurassic periods. 

The modern human neocortex separated from that of monkeys in terms of size and complexity about 25 million years ago (Rakic, 2009). 

The human neocortex continued to evolve to be bigger and bigger, reaching its current size in humans between 800,000 and 200,000 years ago. 

In humans, the neocortex is a sheet of neural tissue about the size of a large dinner napkin (1,000 square centimeters) in area and 2.5mm thick. 

It lies just under the skull and wraps around the other parts of the brain. 

(From here on, the word “neocortex” will refer to the human neocortex. 

References to the neocortex of other mammals will be explicit.) 

The neocortex is heavily folded to fit in the skull, but this isn’t important to how it works, so we will always refer to it and illustrate it as a flat sheet. 

The human neocortex is large both in absolute terms and also relative to the size of our body compared to other mammals. 

We are an intelligent species mostly because of the size of our neocortex.

●The most remarkable aspect of the neocortex is its homogeneity. 

The types of cells and their patterns of connectivity are nearly identical no matter what part of the neocortex you look at. 

This fact is largely true across species as well. 

Sections of human, rat, and monkey neocortex look remarkably the same. 

The primary difference between the neocortex of different animals is the size of the neocortical sheet. 

Many pieces of evidence suggest that the human neocortex got large by replicating a basic element over and over. 

This observation led to the 1978 conjecture by Vernon Mountcastle that every part of the neocortex must be doing the same thing. 

So even though some parts of the neocortex process vision, some process hearing, and other parts create language, at a fundamental level these are all variations of the same problem, and are solved by the same neural algorithms. 

Mountcastle argued that the vision regions of the neocortex are vision regions because they receive input from the eyes and not because they have special vision neurons or vision algorithms (Mountcastle, 1978). 

This discovery is incredibly important and is supported by multiple lines of evidence.

●Even though the neocortex is largely homogenous, some neuroscientists are quick to point out the differences between neocortical regions. 

One region may have more of a certain cell type, another may have extra layers, and others may exhibit variations in connectivity patterns. 

But there is no question that neocortical regions are remarkably similar and that the variations are relatively minor. 

The debate is only about how critical the variations are in terms of functionality.

●The neocortical sheet is divided into dozens of regions situated next to each other. 

Looking at a neocortex you would not see any regions or demarcations. 

The regions are defined by connectivity. 

Regions pass information to each other by sending bundles of nerve fibers into the white matter just below the neocortex. 

The nerve fibers reenter at another neocortical region. 

The connections between regions define a logical hierarchy.

Information from a sensory organ is processed by one region, which passes its output to another region, which passes its output to yet another region, etc. 

The number of regions and their connectivity is determined by our genes and is the same for all members of a species. 

So, as far as we know, the hierarchical organization of each human’s neocortex is the same, but our hierarchy differs from the hierarchy of a dog or a whale. 

The actual hierarchy for some species has been mapped in detail (Zingg, 2014). 

These hierarchies are complicated, not simple flow charts. 

There are parallel paths up the hierarchy and information often skips levels and goes sideways between parallel paths. 

Despite this complexity the hierarchical structure of the neocortex is well established.

●We can now see the big picture of how the brain is organized. 

The entire brain is a hierarchy of brain regions, where each region interacts with a fully functional stack of evolutionarily older regions below it. 

For most of evolutionary history new brain regions, such as the spinal cord, brain stem, and basal ganglia, were heterogeneous, adding capabilities that were specific to particular senses and behaviors. 

This evolutionary process was slow. 

Starting with mammals, evolution discovered a way to extend the brain’s hierarchy using new brain regions with a homogenous design, an algorithm that works with any type of sensor data and any type of behavior. 

This replication is the beauty of the neocortex. 

Once the universal neocortical algorithms were established, evolution could extend the brain’s hierarchy rapidly because it only had to replicate an existing structure. 

This explains how human brains evolved to become large so quickly.

Figure 3 a) brain with information flowing posterior to anterior b) logical hierarchical stack showing old brain regions and neocortical regions (placeholder)

●Sensory information enters the human neocortex in regions that are in the rear and side of the head. 

As information moves up the hierarchy it eventually passes into regions in the front half of the neocortex. 

Some of the regions at the very top of the neocortical hierarchy, in the frontal lobes and also the hippocampus, have unique properties such as the ability for short term memory, which allows you to keep a phone number in your head for a few minutes. 

These regions also exhibit more heterogeneity, and some of them are older than the neocortex. 

The neocortex in some sense was inserted near the top of the old brain’s hierarchical stack.

Therefore as we develop HTM theory, we first try to understand the homogenous regions that are near the bottom of the neocortical hierarchy. 

In other words, we first need to understand how the neocortex builds a basic model of the world from sensory data and how it generates basic behaviors.

HTM Principle: Common Algorithms

●HTM theory focuses on the common properties across the neocortex. 

We strive not to understand vision or hearing or robotics as separate problems, but to understand how these capabilities are fundamentally all the same, and what set of algorithms can see AND hear AND generate behavior. 

Initially, this general approach makes our task seem harder, but ultimately it is liberating. 

When we describe or study a particular HTM learning algorithm we often will start with a particular problem, such as vision, to understand or test the algorithm. 

But we then ask how the exact same algorithm would work for a different problem such as understanding language. 

This process leads to realizations that might not at first be obvious, such as vision being a primarily temporal inference problem, meaning the temporal order of patterns coming from the retina is as important in vision as is the temporal order of words in language. 

Once we understand the common algorithms of the neocortex, we can ask how evolution might have tweaked these algorithms to achieve even better performance on a particular problem. 

But our focus is to first understand the common algorithms that are manifest in all neocortical regions.

HTM Principle: Hierarchy

●Every neocortex, from a mouse to a human, has a hierarchy of regions, although the number of levels and number of regions in the hierarchy varies. 

It is clear that hierarchy is essential to form high-level percepts of the world from low-level sensory arrays such as the retina or cochlea. 

As its name implies, HTM theory incorporates the concept of hierarchy. 

Because each region is performing the same set of memory and algorithmic functions, the capabilities exhibited by the entire neocortical hierarchy have to be present in each region. 

Thus if we can understand how a single region works and how that region interacts with its hierarchical neighbors, then we can build hierarchical models of indefinite complexity and apply them to any type of sensory/motor system. 

Consequently most of current HTM theory focuses on how a single neocortical region works and how two regions work together.

Biological Observation: Neurons are Sparsely Activated

●The neocortex is made up of neurons. 

No one knows exactly how many neurons are in a human neocortex, but recent “primate scale up” methods put the estimate at 86 billion (Herculano-Houzel, 2012). 

The moment-to-moment state of the neocortex, some of which defines our perceptions and thoughts, is determined by which neurons are active at any point in time. 

An active neuron is one that is generating spikes, or action potentials.

One of the most remarkable observations about the neocortex is that no matter where you look, the activity of neurons is sparse, meaning only a small percentage of them are rapidly spiking at any point in time. 

The sparsity might vary from less than one percent to several percent, but it is always sparse.

HTM Principle: Sparse Distributed Representations

●The representations used in HTM theory are called Sparse Distributed Representations, or SDRs. 

SDRs are vectors with thousands of bits. 

At any point in time a small percentage of the bits are 1’s and the rest are 0’s.

HTM theory explains why it is important that there are always a minimum number of 1’s distributed in the SDR, and also why the percentage of 1’s must be low, typically less than 2%. 

The bits in an SDR correspond to the neurons in the neocortex.
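The shape of an SDR can be sketched in a few lines of code. The vector width and active-bit count below (2,048 and 40) are illustrative choices for this sketch, not values mandated by HTM theory:

```python
# A minimal sketch of an SDR: a wide binary vector with ~2% active bits,
# represented compactly by the indices of its 1-bits.
import random

N = 2048          # total number of bits ("neurons") -- illustrative size
W = 40            # number of active bits (~2% sparsity)

def random_sdr(n=N, w=W, seed=None):
    """Return an SDR as a sorted tuple of the indices of its 1-bits."""
    rng = random.Random(seed)
    return tuple(sorted(rng.sample(range(n), w)))

sdr = random_sdr(seed=42)
sparsity = W / N
print(f"{W} of {N} bits active ({sparsity:.1%} sparsity)")
```

Storing only the indices of the 1-bits, rather than all 2,048 bits, is itself a property that sparsity makes practical.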

●SDRs have some essential and surprising properties. 

For comparison, consider the representations used in programmable computers. 

The meaning of a word in a computer is not inherent in the word itself. 

If you were shown 64 bits from a location in a computer’s memory you couldn’t say anything about what it represents. 

At one moment in the execution of the program the bits could represent one thing, and at another moment they might represent something else. In either case, the meaning of the 64 bits can only be known by relying on knowledge not contained in the physical location of the bits themselves. 

With SDRs, the bits of the representation encode the semantic properties of the representation; the representation and its meaning are one and the same. 

Two SDRs that have 1 bits in the same location share a semantic property. 

The more 1 bits two SDRs share, the more semantically similar are the two representations. 

The SDR explains how brains make semantic generalizations; it is an inherent property of the representation method. 

Another example of a unique capability of sparse representations is that a set of neurons can simultaneously activate multiple representations without confusion. 

It is as if a location in computer memory could hold not just one value but twenty simultaneous values and not get confused! 

We call this unique characteristic the “union property” and it is used throughout HTM theory for such things as making multiple predictions at the same time.
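The union property can be sketched concretely. Assuming a simple set-based SDR representation (invented for this illustration), OR-ing many SDRs together still lets us test whether any one of them is present:

```python
# Sketch of the SDR "union property": the OR of many sparse representations
# can be queried for membership of any one of them with little confusion.
import random

N, W = 2048, 40
rng = random.Random(1)

def random_sdr():
    return frozenset(rng.sample(range(N), W))

sdrs = [random_sdr() for _ in range(20)]
union = frozenset().union(*sdrs)        # OR of twenty representations

# Every stored SDR is fully contained in the union...
assert all(s <= union for s in sdrs)
# ...while a brand-new random SDR is almost certainly not.
novel = random_sdr()
print(len(union), "bits set; novel SDR in union:", novel <= union)
```

Because each SDR is sparse, even twenty OR-ed representations activate only a fraction of the bits, so a novel SDR essentially never matches the union by chance.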

●The use of sparse distributed representations is a key component of HTM theory. 

We believe that all truly intelligent systems must use sparse distributed representations. 

To become facile with HTM theory, you will need to develop an intuitive sense for the mathematical and representational properties of SDRs.

Biological Observation: The Inputs and Outputs of the Neocortex

●As mentioned earlier, the neocortex appeared recently in evolutionary time. 

The other parts of the brain existed before the neocortex appeared. 

You can think of a human brain as consisting of a reptile brain (the old stuff) with a neocortex (literally “new layer”) attached on top of it. 

The older parts of the brain still have the ability to sense the environment and to act. 

We humans still have a reptile inside of us. 

The neocortex is not a stand-alone system; it learns how to interact with and control the older brain areas to create novel and improved behaviors.

●There are two basic inputs to the neocortex. One is data from the senses. 

As a general rule, sensory data is processed first in the sensory organs such as the retina, cochlea, and sensory cells in the skin and joints. 

It then goes to older brain regions that further process it and control basic behaviors. 

Somewhere along this path the neurons split their axons in two and send one branch to the neocortex. 

The sensory input to the neocortex is literally a copy of the sensory data that the old brain is getting.

●The second input to the neocortex is a copy of motor commands being executed by the old parts of the brain.

For example, walking is partially controlled by neurons in the brain stem and spinal cord. 

These neurons also split their axons in two, one branch generates behavior in the old brain and the other goes to the neocortex. 

Another example is eye movements, which are controlled by an older brain structure called the superior colliculus. 

The axons of superior colliculus neurons send a copy of their activity to the neocortex, letting the neocortex know what movement is about to happen. 

This motor integration is a nearly universal property of the brain. 

The neocortex is told what behaviors the rest of the brain is generating as well as what the sensors are sensing. 

Imagine what would happen if the neocortex wasn’t informed that the body was moving in some way. 

If the neocortex didn’t know the eyes were about to move, and how, then a change of pattern on the optic nerve would be perceived as the world moving. 

The fact that our perception is stable while the eyes move tells us the neocortex is relying on knowledge of eye movements. 

When you touch, hear, or see something, the neocortex needs to distinguish changes in sensation caused by your own movement from changes caused by movements in the world. 

The majority of changes on your sensors are the result of your own movements. 

This “sensory-motor” integration is the foundation of how most learning occurs in the neocortex. 

The neocortex uses behavior to learn the structure of the world.

Figure 4 showing sensory & motor command inputs to the neocortex (block diagram) (placeholder)

●No matter what the sensory data represents - light, sound, touch or behavior - the patterns sent to the neocortex are constantly changing. 

The flowing nature of sensory data is perhaps most obvious with sound, but the eyes move several times a second, and to feel something we must move our fingers over objects and surfaces. 

Irrespective of sensory modality, input to the neocortex is like a movie, not a still image. 

The input patterns completely change typically several times a second. 

The changes in input are not something the neocortex has to work around, or ignore; instead, they are essential to how the neocortex works. 

The neocortex is memory of time-based patterns.

●The primary outputs of the neocortex come from neurons that generate behavior. 

However, the neocortex never controls muscles directly; instead the neocortex sends its axons to the old brain regions that actually generate behavior. 

Thus the neocortex tries to control the old brain regions that in turn control muscles. 

For example, consider the simple behavior of breathing. 

Most of the time breathing is controlled completely by the brain stem, but the neocortex can learn to control the brain stem and therefore exhibit some control of breathing when desired.

●A region of neocortex doesn’t “know” what its inputs represent or what its output might do. 

A region doesn’t even “know” where it is in the hierarchy of neocortical regions. 

A region accepts a stream of sensory data plus a stream of motor commands. 

From these inputs it learns about the changes in its inputs. 

The region will output a stream of motor commands, but it only knows how its output changes its input. 

The outputs of the neocortex are not pre-wired to do anything. 

The neocortex has to learn how to control behavior via associative linking.

HTM Principle: Sensory Encoders

●Every HTM system needs the equivalent of sensory organs. 

We call these “encoders.” 

An encoder takes some type of data – it could be a number, time, temperature, image, or GPS location – and turns it into a sparse distributed representation that can be digested by the HTM learning algorithms. 

Encoders are designed for specific data types, and often there are multiple ways an encoder can convert an input to an SDR, in the same way that there are variations of retinas in mammals. 

The HTM learning algorithms will work with any kind of sensory data as long as it is encoded into proper SDRs.
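As an illustration, here is a hypothetical minimal scalar encoder in the spirit described above. The parameters and contiguous-bucket scheme are invented for this sketch, not taken from any particular HTM implementation; the point is that nearby values share active bits, so their SDRs overlap semantically:

```python
# A hypothetical minimal scalar encoder: a number in [min_val, max_val] is
# mapped to an SDR by turning on `w` contiguous bits whose starting position
# tracks the value. Similar values -> overlapping SDRs -> shared semantics.

def encode_scalar(value, min_val=0.0, max_val=100.0, n=400, w=21):
    """Return the set of active bit indices for `value`."""
    value = max(min_val, min(max_val, value))       # clamp to the valid range
    buckets = n - w                                  # number of start positions
    start = int(round((value - min_val) / (max_val - min_val) * buckets))
    return set(range(start, start + w))

a = encode_scalar(50.0)
b = encode_scalar(52.0)
c = encode_scalar(90.0)
print("overlap(50, 52) =", len(a & b))   # large: similar values share bits
print("overlap(50, 90) =", len(a & c))   # none: dissimilar values
```

Real encoders for categories, dates, or GPS coordinates follow the same contract: semantically similar inputs must produce SDRs with high overlap.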

●One of the exciting aspects of machine intelligence based on HTM theory is that we can create encoders for data types that have no biological counterpart. 

For example, we have created an encoder that accepts GPS coordinates and converts them to SDRs. 

This encoder allows an HTM system to directly sense movement through space. 

The HTM system can then classify movements, make predictions of future locations, and detect anomalies in movements. 

The ability to use non-human senses offers a hint of where intelligent machines might go. 

Instead of intelligent machines just being better at what humans do, they will be applied to problems where humans are poorly equipped to sense and to act.

HTM Principle: HTM Systems are Embedded Within Sensory-motor Systems

●To create an intelligent system, the HTM learning algorithms need both sensory encoders and some type of behavioral framework. 

You might say that the HTM learning algorithms need a body. 

But the behaviors of the system do not need to be anything like the behaviors of a human or robot. 

Fundamentally, behavior is a means of moving a sensor to sample a different part of the world. 

For example, the behavior of an HTM system could be traversing links on the world-wide web or exploring files on a server.

●It is possible to create HTM systems without behavior. 

If the sensory data naturally changes over time, then an HTM system can learn the patterns in the data, classify the patterns, detect anomalies, and make predictions of future values. 

The early work on HTM theory focuses on these kinds of problems, without a behavioral component. 

Ultimately, to realize the full potential of the HTM theory, behavior needs to be incorporated fully.

HTM Principle: HTM Relies On Streaming Data and Sequence Memory

●The HTM learning algorithms are designed to work with sensor and motor data that is constantly changing.

Sensor input data may be changing naturally, such as metrics from a server or the sounds of someone speaking.

Alternatively, the input data may change because the sensor itself is moving, such as when the eyes move while looking at a still picture. 

At the heart of HTM theory is a learning algorithm called Temporal Memory, or TM. 

As its name implies, Temporal Memory is a memory of sequences: a memory of transitions in a data stream. 

TM is used in both sensory inference and motor generation. 

HTM theory postulates that every excitatory neuron in the neocortex is learning transitions of patterns and that the majority of synapses on every neuron are dedicated to learning these transitions. 

Temporal Memory is therefore the substrate upon which all neocortical functions are built. 

TM is probably the biggest difference between HTM theory and most other artificial neural network theories. 

HTM starts with the assumption that everything the neocortex does is based on memory and recall of sequences of patterns.
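The real Temporal Memory algorithm is high-order and distributed across cells, but its core idea, remembering transitions and predicting successors, can be sketched with a deliberately simplified first-order memory. The class below is illustrative only, not the TM algorithm itself:

```python
# A drastically simplified, first-order sketch of sequence memory: remember
# which inputs have followed which, and "predict" the possible successors
# of the current input.
from collections import defaultdict

class FirstOrderMemory:
    def __init__(self):
        self.transitions = defaultdict(set)   # input -> observed successors

    def learn(self, sequence):
        for prev, nxt in zip(sequence, sequence[1:]):
            self.transitions[prev].add(nxt)

    def predict(self, current):
        return self.transitions[current]

tm = FirstOrderMemory()
tm.learn(["A", "B", "C", "D"])
tm.learn(["X", "B", "C", "Y"])

print(tm.predict("B"))   # {'C'} -- unambiguous here
print(tm.predict("C"))   # {'D', 'Y'} -- first-order memory can't disambiguate
```

The second prediction shows why a first-order memory is not enough: after "C" it predicts both "D" and "Y" because it cannot remember how the sequence began. Temporal Memory's distributed cell states exist precisely to carry that high-order context.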

HTM Principle: On-line Learning

●HTM systems learn continuously, which is often referred to as “on-line learning”. 

With each change in the inputs the memory of the HTM system is updated. 

There are no batch learning data sets and no batch testing sets as is the norm for most machine learning algorithms. 

Sometimes people ask, “If there are no labeled training and test sets, how does the system know if it is working correctly and how can it correct its behavior?” 

HTM builds a predictive model of the world, which means that at every point in time the HTM-based system is predicting what it expects will happen next. 

The prediction is compared to what actually happens and forms the basis of learning. 

HTM systems try to minimize the error of their predictions.
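The predict-compare-update rhythm can be sketched as a loop. The transition-table "model" below is a stand-in invented for this sketch; the actual HTM algorithms are far richer but learn in the same continuous way, with no separate training and test phases:

```python
# Sketch of on-line learning: at every step, predict the next input,
# compare with what actually arrives, and update memory from the error.
from collections import defaultdict

transitions = defaultdict(set)          # trivial stand-in for a predictive model
prev = None
errors = 0

stream = list("abcabcabcabc")           # a stand-in for streaming sensor data
for x in stream:
    if prev is not None:
        predicted = transitions[prev]
        if x not in predicted:          # a prediction error...
            errors += 1
        transitions[prev].add(x)        # ...drives continuous learning
    prev = x

print("prediction errors:", errors)     # errors occur only on the first pass
```

After the first pass through "a b c" every transition is known, so prediction errors stop; if the stream's pattern changed, errors would reappear and the memory would adapt, which is the point of learning continuously.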

●Another advantage of continuous learning is that the system will constantly adapt if the patterns in the world change. 

For a biological organism this is essential to survival. 

HTM theory is built on the assumption that intelligent machines need to continuously learn as well. 

However, there will likely be applications where we don’t want a system to learn continuously, but these are the exceptions, not the norm.

Conclusion

●The biology of the neocortex informs HTM theory. 

In the following chapters we discuss details of HTM theory and continue to draw parallels between HTM and the neocortex. 

Like HTM theory, this book will evolve over time. 

At first release there are only a few chapters describing some of the HTM Principles in detail. 

With the addition of this documentation, we hope to inspire others to understand and use HTM theory now and in the future.

References

Crick, F. H. C. (1979) Thinking About the Brain. Scientific American, September 1979, pp. 229, 230.

Poirazi, P. & Mel, B. W. (2001) Impact of active dendrites and structural plasticity on the memory capacity of neural tissue. Neuron, 2001 doi:10.1016/S0896-6273(01)00252-5

Rakic, P. (2009). Evolution of the neocortex: Perspective from developmental biology. Nature Reviews Neuroscience.

Mountcastle, V. B. (1978) An Organizing Principle for Cerebral Function: The Unit Model and the Distributed System, in Gerald M. Edelman & Vernon B. Mountcastle, ed., 'The Mindful Brain', MIT Press, Cambridge, MA, pp. 7-50

Zingg, B. (2014) Neural networks of the mouse neocortex. Cell, 2014 Feb 27;156(5):1096-111. doi: 10.1016/j.cell.2014.02.023

Herculano-Houzel, S. (2012). The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost. Proceedings of the National Academy of Sciences of the United States of America, 109 (Suppl 1), 10661–10668.




Sparse Distributed Representations

●In this chapter we introduce Sparse Distributed Representations (SDRs), the fundamental form of information representation in the brain, and in HTM systems. 

We talk about several interesting and useful mathematical properties of SDRs and then discuss how SDRs are used in the brain.

What is a Sparse Distributed Representation?

●One of the most interesting challenges in AI is the problem of knowledge representation. 

Representing everyday facts and relationships in a form that computers can work with has proven to be difficult with traditional computer science methods. 

The basic problem is that our knowledge of the world is not divided into discrete facts with well-defined relationships. 

Almost everything we know has exceptions, and the relationships between concepts are too numerous and ill-defined to map onto traditional computer data structures.

Brains do not have this problem. 

They represent information using a method called Sparse Distributed Representations, or SDRs. 

SDRs and their mathematical properties are essential for biological intelligence.

Everything the brain does and every principle described in this book is based on SDRs. 

SDRs are the language of the brain.

The flexibility and creativity of human intelligence is inseparable from this representation method.

Therefore, if we want intelligent machines to be similarly flexible and creative, they need to be based on the same representation method, SDRs.

An SDR consists of thousands of bits where at any point in time a small percentage of the bits are 1’s and the rest are 0’s. 

The bits in an SDR correspond to neurons in the brain, a 1 being a relatively active neuron and a 0 being a relatively inactive neuron. 

The most important property of SDRs is that each bit has meaning.

Therefore, the set of active bits in any particular representation encodes the set of semantic attributes of what is being represented. 

The bits are not labeled (that is to say, no one assigns meanings to the bits), but rather, the semantic meanings of bits are learned. 

If two SDRs have active bits in the same locations, they share the semantic attributes represented by those bits. 

By determining the overlap between two SDRs (the equivalent bits that are 1 in both SDRs) we can immediately see how two representations are semantically similar and how they are semantically different. 

Because of this semantic overlap property, systems based on SDRs automatically generalize based on semantic similarity.

HTM theory defines how to create, store, and recall SDRs and sequences of SDRs. 

SDRs are not moved around in memory, like data in computers. 

Instead the set of active neurons, within a fixed population of neurons, changes over time. 

At one moment a set of neurons represents one thing; the next moment it represents something else. 

Within one set of neurons, an SDR at one point in time can associatively link to the next occurring SDR. 

In this way, sequences of SDRs are learned. 

Associative linking also occurs between different populations of cells (layer to layer or region to region). 

The meanings of the neuron encodings in one region are different than the meanings of neuron encodings in another region. 

In this way, an SDR in one modality, such as a sound, can associatively invoke an SDR in another modality, such as vision.

Any type of concept can be encoded in an SDR, including different types of sensory data, words, locations, and behaviors.

This is why the neocortex is a universal learning machine. 

The individual regions of the neocortex operate on SDRs without “knowing” what the SDRs represent in the real world. 

HTM systems work the same way. 

As long as the inputs are in a proper SDR format, the HTM algorithms will work. 

In an HTM-based system, knowledge is inherent in the data, not in the algorithms.

To better understand the properties of SDRs it can be helpful to think about how information is typically represented in computers and the relative merits of SDRs versus the representation scheme used by computers. 

In computers, we represent information with bytes and words. 

For example, to represent information about a medical patient, a computer program might use one byte to store the patient’s age and another byte to store the patient’s gender. 

Data structures such as lists and trees are used to organize pieces of information that are related. 

This type of representation works well when the information we need to represent is well defined and limited in extent. 

However, AI researchers discovered that to make a computer intelligent, it needs a huge amount of knowledge, the structure of which is not well defined.

For example, what if we want our intelligent computer to know about cars? 

Think of all the things you know about cars. 

You know what they do, how to drive them, how to get in and out of them, how to clean them, ways they can fail, what the different controls do, what is found under the hood, how to change tires, etc. 

We know the shapes of cars and the sounds they make. 

If you just think about tires, you might recall different types of tires, different brands of tires, how they wear unevenly, the best way to rotate them, etc. 

The list of all the things you know about cars goes on and on.

Each piece of knowledge leads to other pieces of knowledge in an ever-expanding web of associations. 

For example, cars have doors, but other objects have doors too, such as houses, planes, mailboxes, and elevators.

We intuitively know what is similar and different about all these doors and we can make predictions about new types of doors we have never seen before based on previous experience. 

We (our brains) find it easy to recall a vast number of facts and relationships via association. 

But when AI scientists try to encode this type of knowledge into a computer, they find it difficult.

In computers information is represented using words of 8, 32, or 64 bits. 

Every combination of 1’s and 0’s is used, from all 1’s to all 0’s, which is sometimes called a “dense” representation. 

An example of a dense representation is the ASCII code for letters of the alphabet. 

In ASCII the letter “m” is represented by:  01101101

●Notice it is the combination of all eight bits that encodes “m”, and the individual bits in this representation don’t mean anything on their own. 

You can’t say what the third bit means; the combination of all eight bits is required. 

Notice also that dense representations are brittle. 

For example, if you change just one bit in the ASCII code for “m” as follows:  01100101

you get the representation for an entirely different letter, “e”. 

One wrong bit and the meaning changes completely.
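This brittleness is easy to demonstrate with the exact bit patterns above:

```python
# The brittleness of dense codes, concretely: flipping a single bit in the
# ASCII code for "m" yields a completely unrelated letter.
m = 0b01101101              # ASCII "m"
print(chr(m))               # -> m

e = m ^ 0b00001000          # flip one bit: 01101101 -> 01100101
print(chr(e))               # -> e: one wrong bit, entirely different meaning
```

Nothing about the bit that changed "means" anything by itself; in a dense code, meaning lives only in the whole pattern.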

There is nothing about 01101101 that tells you what it represents or what attributes it has. 

It is a purely arbitrary and abstract representation. 

The meaning of a dense representation must be stored elsewhere.

In contrast, as mentioned earlier, SDRs have thousands of bits. 

At every point in time a small percentage of the bits are 1’s and the rest are 0’s. 

An SDR may have 2,000 bits with 2% (or 40) being 1 and the rest 0, as visualized in Figure 1.




      Figure 1: An SDR of 2000 bits, where only a few are ON (ones).


In an SDR, each bit has meaning. 

For example if we want to represent letters of the alphabet using SDRs, there may be bits representing whether the letter is a consonant or a vowel, bits representing how the letter sounds, bits representing where in the alphabet the letter appears, bits representing how the letter is drawn (i.e. open or closed shape, ascenders, descenders), etc. 

To represent a particular letter, we pick the 40 attributes that best describe that letter and make those bits 1. 

In practice, the meaning of each bit is learned rather than assigned; 

we are using letters and their attributes for illustration of the principles.

With SDRs the meaning of the thing being represented is encoded in the set of active bits. 

Therefore, if two different representations have a 1 in the same location we can be certain that the two representations share that attribute. 
それ故、もし、二つの異なる表現が同じ位置に1 を持っていると、二つの表現は、同じ属性を共有していることがわかります。

Because the representations are sparse, two representations will not share an attribute by chance; a shared bit/attribute is always meaningful. 
表現は、疎なので、二つの表現が偶然に属性を共有することはありません; 共有されるビット/属性は、常に有意味です。

As shown in Figure 2, simply comparing SDRs in this way tells us how any two objects are semantically similar and how they are different.
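
This comparison can be sketched in a few lines, representing each SDR as the set of its ON-bit positions (the indices here are arbitrary, chosen only for illustration):

```python
# SDRs as sets of ON-bit positions; real SDRs would have ~2% of 2000 bits ON.
sdr_a = {3, 70, 142, 805, 1999}
sdr_b = {3, 70, 911, 1203, 1999}   # shares three ON bits with sdr_a
sdr_c = {10, 255, 256, 900, 1500}  # shares none

def overlap(x, y):
    """Number of ON bits two SDRs have in common."""
    return len(x & y)

assert overlap(sdr_a, sdr_b) == 3   # shared bits imply shared attributes
assert overlap(sdr_a, sdr_c) == 0   # no shared semantics
```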






Figure 2: SDR A and SDR B have matching 1 bits and therefore have shared semantic meaning; SDR C has no matching bits or shared semantic meaning.


●There are some surprising mathematical properties of SDRs that don’t manifest in traditional computer data structures. 

For example, to store an SDR in memory, we don’t have to store all the bits as you would with a dense representation. 

We only have to store the locations of the 1-bits, and surprisingly, we only have to store the locations of a small number of the 1-bits. 

Say we have an SDR with 10,000 bits, of which 2%, or 200, are 1’s.
10000ビットのSDRがあるとしましょう。そのうち、2%、つまり200個が値 1 のビットです。

To store this SDR, we remember the locations of the 200 1-bits. 

To compare a new SDR to the stored SDR, we look to see if there is a 1-bit in each of the 200 locations of the new SDR. 

If there is, then the new SDR matches the stored SDR. 

But now imagine we store the location of just 10 of the 1-bits randomly chosen from the original 200. 

To see if a new SDR matches the stored SDR, we look for 1-bits in the 10 locations. 

You might be thinking, “But wait, there are many patterns that would match the 10 bits yet be different in the other bits. 

We could easily have a false positive match!” This is true. 
偽陽性マッチ (false positive match) が簡単に起こり得るのでは!」と。  その通りです。

However, the chance that a randomly chosen SDR would share the same 10 bits is extremely low; it won’t happen by chance, so storing ten bits is sufficient. 

However, if two SDRs did have ten 1-bits in the same location but differed in the other bits then the two SDRs are semantically similar. 

Treating them as the same is a useful form of generalization. 

We discuss this interesting property, and derive the math behind it, later in this chapter.
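
A sketch of this storage scheme, with the sizes taken from the example above:

```python
import random

# Remember only 10 of the 200 ON-bit locations of a 10,000-bit SDR.
random.seed(42)
n, w = 10_000, 200

stored_sdr = set(random.sample(range(n), w))              # the 200 ON bits
fingerprint = set(random.sample(sorted(stored_sdr), 10))  # keep just 10 of them

def matches(candidate_on_bits):
    """True if all 10 remembered locations are ON in the candidate."""
    return fingerprint <= candidate_on_bits

assert matches(stored_sdr)                    # the original always matches
random_sdr = set(random.sample(range(n), w))  # an unrelated random SDR
assert not matches(random_sdr)                # almost surely fails to match
```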

●Another surprising and useful property of SDRs is the union property, which is illustrated in Figure 3. 

We can take a set of SDRs and form a new SDR, which is the union of the original set. 

To form a union, we simply OR the SDRs together. 

The resulting union has the same number of bits as each of the original SDRs, but is less sparse. 

Forming a union is a one-way operation, which means that given a union SDR you can’t say what SDRs were used to form the union. 

However, you can take a new SDR, and by comparing it to the union, determine if it is a member of the set of SDRs used to form the union. 

The chance of incorrectly determining membership in the union is very low due to the sparseness of the SDRs.

Figure 3: A union of 10 SDRs is formed by taking the mathematical OR of the bits. New SDR membership is checked by confirming 1 bits match. Note the union SDR is less sparse than the input SDRs.
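
A minimal sketch of the union operation and membership test (the toy SDRs here are far smaller and denser than real ones):

```python
# OR a set of SDRs together, then test membership by bit coverage.
sdrs = [{1, 40, 300}, {7, 40, 512}, {9, 100, 800}]  # SDRs as sets of ON bits

union = set()
for sdr in sdrs:
    union |= sdr          # Boolean OR of all the vectors

def is_member(candidate):
    """True if every ON bit of the candidate is ON in the union."""
    return candidate <= union

assert is_member({7, 40, 512})        # a stored SDR always matches
assert not is_member({2, 41, 301})    # an unrelated SDR almost never does
```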

●These properties, and a few others, are incredibly useful in practice and get to the core of what makes brains different than computers. 

The following sections describe these properties and operations of SDRs in more detail. 

At the end of the chapter we discuss some of the ways SDRs are used in the brain and in HTMs.

Mathematical Properties of SDRs  SDRの数学特性

●In this section, we discuss the following mathematical properties of SDRs with a focus on deriving fundamental scaling laws and error bounds:

• Capacity of SDRs and the probability of mismatches SDRのキャパシティと、ミスマッチの確率

• Robustness of SDRs and the probability of error with noise SDRの頑強性とノイズでの誤差の確率

• Reliable classification of a list of SDR vectors  SDRベクトルのリストの信頼できる分類

• Unions of SDRs SDRの和集合

• Robustness of unions in the presence of noise ノイズ環境下の和集合の頑強性

●These properties and their associated operations demonstrate the usefulness of SDRs as a memory space, which we illustrate in examples relevant to HTMs. 

In our analysis we lean on the intuitions provided by Kanerva (Kanerva, 1988 & 1997) as well as some of the techniques used for analyzing Bloom filters (Bloom, 1970). 

We start each property discussion with a summary description, and then go into the derivation of the mathematics behind the property. 

But first, here are some definitions of terms and notations we use in the following discussion and throughout the text. 

A more comprehensive list of terms can be found in the Glossary at the end of this book.

Mathematical Definitions and Notation 数学的定義と表記法

Binary vectors: For the purposes of this discussion, we consider SDRs as binary vectors

    using the notation x = [b0, …, bn-1] for an SDR x. 

    The values of each element are “0” or “1”, for OFF and ON, respectively.

Vector size: In an SDR x = [b0, …, bn-1], n denotes the size of the vector. 

    Equivalently, we say n represents the total number of positions in the vector, or the total number of bits.

Sparsity: At any point in time, a fraction of the n bits in vector x will be ON and the rest will be OFF. 

    Let s denote the percent of ON bits. Generally in sparse representations, s will be substantially less than 50%.

Vector cardinality: Let w denote the vector cardinality, which we define as the total number of ON bits in the vector. 
ベクトル濃度: w  ONビットの総数

    If the percent of ON bits in vector x is s, then wx = s × n = |x|

Overlap: We determine the similarity between two SDRs using an overlap score. 

  The overlap score is simply the number of ON bits in common, or in the same locations, between the vectors. 

  If x and y are two SDRs, then the overlap can be computed as the dot product: overlap (x, y) ≡ x ∙ y

●Notice we do not use a typical distance metric, such as Hamming or Euclidean, to quantify similarity. 

With overlap we can derive some useful properties discussed later, which would not hold with these distance metrics.

Matching: We determine a match between two SDRs by checking if the two encodings overlap sufficiently. 

  For two SDRs x and y:  match(x, y | θ) ≡ overlap(x, y) ≥ θ

If x and y have the same cardinality w, we can determine an exact match by setting θ = w. 

In this case, if θ is less than w, the overlap score will indicate an inexact match.

●Consider an example of two SDR vectors:

     x = [0100000000000000000100000000000110000000]
     y = [1000000000000000000100000000000110000000]

Both vectors have size n = 40, s = 0.1, and w = 4. 

The overlap between x and y is 3; i.e. there are three ON bits in common positions of both vectors. 

Thus the two vectors match when the threshold is set at θ = 3, but they are not an exact match. 

Note that a threshold larger than either vector's cardinality – i.e., θ > w – implies a match is not possible.
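
The overlap and match computations for this example can be sketched directly:

```python
# The example vectors from the text, each with w = 4 ON bits.
x = "0100000000000000000100000000000110000000"
y = "1000000000000000000100000000000110000000"

def on_bits(v):
    """Positions of the ON bits in a binary string."""
    return {i for i, bit in enumerate(v) if bit == "1"}

overlap = len(on_bits(x) & on_bits(y))
assert overlap == 3        # three ON bits in common positions
assert overlap >= 3        # x and y match at threshold θ = 3...
assert overlap < 4         # ...but it is not an exact match (θ = w = 4)
```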


Capacity of SDRs and the Probability of Mismatches

●To be useful in practice, SDRs should have a large capacity.  

Given a vector with fixed size n and cardinality w, the number of unique SDR encodings this vector can represent is the combination n choose w:

                                nCw = n! / w!(n-w)!                      (1)

●Note this is significantly smaller than the number of encodings possible with dense representations in the same size vector, which is 2^n. 

This implies a potential loss of capacity, as the number of possible input patterns is much greater than the number of possible representations in the SDR encoding. 

Although SDRs have a much smaller capacity than dense encodings, in practice this difference is meaningless. 

With typical values such as n = 2048 and w = 40, the SDR representation space is astronomically large at 2.37×10^84 encodings; the estimated number of atoms in the observable universe is ~10^80.

●For SDRs to be useful in representing information we need to be able to reliably distinguish between encodings; i.e. SDRs should be distinct such that we don’t confuse the encoded information. 

It is valuable then to understand the probability with which two random SDRs would be identical. 

Given two random SDRs with the same parameters, x and y, the probability they are identical is

                         P (x = y) = 1 / nCw                               (2)

●Consider an example with n=1024 and w=2.

There are 523,776 possible encodings and the probability two random encodings are identical is rather high, i.e. one in 523,776. 

This probability decreases extremely rapidly as w increases. 

With w = 4, the probability dives to less than one in 45 billion. 

For n = 2048 and w = 40, typical HTM values, the probability two random encodings are identical is essentially zero. 

Please note (2) reflects the false positive probability under exact matching conditions, not inexact matching used in most HTM models; this is discussed later in the chapter.

●The equations above show that SDRs, with sufficiently large sizes and densities, have an impressive capacity for unique encodings, and there is almost no chance of different representations ending up with the same encoding.
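
These capacity figures can be checked directly against Eq. (1); a quick sketch using Python’s exact binomial coefficient:

```python
from math import comb   # comb(n, w) is the binomial coefficient nCw of Eq. (1)

assert comb(1024, 2) == 523_776           # the small example: n = 1024, w = 2
assert comb(1024, 4) > 45_000_000_000     # w = 4: over 45 billion encodings

# Typical HTM values n = 2048, w = 40: ~2.37 x 10^84 unique encodings.
assert 10**84 < comb(2048, 40) < 10**85
```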


Overlap Set

●We introduce the notion of an overlap set to help analyze the effects of matching under varying conditions. 

Let x be an SDR encoding of size n with wx bits on. 

The overlap set of x with respect to b is Ωx (n, w, b), defined as the set of vectors of size n with w bits on, that have exactly b bits of overlap with x. 

The number of such vectors is |Ωx(n, w, b)|, where |∙| denotes the number of elements in a set. 

Assuming b ≤ wx and b ≤ w,

                                |Ωx(n, w, b)| = (wx)Cb × (n−wx)C(w−b)                 (3)

●The first term in the product of (3) is the number of subsets of x with b bits ON, and the second term is the number of other patterns containing n − wx bits, of which w − b bits are ON.

●The overlap set is instructive as to how we can compare SDRs reliably; i.e. not get false negatives or false positives, even in the presence of significant noise, where noise implies random fluctuations of ON/OFF bits. 

In the following sections we explore the robustness of SDRs in the presence of noise using two different concepts, inexact matching and subsampling.


Inexact Matching

●If we require two SDRs to have an exact match (i.e. θ = w) then even a single bit of noise in either of the SDRs’ ON bits would generate a false negative where we fail to recognize matching SDRs. 

In general we would like the system to be tolerant to changes or noise in the input. 

That is, rarely would we require exact matches, where θ = w. 

Lowering θ allows us to use inexact matching, decreasing the sensitivity and increasing the overall noise robustness of the system. 

For example, consider SDR vectors x and x', where x' is x corrupted by random noise.

With w = 40 and θ lowered to 20, the noise can flip 50% of the ON bits (ON to OFF and vice-versa) and x will still match x'.

●Yet increasing the robustness comes with the cost of more false positives. 

That is, decreasing θ also increases the probability of a false match with another random vector. 

There is an inherent tradeoff in these parameters, as we would like the chance of a false match to be as low as possible while retaining robustness.


Figure 4: This figure illustrates the conceptual difference of lowering the match threshold θ. 

The large grey areas represent the space of all possible SDRs, where the elements x1, x2, … xm are individual SDRs within the space. 

In space A we see the exact matching scenario, where SDRs are single points in the space. 

Notice x1, x2, … xm in space B are now larger circles within the space, implying more SDRs will match to them in B than in A. 

That is, the ratio of white to grey becomes much larger as you decrease θ. 

Spaces A and B are the same size because they have a fixed n. If we increase n—i.e. increase the space of possible SDRs—the ratio of white to grey becomes smaller, as shown in space C. 

The transitions from A to B to C illustrate the tradeoff between the parameters θ and n: decreasing θ gives you more robustness but also increases your susceptibility to false matches, while increasing n mitigates this effect.

●With appropriate parameter values the SDRs can have a large amount of noise robustness with a very small chance of false positives. 

To arrive at the desired parameter values we need to calculate the false positive likelihood as a function of the matching threshold.

Given an SDR encoding x and another random SDR y, both with size n and cardinality w, what is the probability of a false match, i.e. the chance that overlap(x, y) ≥ θ? 

A match is defined as an overlap of θ bits or greater, up to w. 

With nCw total patterns, the probability of a false positive is:

                                fpx(θ) = Σ (b = θ to w) |Ωx(n, w, b)| / nCw                 (4)
●What happens when θ = w, or an exact match? The numerator in (4) evaluates to 1, and the equation reduces to (2).

●To gain a better intuition for (4), again suppose vector parameters n = 1024 and w = 4. 

If the threshold is θ = 2, corresponding to 50% noise, then the probability of an error is one in 14,587. 

That is, with 50% noise there is a significant chance of false matches. 

If w and θ are increased to 20 and 10, respectively, the probability of a false match decreases drastically to less than one in 10^13!

Thus, with a relatively modest increase in w and θ, and holding n fixed, SDRs can achieve essentially perfect robustness with up to 50% noise. 

Figure 5 illustrates this for HTM values used in practice.


Figure 5: This plot illustrates the behavior of Eq. 4. 

The three solid curves show the rapid drop in error rates (i.e. probability of false positives) as you increase the SDR size n. 

Each curve represents a different number of ON bits w, and a constant 50% match threshold θ. 

For all three curves the error drops faster than exponentially as n increases, becoming essentially 0 once n > 2000. The dashed line shows the error rate when half of the bits in the SDR are ON. 

Notice this line maintains a relatively high error rate, implying it is not possible to get robust recognition with a non-sparse representation; both sparsity and high dimensionality are required to achieve low error rates. 

Use this plot interactively online: placeholder for link
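
Eq. (4) is straightforward to evaluate numerically. The sketch below reproduces the worked example; the function name is ours, not from the text:

```python
from math import comb

def false_positive_rate(n, w, theta):
    """Probability a random SDR overlaps a fixed SDR in theta or more bits (Eq. 4)."""
    matching = sum(comb(w, b) * comb(n - w, w - b) for b in range(theta, w + 1))
    return matching / comb(n, w)

# n = 1024, w = 4, theta = 2: about one in 14,587.
assert round(1 / false_positive_rate(1024, 4, 2)) == 14587

# Raising w and theta to 20 and 10 drives the error below one in 10^13.
assert 1 / false_positive_rate(1024, 20, 10) > 1e13
```

Note that the sum is dominated by its first term, b = θ; each successive term is orders of magnitude smaller.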



Subsampling

●An interesting property of SDRs is the ability to reliably compare against a subsampled version of a vector. 

That is, recognizing a large distributed pattern by matching a small subset of the active bits in the large pattern. 

Let x be an SDR and let x' be a subsampled version of x, such that wx' ≤ wx.

It’s self-evident that the subsampled vector x' will always match x, provided θ ≤ wx'. 

However, as you increase the subsampling the chance of a false positive increases.

●What is the probability of a false match between x' and a random SDR y? 

Here the overlap set is computed with respect to the subsample x', rather than the full vector x. 

If b ≤ wx' and wx' ≤ wy then the number of patterns with exactly b bits of overlap with x' is:

                                |Ωx'(n, wy, b)| = (wx')Cb × (n−wx')C(wy−b)                 (6)
●Given a threshold θ ≤ wx', the chance of a false positive is then:

                                fpx'(θ) = Σ (b = θ to wx') |Ωx'(n, wy, b)| / nCwy                 (7)
●Notice (6) and (7) differ from (3) and (4), respectively, only in the vectors being compared. 

That is, subsampling is simply a variant of the inexact matching properties discussed above.

●For instance, suppose n = 1024 and wy = 8. 

Subsampling half the bits in x and setting the threshold to two (i.e. wx' = 4, θ = 2), we find the probability of an error is one in 3,142. 

However, increasing wy to 20 while keeping the relevant parameter ratios fixed (i.e. wx' = 10, θ = 5), the chance of a false positive drops precipitously to one in 2.5 million. 

Increasing to more practical HTM parameter values of n = 2048, wy = 40, wx' = 20, and θ = 10, the probability of a false positive plummets to better than one in 10^12.

This is remarkable considering that the threshold is about 25% of the original number of ON bits. 

Figure 6 illustrates this reliability in subsampling for varying HTM parameters.

Figure 6: This plot illustrates the behavior of Eq. 6, where we can see the error rates (i.e. probability of false positives) as a function of the size (i.e. number of bits) of the subsampling. 

The three solid curves represent a few dimensionalities and sparsities, showing an exponential improvement in error rates as the number of subsampled bits increases. 

With sufficiently high dimensionality and sparse activity, subsampling values between 15 and 25 can lead to very low error rates. 

Conversely, the dashed line represents the error rates for a relatively dense representation (25% total ON bits); the error remains high, despite the high dimensionality. Use this plot interactively online: placeholder for link
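
The same kind of calculation applies to Eqs. (6) and (7); a small helper of our own naming reproduces the numbers quoted above:

```python
from math import comb

def subsample_fp_rate(n, w_y, w_sub, theta):
    """Probability a random SDR y falsely matches a subsample of x (Eq. 7)."""
    matching = sum(comb(w_sub, b) * comb(n - w_sub, w_y - b)
                   for b in range(theta, w_sub + 1))
    return matching / comb(n, w_y)

# n = 1024, w_y = 8, half of x subsampled (w_sub = 4), theta = 2: one in 3,142.
assert round(1 / subsample_fp_rate(1024, 8, 4, 2)) == 3142

# w_y = 20, w_sub = 10, theta = 5: roughly one in 2.5 million.
assert 2e6 < 1 / subsample_fp_rate(1024, 20, 10, 5) < 3.5e6
```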

●With practical parameters, it is clear SDRs offer minimal probabilities of false positives. 

The properties of subsampling and inexact matching allow us to use SDRs as reliable mechanisms for classification, discussed in the next section.


Reliable Classification of a List of SDR Vectors

●A useful operation with SDRs is the ability to classify vectors, where we can reliably tell if an SDR belongs to a set of similar SDRs. 

We consider a form of classification similar to nearest neighbor classification. 

Let X be a set of M vectors, X = (x1,...,xM), where each vector xi is an SDR. 

Given any random SDR y we classify it as belonging to this set as follows:

                                y ∈ X  ⟺  ∃ xi ∈ X such that match(xi, y | θ)                 (8)
●How reliably can we classify a vector corrupted by noise? 

More specifically, if we introduce noise in a vector xi by toggling ON/OFF a random selection of t of the n bits, what is the likelihood of a false positive classification?

Assuming t ≤ w − θ there are no false negatives in this scheme, only false positives. 

Thus the question of interest becomes what is the probability the classification of a random vector y is a false positive? 

Since all vectors in X are unique with respect to matching, the probability of a false positive is given by:

                                fpX(θ) = 1 − (1 − fpx(θ))^M                 (9)

where fpx(θ) is the single-vector false positive probability given in (4).
●The false positive probability of an individual overlap is extremely small, so it is more practical to use the following bound:

                                fpX(θ) ≤ M × fpx(θ)                 (10)
●Consider for example n = 64 and w = 3 for all vectors. 

If θ = 2, 10 vectors can be stored in your list, and the probability of false positives is about one in 22. 

Increasing w to 12 and θ to eight, maintaining the ratio θ/w = 2/3, the chance of a false positive drops to about one in 2363. 

Now increase the parameters to more realistic values: n = 1024, w = 21, and θ = 14 (i.e. two-thirds of w). 

In this case the chance of a false positive with 10 vectors plummets to about one in 10^20.

In fact, with these parameters the false positive rate for storing a billion vectors is better than one in 10^12!
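
A sketch of the classification error calculation, combining the single-vector rate of Eq. (4) with the union bound described above (function names are ours):

```python
from math import comb

def single_fp(n, w, theta):
    """False positive probability against one stored SDR (Eq. 4)."""
    return sum(comb(w, b) * comb(n - w, w - b)
               for b in range(theta, w + 1)) / comb(n, w)

M = 10  # number of stored vectors

# Small example: n = 64, w = 3, theta = 2 -> about one in 22.
assert 22 <= round(1 / (M * single_fp(64, 3, 2))) <= 23

# Realistic values: n = 1024, w = 21, theta = 14 -> about one in 10^20.
assert 1 / (M * single_fp(1024, 21, 14)) > 1e19
```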

●This result illustrates a remarkable property of SDRs. 

Suppose a large set of patterns is encoded in SDRs, and stored in a list. 

A massive number of these patterns can be retrieved almost perfectly, even in the presence of a large amount of noise. 

The main requirement is that the SDR parameters n, w, and θ be sufficiently large. 

As illustrated in the above example, with low values such as n = 64 and w = 3 your SDRs are unable to take advantage of these properties.


Unions of SDRs

●One of the most fascinating properties of SDRs is the ability to reliably store a set of patterns in a single fixed representation by taking the OR of all the vectors. We call this the union property. 

To store a set of M vectors, the union mechanism is simply the Boolean OR of all the vectors, resulting in a new vector X. 

To determine if a new SDR y is a member of the set, we simply compute the match(X, y).

Figure 7 (top) Taking the OR of a set of M SDR vectors results in the union vector X. 

With each individual vector having a total of 2% ON bits, and M = 10, it follows that the percentage of ON bits in X is at most 20%. The logic is straightforward: if there is no overlap within the set of vectors, each ON bit will correspond to its own ON bit in the union vector, summing the ON bit percentages. 

With overlap, however, ON bits will be shared in the union vector, resulting in a lower percentage of ON bits.

(bottom) Computing the match(X, y) reveals if y is a member of the union set X – i.e. if the ON positions in y are ON in X as well.


●The advantage of the union property is that a fixed-size SDR vector can store a dynamic set of elements. 

As such, a fixed set of cells and connections can operate on a dynamic list. It also provides an alternate way to do classification. 

In HTMs, unions of SDRs are used extensively to make temporal predictions, for temporal pooling, to represent invariances, and to create an effective hierarchy. 

However, there are limits on the number of vectors that can be reliably stored in a set. 

That is, the union property has the downside of increased potential for false positives.

●How reliable is the union property? 

There is no risk of false negatives; if a given vector is in the set, its bits will all be ON regardless of the other patterns, and the overlap will be perfect. 

However, the union property increases the likelihood of false positives. 

With the number of vectors, M, sufficiently large, the union set will become saturated with ON bits, and almost any other random vector will return a false positive match. 

It’s essential to understand this relationship so we can stay within the limits of the union property.

●Let us first calculate the probability of a false positive assuming exact matches, i.e. θ = w. 

In this case, a false positive with a new random pattern y occurs if all of the bits in y overlap with X. 

When M = 1, the probability any given bit is OFF is given by 1 − s, where s = w/n. 

As M grows, this probability is given by:

                                p0 = (1 − s)^M                 (11)
●After M union operations, the probability a given bit in X is ON is 1 − p0.

The probability of a false positive, i.e. all w bits in y are ON, is therefore:

                                fp = (1 − p0)^w                 (12)
●The technique used to arrive at (12) is similar to the derivation of the false positive rate for Bloom filters (Bloom, 1970; Broder and Mitzenmacher, 2004). 

The slight difference is that in Bloom filters each bit is chosen independently, i.e. with replacement. 

As such, a given vector could contain less than w ON bits. In this analysis we guarantee that there are exactly w bits ON in each vector.

●The above derivation assures us, under certain conditions, we can store SDRs as unions without much worry for false positives. 

For instance, consider SDR parameters n = 1024 and w = 2. 

Storing M = 20 vectors, the chance of a false positive is about one in 680. 

However, if w is increased to 20, the chance drops dramatically to about one in 5.5 billion. 

This is a remarkable feature of the union property. 

In fact, increasing M to 40, the chance of an error is still better than 10^-5.
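
These union examples follow directly from Eqs. (11) and (12); a minimal sketch:

```python
def union_fp(n, w, M):
    """Probability a random SDR falsely matches a union of M SDRs (Eqs. 11-12)."""
    p0 = (1 - w / n) ** M      # chance a given bit is still OFF after M unions
    return (1 - p0) ** w       # chance all w bits of a random SDR are ON

# n = 1024, w = 20, M = 20: about one in 5.5 billion.
assert 5e9 < 1 / union_fp(1024, 20, 20) < 6e9

# Even at M = 40 the error remains better than 10^-5.
assert union_fp(1024, 20, 40) < 1e-5
```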

●To gain an intuitive sense of the union property, the expected number of ON bits in the union vector is n (1 − p0). 

This grows slower than linearly, as shown in (12); additional union operations contribute fewer and fewer ON bits to the resulting SDR. 

Consider for instance M = 80, where about 20% of the bits in the union are still 0. 

When we consider an additional vector with 40 ON bits, there is a reasonable chance it will have at least one bit among this OFF 20%, and hence it won’t be a false positive. 

That is, only vectors with all of their w bits amongst the 80% ON are false positives. 

As we increase n and w, the number of patterns that can OR together reliably increases substantially.

As illustrated in Figure 8, if n and w are large enough, the probability of false positives remains acceptably low, even as M increases.

Figure 8: This plot shows the classification error rates (i.e. probability of false positives when matching to a union set of SDRs) of Eq. 12. The three lines show the calculated values for a few SDR dimensionalities, where w = 200. 

We see the error increases monotonically with the number of patterns stored. 

More importantly the plot shows that the size of the SDR is a critical factor: a small number of bits (1000) leads to relatively high error rates, while larger vectors (10,000+) are much more robust. Use this plot interactively online: placeholder for link



Robustness of Unions in the Presence of Noise

●As mentioned above, the expected number of ON bits in the union vector X̃ is w̃X = n(1 − p0), where we use the tilde notation to represent a union vector. 

Assuming n ≥ w̃X ≥ w, we can calculate the expected size of the overlap set:

                                E[|ΩX̃(n, w, b)|] = (w̃X)Cb × (n−w̃X)C(w−b)                 (13)
●For a match we need an overlap of θ or greater bits (up to w). 

The probability of a false match is therefore:

                                fp ≈ Σ (b = θ to w) E[|ΩX̃(n, w, b)|] / nCw                 (14)
●Notice (14) is an approximation of the error, as we’re working with the expected number of ON bits in X.

●As you would expect, the chance of error increases as the threshold is lowered, but the consequences of this tradeoff can be mitigated by increasing n. 

Suppose n = 1024 and w = 20. When storing M = 20 vectors, the chance of a false positive when using perfect matches is about one in 5 billion. Using a threshold of 19 increases the false positive rate to about one in 123 million. 

When θ = 18, the chance increases to one in 4 million. 

However, if you increase n to 2048 with θ = 18, the false positive rate drops dramatically to one in 223 billion! 

This example illustrates the union property’s robustness to noise, and is yet another example of our larger theme: small linear changes in SDR numbers can cause super-exponential improvements in the error rates.
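
Eq. (14) can be approximated numerically by rounding the expected union cardinality to an integer; treat the result as an estimate, not an exact rate:

```python
from math import comb

def union_fp_inexact(n, w, M, theta):
    """Approximate false positive rate for inexact matching against a union (Eq. 14)."""
    p0 = (1 - w / n) ** M
    w_union = round(n * (1 - p0))     # expected number of ON bits in the union
    return sum(comb(w_union, b) * comb(n - w_union, w - b)
               for b in range(theta, w + 1)) / comb(n, w)

# n = 1024, w = 20, M = 20, theta = 19: on the order of one in 10^8.
assert 5e7 < 1 / union_fp_inexact(1024, 20, 20, 19) < 5e8
```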


Computational Efficiency

●Although SDR vectors are large, all the operations we've discussed run in time linear with the number of ON bits. 

That is, the computations are dependent on the number of ON bits, w, not on the size of the vector, n. 

There are no loops or optimization processes required, enabling fast calculations with SDR vectors. 

This would not be the case, however, with more standard distance metrics, which are typically O(n). 

For HTM systems this is important since in practice w ≪ n.

●In the following section we discuss how the brain takes advantage of the mathematical properties of SDRs, and how this is manifested in HTM systems.

SDRs in the Brain and in HTM Systems

●How do sparse distributed representations and their mathematical principles relate to information storage and retrieval in the brain? 

In this section we will list some of the ways SDRs are used in the brain and the corresponding components of HTM theory.

Neuron Activations are SDRs

●If you look at any population of neurons in the neocortex, their activity will be sparse: a low percentage of neurons are highly active (spiking) and the remaining neurons are inactive or spiking very slowly. 

SDRs represent the activity of a set of neurons, with 1- and 0-bits representing active and relatively inactive neurons, respectively. 

All the functions of an HTM system are based on this basic correspondence between neurons and SDRs.

●The activity of biological neurons is more complex than a simple 1 or 0. 

Neurons emit spikes, which are in some sense a binary output, but the frequency and patterns of spikes varies considerably for different types of neurons and under different conditions. 

There are differing views on how to interpret the output of a neuron.

On one extreme are arguments that the timing of each individual spike matters; the inter-spike time encodes information. Other theorists consider the output of a neuron as a scalar value, corresponding to the rate of spiking. 

However, it has been shown that sometimes the neocortex can perform significant tasks so quickly that the neurons involved do not have enough time for even a second spike from each neuron to contribute to the completion of the task. In these tasks inter-spike timing and spike rate can’t be responsible for encoding information. 

Sometimes neurons start spiking with a mini-burst of two to four spikes in quick succession before settling into a steadier rate of spiking. 

These mini-bursts can invoke long lasting effects in the post-synaptic cells, i.e. the cells receiving this input.

●HTM theory says that a neuron can be in one of several states:

- Active (spiking)
- Inactive (not spiking or very slowly spiking)
- Predicted (depolarized but not spiking)
- Active after predicted (a mini-burst followed by spiking)

●These HTM neuron states differ from those of other neural network models and this deserves some explanation. 

First, it is well established that some biological neurons spike at different rates depending on how well their input matches their ideal “receptive field”. 

However, individual neurons are never essential to the performance of the network; the population of active cells is what matters most, and any individual neuron can stop working with little effect to the network. 

It follows that variable spiking rate is non-essential to neocortical function, and thus it is a property that we choose to ignore in the HTM model. We can always compensate for the lack of variable rate encoding by using more bits in an SDR. 

All the HTM implementations created to date have worked well without variable rate encoding. 

The second reason for avoiding rate encoding is that binary cell states make software and hardware implementations much simpler. 

HTM systems require almost no floating point operations, which are needed in any system with rate encoding. 

Hardware for implementing HTM will be far simpler without the need for floating point math. There is an analogy to programmable computers. 

When people first started building programmable computers, some designers advocated for decimal logic. 

Binary logic won out because it is much simpler to build.

●Although HTM neurons don’t have variable rate outputs, they do incorporate two new states that don’t exist in other theories. 

When a neuron recognizes a pattern on its distal or apical dendrites, the dendrite generates a local NMDA spike, which depolarizes the cell body without generating a somatic spike. 

In HTM theory, this internal depolarized state of the cell represents a prediction of future activity and plays a critical role in sequence memory. 

And finally, under some conditions a neuron will start firing with a mini-burst of spikes. 

One condition that can cause a mini-burst is when a cell starts firing from a previously depolarized state. 

A miniburst activates metabotropic receptors in the post-synaptic cell, which leads to long lasting depolarization and learning effects. 

Although the neuroscience around mini-bursts is not settled, HTM theory has a need for the system acting differently when an input is predicted from when it isn’t predicted. 

Mini-bursts and metabotropic receptors fill that role. By invoking metabolic effects the neuron can stay active after its input ceases and can exhibit enhanced learning effects. 

These two states, predicted (depolarized) and active after predicted (miniburst) are excellent examples of how HTM theory combines top-down system-level theoretical needs with detailed biological detail to gain new insights into biology.

Figure 9: Visualization of an SDR in the brain (placeholder)


Neural Predictions as Unions of SDRs

●HTM sequence memory makes multiple simultaneous predictions of what will happen next. This capability is an example of the union property of SDRs. The biological equivalent is multiple sets of cells (SDRs) being depolarized at the same time. 

Because each representation is sparse, many predictions can be made simultaneously. For example, consider a layer of neurons implementing an HTM sequence memory. 

If 1% of the neurons in the layer are active and lead to 20 different predictions, then about 20% of the neurons would be in the depolarized/predictive state. 

Even with 20% of neurons depolarized, the system can reliably detect if one of the predictions occurs or not. 

As a human, you would not be consciously aware of these predictions because the predicted cells are not spiking. However, if an unexpected input occurs, the network detects it, and you become aware something is wrong.
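The arithmetic in this paragraph is easy to check with a short simulation. The sketch below is illustrative only: the layer size and threshold are toy values chosen here, not parameters from HTM itself; only the 1% sparsity and 20 simultaneous predictions come from the text.

```python
import random

random.seed(42)

N = 10000          # cells in the layer (toy value)
ACTIVE = 100       # 1% sparsity per SDR
PREDICTIONS = 20   # simultaneous predictions
THRESHOLD = 90     # overlap needed to count an input as "predicted" (toy value)

# 20 distinct predicted SDRs; their union is the set of depolarized cells
predicted_sdrs = [set(random.sample(range(N), ACTIVE)) for _ in range(PREDICTIONS)]
depolarized = set().union(*predicted_sdrs)
print(len(depolarized) / N)  # roughly 0.18-0.20 of the layer

# An input matching one of the predictions lies entirely inside the union
expected_input = predicted_sdrs[7]
print(len(expected_input & depolarized))  # 100: every cell was depolarized

# A random, unpredicted input overlaps the union only by chance (~20 cells),
# so it stays far below threshold with overwhelming probability
surprise = set(random.sample(range(N), ACTIVE))
print(len(surprise & depolarized) < THRESHOLD)
```

Even though 20% of the cells are depolarized, the chance overlap of a random SDR with the union stays tiny, which is why the predictions don't interfere with one another.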


Synapses as a Means for Storing SDRs

●Computer memory is often called “random access memory.” 

A byte is stored in a memory location on a hard drive or on a memory chip. To access the byte’s value you need to know its address in memory. 

The word “random” means that you can retrieve information in any order as long as you have the address of the item you want to retrieve. 

Memory in the brain is called “associative memory.” In associative memory, one SDR is linked to another SDR, which is linked to another, and so on. SDRs are recalled through “association” with other SDRs. 

There is no centralized memory and no random access. 

Every neuron participates in both forming SDRs and in learning the associations.

●Suppose we want a neuron to recognize a particular pattern of activity. 

To achieve this, the neuron forms synapses to the active cells in the pattern. 

As described above, a neuron only needs to form a small number of synapses, typically fewer than twenty, to accurately recognize a pattern in a large population of cells as long as the pattern is sparse. 

Forming new synapses is the basis of almost all memory in the brain.
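This subsampling idea can be sketched in a few lines. The sizes and the match threshold below are toy values for illustration, not the parameters of any HTM implementation:

```python
import random

random.seed(0)

N = 10000        # size of the cell population (toy value)
ACTIVE = 200     # 2% of cells active in the target pattern
SYNAPSES = 20    # synapses the neuron forms, per the text
THRESHOLD = 15   # matching synapses needed to recognize the pattern (toy value)

target_pattern = set(random.sample(range(N), ACTIVE))

# The neuron subsamples the active cells: just 20 synapses onto the pattern
synapses = set(random.sample(sorted(target_pattern), SYNAPSES))

def recognizes(sdr):
    """True if enough of the neuron's synapses fall on active cells."""
    return len(synapses & sdr) >= THRESHOLD

print(recognizes(target_pattern))  # True: all 20 synapses match

# A random sparse pattern shares ~0.4 synapses on average, far below threshold
other = set(random.sample(range(N), ACTIVE))
print(recognizes(other))
```

Because the pattern is sparse, a random SDR almost never activates enough of the 20 synapses to cross threshold, which is why such a small sample suffices.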

●But we don’t want just one neuron to recognize a pattern; we want a set of neurons to recognize a pattern. 

This way one SDR will invoke another SDR. We want SDR pattern “A” to invoke SDR pattern “B.” 

This can be achieved if each active cell in pattern “B” forms twenty synapses to a random sample of the cells in pattern “A.”

●If the two patterns “A” and “B” are successive patterns in the same population of neurons, then the learned association from “A” to “B” is a transition, and forms the basis of sequence memory. 

If the patterns “A” and “B” are in different populations of cells, then pattern “A” will concurrently activate pattern “B.” 

If the neurons in pattern “A” connect to the distal synapses of neurons in pattern “B,” then the “B” pattern will be predicted. 

If the neurons in pattern “A” connect to the proximal synapses of neurons in pattern “B,” then the “B” pattern will consist of active neurons.
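The proximal/distal distinction can be summarized in a minimal sketch. The function name, cell ids, synapse sets, and threshold here are all invented for illustration:

```python
def present(input_sdr, proximal, distal, threshold=15):
    """Map one input SDR to (active, predicted) cells of a population.

    proximal, distal: dicts mapping each cell id to its synapse set.
    A proximal match drives the cell to fire; a distal match only
    depolarizes it, i.e. puts it in the predictive state.
    """
    active = {c for c, syn in proximal.items()
              if len(syn & input_sdr) >= threshold}
    predicted = {c for c, syn in distal.items()
                 if len(syn & input_sdr) >= threshold}
    return active, predicted

# Toy setup: pattern "A" is the input; cells 0 and 1 belong to "B"
pattern_a = set(range(100, 200))
proximal = {0: set(range(100, 120))}  # cell 0 samples "A" on its proximal zone
distal = {1: set(range(150, 170))}    # cell 1 samples "A" on a distal segment

active, predicted = present(pattern_a, proximal, distal)
print(active)     # {0}: proximally driven, so it fires
print(predicted)  # {1}: distally depolarized, so it is only predicted
```

The same input and the same matching rule yield two different outcomes; only the zone where the synapses form decides whether “B” fires or is merely predicted.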

●All associative memory operations use the same basic memory mechanism: the formation of new synapses on the dendrite segments of neurons. 

Because all neurons have dozens of dendrite segments and thousands of synapses, each neuron does not just recognize one pattern but dozens of independent patterns. 

Each neuron participates in many different SDRs.
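A dendrite segment acts as an independent pattern detector, so a neuron with several segments behaves like a logical OR over the patterns its segments learned. A sketch with toy sizes (five segments rather than dozens, and an invented threshold):

```python
import random

random.seed(3)

N = 10000        # cells in the population (toy value)
ACTIVE = 100     # active cells per SDR
SYNAPSES = 20    # synapses per dendrite segment, per the text
THRESHOLD = 15   # matching synapses needed for a segment to respond (toy value)

# One neuron with several dendrite segments, each subsampling a
# different sparse pattern
patterns = [set(random.sample(range(N), ACTIVE)) for _ in range(5)]
segments = [set(random.sample(sorted(p), SYNAPSES)) for p in patterns]

def neuron_matches(sdr):
    # The neuron responds if ANY of its segments crosses threshold
    return any(len(seg & sdr) >= THRESHOLD for seg in segments)

print(all(neuron_matches(p) for p in patterns))  # True: all 5 are recognized

# A novel random SDR almost certainly matches no segment
novel = set(random.sample(range(N), ACTIVE))
print(neuron_matches(novel))
```

Because each segment's false-match probability is tiny, adding segments multiplies the neuron's capacity without noticeably increasing errors.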



●SDRs are the language of the brain, and HTM theory defines how to create, store, and recall SDRs and sequences of SDRs. 

In this chapter we learned about powerful mathematical properties of SDRs, and how these properties enable the brain, and HTM, to learn sequences and to generalize.


