“A remarkably complex yet fascinating scientific exploration that illuminates a particularly thorny area of physics for laypersons and professionals alike… An earnest examination that walks the tightrope between the scientific community and casual readers.” - Kirkus Reviews
“…This book is a masterpiece, trying to explain in plain and understandable language a hard-to-explain subject.” - Avi Levkovitz, Journalist

Definitions from the book

The Carnot Efficiency

The Carnot efficiency is the maximum amount of work that can be produced from a given amount of heat transferred between two temperatures.

The Clausius Entropy

The Clausius entropy is the heat (energy added or removed) divided by the temperature of its source.

The Boltzmann Entropy

The Boltzmann entropy is the logarithm of the number of possible distinguishable arrangements (microstates) of a system, multiplied by the Boltzmann constant.

The Gibbs Entropy

The Gibbs entropy is the sum over all microstates of the probability of the microstate times its logarithm, multiplied by minus the Boltzmann constant.
Chapter 1

Entropy and the Flow of Energy

Carnot's Efficiency

Page 19 from the book

Thus, the road leading to the science of thermodynamics, including the formulation of its second law, began with Carnot and his study of the efficiency of steam engines in 1824. It was not Carnot who formulated the second law, nor was he aware of entropy. Nevertheless, he understood intuitively – in all likelihood, following on the conversations he had had with his illustrious father – that if a machine, of any type, transferred energy from a hot place to a colder one, then the maximum amount of mechanical work that one could expect to obtain would be the amount of energy removed from the hotter object, less the amount absorbed by the colder one. Indeed, this result was identical with the solution to the problem which had intrigued Lazare Carnot: What is the maximum work that can be obtained from water that drives a turbine? The answer is that the amount of work expected cannot be greater than the energy released when the water falls down from a higher level to a lower one.

Mechanical work is defined as a force applied to an object along a given distance: this force will change the object's motion – its speed and/or its direction, namely its velocity. Moving objects can be used for various purposes: in transportation, for example, motion is used to change the spatial location of an object; in a mixer, to knead dough; in the circulatory system, to transport the oxygen required for our bodily functions. Indeed, movement is a characteristic of life itself.

Therefore we can say that motion is an expression of energy. In fact, the amount of energy stored in the motion of an object is directly proportional to the product of its mass and the square of its velocity. Today we call this energy of motion “kinetic energy.” The first to define the energy of a moving object, in 1689, was Gottfried Wilhelm Leibniz, a German mathematician and influential philosopher, who was also the first (simultaneously with Newton, but independently) to develop calculus. Leibniz called this energy “vis viva” – Latin for “living force.”

It is not too difficult to understand intuitively that kinetic energy somehow corresponds to mechanical work, but in fact, it also corresponds to heat energy – in this case, not with respect to the object as a whole, but rather to its molecules, atoms, electrons and other particles which constitute matter. In Carnot’s day, however, there was still no knowledge of the particles of matter; atomic theory lay far in the future. Moreover, the prevailing belief was that heat was some form of primeval liquid that, when absorbed by any material, caused it to be hot (somewhat the way an alcoholic beverage causes a warm sensation in our throats). This liquid was called caloric (by Antoine Lavoisier, in 1787).

There is another form of energy, one that is not linked at all to motion, which is called “potential energy.” This energy is stored in an object as a result of some force that has acted (or is acting) upon it. For example, water situated up high contains more energy than water down low, because work must have been done to overcome earth’s gravity in order to raise the water to a higher level. If we allow the water to flow back down, its potential energy is released and can be used to drive a waterwheel. And even though Carnot was not interested in the amount of work that could be obtained from the potential energy of a waterfall, but rather the work energy that could be extracted from a hotter object, the results that Carnot obtained (with respect to the amount of work that can be obtained when heat passes from a hotter object to a colder one) were, amazingly, similar to those obtained from the study of falling water. (It is worth noting that there are other forms of energy that have no connection to mass, for example the energy of light.)

Carnot showed the extent of his genius in the calculation he made to find the efficiency of the process. As we shall see, this calculation is no more complicated than the trivial conclusion that he reached, presumably, intuitively. This efficiency, which is today called Carnot's efficiency, is one way of expressing the second law of thermodynamics.
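Numerically, the Carnot efficiency is one minus the ratio of the cold temperature to the hot temperature (in kelvin). The following is a minimal sketch; the function name and the reservoir temperatures are illustrative, not taken from the book:

```python
# Carnot efficiency: the largest fraction of heat drawn from a hot
# reservoir that can be converted into work by any engine operating
# between two temperatures.

def carnot_efficiency(t_hot_kelvin: float, t_cold_kelvin: float) -> float:
    """Maximum efficiency of a heat engine between two temperatures."""
    if t_cold_kelvin <= 0 or t_hot_kelvin <= t_cold_kelvin:
        raise ValueError("require t_hot > t_cold > 0 (in kelvin)")
    return 1.0 - t_cold_kelvin / t_hot_kelvin

# An illustrative boiler at 450 K exhausting to air at 300 K:
eta = carnot_efficiency(450.0, 300.0)
print(f"Carnot efficiency: {eta:.3f}")  # 1 - 300/450 ≈ 0.333
```

Note that no engine, however cleverly built, can do better than this limit; the design of the machine does not appear in the formula at all.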


Chapter 1

Entropy and the Flow of Energy

The Clausius Entropy

Page 38 from the book

Why did Clausius specify that it was Q/T in equilibrium that defined the system’s entropy? The reason is that Q/T can be exactly defined only when the system is in a state of equilibrium. When we measure some physical quantity, we prefer that the measurement be unique. That is, no matter when and where the same quantity is measured, its value should be the same. In the case of an ideal gas, the temperature at equilibrium is homogenous throughout its volume, and thus the value of S will be identical throughout. However, in a gas that is not in equilibrium, some areas may be hotter or cooler than others; Q/T will not be uniform throughout, and the entropy will not be well defined.

Take, for example, a sugar cube and water in a cup. Each on its own is in equilibrium, and the entropy of each can be calculated. But if we place the sugar into the water and wait for a period of time, the sugar will dissolve. Sugar dissolves in water spontaneously because the entropy of the solution is higher than the combined entropy of the sugar and the water. However, at this point we have no way of knowing the entropy of the solution itself, at least not until the sugar has completely and uniformly dissolved and the solution is completely homogenous. Then the system will again be in equilibrium and its entropy can be determined based on the difference in temperatures between the solution and pure water. It will be observed that the entropy did, indeed, increase.

Another, somewhat different example is the process of spontaneous separation of milk into cream, which floats on top, and skim milk beneath it. The reason this happens is that the entropies of both cream and skim milk, separately, are higher than that of whole milk. (In this industrial age, natural milk undergoes homogenization to prevent this spontaneous separation.)

Another important property of entropy is that it is additive, or as the physicists call it, “extensive.” This means that adding the values of the same property in two different objects yields the sum of both. Weight, for example, is an extensive property: a kilogram of tomatoes and a kilogram of cucumbers placed together yield a total weight of two kilograms. Temperature, on the other hand, is not extensive: if we mix water at 50 °C with water at 90 °C, we shall not end up with water at 140 °C.

Entropy, on the other hand, is extensive. If two boxes, A and B, are in equilibrium, and if box A has entropy SA and box B has entropy SB, the two boxes together will have entropy SA + SB. This property gives entropy a physical significance that is preferable to that of temperature, because entropy describes a well-defined quantity, whereas temperature is a property rather than a quantity.
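The extensivity of the Clausius entropy, S = Q/T, can be illustrated with a short sketch; the heats and temperatures below are invented for illustration only:

```python
# Clausius entropy: heat exchanged divided by the (equilibrium)
# temperature at which it is exchanged. Entropies of separate
# systems simply add -- entropy is extensive.

def clausius_entropy(heat_joules: float, temp_kelvin: float) -> float:
    """S = Q / T for heat Q exchanged at temperature T."""
    return heat_joules / temp_kelvin

s_a = clausius_entropy(600.0, 300.0)  # box A: 600 J at 300 K -> 2.0 J/K
s_b = clausius_entropy(500.0, 250.0)  # box B: 500 J at 250 K -> 2.0 J/K

# The two boxes together have entropy S_A + S_B:
print(s_a + s_b)  # 4.0 (J/K)
```

Contrast this with temperature: there is no meaningful sense in which the temperatures of box A and box B "add up" when the boxes are placed side by side.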


Chapter 2

Statistical Entropy

The Boltzmann Entropy

 It is unbelievable how simple and straightforward each result appears once it has been found, and how difficult, as long as the way which leads to it is unknown.

- Ludwig Boltzmann, 1894

Page 42 from the book

Not long after Clausius isolated entropy in Carnot’s efficiency and discovered its unusual properties – nine years, to be precise – a depressive Austrian physicist by the name of Ludwig Boltzmann gave entropy its modern form, as a statistical quantity. Just as Carnot had realized – long before the three laws of thermodynamics were formulated – that energy is conserved (the first law) and that there is a temperature of absolute zero at which there is no energy in matter (the third law), so it was clear to Boltzmann that matter is made of atoms and molecules – long before this was accepted by the entire scientific community. This intuitive realization probably informed his statistical approach to entropy.

Boltzmann’s expression for entropy is substantially different from Clausius’s: first of all, it is a statistical expression; and second, it cannot be directly measured. While the Clausius entropy is something tangible – defined as an actual amount of heat, divided by an actual measure of temperature – Boltzmann’s expression deals with the uncertainty of a system.
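Boltzmann's statistical expression, S = k ln W, can be sketched in a few lines; the microstate counts here are illustrative, not physical:

```python
import math

# Boltzmann entropy: S = k * ln(W), where W is the number of
# distinguishable microstates of the system.

K_B = 1.380649e-23  # Boltzmann constant, J/K

def boltzmann_entropy(microstates: int) -> float:
    """S = k ln W for a system with W microstates."""
    return K_B * math.log(microstates)

# Because of the logarithm, doubling the number of microstates
# always adds the same amount, k*ln(2), to the entropy:
delta = boltzmann_entropy(2_000_000) - boltzmann_entropy(1_000_000)
print(delta / K_B)  # ln 2 ≈ 0.693
```

The logarithm is what makes this entropy extensive: multiplying the microstate counts of two combined systems turns into adding their entropies.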


Chapter 2

Statistical Entropy

The Gibbs Entropy

"The whole is simpler than the sum of its parts".

- Josiah Gibbs, 1894

Page 71 from the book

Whereas Boltzmann’s entropy, for a system with W microstates, is the product of the logarithm of that number by a constant now called the Boltzmann constant, Gibbs defined the same entropy as the sum of the entropies of the individual microstates. Since the entropy of each microstate depends on its probability, Gibbs showed that entropy can be written as the sum, over all microstates, of each probability multiplied by its logarithm.
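A short sketch of the Gibbs form, S = −k Σ p ln p, showing that for equally probable microstates it reduces to Boltzmann's k ln W (the number of microstates is illustrative):

```python
import math

# Gibbs entropy: S = -k * sum over microstates of p_i * ln(p_i).
# When all W microstates are equally probable (p_i = 1/W), this
# reduces to Boltzmann's S = k * ln(W).

K_B = 1.380649e-23  # Boltzmann constant, J/K

def gibbs_entropy(probs) -> float:
    """S = -k * sum(p * ln p) over a probability distribution."""
    return -K_B * sum(p * math.log(p) for p in probs if p > 0)

w = 8
uniform = [1.0 / w] * w
print(gibbs_entropy(uniform) / K_B)  # ln 8 ≈ 2.079
print(math.log(w))                   # Boltzmann's ln W gives the same
```

For any non-uniform distribution over the same W microstates, the Gibbs entropy comes out lower than k ln W; the uniform distribution is the one that maximizes it.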


Chapter 3

The Entropy of Electromagnetic Radiation

The Distribution of Energy in Oscillators – Planck’s Equation

“A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.”

- Max Planck, 1894

Page 111 from the book

Planck used Boltzmann’s entropy to calculate the distribution of P particles in N states (radiation modes), such that entropy would be maximized. He was the first scientist to use Boltzmann’s equation for a calculation of thermodynamic distribution.* In his derivation, he made two assumptions: first, that radiation is quantized; and second, that the amount of energy embodied in a quantum of electromagnetic radiation depends on its frequency. That is, the higher the energy of a photon, the higher its frequency. No less important, he proved that an oscillator’s energy is proportional to its temperature (that is, E = kT) only when the number of photons in each radiation mode is very high, that is, at very low frequencies. When the frequencies are very high, the number of photons in one radiation mode decreases exponentially and thus the energy emitted is lowered dramatically. Thus Planck solved the problem of infinite radiation at high frequencies that arose from the Rayleigh and Jeans model.
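Planck's result for the mean energy of a single radiation mode can be checked numerically. The sketch below uses the standard form E = hν/(e^(hν/kT) − 1); the frequencies are chosen only to exhibit the two limits described above (the classical E ≈ kT at low frequency, and the exponential suppression at high frequency):

```python
import math

# Planck's mean energy per radiation mode: E = h*nu / (exp(h*nu/kT) - 1).
# At low frequency (many photons per mode) this approaches the classical
# value kT; at high frequency it is exponentially suppressed, which is
# how Planck avoided the Rayleigh-Jeans divergence.

H = 6.62607015e-34   # Planck constant, J*s
K_B = 1.380649e-23   # Boltzmann constant, J/K

def mean_mode_energy(freq_hz: float, temp_k: float) -> float:
    """Mean energy of one radiation mode at frequency freq_hz."""
    x = H * freq_hz / (K_B * temp_k)
    return H * freq_hz / math.expm1(x)  # expm1 is accurate for small x

T = 300.0
kt = K_B * T
print(mean_mode_energy(1e9, T) / kt)   # low frequency: ≈ 1, i.e. E ≈ kT
print(mean_mode_energy(1e15, T) / kt)  # high frequency: ≈ 0
```

The ratio E/kT printed for the high frequency is vanishingly small: the mode is effectively frozen out, instead of contributing the full kT that the classical model would assign it.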


Chapter 4

The Entropy of Information

Shannon's entropy

“My greatest concern was what to call it. I thought of calling it ‘information,’ but the word was overly used, so I decided to call it ‘uncertainty.’ When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, ‘You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.’”

- Claude Shannon

Page 130 from the book

In order to understand Shannon’s entropy, let us return to Bob and Alice and assume that they have a communication channel that is capable of transferring one pulse by seven p.m. What Shannon did was to attempt to quantify the amount of information that Bob transfers to Alice. At first blush, it seems that the amount of information is the sum of probabilities. In other words, if the probability that “Bob is busy” (0) is p, then the probability that “Bob is free” (1) is 1 – p, meaning that the total probability is one unit (one bit), no matter what the value of p actually is.

But Shannon gave a different solution, namely:

I = –[p ln p + (1 – p) ln (1 – p)],

That is to say, the amount of information, I, is a function of the probability of the bit being “0” times the log of this probability, plus the probability of the bit being “1”, times the log of this probability. For this solution, Shannon deserved a place in the pantheon of great scientists.

First we have to understand why I = p+(1 – p), that is, one bit, is not a good answer. It is possible that on the day Bob made his arrangement with Alice, he had already planned another meeting and that the one with Alice (who was aware of this) was a secondary option. If we suppose that the chance for this original meeting to be cancelled, so that Bob would meet with Alice, is ten percent, then the bit with a value of “0” carries less information to Alice (0.1 bits), while the “1” bit carries more information (0.9 bits). If we add these probabilities, the answer is always “1” regardless of the relative probabilities (as known to Alice) of the bit being “0” or “1.” If Alice had not been aware that the probability was 90:10, she might have assumed that the probability was still 50:50, and the bit with the “0” value would carry the same value of information as the “1”. Therefore adding probabilities overlooks the a priori knowledge that is so important in communications.

In the expression that Shannon gave, on the other hand – and he was aware of the fact that this expression was the same as that of the Gibbs entropy – summing up the probabilities does not disregard the 90% chance of the bit being “0” and the 10% chance of it being “1” (or any other value that Alice would assign to her chances of meeting with Bob).

If Alice evaluates the probability that Bob will be free that evening as 50%, the amount of information that Bob sends her is at a maximum, because Alice will be equally surprised by either value of the bit. Formally, in this case both possibilities have the same uncertainty, and the information that the bit carries is:

ln 2

That is, the maximum amount of information that a bit can carry is equal to one half the natural log of 2 (50% chance that Bob will come), plus one half of the natural log of 2 (50% chance Bob will not).* Since engineers prefer to calculate base 2 logarithms rather than use the “natural” base, e, the maximum amount of information that a bit carries is 1 bit.

If the probabilities that Bob will or will not show up are not equal, the bit that he sends to Alice carries less information than one unit; if they are equal, the information carried in the bit is equal to one unit. If Alice knows that the chance of seeing Bob is just 10%, the amount of entropy according to Shannon will be:

I = –[0.1 ln 0.1 + 0.9 ln 0.9] ≈ 0.32
That is, the bit carries information that is just 0.32/ln2 = 0.46 of the maximum value.

Shannon’s definition is superior to the one that is independent of the probability. If we send N bits, the maximum information is N bits. The reason for this is that Shannon’s expression acquires its maximum value of one for each and every bit when “0” and “1” have equal probabilities; but when their probabilities are not equal, Shannon’s information is less than one. And this was Shannon’s great insight: for an N-bit file, there is a chance that its content could be transferred via digital communication using less than N bits. In principle, then, it is possible to compress files without losing content, and of course, this is very important for the speed, cost, and ease of transferring information.
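Shannon's expression for a single bit is easy to evaluate directly. This sketch reproduces the numbers used above: exactly 1 bit at 50:50 odds, and roughly 0.46 bit when Alice rates her chances at 90:10:

```python
import math

# Shannon information of one bit with probability p of being "0":
#   I = -[p ln p + (1-p) ln(1-p)]   (in nats)
# Dividing by ln 2 converts nats to bits.

def shannon_bit_information(p: float) -> float:
    """Information (in nats) carried by a bit that is '0' with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # outcome certain in advance: the bit tells Alice nothing
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

print(shannon_bit_information(0.5) / math.log(2))  # 1.0 bit (the maximum)
print(shannon_bit_information(0.1) / math.log(2))  # ≈ 0.469 bit
```

The maximum at p = 0.5 is what makes lossless compression possible: a file whose bits are not 50:50 carries less than one bit of information per transmitted bit, so the same content can, in principle, be re-encoded into fewer bits.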


Chapter 4

The Entropy of Information

The Distribution of Digits – Benford’s Law

Page 141 from the book


Figure 6: Benford’s law – the relative frequency of a digit in a file of random numbers is not uniform. The frequency of the digit “1” is 6.5 times greater than that of the digit “9”.

These results are counterintuitive, since we naturally expect a uniform distribution of digits. Yet the “1” digit appears 6.5 times more frequently than the “9” digit! The reason is that in equilibrium, each microstate has an equal probability, and thus every ball will have an equal probability of “drawing” any box. That is, in order to divide P balls fairly into N boxes, P drawings for N boxes must be performed. At each drawing, one box will “win” one ball. Obviously, the probability that at the end of our game one particular box will have won many balls is lower than the probability that it will have won only a few balls. Put simply, it is easier to win one ball than to win nine balls. Life is tough!

Does this imply that the second law of thermodynamics is also applicable to mathematics? Surprisingly, this very distribution –with the very same mathematical expression – was discovered empirically in 1881 by Canadian-American astronomer Simon Newcomb while he was looking at a booklet of logarithmic tables. (Before the era of the calculator, these tables were used for multiplication, division, finding the exponents of numbers, etc.) Newcomb noticed that the pages beginning with the digit “1” were much more tattered than the others. He concluded that if one randomly chose a digit from any string of numbers, the chance of finding the “1” digit is greater than finding any other digit. He examined the charts beginning with the digit “2” and so on, and came up with – as a conjecture – the very same formula that is derived in Appendix B-2.

In 1931, an employee of the General Electric Company, physicist and electrical engineer Frank Benford, demonstrated that Newcomb’s Law applied not only to logarithmic tables, but also to the atomic weights of thousands of chemical compounds, as well as to house numbers, lengths of rivers, and even the heat capacity of materials. Because Benford showed that this distribution is of a general nature, it is called “Benford's Law.” Since then, researchers have shown that this law also applies to prime numbers, company balance books, stock indices, and numerous other instances.
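Benford's distribution itself takes only a few lines to compute. The sketch below uses the standard closed form, P(d) = log10(1 + 1/d), and confirms the factor of roughly 6.5 between the digits “1” and “9” quoted in Figure 6:

```python
import math

# Benford's law: the probability that the leading digit of a number
# is d (for d = 1..9) is log10(1 + 1/d).

def benford(d: int) -> float:
    """Benford probability of leading digit d."""
    return math.log10(1 + 1 / d)

for d in range(1, 10):
    print(d, round(benford(d), 3))  # 1 -> 0.301, 2 -> 0.176, ..., 9 -> 0.046

print(benford(1) / benford(9))  # ≈ 6.58: "1" is ~6.5x more common than "9"
```

The nine probabilities sum exactly to one, since the sum telescopes to log10(10).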

It should be noted, however, that if we were to allocate P balls among N boxes not by “drawing the lucky box” one ball at a time, but rather by using, for example, a top with ten faces bearing the digits “0” to “9”, and allowing each spin of the top to determine one digit in an unbiased way, the distribution of the digits among the boxes would be uniform. Why? It is clear that the digit symbol marked on any face of the top does not affect the probability of the top falling on that particular side. This is why the various lottery games use techniques for drawing numbers that give a uniform distribution, so that gamblers familiar with Benford's law will not have an advantage.

At first glance, Benford’s distribution seems counterintuitive, because we tend to regard digits as symbols rather than physical entities. The fact that most people are not aware of Benford's distribution is used by tax authorities in some countries to detect tax fraud by companies or individuals. If the distribution of digits in a balance sheet does not match Benford's law, the tax authorities suspect that the numbers have been fixed, and may launch a more thorough inspection.

To illustrate briefly the ideas presented more fully in Appendix B-2, let us examine a model of three balls and three boxes. The number of possible microstates (or configurations), according to Planck, is ten, as follows:

(3,0,0), (0,3,0), (0,0,3), (2,1,0), (2,0,1), (1,2,0), (0,2,1), (1,0,2), (0,1,2), (1,1,1)
In equilibrium, each configuration has an equal probability. Thus we count the number of occurrences for each digit in all configurations. “1” occurs nine times, “2” occurs six times, and “3” only three. This is the distribution expected from the calculations in Appendix B-3 for the more general case (Planck-Benford), as applied to our case.
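The ten configurations and the 9:6:3 digit counts can be verified with a few lines of code:

```python
from itertools import product

# Enumerate the ways of distributing 3 balls among 3 boxes: every
# triple of box occupancies (each 0..3) that sums to 3 is one
# configuration (microstate).

configs = [c for c in product(range(4), repeat=3) if sum(c) == 3]
print(len(configs))  # 10 configurations

# Count how often each nonzero occupancy digit appears across
# all configurations:
counts = {d: sum(c.count(d) for c in configs) for d in (1, 2, 3)}
print(counts)  # {1: 9, 2: 6, 3: 3}
```

With all ten configurations equally probable, a box holding one ball is three times as common as a box holding three, which is the small-scale analogue of Benford's skew toward low digits.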

We therefore conclude that:

The distribution of digits in a decimal file that is compressed to Shannon’s limit obeys Benford's law;

Benford's law is the result of the second law of thermodynamics;

Mathematical operations have a physical significance.

In the next chapter we shall see that this distribution, when expanded from decimal coding to any other numerical coding (as we have just seen in the case of three balls and three boxes), can explain many phenomena commonly present in human existence and culture, such as social networks, distribution of wealth (Pareto's principle), or the results of surveys and polls.

Chapter 5

Entropy in Social Systems

The Distribution of Links in Social Networks – Zipf’s Law

Page 154 from the book

It seems that this distribution is identical for many phenomena common in human society. For example, in a social network, a person is analogous to a node, and the number of people he or she has direct access to is analogous to the links. In an airline network, a node is an airport and the links are the direct-flight destinations. As we shall see below, a node may also be an internet site, or even a given book. In these cases, the surfers or readers are the links. Many other distributions follow this same equation.

This distribution, when plotted on a log-log scale, gives a straight line with a slope of -1 as can be seen on the left hand side of Figure 7. Therefore, this is called a power law distribution.* This distribution is also known as Zipf’s Law, named after George Kingsley Zipf, an American linguist and philologist, who discovered this phenomenon in the frequency of words in texts. What he found was that in sufficiently large texts, the most common word appears twice as often as the second most common word, the second most common word appears twice as often as the fourth most common word, and so on. The Zipf distribution also gives a straight line of -1 slope on a logarithmic scale.
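A minimal sketch of the Zipf distribution, f(n) ∝ 1/n (the normalization constant and the ranks used are illustrative):

```python
import math

# Zipf's law: frequency is inversely proportional to rank, f(n) = c/n,
# so rank * frequency is constant and a log-log plot is a straight
# line of slope -1.

def zipf_frequency(rank: int, c: float = 1.0) -> float:
    """Relative frequency of the item of the given rank."""
    return c / rank

# rank * frequency is the same for every rank:
print([round(n * zipf_frequency(n), 6) for n in (1, 2, 4, 8)])  # all 1.0

# The slope between any two points on a log-log scale is -1:
slope = (math.log(zipf_frequency(8)) - math.log(zipf_frequency(1))) / (
    math.log(8) - math.log(1)
)
print(slope)  # ≈ -1
```

This reproduces Zipf's empirical observation: the most common word is twice as frequent as the second most common, and the second is twice as frequent as the fourth.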


Figure 7: The logarithm of the relative frequency of nodes having n links (horizontal axis) plotted against the logarithm of the number of links, n, (vertical axis) produces a straight line with a slope of -1 for any integer n. This distribution is called a power law distribution. To the right of the vertical axis, the number of links is smaller than the number of nodes (n < 1), leading to exponential decay and a curved line.

That is, the product of the relative frequency of a node with n links by n is constant – exactly what Zipf found empirically with regard to the relative frequency of words in texts. (Zipf’s law is analogous to the classical limit of Planck’s equation, namely, E = kT.)

The distribution of word frequencies in texts is an example of a poll in which the authors vote for words according to their popularity. Poll distributions in general are discussed later in this chapter.

To sum up, we assumed that a spontaneously evolving network (one without a master plan), like the networks that are formed in nature, is structured according to the second law of thermodynamics and therefore tends to reach equilibrium – the state in which the distribution of links among the nodes provides the maximum number of possible configurations.

We calculated the distribution of links between nodes in networks when the number of microstates is at a maximum, and demonstrated that there are more nodes with few links than nodes with many. This distribution is also called the “long-tailed distribution” (see Figure 8). The distribution of links among the nodes is identical with the distribution of frequencies among radiation modes in a black body at the classical limit, as suggested by Rayleigh and Jeans and calculated by Planck (Figure 7 above).

If indeed an ideal thermodynamic network exists, a theoretical implication is that any link added may actually reduce the efficiency of the network! This apparent paradox can be expressed even more paradoxically: can the addition of a road to a transportation system increase traffic congestion? Is it possible that adding a road to the system would result in a reduction of the average road speed? Or, the other way round, could closing a road improve the flow of traffic? In fact, this is exactly what was observed in a number of cities, including Stuttgart, Germany, and Seoul, South Korea, where blocking a main road actually led to an improvement in the traffic flow. Even more astonishingly, German mathematician Dietrich Braess predicted this effect in 1969. The explanation given for this phenomenon – the so-called Braess’s paradox – is that it represents a deviation from the Nash equilibrium in game theory, which is based on the optimal strategies that individuals choose based on what they know at the time about the strategies of other players. In the case in point, many drivers who choose, in view of their experience on the road, an optimal route to bring them to their destination, would change their choice upon receiving the information that a certain road was blocked. As a result, the efficiency of the road system may improve. However, a full discussion of the Nash equilibrium exceeds the scope of this book.

This second law explanation has two advantages: it is quantitative, and it is purely statistical – that is, not influenced by non-measurable factors.

Chapter 5

Entropy in Social Systems

The Distribution of Wealth – The Pareto Law

Page 161 from the book

Entropy cannot predict who will get rich and who will be poor. The desire of each and every one of us to increase our links and our number of coins in order to increase entropy is the source of the dynamics and activity that characterizes Realia.

One of the interesting features of the Planck-Benford distribution is its self-similarity – that is, its insensitivity to the order of magnitude of the numbers (this is a property of any function that produces a straight line on a log-log scale). The Planck-Benford distribution is insensitive to either the actual number of persons or the actual amount of wealth. That is to say, the distribution of wealth between the deciles in countries in equilibrium will be identical, regardless of the size or relative wealth of the country.

Figure 9: Full logarithmic (log-log) graph of the distribution of wealth gives the typical straight line of a power law function.

We can calculate the relative distribution of wealth from the Planck-Benford formula,

f(n) = ln(1 + 1/n) / ln(N + 1), where N = 10

When relative wealth ranges from 1 (lowest) to 10, we get the respective percentage of the population with the relative wealth as follows:

Relative wealth (rank):      1     2     3    4    5    6    7    8    9    10
Percentage of population:  28.9  16.9  12.0  9.3  7.6  6.4  5.6  4.9  4.4  4.0

That is, 4% of the population will have 10 times the wealth of the 28.9% poorest in the population (rank 1). We see that the strongest group, ranking 6 through 10, comprises 25% of the population and holds 72% of the wealth.

This phenomenon is known as Pareto’s law, named after Italian economist Vilfredo Pareto (1848-1923), a contemporary of Max Planck, who empirically suggested the approximate rule to be 80:20. Pareto's rule, as we could expect from any thermodynamic law, is also valid for other social phenomena. For example, it is common to think that 20% of the drivers are responsible for 80% of the traffic accidents; or that 20% of the customers are responsible for 80% of any business's revenues. Pareto himself suggested his empirical rule when he discovered that 80% of Italy’s property was held by 20% of the population. Obviously, his 80:20 ratio was not quite precise. As we saw, the thermodynamic ratio is more like 75:25, and it might be more fitting to call Pareto's rule the “75:25 law.”
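The decile figures above can be reproduced with a short sketch. The closed form used here, f(n) = ln(1 + 1/n) / ln(N + 1), is an assumption on our part, chosen because it matches the numbers quoted in the text (28.9% at rank 1, about 4% at rank 10, and ranks 6 through 10 summing to roughly 25% of the population):

```python
import math

# Planck-Benford distribution of the population over wealth ranks
# 1..N. ASSUMED closed form, consistent with the quoted figures:
#   f(n) = ln(1 + 1/n) / ln(N + 1)

N = 10  # relative wealth ranges from 1 (poorest) to 10 (richest)

def planck_benford(n: int) -> float:
    """Fraction of the population at wealth rank n."""
    return math.log(1 + 1 / n) / math.log(N + 1)

shares = [planck_benford(n) for n in range(1, N + 1)]
print(round(shares[0], 3))        # 0.289 -> 28.9% of people at rank 1
print(round(shares[-1], 3))       # 0.04  -> ~4% of people at rank 10
print(round(sum(shares[5:]), 3))  # ≈ 0.253 -> ranks 6-10, ~25%
```

Note that the fractions sum exactly to one: the sum telescopes to ln(N + 1)/ln(N + 1).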



Misconceptions related to Entropy


Heat is energy stored in a body


The energy stored in a body is called internal energy. Heat is energy removed from a body or added to it. In this respect, heat is similar to work: there is no work inside a body; work can be done on a body or by a body.

Why is this error significant to the understanding of entropy? Entropy is defined as heat divided by temperature. Since temperature is a property of a body, entropy is of the same nature as heat and work.

This is very significant as we will see later.


Entropy is a measure of disorder


Entropy is the logarithm of the number of microstates of a thermally isolated system.

The number of microstates is the number of possible distinguishable ways in which a statistical system can be found.

Therefore, entropy is a measure of the complexity and uncertainty of a system, NOT of disorder.


Entropy production in an irreversible process is higher than in a similar process done in a reversible way


This mistake comes from our intuition that entropy is generated in an irreversible operation (which is true). Nevertheless, entropy is defined at equilibrium, where Q/T has its maximal value. Namely,

S≡ (Q/T)reversible

and the second law states that:

S ≥ (Q/T)

This means that heat divided by temperature is largest in a reversible operation; it is NOT the case that

(Q/T)irreversible > (Q/T)reversible !

The Second Law

The second law increases disorder


This error comes from our intuition that it is much easier to break things than to fix them. If we put a cube of sugar in a glass of tea, the sugar will dissolve; it requires much work to recover the sugar cube from a glass of sweetened tea. However, there are opposite examples: an emulsion of oil and water will spontaneously separate into two neat layers of oil and water.

Many believe that disorder increases spontaneously simply because it is a common belief. However, if we look around us we see that “order” increases all the time. Lord Kelvin, a famous 19th-century scientist, claimed that objects heavier than air (namely, objects whose density is higher than that of air) cannot fly. He made this colossal mistake not because he did not know about Bernoulli’s law (that is forgivable…) but because he did not look at the birds in the sky! Like most people, he saw birds flying but never related their flight to physics.

Order is generated all around us, and spontaneous generation of order should be explained by physics.

Shannon entropy

Shannon entropy has nothing to do with physical entropy


Everything in our world is energy. Therefore, when a file is transferred from a transmitter to a receiver, it is bound by the laws of thermodynamics.

If we use electromagnetic pulses as the energetic bits for file transmission (an EM pulse is a classical oscillator), then the transfer of energy from the hot transmitter to the cold receiver is a thermodynamic process in which the Shannon entropy is the amount by which the Gibbs entropy of the process increases.

Negative Entropy

Shannon information is negative entropy


Shannon information IS entropy. The reason for this common error is the confusion between information à la Shannon, which is defined as the logarithm of the number of possible different transmitted files, and our intuition that information is ONE specific file. In our book, a specific transmitted file (which is a microstate) is called content. Therefore, Shannon information is the logarithm of the number of all possible contents.