Encoding

From ComputingForScientists

Jump to: navigation, search
Encoding

Contents

  1. Encoding
    1. Definitions
    2. Methods of Storing Bits
    3. Encoding Motivation
    4. Encoding Table
    5. 7-bit ASCII Encoding Table
    6. 7-bit ASCII Encoding
    7. Encoding Method
    8. Encoding Example
    9. Unique Combinations
  2. Problems
    1. Prefixes
    2. Bits and Bytes
    3. Bits and Bytes
    4. Bits and Bytes
    5. Bits and Bytes
    6. Bits and Bytes
    7. Bits and Bytes
    8. Encoding
    9. Encoding
    10. Encoding
    11. Encoding
    12. Encoding
    13. Encoding
    14. Encoding
    15. Encoding
    16. Decoding
    17. Decoding
    18. Unique Combinations
    19. Unique Combinations
    20. Unique Combinations
    21. Bit Range
    22. Bit Range
    23. Bit Range
    24. Bit Range
    25. Storage Estimate
    26. Bytes per Dollar Estimate
    27. Estimate
    28. Estimate
    29. Estimate
    30. Estimate
    31. Data Density
    32. Data Density
  3. Activities
    1. Prefixes
    2. Make your own encoding
    3. Discussion Question
    4. Discussion Question
    5. Other Encoding
    6. Experiment
    7. Estimates
    8. Encoding Chinese
  4. Resources

1. Encoding

1.1. Definitions

Bits are the individual zeros and ones that are stored by computers. A bit can have one of two states usually described as one of the following pairs: zero or one; on or off; high or low; or open or closed. A byte is a group of eight bits. A pattern of bits is a list of ones and zeros. Encoding is the translation of a character into a pattern of bits. More generally, encoding is the translation of a symbol or a sequence of characters into a pattern of bits.

The following table shows SI prefixes that are commonly used when describing a quantity of bits or bytes. In the same way that one may need to specify the base of a number such as 101 as being either base 2 or base 10, the system of units that are being used when using a prefix are needed to remove ambiguity. The reason is that in the SI unit system, 1 MB means 106 = 1,000,000 whereas 1 MB in computing could mean 220 = 1,048,576. In this book, we always use the SI prefixes. However one must always be aware of this ambiguity. (To remove the ambiguity, use the IEC prefixes, which are the same as the SI prefix except they have an "i" at the end. For example, 1 MiB = 220.)

Common prefixes used in computing
1018 exa E 1,000,000,000,000,000,000
1015 peta P 1,000,000,000,000,000
1012 terra T 1,000,000,000,000
109 giga G 1,000,000,000
106 mega M 1,000,000
103 kilo k 1,000

1.2. Methods of Storing Bits

There are many methods for storing a list of bits. Examples include placing holes and bumps on a piece of paper, charging a set of objects negative or positive, aligning magnets in the north or south direction, or creating a pit into a piece of plastic or metal.

The following is a primitive method of storing bits.

A simple memory stick with small round magnets attached to a piece of wood. If the person holding the large magnet sees the stick moves toward him, he records a 1.  If it moves away from him, he records a 0.
A simple memory stick with small round magnets attached to a piece of wood. If the person holding the large magnet sees the stick moves toward him, he records a 1. If it moves away from him, he records a 0.


A more advanced method of storing bits is to use a CD or DVD. A CD or a DVD has small pits in it. In locations where there is a pit, the sensor stops receiving a reflected signal and this is interpreted as 0. In locations where there is not a pit (a "land"), there is a reflected signal and this is interpreted as 1. What factors do you think control how many bits can be stored per unit area? What factors do you think control how quickly the bits can be read/written?

From img.tfd.com on May 18 2019 17:06:46.

1.3. Encoding Motivation

You are a "forensic computer scientist" and are given a DVD. You inspect it and find that it contains a list of 1s and 0s. How do you translate the list of bits into something useful?

From upload.wikimedia.org on May 19 2019 08:37:35.

=

001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111 001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111

1.4. Encoding Table

Encoding requires the use of an encoding table - a table that associates a bit pattern with a character (or sequence of characters). Such an encoding table was used previously in the Binary Representation of Numbers. As shown in Table 1, we associated bit patterns of length two with a binary integer.

Encoding Table 1

bit pattern characters
00 0
01 1
10 2
11 3

In this case the bit pattern 1110 is associated with the sequence of characters 32.

Bit patterns can be associated with more than just decimal integers. In Table 2, the four possible bit patterns of length two are associated with a color.

Encoding Table 2

bit pattern characters
00 red
01 green
10 blue
11 black

In this case the bit pattern 1110 is associated with the sequence of characters blackblue.

If Encoding Table 2 was used to write a message on a primitive memory stick and the pattern was 00110011, the decoded message associated with this bit pattern would be redblackredblack.

Encoding Table 3

bit pattern characters
00 zero
01 one
10 two
11 three

If you and I agree to use Encoding Table 3 and I hand you a primitive memory stick with the pattern 00110011, you would decode the bit pattern to mean zerothreezerothree.

1.5. 7-bit ASCII Encoding Table

There is a special encoding table called the 7-bit ASCII table; an excerpt is given below

bit pattern character
1100001 a

If a forensic computer scientist is analyzing a hard drive and knows that your computer encodes information using the 7-bit ASCII Table, and sees the bit pattern 1100001, he will know that you wrote the letter a.

Unlike Encoding Tables 1 and 2 where a long sequence of characters were associated with a short bit pattern, the 7-bit ASCII table associates a single character with a long bit pattern (7 bits). The reason is that with 7 bits there are enough unique bit patterns (27=128) to have a unique bit pattern associated with all of the common characters used in the English language.

Instead of listing character and bit pattern associations, encoding tables often list only character and decimal value associations. To determine the bit pattern, the decimal number must be converted to binary.

bit pattern character decimal value
1100001 a 97

If a forensic computer scientist tells you that he read "ASCII value 97" on a hard drive, he means that he read the bit pattern 1100001. This is sometimes more convenient than saying "I read the binary value 1100001" on a hard drive.

1.6. 7-bit ASCII Encoding

The following is part of the 7-bit ASCII decimal encoding table. (The table is also referred to as the 7-bit ASCII character set.) If I speak ASCII Decimal and say 72 73, you would know that I meant HI after looking at the table. If I speak 7-bit ASCII Binary and say 1001000 1001001, you would need to convert the binary numbers to their decimal representations of 72 73 before using the table.

7-bit ASCII Decimal Encoding Table

32 =   33 = ! 34 = " 35 = # 36 = $ 37 = % 38 = & 39 = '
40 = ( 41 = ) 42 = * 43 = + 44 = , 45 = - 46 = . 47 = /
48 = 0 49 = 1 50 = 2 51 = 3 52 = 4 53 = 5 54 = 6 55 = 7
56 = 8 57 = 9 58 = : 59 = ; 60 = < 61 = = 62 = > 63 = ?
64 = @ 65 = A 66 = B 67 = C 68 = D 69 = E 70 = F 71 = G
72 = H 73 = I 74 = J 75 = K 76 = L 77 = M 78 = N 79 = O
80 = P 81 = Q 82 = R 83 = S 84 = T 85 = U 86 = V 87 = W
88 = X 89 = Y 90 = Z 91 = [ 92 = \ 93 = ] 94 = ^ 95 = _
96 = ` 97 = a 98 = b 99 = c 100 = d 101 = e 102 = f 103 = g
104 = h 105 = i 106 = j 107 = k 108 = l 109 = m 110 = n 111 = o
112 = p 113 = q 114 = r 115 = s 116 = t 117 = u 118 = v 119 = w
120 = x 121 = y 122 = z 123 = } 124 = | 125 = { 126 = ~

1.7. Encoding Method

To encode a sequence of characters, a table-based algorithm may be used:

  1. In the first row, write each character.
  2. In the second row, write the ASCII decimal value of the character obtained by referring to the 7-bit ASCII decimal encoding table.
  3. In the third row, replace the decimal values in the second row with their binary representation. If the binary representation is shorter than seven bits, add zeros to the left side of the binary representation.

1.8. Encoding Example

Consider the character sequence Hello. The first row in the following table is a character. The second row is the 7-bit ASCII decimal value of the character above it. The third row is the binary representation of the decimal number above it.

   H        e       l       l       o	
   72      101     108     108     111
 1001000 1100101 1101100 1101100 1101111

A forensic computer scientist finds the following: 1001000 1100101 1101100 1101100 1101111 (spaces added for readability). This translates to Hello.

If the binary representation is shorter than seven bits, add zeros to the left side of the binary number. For example, if the character sequence is $$, the result is

    $     $
   36    36 
 0100100 0100100

1.9. Unique Combinations

The question of "given N bits, how many unique bit patterns can be created?" was covered previously in the Binary Representation of Numbers. We want to assign a character (or sequence of characters) to each bit pattern. With two bits, I could create four associations (00, 01, 10, 11). Given seven bits, 128 unique bit patterns can be created (the first is 0000000 and the last is 1111111); this is slightly more than the number of characters a through z, A through Z, and 0 through 9 (62), plus 31 other symbols including ,, ., ;, ' ,> ,> ,<, ", ?, ', },[,],|,\,-,=, !, @, #, /, %, ^, &, *, (, ), -, _, +, and =.

Given N bits, how many unique bit patterns can you create?

Npatterns = 2Nbits

To check this formula, write all possible patterns for Nbits = 2. The formula predicts Npatterns = 22 = 4, which is equal to the number of possible unique patterns of length two: 00, 01, 10, and 11.

2. Problems

2.1. Prefixes

  1. Convert 1 MB to kilobytes.
  2. Convert 15 MB to kilobytes.
  3. Convert 20 TB to megabytes.

2.2. Bits and Bytes

One byte equals how many bits?

A. 16
B. 32
C. 8
D. 4
E. None of the above

2.3. Bits and Bytes

How many bytes correspond to 16 bits?

A. 4 bytes
B. 3 bytes
C. 2 bytes
D. 1 byte
E. None of the above

2.4. Bits and Bytes

How many bits correspond to 16 bytes?

A. 96 bits
B. 128 bits
C. 256 bits
D. 512 bits
E. None of the above

2.5. Bits and Bytes

How many bits correspond to 47 bytes?

A. 47
B. 37
C. 376
D. 188
E. None of the above

2.6. Bits and Bytes

How many bytes are represented by 176 bits?

A. 176
B. 44
C. 22
D. 32
E. None of the above

2.7. Bits and Bytes

When we say that computers have "32 bit memory", we mean that each memory slot in that computer is composed of 32 bits. How many bytes of memory are represented by each 32 bit memory slot?

A. 4
B. 2
C. 8
D. 16
E. None of the above

2.8. Encoding

Using the following table, decode the message 00111011.

bit pattern symbol
00
01
10
11

2.9. Encoding

Write mom in binary using 7-bit ASCII binary encoding. (Spaces added to improve readability.)

A. 1101101 1101111 1101101
B. 1101100 0110011 0110011 1110100
C. 1101110 1101111 1110100 1101101 1100101
D. 1101000 1101001
E. 1100010 1111001 1100101
F. None of the above

2.10. Encoding

Write CDS130 in binary using 7-bit ASCII binary encoding.

2.11. Encoding

A forensic computer scientist finds the following list of binary values on a hard drive (spaces added to improve readability): 01001100 01001111 01001100. Assuming that the information is encoded as 7-bit ASCII, what does this series of numbers represent?

A. :-)
B. :-(
C. LOL
D. Woof
E. oNaMoNaPiA

2.12. Encoding

A forensic computer scientist finds the following list of binary values on a hard drive (spaces added to improve readability): 1110010 1101111 1101111 1100110. Assuming that the information is encoded as 7-bit ASCII, and that you do not have access to the ASCII table and have not memorized it, which of the following are possible? Identify the correct one and provide the correct explanation.

A. ABC
B. four
C. nine
D. zoom
E. Woof
F. roof

2.13. Encoding

A forensic computer scientist finds the following list of binary values on a hard drive (spaces added to improve readability): 1001110 1001111 1001110. Assuming that the information is encoded as 7-bit ASCII, and that you do not have access to the ASCII table and have not memorized it, which of the following are possible?

A. abc
B. NON
C. bob
D. zoom
E. Woof
F. mnm

2.14. Encoding

A forensic computer scientist finds the following list of binary values on a hard drive (spaces added to improve readability): 01001110 01011111 01011100. Assuming that the information is encoded as 7-bit ASCII, what three characters does this series of numbers represent?

2.15. Encoding

Suppose that you have a language that consists of only 12 letters and 10 numbers, or 22 characters. What is the minimum number of bits that would be required to uniquely associate each character with a binary number?

2.16. Decoding

Translate each of the following 7-bit binary patterns into their corresponding ASCII characters:

1000011
1000011
1010011
1001001
0101101
0110111
0110000
0110001
0100001

2.17. Decoding

Translate each of the following 7-bit binary patterns into their corresponding ASCII characters:

1000111
1010111
1001101

2.18. Unique Combinations

How many unique combinations of 1s and 0s are possible with 22 bits?

A. 4,194,304
B. 4,194,303
C. 2,097,152
D. 2,097,151
E. None of the above

2.19. Unique Combinations

How many unique sounds would you have heard if this video was extended so that the last sound was that for all 1s?

2.20. Unique Combinations

  1. How many unique combinations can be created with four zeros and ones?
  2. How many unique combinations can be created with six zeros and ones?
  3. How many unique combinations can be created with eight zeros and ones?
  4. How many unique combinations can be created with thirty-two zeros and ones?

2.21. Bit Range

How many 1s and 0s are required to represent any arbitrary 16 bit binary number?

A. 17
B. 15
C. 16
D. 8
E. None of the above

2.22. Bit Range

If an arbitrary 16 bit binary number is multiplied by 2, what is the maximum number of bits required to write that product as a binary number?

A. 18 bits
B. 17 bits
C. 16 bits
D. 15 bits
E. None of the above

2.23. Bit Range

How many bits are required to write the binary representation of the decimal number 512?

A. 8
B. 9
C. 10
D. 11
E. None of the above

2.24. Bit Range

Using 16 bits, the numbers from 0 through LARGE can be represented. What is the decimal value of LARGE?

A. 32,767
B. 32,768
C. 65,536
D. 65,535
E. None of the above

2.25. Storage Estimate

Approximately how many sheets of notebook paper would you need to store the same number of characters that can be stored on a 4.7 GB DVD? Assume that (1) the characters on the DVD are only from the 7-bit ASCII character set, (2) that you could write 5,000 characters on one side of a sheet of notebook paper, and (3) you use both sides of the notebook paper.

A. 470,000
B. 5,000
C. 470
D. 37
E. 470,000,000

2.26. Bytes per Dollar Estimate

If a 300 GB hard drive costs $100 and a 4.7 GB DVD disk costs $2, which is a better deal in terms of bytes per dollar? Show your work.

2.27. Estimate

The 1984 science fiction novel Neuromancer by William Gibson contains 271 pages of text. Each page contains, on average, approximately 400 words. Each word, is on average, five ASCII characters long. Knowing that each ASCII character requires 7 bits of computer memory storage, how many bytes of computer memory storage are required to store all the words from Gibson's Neuromancer novel?

A. 271,000 bytes
B. 34,688,000 bytes
C. 4,336,000 bytes
D. 542,000 bytes

Personal computers in 2010 can come equipped with hard disk drives having 1 terabyte of storage capacity (a terabyte is one trillion bytes, or 1,000,000,000,000 bytes). Approximately how many copies of William Gibson's Neuromancer novel could you store on your 1 terabyte hard disk drive, assuming the entire disk is available for storage?

A. About 1,845,000 copies
B. About 230,627 copies
C. About 28,828 copies
D. About 3,690,000 copies

To find estimates for these problems and the following ones, read the following information on estimates.

2.28. Estimate

A page from a book contains 500 words, and each word contains on average 4 characters. Considering only the characters on the page, how many bits of information does the page contain? (Assume that the characters are encoded using bit patterns of length 7.)

A. 500 bits
B. 2,000 bits
C. 8,000 bits
D. 6,000 bits
E. None of the above

2.29. Estimate

If a single ASCII character (from the extended set) can be represented by 7 bits, and we have a 500 gigabyte hard drive available for storage, about how many ASCII characters can be stored on this hard drive? (NOTE: One gigabyte is equal to about one billion bytes)

A. about 500 million ASCII characters
B. about 8 billion ASCII characters
C. about 64 billion ASCII characters
D. about 500 billion ASCII characters
E. None of the above

2.30. Estimate

A U.S. one dollar bill measures about 6 centimeters wide by 11 centimeters long. If there are 330 ASCII characters printed on one side of it, then what is the approximate data density (in bytes per cm2) of one side of a U.S. one dollar bill? (Assume each ASCII character is encoded with a 7 bit pattern.)

A. about 4.4 bytes per square centimeter
B. about 5.5 bytes per square centimeter
C. about 6.6 bytes per square centimeter
D. about 7.7 bytes per square centimeter
E. None of the above

2.31. Data Density

On a piece of 8.5 inch x 11 inch piece of paper, you can write about 500 letters and numbers ("characters"). Suppose that each character is encoded as an 8-bit binary number.

  1. What is the data density of the information stored on the piece of paper in bytes per square inch?
  2. If this information was stored on a hard drive, how many ones and zeros would be needed?

2.32. Data Density

How many 7-bit ASCII characters could you store on a hard drive? (Choose a reasonable value for the storage capacity for a hard drive on a laptop or desktop computer purchased in the last five years.)

3. Activities

3.1. Prefixes

Why do you think a number like 220 is used in computing to describe a MB instead of 106?

Guess the value of x in the statement: 1 TiB = 2x.

3.2. Make your own encoding

An encoding table is a table that associates a bit pattern with a character or object. For example, and ASCII table associates the bit pattern1001000 with the character H.

Create an encoding table with bit patterns each with three bits. Write a binary encoded message. Hand the message to your partner along with the decoding table and see if they can determine what your binary encoded message means.

3.3. Discussion Question

How would a forensic computer scientist figure out that the yellow numbers correspond to an Excel Spreadsheet document?

From upload.wikimedia.org on May 18 2019 17:06:46.

=

001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111 001010010100101001010101 010101010101010100101000 010111101010101010101111 111110111111101111111111

3.4. Discussion Question

From www.wiilovemario.com on May 19 2019 08:37:36.

Explain in basic terms the meaning of the following: "The old monitor only supports 8-bit color, my monitor supports 24-bit color". (Related link: 8-bit art:[1])

3.5. Other Encoding

The 7-bit ASCII character set only has 128 possible characters. How would you encode the Greek letter beta in binary on a computer? That is, suppose you were asked to come up with a scheme for encoding the Greek letter beta in binary so that anyone who saw the particular list of zeros and ones would immediately know you meant the Greek letter beta.









3.6. Experiment

ASCII is one of many ways to encode numbers and characters as binary patterns.

Question: Compare this search: CCCP with this search: CCCP. Why are the results different? The text in the search box looks the same!

Experiment:

  • Click the first link, copy the text CCCP from the search box, paste it into Notepad (or TextEdit on a Mac), and save. What is the size of the saved file?
  • Do the same for the second link. What happens when you choose ANSI as the encoding versus UTF-8 when saving? What is the size of the file when you chose ANSI? UTF-8? When you choose ANSI, exit and re-open the file you saved. What do you see?

3.7. Estimates

Note: In the problems that request an estimate, your answer can be in a fairly wide range of values. Your explanation is more important than your actual number. For example, if I asked you to compare the area of a DVD to the area of a sheet of paper, a correct answer could be "I can lay four DVDs on a sheet of paper and it covers up most of the paper. Therefore the area of a DVD is about four times that of a sheet of paper." Another correct answer could be to compute the area of the sheet of paper and the area of the DVD using the formulas for the area of a rectangle and the area of a circle and then take the ratio of these areas. The first answer is less accurate, but the approach is more in the spirit of an estimate.

  1. Estimate the number of characters (the numbers 0 through 9, letters a-z upper and lowercase) that you could write on a piece of paper by hand using a pen or pencil. Explain how you arrived at your estimate.
  2. Estimate the number of bits of information that you could store on a single sheet of notebook paper using only a pen. Explain how you arrived at your estimate.
  3. Estimate the number of bytes of information that you could store on a single sheet of notebook paper using only a pen. Explain how you arrived at your estimate.
  4. How many sheets of notebook paper would you need to store the same number of characters that can be stored on a DVD? (Assume that the characters on the DVD are only from the 8-bit ASCII character set.)
  5. Estimate the density of data stored on your sheet of notebook paper (in bytes per square centimeter).
  6. Estimate the density of data stored on a DVD (in bytes per square centimeter).

3.8. Encoding Chinese

7 bits are required to represent the most commonly used written English language characters. The 7-bit ASCII character set relates 128 binary numbers to 128 commonly used characters in the English language.

For example, the bit pattern 1100001 corresponds to the character a and 1000001 corresponds to the character A.

The Chinese character set is composed of unique characters that taken together comprise the written Chinese language. A college-educated Chinese adult is fluent with 6,000 to 7,000 unique Chinese characters.

How many bits are required to represent the entire set of written Chinese characters for a college-educated Chinese adult?

4. Resources

  • An advanced discussion of encoding [5].
  • The binary alphabet is superior to the "best" alphabet (Korean) [6].
  • Joke: Two bytes walk into a bar. The bartender says "Can I get you anything?" The bytes say "sure, make us a double." (A sequence of sixteen bits is called a double.)
Personal tools