Security Concepts and Terminology
Definition of Information Systems Security (INFOSEC):
From the NSA (U.S. National Security Agency): Protection of information systems against unauthorized access to or modification of information, whether in storage, processing or transit, and against the denial of service to authorized users, including those measures necessary to detect, document, and counter such threats.
[Most of this material is from the excellent “Introduction to Computer Security” by Matt Bishop, (C)2005 Addison-Wesley]
· Confidentiality Keeping information and resources secret from those who shouldn’t know about them. This relates to the concepts of authentication (who is requesting resources?) and authorization (what resources is a given person/program allowed to access?).
· Integrity Keeping information and resources trustworthy (preventing improper or unauthorized modification, deletion, addition, or replacement, called corruption). This relates to the concepts of credibility, authentication, and authorization. It is also important to keep data externally consistent (data in system accurately reflects reality) and internally consistent (ledger books balance, totals match the sums, etc.). A common risk is with standard databases losing consistency, which is addressed by normalization.
Integrity mechanisms can prevent corruption or detect corruption.
· Availability Keeping information and resources accessible to authorized persons, whenever they are otherwise allowed access. This relates to reliability.
These last three concepts are often referred to as the CIA triad.
· Authentication Representing identity: Users, groups, roles, certificates.
When accessing some protected service or resource of some system you must authenticate yourself (prove who you are) before the system can decide if you should be allowed access.
Often this is done by providing information only you and the system know (i.e. a password, a.k.a. a shared secret).
Another technique is to rely on another organization for authentication. You present proof that the organization has authenticated you, by showing a valid passport, driver’s license, major credit-card, etc. Proving you have a valid email address or phone number falls into this category.
A phone-based system (e.g., one that records a phone number that can be checked) can call back the user trying to access a resource to confirm it really is them. Phone calls are made to the (presumably real) user when someone attempts to log in as him or her. The user can then punch in a code on the phone. Phones can also serve as fraud alerts.
Another possibility is proving possession of something only you (should) have, such as a smart card, swipe card, credit card (using the CVV number), dongle, or RFID tag. Increasingly, biometric identifiers such as a fingerprint, eye print, handprint, voice print, or even lip print may be used.
Many of the most convenient authentication schemes are weak and easily circumvented (spoofed). When using such weak methods, users typically need to use two different types of authentication. This is known as a two-factor authentication system. For example, users enter something that only they would know and present something that only they would physically have on their person. This usually means entering a login and PIN or password, along with the use of a USB authentication key or smart card.
Some systems don’t care who you are as long as you are a person. Services such as web-based guest-books or the WHOIS database are available to anyone, but not to automated programs (or robots). In these systems you must only prove you are human. Typically this is done by having the user do a task easy for most people but difficult for machines.
A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart, if you can believe that) is an obscured graphic of some text shown on a web form. A human can usually read and then enter the text but a program has a difficult time reading the text. (Captchas have become an incentive to improve OCR systems!) Other tasks include picture recognition (e.g., “click on the one picture above of a moose”) or audio CAPTCHAs. SANS Institute (Chief Research Officer Johannes Ullrich, Sys Admin, V16 N4 (April 2007), “Minimizing Content Form and Forum Spam” pp.30–ff) reports that CAPTCHAs are effective at reducing spam, but also reduce legitimate traffic by about 20%.
· Authorization Policy stating who is allowed to do which actions on what resources: ACLs, capabilities.
· Accountability Keeping track of who does what and when. Log files are an example.
These last three concepts are often referred to as the AAA triad. (Networking router and switch vendors rarely worry about confidentiality or integrity — and they should!)
· Non-repudiation Related to accounting, this means you can’t convincingly deny some action such as sending a message, receiving a message, entering/changing/deleting data, etc. (Example: ordering an expensive item on-line, then refusing to pay claiming you never ordered the item.)
· Security Policies
A security policy is a statement of what is and what is not allowed. This is also called a specification. Policies can be stated by mathematics and tables that partition the system states into allowable (secure) and not allowed (insecure) states. More commonly, vague descriptions are used, which can lead to ambiguity (some states are neither allowed nor disallowed, or both).
When two or more entities communicate or cooperate, the combined entity has a security policy based on the individual policies. If these policies contain inconsistencies the parties involved must resolve them. (Example: proprietary document provided to a university.)
· Security Mechanisms
A security mechanism (or security measure) is any method, tool, or procedure used to enforce a security policy, or to reduce the impact (and frequency) of threats. Not all mechanisms are tangible.
Security mechanisms can be used to prevent an attack, to detect that an attack has occurred (or is occurring), or to recover from an attack. Usually you will use multiple mechanisms to meet all three of these goals.
Qu: what is the goal of a password mechanism? (Ans: prevention)
Prevention mechanisms keep the system functioning normally and available during any attack. Such prevention mechanisms include over-provisioning and fault-tolerance, and can be used in safety-critical systems where the high cost is deemed acceptable.
Detection mechanisms are also used to monitor the effectiveness of other mechanisms.
Recovery mechanisms are deployed when detection of an attack has occurred. The first part of recovery is to repair the damage done by the attack and to restore normal service levels. Recovery is not complete until an assessment of the incident has been made and preventative measures are put into place. This may include a change to policy, design, or configuration of services, the addition of extra mechanisms, or legal action.
· False Positives And False Negatives occur when checking incidents against a security policy. A false positive occurs when the security scanner reports something as suspicious when in fact nothing is wrong or insecure. A false negative is when the scanner fails to report a problem when in fact one exists.
Neither of these can be completely eliminated in any real scanner, and there is something of a trade-off between them. Most scanners opt for more false positives to generate fewer false negatives. However this can lead to the BWCW (boy who cried “Wolf!”) syndrome, leading administrators to ignore the scanner’s reports. Add-on tools (usually Perl scripts) can attempt to filter log files and reports to show the most important messages, to summarize, to spot trends (dictionary or DoS attacks in progress), etc. (Example: logwatch.)
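The trade-off can be sketched with a toy scanner that flags any event whose suspicion score reaches a threshold. The scores, labels, and the function name count_errors are all invented for illustration; real scanners are far more involved.

```python
# Hypothetical (score, is_attack) pairs; the scores are invented for illustration.
EVENTS = [
    (0.10, False), (0.20, False), (0.35, False), (0.40, True),
    (0.55, False), (0.60, True), (0.80, True), (0.95, True),
]

def count_errors(events, threshold):
    """Return (false_positives, false_negatives) when flagging score >= threshold."""
    false_pos = sum(1 for score, attack in events if score >= threshold and not attack)
    false_neg = sum(1 for score, attack in events if score < threshold and attack)
    return false_pos, false_neg

for threshold in (0.3, 0.5, 0.7):
    fp, fn = count_errors(EVENTS, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Lowering the threshold in this sketch trades false negatives for false positives, which is the choice most scanners make.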
· Dual Controls These are mechanisms designed to prevent any single individual from violating a security policy. Examples include double-entry bookkeeping, safe deposit boxes at banks, and military missile-launch protocols (which require two persons to turn keys more than 10 feet apart at the same time).
· The meaning of trust and Assurance Trust is a measure of trustworthiness of a system, relying on sufficient credible evidence that a system will meet a set of given requirements. For example, a system may be considered physically secure if the system is locked up and only authorized people have keys. However this assumes that those who have access to the system and know how to pick locks (or copy keys) can be trusted not to do so unless authorized. Bank managers are authorized to move funds between accounts (up to some limit), but must be trusted not to move funds into their private accounts. You pay for some goods on amazon.com, but must trust them to actually ship them to you.
Another example is that you must trust some installed server not to allow violations of system security policy. If the server has exploits (e.g., bugs, back-doors, viruses) that cause the server to fail and break our security, our trust was misplaced.
Trust relationships exist whenever there is some risk and uncertainty. One must weigh the risks of granting trust against the expected gains.
Assurance is the measure of confidence that a system meets its security requirements, based upon specific evidence gathered through various assurance techniques. While trust can’t be precisely measured or even defined, circumstantial and anecdotal evidence (assurances) can be accumulated that can be used to determine how much to trust a system (or to determine your insurance rates!).
Measures that can be taken to provide assurance in a trusted system include security cameras in high security areas to make sure no one is using lock-picks, extensive background checks before hiring a bank manager, following standard best practices, obtaining certifications, using vulnerability scanners, using referrals and recommendations for software, and tamper resistance. For example don’t trust a bank manager if they have stolen before, or buy goods on-line from a vendor with a poor reputation, or install software that has a history of problems (e.g., MS IIS) or is from an unknown source (e.g., acme email server).
Consider installing a strict SELinux MAC system. Now you don’t need to trust software as much, because the (trusted) MAC system provides assurance that broken or malicious software won’t compromise the security policy.
· Design The design of a system is the process of determining a set of components (mechanisms including procedures) and their interactions and configurations, that will be used to implement the security policy (specification). The design that results can be analyzed to determine if it will satisfy the specification. While it may be difficult or impossible to fully answer this question, assurance can be generated that the design will satisfy the security policy.
Once this is done the design can be implemented. The implementation must satisfy the design, much like the design must satisfy the specification. Proving an implementation satisfies some design is difficult and involves proofs of correctness.
· Identity is the representation of some unique entity or principal (a person, a website, an authorized meter-reader). Note a given entity may have many identities. (How many login names do you have?)
· Privacy is a complex notion, and not always a person’s right — it depends on local laws and customs. Each person or entity (corporation or government agency) has their own unique point of view on what information is (or should be) under that person’s or entity’s control.
Medical records are a case in point: Doctors don’t like to release records to a patient, in case there is later any dispute. By U.S. law some records must be released if a patient wants them. In addition medical records can be used for research purposes and medical assurance purposes. In these cases, identifying information is supposed to be removed first, in a process known as anonymizing or sanitizing (or blinding) the data. But this is extremely difficult to do right, and often impossible when only a few records are involved. (The British Medical Association has the most comprehensive laws and rules on patient privacy.)
Safe Harbor Agreement
An agreement between the United States and the European Union (EU) regarding the transfer of personally identifiable information from the EU to the United States, which is consistent with Fair Information Practices. Companies that register for Safe Harbor with the U.S. Department of Commerce and abide by the agreement are deemed by the EU to provide adequate data protection for personally identifiable information transferred from the EU to the United States.
· Risk analysis (Sometimes referred to as Cost-Benefit analysis or CBA, or trade-off analysis or TOA) This means to determine whether some asset should be protected, and if so to what degree. You must balance the cost of an incident should it occur, the likelihood that it will occur, and the cost of preventive measures. You must also consider mitigation measures, which are mechanisms and policies designed to lower the cost and/or likelihood of some type of security incidents.
Stephen Murdoch has published a dissertation showing that observing the behavior of users of “covert channels”, especially anonymity systems, may be enough to discover their intentions or even their identity. The approach is similar to how card players in a game of bridge are able to determine cards by observing the behavior of other players. He adds that collusion between two partners can make the process easier. The strategy can be applied to TCP/IP environments and the simple traffic analysis of an “anonymous” network such as Tor. His findings are similar to those presented at the Black Hat conference August ’07. Murdoch says that anonymizing technologies might offer protection from casual scans or monitoring but they are unlikely to withstand the scrutiny of dedicated attackers, researchers, or law enforcement officials. He says “[There is] a wealth of practical experience in covert channel discovery that can be applied to find and exploit weaknesses in real-world anonymity systems”.
Risks change with time, and thus a new risk assessment should be made periodically.
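The balancing act in risk analysis can be sketched as simple arithmetic: expected yearly loss is the cost of an incident times how often it is expected, and a measure is worth considering when the loss it avoids exceeds its cost. The function names and every figure below are hypothetical, and real assessments also weigh costs that are hard to put a number on.

```python
def expected_annual_loss(incident_cost, incidents_per_year):
    """Expected yearly loss: cost of one incident times its expected yearly frequency."""
    return incident_cost * incidents_per_year

def net_benefit(incident_cost, rate_before, rate_after, measure_cost_per_year):
    """Yearly loss avoided by a countermeasure, minus what the countermeasure costs."""
    loss_before = expected_annual_loss(incident_cost, rate_before)
    loss_after = expected_annual_loss(incident_cost, rate_after)
    return (loss_before - loss_after) - measure_cost_per_year

# Hypothetical figures: a $50,000 incident expected once every 5 years (0.2/year),
# reduced to once every 20 years (0.05/year) by a measure costing $4,000/year.
benefit = net_benefit(50_000, 0.2, 0.05, 4_000)
print(f"Net yearly benefit: ${benefit:,.0f}")
```

A positive result suggests the measure pays for itself; a negative one suggests accepting the risk or finding a cheaper mitigation. Since risks change with time, the inputs should be revisited at each new assessment.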
· Audit trails and Logging Audit trails provide Accountability, which prevents false repudiation (“I didn’t do it!”).
· Intrusion Detection Host IDSes (e.g., a file alteration/integrity monitor, or FAM/FIM) and network IDSes seek to detect when an intruder has attempted (or succeeded at) compromising integrity.
Security Issues of Backups, Installs, Updates, and Patches
Treat backup media as highly sensitive. Ideally you should completely destroy old media and never just throw it away (attackers go dumpster-diving).
Never be the first to install new stuff. (Policy should state how long to wait before installing new packages.)
Always get packages from trusted sources. If you really feel a need to install new software not available from a trusted source, be very sure you really need the software! Do not follow any links you find in public forums or in spam. Instead look up the software yourself with Google and go to the author’s website. Check the DNS info for that site and make sure it wasn’t added yesterday!
Verify the download no matter where it came from (using gpg or if not available at least md5sum). If no verification is available, then try to download another copy of the package from a completely different location (preferably from a different country) and compare the packages with cmp.
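When only a published checksum is available, the comparison can be scripted. This is a minimal sketch that uses SHA-256 from Python's hashlib rather than the md5sum mentioned above (my substitution); the function names are hypothetical, and the check is only as trustworthy as the channel the published digest came from. A gpg signature remains the stronger check.

```python
import hashlib
import hmac

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_digest(path, published_hex):
    """Compare the computed digest with the one published by the trusted source."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.strip().lower())
```

The same sha256_of_file function can stand in for cmp when comparing two copies fetched from different locations: equal digests mean the copies are (for practical purposes) identical.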
Installing from source code can be safer than installing a binary package, if you have the skills and time to carefully examine the code (including any auxiliary code such as makefiles and configure scripts). If you don’t have required programming skills but the software is key to your organization, consider hiring a consultant.
Never compile code as root! After building, try testing the code as a non-root user before becoming root and running make install. Make use of the “-n” option to make to see what the make install command tries to do.
When running new/untrusted software, use tools such as strace and ltrace (which traces both sys-calls and library calls) and, additionally on Solaris and BSD, dtrace, truss and sotruss:
ltrace -fCS -n 2 -a 70 -- date 2>&1 >/dev/null |less
Look for odd system calls (e.g., open a network socket when running the date program would be suspicious, as are exec calls to /bin/login, ...), odd libraries (.so files in non-standard locations or non-standard .so files anywhere), and odd arguments to system calls (very long strings, strange pathnames, etc.)
Note these trace commands will show in plain text everything passed as arguments, including usernames and passwords! Tracing tools are invaluable learning and trouble-shooting tools!
To contain untrusted code you should use virtual hosts (VServer, VMware, ...) if available.
If not then in addition to using the security features you do have (chroot, jail, zones) you can use sandbox software, a.k.a. system-call censor software, that intercepts each system call, inspects it, and then allows, denies, or asks the user (useful for trusted but vulnerable software). You can define a policy file for each program to run in the sandbox, defining the minimum access required. Examples include Janus (user-level censor) and Systrace (kernel-based censor).
Updates generally replace one version of a package with a newer one. All files are replaced with an update, so you must treat an update the same way as new software, that is, as untrusted.
Patches come in two types: binary and source. A source patch modifies the code and then requires a re-build, so the same concepts apply as above. Binary patches edit the machine code files directly. Never use binary patches except from your system vendor! All patches (of either type) are installed according to your patch management schedule.
Boot Security
On *nix systems the boot process is not secure by default. If an attacker can boot to single-user mode on most systems they get a root shell with no password required. By inserting a bootable disk (floppy, CD-ROM, DVD, flash) the attacker can bypass all your security efforts. There are a few steps to take to make the system boot procedure more secure:
A grub password is important. This will ensure that the user can’t alter the kernel parameters (say to boot into single user mode).
Set the system to boot from the HD first. This prevents a bypass by inserting bootable media.
Password protect the BIOS. This can still be cleared by opening the case and setting a jumper, or in some cases by removing a battery. But this is not easy, especially if you...
Use a secure (locking) case, and keep servers in physically secure and monitored locations.
Security Threats
A threat is a potential violation of security. (U.S. Military defines a threat as a potential intruder.) This can also be called a vulnerability. Note that the violation need not occur, but just be possible, to be considered a threat. You must guard against threats. The common counters to threats are the three security services (CIA) defined above.
An action that could cause a threat to occur is called an attack. Those who perform such actions, or cause them to be performed, are called attackers or sometimes intruders or crackers.
Threats come in all shapes and guises, but can be classified in a number of ways. One such way divides threats into four categories:
· disclosure (unauthorized access to information). Snooping/sniffing (passive wiretapping), stealing passwords or data, are examples.
· deception (acceptance of false data). Social engineering is an example.
· disruption (prevention of correct operation of a system). Denial of service (DoS) attacks and web site defacement are examples.
· usurpation (unauthorized control of some part of a system). Having your mail system used for spam by a virus.
Some threats fall into more than one of these categories. For example, modification or an unauthorized change of information is a threat where the goal may be deception, disruption, or usurpation. (This is sometimes referred to as active wiretapping.)
An example of this type of threat is the man-in-the-middle attack, where Anne thinks she is communicating with Betty, but Cindy is intercepting Anne’s messages and forwarding (possibly altered) messages to Betty. (And in the reverse direction too.)
Masquerading or spoofing is when one entity (person, website, ...) impersonates another. This type of attack falls into the deception and usurpation categories.
Delegation is when one person (or server) authorizes another to act on their behalf. Note the distinction between delegation and masquerading: in the first, no one pretends to be someone they are not, and if Al delegates to Bill, Bill will not pretend to be Al when communicating with Charlie. Bill will inform Charlie he is Bill, acting for Al, and if Charlie queries Al, he will confirm.
Repudiation of origin is a false denial that an entity (person or server) created or sent something. An example is when you order some product, it is shipped to you, and you refuse to pay, claiming you never ordered the product. This is related to
Denial of receipt which is a false denial that you received information, message, or some product or service.
Delay and Denial of Service are threats that inhibit the legitimate use of some service or resource. The difference is in the time of the outage; a delay is for a short while, or applies to a single message. This form of attack usually is used when one has compromised a secondary server. By delaying a response from the primary server (whose security the attacker may not be able to compromise), many clients will retry using a secondary server (which the attacker may already be in control of).
A buffer overflow attack is an attempt to exploit poorly written software, to make it do things it wasn’t intended to. An example is a server program, run as root. If you can pass enough bad data to it, you may be able to trick the program to, say, start a root telnet session for you. (See p. 139 for details.) (Demo buf-overflow on YborStudent.)
A typical system has so many services running, written by so many different people, and each configured separately, that it is perhaps inevitable that vulnerabilities will exist on your systems. Many systems have well known sets of these, and attackers have created toolkits designed to crack a system and grant the attacker root privileges. These have evolved over time into point and click programs to crack systems, and are called rootkits. These are popular with script kiddies, novice attackers who don’t know about security except how to run some rootkit. (Show HackAttack.txt.)
Reverse engineering is a form of attack where the attacker analyses the executable, looking for ways to either exploit it for system attacks, to steal the intellectual property of the author, or to bypass restrictions on the use of the software. Naturally this is easier if the source code is available.
Network Attack Types
Here is a brief list of the types of attacks on communication networks [Stallings Cryptography and Network Security 4th ed., p. 319]:
Disclosure When an entity (person or process) not possessing the appropriate key can access the message contents.
Traffic analysis Learning the pattern of communications between parties not including yourself. When, how often, the duration, and the size of data passed during communication can be valuable data to an opponent. Another possibility is message injection (“AF is Midway” WWII story).
Masquerade Messages with a fake source, including fake acknowledgements (or negative acknowledgements, the ACK of non-receipt).
Modification Changes to the content, including additions, changes, and deletions.
Re-sequencing Most messages today are sent as a sequence of packets. Sequence modification, or re-sequencing, means the insertion, removal, or reordering of these packets.
Time modification The delay or replay of messages or parts of a message.
Repudiation Source repudiation is the denial of transmission of a message by the alleged source. Destination repudiation is the denial of receipt of a message by the alleged recipient.
Other Vulnerabilities
There are many different types of vulnerabilities in any complex system. A few that are often overlooked include:
Too much data can overwhelm a human’s or program’s ability to process it. For example, a current assessment indicates there were warnings available prior to the 9/11/2001 attack but those were lost in a “flood” of data.
Math errors are vulnerabilities in the hardware and common software used for INFOSEC. Such errors have been found in older systems, such as the division bug in Intel’s Pentium microprocessor discovered in 1994 and a multiplication bug found in Microsoft’s Excel. A math error could allow an attacker to break a public key cryptography system by discovering the error in a widely used chip and sending a “poisoned” encrypted message to the computer, allowing the attacker to compute the value of the secret key used by the targeted system.
Cryptography Overview, Digests / Hashes, MACs, PKI, Steganography
An access control mechanism is used to support the goal of confidentiality. One possible such mechanism is cryptography. Some of the many forms of cryptography include: one time pad (OTP), symmetrical (crypt, DES), and public key. Discuss password versus passphrase (and key).
Related concepts include digital signatures, key exchange (Diffie-Hellman), certificates, PKI. Alternatives to PKI: escrow, web of trust. (See public key web resources.)
Describe OTP: A string of truly random numbers is added to the message and never used twice. (Ex: CAB-> 312, OTP=465, result=777.) Problem is distributing the OTP in the first place!
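The CAB example can be worked through in a few lines. This sketch assumes the addition is done digit by digit modulo 10 (the notes do not say how a sum above 9 would be handled) and that letters map as A=1, B=2, and so on; the function names are hypothetical.

```python
import secrets

def letters_to_digits(text):
    """Map A=1, B=2, ... as in the CAB -> 312 example (single digits, so A..I only)."""
    return [ord(ch) - ord("A") + 1 for ch in text]

def otp_encrypt(digits, pad):
    """Add each pad digit to the matching message digit, modulo 10 (an assumption)."""
    return [(d + p) % 10 for d, p in zip(digits, pad)]

def otp_decrypt(cipher, pad):
    """Subtract the same pad digits to recover the message."""
    return [(c - p) % 10 for c, p in zip(cipher, pad)]

message = letters_to_digits("CAB")      # [3, 1, 2]
pad = [4, 6, 5]                         # the pad from the example above
cipher = otp_encrypt(message, pad)      # [7, 7, 7]
assert otp_decrypt(cipher, pad) == message

# A real pad must be truly random, as long as the message, and never reused.
fresh_pad = [secrets.randbelow(10) for _ in message]
```

Note that the ciphertext 777 reveals nothing by itself: every three-digit message is equally possible under some pad, which is why the pad must stay secret and be used only once.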
Traditional crypto was all too easy to break (show cryptoquote puzzle copycreate.com). Cryptanalysis studies the encrypted text and looks for patterns. For example the most common letter is “e” in English. A word such as “WXY’Z” almost certainly ends in a ‘t’ or ‘s’. Early ciphers such as a Caesar cipher suffered from this problem.
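The letter-frequency weakness can be sketched against a Caesar cipher. The sample sentence is made up and deliberately rich in the letter E; short or unusual texts may defeat this single guess, in which case an attacker simply tries the next most common letters. The function names are hypothetical.

```python
from collections import Counter
import string

def caesar_shift(text, shift):
    """Shift each capital letter by `shift` places, leaving other characters alone."""
    result = []
    for ch in text:
        if ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - 65 + shift) % 26 + 65))
        else:
            result.append(ch)
    return "".join(result)

def guess_shift(ciphertext):
    """Assume the most common ciphertext letter stands for 'E'."""
    letters = [ch for ch in ciphertext if ch in string.ascii_uppercase]
    most_common = Counter(letters).most_common(1)[0][0]
    return (ord(most_common) - ord("E")) % 26

plaintext = "MEET ME WHERE THE THREE TREES MEET THE RIVER"
ciphertext = caesar_shift(plaintext, 3)
recovered = caesar_shift(ciphertext, -guess_shift(ciphertext))
print(ciphertext)
print(recovered)
```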
It was realized that a weak but secret method offers security against only the most casual attackers. (Early word processors used to XOR the document with the password repeatedly: passwordpasswordpassword... XOR plaintext. By XORing the encrypted document with a copy of itself shifted by the length of the password, the password drops out!) This lesson is often phrased as security through obscurity is no security at all. Unfortunately many companies fail to heed this advice and release weak security products.
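One way to see the password "drop out" of the repeated-XOR scheme: XOR the ciphertext with a copy of itself shifted by the password length. Identical password bytes line up and cancel, leaving only plaintext XORed with plaintext, which ordinary cryptanalysis can then attack. The sample text, password, and function names below are hypothetical.

```python
def xor_with_repeating_key(data, key):
    """XOR each byte with the key repeated end to end (the weak scheme described above)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def xor_shifted(data, shift):
    """XOR the data with a copy of itself shifted by `shift` bytes."""
    return bytes(a ^ b for a, b in zip(data, data[shift:]))

plaintext = b"Quarterly results are attached. Do not forward."
key = b"password"                     # hypothetical password
ciphertext = xor_with_repeating_key(plaintext, key)

# Shifting by the key length lines up identical key bytes, which cancel out.
assert xor_shifted(ciphertext, len(key)) == xor_shifted(plaintext, len(key))
```

An attacker who does not know the password length can simply try each plausible length in turn.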
Later, stronger crypto was used as the math was better understood. These methods first compress the message to remove redundancy, then encrypt the result. Keys are very large numbers, hundreds of digits long. This key is usually stored in a file so the user can get to it more easily.
Long keys are more secure than short ones since an attacker must guess (on average) many more keys, too many to be practical. However key guessing isn’t the only way to attack encrypted communications, and long keys may provide a false sense of security.
Symmetric Key Encryption
Modern crypto techniques use symmetric or private key encryption. With such a system, the details of the method used to encrypt the message do not need to remain private. Only a password (or key) does. With such a method a company can and should feel free to have its security methods published, so they can be reviewed for weaknesses. Examples are the U.S. Gov. DES (Data Encryption Standard) and AES (Advanced Encryption Standard). (There are lots of these!)
This method uses the same key to encrypt and to decrypt a message, using two related methods (usually one is the reverse of the other). Even knowing the methods, an attacker can’t recover the plaintext from the cyphertext without the key. While some methods provide stronger or weaker encryption than others, generally the strength of the method comes from the long key length. An attacker must guess keys one at a time to attempt to recover the plaintext. Initially DES used 56 bit (7 byte) keys. Today’s symmetric keys (for AES) are 128, 192, or 256 bits long; keys of 2k or 4k bits are typical of public key methods such as RSA.
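The effect of key length on guessing can be estimated with a little arithmetic. The guessing rate below is a hypothetical round number, not a measurement of any real attacker, and the function name is made up.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def average_years_to_guess(key_bits, guesses_per_second):
    """On average an attacker tries half of the 2**key_bits possible keys."""
    average_guesses = 2 ** (key_bits - 1)
    return average_guesses / guesses_per_second / SECONDS_PER_YEAR

RATE = 10 ** 12   # hypothetical attacker speed: one trillion guesses per second
for bits in (56, 128, 256):
    print(f"{bits}-bit key: about {average_years_to_guess(bits, RATE):.3g} years")
```

At this assumed rate a 56-bit key falls in a matter of hours, while each added bit doubles the work, so a 128-bit key is far beyond any practical guessing effort.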
A problem with symmetric key systems is that of key distribution. How do you get the key to all who need it, securely? Major concerns such as governments can send keys via couriers, credit-card companies and banks send PINs (personal identification numbers) via U.S. mail. But key distribution is still a problem for B2B (Business to Business) and C2B (Consumer to Business) communications.
Public key (Asymmetric) Encryption
A breakthrough was made in the 1970s. It was realized that an encryption method could have two keys, mathematically related in such a way that you could use one to encrypt a message and the other to decrypt it. Even knowing the method and one of the keys, an attacker cannot determine the other key!
What this means is that one of the pair of keys can be made public (say posted on a website). To send a secret message, anyone can use that key. However even if an attacker then intercepts the encrypted message they can’t read it. It takes the other key (the private one that I keep secret) to decrypt the message.
Plaintext + key1 = cyphertext, cyphertext + key2 = plaintext
RSA Three mathematicians named Rivest, Shamir, and Adleman worked out a practical method for this, formed a company, patented the method, and sold licenses until the patent expired. Their method is called RSA. This has been refined over the years and is now an Internet standard, used for HTTPS, SSL, SSH, etc.
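The two-key idea can be tried out with a toy version of RSA. This is a sketch with tiny textbook primes, with no padding and none of the safeguards a real implementation needs; the function names are hypothetical, and pow with a negative exponent requires Python 3.8 or later.

```python
# Toy RSA with tiny primes; real keys use primes hundreds of digits long.
p, q = 61, 53
n = p * q                    # 3233, the public modulus
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent, chosen coprime to phi
d = pow(e, -1, phi)          # private exponent: modular inverse of e (Python 3.8+)

def encrypt(message, public_key):
    """Plaintext + key1 = cyphertext."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)

def decrypt(cipher, private_key):
    """Cyphertext + key2 = plaintext."""
    exponent, modulus = private_key
    return pow(cipher, exponent, modulus)

message = 65                              # must be smaller than n
cipher = encrypt(message, (e, n))
assert decrypt(cipher, (d, n)) == message
```

Anyone may know (e, n); only the holder of d can undo the encryption. With numbers this small d is easy to find by factoring n, which is why real keys are so long.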
A popular example of RSA is PGP (Pretty Good Privacy), which has an interesting history but for now all you need to know about it is that GPG or GnuPG (GNU Privacy Guard) is the modern version of this and is compatible.
Public key distribution (mostly) solves the key distribution problem of symmetric key encryption. For two parties to communicate securely, each can send the other (using insecure means) their public key. Public Key systems are commonly used for HTTPS, SSL/TLS, SSH, Wireless communications, and other popular protocols.
Digital Signatures A neat application of public key systems is for digital signatures. A digital signature is an encryption of a message with the private key.
This means anyone with the matching public key can decrypt the message, but since nobody but the owner has access to the private key, a successful decryption proves the message was sent by the owner of that pair of keys. For security, a digitally signed message is usually further encrypted with the recipient’s public key, so only they can read it. There are other ways to provide digital signatures, for instance DSS (Digital Signature Standard).
Digital signatures are an excellent way to prevent all sorts of attacks. By putting a timestamp in the message, an old signed message can’t be sent later as “new”. This prevents a replay attack.
Since only the sender has access to the private key, the sender can’t later claim they never sent the message. This is called non-repudiation. In some states, a digitally signed message or file has the legal status of one with a holographic signature.
On June 30, 2000, Congress enacted the Electronic Signatures in Global and National Commerce Act ("ESIGN" or "the Act"), to facilitate the use of electronic records and signatures in interstate and foreign commerce by ensuring the validity and legal effect of contracts entered into electronically.
Because of the importance of keeping the private key secure, it is often stored in a file that is itself encrypted with DES or some other symmetric key encryption method. To use the private key, the owner must enter in a password or passphrase (on the local system, not across a network). This way, even a virus-infected host can’t send private keys to an attacker. (However a human must enter the key for each use. Even if a program such as a mail server only read the key once (and kept a copy in RAM), you couldn’t have an unattended reboot!)
Elliptic-curve cryptography (ECC) is the next big thing in public key cryptographic technology. ECC can offer equivalent security to RSA with substantially smaller key sizes. This makes ECC software stronger and faster, and more efficient in use of memory and bandwidth, than RSA. The NSA has an initiative called Suite B that has replaced RSA with ECC as the public-key cryptography technology. Sun, OpenSSL, Firefox, MS, Red Hat and others have endorsed this newer method, making it the likely standard for the Internet 2.0.
Key Exchange
Public key methods are great, but very slow! Too slow and CPU intensive to be used in the obvious way, to encrypt long messages and TCP/IP sessions, or to digitally sign files.
For digital signatures, where the goal isn’t privacy of the message anyway, the usual approach is to compute a message digest, which is a cryptographically secure form of checksum (MD5 and SHA1 are popular examples). This digest is a comparatively small number, and only the digest is digitally signed. The encrypted digest is appended to the message. At the receiving end, the same digest is computed from the received message, the encrypted digest is decrypted, and the two values are compared. A match ensures message integrity.
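The digest-then-sign procedure above can be sketched with textbook RSA. This is only an illustration: the tiny primes, the function names, and the choice of SHA-256 are assumptions made for this example, and the lack of padding makes it unsuitable for any real use.

```python
import hashlib

# Toy textbook-RSA parameters (far too small to be secure; illustration only).
P, Q = 61, 53
N = P * Q      # public modulus, 3233
E = 17         # public exponent
D = 2753       # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sign only the digest, using the private exponent."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Recompute the digest and compare it with the decrypted signature."""
    return pow(signature, E, N) == digest_as_int(message)


note = b"Meet at noon."
signature = sign(note)          # appended to the message by the sender
accepted = verify(note, signature)
```

Here `accepted` is true because the receiver's recomputed digest matches the decrypted one. With a realistic key size, any change to the message or the signature would make the comparison fail.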
To exchange files or messages confidentially, symmetric key encryption is much faster. The common solution is to use public key encryption to securely exchange a huge random number between two parties. This value is used as the shared key (called the session key) for a symmetric encryption method. Unfortunately the obvious way to exchange these keys and agree on a session key is subject to several types of attacks.
Diffie and Hellman solved the problem of key exchange in the 1970s, and Diffie-Hellman key exchange is used today to establish a confidential communication session with SSH, SSL/TLS and other systems. (Show D-H Key Exchange resource.) There are other key exchange protocols in use today as well (IPSEC uses IKE, Internet Key Exchange protocol).
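A minimal sketch of the Diffie-Hellman idea follows. The prime, the base, and the names are assumptions chosen for illustration; real systems use standardized groups (or elliptic curves) and authenticate the exchanged values, since an unauthenticated exchange is open to a man-in-the-middle.

```python
import secrets

# Illustrative public parameters only: a Mersenne prime and a small base.
PRIME = 2**127 - 1
BASE = 3


def make_keypair() -> tuple[int, int]:
    """Pick a random private exponent and derive the matching public value."""
    private = secrets.randbelow(PRIME - 3) + 2   # in the range [2, PRIME - 2]
    public = pow(BASE, private, PRIME)
    return private, public


def shared_secret(own_private: int, peer_public: int) -> int:
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, own_private, PRIME)


alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# Only the public values cross the network; both sides derive the same number.
alice_key = shared_secret(alice_private, bob_public)
bob_key = shared_secret(bob_private, alice_public)
```

Both `alice_key` and `bob_key` equal BASE raised to the product of the two private exponents (mod PRIME), so they match even though neither private exponent was ever sent. That common value can then serve as the basis for a session key.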
Checksums, CRCs, Message Digests, and MACs
A checksum (or just sum) is a simple way to check for errors. The bytes (or words) of some file are added together, ignoring any overflow. The result is a checksum. This was used by the Roman bishop Hippolytos as early as the third century. Today money, credit card numbers, ISBN numbers, etc. all use checksums.
A CRC (cyclic redundancy check) is an improved version of a checksum: the data is considered a very large single number which is divided by a special value. The remainder becomes the CRC. Unlike a checksum a CRC will detect transposed values and many other types of problems that would get past a checksum.
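The difference can be seen in a short sketch. The sample messages and the helper name are made up for this example; `zlib.crc32` is the CRC-32 routine from the Python standard library.

```python
import zlib


def simple_checksum(data: bytes) -> int:
    """Add the bytes together, ignoring overflow past 8 bits."""
    return sum(data) % 256


original = b"PAY 12 TO BOB"
transposed = b"PAY 21 TO BOB"   # two adjacent digits swapped

# Addition does not care about order, so the checksum misses the swap.
same_sum = simple_checksum(original) == simple_checksum(transposed)

# The CRC depends on position, so the swap changes the result.
same_crc = zlib.crc32(original) == zlib.crc32(transposed)
```

Here `same_sum` is true and `same_crc` is false: the simple sum cannot see the transposition, while the CRC does.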
Neither of these methods is useful for security applications. It is not difficult for an attacker to cause a modified file to have the same checksum or CRC as the original.
A [message] digest or [secure] hash [code] (a.k.a. “MD”) is usually used instead of a checksum or CRC for security applications. It is also called a fingerprint, file signature, or message integrity code (MIC). These have the same overall purpose as a checksum or CRC, but the algorithms used make it very difficult for an attacker to predict what changes are needed to cause a particular result.
MACs (message authentication codes) differ from MDs in that MACs also use a shared key to compute the digest; both the sender and receiver must share this key to generate (sender) and validate (receiver) the MAC. A MAC that uses some hashing algorithm is called an HMAC.
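As a rough illustration of the difference, the sketch below computes both a plain digest and an HMAC using Python's standard `hashlib` and `hmac` modules. The message, the key, and the helper name `is_authentic` are invented for this example, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import hmac

message = b"Transfer 100 dollars to account 42"
shared_key = b"hypothetical-shared-secret"   # known only to sender and receiver

# A message digest: anyone can recompute it, including an attacker who
# alters the message and replaces the digest along with it.
digest = hashlib.sha256(message).hexdigest()

# An HMAC: generating or validating it requires the shared key.
tag = hmac.new(shared_key, message, hashlib.sha256).digest()


def is_authentic(key: bytes, received: bytes, received_tag: bytes) -> bool:
    """Recompute the HMAC over the received data and compare in constant time."""
    expected = hmac.new(key, received, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_tag)
```

The receiver would call `is_authentic(shared_key, message, tag)`; a modified message or a tag made with a different key fails the check.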
Consider what YouTube has in its url: .../watch?v=pgyuYHXqlO4
What is “pgyuYHXqlO4”? This can be determined with a little math. Each of the 11 characters has 62 possible values (a-z, A-Z, 0-9), so there are 62^11 possible strings. How unique are they? The decimal equivalent is about 5.2x10^19. To find out what that means let’s see how big this number is on a computer. We do this by finding the log2 of 5.2x10^19, which is 66 bits. This is bigger than a long. It’s probably a 64 bit (or smaller) hash of something, with some extra bits in front so the lookup can easily be handled by multiple hosts in a cluster (each host is identified by that prefix number).
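The arithmetic above is easy to check. The sketch below uses the 62-character alphabet assumed in the text; the variable names are only for illustration.

```python
import math

ALPHABET_SIZE = 62   # a-z, A-Z, 0-9, as assumed in the text
ID_LENGTH = 11

possible_ids = ALPHABET_SIZE ** ID_LENGTH
bits_needed = math.log2(possible_ids)

print(f"{possible_ids:.2e} possible strings")
print(f"{bits_needed:.1f} bits, rounded up to {math.ceil(bits_needed)}")
```

The count exceeds 2^64, which is why the value will not fit in a 64-bit long.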
A further problem is one of trust: How do you know that the email you got from me, containing my key, is really from me and not some impostor? This problem is hard!
One insight is that a trusted third party can digitally sign my public key, before I publish it. A user wanting Wayne Pollock’s real public key can find it on the Internet, verify the trusted third party’s signature, and thus be assured that this key is really Wayne Pollock’s. Of course you need to “know” the 3rd party’s public key, and you must “know” that key really belongs to the trusted 3rd party.
Key signing leads to the concept of a certificate. A person’s (or website’s) certificate is a document containing their identifying information (name, address) and their public key. That in turn is combined with the issuer’s (the trusted third party) identity, the period of validity of the certificate, and (usually) a URL where the issuer’s public key can be found. Then the certificate is digitally signed. Of course, unless you know the issuer’s public key is genuine you still have problems.
The most common type of certificate in use today is the X.509 compatible one. These are encoded when stored in a file in a variety of formats (.pem, .crt, ...). Your computer has one (or more) databases, or stores, of imported certificates.
The GPG/PGP solution is called the web of trust. The idea is to have your public key signed by as many people as possible. Then if someone else wants to use your key, they only need to personally trust any one of the signers. If they don’t know any of them, too bad! Some popular keys are signed a thousand times, making key exchange difficult.
The U.S. government solution was (once) that they would be the trusted third party. Anyone could have them generate a public key signed by them, and they would hold all keys, including the private ones, in escrow in an authoritative database of keys (so they could wiretap). This scheme did not prove popular.
A third solution is used today, a global public key infrastructure (PKI). This scheme has certificate authorities (CAs) whose public keys everyone knows, and whom everyone trusts. PKI defines different types of certificates, such as personal and website certificates. (You can’t use the wrong type.) PKI includes mechanisms to allow you or the issuer to revoke a certificate (before its expiration date).
A hierarchy of CAs is possible, with one CA signing the CA certificate on another. Ultimately you only need to trust a few root CAs.
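The hierarchy can be modelled as a chain of issuer links. The sketch below is a simplified model with hypothetical names (`Cert`, `chains_to_trusted_root`); it only follows the links and deliberately omits the signature, validity-period, and revocation checks that real certificate validation performs at every step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str   # subject name of the CA that signed this certificate


def chains_to_trusted_root(cert: Cert, known: dict[str, Cert],
                           trusted_roots: set[str]) -> bool:
    """Follow issuer links until a trusted root is reached, or give up."""
    seen = set()
    while cert.subject not in seen:
        if cert.subject in trusted_roots:
            return True
        seen.add(cert.subject)
        parent = known.get(cert.issuer)
        if parent is None:
            return False          # issuer's certificate is not available
        cert = parent
    return False                  # looped without finding a trusted root


root = Cert("Example Root CA", "Example Root CA")          # self-signed
intermediate = Cert("Example Intermediate CA", "Example Root CA")
site = Cert("www.example.org", "Example Intermediate CA")
known_certs = {c.subject: c for c in (root, intermediate)}

site_ok = chains_to_trusted_root(site, known_certs, {"Example Root CA"})
```

Only the root's name appears in the trusted set, yet the website certificate is accepted because its chain ends there. A self-signed certificate that is not in the trusted set is rejected.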
To use PKI, you generate your own pair of keys, and create a certificate signing request (CSR) file. That document contains your information plus your public key. You then submit the CSR to your CA of choice, who typically will charge money. The CA is then supposed to do a check to confirm your identity, and then generates the certificate for you.
The certificates for some well-known root CAs are included in most web browsers (show). Also, anyone can become a CA; however, if you do, your CA certificate isn’t signed by another CA that is already part of PKI, and so won’t (or shouldn’t) be trusted by the world at large. Still, organizations such as HCC could become their own CA, to issue certificates to students, to enable secure emails with faculty and staff, to allow secure registration, etc.
To become your own CA, you generate a pair of keys, create a CA signing request, then self-sign the certificate (I trust myself.) Now you have a valid (but untrusted) CA certificate, that can in turn be used to sign personal and website certificates.
PKI has stalled. The various companies involved can’t seem to move forward. What we have now is a few companies such as Verisign, Thawte, Equifax, and others whose CA certificates are well known enough to be considered trusted.
Steganography
While cryptography can protect the content of a message, it can’t hide the fact that a message was sent (and on the Internet, some further information is available in the packet headers). Sometimes the fact that the U.S. president is sending messages to the premier of Russia is too much information to let out. In military terms, sending many messages to someone or someplace can let the enemy know their communications have been tapped (passive wiretapping, or man-in-the-middle) or that something significant will soon be occurring. (Traffic analysis or signal intelligence, a.k.a. SIGINT.)
To prevent such consequences even the fact that a message was sent must be kept confidential. Steganography is the art of concealing information, typically messages, so nobody even suspects a message was sent. Often other techniques are used too, such as cover stories and fake messages. (Coca-Cola does this to prevent anyone from learning the formula for Coke by studying shipping orders from their suppliers over time.)
OpenID (single sign-on across entire Internet)
OpenID (openid.net) is an attempt to solve the problem of having (and maintaining) multiple accounts when using the Internet, by providing a single authentication service for a user. The idea is that you only need a username and password on your chosen OpenID server. Other websites will trust the OpenID server to say if you are you or an impostor.
When a user wants to log in to any web site, instead of submitting a username and password the user submits an OpenID URL that he owns. This URL points to the user’s OpenID authentication server. The web site uses this URL to find the user’s authentication server and asks it, “This user claims he owns this URL. The URL says you are in charge of authenticating this fact, so tell me, can he access my site?”
The authentication server then prompts the user to log in, using the IP address and port number of the user’s already open web browser window (the one connected to the “store”), and then asks the user whether he wishes to be authenticated to the external site. If the user confirms that he does, the authentication server notifies the external site that the user is authenticated.
Other Identity Systems
There are several identity schemes being developed, none widely supported yet. Several of these schemes use SAML (the OASIS Security Assertion Markup Language). Cat Okita defines identity (“(digital) Identity 2.0”, ;login: (Usenix) V32#5 Oct. 2007, pp. 13–20) as:
1. Verifiable – My statements/assertions about my identity can be verified.
2. Minimal – Only tell others the least they need to know to verify my identity. This helps preserve privacy.
3. Unlinkable – The different parties I’ve proven my identity to cannot link together the information each has about me, to get around the minimal requirement.
4. Usable – The identity system must be usable by average persons, or it is useless.
These are based on the Laws of Identity by Kim Cameron of Microsoft (msdn2.microsoft.com/en-us/library/ms996456.aspx), which were restated by Ann Cavoukian (www.ipc.on.ca/images/Resources/up-7laws_whitepaper.pdf) as: (1) Personal control and consent, (2) Minimal disclosure, (3) Need to know access only (justifiable parties), (4) Protection and accountability (directed identity), (5) Minimal surveillance, (6) Understandable by (average) humans, and (7) Consistent experience. The major schemes today include:
· Microsoft CardSpace (cardspace.netfx3.com) – Not unlinkable.
· Liberty Alliance (www.projectliberty.org) – Verifiable but not currently minimal or unlinkable. Uses SAML.
· OpenID (openid.net) – Not verifiable or unlinkable. Used by Web 2.0.
· Shibboleth (shibboleth.internet2.edu) and Pubcookie (pubcookie.org) – Not unlinkable, uses SAML, common in academia.