Every web, backend, or diligent developer should know how cryptography works on a conceptual level so they don't make security mistakes if they're ever put in a position where they have to manage a website's security. Knowing the underlying mathematical functions and properties isn't necessary, as it won't make you a better developer, but if you're interested, it's pretty cool.
I'm going to describe everything in a way that makes it easy to understand HTTPS (and how passwords are stored) because I'm going to describe the security of logging into a webpage in a sequel to this article. Keep in mind, I'm a high school student - I'm no expert on this. Let's begin. Imagine you're logging into website.com. How does that website store your passwords securely?
(This is the long one)
Website.com will hash your password to store it securely. Hashing algorithms are what's known as one-way functions. This means that, if someone is given the output of a hashing function, there's no practical way to figure out what the input was, other than by putting a guess into the hashing function and seeing if its output matches. A well-known hashing algorithm is SHA-1, which I'll use for the examples here because it's simple, although it's no longer considered secure and shouldn't be used in new systems. Hashing algorithms used in practice include SHA-256 and, for passwords specifically, Bcrypt and Scrypt. For example, if we input "hashnode" into the SHA-1 hashing function, we'll get
f9e89a4bb6157c20420266b2de678d343c408d84. The only way to confirm that "hashnode" is the input that produces that output is to put "hashnode" into SHA-1 and see if the output is the same. In theory other inputs can produce the same output (a collision), since there are more possible inputs than outputs, but finding one that matches a given hash isn't feasible in practice.
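To make the "same input, same output" idea concrete, here's a minimal sketch using Python's built-in hashlib module. The language and the helper name sha1_hex are my own choices for illustration, not something the rest of the article depends on.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 hash of text as a 40-character hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hashing the same input always gives the same output...
first = sha1_hex("hashnode")
second = sha1_hex("hashnode")

# ...while even a tiny change to the input gives a different output.
other = sha1_hex("Hashnode")

if __name__ == "__main__":
    print(first)
    print(first == second)  # same input, same hash
    print(first == other)   # different input, different hash
```

Notice there's no "unhash" function anywhere: the only thing you can do with a hash is compare it against the hash of a guess.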
This is why hashes are used for passwords. By storing password hashes on a server, you never actually have to store a password, but you can still confirm that a user knows the password for an account. This way, if a hacker steals the database, they get hashes rather than passwords. For example, say my email is email@example.com, my password is hashnode, and a hacker steals the database of website.com. If website.com hashes passwords, the hacker wouldn't know what my password is. If the website didn't hash passwords, the hacker could learn my password and log into my account. If I use the same password for my GitHub account, the hacker could access that too. This is why it's bad to reuse passwords - if one website you use the password on gets hacked, the hacker could use that password to access your other accounts.
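Here's a rough sketch of what that looks like on the server side. The users dictionary stands in for a real database, and register and login are hypothetical function names I made up for this example. It uses plain SHA-1 only to match the example above; a real site would use a slow, salted password hash.

```python
import hashlib
import hmac

# Hypothetical "database": maps an email to a password hash, never to the password.
users = {}


def hash_password(password: str) -> str:
    # Plain SHA-1 to match the article's example; real sites use slow, salted hashes.
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def register(email: str, password: str) -> None:
    # Only the hash is stored; the password itself is thrown away.
    users[email] = hash_password(password)


def login(email: str, password: str) -> bool:
    stored = users.get(email)
    if stored is None:
        return False
    # Hash the attempt and compare hashes (compare_digest avoids timing leaks).
    return hmac.compare_digest(stored, hash_password(password))


register("email@example.com", "hashnode")

if __name__ == "__main__":
    print(login("email@example.com", "hashnode"))  # correct password
    print(login("email@example.com", "pancake"))   # wrong password
```

If a hacker dumps users, all they see is the email and a hex string, which is the whole point.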
Why does the length of a password matter?
Remember when I said that if website.com hashes passwords and is hacked, the hacker wouldn't know my password? Well, that's true, but there are ways of getting around it. I'll go over the simplest method: guessing. Remember, the only way to find the input of a hash is to test an input and see if the two hashes are the same. So, hackers will try every character combination until they get a matching hash, which tells them they've found the password. Let's say that a password can only be one letter and it must be lowercase. There are 26 letters in the alphabet, which means that a hacker can just put every letter through the hashing function and see which one matches the stolen hash. For example, the SHA-1 hash for the input 'd' is
3c363836cf4e16666669a25da280a1865c2d2874. The hacker can find the hash of 'a' and see if it's equal to that hash. If it's not, he can check the hash of 'b', and so on. Once he gets to 'd', he'll see it has the same hash, so he knows the password is 'd'. That takes a tiny fraction of a second on a modern computer, because he only has to compute a maximum of 26 hashes.
What if the password has to be eight characters long, and uppercase letters, lowercase letters, and numbers are allowed? Each character in the password can be one of 62 possible options (26 uppercase letters + 26 lowercase letters + 10 digits), and there are eight characters, so the maximum number of hashes a hacker would need to compute is 62 × 62 × 62 × 62 × 62 × 62 × 62 × 62, or 62^8, which is 218,340,105,584,896. That's a lot of hashes, but believe it or not, it could be cracked in a short time. Let's figure out how quickly. If this random guy on StackOverflow is correct, he can do 622 million SHA-256 hashes per second with his GPU. So, it would take him a maximum of about four days (number of hashes / hashes per second ≈ 351,000 seconds) to crack a password of this strength. To counteract this, developers in practice use hashing functions like Scrypt and Bcrypt, which are deliberately slower to compute because they use more resources, like requiring more RAM or re-hashing a string over and over.
There are other ways for a hacker to get around this, like using different types of attacks. The type of attack demonstrated here - guessing every password until you get the right one - is called a brute-force attack. Hackers sometimes use a dictionary attack, which only guesses words in the dictionary or in a list of common passwords, because hardly anyone makes their password something like
M3cA42z and most people would rather make it something easy to remember, like
Pancake1. Dictionary attacks work because they significantly lower the number of hashes a hacker has to compute. Developers make them harder by salting passwords, which essentially means adding random characters to each password before hashing it, so a hacker can't reuse precomputed hashes and has to attack each account separately. I won't go into that, though.
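The brute-force idea and the "about four days" estimate can be sketched in a few lines of Python. The function names are made up for this example, and the stolen hash is computed on the spot as a stand-in for one taken from a real database.

```python
import hashlib
import string
from itertools import product


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, alphabet: str, length: int):
    """Try every combination of `length` characters until one hashes to target_hash."""
    for combo in product(alphabet, repeat=length):
        guess = "".join(combo)
        if sha1_hex(guess) == target_hash:
            return guess
    return None  # no password of this length and alphabet matches


# Stand-in for a hash stolen from a database: the one-letter password 'd'.
stolen_hash = sha1_hex("d")
cracked = brute_force(stolen_hash, string.ascii_lowercase, 1)

# Worst-case cost for 8 characters drawn from 62 options each.
keyspace = 62 ** 8                 # number of possible passwords
hashes_per_second = 622_000_000    # the GPU figure quoted above
worst_case_days = keyspace / hashes_per_second / 86_400  # 86,400 seconds per day

if __name__ == "__main__":
    print(cracked)
    print(f"{keyspace:,} guesses, at most about {worst_case_days:.1f} days")
```

Every extra character multiplies the keyspace by 62, which is why length matters so much: the loop itself never gets smarter, it just has more to get through.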
Now that we've learned how passwords are stored, let's learn about how passwords are sent securely.
Symmetric and Asymmetric Encryption
The Middleman Problem
Have you ever been to a website that said "Insecure," or the URL started with
http:// instead of
https://? Well, that means that your connection to the website isn't encrypted. So, if someone else has access to your internet connection (which will almost always be the case), they can see all the data sent between you and the website. This is the middleman problem. To fix it, we use HTTPS, a way of preventing middle parties from snooping on the data you send to websites.
In encryption, a piece of data you use to encrypt or decrypt other data is called a key. In symmetric encryption, the same key can be used to encrypt or decrypt data. For example, if my key is "hello" and my data is "hashnode is so cool," I can encrypt it to look like random text, like
fiojakjalksauixnwbjs, then I can decrypt it with the same key, "hello," to get back my original data. Two parties can use this to exchange encrypted data. If Person A and Person B have the same key, they can both send each other encrypted data. A middleman would have to know the key to read the data sent between Person A and Person B. That's it from a conceptual view, so let's look at it in real life. A good symmetric encryption algorithm is AES. Assuming both parties know the same AES key, they can communicate securely by only sending each other encrypted data.
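Python's standard library doesn't ship AES, so the sketch below uses a toy XOR cipher that I made up purely to show the "same key encrypts and decrypts" property. It is not AES and it is not secure; the names keystream and xor_cipher are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Stretch the key into `length` pseudo-random bytes by hashing key + counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR-ing with the same keystream twice undoes itself."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


key = b"hello"
message = b"hashnode is so cool"

ciphertext = xor_cipher(key, message)          # looks like random bytes
decrypted = xor_cipher(key, ciphertext)        # the same key gets the message back
wrong_guess = xor_cipher(b"hellp", ciphertext)  # a different key gives garbage

if __name__ == "__main__":
    print(ciphertext.hex())
    print(decrypted)
    print(wrong_guess)
```

The one function does both jobs, which is exactly what "symmetric" means: whoever holds the key can go in either direction.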
Let's do an example, with you and me. First, here's my AES key:
6v8y/B?E(H+MbQeT. This is a 128-bit key, and I'm going to use it with the AES-ECB algorithm. Now that we both have the same key, here's some data I've "sent" you:
Can you figure out what it says? (Hint: use this website, and make sure to press 'Decode into Plain Text'.) Here, we've both got the same key, so we can communicate over an encrypted "channel": I encrypt data with the key and send it to you, you decrypt it with the same key, and vice versa.
In HTTPS, the user sends a website a random AES key, and then the user and website can communicate in an encrypted channel with that AES key.
You might be wondering, why can't the middleman just see the AES key sent from the user to the website, and decrypt the data between the user and the website with it? Well, HTTPS uses asymmetric encryption so that a middleman can't get his hands on the AES key.
Asymmetric encryption is my favorite part of cryptography. Basically, a computer can generate two keys: a public and private key. Someone can use the public key to encrypt data, but only the person with the private key can decrypt it, so the private key must only be kept with those who should be able to decrypt data that was encrypted with the public key.
Say Person A has a public and private key and is in a room with Persons B, C and D. He can then shout out his public key to everyone in the room. Everyone can then encrypt a message with Person A's public key and shout it back. Even though everyone knows Person A's public key, only Person A is capable of knowing what everyone said. For example, say Person B's message was
Hello! Person B encrypts it with the public key, then shouts the result across the room, which could be
3xj92x@sd(2=kkW1. Even though everyone in the room heard Person B shout
3xj92x@sd(2=kkW1, only Person A and B will know that the message was
Hello!. Person A will know because they can use their private key to decrypt
3xj92x@sd(2=kkW1 and Person B will know because they knew what the message was before they encrypted it.
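If you'd like to see the room example in code, here's a toy version of RSA using tiny textbook primes. It's a sketch under big assumptions: real keys use primes hundreds of digits long plus padding, and nobody encrypts text one character at a time. The encrypt and decrypt helpers are names I picked for the example.

```python
# Toy RSA with tiny primes. This only illustrates the public/private split.
p, q = 61, 53
n = p * q                    # part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)    # Person A shouts this across the room
private_key = (d, n)   # Person A keeps this secret


def encrypt(number: int, key) -> int:
    exponent, modulus = key
    return pow(number, exponent, modulus)


def decrypt(number: int, key) -> int:
    exponent, modulus = key
    return pow(number, exponent, modulus)


# Person B encrypts "Hello!" one character at a time (fine for a toy, not for real use).
message = "Hello!"
shouted = [encrypt(ord(ch), public_key) for ch in message]

# Only Person A, who holds the private key, can turn the numbers back into text.
recovered = "".join(chr(decrypt(c, private_key)) for c in shouted)

if __name__ == "__main__":
    print(shouted)
    print(recovered)
```

Persons C and D hear everything in shouted and know public_key, but without d they would have to factor n to recover the message. That's easy for 3233 and hopeless for real key sizes.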
This is how users can securely send AES keys to websites. The user asks the website for its public key, the user generates an AES key, the user encrypts the AES key with the website's public key, and sends it to the website where it is decrypted with the website's private key. A middleman couldn't get the AES key because he can only see the encrypted AES key sent from the user to the website, and he doesn't have the website's private key to decrypt it. Now, both parties have the same AES key, which the middleman doesn't know, and they can use it to send data to each other securely. I didn't get very specific with asymmetric encryption, but the classic algorithm for it in HTTPS is RSA if you'd like to do some research yourself. (Newer versions of TLS, the protocol behind HTTPS, agree on the shared key with a Diffie-Hellman key exchange instead, but the idea of a shared key that the middleman never sees is the same.)
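Here's the whole exchange as a toy sketch. Everything in it is an assumption for illustration: the website's key pair is textbook RSA built from two well-known Mersenne primes (never do that for real), and a made-up XOR cipher stands in for AES because Python's standard library doesn't include AES.

```python
import hashlib
import secrets

# --- Website: toy RSA key pair (illustration only) ---
p, q = 2**89 - 1, 2**107 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent, known only to the website


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher standing in for AES: the same key encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# --- User: pick a random 128-bit key and encrypt it with the website's public key (e, n) ---
session_key = secrets.token_bytes(16)
encrypted_key = pow(int.from_bytes(session_key, "big"), e, n)  # all the middleman sees

# --- Website: recover the key with its private key (d, n) ---
recovered_key = pow(encrypted_key, d, n).to_bytes(16, "big")

# --- Both sides now share a key and can use the symmetric channel ---
request = xor_cipher(session_key, b"password=hashnode")   # sent by the user
received = xor_cipher(recovered_key, request)             # read by the website

if __name__ == "__main__":
    print(recovered_key == session_key)
    print(received)
```

The slow asymmetric step happens once, for 16 bytes, and everything after that uses the fast symmetric cipher.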
An Extension To The Middleman Problem
Middlemen can do more than just see the data you're sending to a website - they can change it. You've likely seen this happen at a public WiFi hotspot. For example, at Starbucks, if you try to connect to a website, it will send you to a login portal to access their WiFi. So, what is stopping a middleman from intercepting your request and giving you their own public key? (To rephrase: they generate their own public and private key and give the public key to the user, so they can decrypt whatever the user sends.) Well, nothing stops them from trying, but something called a certificate authority can be used to validate that a public key is legitimate (that it wasn't sent by a middleman).
How Certificate Authorities Work
A certificate authority (CA) is a party that everyone trusts. Certificate authorities issue valid certificates to people who own domains (and rarely, IPs). I won't get into how certificate authorities validate that they're giving the certificate to the right person, but I will talk about how certificate authorities work. Like with websites, certificate authorities have public and private keys. Unlike websites, they use their public and private key to ensure that a website is giving the user a valid public key.
Before the protocol is explained, we need to learn about the cryptography behind it - digital signatures. Essentially, a certificate authority uses a digital signature algorithm to sign a website's public key with the CA's private key. To reiterate, the CA uses its private key to sign a website's public key. Then, anyone with the CA's public key can run the matching verification algorithm to confirm that the CA really did sign the website's public key. The signed package, containing the website's public key and the CA's signature, is called a certificate.
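Signing is roughly asymmetric encryption run "backwards": the private key produces the signature and the public key checks it. Below is a toy sketch using textbook RSA on a hash of the data. The key sizes, the lack of padding, the placeholder key bytes, and the function names are all simplifications I chose for illustration, not how a real CA does it.

```python
import hashlib

# Toy CA key pair built from two well-known Mersenne primes (illustration only).
p, q = 2**107 - 1, 2**127 - 1
ca_n = p * q
ca_e = 65537                               # CA public key: (ca_e, ca_n)
ca_d = pow(ca_e, -1, (p - 1) * (q - 1))    # CA private key: (ca_d, ca_n)


def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce the digest into the range the toy key can handle."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ca_n


def sign(data: bytes) -> int:
    """CA side: only the holder of ca_d can produce this number."""
    return pow(digest_as_int(data), ca_d, ca_n)


def verify(data: bytes, signature: int) -> bool:
    """Browser side: needs only the CA's public key."""
    return pow(signature, ca_e, ca_n) == digest_as_int(data)


website_public_key = b"website.com public key bytes (placeholder)"
signature = sign(website_public_key)

# A middleman who swaps in their own key can't reuse the CA's signature.
middleman_key = b"middleman public key bytes (placeholder)"

if __name__ == "__main__":
    print(verify(website_public_key, signature))
    print(verify(middleman_key, signature))
```

The middleman would need ca_d to forge a signature for their own key, and that never leaves the CA.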
Let's go through the protocol now. You want to go to a website, so the website sends you its public key along with the CA's signature. How do we know it's the key the website sent, and not something sent by a middleman? Well, we can run an algorithm using the CA's public key to validate that the CA used its private key to sign the website's public key. If the algorithm validates it, we're good to go, and can securely connect to the website. How does your browser get the CA's public key? Simple - it already has it. Your browser and operating system keep a bunch of trustworthy certificate authorities' public keys. If the algorithm says that a trusted certificate authority didn't sign the website's public key, you'll get an error in your browser.
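You can poke at that built-in list of trusted CAs from Python, which (like a browser) leans on the operating system's certificate store. This is a small sketch using the standard ssl module; certificate_issuer is a helper name I made up, and it needs network access, so it's only defined here rather than called.

```python
import socket
import ssl

# The default context loads the system's trusted CA certificates and turns on
# certificate and hostname checking.
context = ssl.create_default_context()

checks_certificates = context.verify_mode == ssl.CERT_REQUIRED
checks_hostname = context.check_hostname


def certificate_issuer(hostname: str, port: int = 443, timeout: float = 5.0):
    """Connect over TLS and return the issuer of the certificate the site presented.

    Raises ssl.SSLCertVerificationError if no trusted CA vouches for the certificate.
    Needs network access, so it isn't called automatically here.
    """
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["issuer"]


if __name__ == "__main__":
    print("verifies certificates:", checks_certificates)
    print("verifies hostnames:", checks_hostname)
    print("trusted CA store:", context.cert_store_stats())
```

The error raised when verification fails is the programmatic version of the warning page your browser shows.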
Putting It All Together
Let's go to Hashnode.com. When we type
Have any questions or notice any inaccuracies? Please let me know in the comments and I'll do my best to reply. Keep in mind that I'm a high school student and not an expert on the topic, and this was just meant to provide a high-level overview of how HTTPS works.