Coming from a background in competitive programming and software development, I have compiled a list of algorithms and data structures that every programmer should know about. We will see what they do and where they are used, with simple examples. The list was prepared with their use in competitive programming and current development practices in mind.
Here are the Top 7 algorithms and data structures to know:
- Sort algorithms
- Search algorithms
- Hashing
- Dynamic programming
- Exponentiation by squaring
- String matching and parsing
- Primality testing algorithm
1. Sort Algorithms
Sorting is one of the most heavily studied concepts in Computer Science. The idea is to arrange the items of a list in a specific order. Though every major programming language has built-in sorting libraries, it comes in handy if you know how they work. Depending on the requirement, you may want to use one sorting algorithm over another.
More importantly, you should know when and where to use them. Some examples where you can find a direct application of sorting techniques include:
- Sorting by price, popularity, etc. in e-commerce websites
- Sorting by score in HackerEarth contest leaderboard
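To make the idea concrete, here is a minimal merge sort sketch in Python, followed by the built-in sort applied to a small catalogue. Merge sort is only one of many sorting algorithms, picked here for illustration, and the products list with its field names is made up to mirror the e-commerce example above.

```python
def merge_sort(items):
    """Return a new sorted list using merge sort (O(n log n) comparisons)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves; "<=" keeps equal items in their original order.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# Hypothetical catalogue, sorted by price with the built-in sort.
products = [
    {"name": "keyboard", "price": 45},
    {"name": "mouse", "price": 20},
    {"name": "monitor", "price": 180},
]
by_price = sorted(products, key=lambda p: p["price"])
```

In day-to-day code the built-in sort with a key function is usually the right choice; writing merge sort by hand is mainly useful for understanding what happens underneath.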
2. Search Algorithms
Binary Search (in linear data structures) - Binary search is used to perform a very efficient search on a sorted dataset. The time complexity is O(log₂ N). The idea is to repeatedly halve the portion of the list that could contain the item until we narrow it down to one possible item. Some applications are:
- When you search for the name of a song in a sorted list of songs, the application performs binary search and string matching to quickly return the results.
- Used to debug in git through git bisect
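The halving idea can be sketched in a few lines of Python. The function name and the songs list below are illustrative assumptions, not taken from any particular application.

```python
def binary_search(sorted_items, target):
    """Return an index of target in sorted_items, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


# Hypothetical, already sorted list of song titles.
songs = ["Bohemian Rhapsody", "Hey Jude", "Imagine", "Yesterday"]
position = binary_search(songs, "Imagine")
```

Each pass through the loop discards half of the remaining range, which is where the logarithmic running time comes from. Python's standard bisect module offers the same search for sorted lists.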
Depth/Breadth First Search (in Graph data structures)
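DFS and BFS are tree/graph traversal and search algorithms. We won't go deep into how DFS/BFS work but will see how they differ through the following animation. In short, DFS follows one branch as far as it can before backtracking, while BFS visits every node at the current depth before moving deeper.
Applications: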
- Used by search engines for web-crawling
- Used in artificial intelligence to build bots, for instance, a chess bot
- Finding the shortest path between two cities on a map, and many other such applications
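The difference in visiting order shows up clearly in code. Below is a small sketch on a hypothetical road map where every road counts as one step; the graph and the function names are assumptions made for illustration.

```python
from collections import deque

# Hypothetical road map: each city lists its directly connected neighbours.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def bfs_order(graph, start):
    """Visit nodes level by level using a queue."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def dfs_order(graph, start, seen=None):
    """Visit nodes by following each branch to its end before backtracking."""
    if seen is None:
        seen = set()
    seen.add(start)
    order = [start]
    for nxt in graph[start]:
        if nxt not in seen:
            order.extend(dfs_order(graph, nxt, seen))
    return order


def shortest_path(graph, start, goal):
    """Fewest-edges path found with BFS, or None if goal is unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None
```

BFS finds the path with the fewest edges only because all edges are treated as equal here; with real distances on the roads, a weighted shortest-path algorithm would be needed instead.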
3. Hashing
Hash lookup is currently the most widely used technique for finding data by key or ID. Previously we relied on sorting plus binary search to locate an item by its index, whereas now we use hashing.
The data structure is referred to as a Hash-Map, Hash-Table or Dictionary, and it maps keys to values efficiently. We can perform value lookups using keys. The idea is to use an appropriate hash function that turns each key into the slot where its value is stored. Choosing a good hash function depends on the scenario. Some applications are:
- In routers, to store IP address -> Path pair for routing mechanisms
- To check whether a value already exists in a list. Linear search would be expensive. We can also use a Set data structure for this operation.
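As a rough illustration of how a hash function leads to a value, here is a toy hash map that uses separate chaining. The class name, bucket count and routing entries are all hypothetical; in practice Python's built-in dict and set are the tools to reach for, and real routers use longest-prefix matching, which this sketch ignores.

```python
class ChainedHashMap:
    """Toy hash map using separate chaining (a list of key/value pairs per bucket)."""

    def __init__(self, buckets=8):
        self._buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        # The hash function picks the bucket; colliding keys share a bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


# Hypothetical routing table: destination prefix -> outgoing interface.
routes = ChainedHashMap()
routes.put("10.0.0.0/8", "eth0")
routes.put("192.168.1.0/24", "eth1")

# Membership check: a set answers "have I seen this?" without a linear scan.
seen_ids = {101, 205, 307}
already_seen = 205 in seen_ids
```

A lookup only scans the one bucket the key hashes to, so with a reasonable hash function and enough buckets the cost stays close to constant on average.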
4. Dynamic Programming
Dynamic programming (DP) is a method for solving a complex problem by breaking it down into simpler subproblems. We solve the subproblems, remember their results, and reuse them to work our way up to the solution of the complex problem quickly.
I cannot help but quote this answer on Quora to explain DP in layman's terms.
*writes down "1+1+1+1+1+1+1+1 =" on a sheet of paper* What's that equal to?
*writes down another "1+" on the left* What about that?
How'd you know it was nine so fast?
You just added one more
So you didn't need to recount because you remembered there were eight! Dynamic Programming is just a fancy way to say 'remembering stuff to save time later'
- There are many DP algorithms and applications, but I'd name one and blow you away: the Duckworth-Lewis method in cricket.
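The "remembering stuff" idea can be shown in two common styles: caching the results of a recursive function, and filling a table from the smallest subproblem upward. Both examples below (Fibonacci numbers and a coin-change problem) are standard textbook illustrations chosen here as assumptions, not problems taken from the text above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """n-th Fibonacci number; each subproblem is computed once and cached."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def min_coins(coins, amount):
    """Fewest coins that sum to amount, or None if it cannot be made."""
    INF = float("inf")
    best = [0] + [INF] * amount   # best[a] = fewest coins needed for amount a
    for a in range(1, amount + 1):
        for c in coins:
            # Reuse the already solved subproblem best[a - c].
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return None if best[amount] == INF else best[amount]
```

Without the cache, the recursive Fibonacci function recomputes the same values exponentially many times; with it, each value is computed once, which is exactly the saving the quoted dialogue describes.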
5. Exponentiation by squaring
Say you want to calculate 2^32. Normally we'd iterate 32 times and find the result. What if I told you it can be done in 5 iterations?
Exponentiation by squaring, or binary exponentiation, is a general method for fast computation of large positive integer powers of a number in O(log₂ N) multiplications, where N is the exponent. Not only this, the method is also used for computing powers of polynomials and square matrices.
- Calculation of large powers of a number is mostly required in RSA encryption. RSA also uses modular arithmetic along with binary exponentiation.
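Here is a minimal sketch of the technique in Python, with an optional modulus as used in modular arithmetic. The function name and signature are assumptions for illustration; Python's built-in pow(base, exponent, modulus) already provides this.

```python
def power(base, exponent, modulus=None):
    """Compute base**exponent (optionally mod modulus) by repeated squaring."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 if modulus is None else 1 % modulus
    while exponent > 0:
        if exponent & 1:          # current binary digit of the exponent is 1
            result *= base
            if modulus is not None:
                result %= modulus
        base *= base              # square for the next binary digit
        if modulus is not None:
            base %= modulus
        exponent >>= 1
    return result
```

The loop walks over the binary digits of the exponent, so the number of multiplications grows with the number of bits in the exponent rather than with the exponent itself. Reducing by the modulus at every step keeps the intermediate numbers small.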
6. String Matching and Parsing
Pattern matching/searching is one of the most important problems in Computer Science. There has been a lot of research on the topic, but we'll list only two basic necessities for any programmer.
- KMP Algorithm (String Matching)
Knuth-Morris-Pratt algorithm is used in cases where we have to match a short pattern in a long string. For instance, when we Ctrl+F a keyword in a document, we perform pattern matching in the whole document.
- Regular Expression (String Parsing)
Often we have to validate a string by checking it against a predefined pattern. Regular expressions are heavily used in web development for URL parsing and matching.
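Both tools can be sketched briefly in Python. The first function is a compact version of Knuth-Morris-Pratt search; the second uses the standard re module to parse a URL path. The URL shape (/users/ followed by a numeric id) and all names are hypothetical, chosen only to illustrate the idea.

```python
import re


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches = []
    k = 0   # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]   # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]   # continue, allowing overlapping matches
    return matches


# Hypothetical URL shape: /users/<numeric id>.
USER_URL = re.compile(r"/users/(\d+)")


def parse_user_id(path):
    """Return the numeric id if path matches the pattern exactly, else None."""
    match = USER_URL.fullmatch(path)
    return int(match.group(1)) if match else None
```

KMP never moves backwards in the text, so the search takes time proportional to the combined length of the text and the pattern. The regular expression, by contrast, describes the allowed shape of the whole string and extracts the part we care about.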
7. Primality Testing Algorithms
There are deterministic and probabilistic ways of determining whether a given number is prime or not. We'll see both kinds.
- Sieve of Eratosthenes (deterministic)
If we have a certain limit on the range of numbers, say we need to determine all primes within the range 100 to 1000, then the Sieve is the way to go. The length of the range is a crucial factor because we have to allocate memory in proportion to it.
- For any number n, incrementally testing up to sqrt(n) (deterministic)
In case you want to check a few numbers that are sparsely spread over a long range (say 1 to 10^12), the Sieve won't be able to allocate enough memory. Instead, you can check each number n by iterating only up to sqrt(n) and performing a divisibility check on n.
- Fermat primality test and Miller–Rabin primality test (both are nondeterministic)
Both of these are compositeness tests. If a number is proved to be composite, then it surely isn't a prime number; if it passes, it is only probably prime. Miller-Rabin is more sophisticated than Fermat's test. In fact, Miller-Rabin also has a deterministic variant, but then it's a trade-off between the time complexity and the accuracy of the algorithm.
- The single most important use of prime numbers is in cryptography. More precisely, they are used for encryption and decryption in the RSA algorithm, which was one of the first practical public-key cryptosystems
- Another use is in Hash functions used in Hash Tables
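The three approaches described in this section can be sketched side by side in Python. Function names are assumptions for illustration. The Miller-Rabin sketch uses a fixed set of bases (the first twelve primes) instead of random ones; this choice is known to give correct answers for every n below 2^64, and for larger n the result should be read as "probably prime".

```python
def sieve(limit):
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)   # memory grows with the size of the range
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


def is_prime_trial(n):
    """Deterministic check by trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin test with fixed bases (assumed exact only for n < 2**64)."""
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False   # base a proves that n is composite
    return True


primes_100_to_1000 = [p for p in sieve(1000) if p >= 100]
```

The sieve pays with memory to get every prime in a range at once, trial division needs no extra memory but takes about sqrt(n) steps per number, and Miller-Rabin stays fast even for very large numbers.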
We'll discuss some advanced algorithms every competitive programmer should know in the next post. Meanwhile, master the above algorithms, or share in the comments what you think every beginner-to-intermediate programmer should know.
Till next time. Evíva!
Also published on Medium.