What are merkle trees?
Let's start with the basics by brainstorming a solution to this problem:
Let's do a little better with the understanding that utilizing CPU time locally is cheaper than sending something over the network.
A better approach would then be to generate a hash value (i.e., a checksum) from each of the two files and compare the results. This approach starts making even more sense as you increase the number of machines on which the file is replicated and comparison needs to be performed--each machine can generate a checksum and share the results.
Now, let's ask another question: "When you find differences in the hashes generated for file comparison, how can you quickly find out which part of the file is different?"
This is more challenging because a checksum/ simple hash value is generated for the entire file and not certain parts of it, and you cannot find which part of the file was different by just looking at the checksum. That's where Merkle trees help.
Instead of generating just one hash value, you can split the file into multiple chunks and generate hash value for each chunk separately. And to make the process efficient, instead of sharing all the hash values, you generate a hash of two hash values, and keep building the tree. The final, single hash value is for the entire file and can be shared. When differences are found, you can compare the hash values at the next level (and so on), till you identify the block which was different.
But what does this have to do with blockchain?
A public blockchain network could have thousands of computers. How do you check and ensure the integrity of the blockchain across all these computers? Do you share the hash values or the entire chain for comparison? And how do you find out which part of the chain had a mismatch? How can you quickly rebuild the hash after changing some part of the data?
Merkle trees are your friends.
Let's start with the basics by brainstorming a solution to this problem:
How can you compare two copies of a large file across two machines?A naive approach would be to send the file from one to the other machine, and compare the contents using a sting matching algorithm.
Let's do a little better with the understanding that utilizing CPU time locally is cheaper than sending something over the network.
A better approach would then be to generate a hash value (i.e., a checksum) from each of the two files and compare the results. This approach starts making even more sense as you increase the number of machines on which the file is replicated and comparison needs to be performed--each machine can generate a checksum and share the results.
Now, let's ask another question: "When you find differences in the hashes generated for file comparison, how can you quickly find out which part of the file is different?"
This is more challenging because a checksum/ simple hash value is generated for the entire file and not certain parts of it, and you cannot find which part of the file was different by just looking at the checksum. That's where Merkle trees help.
Instead of generating just one hash value, you can split the file into multiple chunks and generate hash value for each chunk separately. And to make the process efficient, instead of sharing all the hash values, you generate a hash of two hash values, and keep building the tree. The final, single hash value is for the entire file and can be shared. When differences are found, you can compare the hash values at the next level (and so on), till you identify the block which was different.
But what does this have to do with blockchain?
A public blockchain network could have thousands of computers. How do you check and ensure the integrity of the blockchain across all these computers? Do you share the hash values or the entire chain for comparison? And how do you find out which part of the chain had a mismatch? How can you quickly rebuild the hash after changing some part of the data?
Merkle trees are your friends.
Comments
Post a Comment