Mises2 - Decentralized Storage
Web3’s need for decentralized storage
Decentralized storage is the infrastructure of Web3. It should meet the following requirements:

1. The capacity should be high enough for a large volume of simultaneous users

Every storage, retrieval and download operation by a user is equivalent to a transaction, and in a decentralized storage service it requires consensus to be recorded on the chain. If the decentralized storage system serves a decentralized community or decentralized social media, the volume of transactions will be very large. When the DAU of a video website reaches 100 million, users will view more than 1.6 billion videos per day.

So, the blockchain for Web3 must have high TPS and low gas fees. At present, the successful blockchains all provide financial services, so users expect to make money by using them, and that is why they can tolerate high gas fees. Even so, high gas fees have been criticized by users and DeFi projects in the ETH ecosystem.
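To make the scale concrete, the daily view count quoted above can be converted into an average transaction rate. This is only a back-of-the-envelope sketch: it assumes one on-chain transaction per view and a perfectly even load over the day, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_tps(transactions_per_day: float) -> float:
    """Average transactions per second, assuming an evenly spread load."""
    return transactions_per_day / SECONDS_PER_DAY


# Figures from the text: 100 million DAU, 1.6 billion video views per day.
views_per_day = 1.6e9
daily_active_users = 100e6

tps = average_tps(views_per_day)
views_per_user = views_per_day / daily_active_users

print(f"about {tps:,.0f} transactions per second on average")
print(f"about {views_per_user:.0f} views per user per day")
```

Real traffic is bursty, so the peak rate would be higher than this average.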

2. Low operating cost

The main costs for nodes participating in decentralized storage are the purchase of server storage equipment, IDC hosting and bandwidth rental fees (or the cost of purchasing cloud computing). Relatively speaking, bandwidth is the more expensive item. Can decentralized storage cost less than centralized storage services while offering the same user experience?

3. Good user experience

Can users quickly retrieve and obtain stored data, perhaps even faster than from centralized storage? The user experience discussed here is mainly the speed of retrieval and download. Due to the topology of the Internet, users in different regions who access the same server see very large differences in speed. Centralized applications often need to purchase CDN services, while a decentralized service can be composed of a large number of servers distributed across different regions, in effect building its own CDN to give users a better experience.

4. Storage nodes have the autonomy to reject certain content

Decentralized services will face the problem of dealing with illegal content. On the one hand, the entire network cannot conduct content censorship for users, nor can there be a unified standard; on the other hand, each node has to meet the legal restrictions on content in the country where it is located.

At present, Arweave has only a very crude solution, and as its influence grows, it will be unable to meet the contradictory requirements of users and governments.

5. The storage service of the node can be verified

At present, most decentralized services can achieve verifiable storage services, but most of them are complex and inefficient.

6. Can be integrated with existing web services through gateways

IPFS allows anyone to create a gateway, so that users can use a browser to access the content stored on IPFS. This technology is not difficult to implement; the open question is how to motivate a large number of users to build gateways.

Existing decentralized storage networks cannot meet all the needs of web3.

IPFS

IPFS is a well-designed p2p decentralized storage and transmission network. It applies the best p2p technology of the past 20 years, including DHT, BitTorrent and Git. The IPFS protocol is divided into 7 layers: identity, network, routing, exchange, object, files and naming. Three of these sub-protocols are briefly introduced here.

1. Routing

IPFS is based on content addressing. It uses a DSHT (distributed sloppy hash table) based on S/Kademlia and Coral. Kademlia is widely used by p2p applications such as Gnutella and BitTorrent, and it can find files in a large-scale node network: for a network with 10 million nodes, roughly 20 hops are enough to find a file.
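The routing idea can be illustrated with a small sketch of Kademlia's XOR distance metric and its logarithmic hop bound. The helper names are hypothetical and the example ignores routing tables and networking entirely; it only shows how "closeness" between ids is computed and why the hop count grows with log2 of the network size.

```python
import hashlib
import math


def make_id(name: str) -> int:
    """Derive a 256-bit id from a name (illustrative; real ids come from keys)."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two ids, read as an integer."""
    return a ^ b


def closest_nodes(key: int, nodes: list[int], k: int = 3) -> list[int]:
    """The k node ids closest to the key under the XOR metric."""
    return sorted(nodes, key=lambda n: xor_distance(n, key))[:k]


def hop_bound(network_size: int) -> int:
    """Logarithmic bound on lookup hops for a network of the given size."""
    return math.ceil(math.log2(network_size))


nodes = [make_id(f"node-{i}") for i in range(100)]
key = make_id("some-content")
print(closest_nodes(key, nodes))
print(hop_bound(10_000_000))  # 24, the same order as the figure quoted above
```

Note that the distance is computed on ids, not on network latency, which is why two "close" nodes may be far apart on the real Internet.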

Coral improves on Kademlia. A search for a hash key may return only part of the values stored under it; Coral can also form clusters by region, so that a node first searches in the cluster of its own region, which greatly increases the efficiency of search and transmission.

For unpopular content, searching through the DHT is still inefficient, because it has to go through multiple hops. The actual network connection between hops may be very slow, so the search can take a long time. Applications with high real-time requirements, such as short videos, find it difficult to use a DHT to locate content.

2. Exchange

IPFS distributes content with BitSwap, a protocol inspired by BitTorrent. A node maintains a want_list and a have_list of data blocks, and exchanges data with other nodes. Unlike BitTorrent, the blocks in the want_list and have_list are not limited to the blocks of one file, so all nodes form a single data exchange market.

To prevent nodes from only requesting and never contributing, each BitSwap node keeps a ledger and maintains a credit record for the other nodes that exchange data with it. If a node repeatedly fails to respond and does not serve other nodes, other nodes will stop responding to its data requests. Through this credit system, IPFS encourages nodes to store and actively serve the data that other nodes need. However, storing a large amount of popular content and responding to requests from a large number of nodes is very costly. Without economic incentives, relying only on a credit system like BitSwap, it is difficult for IPFS to provide large-scale, stable, high-quality services.
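The credit idea can be sketched as a per-peer ledger that turns a debt ratio into a willingness to serve. The class and function names are hypothetical, and the sigmoid shape with its constants 6 and 3 is an illustrative assumption rather than something stated in the text above; any function that drops as debt grows would make the same point.

```python
import math
from dataclasses import dataclass


@dataclass
class Ledger:
    """What this node has exchanged with one particular peer."""
    bytes_sent: int = 0      # bytes we have sent to the peer
    bytes_received: int = 0  # bytes the peer has sent to us

    def debt_ratio(self) -> float:
        # +1 avoids division by zero for a peer that has sent nothing yet.
        return self.bytes_sent / (self.bytes_received + 1)


def send_probability(debt_ratio: float) -> float:
    """Probability of serving a peer; falls quickly once its debt grows."""
    return 1.0 - 1.0 / (1.0 + math.exp(6.0 - 3.0 * debt_ratio))


good_peer = Ledger(bytes_sent=1_000, bytes_received=999)
freeloader = Ledger(bytes_sent=4_000, bytes_received=999)
print(round(send_probability(good_peer.debt_ratio()), 3))
print(round(send_probability(freeloader.debt_ratio()), 3))
```

Such a ledger only affects whether a peer gets served; it does not pay for bandwidth, which is the limitation the paragraph above points out.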

3. Object

IPFS data objects are stored in a Merkle DAG structure. The Merkle DAG has many advantages: 1) Tamper resistance: all content is verified by checksum, so any tampering or damage is discovered immediately; 2) Reuse: identical blocks from different files, or from different versions of the same file, are stored only once; 3) It benefits version management of files, and the IPFS object model itself is derived from Git; 4) If decentralized storage with incentives and an economic model is adopted and the user pays for storage, the node needs to prove that it has stored the user’s data the whole time. Because the data object is a Merkle DAG, the storage proof is relatively simple to complete, as will be explained in detail later.
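The tamper-detection and reuse properties can be seen in a minimal content-addressed block store. This is a simplified two-level sketch, not the actual IPFS object format: the class and method names are hypothetical, SHA-256 stands in for the real hash scheme, and a tiny block size is used so the example stays readable.

```python
import hashlib

BLOCK_SIZE = 256 * 1024  # 256k blocks, as in the text


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class BlockStore:
    """Content-addressed store: each block is keyed by the hash of its bytes."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        digest = sha256_hex(block)
        self.blocks[digest] = block  # identical blocks map to the same key
        return digest

    def add_file(self, data: bytes, block_size: int = BLOCK_SIZE) -> str:
        """Store the file's blocks plus a root node listing their hashes."""
        links = [self.put(data[i:i + block_size])
                 for i in range(0, len(data), block_size)]
        return self.put("\n".join(links).encode())

    def verify(self, digest: str) -> bool:
        """Recompute the hash to detect tampering or damage."""
        return sha256_hex(self.blocks[digest]) == digest


store = BlockStore()
v1 = store.add_file(b"aaaabbbbcccc", block_size=4)
v2 = store.add_file(b"aaaabbbbdddd", block_size=4)  # new version, one block changed
print(len(store.blocks))  # 6 blocks stored instead of 8: two blocks are shared
```

Changing any block changes its hash, which changes the root hash, so a verifier that knows the root can detect a modification anywhere in the file.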

A block in an IPFS data object is set to 256k, which also brings some disadvantages: 1) Large files are split into too many blocks, so too many queries are needed and too many nodes are involved in a download. If most of those nodes are unstable, the user experience will be very bad; 2) It is more difficult for nodes to censor content. On the one hand, nodes cannot violate the privacy of users. On the other hand, nodes must also comply with the laws of the country where they are located. For example, nodes in the United States cannot serve child pornography content. We will discuss this problem in detail in the solution.

IPFS inherited the early Internet spirit of one for all and all for one. Nodes contribute their own storage and bandwidth for free, and also use the storage and bandwidth of other nodes for free. The advantage of being free is that there is basically no barrier to participation, but there are also many defects: 1) Users often run IPFS nodes on their home PCs and home bandwidth, because home PCs have spare storage space and home bandwidth is paid monthly, so there is no extra cost. However, users will not spend money to deploy servers in an IDC and purchase bandwidth. Home nodes are not stable enough, and home upstream bandwidth is much smaller than downstream bandwidth, which greatly limits the bandwidth available for serving others; 2) Free services are easy to attack, and the cost to malicious nodes is low, so IPFS has to adopt more complex identification and mutual evaluation mechanisms between nodes; 3) The user experience is not good enough: because many nodes are unstable and DHT retrieval is inefficient, users cannot expect an efficient and stable storage service.

Filecoin

After IPFS, Protocol Labs launched Filecoin, an incentivized decentralized storage network with three markets. In the storage market, storage miners rent out their available storage space, the storage is verified by the Filecoin network, and storage users store data by paying Fil. In the retrieval market, users pay Fil to retrieval miners for the data they provide. Finally, participants can trade Fil, which circulates among miners, users and other token holders.

Filecoin’s economic model is based on data storage. Storage miners play a central role in providing storage services and ensuring consensus on the chain. Storage miners have four types of income: 1) Storage market income: Fil paid by users for storage services; 2) Blockchain transaction fees: by competing to create new blocks, miners obtain the fees of the transactions within the block; 3) Retrieval market revenue: Fil paid by users for retrieval services; 4) Storage mining rewards: block rewards issued to those who maintain the blockchain and run contracts, and to subsidize reliable and useful storage.

The upper limit of Fil minting is 2 billion: 10% is allocated to fundraising (7.5% was sold in 2017); 15% to Protocol Labs; 5% to the Filecoin Foundation; and the remaining 70% to miners as mining rewards, of which 55% is the storage mining reward and 15% is the mining reward reserve. Since most Fil is distributed to miners through rewards, after the launch of the Filecoin mainnet in 2020 miners mainly focused on reward mining, and most miners do not serve real users. The miners fill in data themselves (copying it directly from hard disks), seal the sectors, treat Fil purely as a target of financial speculation, and rent only a small amount of bandwidth to meet the basic requirements of mining.

What kind of users need decentralized storage? What kind of applications do they need? Are they willing to pay for it? Filecoin has not yet been able to answer these questions very well. Filecoin uses financial means to motivate miners to participate at large scale, and it already has a huge amount of storage hardware. Can it really serve customers in the end?

For a paid decentralized storage service, the node needs to prove to the customer that it has stored the user’s data for a period of time. Filecoin adopts two methods: Proof-of-Replication and Proof-of-Spacetime. Proof-of-Replication ensures that the node saves the user’s data in its own local physical storage. Storage proofs generally use a challenge/response method: for example, the verifier randomly asks the node to provide a part of the user data. However, such schemes are open to Sybil, outsourcing and generation attacks. Filecoin’s PoRep introduces concepts such as sectors and sealing to prevent these attacks. PoRep can only prove that the node stores the user data at the moment the verifier issues a challenge, so Proof-of-Spacetime requires nodes to 1) generate sequential Proofs-of-Replication as a way to determine time; 2) recursively compose the executions to generate a short proof. Filecoin uses zk-SNARKs to implement PoRep and PoSt, which is quite complicated.

If there is no storage mining reward, the storage proof does not need to be very strict: a Sybil attacker, for example, gains no benefit and so has no motivation to attack. Proof of storage can then be implemented in a very simple way, with no need for PoSt. We will discuss this in detail in the solution.

Arweave

The Arweave mainnet launched in June 2018, and it provides a “permanent storage” service. Arweave observed that storage costs have declined at a rate of about 30% per year over the past few decades, and its model estimates that prepaying storage costs for at least 200 years does not require an alarming amount. Because of this feature, Arweave has become the decentralized storage used by many DeFi and NFT projects, such as Mirror, Solana, Uniswap, Yearn and MakerDAO. Arweave uses a much more conservative estimate when it actually calculates the cost of permanent storage, but will the cost of storage continue to fall so rapidly over the next 200 years? Arweave also does not take the cost of miner bandwidth into account, so it is questionable whether its economic model can truly support permanent storage.
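The prepayment argument is a geometric series, and a short calculation shows how sensitive it is to the assumed decline rate. This sketch is not Arweave's actual pricing formula: the function name is hypothetical, the first-year cost is normalized to 1, and bandwidth is left out, which is exactly the omission criticized above.

```python
def prepaid_cost(first_year_cost: float, annual_decline: float, years: int) -> float:
    """Total cost of storing data for `years` years when the yearly
    storage cost falls by `annual_decline` (e.g. 0.30) every year."""
    keep = 1.0 - annual_decline
    return sum(first_year_cost * keep ** t for t in range(years))


# With a 30% yearly decline, 200 years cost about 3.33x the first year.
total_30 = prepaid_cost(1.0, 0.30, 200)

# With a 5% yearly decline, the same 200 years cost about 20x the first year.
total_05 = prepaid_cost(1.0, 0.05, 200)

print(f"30% decline: {total_30:.2f}x first-year cost")
print(f" 5% decline: {total_05:.2f}x first-year cost")
```

The total is dominated by the first few years when the decline is fast, which is why the rate assumed for the distant future matters so much.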

Arweave stores user data in the transactions of a block, and it created the Blockweave and BlockShadow technologies so that nodes do not need to store all the blocks of the chain. Arweave introduces two new data structures: 1) the blocklist, a hash list of all previous blocks, through which old blocks can be verified and potential new blocks evaluated; 2) the walletlist, a list of all active wallet addresses, which enables nodes to validate new transactions without holding the block that contains the previous transaction. Since the blocklist and walletlist have already been fully verified in the “continuous verification” of each previous block, new miners do not have to download and verify the entire blockchain. This design is presumably intended to reduce costs. If every node had to save complete blocks, all nodes would in effect save a copy of all user data, and that cost is too high. We estimate that Arweave is better suited to files with a small amount of data, such as web pages, and not suited to videos, games and other content with a large amount of data.

Since user data is stored in transactions, Arweave introduced BlockShadow technology to decouple transactions from blocks, so that only the “shadow” of a block is sent between nodes. The shadow contains only a walletlist and a block hashlist, as well as a set of transaction hashes, and from this information a node can reconstruct the complete block. This solves the cost and efficiency problems of block distribution and consensus, but we should dig deeper. Is it a good design to store user data in transactions? Its drawbacks are obvious: it greatly increases the complexity of the system and is very unfavorable for the storage of large files.

Another important feature of Arweave is free data access, which uses the wildfire mechanism to incentivize nodes to share data for free. In reality, servers are placed in different IDCs, and the speed at which users can access servers in different IDCs varies greatly. Generally speaking, a server in an IDC that is closer to the top routers of the backbone network responds faster and gives a better user experience, while a server in an ordinary IDC can only serve users in the same area. Buying bandwidth from a good IDC is very expensive, far exceeding the cost of storage. Therefore, to provide a high-quality user experience, Internet services often need to purchase CDN services. Even the cheapest P2P CDN costs far more than storage. We believe that free data access is not a good design choice, and this will be discussed in more detail in the solution.

Arweave’s wildfire is similar to the ledger in IPFS. A node maintains a ranking of the other nodes that exchange data with it, based on each node’s response speed to network requests and the speed at which data is received from it. The node gives priority to higher-ranked nodes when sending transactions and blocks, which gives those nodes an advantage in mining and thereby incentivizes nodes to respond actively to requests from other nodes. The distribution of user data across the network is extremely important. If user data is always stored on the node “closest” to the user who needs it, the cost and user experience of the entire decentralized storage network will be optimal, with lower cost and better user experience than centralized storage solutions. This is the key to the success of a decentralized storage network. Which data distribution mechanism comes closest to this ideal state? It will be discussed in detail in the solution.
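The ranking idea described above can be sketched as ordering peers by observed throughput. This is an illustrative assumption about how such a ranking could be computed, not Arweave's implementation; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    """Observed exchange history with one peer."""
    peer_id: str
    bytes_received: int
    seconds_elapsed: float

    def throughput(self) -> float:
        """Bytes per second received from this peer (0 if never measured)."""
        if self.seconds_elapsed <= 0:
            return 0.0
        return self.bytes_received / self.seconds_elapsed


def rank_peers(stats: list[PeerStats]) -> list[str]:
    """Peer ids ordered from most to least responsive."""
    ordered = sorted(stats, key=lambda s: s.throughput(), reverse=True)
    return [s.peer_id for s in ordered]


observed = [
    PeerStats("A", bytes_received=1_000, seconds_elapsed=1.0),
    PeerStats("B", bytes_received=5_000, seconds_elapsed=2.0),
    PeerStats("C", bytes_received=100, seconds_elapsed=0.0),
]
print(rank_peers(observed))  # new blocks would be sent to peers in this order
```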

Arweave allows nodes to refuse to serve certain content, and allows democratic votes in network-wide governance to refuse content that is condemned by the majority. Although this mechanism exists, in current practice each node only maintains a blacklist of the hash values of data that it does not want to store. Because usage is still small, this need is not urgent for Arweave, and the current simple mechanism does not seem to have had any adverse effects. A decentralized network serving a large number of users and applications will inevitably encounter this important problem, which involves the laws of different countries, and we will also discuss it in the solution.

SWARM

Swarm was supported by the Ethereum Foundation in its early days and operated independently after raising funds. Let’s first look at Swarm’s economic model. The initial distribution of the BZZ token issued by Swarm is as follows: 7% for the foundation, 42% for private placement, 8% for public offering, 10% for DApp subsidies, 20% for the development team and 13% for the ecosystem, where ecosystem means funding for participating nodes’ equipment and bandwidth costs. Together with 1 million BZZ airdropped through the testnet, the total is 62.5 million. After the mainnet went online, BZZ follows a bonding curve mechanism. When the price of BZZ is higher than the public offering price, new BZZ is issued in the same proportions as the distribution above; when the price falls, BZZ is automatically destroyed to reduce circulation, but this is not a stablecoin mechanism.

Storage nodes earn income through storage incentives, bandwidth incentives and discovery incentives. 1) Storage incentives: users pay for “stamps” to upload data, and Swarm aggregates all stamp payments and redistributes them according to the storage size of each node; 2) Bandwidth incentives: data upload and download between nodes usually involve senders, forwarders and receivers, and both forwarders and receivers are rewarded. The rewards come from traffic fees transferred between nodes. The upper limit of the traffic fee that a node can receive depends on its bandwidth: the higher the bandwidth, the higher the limit; 3) Node discovery incentives. Swarm differs from Filecoin in that the income of nodes mainly comes from the payments of real users, not from mining rewards. Before Swarm went online in 2021, it attracted a large number of miners, reaching more than 400,000 nodes, of which more than 200,000 were active. But after launch, miners soon found that the returns were very low, because Swarm could not attract many real paying users. Many miners could not keep operating for long.

What is the real need for decentralized storage? How big is it? How should it develop and compete with alternatives? These are basic questions that any service-oriented decentralized project must answer, and we will discuss them in the solution. Swarm encourages users to use the service first and pay later. The design of its accounting protocol is unusual: the bandwidth that nodes contribute to each other is accounted for, but nothing has to be paid until the debt of one party reaches a certain level.

In retrieval and transmission, Swarm is similar to a simplified IPFS, with an incentive layer implemented on the xDai sidechain of Ethereum. The basic unit of storage in Swarm is called a chunk, and the maximum size of a chunk is 4k, so large files are not among its design goals. The Swarm team may have chosen a strategy of growing along with the demand for decentralized storage from decentralized applications, rather than attracting users through a rising token price, because most of the speculators attracted that way are not users of decentralized storage. Of these two strategies, which will bring decentralized storage to maturity?
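The "use first, settle later" accounting described above can be sketched as a per-peer balance with a payment threshold. The names, units and threshold below are hypothetical illustrations; the text does not specify how Swarm sets or enforces them.

```python
from dataclasses import dataclass


@dataclass
class PeerBalance:
    """Bandwidth debt that this node owes to one peer."""
    payment_threshold: int  # debt level at which payment becomes due
    debt: int = 0
    total_settled: int = 0

    def consume(self, units: int) -> bool:
        """Record bandwidth used from the peer; return True once payment is due."""
        if units < 0:
            raise ValueError("units must be non-negative")
        self.debt += units
        return self.debt >= self.payment_threshold

    def settle(self) -> int:
        """Pay off the whole outstanding debt and return the amount paid."""
        paid = self.debt
        self.total_settled += paid
        self.debt = 0
        return paid


balance = PeerBalance(payment_threshold=100)
print(balance.consume(40))  # False: still below the threshold, nothing to pay yet
print(balance.consume(70))  # True: debt of 110 has reached the threshold
print(balance.settle())     # 110
```

Small exchanges therefore cost a new user nothing up front, which lowers the barrier to trying the network.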

CRUST

CRUST introduces MPoW (Meaningful Proof-of-Work) based on TEE (Trusted Execution Environment). TEE is a concept proposed by GlobalPlatform, and currently the main implementations are from Intel and ARM. We don’t know much about TEE technology. It requires special hardware to achieve trusted computing, but trusted computing is not necessary for blockchain consensus or for the storage proofs of nodes. If the purpose of trusted computing is to increase efficiency and security, it is not the only choice: good architecture design can achieve the same purpose.

There are other decentralized storage projects as well, such as Sia and Storj, but so far no project fully meets the needs of Web3 for decentralized storage networks, and the market for decentralized storage as a whole is still very small. In this chapter, we will propose the solutions for Mises.

Mises Solutions

1. User needs

The decentralized storage network can serve both 2C users and 2B users, and the needs of the two are very different.

2C users are willing to pay for decentralized storage services to save two types of data: 1) Important personal data; 2) Data that can make money for the user. The data may be text, pictures or videos. At present, centralized cloud drives (iCloud, etc.) have a large number of paying users. A cloud drive gives users a certain amount of free storage space and charges for the excess according to size and storage time. Compared with centralized cloud drives, decentralized storage has the following advantages: 1) The user’s account and content are not controlled by the service provider and truly belong to the user; 2) Security is better: user data may be stored by several different service providers in the network at the same time, which reduces the probability of data being damaged. If the storage costs are comparable, users will choose decentralized storage services.

There are many creators on the Internet who create novels, music and web dramas and hope to earn income from their content. These creators are active on different platforms, contribute content and share the revenue with the platform. Due to the monopoly nature of Internet platforms, creators are often in a weak position: most of the revenue is taken by the platform, and creators have no bargaining power. Creators can instead put their content on the decentralized storage network. The network generates URLs for the content, including an introduction, preview, price and other information, and creators spread them through forums, communities and social media, and later through decentralized social media or communities, where creators truly have their own viewers and customers. No powerful platform takes most of the creator’s revenue; only a small share, determined by open source code, is distributed to the nodes and development teams of the decentralized network.

2B users focus more on using the CDN capability of decentralized storage to give their own users a high-quality experience at low cost. Decentralized storage always looks for the node that is closest and fastest for the end user. This is achievable if there are enough nodes and they are distributed evenly across the Internet, and it also reduces the consumption of Internet backbone bandwidth. Implementation options are discussed in Section 6.

The content of 2B users is often free to its end consumers, while a small amount of premium content is accessible only to members or requires a one-time payment. Since the decentralized network treats each access to content as a transaction, it has to charge gas fees (very small) and bandwidth fees, which the enterprise needs to pay on behalf of its users. If the enterprise suffers a malicious attack in which a program accesses its content at scale, the decentralized network does not handle this, so businesses must deploy their own anti-spam strategies.

With the user needs of decentralized storage services clarified, the following sections give Mises’s solutions. For the specific system design, please refer to Chapter 3.

2. Economic Model

Principle 1: User data is not stored on the blockchain. The upload and download of user data is a transaction between consumers and servers (storage nodes), and each transaction is recorded on the Mises chain. Unlike token transfers, which are atomic, an upload or download transaction takes time to complete, and a mechanism is needed to verify its completion.

User data is not stored on the chain because:

  • 1. It is not necessary: putting data on the chain is not a precondition for decentralized storage;
  • 2. If user data is stored on the chain as transactions, the amount of data synchronized with each transaction increases greatly. The storage required by nodes and the time required to synchronize transactions greatly reduce the efficiency of consensus and increase the cost of nodes;
  • 3. To mitigate the problem in 2), many optimizations like those in Arweave have to be applied, which increases the complexity of the code and weakens the robustness of the entire network.


Principle 2: The storage node is not a node of the Mises chain. Its revenue comes entirely from users’ payments and revenue share, and there is no mining reward.

Mining rewards for storage nodes will bring many disadvantages:

  • 1. They stimulate false storage demand: storage nodes construct fake transactions to obtain rewards, a situation Filecoin currently faces;
  • 2. To prevent nodes from fraudulently obtaining mining rewards, a strict storage proof mechanism has to be introduced and high gas fees are used to raise the cost of cheating, which increases the complexity of the entire system;
  • 3. A large number of tokens are paid out for behavior with no real value. The economic model is not optimized for the lowest overall cost of the decentralized storage service; for example, high gas fees are charged, which is very unfavorable to real demand;
  • 4. Heavy investment in the early stage, without real demand to support it, will inevitably fail in the later stage and hurt the entire industry.


The user’s payment for the storage service is not paid to the storage node all at once. Storage generally has a defined service period. After the user pays, the MIS is held by the Mises chain. A Mises chain node periodically (once a month) requests a storage proof from the storage node. After the node proves storage successfully, it receives one month’s service fee, a small part of which goes into the storage node’s staking pool until the staking pool requirement is met.

Content consumers, creators and storage nodes form a content market, which sets prices for content. When consumers acquire content, they pay the listed price (or a third party, often the operator of a service, pays on their behalf). After the storage node completes the service, it can obtain 25% of the revenue if that share exceeds its minimum service fee. In this way storage nodes can share in the premium income of good content.

Principle 3: The Mises chain adopts the PoS mechanism, and chain nodes need to stake MIS to obtain mining rewards and transaction gas income.

Please refer to the economic model section of the Mises project white paper.

Principle 4: The Mises chain gas fee should be kept low and stable relative to fiat currency. The Mises chain mainly supports decentralized social relations, decentralized storage services and decentralized social media/communities. These are not financial services, so users cannot tolerate high gas fees, and the chain should avoid excessive gas fees caused by currency price fluctuations.

A simple solution is to take the currency price over a period of time (3 months) into account each time the gas fee is calculated. Let p be the average currency price at 9 time points in the last three months and g be the target price of the gas fee in fiat currency. The number of MIS to be paid is m=g/p. If the gas fee is paid by a community or social media operator on behalf of the user, the node charges m MIS. If the user pays personally, the wallet recommends m to the user, and the user can choose to increase or decrease the MIS paid, as long as there are nodes willing to include the transaction.
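The formula m=g/p can be written out as a short sketch. The function name, the sample prices and the target fee below are hypothetical values chosen only to illustrate the calculation; how the 9 price points are sampled is not specified in the text.

```python
from statistics import mean


def gas_fee_in_mis(target_fee_fiat: float, price_samples: list[float]) -> float:
    """Number of MIS to charge so the fee is worth `target_fee_fiat`.

    price_samples: MIS prices in fiat at the sampled time points (p is their mean).
    """
    if not price_samples or any(p <= 0 for p in price_samples):
        raise ValueError("need at least one positive price sample")
    p = mean(price_samples)
    return target_fee_fiat / p


# Hypothetical example: 9 samples over three months, target fee g = 0.001 fiat units.
samples = [0.20, 0.25, 0.30, 0.20, 0.25, 0.30, 0.20, 0.25, 0.30]
m = gas_fee_in_mis(0.001, samples)
print(f"recommended gas fee: {m:.4f} MIS")
```

Averaging over several points smooths short-term price swings, so the fee in MIS moves slowly even when the token price is volatile.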

3. Build and Cost

Let’s first look at two projects: Filecoin and Swarm.

Filecoin was launched by Protocol Labs, which developed IPFS and has a high reputation. In 2017, the Filecoin ICO sold 7.5% of Fil and raised 250 million US dollars, prompting many investors around the world to invest in mining. According to a rough estimate, by the time the Filecoin mainnet went online in 2020, miners had invested billions of dollars in hardware such as servers and storage. After the mainnet went online, miners’ staking, the burning of high gas fees and the 180-day mining release mechanism, coupled with the crypto bull market in the first half of 2021, kept the price of Fil rising and attracted more miners to invest. The current storage power has reached 15.53EB (2022.02.16). Despite such a huge storage capacity, there is not much real demand, and the income of miners depends entirely on the rewarded Fil. If Filecoin does not use this time to develop real decentralized storage users, it must either attract more miners to join or collapse, and miners will lose their huge early investment. We believe that it is basically impossible to develop such a huge demand for decentralized storage.

It was rumored that Swarm was a project promoted by Vitalik and Gavin Wood. It was also popular in the first half of 2021, with a large number of miners participating. At one time, there were more than 400,000 nodes, of which more than 200,000 were active. Apart from the airdrop of a total of 1 million BZZ to the nodes that participated in the test, BZZ rewards after the mainnet launch were very small, and actual usage was very low. A large number of miners found that they could earn only a small amount of BZZ, not enough to pay for server hosting. The price of BZZ did not skyrocket as expected, the popularity of Swarm ebbed, and a large number of nodes became inactive. This is a good thing for Swarm. By gradually developing users, reducing their costs and optimizing their experience, it may find a way to prosper gradually.

Mises believes that a user base cannot be built overnight, and that the growth of nodes should not rest on false prosperity created by hyping tokens. Decentralized storage should not be a financial service in nature. It is a key step for crypto to move beyond the financial field, and whether it succeeds determines whether the crypto industry can grow by dozens of times over the next 20 years.

4. Storage Proof

The storage service has a service period, so it cannot be an instantaneous transaction. During the service period, the storage node is generally required to provide verifiable storage proofs. In particular, a network that gives nodes mining rewards based on storage services requires strict storage proofs. Otherwise, a large share of the reward tokens will be captured by reward farmers (the “wool party”) through cheating, and the incentive will not have its intended effect. Filecoin’s PoRep and PoSt proofs are quite complex and use zk-SNARKs. However, if there is no mining reward, the incentive to cheat for rewards disappears, the node earns more from serving real access, and the node does not need an overly strict storage proof at all.

Mises’s solution is that, during the storage service period, a node of the Mises chain challenges the storage node at a time that is not fixed within each month. The chain node randomly asks the storage node to provide one 4k block of data from a certain content, calculates the hash of the block, and compares it with the hash of the corresponding block in the Merkle tree recorded when the content was uploaded. If they match, the proof succeeds. The size of the data block can be configured. If the block is too large, more data must be transmitted and more computation is required for each verification; if the block is too small, the Merkle tree of the content becomes too large.
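The challenge described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: SHA-256 stands in for whatever hash Mises uses, the chain is assumed to keep the per-block (leaf) hashes from upload time, and all names are hypothetical. Building the upper levels of the Merkle tree is omitted.

```python
import hashlib
import secrets

BLOCK_SIZE = 4 * 1024  # 4k challenge blocks; configurable, as the text notes


def leaf_hashes(content: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Per-block hashes, recorded by the chain when the content is uploaded."""
    return [
        hashlib.sha256(content[i:i + block_size]).hexdigest()
        for i in range(0, len(content), block_size)
    ]


class StorageNode:
    """A storage node that answers challenges from whatever it actually holds."""

    def __init__(self, content: bytes, block_size: int = BLOCK_SIZE) -> None:
        self.content = content
        self.block_size = block_size

    def read_block(self, index: int) -> bytes:
        start = index * self.block_size
        return self.content[start:start + self.block_size]


def challenge(node: StorageNode, recorded: list[str]) -> bool:
    """Chain node picks a random block and checks its hash against the record."""
    if not recorded:
        raise ValueError("no blocks recorded for this content")
    index = secrets.randbelow(len(recorded))
    return hashlib.sha256(node.read_block(index)).hexdigest() == recorded[index]


content = bytes(range(256)) * 64          # 16 KiB of example data, 4 blocks
recorded = leaf_hashes(content)
print(challenge(StorageNode(content), recorded))                   # honest node
print(challenge(StorageNode(b"\x00" * len(content)), recorded))    # data lost
```

A single challenge only samples one block, so a node that has lost part of the data can still pass by luck; repeating the challenge every month over the service period lowers that chance.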

The storage fee paid by the user is not sent to the storage node all at once. It is released over the service time: after each successful storage proof, the network transfers one portion to the node, until the storage service ends. A storage node does not need to stake any tokens when it is initialized. Instead, each time it receives a service fee, 20% of that fee is withheld as the node’s stake, until the stake reaches the required amount. If the data is accessed by other users, that access is equivalent to a more complete storage proof. If a node is found not to hold the complete data on its server, the network deducts from the node’s stake to compensate the paying users, and the node cannot provide services again until it has topped its stake back up.
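The payout and staking rule can be illustrated with a small sketch. Only the 20% withholding rate comes from the text; the class name `StorageDeal`, the equal-installment schedule, integer token units, and the explicit `penalty` argument are assumptions for illustration.

```python
from dataclasses import dataclass

STAKE_RATE_PERCENT = 20  # share of each payout withheld as stake (from the text)


@dataclass
class StorageDeal:
    total_fee: int           # fee escrowed by the user, in smallest token units
    periods: int             # number of proof periods in the service term (assumed)
    required_stake: int      # stake the node must eventually hold (assumed parameter)
    released: int = 0        # fee released from escrow so far
    node_balance: int = 0    # tokens paid out to the node
    node_stake: int = 0      # tokens locked as the node's stake
    proofs_passed: int = 0

    def on_proof_success(self) -> None:
        """Release one installment; withhold 20% of it until the stake is full."""
        if self.proofs_passed >= self.periods:
            raise ValueError("service term already complete")
        self.proofs_passed += 1
        if self.proofs_passed == self.periods:
            installment = self.total_fee - self.released   # absorbs rounding remainder
        else:
            installment = self.total_fee // self.periods
        self.released += installment
        missing = max(self.required_stake - self.node_stake, 0)
        withheld = min(installment * STAKE_RATE_PERCENT // 100, missing)
        self.node_stake += withheld
        self.node_balance += installment - withheld

    def slash(self, penalty: int) -> int:
        """Deduct up to `penalty` from the stake; return the amount owed to the user."""
        taken = min(penalty, self.node_stake)
        self.node_stake -= taken
        return taken


deal = StorageDeal(total_fee=1200, periods=12, required_stake=50)
for _ in range(3):
    deal.on_proof_success()
# After three proofs the node has been paid 250 and its stake has reached 50.
```

How much is deducted when a node fails, and how a suspended node replenishes its stake, are not specified in the text, so the sketch leaves the penalty amount to the caller.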

5. Efficient retrieval

Decentralized storage is usually addressed by the hash of the content, and retrieval is decentralized as well. IPFS, Swarm and other projects have adopted Kademlia-based or improved protocols, but problems remain: 1) Kademlia calculates the “distance” between nodes from their node IDs (which are hashes). As a result, when a retrieval message is relayed, the nodes along the path may be very far apart on the actual Internet, for example one in China and one in the United States, and the network speed may be very slow. Even the improved Coral protocol cannot eliminate this problem; 2) Because retrieval speed cannot be guaranteed, applications with strict real-time requirements, such as TikTok, cannot use it.
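A short sketch makes the first problem concrete. Kademlia’s distance is the XOR of two IDs, so which nodes count as “close” to a key depends only on hash bits. The helper names and the toy 4-bit IDs below are illustrative assumptions.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 256-bit ID by hashing, as Kademlia-style networks typically do."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of the two IDs, read as an integer."""
    return a ^ b


def closest_nodes(target: int, ids: dict[str, int], k: int) -> list[str]:
    """Names of the k nodes whose IDs are XOR-closest to the target key."""
    return sorted(ids, key=lambda name: xor_distance(ids[name], target))[:k]


# Toy 4-bit IDs; the region of each node plays no part in the ordering.
ids = {"a": 0b1000, "b": 0b0001, "c": 0b0111}
regions = {"a": "CN", "b": "US", "c": "EU"}   # never consulted by the lookup
nearest = closest_nodes(0b0000, ids, 2)       # XOR distances: a=8, b=1, c=7
```

Here `nearest` is `["b", "c"]` regardless of where the requester or the nodes are located, which is why a lookup hop may cross continents.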

Mises proposes a two-level index scheme. 1) All nodes of the Mises chain also act as index nodes: they build an index of hash : urls keyed by the hash value of the content. Index updates go through the consensus mechanism of the Mises chain, so the index content is consistent on every node, and a user who connects to any index node retrieves the same result. 2) Alternatively, the chain nodes can keep only a simple index by hash. For example, for content id1, the user first retrieves from the chain nodes the index nodes responsible for it, and then connects to one of those nodes to query; 3) When uploading a file, the user can ask for a preview page to be generated. The preview page directly contains a URL for the file, so the file can be accessed without any retrieval by the user. Only if that URL is invalid are other copies in the decentralized network retrieved.

Content indexed by all nodes is the most efficient to retrieve, since it needs only one query. However, node resources are limited, so the content owner should pay the higher retrieval cost; this option is especially suitable for applications that require high real-time performance. Mises’ solution allows any content to be retrieved with at most two queries, which greatly improves efficiency. In addition, Mises does not split content into many chunks for storage, which greatly reduces the size of the index, so fewer index nodes can serve more content. Finally, most access through the preview page requires no retrieval at all, which greatly reduces the end user’s demand for retrieval. Mises is suitable for services that require a high-quality user experience. In fact, it can achieve a better user experience than centralized services. In the next section, we analyze this from the perspective of download.
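The “at most two queries” property can be sketched as below. This is one possible reading of the two-level scheme, not the actual Mises data model: the names `ChainIndex`, `IndexNode`, `full`, and `delegated`, the example URLs, and the choice of always asking the first responsible index node are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    """Second-level index node holding hash -> URLs for the content it is responsible for."""
    name: str
    entries: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class ChainIndex:
    """First-level index, kept identical on every chain node by consensus."""
    full: dict[str, list[str]] = field(default_factory=dict)             # hash -> URLs
    delegated: dict[str, list[IndexNode]] = field(default_factory=dict)  # hash -> index nodes


def lookup(chain: ChainIndex, content_hash: str) -> tuple[list[str], int]:
    """Return (urls, number_of_queries); the query count never exceeds two."""
    if content_hash in chain.full:
        return chain.full[content_hash], 1          # answered by the chain node itself
    nodes = chain.delegated.get(content_hash)
    if not nodes:
        return [], 1                                # unknown content
    return nodes[0].entries.get(content_hash, []), 2  # one extra hop to an index node


idx = IndexNode("index-1", {"id1": ["https://node-a.example/id1"]})
chain = ChainIndex(
    full={"hot": ["https://node-b.example/hot"]},
    delegated={"id1": [idx]},
)
urls_hot, queries_hot = lookup(chain, "hot")   # fully indexed content: one query
urls_id1, queries_id1 = lookup(chain, "id1")   # delegated content: two queries
```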

6. Content distribution and CDN

1) Network distance

The Internet connects different types of networks in different countries and regions through routers. Generally speaking, accessing a server in another country and region will be slower than accessing a server in this region. There are three reasons:

  • 1. Accessing other countries and regions requires transit through more routers;
  • 2. Backbone network bandwidth between countries and regions is limited;
  • 3. Other non-technical reasons.

Routers also operate at different levels. A top-level router generally exchanges routing information only with other top-level routers, using the BGP protocol. A server connected directly to a top-level router has routing information that is more easily known to other routers. If we regard the entire Internet as a graph, with routers as nodes and the access speed between two directly connected routers as the distance between them, we can calculate the “distance” between any two computers on the Internet.
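The graph view above maps directly onto a shortest-path computation. The following sketch uses Dijkstra’s algorithm, with link latency in milliseconds standing in for “access speed”; the router names and latency values are made up for illustration.

```python
import heapq


def network_distance(graph: dict[str, dict[str, float]], src: str, dst: str) -> float:
    """Shortest-path distance between two routers (Dijkstra's algorithm).

    `graph[u][v]` is the measured cost of the direct link u -> v,
    for example its latency in milliseconds.
    """
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbour, cost in graph.get(node, {}).items():
            candidate = dist + cost
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return float("inf")  # dst is unreachable from src


# Hypothetical three-router topology: the direct A-C link is slower than going via B.
routers = {
    "A": {"B": 5, "C": 50},
    "B": {"A": 5, "C": 10},
    "C": {"A": 50, "B": 10},
}
a_to_c = network_distance(routers, "A", "C")   # 15, via B
```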

IP addresses are allocated in advance to Internet operators, which then assign them to the computers that connect through them. Knowing an IP address therefore usually reveals its approximate location on the Internet. The Internet can be divided into different AS domains. Computers in the same domain are relatively close to one another, so retrieval and storage services provided by servers in the user’s own AS domain are much more efficient. The index and storage nodes of Mises will be grouped by AS domain in order to provide a better user experience than centralized services.

The index nodes, storage nodes and users in the Mises network each need to determine their own domain. Secondary index nodes only index the content saved by storage nodes in their own domain. When an index node responds to a user’s retrieval request, it always returns the content saved by storage nodes in the same domain as the user. Domains are not static: nodes and users need to regularly measure the access speed to their connected peers so that the domain division stays as accurate as possible.
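One way to picture domain selection and domain-local answers is the sketch below. The text does not say how a domain is chosen from speed measurements, so picking the lowest median round-trip time is an assumption, as are the function names, the domain labels, and the example URL.

```python
from statistics import median


def pick_domain(rtt_samples: dict[str, list[float]]) -> str:
    """Choose the domain whose probed peers show the lowest median round-trip time.

    `rtt_samples` maps a domain label to RTT measurements (ms) against peers
    in that domain; each list must be non-empty.
    """
    return min(rtt_samples, key=lambda domain: median(rtt_samples[domain]))


def urls_for_user(index: dict[str, dict[str, list[str]]],
                  content_hash: str, user_domain: str) -> list[str]:
    """Index node side: return only copies held by storage nodes in the user's domain."""
    return index.get(user_domain, {}).get(content_hash, [])


# Re-running pick_domain on fresh measurements lets the assignment follow the network.
samples = {"AS-1": [20.0, 25.0, 300.0], "AS-2": [80.0, 90.0, 85.0]}
domain = pick_domain(samples)   # medians: AS-1 = 25 ms, AS-2 = 85 ms

index = {"AS-1": {"id1": ["https://store-1.example/id1"]}, "AS-2": {}}
local_urls = urls_for_user(index, "id1", domain)
```

Using the median rather than the mean keeps a single slow probe (the 300 ms sample here) from moving a peer into the wrong domain.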

2) Content distribution

Principle 1: Keep content closest to your visitors.

No single server on the Internet can give every visitor fast access, so applications with a large number of users need to purchase CDN services to improve the user experience. The essence of a CDN service is to keep content as close to users as possible. If a decentralized storage network has enough nodes, it does not need a third-party CDN service: it can act as its own CDN by adjusting how content is distributed across storage nodes.

Principle 2: Visitors to content are mostly in the same area as content creators.

Mises users first connect to chain nodes, index nodes and storage nodes in the same region as themselves. When a user uploads content, storage nodes in the same area are selected preferentially. The decentralized storage network also has a content “replication” strategy: when access to a piece of content increases in a certain area, a storage node in that area can actively request to store the content, and the uploading user does not pay that node a storage fee. The node instead earns income from the users who access the content, and it may stop storing the content at any time.
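The replication trigger can be sketched as a simple per-region access count. The text gives no concrete rule for when access has “increased”, so the fixed `threshold` and the function name below are purely illustrative assumptions.

```python
from collections import Counter


def regions_needing_replica(access_log: list[str],
                            replica_regions: set[str],
                            threshold: int) -> list[str]:
    """Regions with at least `threshold` recent accesses and no local copy yet.

    `access_log` holds the region of each recent request for one piece of content.
    """
    counts = Counter(access_log)
    return sorted(region for region, hits in counts.items()
                  if hits >= threshold and region not in replica_regions)


log = ["EU"] * 5 + ["ASIA"] * 2 + ["US"] * 7
candidates = regions_needing_replica(log, replica_regions={"US"}, threshold=3)
# "EU" qualifies; "US" already holds a copy and "ASIA" is below the threshold.
```

A storage node in a candidate region would then decide for itself whether the expected access income justifies storing the content.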

2B users who need CDN services can specify multiple areas for their content, together with the number of storage nodes required in each area. These 2B users pay the storage and bandwidth fees for the content, and the users who access it pay nothing.

3) DSM

Users (content supply users, content consumption users), storage nodes, and index nodes together form a storage market and a content consumption market.

Storage nodes quote prices per unit of storage space and time. Each node can see the quotes of all storage nodes and can adjust its own offer at any time.

When a user (referred to as user A) stores personal data, A can see the quotes of the nodes in the same area. A can either choose a storage node directly or simply leave it to the network to match one at the market price.

When a user (referred to as user B) stores content intended for dissemination, B selects a storage node and also sets the price that consuming users pay to access the content. This price cannot be lower than the minimum price specified by the storage node.
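The two cases above, network matching for user A and the price floor for user B, can be sketched as follows. The `Quote` fields, the unit of pricing, and the rule that matching means “cheapest quote in the user’s region” are assumptions; the text only says matching follows the market price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    node: str
    region: str
    storage_price: int      # price per unit of space and time, in token units
    min_access_price: int   # lowest access price this node will serve content for


def match_node(quotes: list[Quote], region: str) -> Quote:
    """Network-side matching for user A: cheapest storage quote in the user's region."""
    local = [q for q in quotes if q.region == region]
    if not local:
        raise LookupError(f"no storage node quoting in region {region!r}")
    return min(local, key=lambda q: q.storage_price)


def validate_access_price(quote: Quote, access_price: int) -> None:
    """User B's access price must not undercut the storage node's minimum."""
    if access_price < quote.min_access_price:
        raise ValueError("access price is below the node's minimum price")


quotes = [
    Quote("node-1", "EU", storage_price=12, min_access_price=3),
    Quote("node-2", "EU", storage_price=9, min_access_price=5),
    Quote("node-3", "US", storage_price=4, min_access_price=1),
]
chosen = match_node(quotes, "EU")      # node-2: the cheapest quote within EU
validate_access_price(chosen, 5)       # accepted: equal to the node's minimum
```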

2B users are actually buying CDN services (suspended)

When a content-consuming user accesses paid content, the user first sees the preview page of the content and then decides whether to pay to view the full content. The free content of 2B users can be browsed directly.

Does decentralized storage not moderate content? Is it impossible to delete user-uploaded content? What if a node or user violates the laws of the country where they are located? These issues are very important. Kazaa and BitTorrent have faced many lawsuits around the world over the past 20 years because of pirated content. Studying these lawsuits will help us find solutions that not only protect users’ data but also keep nodes from breaking the law.

Principle 1: User data for personal use only is not reviewed and cannot be deleted unless the user is in arrears.

Mises encrypts the user’s personal data before storing it, and only the user can decrypt it with the private key. Neither the node nor any third party knows what the content is, so auditing it is impossible. Since viewing the content requires the user’s private key, the user cannot disseminate it, the content cannot have any social impact, and there is no need for censorship. The node also has no right to delete the user’s private content unless the user defaults on the storage service fee.

Principle 2: Content that the user intends to disseminate can be accessed, through the URL or preview page generated on upload, from the Mises browser or any other browser.

When a user uploads content and wants third parties to access it, Mises generates a URL for the content. The user can spread the URL through their own social media or community, and other users can open it in a browser. If the user wants to charge for access, Mises generates a preview page URL for the content, and other users decide after seeing the preview page whether to pay the user MIS to view the full content. With the Mises browser, if the URL is invalid, the content can still be accessed: the browser queries the Mises network by the hash id of the content to find copies saved by other nodes.

Principle 3: The node can view the content that a user stores for dissemination (via the URL or preview page), and the node has the right to refuse to provide services for that content.

Every country has laws that prohibit some content from being distributed on the Internet. For example, viewing child pornography is illegal in the United States. Nodes that operate over the long term cannot violate local laws, so nodes have the right to refuse to provide services for certain content. A node can view the content that a user stores for dissemination. If the content violates the law where the node is located, or the religious beliefs of the node operator, the node can file a refusal of service on those grounds. Based on the reason for the refusal, the decentralized network then looks for another node to provide the service (for example, a node in another country). The refusing node can then delete the content and return the MIS it has collected, and the index nodes are updated accordingly. If all nodes refuse to serve the content, the Mises network refunds the MIS to the user after deducting the gas cost.

In Mises’ scheme, the node is the legally responsible party for the content. A node is not outside the laws of the country where it is located, but there are more than 200 countries in the world, each with different laws. It is this diversity that gives different content the opportunity to find the largest possible living space through the decentralized network.