Didn't notice that - just learned of Storj.
That article though - you move directly from storage pricing to comparing it against transfer pricing? What?!?
Now that doesn't make sense - since when did transit capacity become storage capacity?
Apples to Oranges.
Now, the going rate per TiB per month at the cheapest end of the current dedicated-server market is about 7.5 for low-end servers and around 6.2 for very large nodes.
Hence, with the current most cost-effective offers out there, the average works out to somewhere around 7.25 per TiB per month, i.e. roughly 0.70 for 100 GiB a month.
And for someone like me who has his own DC... well, it sits on the CAPEX rather than the OPEX side of the sheet, but converted to OPEX it's a fraction of that.

It depends on what kind of deals you get, what bulk pricing you get, whether you buy outright or lease or rent, what level of hardware you are utilizing, and so on. If you buy really big and seriously skimp on hardware quality, we are talking potentially under 1.5/TiB/month, plus bandwidth fees.
So approximately 0.15 per 100 GiB per month with current HDD pricing at cost level, running a high-efficiency operation.
So, on a system like this, we could fairly easily see pricing fall to about quadruple that, and once storage providers start to outweigh the user base, I'd expect it to drop to just a 25-30% markup over cost.
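Just to put rough numbers on that (using my at-cost figure from above, purely illustrative):

```python
# Purely illustrative: what 100 GiB/month could cost at different markups
# over the ~1.5/TiB/month at-cost figure mentioned above.
at_cost_per_tib = 1.50
per_100gib_at_cost = at_cost_per_tib * 100 / 1024      # ~0.15 at cost

for label, multiplier in [("at cost", 1.0),
                          ("~4x early-market price", 4.0),
                          ("25-30% markup", 1.275)]:
    print(f"{label:>24}: {per_100gib_at_cost * multiplier:.2f} per 100 GiB/month")
```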
If the average lifetime of data is 1 year and it is read 4 times during that time, the full data set gets read out roughly once every 3 months, hence with 1 Gbps you can host about 790 TiB. A 1 Gbps-connected server never really achieves more than 85-95% of the nominal maximum; in fact, no link does, since usually around 5% goes to framing and error-correction overhead.
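The back-of-the-envelope behind that, assuming ~90% usable link rate:

```python
# Back-of-the-envelope: how much data a single 1 Gbps uplink can serve,
# using the assumptions above (1-year data lifetime, read 4 times,
# ~90% of the nominal link rate actually usable).
TIB = 2**40

link_gbps = 1.0
efficiency = 0.90                                     # usable share of the link
reads_per_lifetime = 4
lifetime_days = 365
days_per_readout = lifetime_days / reads_per_lifetime # ~91 days per full read-out

usable_bytes_per_sec = link_gbps * 1e9 / 8 * efficiency
bytes_served_per_readout = usable_bytes_per_sec * days_per_readout * 86400

print(f"hostable per 1 Gbps: {bytes_served_per_readout / TIB:.0f} TiB")
# -> roughly 800 TiB, the same ballpark as the ~790 TiB quoted above
```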
The true read ratios, cold-data timespans, etc. will only be revealed in production; they're practically impossible to predict up front.
What kind of erasure coding is in the plans?
Further, deduplication on a GRAND scale would be worth it on something like this - on blocks/files larger than 1000 MB or so, so that the hash tables stay somewhat sanely sized. That can be worked in later on... Just saying, deduplication along with erasure coding in this kind of system is not only a GREAT idea, but almost a necessity. Further, the dedup reference count could increase the redundancy factor for that data, dynamically making data that is of interest to more people more resilient to failures, and the cost could be shared by all the users who wanted to store that data, driving the cost down dramatically for those putting in data others have already put in (rough sketch below).
Those dedup tables will still consume INSANE amounts of storage - but hey, that's what's abundantly available
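Very roughly, the idea looks like this (the names and the redundancy formula are made up to illustrate, not anything Storj has actually specified):

```python
# Toy sketch of content-addressed dedup where popular data gets extra
# redundancy and its storage cost is split across everyone referencing it.
# All names and the redundancy formula here are illustrative only.
import hashlib
import math

class DedupIndex:
    def __init__(self, base_redundancy=3):
        self.base_redundancy = base_redundancy
        self.refcount = {}          # block hash -> number of owners

    def put(self, block: bytes) -> str:
        h = hashlib.sha256(block).hexdigest()
        self.refcount[h] = self.refcount.get(h, 0) + 1
        return h

    def redundancy(self, h: str) -> int:
        # More owners -> more copies/shares, growing slowly (log2).
        return self.base_redundancy + int(math.log2(self.refcount[h]))

    def cost_share(self, h: str, full_cost: float) -> float:
        # Each owner pays an equal slice of the (slightly larger) footprint.
        return full_cost * self.redundancy(h) / self.base_redundancy / self.refcount[h]

idx = DedupIndex()
for _ in range(8):                  # eight users store the same large block
    h = idx.put(b"popular-linux-iso-contents")
print(idx.redundancy(h))            # 3 + log2(8) = 6 copies/shares
print(idx.cost_share(h, 1.0))       # each owner pays 6/3/8 = 0.25x the solo cost
```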

It would still work rather fast with nicely optimized lookup trees - split those hashes up!

A single lookup would consume only a couple of I/Os; it needs some careful thought to work out the I/O count and to minimize it.
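For example, sharding the index by a short hash prefix keeps each shard small enough that a lookup is basically one open plus one read (purely a sketch, not any existing implementation):

```python
# Purely illustrative: shard the dedup index by hash prefix so a lookup
# touches only one small shard file (one open + one read) instead of one
# giant table.
import hashlib
import os
import pickle

SHARD_PREFIX_BYTES = 2          # 65536 shards; tune so each shard stays small

def shard_path(root: str, digest: bytes) -> str:
    return os.path.join(root, digest[:SHARD_PREFIX_BYTES].hex() + ".idx")

def lookup(root: str, block: bytes):
    digest = hashlib.sha256(block).digest()
    path = shard_path(root, digest)
    if not os.path.exists(path):        # I/O #1: stat/open the shard
        return None
    with open(path, "rb") as f:         # I/O #2: read the (small) shard
        shard = pickle.load(f)          # dict: digest -> block location
    return shard.get(digest)            # in-memory from here on
```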
Just because it's cloud doesn't mean resources ought to be wasted.

EDIT: Oh yeah, and I'm sorry to say, but reading that article gave the impression that the writer doesn't know sh** about industrial-scale computing - at the very least, explain the reasoning behind comparing apples to oranges - and the assumption that 1 Gbps can actually do 1 Gbps shows the writer had no idea about network technology. In all my 15 or so years in the hosting industry I've probably never seen above 120 MB/s on a 1 Gig link, and over the internet never above 116 MB/s, and even that is a freakishly rare occurrence; an average node on an average network commonly stalls at 95-105 MB/s.
We are using free data sources for our prototype application, so transit capacity does actually become data storage capacity. As we start to overload these sources, it should be clear that the rest of that margin becomes profit, which would be split among the people who actually store the data (at whatever rates the market decides).
We want to build legacy APIs to provide integrations with existing platforms. If we can make this easy enough, we don't see providers outpacing consumers as quickly as one might think.
As far as erasure coding goes, zfec looks best because we are a Python shop right now. We plan to implement multiple data sources, so Tahoe-LAFS integration would probably be a nice little fallback if you want proper erasure coding.
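A rough sketch of how we would expect to use it; the k/m values are just examples, and the exact zfec encode/decode signatures should be double-checked against the zfec docs:

```python
# Rough sketch of k-of-m erasure coding with zfec. The k/m choice is only an
# example; verify the encode/decode signatures against the installed zfec
# version's documentation.
import zfec

k, m = 4, 6                               # any 4 of 6 blocks recover the data
block_size = 1024
data = b"\x42" * (k * block_size)         # toy payload, pre-padded to k blocks

primary = [data[i * block_size:(i + 1) * block_size] for i in range(k)]

# Encoder(k, m).encode() produces the m - k check blocks.
check = zfec.Encoder(k, m).encode(primary)
blocks = list(primary) + list(check)       # block numbers 0 .. m-1

# Pretend blocks 1 and 4 were lost; feed any k survivors (with their numbers)
# back to the decoder in ascending order.
survivors = [(n, b) for n, b in enumerate(blocks) if n not in (1, 4)][:k]
recovered = zfec.Decoder(k, m).decode([b for _, b in survivors],
                                      [n for n, _ in survivors])
print(b"".join(recovered) == data)

# Overhead: m/k = 1.5x storage for 2-loss tolerance, vs 3x for plain replication.
```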
1 Gbps = 1 Gbps is just a simplification for describing the cost disparity. If you could give us some good sources on that, we can have it updated.