

The IPv4 depletion disclaimer

2009-05-30

Introduction

We all know it: the pool of unallocated IPv4 addresses is steadily running out, and the path forward is to migrate to IPv6. It is no longer a matter of if, but rather a question of when to be ready for the new protocol. It is vital that we can accurately predict the IPv4 depletion date so that stakeholders can plan their migration to IPv6 accordingly.

The methods used to predict the IPv4 depletion date vary from relatively straightforward curve fitting [1] to complex simulation approaches [2]. Perhaps the best-known and most comprehensive study is Geoff Huston’s IPv4 Address Report [3]. His estimates are often used by the industry as the leading prediction of the depletion date. My own efforts in the area of IPv4 depletion have been focused on creating a publicly available prediction tool and report.

Huston’s calculations and data have been the main inspiration for my work. Without his collection of data, my work would not be possible. I had an opportunity to discuss IPv4 depletion with him at the ARIN XXIII meeting in San Antonio, TX in April ’09. He gave me some insights into how IANA works, and we discussed some issues that I had recently found with the 188/8 block. With the insights we gained from each other, we both made changes to our predictions afterwards.

It is beneficial for the Internet community that we now have two IPv4 depletion estimates that both point in the same direction, even if they differ somewhat. We both agree that the IANA pool will be depleted less than two years from now. But why is there a seven-month difference between my calculations and Huston’s?

To verify my calculations, I had to look into what differed between us. This article describes what I found. I also thought it would be a good idea to inform the community that we are going to run out of IPv4 addresses earlier than the most referenced prediction claims.

My intent has never been to find issues with somebody else’s work. The findings described in this article are a result of my curiosity when I realized that my estimates came out earlier than Huston’s. Furthermore, I was troubled by the fact that Huston’s predicted “time until depletion” got higher as the days went by. See figure 1.


#      Suggested IANA depletion date   Date of prediction   Days until depletion
1      2 Apr. 2011                     28 Feb. 2009         763
2      22 Aug. 2011                    30 Apr. 2009         844
Diff   142 days                        61 days              +81 days

Figure 1

In my opinion, the “time until depletion” can perhaps come to a standstill for a while if we have a low consumption rate. But I had to question the validity of a projection that, over a long period of time, keeps moving the “time until depletion” further into the future.

The IPv4 depletion tool

The IPv4 depletion tool has been available online at www.ipv4depletion.com for about a year. However, it was only recently, at the Google IPv6 Implementors Conference [4] in March ’09 and then at the Rocky Mountain IPv6 Summit [5] in April ’09, that the tool was announced to a broader audience. The goal of the tool was to make as many variables as possible selectable by the user. This flexibility allows anybody to create their own estimate without detailed knowledge of the underlying mathematics and allocation policies. I’m also tracking my tool’s output with my default settings and comparing the results to other experts’ research.

When I built my IPv4 depletion tool I had to make a choice: I could either base my tool on the algorithms and source code from Huston, or I could implement a completely new calculation engine. I decided to create a new engine so that I could compare my findings with Huston’s and see how and if they differ.

The tool has been active for approximately a year now, and the results consistently point to an IANA depletion date by the end of next year. When tracking my predictions over a period of time, one can see how large allocations show up like steps, while days without any significant allocations slowly move the prediction date forward. See figure and table 2 below. The fact that each change in the prediction can be traced back to an actual allocation event attests to the validity of the prediction.


#   Date          Network          Requester                          Size (number of IPs)   Impact (days)
1   18 Dec. 2008  174.192.0.0/10   Verizon Wireless                   4,194,304              18
2   12 Feb. 2009  112.224.0.0/11   China Unicom Shandong province     2,097,152              12
3   9 Apr. 2009   110.96.0.0/11    China TieTong Telecommunications   2,097,152              5
4   6 May 2009    111.0.0.0/10     China Mobile                       4,194,304              18

Figure / Table 2: Examples of large allocations and their impact on the overall prediction.
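
As a rough sanity check on the impact column (my own back-of-the-envelope illustration, not output from the tool, and it assumes the shift scales linearly with the allocation size), the consumption rate implied by the first row can be backed out:

    # Illustration only: implied marginal consumption rate behind the
    # "Impact (days)" column, using the Verizon Wireless row above.
    my $addresses  = 4_194_304;   # size of the 174.192.0.0/10 allocation
    my $days_shift = 18;          # how much the predicted date moved
    printf "implied consumption near depletion: ~%.0f addresses/day\n",
        $addresses / $days_shift;

That works out to roughly 230,000 addresses per day being consumed at the margin; the other rows give figures of the same order of magnitude.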

Top down / Bottom up approach

My projections are calculated individually per RIR in a bottom-up approach. The first thing I calculate is the usage and the remaining pool of IPv4 addresses in each region. If a certain threshold is met, the program simulates that the RIR in question requests additional space from IANA.
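
To illustrate the structure of such a bottom-up calculation (a simplified, hypothetical sketch; the starting pools, the demand function and the threshold rule below are all placeholders rather than the values and policies used by the actual tool):

    # Hypothetical sketch of a bottom-up, per-RIR depletion simulation.
    use strict;
    use warnings;

    my $BLOCK = 2**24;                      # number of addresses in one /8
    my %rir_pool = (ARIN  => 2.5 * $BLOCK, RIPE    => 2.1 * $BLOCK,
                    APNIC => 1.8 * $BLOCK, LACNIC  => 1.5 * $BLOCK,
                    AFRINIC => 2.9 * $BLOCK);      # made-up starting pools
    my $iana_pool = 25 * $BLOCK;            # unallocated /8s left at IANA
    my $day = 0;

    # Placeholder demand model: addresses a RIR hands out on a given day.
    sub demand {
        my ($rir, $day) = @_;
        my %base = (ARIN => 90_000, RIPE => 95_000, APNIC => 110_000,
                    LACNIC => 25_000, AFRINIC => 8_000);
        return $base{$rir} * exp(0.0008 * $day);    # exponential growth
    }

    while ($iana_pool > 0) {
        $day++;
        for my $rir (keys %rir_pool) {
            $rir_pool{$rir} -= demand($rir, $day);
            # Simplified trigger: request two /8s from IANA when the regional
            # pool covers less than roughly nine months of projected demand.
            if ($rir_pool{$rir} < 270 * demand($rir, $day) && $iana_pool > 0) {
                my $grant = $iana_pool >= 2 * $BLOCK ? 2 * $BLOCK : $iana_pool;
                $iana_pool      -= $grant;
                $rir_pool{$rir} += $grant;
            }
        }
    }
    printf "IANA pool depleted after %d simulated days\n", $day;

Each RIR drains its own pool at its own pace, and the IANA depletion date simply falls out as the day of the last grant from IANA; a region that never hits its threshold again has no influence on that date.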

Huston calculates the depletion dates in a top-down approach: the first thing he calculates is the summarized address consumption for all RIRs combined, which he then breaks up into the individual RIRs based on a calculated ratio between them. The problem with that approach is that it incorrectly takes into account all allocations, in all regions, between a RIR and its members. This is not how it works in reality.

For example, I don’t assume that AfriNIC will make any more requests to IANA. Therefore, any allocations in that region have no impact on the overall IANA depletion date in my model (unless there is an insane growth in allocations in Africa, forcing AfriNIC to request more space from IANA). With my calculations, APNIC comes out as the RIR that will make the final allocation request to IANA. So right now, the complete system is defined by the allocations being made in the APNIC region. This can, however, change if one of the RIRs has an unexpected drop or uptick in allocations. RIPE is predicted to make its final allocation just two weeks before APNIC. If RIPE’s allocation rate continues to be low, we might soon see them defining the system.

Another issue with a top-down approach that uses IANA data is that it estimates the IANA consumption with a continuous line, where in reality the IANA consumption is made up of discrete allocation events. The continuous line will create a significant rounding error for the IANA depletion date. See figure 3 below. Huston’s calculation is, however, not affected by this, as he does not use the IANA consumption statistics in his model.

Figure 3

Mathematical model

I’m using a mix of linear and exponential models. Exponential models have an undeservedly bad reputation for growing too quickly. However, when fitted correctly to the data one is estimating, they produce very accurate and relevant results. Several phenomena in biology, physics and economics (bacterial growth, nuclear chain reactions and Madoff’s Ponzi scheme) can be described by an exponential model.

Furthermore, an exponential model is consistent with Reed’s law on how networks can be valued [6], and the size of the Internet will grow in proportion to the value of the Internet: the higher the value of the network, the more devices and users will be connected.

Huston is using a polynomial model. Although Metcalfe used a polynomial function in his law describing the value of a network [7], I tend to agree with Dr. Reed’s criticism of the polynomial model as an estimator for the value of a network. I don’t think it is relevant to use a polynomial model in this context.
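
To make the comparison concrete, the growth models discussed here can be written as follows (my notation; t is time and A, B, C are the constants to be fitted):

\[
N_{\text{lin}}(t) = A\,t + B, \qquad
N_{\text{exp}}(t) = A\,e^{B t}, \qquad
N_{\text{poly}}(t) = A\,t^{2} + B\,t + C
\]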

There are also some additional problems with a polynomial model that need to be taken care of if one chooses to use it. The problem is that the model can bend and create a decelerating growth that is slower than linear growth. This seems to have happened lately with Huston’s prediction. For example, when looking at figure 18b (http://www.potaroo.net/tools/ipv4/fig18.png) you can see that the light blue line that represents the polynomial growth is slower than the green line representing linear growth. This gives you an idea of what can happen with polynomial models.

Least Square fitting

I have implemented three different least square fitting algorithms in my program: one for linear, one for exponential and one for polynomial models (however, the polynomial fit is not being used because of the issues identified and described above).
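
As a point of reference, the linear case is the simplest of the three. A minimal version over equally spaced samples (a sketch in the spirit of my implementation, not the exact code of the tool) looks like this:

    # Minimal linear least-squares fit of y = a*x + b over x = 0 .. n-1.
    # Assumes at least two data points.
    sub linear_fit {
        my (@y) = @_;
        my $n = scalar(@y);
        my ($sx, $sxx, $sy, $sxy) = (0, 0, 0, 0);
        for my $x (0 .. $n - 1) {
            $sx  += $x;
            $sxx += $x * $x;
            $sy  += $y[$x];
            $sxy += $x * $y[$x];
        }
        my $a = ($n * $sxy - $sx * $sy) / ($n * $sxx - $sx * $sx);   # slope
        my $b = ($sy - $a * $sx) / $n;                               # intercept
        return ($a, $b);
    }

The exponential and polynomial fits follow the same pattern but solve larger systems of normal equations.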

Huston has only implemented a linear least square fitting model in his program. He is using the first order differential of his input to create a 2nd order polynomial fitting model. This is, however, mathematically incorrect, as he loses one degree of freedom. His result from this is of the form Ax² + Bx and not Ax² + Bx + C.

To make up for the lost C in the formula above, he estimates C from the current usage (that is how I understand his code) [8]. This glues his future estimates nicely to the historical data; he estimates C by minimizing the error against the original data series, but that is again not mathematically correct. Instead, he should have implemented a polynomial least square fit algorithm that would have given him all three constants.

There is a similar problem in the exponential function that he uses to break up the summarized view into the individual RIRs. Instead of implementing an exponential least square fit model, he takes the logarithm of his values and runs them through his linear fit model. This will not produce the expected result, as larger Y-values are incorrectly penalized [9].
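
The remedy described in [9] is to weight the log-transformed points by y, so that the fit minimizes absolute rather than relative errors. A sketch of that approach (again assuming equally spaced samples and strictly positive values) is:

    # Exponential least-squares fit of y = A * exp(B*x) using the
    # y-weighted log-linear formulas from [9]; x runs 0 .. n-1.
    sub exponential_fit {
        my (@y) = @_;
        my ($sy, $sxy, $sx2y, $sylny, $sxylny) = (0, 0, 0, 0, 0);
        for my $x (0 .. $#y) {
            my $y = $y[$x];
            $sy     += $y;
            $sxy    += $x * $y;
            $sx2y   += $x * $x * $y;
            $sylny  += $y * log($y);
            $sxylny += $x * $y * log($y);
        }
        my $den = $sy * $sx2y - $sxy * $sxy;
        my $a = ($sx2y * $sylny - $sxy * $sxylny) / $den;   # a = ln(A)
        my $b = ($sy * $sxylny - $sxy * $sylny)   / $den;   # growth rate B
        return (exp($a), $b);
    }

Simply taking logarithms and reusing the linear fit, by contrast, minimizes relative rather than absolute errors, which effectively down-weights the large y-values that matter most in this application.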

Smoothing the data

I’m using the raw data from the individual delegated files from each RIR.

Huston is smoothing his data with three passes of a sliding window smoothing function. Smoothing before least square fitting should not be done, as it destroys the Gaussian properties of the data: the residuals are no longer independent. Books on the subject of regression analysis warn against smoothing the data before applying a least square fit [10].
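
For readers unfamiliar with the technique being discussed, a sliding-window smoother is essentially a moving average. A generic single pass (my illustration, not Huston’s actual code) looks like this; running it three times over the series gives a “three pass” smoothing:

    # Generic sliding-window (moving average) smoother: one pass over the
    # series; applying it repeatedly gives multi-pass smoothing.
    sub smooth_once {
        my ($half, @y) = @_;            # $half = half-width of the window
        my @out;
        for my $i (0 .. $#y) {
            my $lo = $i - $half < 0   ? 0   : $i - $half;
            my $hi = $i + $half > $#y ? $#y : $i + $half;
            my $sum = 0;
            $sum += $y[$_] for $lo .. $hi;
            push @out, $sum / ($hi - $lo + 1);
        }
        return @out;
    }

The averaging makes neighbouring residuals correlated, which is exactly why the smoothed series no longer satisfies the independence assumption behind a least square fit.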

RIR pool estimates

My model for when, and how much, each RIR will request from IANA implements the policy as it is written, with a dynamic model for when the RIR requests more space [11]. I argue that the policy means that we are going to see the RIRs request space from IANA earlier and earlier as their demand grows, and that the combined RIR pools will hold over 20 /8 blocks at the IANA depletion date.
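
As a sketch of how I read the request trigger in [11] (the policy text itself is authoritative, not this snippet, and the need estimate would come from the fitted growth model):

    # My reading of the IANA-to-RIR allocation policy [11]: a RIR becomes
    # eligible for more space when its available pool is smaller than half
    # a /8 or smaller than its projected need for the next nine months,
    # whichever is larger.
    sub rir_wants_more_space {
        my ($available, $projected_monthly_need) = @_;
        my $threshold   = 0.5 * 2**24;                  # half a /8
        my $nine_months = 9 * $projected_monthly_need;
        $threshold = $nine_months if $nine_months > $threshold;
        return $available < $threshold;
    }

Because the nine-month need grows together with the fitted demand curve, the trigger level rises over time; this is why the RIRs request space earlier and earlier and why the combined RIR pools end up larger at the IANA depletion date than a static threshold suggests.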

We might also see a small rush for the last IANA blocks, resulting in a pumped-up RIR pool at the IANA depletion date. Discussing an IANA-to-RIR rush is perhaps more controversial than discussing a “big/bad ISP”-to-RIR rush. However, it is not a completely unrealistic thought that the RIRs will be pretty “trigger happy” to claim their perceived fair share of addresses as the IANA pool diminishes.

Huston uses a static, low threshold for when RIRs request more space from IANA. This static low threshold model seems to underestimate the RIR pool sizes at the IANA depletion date: in his model, the combined pools at the IANA depletion date hold merely around 17.5 blocks. We can see this in the last saw tooth of the green line in figure 29g (http://www.potaroo.net/tools/ipv4/fig29g.png); the saw tooth dips down to about 17.5 blocks. I argue that this estimate is too low and that it affects the estimate of the IANA depletion date by delaying it by about 3 months.

Regional set aside policies

Each region has decided on, or is discussing, a policy where a fairly large amount of IPv4 address space is set aside and will only be allocated in smaller chunks, with a waiting period between each request [12]. In practice, this means that you cannot take for granted that those last IPv4 addresses will be available to you. My tool takes those policies into account; Huston’s prediction does not.

Conclusion

With correct mathematics applied, one can conclude that current research on IPv4 depletion overestimates the remaining time until the free IPv4 pool is depleted.

The remaining IANA pool (at the time this article was written) consists of 25 unallocated /8 blocks and 5 reserved /8 blocks that are being saved for the so-called N=1 policy [13]. Given that only 25 blocks remain at IANA, and that IANA allocations have typically been made two /8 blocks at a time, one can conclude that there are actually only 13 remaining allocations to be made from IANA to the RIRs. Most research and estimates of the forthcoming IPv4 depletion date come with a disclaimer such as “Do not believe any dates this program tells you” or “This article does not attempt to encompass such ambitious forms of prediction”. But as we get closer to the IANA depletion date, it becomes easier to predict what the end game will look like. With the right mathematical models, it is far from impossible to make an accurate estimate of the depletion date. Those disclaimers are no longer needed when correct mathematics is applied to the problem.

The nice thing about statistics is that it doesn’t require a huge sample size to make an accurate prediction. This is, for example, how election exit polls work: unless the race is a very close call, exit polls tend to be very reliable. When the same type of statistics is applied to the IPv4 depletion problem, the outcome is a very reliable prediction of the depletion date.

I can promise that we will not make it past the end of 2011. In fact, I don’t even believe that we are going to make it to the end of next year. My research suggests that the IANA pool will be depleted by 30 October 2010. And there is no disclaimer, other than “don’t sue me”.

A daily updated estimate can be found under the report tab at www.ipv4depletion.com.

About the author

Stephan Lagerholm is an IPv6 and IT-security expert with over 11 years of international and management experience. His background includes leadership positions at the largest networking and security system integrator in Scandinavia, and managing the design of hundreds of complex IT-networks.

Stephan is the founder of Scandinode (www.scandinode.com), a consulting organization based in Dallas, TX that provides networking and security advice and researches IPv6 and the depletion of IPv4. One of his recent engagements was with InfoWeapons Inc., a worldwide leader that creates next-generation, fully IPv6-compliant DNS and DHCP products. He is CISSP certified and holds a Master of Science degree in Computer Science and Mathematics from Uppsala University in Sweden. Stephan is the chairman of the Texas IPv6 Task Force (txv6tf.org).

[1] Tony Hain – http://www.tndh.net/~tony/ietf/ipv4-pool-combined-view.pdf

[2] Murphy/Wilson – http://www.ripe.net/ripe/meetings/ripe-55/presentations/murphy-simlir.pdf

[3] The IPv4 address report – http://www.potaroo.net/tools/ipv4/index.html

[4] Google IPv6 implementors conference – http://sites.google.com/site/ipv6implementors/conference2009/agenda

[5] Rocky Mountain IPv6 summit – http://www.rmv6tf.org/IPv6Summit.htm

[6] Reed’s law – http://hbr.harvardbusiness.org/2001/02/the-law-of-the-pack/ar/1

[7] Metcalfe’s Law – http://en.wikipedia.org/wiki/Metcalfes_Law

[8] Potaroo ipv4.c line 1124 – http://www.potaroo.net/tools/ipv4/ipv4.c

[9] Minimizing exponential least square – http://mathworld.wolfram.com/LeastSquaresFittingExponential.html

[10] Smoothing of data – “Fitting models to biological data using linear and nonlinear regression” By Harvey Motulsky and Arthur Christopoulos, p 20.

[11] IANA allocation policy – http://www.icann.org/en/general/allocation-IPv4-rirs.html

[12] Set aside policies – See Arin proposal 2009-2 and LACNIC policy 2008-04 for examples.

[13] IANA N=1 policy – http://www.icann.org/en/general/allocation-remaining-ipv4-space.htm

9 Responses to “The IPv4 depletion disclaimer”

  1. Geoff Huston says:

    If you are going to describe my work, please do me the courtesy of describing it accurately.

    “Huston is calculating the depletion dates in a top down approach. The first thing he calculates is to estimate the IANA pool depletion by looking at the IANA allocation history. He then breaks this up to the individual RIRs based on a calculated ratio between the RIRs. The problem with that approach is that it incorrectly takes into account all allocations in all regions between the RIR and a member. This is not how it works in reality.” – sorry, that’s just incorrect – the model I use simply does NOT do it that way.

    “Huston has only implemented a linear least square fitting model in his program. He is using the first order differential of his input to create a 2nd order polynomial fitting model. This is however mathematically incorrect as he is losing one degree of freedom. His result from this is in the form Ax² + Bx and not Ax² + Bx + C.” – Again, this is just wrong. See the material at the URL below for more details.

    “Huston is smoothing his data with a three pass of a sliding window smoothing function. Smoothing before least square fitting should not be done as it destroys the Gaussian properties of the data. The residue is no longer independent. ” The problem with least squares is that it places undue emphasis on outliers in the data set. Smoothing reduces the influence of the outlier on the entirety of the data set.

    I’m pretty unhappy with this rather poor and inaccurate representation of my work. If you are going to take this type of approach of direct criticism, and it’s one that I take a pretty dim view of considering that you have chosen to misrepresent the work, then you should read http://www.potaroo.net/ispcol/2009-05/ipv4model.html first.

  2. admin says:

    Hi Geoff and thanks for your comments,

    I will certainly change any inaccurate description that I have made of your work.

    We discussed some of the issues when we met in San Antonio. I followed up with an e-mail to you on April 29th where I expressed my concerns about some of the potential issues I had found in your report, so I certainly welcome a debate on the topic. It would be interesting to hear your opinion on why we are 7 months apart in our predictions.

    Let me clarify each of the topics that you commented on:

    Top down approach
    —————–
    I claimed that you used a top down approach. I claimed this based on the following, found in the IPv4 address report:

    “It is then possible to take these three best fit data series, and extrapolate their data forward in time until the point where all available address space has been allocated by the IANA and no further unallocated address pool remains”

    My assumption was that you used the output of this least square fitting over the IANA data in your further calculations. Now that you point it out, I realize that you actually never use the outcome of this calculation and that it is irrelevant to your overall model. I’m sorry for that; my mistake.

    Least square fitting model
    ————————–
    I still disagree with this way of calculating the second order polynomial least square fit. There is no way of correctly knowing the constant C without using a second order least square algorithm. Fixing A and B and then estimating C by looking at how it produces the least error against the original data series will not produce the same result. I experimented with the algorithm below (written in Perl) before I stopped using polynomial models. This algorithm returns all three constants needed to correctly fit a second order polynomial to the set of data.

    sub polynomial2_fit(@) {
        # Least square fit of y = a*x^2 + b*x + c over equally spaced x = 0, 1, 2, ...
        my ($P, $Q, $R, $S, $T, $U, $V) = (0) x 7;
        my $n = scalar(@_);
        my $x = 0;
        foreach my $y (@_) {
            $P += $x;                    # sum of x
            $Q += $x * $x;               # sum of x^2
            $R += $x * $x * $x;          # sum of x^3
            $S += $x * $x * $x * $x;     # sum of x^4
            $T += $y;                    # sum of y
            $U += $x * $y;               # sum of x*y
            $V += $x * $x * $y;          # sum of x^2*y
            $x++;
        }
        # Cramer's rule on the 3x3 normal equations; $W is the determinant.
        my $W = $n * $Q * $S + 2 * $P * $Q * $R - $Q*$Q*$Q - $P*$P*$S - $n*$R*$R;
        my $a = ($n*$Q*$V + $P*$R*$T + $P*$Q*$U - $Q*$Q*$T - $P*$P*$V - $n*$R*$U) / $W;
        my $b = ($n*$S*$U + $P*$Q*$V + $Q*$R*$T - $Q*$Q*$U - $P*$S*$T - $n*$R*$V) / $W;
        my $c = ($Q*$S*$T + $Q*$R*$U + $P*$R*$V - $Q*$Q*$V - $P*$S*$U - $R*$R*$T) / $W;
        return ($a, $b, $c);
    }

    Smoothing
    ———
    I have to disagree on this. Smoothing should NOT be done before a least square fit is performed. Linear regression is a kind of smoothing function in itself, and no extra smoothing is needed. One can, under some circumstances, remove significant outliers from the sample, but that is not the same thing as smoothing the data. I’m sure that Carl Friedrich Gauss would have included smoothing in his algorithm if it increased the accuracy.

    /Stephan

  3. Ninho says:

    This is not going to be a direct comment on your work, rather a noobie wondering : couldn’t we augment the IPv4 assignable stock by including addresses from 241.0.0.0 to 254.255.255.255 (leaving out the 240/8 for multicast, which should suffice) ?

  4. Ninho says:

    corrected (oops, sorry) I meant even starting from 225.0.0.0, reserving 224/8 only for multicast addresses. That would free thirty fresh /8s !

  5. admin says:

    Hi Ninho,

    I don’t think the Multicast (224/4) blocks could be used for unicast; many systems treat the multicast addresses differently.

    The so-called E-blocks (240/4) could potentially be reassigned to unicast; however, some systems treat the E-blocks as illegal and will not accept a configuration with them. See http://www.cisco.com/web/about/ac123/ac147/archived_issues/ipj_8-3/ipv4.html for a detailed description.

    Thanks, S

  6. Ninho says:

    I suspected some systems out of the box would not accept the proposed addresses for unicast – but it still seems that would be an unusually light patch to any reluctant OS.

    Thank you very much for pointing me to the very instructive page at Cisco’s !

    Regards…

  7. admin says:

    Agree,

    I would suggest that:

    240/6 (4 blocks) is changed to RFC 1918 blocks by IANA. There are ISPs today that are so large that they are running out of RFC 1918 space. Those ISPs could use this space internally instead of allocating new usable space as they do today.

    244/6 (4 blocks) is changed to UNALLOCATED, and IANA urges all vendors to make sure that their equipment can route those addresses. Those 4 blocks can be used last, when everything else is depleted, which should be sometime in 2012; perhaps most vendors will have patched their systems by then.

    The remaining 248/5 (8 blocks) remains unallocated for future use.

  8. Jörgen says:

    Starting to use the D and E-blocks is a bad path. It is still a finite number of addresses! The problem right now is not technical but rather to get the train moving. I would say that it is better to run out of v4 right here and now and face the pain than to prolong the agony… :-)

  9. admin says:

    Jörgen,

    I disagree. We should have gotten the train moving 5 years ago; now it is too late. Any effort to prolong the lifetime of IPv4 is good. The alternative is an unstable Internet. Perhaps those last IPv4 addresses should be coupled with some type of agreement that the recipient must start migrating to IPv6? Also, there are ISPs out there using real IPv4 addresses internally just because they have run out of 10.0.0.0/8, 192.168.0.0/16 and friends. See http://www.nanog.org/meetings/nanog37/presentations/alain-durand.pdf

    It is ridiculous that 73/8 is used in this wasteful way. Why not use part of the E-network?

    Thanks, S
