Title: | Analyze System Scalability with the Universal Scalability Law |
---|---|
Description: | The Universal Scalability Law (Gunther 2007) <doi:10.1007/978-3-540-31010-5> is a model to predict hardware and software scalability. It uses system capacity as a function of load to forecast the scalability for the system. |
Authors: | Neil J. Gunther [aut], Stefan Moeding [aut, cre] |
Maintainer: | Stefan Moeding <[email protected]> |
License: | BSD_2_clause + file LICENSE |
Version: | 3.0.3 |
Built: | 2025-02-22 03:40:10 UTC |
Source: | https://github.com/smoeding/usl |
The Universal Scalability Law is a model to predict hardware and software scalability. It uses system capacity as a function of load to forecast the scalability for the system.
Use the function usl to create a model from a formula and a data frame.
The USL model produces two coefficients as a result: alpha models the contention and beta the coherency delay of the system.
The Universal Scalability Law was created by Dr. Neil J. Gunther.
Neil J. Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Springer, Heidelberg, Germany, 1st edition, 2007.
Estimate confidence intervals for one or more parameters in a USL model. The intervals are calculated from the parameter standard error using the Student t distribution at the given level.
    ## S4 method for signature 'USL'
    confint(object, parm, level = 0.95)
object | A USL object. |
parm | A specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level | The confidence level required. |
Bootstrapping is no longer used to estimate confidence intervals.
A matrix (or vector) with columns giving lower and upper confidence limits for each parameter. These will be labelled as (1-level)/2 and 1 - (1-level)/2 in % (by default 2.5% and 97.5%).
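As a rough illustration of the calculation described above, the following base R sketch builds a two-sided interval from an estimate and its standard error using the Student t distribution. The numeric values and the use of the residual degrees of freedom are assumptions made for this example; they are not output of the usl package.

```r
# Hypothetical values for illustration only; not taken from a fitted model.
estimate  <- 0.05    # assumed parameter estimate
std_error <- 0.004   # assumed standard error of the estimate
df_resid  <- 8       # assumed residual degrees of freedom
level     <- 0.95    # confidence level

# Upper quantile of the Student t distribution for a two-sided interval
t_quant <- qt(1 - (1 - level) / 2, df = df_resid)

# Interval limits: estimate minus/plus quantile times standard error
ci <- c(lower = estimate - t_quant * std_error,
        upper = estimate + t_quant * std_error)
print(ci)
```

With level = 0.95 the two limits correspond to the 2.5% and 97.5% columns mentioned above.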
    require(usl)
    data(specsdm91)

    ## Create USL model
    usl.model <- usl(throughput ~ load, specsdm91)

    ## Print confidence intervals
    confint(usl.model)
The efficiency of a system expressed in terms of the deviation from linear scalability.
    ## S4 method for signature 'USL'
    efficiency(object)
object | A USL object. |
The function returns a vector which contains the deviation from linearity for every measurement of the model input. A value of 1 indicates linear scalability while values less than 1 correspond to the fraction of the measurement compared to linear scalability.
A vector of numeric values.
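The idea of efficiency as a fraction of linear scalability can be sketched in base R. This is a minimal sketch, not the package's implementation: it assumes that linear scalability means gamma times the load, and it generates the throughput from the three-coefficient USL formula with made-up coefficients so that the example is self-contained.

```r
# Illustrative coefficients (assumptions, not fitted values)
alpha <- 0.05    # contention
beta  <- 0.001   # coherency delay
gamma <- 100     # throughput at load 1

load <- c(1, 2, 4, 8, 16)

# Synthetic throughput following the USL formula
throughput <- gamma * load / (1 + alpha * (load - 1) + beta * load * (load - 1))

# Efficiency: throughput relative to the linear bound gamma * load
eff <- throughput / (gamma * load)
print(round(eff, 3))
```

The first value is 1 (a single unit of load scales linearly by definition) and the values shrink as the load grows.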
Neil J. Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Springer, Heidelberg, Germany, 1st edition, 2007.
    require(usl)
    data(raytracer)

    ## Show the efficiency
    efficiency(usl(throughput ~ processors, raytracer))
Calculate the scalability limit for a specific model.
    ## S4 method for signature 'USL'
    limit.scalability(object, alpha, beta, gamma)
object | A USL object. |
alpha | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
beta | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
gamma | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
The scalability limit is defined as:
$$X_{\mathrm{limit}} = \frac{\gamma}{\alpha}$$
This is the upper bound (Amdahl asymptote) of system capacity.
The parameters alpha, beta and gamma are useful to do a what-if analysis. Setting these parameters overrides the model parameters and shows how the system would behave with a different contention or coherency delay parameter.
The scalability limit is undefined if alpha is zero.
This function accepts an argument for beta although the value is not required to perform the calculation. This is on purpose to provide a coherent interface.
A numeric value for the system capacity limit (e.g. throughput).
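A small base R sketch of the limit and of a what-if variation follows. It assumes the limit is gamma divided by alpha (the Amdahl asymptote of the three-coefficient USL formula); the helper usl_capacity and all coefficient values are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: three-coefficient USL formula (an assumption of this sketch)
usl_capacity <- function(n, alpha, beta, gamma) {
  gamma * n / (1 + alpha * (n - 1) + beta * n * (n - 1))
}

alpha <- 0.05   # illustrative contention coefficient
gamma <- 100    # illustrative throughput at load 1

# Assumed definition of the scalability limit
limit <- gamma / alpha

# Without coherency delay (beta = 0) the capacity approaches the limit
near_limit <- usl_capacity(1e9, alpha, beta = 0, gamma)

# What-if: halving the contention doubles the limit
limit_whatif <- gamma / (alpha / 2)

print(c(limit = limit, near_limit = near_limit, limit_whatif = limit_whatif))
```

The division also shows why the limit is undefined for alpha equal to zero.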
usl, peak.scalability,USL-method, optimal.scalability,USL-method
    require(usl)
    data(specsdm91)

    limit.scalability(usl(throughput ~ load, specsdm91))
    ## The throughput limit is about 3245
Calculate the point of optimal scalability for a specific model.
    ## S4 method for signature 'USL'
    optimal.scalability(object, alpha, beta, gamma)
object | A USL object. |
alpha | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
beta | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
gamma | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
The point of optimal scalability is defined as:
$$N_{\mathrm{opt}} = \frac{1}{\alpha}$$
Below this point the existing capacity is underutilized. Beyond that point the effects of diminishing returns become increasingly visible.
The value can be constructed graphically by projecting the intersection of the linear scalability bound and the Amdahl asymptote onto the x-axis.
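The graphical construction can be written out as a short derivation. It assumes that the linear scalability bound is the line $\gamma N$ and that the Amdahl asymptote is the constant $\gamma / \alpha$; both expressions are assumptions of this sketch, based on the three-coefficient form of the model.

$$\gamma N_{\mathrm{opt}} = \frac{\gamma}{\alpha} \quad\Longrightarrow\quad N_{\mathrm{opt}} = \frac{1}{\alpha}$$

Under these assumptions gamma cancels and beta never enters, so the result depends on alpha alone.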
The parameters alpha, beta and gamma are useful to do a what-if analysis. Setting these parameters overrides the model parameters and shows how the system would behave with a different contention or coherency delay parameter.
The point of optimal scalability is undefined if alpha is zero.
This function accepts arguments for beta and gamma although the values are not required to perform the calculation. This is on purpose to provide a coherent interface.
A numeric value for the load where optimal scalability will be reached.
usl, peak.scalability,USL-method, limit.scalability,USL-method
    require(usl)
    data(specsdm91)

    optimal.scalability(usl(throughput ~ load, specsdm91))
    ## Optimal scalability will be reached at about 36 virtual users
A dataset containing performance data for an Oracle OLTP database measured between 8:00am and 8:00pm on January 19th, 2012. The measurements were recorded for two-minute intervals during this time and a timestamp indicates the end of the measurement interval. The performance metrics were taken from the v$sysmetric family of system performance views.
A data frame with 360 rows on 8 variables
The Oracle database was running on a 4-way server.
The data frame contains different types of measurements:
Variables of the "time" type are expressed in seconds per second.
Variables of the "rate" type are expressed in events per second.
Variables of the "util" type are expressed as a percentage.
The data frame contains the following variables:
timestamp
The end of the two-minute interval for which the remaining variables contain the measurements.
db_time
The time spent inside the database either working on a CPU or waiting (I/O, locks, buffer waits ...). This time is expressed as seconds per second, so two sessions working for exactly one second each will contribute a total of two seconds per second of db_time. In Oracle this value is also known as Average Active Sessions (AAS).
cpu_time
The CPU time used during the interval. This is also expressed as seconds per second. A 4-way machine has a theoretical capacity of four CPU seconds per second.
call_rate
The number of user calls (logins, parses, or execute calls) per second.
exec_rate
The number of statement executions per second.
lio_rate
The number of logical I/Os per second. A logical I/O is the Oracle term for a cache hit in the database buffer cache. This metric does not indicate if an additional physical I/O was necessary to load the buffer from disk.
txn_rate
The number of database transactions per second.
cpu_util
The CPU utilization of the database server in percent. This was also measured from within the database.
overhead calculates the overhead in processing time for a system modeled with the Universal Scalability Law. It evaluates the regression function in the frame newdata (which defaults to model.frame(object)). The result contains the ideal processing time and the additional overhead caused by contention and coherency delays.
    ## S4 method for signature 'USL'
    overhead(object, newdata)
object | A USL model object for which the overhead will be calculated. |
newdata | An optional data frame in which to look for variables with which to calculate the overhead. If omitted, the fitted values are used. |
The calculated processing times are given as percentages of a non-parallelized workload. So for a non-parallelized workload the ideal processing time will always be given as 100% while the overhead for contention and coherency will always be zero.
Doubling the capacity will cut the ideal processing time in half but increase the overhead percentages. The increase of the overhead depends on the values of the parameters alpha and beta estimated by usl.
The calculation is based on A General Theory of Computational Scalability Based on Rational Functions, equation 26.
overhead produces a matrix of overhead percentages based on a non-parallelized workload. The column ideal contains the ideal percentage of execution time. The columns contention and coherency give the additional overhead percentage caused by the respective effects.
Neil J. Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Springer, Heidelberg, Germany, 1st edition, 2007.
Neil J. Gunther. A General Theory of Computational Scalability Based on Rational Functions. Computing Research Repository, 2008. http://arxiv.org/abs/0808.1431
    require(usl)
    data(specsdm91)

    ## Print overhead in processing time for demo dataset
    overhead(usl(throughput ~ load, specsdm91))
Calculate the point of peak scalability for a specific model.
    ## S4 method for signature 'USL'
    peak.scalability(object, alpha, beta, gamma)
object | A USL object. |
alpha | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
beta | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
gamma | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
The peak scalability is the point where the throughput of the system starts to go retrograde, i.e., starts to decrease with increasing load.
The parameters alpha, beta and gamma are useful to do a what-if analysis. Setting these parameters overrides the model parameters and shows how the system would behave with a different contention or coherency delay parameter.
See formula (4.33) in Guerrilla Capacity Planning.
This function accepts an argument for gamma although the value is not required to perform the calculation. This is on purpose to provide a coherent interface.
A numeric value for the point where peak scalability will be reached.
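The location of the peak can be cross-checked numerically in base R. The sketch below assumes the three-coefficient USL formula and the closed form sqrt((1 - alpha) / beta), which follows from setting the derivative of that formula to zero. The helper usl_capacity and the coefficient values are hypothetical and serve only as an illustration.

```r
# Hypothetical helper: three-coefficient USL formula (an assumption of this sketch)
usl_capacity <- function(n, alpha, beta, gamma) {
  gamma * n / (1 + alpha * (n - 1) + beta * n * (n - 1))
}

alpha <- 0.05    # illustrative contention coefficient
beta  <- 0.001   # illustrative coherency coefficient
gamma <- 100     # illustrative throughput at load 1

# Closed form for the load at which throughput peaks
peak_closed <- sqrt((1 - alpha) / beta)

# Numerical maximum of the capacity curve over a search interval
peak_numeric <- optimize(usl_capacity, interval = c(1, 200),
                         alpha = alpha, beta = beta, gamma = gamma,
                         maximum = TRUE)$maximum

print(c(closed = peak_closed, numeric = peak_numeric))
```

gamma scales the whole curve without moving the maximum, which is consistent with the remark that its value is not needed for this calculation.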
Neil J. Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Springer, Heidelberg, Germany, 1st edition, 2007.
usl, optimal.scalability,USL-method, limit.scalability,USL-method
    require(usl)
    data(specsdm91)

    peak.scalability(usl(throughput ~ load, specsdm91))
    ## Peak scalability will be reached at about 96 virtual users
Create a line plot for the scalability function of a Universal Scalability Law model.
    ## S4 method for signature 'USL'
    plot(x, from = NULL, to = NULL, xlab = NULL, ylab = NULL,
         bounds = FALSE, alpha, beta, ...)
x | The USL object to plot. |
from | The start of the range over which the scalability function will be plotted. |
to | The end of the range over which the scalability function will be plotted. |
xlab | A title for the x axis: see |
ylab | A title for the y axis: see |
bounds | Add the bounds of scalability to the plot. This always includes the linear scalability bound for low loads. If the contention coefficient |
alpha | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
beta | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
... | Other graphical parameters passed to plot (see |
plot creates a plot of the scalability function for the model represented by the argument x.
If from is not specified then the range starts at the minimum value given to define the model. An unspecified value for to will lead to the plot ending at the maximum value from the model. For add = TRUE the defaults are taken from the limits of the previous plot.
xlab and ylab can be used to set the axis titles. The defaults are the names of the regressor and response variables used in the model.
If the parameter bounds is set to TRUE then the plot also shows dotted lines for the theoretical bounds of scalability. These are the linear scalability for small loads and the Amdahl asymptote for the limit of scalability as load approaches infinity.
The parameters alpha or beta are useful to do a what-if analysis. Setting these parameters overrides the model parameters and shows how the system would behave with a different contention or coherency delay parameter.
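The two theoretical bounds can be tabulated next to the capacity curve in base R. This is only a sketch under stated assumptions: the linear bound is taken to be gamma times the load, the Amdahl asymptote gamma divided by alpha, and the helper usl_capacity with its coefficient values is hypothetical.

```r
# Hypothetical helper: three-coefficient USL formula (an assumption of this sketch)
usl_capacity <- function(n, alpha, beta, gamma) {
  gamma * n / (1 + alpha * (n - 1) + beta * n * (n - 1))
}

alpha <- 0.05; beta <- 0.001; gamma <- 100   # illustrative values
load <- c(1, 5, 10, 20, 50, 100)

capacity     <- usl_capacity(load, alpha, beta, gamma)
linear_bound <- gamma * load                      # bound for small loads
amdahl_bound <- rep(gamma / alpha, length(load))  # asymptote for large loads

# The capacity stays at or below both bounds for every load
print(cbind(load, capacity, linear_bound, amdahl_bound))
```

For small loads the linear bound is the tighter one; for large loads the asymptote takes over.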
    require(usl)
    data(specsdm91)

    ## Plot result from USL model for demo dataset
    plot(usl(throughput ~ load, specsdm91), bounds = TRUE, ylim = c(0, 3500))
predict is a function for predictions of the scalability of a system modeled with the Universal Scalability Law. It evaluates the regression function in the frame newdata (which defaults to model.frame(object)). Setting interval to "confidence" requests the computation of confidence intervals at the specified level.
    ## S4 method for signature 'USL'
    predict(object, newdata, alpha, beta,
            interval = c("none", "confidence"), level = 0.95)
object | A USL model object for which prediction is desired. |
newdata | An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
alpha | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
beta | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
interval | Type of interval calculation. Default is to calculate no confidence interval. |
level | Confidence level. Default is 0.95. |
The parameters alpha or beta are useful to do a what-if analysis. Setting these parameters overrides the model parameters and shows how the system would behave with a different contention or coherency delay parameter.
predict internally uses the function returned by scalability,USL-method to calculate the result.
predict produces a vector of predictions or a matrix of predictions and bounds with column names fit, lwr, and upr if interval is set to "confidence".
Neil J. Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Springer, Heidelberg, Germany, 1st edition, 2007.
usl, scalability,USL-method, USL-class
    require(usl)
    data(raytracer)

    ## Print predicted result from USL model for demo dataset
    predict(usl(throughput ~ processors, raytracer))

    ## The same prediction with confidence intervals at the 99% level
    predict(usl(throughput ~ processors, raytracer),
            interval = "confidence", level = 0.99)
print prints its argument and returns it invisibly (via invisible(x)).
    ## S4 method for signature 'USL'
    print(x, digits = max(3L, getOption("digits") - 3L), ...)
x | An object from class "USL". |
digits | Minimal number of significant digits, see print.default. |
... | Other arguments passed to other methods. |
print returns the object x invisibly.
    require(usl)
    data(raytracer)

    ## Print result from USL model for demo dataset
    print(usl(throughput ~ processors, raytracer))
A dataset containing performance data for a ray-tracing benchmark.
A data frame with 11 rows on 2 variables
The benchmark measured the number of ray-geometry intersections per second. The data was gathered on an SGI Origin 2000 with 64 R12000 processors running at 300 MHz.
The data frame contains the following variables:
processors
The number of CPUs used for the benchmark (1–64).
throughput
The number of operations per second.
Neil J. Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Springer, Heidelberg, Germany, 1st edition, 2007. Original dataset from https://sourceforge.net/projects/brlcad/
scalability is a higher-order function and returns a function to calculate the scalability for the specific USL model.
    ## S4 method for signature 'USL'
    scalability(object, alpha, beta, gamma)
object | A USL object. |
alpha | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
beta | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
gamma | Optional parameter to be used for evaluation instead of the parameter computed for the model. |
The returned function can be used to calculate specific values once the model for a system has been created.
The parameters alpha and beta are useful to do a what-if analysis. Setting these parameters overrides the model parameters and shows how the system would behave with a different contention or coherency delay parameter.
A function with parameter x that calculates the scalability value of the specific model.
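The higher-order pattern can be illustrated with a plain R closure. The sketch below is not the package's implementation: make_scalability is a hypothetical helper, the coefficient values are made up, and the body assumes the three-coefficient USL formula.

```r
# Hypothetical factory: captures the coefficients and returns a function of x
make_scalability <- function(alpha, beta, gamma) {
  function(x) {
    gamma * x / (1 + alpha * (x - 1) + beta * x * (x - 1))
  }
}

# Illustrative coefficients (assumptions, not fitted values)
scf <- make_scalability(alpha = 0.05, beta = 0.001, gamma = 100)

# The returned function is vectorized over x
print(scf(1))
print(scf(c(8, 16, 32)))

# What-if: same gamma and beta, but no contention at all
scf_whatif <- make_scalability(alpha = 0, beta = 0.001, gamma = 100)
print(scf_whatif(c(8, 16, 32)))
```

Because the coefficients live in the closure, the returned function can be passed on to other code (for example a plotting routine) without carrying the model object along.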
Neil J. Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Springer, Heidelberg, Germany, 1st edition, 2007.
usl, peak.scalability,USL-method, optimal.scalability,USL-method, limit.scalability,USL-method
    require(usl)
    data(raytracer)

    ## Compute the scalability function
    scf <- scalability(usl(throughput ~ processors, raytracer))

    ## Print scalability for 32 CPUs for the demo dataset
    print(scf(32))

    ## Plot scalability for the range from 1 to 64 CPUs
    plot(scf, from=1, to=64)
Display the object by printing it.
    ## S4 method for signature 'USL'
    show(object)
object | The object to be printed. |
show returns an invisible NULL.
    require(usl)
    data(raytracer)

    ## Show USL model
    show(usl(throughput ~ processors, raytracer))
sigma
Extract Residual Standard Deviation 'Sigma'
    ## S4 method for signature 'USL'
    sigma(object, ...)
object | An object from class "USL". |
... | Other arguments passed to other methods. |
A single number.
    require(usl)
    data(raytracer)

    ## Print result from USL model for demo dataset
    print(sigma(usl(throughput ~ processors, raytracer)))
A dataset containing performance data for a Sun SPARCcenter 2000 (16 CPUs)
A data frame with 7 rows on 2 variables
A Sun SPARCcenter 2000 with 16 CPUs was used for the SPEC SDM91 benchmark in October 1994. The benchmark simulates a number of users working on the UNIX server and measures the number of script executions per hour.
The data frame contains the following variables:
load
The number of simulated users (1–216).
throughput
The achieved throughput in scripts per hour.
Neil J. Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Springer, Heidelberg, Germany, 1st edition, 2007. Original dataset from http://www.spec.org/osg/sdm91/results/results.html
summary method for class "USL".
    ## S4 method for signature 'USL'
    summary(object, ...)
object | A USL object. |
... | Other arguments passed to other methods. |
    require(usl)
    data(raytracer)

    ## Show summary for demo dataset
    summary(usl(throughput ~ processors, raytracer))

    ## Extract model coefficients
    summary(usl(throughput ~ processors, raytracer))$coefficients
usl is used to create a model for the Universal Scalability Law.
    usl(formula, data, method = "default")
formula | An object of class " |
data | A data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from |
method | Character value specifying the method to use. The possible values are described under 'Details'. |
The Universal Scalability Law is used to forecast the scalability of either a hardware or a software system.
The USL model works with one independent variable (e.g. virtual users, processes, threads, ...) and one dependent variable (e.g. throughput, ...). Therefore the model formula must be in the simple "response ~ predictor" format.
The model produces two main coefficients as result: alpha models the contention and beta the coherency delay of the system. The third coefficient gamma estimates the value of the dependent variable (e.g. throughput) for the single user/process/thread case. It therefore corresponds to the scale factor calculated in previous versions of the usl package.
The function coef extracts the coefficients from the model object.
The argument method selects which solver is used to solve the model:
"nls" for a nonlinear regression model. This method estimates all coefficients alpha, beta and gamma. The R base function nls with the "port" algorithm is used internally to solve the model. So all restrictions of the "port" algorithm apply.
"nlxb" for a nonlinear regression model using the function nlxb from the nlsr package. This method also estimates all three coefficients. It is expected to be more robust than the nls method.
"default": the former default method, which used a transformation into a second-degree polynomial, was removed when the model using three coefficients was implemented in the usl package 2.0.0. Calling the "default" method will internally dispatch to the "nlxb" solver instead.
The Universal Scalability Law can be expressed with the following formula. C(N) predicts the relative capacity of the system for a given load N:
$$C(N) = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
An object of class USL.
Neil J. Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Springer, Heidelberg, Germany, 1st edition, 2007.
John C. Nash. nlsr: Functions for nonlinear least squares solutions, 2017. R package version 2017.6.18.
efficiency,USL-method, scalability,USL-method, peak.scalability,USL-method, optimal.scalability,USL-method, limit.scalability,USL-method, summary,USL-method, sigma,USL-method, predict,USL-method, overhead,USL-method, confint,USL-method, coef, fitted, residuals, df.residual
    require(usl)
    data(raytracer)

    ## Create USL model for "throughput" by "processors"
    usl.model <- usl(throughput ~ processors, raytracer)

    ## Show summary of model parameters
    summary(usl.model)

    ## Show complete list of efficiency parameters
    efficiency(usl.model)

    ## Extract coefficients for model
    coef(usl.model)

    ## Calculate point of peak scalability
    peak.scalability(usl.model)

    ## Plot original data and scalability function
    plot(raytracer)
    plot(usl.model, add=TRUE)
Class "USL" for Universal Scalability Law models
This class encapsulates the Universal Scalability Law. Use the function usl to create new objects from this class.
frame
The model frame.
call
The call used to create the model.
regr
The name of the regressor variable.
resp
The name of the response variable.
coefficients
The coefficients alpha, beta and gamma of the model.
coef.std.err
The standard errors for the coefficients alpha and beta.
coef.names
A vector with the names of the coefficients.
fitted
The fitted values of the model. This is a vector.
residuals
The residuals of the model. This is a vector.
df.residual
The degrees of freedom of the model.
sigma
The residual standard deviation of the model.
limit
The scalability limit as per Amdahl.
peak
A vector with the predictor and response values of the peak.
optimal
A vector with the optimal predictor and response values.
efficiency
The efficiency, e.g. speedup per processor.
na.action
The na.action used by the model.