Data Analysis Techniques

Created
Tuesday, 26 September 2017
Created by
Super User
Last modified
Monday, 10 February 2020
Revised by
admin
Categories
Engineering - Oil & Gas

OBJECTIVE

Transform data into information.
Use statistical techniques and tools to reduce the number of metrics that need to be monitored and analyzed.
Apply regression analysis to determine the scalability of web applications and services.

Who Should Attend?

Computer system administrators, mainframe system operators, network system administrators, performance engineers, test engineers, IT consultants, data center managers, DevOps engineers, IT technical managers and software development engineers. This course does not assume any prior experience with performance analysis methods, but a working knowledge of computer systems and high-school algebra is helpful.

Outline

How to Detect Bad Data

All data is wrong by definition
Broken performance tools
The power of good statistical models

Introduction to R

Why R is de RigueuR on Wall St and elsewhere
My special 911.r script
R commands
R language
R graphics
Installing R

Measurement Errors and Analysis

Measurement is a process not a number
Confidence intervals and sigma levels
Confidence bands and QQ plots
How to express errors

Review of Elementary Statistics

Descriptive statistics
Measures of central tendency: mean, median and mode
Meaning of the means: arithmetic, geometric, harmonic
Measures of dispersion: stdev, variance, stderr, percentiles
Summarizing data and its statistics
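
As a taste of the "meaning of the means" topic, the short sketch below computes the arithmetic, geometric and harmonic means of the same data using Python's standard statistics module. It is an illustration only and not course material; the helper name three_means and the sample values are hypothetical.

```python
import statistics


def three_means(values):
    """Return the (arithmetic, geometric, harmonic) means of positive values."""
    return (
        statistics.mean(values),
        statistics.geometric_mean(values),  # requires Python 3.8 or later
        statistics.harmonic_mean(values),
    )


# Hypothetical rate samples, invented purely for illustration.
samples = [40.0, 60.0]
arith, geo, harm = three_means(samples)
print(f"arithmetic={arith:.2f} geometric={geo:.2f} harmonic={harm:.2f}")
```

For positive data that are not all equal, the harmonic mean is the smallest of the three and the arithmetic mean the largest, which is one reason the choice of mean matters when averaging rates.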

Distributions and Histograms

Review of Uniform, Normal, Poisson, Exponential distributions
How to determine normal distributions
How to determine exponential/Poisson distributions
Weighted multi-class workloads

Review of Benchmarking and Load Test Tools

History of industry benchmarks SPEC and TPC
Steady-state measurement period
Comparing vendor benchmarks

Scalability Analysis

Load test data and QA analysis
Universal scalability law
Analyzing data for scalability zones
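
For readers unfamiliar with the universal scalability law, the sketch below shows its usual form, C(N) = N / (1 + alpha (N - 1) + beta N (N - 1)), where alpha models contention and beta models coherency delay. It is a minimal illustration and not the course's own code; the function names and the parameter values are assumptions chosen for the example.

```python
import math


def usl_relative_capacity(n, alpha, beta):
    """Relative capacity C(N) under the universal scalability law.

    n     -- load (for example, number of users or processors)
    alpha -- contention coefficient
    beta  -- coherency coefficient
    """
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_load(alpha, beta):
    """Load at which C(N) is maximal; only defined for beta > 0."""
    return math.sqrt((1.0 - alpha) / beta)


# Hypothetical coefficients, not fitted to any real measurements.
alpha, beta = 0.02, 0.0005
for n in (1, 8, 32, 64, 128):
    print(n, round(usl_relative_capacity(n, alpha, beta), 2))
print("peak near N =", round(usl_peak_load(alpha, beta), 1))
```

In practice alpha and beta are estimated by regression on load test measurements; beyond the peak load, adding more load reduces throughput rather than increasing it.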

Multivariate Linear and Nonlinear Regression

ANOVA: Analysis of Variance
Moving averages
Web server scalability
Web traffic profiles and time zones

Data Mining Techniques for CaP

Machine learning algorithms
Support Vector Machines
Supervised learning
The svm package in R
Detecting performance patterns and defining exceptions

Wild Not Mild Data Distributions

Power law data and distributions
Case studies: SQL access patterns, web traffic, data recovery
Data validation using qqplots, log-linear plots and log-log plots

Taming the Data Torrent

Principal component analysis
Reducing the number of monitored metrics
Case studies: PerfViz, Apdex, Barry

PDQ-R Queueing Modeling Tool

The statistics of queues
Case study: Modeling networked storage
Case study: Multi-tier e-commerce data and PDQ analysis

Review and Class Discussion

Certificates

A Certificate of Completion will be issued to those who attend & successfully complete the programme.

Schedule

08:30 – 10:15 First Session

10:15 – 10:30 Coffee Break

10:30 – 12:15 Second Session

12:15 – 12:30 Coffee Break

12:30 – 14:00 Third Session

14:00 – 15:00 Lunch

Training Methodology

This interactive training course includes the following training methodologies as a percentage of the total tuition hours:

30% Lectures, Concepts, Role Play
20% Workshops & Work Presentations, Techniques
20% Based on Case Studies & Practical Exercises
10% Videos, Software & General Discussions
20% Application
Pre- and Post-Test

Fees

The fee for the seminar, including instruction materials, documentation, lunch, coffee/tea breaks and snacks, is: