This version of the course guide is provisional until the period for editing the new course guides ends.


Exploratory Data Analysis

Code: 104853   ECTS Credits: 6
Academic year: 2024/2025

Degree | Type | Year
2503852 Applied Statistics | FB | 1

Contact

Name: Maria Rosa Camps Camprubi
Email: rosa.camps@uab.cat

Teachers

Montserrat Ferre Delgado

Teaching groups languages

You can view this information at the end of this document.


Prerequisites

None.


Objectives and Contextualisation

Students should learn the descriptive and exploratory techniques used to summarize the information contained in experimental datasets, and how to interpret the results and diagrams in the context of the data.

Finally, students should learn to use statistical software to manipulate data, carry out descriptive analyses and produce graphs.

 

Learning Outcomes

  1. CM05 (Competence) Interpret situations based on sets of data, graphic representations and statistical summaries.
  2. CM06 (Competence) Apply the knowledge acquired to organise data, create and show tables and work with different data representations.
  3. KM07 (Knowledge) Identify behaviour patterns in data exploration.

Content

1. Preliminaries.

1.1. The goal of Exploratory Data Analysis.
1.2. Types of variables and measurement scales.
1.3. Rounding and scientific notation.

2. Summary of statistical data.

2.1. Frequency distributions: tables.
2.2. Grouping data into intervals.
2.3. Graphical representation.

3. Numerical measures of a variable.

3.1. Central position measures: mean, median, mode.
3.2. Other position measures: quartiles, deciles and percentiles.
3.3. Measures of dispersion: variance and standard deviation (sample and population), range, interquartile range.
3.4. Measures of relative dispersion.
3.5. Standard scores.
3.6. Measures of shape: symmetry and kurtosis.

4. Extra tools for the study of a variable.

4.1. Exploratory analysis: boxplot and other diagrams.
4.2. Transformation of variables.
4.3. Other means: geometric, harmonic, quadratic.
4.4. Chebyshev's inequality.

5. Comparison of a variable in two or more groups: Exploratory analysis.

5.1. Situation of independent samples.
5.2. Situation of paired samples.

6. Tabulation and representation of the joint distribution of two categorical variables.

6.1. Contingency tables (joint, marginal and conditional frequency distributions).
6.2. Descriptive analysis of the dependence between two categorical variables.

7. Numeric description of the joint distribution of two statistical variables.

7.1. Marginal and conditional measures.
7.2. Regression curves and correlation coefficient.
7.3. Linear fitting and prediction.

8. Introduction to time series.

8.1. The classical decomposition.
8.2. Smoothing series: application of filters.

 

* These contents may be prioritized or reduced if required by the health authorities.

 


Activities and Methodology

Title | Hours | ECTS | Learning Outcomes
Type: Directed
Computer lab | 30 | 1.2 |
Lectures | 18 | 0.72 |
Problem sessions | 8 | 0.32 |
Studying theoretical concepts, solving problems by hand and using R | 84 | 3.36 |

Classroom work on theory and problems will be complemented by computer lab sessions using R and its packages.

On the course's Moodle space, students will find the course schedule, the exercise sets and the computer lab sessions, as well as any changes of classroom, timetable, etc. Keep in mind that CampusVirtual is not a static website: it will be updated throughout the course.

In the more practical part of the course, where possible, the analysis and comparison of statistical data will be used to discuss the causes and the social and cultural mechanisms that may sustain the inequalities observed.

* The proposed teaching methodology may be modified depending on the restrictions on face-to-face activities enforced by the health authorities.

Annotation: Within the schedule set by the centre or degree programme, 15 minutes of one class will be reserved for students to evaluate their lecturers and their courses or modules through questionnaires.


Assessment

Continuous Assessment Activities

Title | Weighting | Hours | ECTS | Learning Outcomes
End of term written exam | 25% | 2 | 0.08 |
First computer lab exam | 15% | 2 | 0.08 | CM05, CM06, KM07
Mid-term written exam | 20% | 2 | 0.08 |
Second computer lab exam | 25% | 2 | 0.08 |
Submission of exercise sets done with computer | 15% | 2 | 0.08 | CM05, CM06, KM07

The final grade for the course, F, is obtained from:

1) The marks of the two written exams on theory and problems, TP1 (mid-term) and TP2 (end of term), with weights of 20% and 25% respectively.

2) The marks of the two computer lab exams, O1 and O2, with weights of 15% and 25% respectively.

3) Attendance at the computer lab sessions and the proposed submissions, PC, with a weight of 15%. This part cannot be retaken.

The final grade for the course is the weighted average F = 0.20 TP1 + 0.15 O1 + 0.25 TP2 + 0.25 O2 + 0.15 PC.

To pass the course with this formula, TP1, TP2 and O2 must each be at least 4, and O1 must be at least 3.5.
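The weighted average and the minimum-mark rule can be checked with a short calculation. The sketch below is a hypothetical Python helper, not an official UAB tool; it assumes the weights from the assessment table (TP1 20%, O1 15%, TP2 25%, O2 25%, PC 15%) and marks on a 0 to 10 scale.

```python
# Hypothetical helper: continuous-assessment grade for the course.
# Assumption: weights follow the assessment table; marks are on a 0-10 scale.
WEIGHTS = {"TP1": 0.20, "O1": 0.15, "TP2": 0.25, "O2": 0.25, "PC": 0.15}

# Minimum marks required to pass with the weighted formula.
MINIMUMS = {"TP1": 4.0, "TP2": 4.0, "O2": 4.0, "O1": 3.5}


def final_grade(marks: dict[str, float]) -> float:
    """Weighted average F of the five continuous-assessment marks."""
    return sum(WEIGHTS[k] * marks[k] for k in WEIGHTS)


def passes(marks: dict[str, float]) -> bool:
    """True if F >= 5 and every exam reaches its minimum mark."""
    meets_minimums = all(marks[k] >= m for k, m in MINIMUMS.items())
    return meets_minimums and final_grade(marks) >= 5


marks = {"TP1": 6.0, "O1": 5.0, "TP2": 4.0, "O2": 7.0, "PC": 8.0}
print(round(final_grade(marks), 2))
print(passes(marks))
```

With these marks F = 5.9 and every minimum is met, so the student passes. Lowering TP2 to 3.5 gives F = 5.775, still above 5, yet the student would not pass, because TP2 falls below the minimum of 4.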

If F < 5, or if the minimum-mark requirement (TP1, TP2 and O2 >= 4 and O1 >= 3.5) is not met, students may take a resit exam.

There are two resit exams:

- STP: a global exam on theory and problems, for students with TP1 or TP2 below 4, or whose low marks in these exams lead to F < 5.

- SO: a global computer exam, for students with O1 below 3.5 or O2 below 4, or whose low marks in these exams lead to F < 5.

The final grade is then F = 0.45 STP + 0.40 SO + 0.15 PC.

(If only one of the resit exams, STP or SO, is taken, the other grade is the weighted mean of the passed partial exam marks for that part.)
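The resit grade can be sketched the same way. This hypothetical helper assumes that "the weighted mean of the passed partial exams" means (0.20 TP1 + 0.25 TP2) / 0.45 for the theory part and (0.15 O1 + 0.25 O2) / 0.40 for the computer part, reusing the continuous-assessment weights; the guide does not spell this out.

```python
from typing import Optional

# Hypothetical helper for the resit grade F = 0.45 STP + 0.40 SO + 0.15 PC.
# Assumption: a resit that is not taken is replaced by the weighted mean of
# the partial exams of the same part, using the continuous-assessment weights.


def theory_mean(tp1: float, tp2: float) -> float:
    """Weighted mean of the two theory-and-problems exams."""
    return (0.20 * tp1 + 0.25 * tp2) / 0.45


def computer_mean(o1: float, o2: float) -> float:
    """Weighted mean of the two computer lab exams."""
    return (0.15 * o1 + 0.25 * o2) / 0.40


def resit_grade(marks: dict[str, float],
                stp: Optional[float] = None,
                so: Optional[float] = None) -> float:
    """Final grade after the resits; PC is carried over since it cannot be retaken."""
    if stp is None:
        stp = theory_mean(marks["TP1"], marks["TP2"])
    if so is None:
        so = computer_mean(marks["O1"], marks["O2"])
    return 0.45 * stp + 0.40 * so + 0.15 * marks["PC"]


# Example: the theory part was failed, so only the STP resit is taken.
marks = {"TP1": 3.0, "TP2": 3.5, "O1": 6.0, "O2": 7.0, "PC": 8.0}
print(round(resit_grade(marks, stp=6.0), 2))
```

Here the computer mean is 6.625, giving F = 0.45 × 6 + 0.40 × 6.625 + 0.15 × 8 = 6.55. If neither resit is taken, the formula reduces exactly to the continuous-assessment F, since 0.45 times the theory mean equals 0.20 TP1 + 0.25 TP2 (and likewise for the computer part), which suggests this reading is consistent.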

Students who do not sit the exams will be graded as "Not Evaluable".

"Without prejudice to other disciplinary measures deemed appropriate, and in accordance with current academic regulations, any irregularity committed by a student that may lead to a variation in the grade of an assessment activity will be scored with a zero. Therefore, plagiarizing, copying or letting others copy a practical or any other assessment activity will result in a fail with a zero, which cannot be recovered in the same academic year. If this activity has a minimum associated mark, the course will be failed."

Honours distinctions will be considered after the second partial exams and may be awarded even before the resit exams.

Students who choose the single assessment option must take two global exams: one on theory and problems and one computer exam. Both are held on the same day, at the same time and in the same place as the second-part exams for students in continuous assessment. On that day, students may also submit the computer exercise assignments scheduled throughout the course, which are available on Moodle. The final grade is weighted 45% for the theory and problems exam, 40% for the computer exam and 15% for the assignments. Students may subsequently be called to an oral review of their exams and assignments with the course lecturers. Students who obtain a grade below 5 may take the resit exams with the rest of the class (same day, time and place), with the same weighting.


Bibliography

Course lecture notes

X. BARDINA, M. FARRÉ, Estadística descriptiva, Manuals 54, Servei de Publicacions, UAB.

Basic bibliography

A.J.B. ANDERSON, Interpreting Data. A First Course in Statistics, Ed. Chapman and Hall, 1989.
R Tutorial. An R introduction to statistics.  (2016).  www.r-tutor.com
E. CASA ARUTA, Problemas de Estadística Descriptiva, Ed. Vicens Vives.
R. JOHNSON, P. KUBY, Estadística elemental: Lo esencial, Ed Thomson, 1999.
B. PY, Statistique Descriptive, Ed Económica, 1988.
M. SPIEGEL, Estadística, Teoría y 875 problemas resueltos, Schaum-McGraw-Hill, 1990.
V. ZAIATS, M.L. CALLE and R. PRESAS, Probabilitat i Estadística. Exercicis I, Eumo Ed, 1998.
H. WICKHAM, M. CETINKAYA-RUNDEL, G. GROLEMUND, R for Data Science. https://r4ds.had.co.nz/

Complementary Bibliography

G. CALOT, Curso de Estadística Descriptiva. Ed Paraninfo, 1988.
FERNÁNDEZ, J.M. CORDERO, A. CÓRDOBA, Estadística Descriptiva, Ed. ESIC, 1996.
L.C. HAMILTON, Modern Data Analysis, Brooks/Cole Publishing Company, 1990.
P.G. HOEL and R.J. JESSEN, Estadística básica para negocios y economía, Compañía Editorial Continental, Mexico, 1993.
R.K. PEARSON, Exploratory Data Analysis using R. Data Mining and Knowledge Discovery Series, Chapman & Hall/CRC, 2018.
D. PEÑA SÁNCHEZ DE RIVERA, Estadística. Modelos y métodos. 1. Fundamentos and 2. Modelos lineales y series temporales, Alianza Editorial, 1995 (2 volumes).


Software

R and RStudio


Language list

Name | Group | Language | Semester | Shift
(PLAB) Practical laboratories | 1 | Catalan | first semester | afternoon
(PLAB) Practical laboratories | 2 | Catalan | first semester | afternoon
(TE) Theory | 1 | Catalan | first semester | afternoon