2023/2024
Computational Learning
Code: 104403
ECTS Credits: 6
Degree | Type | Year | Semester |
2503740 Computational Mathematics and Data Analytics | OB | 3 | 1 |
Teaching groups languages
You can check it through this link. To consult the language you will need to enter the CODE of the subject. Please note that this information is provisional until 30 November 2023.
Teachers
- Marc Borras Camarasa
Prerequisites
A good knowledge of the first-year subjects, such as Probability and Calculus, and of the second-year subjects, such as Artificial Intelligence and Modeling and Inference, is considered very important. Although part of the theory has already been covered in those subjects from a mathematical point of view, this subject focuses on its implementation in Python and its application to multiple real cases.
Objectives and Contextualisation
The course aims both to expand some of the topics developed in "Artificial Intelligence" and to introduce new problems associated with AI, mainly the learning of concepts and trends from data. It trains students to be data engineers/scientists, an occupation with strong prospects that is in demand at an increasing number of companies, including Facebook, Google, Microsoft and Amazon, to cite but a few. In fact, demand for these professionals is expected to grow exponentially at an international level, especially due to the growth in the generation of massive data. Thus, the main objective of the Course is to teach how to find a good solution (finding the best one is sometimes impossible) for different data analysis problems in different contexts, based on identifying the best knowledge representation and applying the most appropriate technique to automatically generate good mathematical models that explain the observed data with an acceptable deviation.
The contents taught in this Course are also taught at Stanford, Toronto, Imperial College London, MIT, Carnegie Mellon and Berkeley, to name just the most representative. Therefore, on the one hand, the student gets an opportunity to acquire knowledge and skills comparable to those taught at the best universities. On the other hand, the student must be aware that this knowledge has an inherent mathematical difficulty, which involves considerable study and dedication. This is because the Course not only teaches the most important contents needed to become a data engineer, but also forms a curricular line that allows the student to expand the range of jobs available after the degree, and provides the methodological bases needed to pursue a Master's degree in data engineering/science or artificial intelligence.
If you are looking for a Course that opens up an international labor market and teaches the machine learning algorithms most used not only in the large technology companies mentioned above but also in many data analysis SMEs and spin-offs in our country, this Course will not disappoint, provided you bring both attitude and aptitude.
The objectives of the Course can be summarized in:
Knowledge:
- Describe the basic techniques of computational learning.
- List the essential steps of different machine learning algorithms.
- Identify the advantages and disadvantages of the learning algorithms.
- Solve problems by applying different machine learning techniques to find the optimal solution.
- Understand the results and limitations of each learning technique in different case studies.
- Know how to choose the most appropriate learning algorithm to solve contextualized problems.
Skills:
- Recognize situations in which the application of machine learning algorithms may be adequate
- Analyze the problem to solve and design the optimal solution applying the learned techniques
- Write technical documents related to the analysis and solution of a problem
- Program the basic algorithms to solve the proposed problems
- Evaluate the results of the implemented solution and propose possible improvements
- Defend and argue the decisions taken in the solution of proposed problems
Competences
- Make effective use of bibliographical resources and electronic resources to obtain information.
- Solve problems related to the analysis of large volumes of data through the design of intelligent systems and computational learning.
- Students must be capable of applying their knowledge to their work or vocation in a professional way and they should have building arguments and problem resolution skills within their area of study.
- Students must be capable of communicating information, ideas, problems and solutions to both specialised and non-specialised audiences.
- Students must have and understand knowledge of an area of study built on the basis of general secondary education, and while it relies on some advanced textbooks it also includes some aspects coming from the forefront of its field of study.
- Using criteria of quality, critically evaluate the work carried out.
- Work cooperatively in a multidisciplinary context assuming and respecting the role of the different members of the team.
Learning Outcomes
- Identify and define computational solutions in multiple domains for decision making based on exploring alternatives, uncertain reasoning and task planning.
- Learn and apply the most appropriate learning techniques for solving computational problems in distinct case studies.
- Make effective use of bibliographical resources and electronic resources to obtain information.
- Students must be capable of applying their knowledge to their work or vocation in a professional way and they should have building arguments and problem resolution skills within their area of study.
- Students must be capable of communicating information, ideas, problems and solutions to both specialised and non-specialised audiences.
- Students must have and understand knowledge of an area of study built on the basis of general secondary education, and while it relies on some advanced textbooks it also includes some aspects coming from the forefront of its field of study.
- Understand and evaluate the results and limitations of the most common learning techniques.
- Using criteria of quality, critically evaluate the work carried out.
- Work cooperatively in a multidisciplinary context, taking on and respecting the role of the distinct members in the team.
Content
UNIT 1: INTRODUCTION
1.1 Basic concepts
1.2 History and biases of computational learning
UNIT 2: DATA REGRESSION
2.1 Gradient descent
2.2 Regularization
UNIT 3: DATA CLASSIFICATION
3.1 Regularized logistic regression
3.2 Support vector machines
UNIT 4: DATA MEMORIZATION
4.1 Lazy learning: hyperparameters
4.2 The K-Nearest neighbors algorithm (weighted)
UNIT 5: DATA CLUSTERING
5.1 Modeling by Mixture of Gaussians
5.2 The k-means algorithm and Expectation-Maximization
Methodology
All the subject information and related documents that students need can be found on the Caronte page (https://caronte.uab.cat/course/view.php?id=94), in the subject menu Computational Learning (104403). It will serve to be able to see the materials, manage the practice groups, make the corresponding deliveries, see the notes, communicate with the teachers, etc. To be able to use it, you must do the following steps:
- Register as a user by giving your name, NIU, and a passport photo in JPG format. If you have already registered for another subject, there is no need to do it again, you can go to the next step.
- Enroll in the "Machine Learning (104403)" teaching type, giving the subject code "apc2023" (without the quotes).
In the development of the subject, six types of teaching activities can be distinguished:
MD0 Presentation of theory content: Presentation of the theoretical content to be worked on in the subject. These contents must be prepared before class by reading texts, searching for information, etc. The contents presented will be directly related to the problems, projects and seminars proposed in other teaching activities, so they will be the basis on which the other activities of the course are developed. The contents will be found on the Caronte page (presentations and videos) and will consist of two parts: a presentation where the main theoretical and mathematical concepts related to specific computational learning tasks are explained (this syllabus will be the basis of the theory exam of the subject; see the assessment section of this teaching guide), and a second part of Python code in Jupyter notebooks that exemplifies the coding details and libraries needed to implement, in a practical case, the main concepts seen in the previous hour. Students will then be able to watch the videos of the classes, download the presentations and the Python notebooks, and run all the code on their own computers, experimenting with the various parameters to understand the reasons for the different performance and precision achieved on a specific database with specific configurations of the algorithms explained in the subject.
MD1 Computational problem solving: Delivery of up to a maximum of 3 problems implemented in a Jupyter Notebook, out of the 5 worked on in class. All the theory topics will be accompanied by a list of notebooks, from which the student will have to work on the problem sessions and hand in optionally. These activities must allow the student to deepen their understanding and personalize the theoretical knowledge in a specific numerical case. Some examples of data that require the design of a solution in which the methods seen in the theory classes are used will be considered. It is impossible to follow the problem classes if you do not follow the contents of the theory classes. The result of these sessions is to achieve the necessary skills for solving problems that will have to be delivered according to the specific delivery mechanism that will be indicated on the subject's website (Caronte area).
MD2 Implementation of a short guided group project: Completion of 1 guided practice to deepen the applied aspects of the theory. The practical part of the subject will be completed with practical sessions, where the students will have to solve specific problems of a certain complexity implemented in Python. These projects will be solved in small groups of 2-3 people (in justified cases, groups of 1 person will be allowed), in which each member of the group will have to do a part and share it with the rest to reach the final solution. These working groups must be maintained until the middle of the course and must be self-managed: distribution of roles, work planning, assignment of tasks, management of available resources, conflicts, etc. Although the teacher will guide the learning process, their intervention in the management of the groups will be minimal. To develop the project, the groups will work independently, and the teacher will dedicate the practice sessions mainly to monitoring the status of the project, indicating errors to be corrected, proposing improvements, etc. Doubts that may arise regarding the implementation of the practicals will be posted on the Caronte forum, where other students can answer them.
MD3 Individual solution of a Kaggle practical case: each student will individually create a jupyter notebook where the various steps taken to solve a Computational Learning problem will be explained. The projects will be applied to selected databases from the Kaggle platform (https://www.kaggle.com/search?q=machine+learning), and will consist of three parts: an explanation of the most important attributes of the database and of the attribute to predict/classify; brief description of the computational learning method applied, along with the chosen parameters; and a presentation of the results obtained. Examples of jupyter notebooks can be found in the following repository: https://datauab.github.io/
MD4 Consultations and doubts: Free hours for the student for consultations and tutorials on aspects in which he needs additional help from the teaching staff. All inquiries will be made online, through the subject's forum, or emails to teachers, for example. It will be appreciated that the students answer the doubts of their colleagues as well as that in these answers they provide information that helps in understanding the content of the teaching activities.
MD5 Evaluation activities: for each of the activities described above. See the assessment section of this teaching guide.
In the case of repeaters, if the teacher in charge is asked, the grades for the teaching activities they took the previous year will be validated, if they have passed. Repeaters must retake the individual theoretical tests (MD0).
Transversal skills
Transversal skills T01, T02 and T04 are worked on and evaluated at home during the course in the following activities
- T01 Evaluate critically and with quality criteria the work done: in the activities MD0 (study of the theory), MD1 (realization of the problems), MD2 (realization of the laboratory practicum) and MD3 (explain a practical case of machine learning)
- T02 Work cooperatively in a multidisciplinary context assuming and respecting the role of the different team members: in MD2 practices and project analysis in MD3.
- T04 Effectively use the bibliography and electronic resources to obtain information: in the preparation of the theory material of MD0, and in the preparation of the description of the practical case in MD3
Annotation: Within the schedule set by the centre or degree programme, 15 minutes of one class will be reserved for students to evaluate their lecturers and their courses or modules through questionnaires.
Assessment
Activities and evaluation instruments:
a) Scheduled evaluation process and activities
The subject consists of the following assessment activities:
- MD0: Theoretical exams, where for each exam the student will have to answer, individually and in writing, 5 questions (each developed in at most one sheet of paper) on computational learning concepts seen in the theory classes. It represents 20% of the final grade, and is optional, individual and recoverable (there will be two partials and their respective recoveries, without penalty).
- MD1: Delivering a report with up to 3 solved problems, where each student will individually deliver a written report on up to 3 of the 5 problems seen in problem classes (regression, classification, memorization, and clustering). It represents 25% of the final grade, is optional, individual and recoverable (without penalty).
- MD2: Resolution of a practice with delivery of a report explaining the resolution and the results, where each group composed of two or three people (in justified cases the groups may be of 1 person), will deliver the python code, as well as a report of up to 30 pages in which they will describe the databases, the strategy they used to analyze the data, as well as the tests with different parameter values they tried and the results they obtained with the best possible configuration. There will be an optional presentation of each project. It represents 25% (code 5% + report 15% + presentation 5%) on the final grade, it is optional, group and recoverable.
- MD3: Individual development of a project, described in a report of up to 30 pages covering the Python code and the results of a specific machine learning case (regression, classification, clustering or memorization, chosen from here: https://www.kaggle.com/search?q=machine+learning), with an optional presentation describing the data in code, the models used (with the parameters that work best for the data), and the results of the chosen problem. Examples of Jupyter notebooks applied to concrete cases can be found here: https://datauab.github.io/. It represents 30% of the final grade (code 5% + report 15% + GitHub 5% + presentation 5%), and is optional, individual and NOT recoverable.
Below is a description of how to pass the subject with continuous assessment:
- MD0: Individual theoretical exams
The final theory grade will be calculated from two partial exams:
Theory Grade = (0.5 * Partial1) + (0.5 * Partial2)
Partial1 is taken in the middle of the semester and, if passed, eliminates that part of the syllabus. Partial2 is taken at the end of the academic semester and eliminates the part of the syllabus that comes after Partial1.
These exams seek an individualized assessment of the student's ability to answer 5 long questions (each developed in at most one sheet of paper) about the techniques explained in class, and evaluate the level of conceptualization that the student has achieved of the techniques seen.
In order to pass the theory part of the subject by taking face-to-face exams, two requirements must be met:
- The grades of partials 1 and 2 must both be equal to or higher than 4.0. If a grade below 4.0 is obtained in either partial, the corresponding partial must be retaken during the make-up exam.
- The final theory grade must be greater than or equal to 4.0. If the final theory grade is below 4.0, students can take the make-up exam to be assessed on all the content seen in the subject.
Make-up exam (end of January or beginning of February). In this exam, you can recover the partial(s) that did not reach 4.0, or recover the entire syllabus in the event that the final theory grade does not reach 4.0.
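The theory-grade rules above can be sketched in Python as follows. This is only an illustrative sketch, not an official calculator: the function name `theory_grade` and the keys of the returned dictionary are hypothetical, and grades are assumed to be on a 0-10 scale.

```python
MIN_PARTIAL = 4.0  # minimum mark required in each partial (from the rules above)
MIN_THEORY = 4.0   # minimum final theory grade (from the rules above)


def theory_grade(partial1: float, partial2: float) -> dict:
    """Hypothetical helper: theory grade plus the partials to retake."""
    # Theory Grade = (0.5 * Partial1) + (0.5 * Partial2)
    grade = 0.5 * partial1 + 0.5 * partial2
    # Any partial below the minimum goes to the make-up exam.
    retake = [
        name
        for name, mark in (("Partial1", partial1), ("Partial2", partial2))
        if mark < MIN_PARTIAL
    ]
    passed = not retake and grade >= MIN_THEORY
    return {"grade": grade, "retake": retake, "passed": passed}


# Example: the second partial is below 4.0, so it would have to be retaken.
print(theory_grade(6.0, 3.5))
```

Keeping the two thresholds as named constants makes it easy to adjust the sketch if the published minimums change.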
- MD1: Individual delivery of a report with up to 3 solved problems
The aim of the problems is to cause the student to engage with the contents of the subject continuously and, based on small problems, to become familiar directly with the application of the theory. As evidence of this work, the mandatory presentation of a portfolio in which you will have kept the problems you have been working on (competence T04) is requested.
Problems Grade = Portfolio evaluation with 3 solved problems out of the 5 seen in class. They can be recovered by submitting them through Caronte on the day of the second theory partial, without penalty.
- MD2: Resolution of 1 group practice
The evaluation of the MD2 practice project will include:
- Joint evaluation of each project (T02 competence): single grade for all members of the working group that will assess the overall result of the project, the quality of the code, the general structure of the final presentation and the documents delivered throughout the project.
- Individual assessment (T01 competence): the individual work will be assessed based on the answers to the questions in the online control sessions, the final presentation of the online project and mainly the active participation in the Caronte forums. In the cases required by any group (in cases of incidents between colleagues), a short confidential form will be evaluated qualifying the contribution of each group member to the final result.
The grade of the project will be calculated according to the formula:
Practice Grade = (0.2 * Program) + (0.2 * Presentation) + (0.6 * Documentation)
In very justified cases (e.g. PIUNE, or work, family or health issues), groups can be of 1 person.
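As a small illustration of the practice-grade formula above, the following Python sketch computes the weighted sum. The function name `practice_grade` is hypothetical, and the three marks are assumed to be on a 0-10 scale.

```python
def practice_grade(program: float, presentation: float, documentation: float) -> float:
    """Hypothetical helper implementing the practice-grade formula above."""
    # Practice Grade = (0.2 * Program) + (0.2 * Presentation) + (0.6 * Documentation)
    return 0.2 * program + 0.2 * presentation + 0.6 * documentation


# Example: the documentation carries most of the weight.
print(round(practice_grade(program=8.0, presentation=7.0, documentation=6.0), 2))
```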
- MD3: Completion of a specific case of computational learning on the Kaggle platform
The evaluation will be based on the Python code and on a report of up to 30 pages explaining the code and results, both of which will be found in a GitHub repository. The Kaggle case grade will be calculated according to the formula:
Kaggle Case Grade = (0.1 * Introduction to the database) + (0.25 * Analysis of attributes, correlations, ...) + (0.25 * Description of the method used, how to find the best parameters, comparison of methods, ...) + (0.3 * Description of results, confusion matrices, graphs of models and data, examples of false positives/negatives, ROC curves, ...) + (0.1 * GitHub repository presentation)
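The Kaggle case formula above is a weighted sum over five rubric items whose weights add up to 1. A minimal Python sketch is shown below; the dictionary keys and the function name `kaggle_case_grade` are hypothetical labels chosen for illustration, and marks are assumed to be on a 0-10 scale.

```python
# Rubric weights taken from the formula above; the key names are hypothetical.
KAGGLE_WEIGHTS = {
    "introduction": 0.10,         # introduction to the database
    "attribute_analysis": 0.25,   # analysis of attributes, correlations, ...
    "method_description": 0.25,   # method used, parameter search, comparisons
    "results": 0.30,              # results, confusion matrices, ROC curves, ...
    "github_presentation": 0.10,  # presentation of the GitHub repository
}


def kaggle_case_grade(marks: dict) -> float:
    """Hypothetical helper: weighted sum of the rubric marks."""
    missing = set(KAGGLE_WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(weight * marks[item] for item, weight in KAGGLE_WEIGHTS.items())


example_marks = {
    "introduction": 8.0,
    "attribute_analysis": 7.0,
    "method_description": 6.0,
    "results": 9.0,
    "github_presentation": 10.0,
}
print(round(kaggle_case_grade(example_marks), 2))
```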
Examples of jupyter notebooks applied to concrete cases can be found here https://datauab.github.io/.
- Evaluation of transversal skills
The partial exams make it possible to evaluate the acquisition of personal thinking and work habits (T01 Evaluate critically and with quality criteria the work done, Theory Grade). With the Practice Project Grade, teamwork will also be assessed (T02 Work cooperatively in a multidisciplinary context assuming and respecting the role of the different members of the team, Group Grade). By completing the problems and a Kaggle case, the acquisition of habits to solve a predetermined task with completely different data values than those seen in class will be assessed (T04 Effectively use the bibliography and electronic resources to obtain information, Problems Grade and Kaggle Case Grade).
The final grade of the subject is obtained by combining the evaluation of these 4 activities as follows:
Final Grade = (0.20 * Theory) + (0.25 * Practical) + (0.25 * Problems) + (0.30 * Kaggle Case)
Additionally, a Datathon will be organized for the whole class on the last day of the course. Participation is optional, and students who finish among the first in the competition can get up to 1.0 extra point.
Conditions to pass:
All assessment activities are optional; to pass the subject, the weighted sum of the grades of the activities must reach at least 5 points. In the event of not passing the subject, the numerical grade on the student record will be the lower value between 4.5 and the weighted average of the grades obtained. The following conditions apply:
- The final MD0 theory grade must be greater than or equal to 4.0 to be able to add the theory part to the subject's final grade.
- The grade of the MD2 project must be greater than or equal to 5.0 to be able to add the part of the practice to the final grade of the subject.
- The final grade of the subject must be greater than or equal to 5.0 in order to pass the subject.
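The final-grade formula and the conditions above can be combined in a short Python sketch. It rests on several assumptions that the guide does not state explicitly: an activity that is not delivered, or that falls below its minimum, contributes 0 to the sum; the "weighted average" used for a failing record is that same sum; and the optional Datathon bonus is left out. The function name `final_grade` is hypothetical.

```python
WEIGHTS = {"theory": 0.20, "practice": 0.25, "problems": 0.25, "kaggle": 0.30}
# Minimum grade for a part to be added to the final grade (from the conditions above).
THRESHOLDS = {"theory": 4.0, "practice": 5.0}
PASS_MARK = 5.0  # final grade needed to pass
FAIL_CAP = 4.5   # maximum recorded grade when the subject is not passed


def final_grade(theory=0.0, practice=0.0, problems=0.0, kaggle=0.0):
    """Hypothetical helper: return (recorded grade, passed flag)."""
    marks = {"theory": theory, "practice": practice,
             "problems": problems, "kaggle": kaggle}
    total = 0.0
    for name, mark in marks.items():
        # Assumption: a part below its minimum simply adds nothing.
        if mark >= THRESHOLDS.get(name, 0.0):
            total += WEIGHTS[name] * mark
    passed = total >= PASS_MARK
    recorded = total if passed else min(FAIL_CAP, total)
    return recorded, passed


# Example: the theory grade is below 4.0, so only the other parts are added.
grade, passed = final_grade(theory=3.0, practice=8.0, problems=8.0, kaggle=8.0)
print(round(grade, 2), passed)
```

Under these assumptions a student can still pass without the theory part, which matches the statement that all assessment activities are optional.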
If the student takes an exam or completes a practice, they can no longer be graded as "Not Assessable" even if they do not take any of the other assessments; the final grade will be calculated from those continuous assessment activities that have been submitted.
b) Programming of evaluation activities
The dates of continuous assessment and of assignment deliveries will be published on Caronte (http://caronte.uab.cat/), in the area of this subject, and may be subject to schedule changes to adapt to possible incidents; students will always be informed of these changes on caronte.uab.cat, since this platform will be the usual mechanism for exchanging information between professors and students.
c) Recovery process
The student can take part in the recovery process as long as they have taken a set of activities that represents a minimum of two thirds of the subject's total grade (4 evaluation activities out of 7 total: 2 partial theory exams; 1 practice evaluation; 3 problem submissions; and 1 Kaggle case submission).
Of these, those students who have a grade above 3.0 as an average for all the activities of the subject will be able to submit a retake.
It should be borne in mind that the Resolution of the Kaggle Case (MD3) is not recoverable.
d) Qualification review procedure
For each individual theoretical exam, a review place, date and time will be indicated in which the student can review the activity with the teacher. In this context, claims can be made about the grade of the activity, which will be evaluated by the teaching staff responsible for the subject. If the student does not appear for this review, this activity will not be reviewed later.
e) Qualifications
Honors (Matrícula de Honor, MH): Honors will be granted at the discretion of the teaching staff responsible for the subject, up to five percent or a fraction of the students enrolled in all the teaching groups of the subject. UAB regulations indicate that MH can only be awarded to students who have obtained a final grade equal to or higher than 9.00.
Non-evaluable: A student will be considered non-evaluable (NA) if he/she has not appeared in any of the partial exams or in any of the 2 practical assessments.
f) Irregularities on the part of the student, copying and plagiarism
Without prejudice to other disciplinary measures deemed appropriate, irregularities committed by the student that may lead to a change in the grade of an assessment activity will be graded with a zero. Therefore, copying, plagiarism, deception, allowing copying, etc. in any of the assessment activities will involve failing it with a zero. Assessment activities graded in this way and by this procedure will not be recoverable. If it is necessary to pass any of these assessment activities to pass the subject, the subject will be failed directly, with no opportunity to recover it in the same academic year. These irregularities include, among others:
- total or partial copying of a practice, report, or any other assessment activity;
- allowing others to copy;
- presenting group work not done entirely by the group members (applied to all members, not just those who have not worked);
- presenting as one's own materials prepared by a third party, even if they are translations or adaptations, and in general works with elements that are not original and exclusive to the student;
- having communication devices (such as mobile phones, smart watches, pens with cameras, etc.) accessible during individual theoretical-practical assessment tests (exams);
- talking with peers during individual theoretical-practical assessment tests (exams);
- copying or attempting to copy from other students during theoretical-practical assessment tests (exams);
- using or trying to use writings related to the subject during the theoretical-practical assessment tests (exams), when these have not been explicitly allowed.
The numerical grade on the student record will be the lower value between 3.0 and the weighted average of the grades in the event that the student has committed irregularities in an assessment activity (and therefore passing by compensation will not be possible). In future editions of this subject, a student who has committed irregularities in an assessment activity will not have any of the assessment activities already carried out validated.
Summary: copying, allowing copying or plagiarism (or the attempt to) in any of the assessment activities is equivalent to a FAIL, not compensable or recoverable and without validation of parts of the subject in subsequent courses.
g) Evaluation of repeat students
From the second enrolment, the evaluation of the subject will consist of the individual theoretical exam, adding the marks corresponding to the MDs obtained the first time the student has registered for the subject, as long as the marks of MD2 practices are greater than or equal to 5.0.
To be able to opt for this differentiated assessment, the repeating student must ask the teacher by week 5 at the latest.
Assessment Activities
Title | Weighting | Hours | ECTS | Learning Outcomes |
Delivery of problems | 25% | 0 | 0 | 8, 7, 1, 6, 4, 3 |
Individual theory tests | 20% | 4 | 0.16 | 7, 1, 6, 5, 4, 3 |
Written documentation, implementation and presentation of the Kaggle case | 30% | 2 | 0.08 | 8, 2, 6, 5, 9, 3 |
Written documentation, implementation and presentation of the practical project | 25% | 2 | 0.08 | 8, 7, 1, 6, 4, 9 |
Bibliography
Web links
- Caronte: http://caronte.uab.cat
- Artificial Intelligence: A Modern Approach. http://aima.cs.berkeley.edu/
- Web of the UAB Library Catalogue: https://bit.ly/35jalzm
Basic bibliography:
- S. Russell, P. Norvig. Artificial Intelligence: A Modern Approach. Ed. Prentice Hall, Second Edition, 2003. (A Spanish translation exists: Inteligencia artificial: Un Enfoque Moderno)
Complementary bibliography
- L. Igual, S. Seguí. Introduction to Data Science. Ed. Springer, 2017
- Bishop, Pattern Recognition and Machine Learning, 2007.
- Duda, Hart, and Stork, Pattern Classification, 2nd Ed., 2002.
- Marsland, Machine Learning: An Algorithmic Perspective, 2009
- Mitchell, Machine Learning, 1997
- Ripley, Pattern Recognition and Neural Networks, 1996.
Related bibliography
- Eberhart, Shi, Computational Intelligence: Concepts to Implementations, 2007
- Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, 2009.
- Gilder, Kurzweil, Richards, Are we spiritual machines? Ray Kurzweil vs. the Critics of Strong AI, 2011
- Kurzweil, The Singularity is Near: When Humans Transcend Biology, 2006
- Rosen, Life Itself: A Comprehensive Inquiry into the Nature, Origin, and Fabrication of Life (Complexity in Ecological Systems), 2005
- Witten, Frank, Hall, Data Mining: Practical Machine Learning Tools and Techniques, 2011
Software
The software required will be the Python programming language, a programming environment (such as Spyder, PyCharm or Visual Studio Code), the Jupyter Notebook web application, and the libraries needed for data analysis: the SciPy stack (NumPy, SciPy, matplotlib, pandas), scikit-learn (sklearn) and Seaborn.