Student Projects

Welcome to the LTS4 Student Projects page!

Below you will find a collection of projects that are available for the coming semesters. Although most of the proposed projects are categorized as Semester or Master projects, they can generally be adapted to other formats. Contact the person listed in the detailed project description for further information.

The project topics are the following: Image Analysis and Vision; Image/Video Coding and Communication; Signal Processing on Graphs; Learning; and Hands-on.

In case you want to work on your own ideas that are closely related to our research activities, we are more than happy to discuss them with you.

Image Analysis and Vision

    • Robustness of deep neural networks on different spaces
      Semester/Master Project

      Deep Neural Networks have achieved extraordinary results on many image classification tasks, but have been shown to be vulnerable to small alterations of their input, known as adversarial perturbations [1]. Several methods exist for computing adversarial perturbations, each of which perturbs different features/characteristics (spatial, spectral, chromatic) of the input image.
      In this project, we will study the robustness of deep neural networks to perturbations that lie in specific spatial, spectral, and color spaces. The goal is to identify the spaces in which networks are most vulnerable, and to use them to build methods that increase robustness against adversarial attacks.
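
      As a concrete starting point, below is a minimal sketch of the fast gradient sign method, one standard way to compute an adversarial perturbation in PyTorch (the classifier model, input x and label y are placeholders; this is an illustration, not the method to be developed in the project):

      import torch
      import torch.nn.functional as F

      def fgsm_perturbation(model, x, y, epsilon=0.03):
          """L-infinity adversarial example via the fast gradient sign method:
          one step of size epsilon in the direction that most increases the loss."""
          x = x.clone().detach().requires_grad_(True)
          loss = F.cross_entropy(model(x), y)
          loss.backward()
          return (x + epsilon * x.grad.sign()).detach()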

      References:

      [1] Szegedy et al., “Intriguing properties of neural networks”, ICLR 2014.

      Requirements: Good knowledge of Python, sufficient familiarity with computer vision and deep learning. Experience with PyTorch or another deep learning library is a plus.

      Contact: apostolos.modas@epfl.ch

    • Behavioural analysis of neural networks under adversarial perturbations
      Semester/Master Project

      The vulnerability of deep neural networks to small, carefully crafted noise, known as adversarial perturbations [1], has raised many questions regarding their safety and their overall behaviour when classifying different images. In order to improve these networks and increase their robustness, we first need to better understand their behaviour in both correct-classification and misclassification situations.

      In this project, we aim to study the internal behaviour of such deep architectures when different types of perturbations are applied to the input. By observing how neuron activations diffuse and by visualising the feature changes, we want to understand how these small input perturbations affect the different layers of the network, eventually leading to such significant changes in classification.
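
      A hedged sketch of the basic instrumentation for such a study: PyTorch forward hooks that record every layer's output, so clean and perturbed inputs can be compared layer by layer (model and inputs are placeholders):

      import torch

      def record_activations(model, x):
          """Run one forward pass and record the output of every submodule."""
          activations, hooks = {}, []
          for name, module in model.named_modules():
              def hook(module, inp, out, name=name):
                  activations[name] = out.detach()
              hooks.append(module.register_forward_hook(hook))
          with torch.no_grad():
              model(x)
          for h in hooks:
              h.remove()
          return activations

      # Compare activations on clean and perturbed inputs, layer by layer:
      # clean = record_activations(model, x)
      # noisy = record_activations(model, x + perturbation)
      # diffs = {k: (noisy[k] - clean[k]).norm() for k in clean}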

      [1] Szegedy et al., “Intriguing properties of neural networks”, ICLR 2014.

      Requirements: Good knowledge of Python and MATLAB, and sufficient familiarity with deep learning and deep neural networks. Experience with PyTorch is a plus.

      Contact: apostolos.modas@epfl.ch

    • Omnidirectional vision for drones
      Semester/Master Project

      Omnidirectional imaging is a hot research topic with important applications for aerial robots equipped with multiple cameras. Beyond numerous use cases in cinematography and film-making, omnidirectional vision can enhance drone autonomy by providing 360-degree visual sensing for tasks such as collision avoidance. The present project aims at implementing and comparing efficient stitching algorithms for omnidirectional vision, possibly enhancing existing solutions. Since the algorithm is to run on the drone's onboard computer and potentially be used for real-time applications such as collision avoidance, it should have low complexity and must run at an acceptable speed.

      The first part of the project will involve the selection and implementation of suitable stitching algorithms. Existing algorithms may be enhanced by taking a priori information, such as the camera configuration, into account. The second part of the project will involve testing the proposed algorithms on a physical drone platform.
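
      As a hedged baseline for the first part, OpenCV's high-level stitching pipeline (feature detection, matching, homography estimation and blending) can produce a reference panorama to compare custom algorithms against (image paths are placeholders; the availability of cv2.Stitcher_create depends on the OpenCV version):

      import cv2

      # Load the individual camera views (placeholder paths).
      images = [cv2.imread(p) for p in ["cam0.jpg", "cam1.jpg", "cam2.jpg"]]

      # The stitcher handles feature detection, matching, homography
      # estimation and blending internally.
      stitcher = cv2.Stitcher_create()
      status, panorama = stitcher.stitch(images)
      if status == 0:  # 0 indicates success
          cv2.imwrite("panorama.jpg", panorama)
      else:
          print("Stitching failed with status", status)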

      Requirements: Programming skills (shell scripting, Python, C/C++), basics of image processing.

      Contact: giuseppe.cocco@epfl.ch

    • Bandwidth efficient object recognition for drone swarms
      Semester/Master Project

      The project aims at developing a bandwidth-efficient distributed object detection system that can be flown on a drone swarm. The system exploits the different points of view of the drones in the swarm to improve object recognition, while keeping the amount of data transmitted to the ground station as low as possible. In this way, a more efficient use of the limited wireless resources can be achieved. The project will involve the use of both off-the-shelf neural network algorithms and WiFi communication protocols.

      The first part of the project will focus on setting up the communication network between the communication modules to be mounted on the drones and the ground station. The second part will focus on setting up the image capture/object recognition system and some basic onboard image processing. The core part of the project will consist in the implementation and optimization of the detection and communication protocol, building upon the modules developed so far. This part will include the evaluation of the system performance in terms of both bandwidth efficiency and detection accuracy.

      Requirements: Good knowledge of WiFi communication protocols (hands-on experience desirable), programming skills (shell scripting, Python, C/C++), familiarity with computer vision.

      Contact: giuseppe.cocco@epfl.ch

    • Deep Learning for Depth Estimation
      Semester/Master Project

      Although Deep Neural Networks (DNNs) are widely known for their remarkable classification capabilities, it has recently been shown that they can be used effectively for depth estimation as well. The depth estimation problem consists in inferring the depth of a scene from two or possibly more images of the same scene captured from different points of view. It is a critical problem in Computer Vision, as it is a prerequisite for higher-level problems such as semantic segmentation, action recognition, etc. Light field cameras can capture hundreds of images of the same scene from slightly different points of view in just a single exposure. Ideally, the availability of such a number of views makes the depth estimation problem easier to address than in the most common scenario, where only two images are available.

      In this project, the student will investigate the adaptation of existing DNN-based depth estimation algorithms designed for the common setup with just two images (stereo setup) to the light field case (multi-stereo setup), where multiple images of the same scene are available. Although the availability of multiple images of the same scene makes the depth estimation problem less ill-conditioned and easier to approach, the large amount of data requires carefully taking into account the overall computational and memory complexity of the final depth estimation algorithm, as these can become a main bottleneck when depth estimation is required in real-time applications.
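
      To calibrate expectations, a classical non-learned baseline for the two-view case is block matching; the sketch below uses OpenCV's StereoBM (file names and matcher parameters are illustrative assumptions):

      import cv2

      # Left/right views of the same scene (placeholder paths), grayscale.
      left = cv2.imread("view_left.png", cv2.IMREAD_GRAYSCALE)
      right = cv2.imread("view_right.png", cv2.IMREAD_GRAYSCALE)

      # Classical block matching: for each pixel, search along the epipolar
      # line for the horizontal shift (disparity) that best matches a block.
      stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
      disparity = stereo.compute(left, right).astype(float) / 16.0
      # Disparity is inversely proportional to depth.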

      [1] Alexey Dosovitskiy et al., FlowNet: Learning Optical Flow with Convolutional Networks, ICCV 2015.
      [2] Nikolaus Mayer et al., A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation, IEEE CVPR 2016.
      [3] Wenjie Luo et al., Efficient deep learning for stereo matching, IEEE CVPR 2016.

      Requirements: Knowledge of Python and DNN tools such as PyTorch. Basics of image processing.

      Contact: mattia.rossi@epfl.ch

    • Geometry meets robustness
      Semester Project

      Deep networks have shown great performance in classification tasks. However, they have proven vulnerable to small and often imperceptible noise called “adversarial perturbations”. This becomes extremely critical when they are deployed in safety-critical applications such as driverless cars. So far, no definitive method has been found to counter this vulnerability effectively.

      Nevertheless, adversarial perturbations have recently been revealed to be tightly related to the geometry of the decision boundary of deep networks. In this project, we will characterize these perturbations in terms of the geometrical properties of the decision boundary. Based on this understanding, we will eventually design efficient algorithms to detect adversarial perturbations.

      Requirements: Good knowledge of Python and MATLAB, sufficient familiarity with machine learning and image processing, having experience with PyTorch (or similar deep learning frameworks) is a plus.

      Contact: seyed.moosavi@epfl.ch

    • Investigation of Deep Convolutional Neural Networks on the Information Plane
      Master/Semester Project

      With their success and wide range of application areas, there is an urgent need for a comprehensive understanding of learning with Deep Convolutional Neural Networks (DCNNs). A recent approach [1,2] studies the information paths of Deep Neural Networks (DNNs) in the information plane, where the mutual information between each layer and the input, and between each layer and the output, is plotted throughout the learning procedure. Despite bringing a new perspective and revealing details about the inner workings of DNNs, these two papers experiment only with very small-scale fully-connected feedforward networks for classification, which keeps the distribution computations easy. To adapt this framework to real-world problems, one needs to apply estimation methods to compute the mutual information and the required distributions for the analysis. The aim of this project is to analyze simple but more realistic DCNNs, like LeNet-5 [3], in the information plane by finding and applying proper and efficient estimation techniques; a minimal sketch of the simplest such estimator follows below. A possible extension would be to examine how skip connections [4] or routing between capsules (groups of neurons proposed in [5] to achieve equivariance) contribute to the learning process.
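
      The snippet below sketches the binning (plug-in) mutual information estimator in the spirit of [2]: activations are discretized per neuron and MI is computed from the empirical joint distribution (the bin count is an illustrative assumption; better estimators are exactly what the project should explore):

      import numpy as np

      def discretize(a, n_bins=30):
          """Bin each neuron's activation (rows: samples) and collapse every
          sample's binned activation vector into one discrete state id."""
          bins = np.linspace(a.min(), a.max(), n_bins + 1)
          digitized = np.digitize(a, bins)
          return np.unique(digitized, axis=0, return_inverse=True)[1]

      def mutual_information(x, y):
          """Plug-in MI estimate from the empirical joint distribution of two
          discrete variables."""
          xs, x_idx = np.unique(x, return_inverse=True)
          ys, y_idx = np.unique(y, return_inverse=True)
          joint = np.zeros((len(xs), len(ys)))
          np.add.at(joint, (x_idx, y_idx), 1)
          joint /= joint.sum()
          px = joint.sum(axis=1, keepdims=True)
          py = joint.sum(axis=0, keepdims=True)
          nz = joint > 0
          return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

      # I(T;Y) for one layer: mutual_information(discretize(activations), labels)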

      [1] Tishby, N., & Zaslavsky, N. (2015, April). Deep learning and the information bottleneck principle. In Information Theory Workshop (ITW), 2015 IEEE (pp. 1-5). IEEE.
      [2] Shwartz-Ziv, R., & Tishby, N. (2017). Opening the Black Box of Deep Neural Networks via Information. arXiv preprint arXiv:1703.00810.
      [3] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
      [4] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
      [5] Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic Routing Between Capsules. In Advances in Neural Information Processing Systems (pp. 3857-3867).

      Requirements: Sufficient familiarity with machine learning and probability. Experience with one of the deep learning libraries and good knowledge of the corresponding coding language (preferably Python).

      Contact: beril.besbinar@epfl.ch

    • Interpretable machine learning in personalised medicine
      Master/Semester Project

      Modern machine learning models mostly act as black boxes, and their decisions cannot be easily inspected by humans. To trust automated decision-making, we need to understand the reasons behind predictions and gain insights into the models. This can be achieved by building models that are interpretable. Recently, different methods have been proposed for data classification, such as augmenting the training set with useful features [1], visualizing intermediate features in order to understand the input stimuli that excite individual feature maps at any layer in the model [2-3], or introducing logical rules in the network that guide the classification decision [4], [5]. The aim of this project is to study existing algorithms that attempt to interpret deep architectures by studying the structure of their inner-layer representations, and, based on these methods, to find patterns for classification decisions along with coherent explanations. The studied algorithms will mostly be considered in the context of personalised medicine applications.
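
      As one concrete example from the family of methods in [2-3], here is a hedged PyTorch sketch of a vanilla gradient saliency map, which highlights the input pixels to which a class score is most sensitive (the model and input shapes are assumptions):

      import torch

      def saliency_map(model, x, target_class):
          """Vanilla gradient saliency in the spirit of [2]: the magnitude of
          d(class score)/d(pixel) indicates pixel importance.
          x: input tensor of shape (1, C, H, W)."""
          x = x.clone().detach().requires_grad_(True)
          score = model(x)[0, target_class]
          score.backward()
          return x.grad.abs().max(dim=1)[0]  # max over colour channels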

      [1] R. Collobert, J. Weston, L. Bottou, M. M. Karlen, K. Kavukcuoglu, and P. Kuksa, “Natural language processing (almost) from scratch,”J. Mach. Learn. Res., vol. 12, pp. 2493–2537, Nov. 2011.
      [2] K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep inside convolutional networks: Visualising image classification models and saliency maps,” arXiv:1312.6034, 2013.
      [3] L. M. Zintgraf, T. S. Cohen, T. Adel, and M. Welling, “Visualizing deep neural network decisions: Prediction difference analysis,” arXiv:1702.04595, 2017.
      [4] Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing, “Harnessing deep neural networks with logic rules,” in ACL, 2016.
      [5] Z. Hu, Z. Yang, R. Salakhutdinov, and E. Xing, “Deep neural networks with massive learned knowledge,” in Conf. on Empirical Methods in Natural Language Processing, EMNLP, 2016.

      Requirements: Familiarity with machine learning and deep learning architectures. Experience with one of the deep learning libraries and good knowledge of the corresponding coding language (preferably Python) is a plus.

      Contact: mireille.elgheche@epfl.ch

    • Depth estimation via deep learning: the light field case
      Master Project

      The depth estimation problem is concerned with the inference of the depth of a scene from multiple pictures of it, all captured from different points of view. Depth estimation represents a critical problem in Computer Vision, as it is a prerequisite for higher-level problems such as 3D reconstruction, semantic segmentation, action recognition, etc. Typically, since depth estimation requires multiple pictures of the same scene, the scene has to be still while the user moves around and captures the necessary pictures: this represents a significant limitation. On the other hand, Light Field cameras [1] can capture hundreds of images of the same scene from slightly different points of view in just a single exposure; therefore, depth estimation from light field camera data has raised large interest in the last decade [2][3].
      Due to the recent success of Deep Neural Networks in Computer Vision tasks, the student will address the light field depth estimation problem within the framework of Deep Learning. Although some preliminary work in this direction already exists [3], this research track is still at its beginning. The student will start by studying recent deep learning-based depth estimation algorithms for the standard stereo setup [4] (two pictures available) and the multi-view stereo setup [5] (three or more pictures available). Then, the student will consider their extension to the particular scenario of light field cameras. On the one hand, light field cameras provide a much higher number of pictures than traditional stereo and multi-view stereo setups; therefore, the computational and memory complexity will have to be carefully taken into account. On the other hand, light field camera pictures exhibit a very regular structure [2] that can be largely exploited in the depth estimation task. The depth estimation results will be evaluated on a state-of-the-art light field benchmark [6].

      References:

      [1] Raytrix website (https://raytrix.de/)
      [2] S. Wanner and B. Goldluecke, Globally consistent depth labeling of 4D light fields, IEEE CVPR, pp. 41-48, 2012
      [3] C. Shin et al., EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images, IEEE CVPR, 2018
      [4] W. Luo et al., Efficient deep learning for stereo matching, IEEE CVPR, pp.5695-5703, 2016
      [5] P.-H. Huang et al., DeepMVS: Learning Multi-View Stereopsis, IEEE CVPR, 2018
      [6] Light field depth estimation benchmark (http://hci-lightfield.iwr.uni-heidelberg.de/)

      Requirements: Knowledge of MATLAB and/or Python. Basic knowledge of optimization and machine learning. Knowledge of DCNN libraries (e.g., Caffe) and image processing is a plus.

      Contact: mattia.rossi@epfl.ch

    • Comparative study of CNNs and human visual system under the effect of clutter
      Master Project

      Convolutional Neural Networks (CNNs) are feedforward, hierarchical architectures that achieve extremely accurate classification of natural images. However, there are differences between the visual aspects captured by CNNs and by the human visual system. This project aims at a comparative study of the behaviour of CNNs and humans on the task of image classification under the effect of crowding (clutter). According to [1], although the performance of the human visual system decreases in the presence of crowding, the performance can be progressively regained in the presence of more crowding with specific characteristics (color, size, orientation). The main goal of this project is to investigate whether CNNs exhibit similar behaviour. The student will learn how to:

      i) create a dataset to train the CNN

      ii) train the CNN so that it classifies accurately the images without the presence of clutter

      iii) design experiments (similar to those in [1]) in order to test the performance of the trained CNN in the presence of clutter

      iv) provide comments on the results

      References:

      [1] Michael H. Herzog, Mauro Manassi, Uncorking the bottleneck of crowding: a fresh look at object recognition, Current Opinion in Behavioral Sciences, 1, pp. 86-93, 2015.

      [2] Yann LeCun, Yoshua Bengio and Geoffrey Hinton, Deep Learning, Nature 521, no. 7553 (2015): 436-444.

      [3] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.

      Requirements: Good knowledge of Matlab or Python, sufficient familiarity with machine learning and image processing. Having experience with deep learning is a plus.

      The project will be co-supervised with the Laboratory of Psychophysics from the School of Life Sciences.

      Contact: effrosyni.simou@epfl.ch

    • Transformation invariant deep learning systems
      Semester Project

      Deep learning systems have achieved remarkable results in the last decade. Such networks are able to learn meaningful features for machine learning tasks from raw data. They can even learn data representations under some transformations, thanks to the max-pooling operator. However, this operator is not sufficient to handle the ambiguity in natural data.

      In this project, we propose to study existing algorithms that build transformation-invariant features with deep networks and, based on these methods, to build a rotation- and translation-invariant system for an image classification task.

      Requirements: Programming skills, signal processing. Knowledge in deep learning is a plus.

      Contact: renata.khasanova@epfl.ch

    • Omnidirectional vision
      Semester/Master Project

      Omnidirectional cameras have a 360-degree field of view, which makes them powerful tools for object detection and classification tasks [1, 2]. For example, a single omnidirectional camera can be used to capture traffic data based on information about vehicles on the roads, or it can be mounted on a drone and used for object detection and collision avoidance. Despite the broad variety of applications where omnidirectional vision can be used, to the best of our knowledge there are only a few methods that perform object classification directly on omnidirectional images, without first transforming the data to classical images, a transformation that increases computational complexity and may lose information.

      We propose to use a deep architecture [3] to tackle the classification problem for omnidirectional images. In this project, the student will be required to develop an algorithm for object classification that takes the omnidirectional camera’s lens geometry into account.

      References:

      [1] H.C. Karaimer, Y. Bastanlar. Detection and classification of vehicles from omnidirectional videos using temporal average of silhouettes. In Proceedings of the Int. Conference on Computer Vision Theory and Applications (2015), pp. 197-204.

      [2] L.F. Posada, K.K Narayanan, F. Hoffmann, T. Bertram. Semantic classification of scenes and places with omnidirectional vision. In European Conference on Mobile Robots (ECMR), IEEE (2013), pp. 113-118.

      [3] Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (2012), pp. 1097–1105.

      Requirements: Basic knowledge in computer vision, neural networks, signal processing, programming skills.

      Contact: renata.khasanova@epfl.ch

    • Light field depth estimation meets deep learning in the epipolar image domain
      Master Project

      A Light Field camera [1] looks exactly like your favorite point-and-shoot camera; however, its smart design permits capturing multiple pictures at the same time. These pictures capture the scene from slightly different points of view, thus recording its 3D structure. With some computation, it is possible to exploit these pictures for multiple applications: re-focusing the captured pictures at an arbitrary depth, measuring the depth of the objects in the scene, or even building a 3D model of the scene.

      The first step toward these applications is depth estimation. Depth estimation [2] is a fundamental problem in Computer Vision and consists in assigning a depth value to each object (pixel) in a captured picture. Typically, to solve this problem, multiple pictures of the same scene are necessary; therefore, the scene or subject has to be still while the user moves around and captures the pictures. A light field camera, on the other hand, captures all the required images in a single shot. In addition, the set of images captured by a light field camera, referred to as the Light Field, exhibits a very particular structure: the depth associated to each pixel in the light field is in a one-to-one relation with the slope of a line in the Epipolar Image Domain representation of the light field [3]. Therefore, in the context of light fields, the depth estimation problem reduces to a much simpler slope detection problem.

      In this project, the student will first study the special structure of light field data. Then, the student will take advantage of the effectiveness of Deep Neural Networks in detecting patterns in data to determine the line slopes, and therefore the light field depth map. The student will design a network architecture suitable for the considered task and will have to take into account the high dimensionality of the light field data, which may not permit processing the full light field at once. The depth estimation results will be evaluated on a state-of-the-art light field benchmark [4].
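
      As a hedged sketch of the classical, non-learned baseline, the snippet below estimates local line orientations in an epipolar image with a 2D structure tensor, in the spirit of [3] (the smoothing scale is an illustrative assumption; the project replaces this estimator with a deep network):

      import numpy as np
      from scipy import ndimage

      def epi_slopes(epi, sigma=1.0):
          """Estimate the local line orientation at every pixel of an
          epipolar image via the 2D structure tensor; the orientation is in
          one-to-one relation with disparity, and hence with depth."""
          epi = epi.astype(float)
          gx = ndimage.sobel(epi, axis=1)  # derivative along the spatial axis
          gy = ndimage.sobel(epi, axis=0)  # derivative along the view axis
          Jxx = ndimage.gaussian_filter(gx * gx, sigma)
          Jxy = ndimage.gaussian_filter(gx * gy, sigma)
          Jyy = ndimage.gaussian_filter(gy * gy, sigma)
          # Orientation angle of the locally dominant direction.
          return 0.5 * np.arctan2(2.0 * Jxy, Jxx - Jyy)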

      References:

      [1] Raytrix website (https://raytrix.de/)
      [2] M. Bleyer and C. Breiteneder, Stereo Matching: State-of-the-Art and Research Challenges, Advanced Topics in Computer Vision, Springer, pp. 143-179, 2013
      [3] S. Wanner and B. Goldluecke, Globally consistent depth labeling of 4D light fields, IEEE CVPR, pp. 41-48, 2012
      [4] Light field depth estimation benchmark (http://hci-lightfield.iwr.uni-heidelberg.de/)

      Requirements: Fundamentals of image processing, basic knowledge of deep learning and machine learning, Python, PyTorch or TensorFlow.

      Contact: mattia.rossi@epfl.ch

    • Omnidirectional stereo: Patch Match meets Deep Learning
      Semester/Master Project

      Omnidirectional cameras capture videos with a 360-degree field of view. Thanks to a Head Mounted Display, the user can be thrown into the middle of the scene and experience a much deeper immersion than with a traditional video. At this point, although users can watch the scene around themselves while the video flows, their point of view is bound to that of the camera: the user cannot make a step in an arbitrary direction in the scene, not yet. Interestingly, coupling two omnidirectional cameras makes it possible to estimate the geometry of the scene, thus paving the way to 3D reconstruction and unveiling the possibility for the user to navigate the scene freely. Preliminary work in this direction has already been carried out in [1][2].

      The problem of geometry estimation from pairs of traditional perspective cameras is referred to as Stereo Matching [3] and is a long-studied problem, for which fast and effective methods exist. Omnidirectional stereo, instead, is a very recent research track. In this project, the student will first become familiar with Patch Match Stereo [4][5], a fast, effective, yet simple stereo matching algorithm, and then extend it to the case of omnidirectional camera pairs. Moreover, while Patch Match uses a Euclidean-based metric to match the views captured by the two cameras, and is therefore sensitive to light changes and occlusions, the student will consider replacing it with a metric computed by a Deep Neural Network [6], in an attempt to obtain a more robust algorithm.

      References:

      [1] C. Schroers et al., An Omnistereoscopic Video Pipeline for Capture and Display of Real-World VR, ACM Transactions on Graphics, Special Issue on Production Rendering, vol. 37, no. 3, August 2018
      [2] J. Huang et al., 6-DOF VR videos with a single 360-camera, IEEE Virtual Reality, 2017
      [3] M. Bleyer and C. Breiteneder, Stereo Matching: State-of-the-Art and Research Challenges, Advanced Topics in Computer Vision, Springer, pp. 143-179, 2013
      [4] M. Bleyer et al., PatchMatch Stereo – Stereo Matching with Slanted Support Windows, BMVC, pp. 14.1-14.11, 2011
      [5] E. Zheng et al., PatchMatch Based Joint View Selection and Depthmap Estimation, IEEE CVPR, pp. 1510-1517, 2014
      [6] J. Zbontar and Y. LeCun, Stereo matching by training a convolutional neural network to compare image patches, Journal of Machine Learning Research, vol. 17, pp. 1-32, 2016

      Requirements: Fundamentals of linear algebra, fundamentals of image processing, basic knowledge of deep learning and machine learning, Python, PyTorch or TensorFlow.

      Contact: mattia.rossi@epfl.ch

Image/Video Coding and Communication

    • Video compression for moving cameras
      Semester/Master Project

      Video streaming from mobile platforms such as drones is a widespread application which has gained even more popularity in recent years. Efficient video coding standards are used nowadays to compress videos onboard the platform. However, despite impressive progress from both the practical and the theoretical perspective, a full theoretical model of the rate-distortion function (that is, the relationship between the compressed video data rate and quality) for natural videos taken by moving cameras is still missing. This project aims at gathering and processing data in a fully controlled environment in order to move one step closer to such a model.

      The project is divided into three parts.
      • In the first part, raw images will be taken under different conditions in a controlled environment using a camera.
      • In the second part, the data will be post-processed by developing a predictive video coder (e.g., in Matlab); a minimal sketch of the transform-and-quantize step of such a coder is given after this list.
      • In the third part, fitting to an existing model will be carried out.
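
      Although the project suggests Matlab, here is a hedged Python sketch of the core predictive-coding step: the prediction residual between consecutive frames is transformed with a block DCT and quantized (frame names and the quantization step are illustrative assumptions; motion compensation is omitted):

      import numpy as np
      from scipy.fftpack import dct, idct

      def dct2(block):   # 2D type-II DCT, as used in most video coders
          return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

      def idct2(block):
          return idct(idct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

      def code_residual(prev_frame, curr_frame, q_step=8.0, block=8):
          """Encode curr_frame predictively from prev_frame: transform the
          prediction residual block-wise and quantize the coefficients."""
          residual = curr_frame.astype(float) - prev_frame.astype(float)
          coded = np.zeros_like(residual)
          h, w = residual.shape
          for i in range(0, h - h % block, block):
              for j in range(0, w - w % block, block):
                  coeffs = dct2(residual[i:i+block, j:j+block])
                  coeffs = np.round(coeffs / q_step)        # quantization
                  coded[i:i+block, j:j+block] = idct2(coeffs * q_step)
          return prev_frame.astype(float) + coded  # decoder-side reconstruction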

      Requirements: Knowledge of Matlab, basic knowledge of image processing (e.g., discrete cosine transform, quantization) or signal processing, precision and motivation.

      Contact: giuseppe.cocco@epfl.ch

    • Peer-assisted adaptive streaming of omnidirectional videos
      Semester/Master Project

      The current state of the art in multimedia streaming over the Internet is mainly based on Adaptive Streaming over HTTP (HAS) techniques [1], which provide a standard delivery framework allowing interoperability between different devices and servers while optimizing bandwidth consumption. The key concept behind adaptive streaming is that the same video content is encoded at different resolutions/encoding rates and stored on streaming servers, and each user requests over time the desired version of the content according to their download capacity. The emergence of new media formats supporting improved user experiences, such as omnidirectional and multiview videos, however, requires the transmission of a huge amount of data and imposes many new challenges on the current HAS infrastructures. One possibility for improving the state of the art when transmitting those new media formats is through the use of peer-assisted delivery techniques [2].

      This project aims at exploring the use of peer-assisted adaptive streaming for omnidirectional video. The main goal is to study how current HAS formats and adaptation logics can be adapted to support an optimized omnidirectional video delivery chain and, consequently, improve the experience of users consuming such content.
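
      As an illustration of the adaptation-logic side, here is a minimal sketch of a purely throughput-based HAS decision rule (the bitrate ladder and safety factor are illustrative assumptions; a real adaptation logic also considers buffer level, viewport, and peer availability):

      def choose_representation(bitrates_kbps, measured_throughput_kbps,
                                safety_factor=0.8):
          """Request the highest-quality representation whose bitrate fits
          within a safe fraction of the measured download bandwidth."""
          budget = safety_factor * measured_throughput_kbps
          feasible = [b for b in sorted(bitrates_kbps) if b <= budget]
          return feasible[-1] if feasible else min(bitrates_kbps)

      # Representations at 500/1500/3000/6000 kbps, 4 Mbps measured:
      # choose_representation([500, 1500, 3000, 6000], 4000)  ->  3000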

      References:

      [1] Stockhammer, Thomas. “Dynamic adaptive streaming over HTTP: standards and design principles.” Proceedings of the second annual ACM conference on Multimedia systems. ACM, 2011.

      [2] Streamroot white paper: Peer-Assisted Adaptive Streaming; http://files.streamroot.io/public/whitepapers/Streamroot-Whitepaper-Peer-Assisted-Adaptive-Streaming.pdf

      [3] YouTube 360° https://www.youtube.com/channel/UCzuqhhs6NWbgTzMuM09WKDQ

      Requirements: Basic knowledge of multimedia networking, basic programming skills.

      Contact: roberto.azevedo@epfl.ch

    • Visual quality of omnidirectional images and videos
      Semester/Master Project

      Many algorithms (i.e., objective quality metrics) have been proposed in the literature to quantify the visual quality of a digital image or video sequence as perceived by the end user. These metrics are extremely useful for optimising the different steps of the digital processing chain, such as acquisition, compression, transmission, and rendering, so as to maximize the perceptual quality of the signal presented to the multimedia user [1].

      Nowadays, cameras which allow capturing omnidirectional (i.e., 360°) digital images and video sequences have started to appear as commercial products [2][3]. The spread of applications involving omnidirectional content in the near future will require the optimisation of its processing chain. Thus, algorithms able to quantify the visual quality of omnidirectional images and videos will soon be needed. While many quality metrics exist for classical, non-omnidirectional images and videos, the quality assessment of this new kind of visual media is an open research challenge. The goal of this project is to study the quality perception of omnidirectional content and to design algorithms to assess its quality.
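
      As a hedged illustration of why omnidirectional content needs dedicated metrics, the sketch below weights a plain PSNR computation by latitude, since an equirectangular projection over-represents pixels near the poles (this follows the idea behind the WS-PSNR metric; it is a starting point, not the metric to be designed in the project):

      import numpy as np

      def ws_psnr(ref, test, max_val=255.0):
          """Latitude-weighted PSNR for equirectangular images: each row is
          weighted by the cosine of its latitude."""
          h, w = ref.shape[:2]
          lat = (np.arange(h) + 0.5) / h * np.pi - np.pi / 2  # latitude per row
          weights = np.cos(lat)[:, None] * np.ones((1, w))
          err = (ref.astype(float) - test.astype(float)) ** 2
          if err.ndim == 3:
              err = err.mean(axis=2)  # average over colour channels
          wmse = (weights * err).sum() / weights.sum()
          return 10.0 * np.log10(max_val ** 2 / wmse)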

      References:

      [1] Wang and Bovik, “Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures”, IEEE Signal Processing Magazine, 2009

      [2] https://theta360.com/

      [3] http://gopro.com/odyssey

      Requirements: Matlab programming skills, image processing

      Contact: roberto.azevedo@epfl.ch

    • Omnidirectional image and video processing
      Semester/Master Project

      Nowadays, cameras which allow capturing omnidirectional (i.e. 360°) images and video sequences have started to appear as commercial products [1][2]. An omnidirectional image can be seen as a 360° viewing sphere, since the real-world environment surrounding the camera is captured in all directions.

      In order to be processed with widespread algorithms designed for standard rectangular planar images, the viewing sphere is often mapped to a plane, resulting in a so-called panoramic image [3]. The alternative to this approach is to process the signal directly in its original spherical domain. The goal of this project is to study a framework for adapting classical digital image and video processing techniques to the omnidirectional scenario, where signals are defined on the surface of a sphere.
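
      A minimal sketch of the sphere-plane relationship at the heart of the project: mapping equirectangular pixel coordinates to spherical angles and unit direction vectors (the pixel-center convention is an assumption):

      import numpy as np

      def equirectangular_to_sphere(u, v, width, height):
          """Map pixel coordinates (u, v) of an equirectangular panorama to
          longitude/latitude angles and a unit direction vector."""
          theta = (u + 0.5) / width * 2.0 * np.pi - np.pi   # longitude in [-pi, pi)
          phi = np.pi / 2 - (v + 0.5) / height * np.pi      # latitude in [-pi/2, pi/2]
          x = np.cos(phi) * np.cos(theta)
          y = np.cos(phi) * np.sin(theta)
          z = np.sin(phi)
          return theta, phi, np.stack([x, y, z], axis=-1)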

      References:

      [1] https://theta360.com/

      [2] http://gopro.com/odyssey

      [3] David Salomon, “Transformation and projection in Computer Graphics”

      Requirements: Matlab programming skills, signal processing, basics of trigonometry.

      Contact: roberto.azevedo@epfl.ch

    • Plenoptic sampling
      Master Project

      With recent advances in 3D scene representation with camera networks, as in 3DTV or Free-viewpoint TV, the problem of camera positioning is becoming very important. Optimal camera placement increases the 3D reconstruction precision and improves the compression performance of multi-view images. The camera placement problem can be formulated as a plenoptic sampling problem, where the best sampling of the viewpoints is sought. The plenoptic function captures the luminance and chrominance properties of a light ray in any direction, at any time instant, and from any viewing point.

      This project aims at proposing a novel representation and parameterization of the plenoptic function that can efficiently capture the underlying geometry in 3D scenes. As a second part of this project, the camera positioning problem will be formulated as a sampling problem in the transform domain defined by the novel geometric representation of the plenoptic function.
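
      For reference, the full plenoptic function is commonly parameterized as a 7D function P(x, y, z, θ, φ, λ, t): viewpoint position (x, y, z), ray direction (θ, φ), wavelength λ, and time t; the first goal above amounts to proposing a lower-dimensional, geometry-aware parameterization of this function.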

      Requirements: signal processing, Matlab (C/C++)

      Contact: pascal.frossard@epfl.ch

    • Study and implementation of a channel coding scheme for delay-constrained applications
      Semester/Master Project

      Multimedia streaming over wireless channels will play a fundamental role in the next generation of mobile communication networks. Real-time image and video streaming are particularly challenging due to strict delay constraints. Such constraints are especially strict when the video is streamed from a drone and used as a reference by the pilot to control the aircraft. The goal of the project is to develop a channel coding scheme which can cope with limited channel state information at the transmitter while providing high reliability and meeting the strict delay constraints of real-time multimedia streaming. The implementation will be based on LDPC codes and multiuser detection principles.
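
      As a warm-up for the coding part, below is a hedged Python sketch of hard-decision bit-flipping decoding, one of the simplest LDPC decoding algorithms (the parity-check matrix H and received word r are placeholders; the project itself would target more powerful soft-decision decoders):

      import numpy as np

      def bit_flip_decode(H, r, max_iter=50):
          """Hard-decision bit-flipping LDPC decoding.
          H: binary parity-check matrix (m x n); r: received hard bits (0/1 ints)."""
          x = r.copy()
          for _ in range(max_iter):
              syndrome = H @ x % 2
              if not syndrome.any():
                  return x                 # all parity checks satisfied
              # Count how many failing checks involve each bit ...
              fails = H.T @ syndrome
              # ... and flip the bit(s) involved in the most failures.
              x[fails == fails.max()] ^= 1
          return x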

      Requirements: knowledge of channel coding principles, good programming skills (Matlab and C/C++).

      Contact: giuseppe.cocco@epfl.ch

Signal Processing on Graphs

    • Network inference from a noisy graph structure
      Master Semester/Thesis Project

      Graph structures carry a lot of information on the interactions between data points and can be very useful in data analysis and interpretation. However, the available structures are often very noisy or not entirely representative of the data. Meanwhile, graph inference algorithms can provide very good estimations of this structure, but they need a large amount of data to learn from.
      The goal of this project is to propose a graph inference algorithm for applications where less data is available, but a noisy graph estimate already exists. This can be an empirically constructed geometric graph or an available graph that does not necessarily capture the full information (e.g., a social network graph, where the same connection can represent a very close friendship or people who have never met in person).
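
      For orientation, the sketch below shows the naive similarity-based graph construction that smoothness-based inference methods such as [1, 2] refine into a proper optimization problem (the kernel width and threshold are illustrative assumptions):

      import numpy as np

      def infer_graph(X, sigma=1.0, threshold=0.1):
          """Naive graph inference from signals X (rows: nodes, columns:
          observations): connect nodes whose signals are similar, so that the
          observed signals are smooth on the inferred graph."""
          d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
          W = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian similarity weights
          np.fill_diagonal(W, 0.0)
          W[W < threshold] = 0.0                 # sparsify weak edges
          return W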

      References:

      [1] Dong, Xiaowen, et al. “Learning Laplacian matrix in smooth graph signal representations.” IEEE Transactions on Signal Processing 64.23 (2016): 6160-6173.

      [2] Kalofolias, Vassilis. “How to learn a graph from smooth signals.” Artificial Intelligence and Statistics. 2016.

      [3] Nguyen, Viet Anh, Daniel Kuhn, and Peyman Mohajerin Esfahani. “Distributionally Robust Inverse Covariance Estimation: The Wasserstein Shrinkage Estimator.” arXiv preprint arXiv:1805.07194 (2018).

      Requirements: Python, basics of probability theory, linear algebra, optimization is a plus

      Contact: hermina.petricmaretic@epfl.ch or mireille.elgheche@epfl.ch

    • Graph learning with a degree distribution prior
      Master Semester/Thesis Project

      Graphs are flexible structures that represent connections between data, whether these are connections between people in a social network or the connectivity networks that govern processes in our brains. Very often (as is the case for brain networks), these graph structures are not readily available and need to be inferred. Graph learning methods have become very popular in the last few years, with solutions covering many different models of data behaviour on the graph.
      However, most solutions ignore any prior information we might have on the network structure. For example, additional knowledge about the graph can be included in the form of a degree distribution prior. The goal of this work is to propose a method for graph inference that incorporates prior information on the degree distribution of the nodes.

      References:

      [1] Dong, Xiaowen, et al. “Learning Laplacian matrix in smooth graph signal representations.” IEEE Transactions on Signal Processing 64.23 (2016): 6160-6173.

      [2] Tzikas, Dimitris G., Aristidis C. Likas, and Nikolaos P. Galatsanos. “The variational approximation for Bayesian inference.” IEEE Signal Processing Magazine 25.6 (2008): 131-146.

      [3] Dong, Xiaowen, et al. “Learning Graphs from Data: A Signal Representation Perspective.” arXiv preprint arXiv:1806.00848(2018).

      Requirements: Basics of (graph) signal processing, basics of machine learning, Python/Matlab coding skills.

      Contact: hermina.petricmaretic@epfl.ch

    • Inference of multiple functional brain networks using Graph Laplacian Mixture Model
      Master Semester/Thesis Project

      Spontaneous brain activity, as measured through resting-state functional magnetic resonance imaging (fMRI), has provided key insights into the functional architecture of the brain. Global patterns of neural activity can be obtained by directly computing the statistical interdependence between different brain regions. The information can then be conveniently summarised into a functional connectome [1], and more intuitively represented as a set of functional brain networks. These networks are observed to be dynamic [2] and spatio-temporally overlapping [3]. In this project, the goal is to simultaneously separate signals corresponding to different phases and infer multiple functional brain networks. This will be done by building upon the emerging field of graph learning, specifically by utilising the Graph Laplacian mixture model [4], a generative model for mixed signals living on multiple networks.

      References:

      [1] Bullmore, E., Sporns, O., 2009. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10 (3), 186–198.

      [2] Chang C, Glover GH (2010) Time-frequency dynamics of resting-state brain connectivity measured with fmri. Neuroimage 50:81–98

      [3] Karahanoğlu, F. I. & Van De Ville, D. (2015) Transient brain activity disentangles fmri resting-state dynamics in terms of spatially and temporally overlapping networks. Nature communications 6

      [4] Maretic, Hermina Petric, and Pascal Frossard. “Graph Laplacian mixture model.” arXiv preprint arXiv:1810.10053 (2018).

      Requirements: Basics of (graph) signal processing, basics of machine learning, basics of linear algebra, basics of probabilities, Python/Matlab coding skills.

      Contact: hermina.petricmaretic@epfl.ch

    • Improving classification performance of convolutional neural networks by changing the convolutional kernel shape
      Semester/Master Project

      Convolutional neural networks have become one of the most efficient tools for tasks such as classification. In such networks, a convolutional kernel is translated over an image in order to detect local patterns that may be characteristic of a particular class. In general, these kernels take the shape of an N×M window of pixels, which is quite arbitrary. The idea of the project is to explore different kernel shapes, possibly learnt from the data, that would allow classification improvements; a minimal sketch of a fixed non-rectangular kernel is given below.
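
      As a hedged starting point, the sketch below restricts a standard PyTorch convolution to a non-rectangular (cross-shaped) kernel support via a fixed binary mask; learning the mask itself would be a next step:

      import torch
      import torch.nn as nn

      class MaskedConv2d(nn.Conv2d):
          """Convolution whose kernel support is restricted to a binary mask
          (here a cross shape inside a 5x5 window) instead of the full NxM
          rectangle."""
          def __init__(self, in_ch, out_ch):
              super().__init__(in_ch, out_ch, kernel_size=5, padding=2)
              mask = torch.zeros(5, 5)
              mask[2, :] = 1.0   # horizontal arm
              mask[:, 2] = 1.0   # vertical arm
              self.register_buffer("mask", mask)

          def forward(self, x):
              return nn.functional.conv2d(
                  x, self.weight * self.mask, self.bias,
                  self.stride, self.padding, self.dilation, self.groups)

      # layer = MaskedConv2d(3, 16); y = layer(torch.randn(1, 3, 32, 32))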

      Requirements: Basics of (graph) signal processing, basics of machine learning, Python/Matlab coding skills.

      Contact: bastien.pasdeloup@epfl.ch

    • Finding the closest regular topology to a graph representing an irregular space
      Semester/Master Project

      Graphs are often used to represent irregular topologies, such as relations in a social network, the roads of a city, brain connectivity, etc. Due to this irregularity, some operations, such as the translation of a signal on a graph, are quite hard to define. However, these tasks can easily be performed on some particular graphs, such as an N-dimensional grid for instance. The project consists, given a graph, in finding the “closest” regular space to approximate it, using particular properties of the adjacency matrices of the latter graphs (block circulancy, bandwidth, …). Operations could then be performed on this space as a proxy for the irregular topology.
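
      As one concrete instance of the idea, the sketch below computes the Frobenius-nearest circulant matrix to a given adjacency matrix by averaging its wrapped diagonals (circulant structure is what makes translation well defined, e.g., on a ring graph):

      import numpy as np

      def nearest_circulant(A):
          """Project a square adjacency matrix onto the set of circulant
          matrices by averaging each wrapped diagonal."""
          n = A.shape[0]
          c = np.array([np.mean([A[i, (i + k) % n] for i in range(n)])
                        for k in range(n)])
          # A circulant matrix satisfies C[i, j] = c[(j - i) mod n].
          return np.array([[c[(j - i) % n] for j in range(n)] for i in range(n)])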

      Requirements: Basics of (graph) signal processing, basics of machine learning, Python/Matlab coding skills.

      Contact: bastien.pasdeloup@epfl.ch

    • Defining heuristics for translations of a convolutional kernel on an irregular graph
      Semester/Master Project

      Graph convolutional neural networks are possible extensions of classical CNNs on images, allowing one to achieve better classification performance on irregular data. One of the approaches to extending CNNs to graphs consists in translating a convolutional kernel over a graph. However, translation is not easily defined on spaces that are not associated with a notion of distance. A possible way of mimicking translation on Euclidean spaces is to find an injective function on the graph that preserves neighborhoods within the kernel to translate (https://arxiv.org/abs/1710.10035). When the kernel is highly localized, such functions can be explored exhaustively to find those that minimize the kernel deformation upon translation. However, as the kernel grows in size, this approach becomes intractable due to the NP-completeness of finding interesting translations. In this project, we would like to explore possible heuristics for translating a convolutional kernel on a graph.

      Requirements: Basics of (graph) signal processing, basics of machine learning, Python/Matlab coding skills.

      Contact: bastien.pasdeloup@epfl.ch

    • Exploring hyper graph signal processing
      Semester/Master Project

      Graph signal processing (GSP) has emerged as a possible extension of classical Fourier analysis, allowing one to study signals evolving on complex topologies. In this framework, graphs model the domains on which data evolve, by creating edges between variables with some relationship. However, graphs only capture pairwise relationships and ignore more complex relations between three or more variables. Such relations can be modeled using a hypergraph, i.e., a graph in which an edge can link more than two nodes. The goal of this project is to explore whether tools from GSP theory can be extended to such structures; a sketch of one classical hypergraph construction is given below.
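
      As a hedged starting point, the sketch below builds the normalized hypergraph Laplacian of Zhou et al. (NIPS 2006) from an incidence matrix; its eigendecomposition is a natural candidate for a hypergraph Fourier basis:

      import numpy as np

      def hypergraph_laplacian(H, w=None):
          """Normalized hypergraph Laplacian from an incidence matrix H
          (H[i, e] = 1 if node i belongs to hyperedge e); w: optional
          hyperedge weights."""
          n, m = H.shape
          w = np.ones(m) if w is None else np.asarray(w, dtype=float)
          d_v = H @ w                  # node degrees
          d_e = H.sum(axis=0)          # hyperedge sizes
          Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
          De_inv = np.diag(1.0 / d_e)
          theta = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
          return np.eye(n) - theta

      # Example: a hyperedge linking nodes 0, 1, 2 plus a pairwise edge {0, 3}:
      # H = np.array([[1, 1], [1, 0], [1, 0], [0, 1]], dtype=float)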

      Requirements: Basics of (graph) signal processing, basics of machine learning, Python/Matlab coding skills.

      Contact: bastien.pasdeloup@epfl.ch

    • Semi-Supervised Learning and Inpainting on Multi-Layer Graph Representations
      Semester/Master Project

      For partially labeled data, semi-supervised learning methods have been studied extensively by expressing the relations between data entities within weighted graph representations [1]. The inpainting task, on the other hand, is usually defined on the domains accompanying a signal content, and has been addressed with graph signal representations and operations [2]. Most of these studies focus on one type of relationship between pairs of data points during the construction of the graph structure. However, the connections between entities may involve different types of relationships, which can be better represented by multiple graph structures. The objective of this project is to extend semi-supervised clustering and inpainting tasks to multi-layer graph settings, where each graph layer signifies a particular type of relation between vertices. Each graph layer then has the same vertex set, yet a different topology (i.e., weight matrix), reflecting the different focus of each layer. A naive baseline for label propagation in such a setting is sketched below.
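
      As a naive baseline to build on, the sketch below runs standard label propagation on a simple average of the layer adjacencies; the project would replace this merging step with a principled multi-layer combination such as those of [4, 5]:

      import numpy as np

      def multilayer_label_propagation(Ws, y, alpha=0.9, n_iter=50):
          """Semi-supervised label propagation on a multi-layer graph whose
          layers are naively averaged.
          Ws: list of NxN weight matrices, one per layer.
          y:  length-N label vector, with -1 marking unlabeled nodes."""
          W = np.mean(Ws, axis=0)
          d = W.sum(axis=1)
          d[d == 0] = 1.0
          S = W / np.sqrt(np.outer(d, d))            # symmetric normalization
          classes = np.unique(y[y >= 0])
          Y = np.zeros((len(y), len(classes)))
          for k, c in enumerate(classes):
              Y[y == c, k] = 1.0
          F = Y.copy()
          for _ in range(n_iter):
              F = alpha * S @ F + (1 - alpha) * Y    # propagate, keep seeds
          return classes[F.argmax(axis=1)]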

      References:

      [1] Belkin, Mikhail, and Partha Niyogi. “Semi-supervised learning on Riemannian manifolds.” Machine learning 56.1-3 (2004): 209-239.

      [2] Perraudin, Nathanaël, and Pierre Vandergheynst. “Stationary signal processing on graphs.” IEEE Transactions on Signal Processing 65.13 (2017): 3462-3477.

      [3] Davide Eynard, Klaus Glashoff, Michael M Bronstein, and Alexander M. Bronstein. Multimodal diffusion geometry by joint diagonalization of laplacians.arXiv preprint arXiv:1209.2295, 2012.

      [4] Xiaowen Dong, Pascal Frossard, Pierre Vandergheynst, and Nikolai Nefedov. Clustering on multi-layer graphs via subspace analysis on grassmann manifolds. IEEE Transactions on signal processing, 62(4):905–918, 2014.

      [5] Xiaowen Dong, Pascal Frossard, Pierre Vandergheynst, and Nikolai Nefedov. Clustering with multi-layer graphs: A spectral perspective. IEEE Transactions on Signal Processing, 60(11):5820–5831, 2012.

      Requirements: Python, basics of graph signal processing: graph signal filtering and spectral clustering (covered in EE-558 Network Tour of Data Science).

      Contact: eda.bayram@epfl.ch

    • Transfer learning for network data
      Semester/Master Project

      Technology is in constant evolution, and the amount of collected network data is increasing every day. Numerous examples can be found in geographical, transportation, biomedical and social networks, such as temperatures within a geographical area, traffic capacities at hubs in a transportation network, or human behaviors in a social network. Due to this growing volume of information and its diverse interactions, the data now resides on irregular and complex structures, and can be modelled by graphs [1, 2, 3]. Consider, for example, the traffic congestion problem: if we have a model that links the graphs of two different cities, we will be able to transfer the traffic information from one city to the other and then predict the traffic conditions in the latter. The idea of this project is precisely to build a method, and implement a machine learning algorithm, that is able to transfer information from one graph to another [4].

      References:

      [1] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega and P. Vandergheynst. The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains, in IEEE Signal Processing Magazine, vol. 30, num. 3, p. 83-98, 2013.

      [2] B. Yener, Cell-Graphs: Image-Driven Modeling of Structure-Function Relationship, ACM, Vol. 60 No. 1, Pages 74-84, 2017.

      [3] A. Ortega, P. Frossard, J. Kovacevic, J. M. F. Moura, P. Vandergheynst, Graph Signal Processing: Overview, Challenges and Applications, 2018.

      [4] M. Pilanci and E. Vural, “Domain adaptation on graphs by learning aligned graph bases,” 2018.

      Requirements: Good knowledge of Python and MATLAB, familiarity with machine learning and linear algebra.

      Contact: mireille.elgheche@epfl.ch

    • Building Extraction on Aerial LIDAR Point Clouds using Spectral Graph Features
      Semester/Master Project

      Airborne Laser Scanning is a well-known remote sensing technology which provides quite dense and highly accurate, yet unorganized, point cloud descriptions of the earth's surface. Weighted graphs are very convenient tools for representing such irregular 3D data; moreover, spectral graph-based methods provide a spectral analysis of signals residing on weighted graph representations. With this in mind, one can treat airborne LIDAR data as an unstructured elevation signal that can be processed on an appropriate graph structure.

      The goal of this project is to discover the spectral attributes of various objects in a LIDAR scene, such as buildings and vegetation, through graph signal processing. Instead of calculating geometric primitives such as normals, slopes and curvatures for each point of a scene and thresholding them, this approach aims to transpose classical signal processing tools to the analysis of 3D aerial LIDAR point clouds.

      In particular, the points on the breaklines of buildings constitute features which can be detected and discriminated from other objects by augmenting the spectral information on the graph. This could be achieved by formulating a spectral descriptor on the detected points and then posing a classification problem to discriminate the ones lying on buildings. For a building extraction problem, the latter step is to retrieve the whole body of the building objects. A minimal sketch of the graph-filtering starting point is given below.
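
      As a hedged sketch of the starting point, the snippet below builds a nearest-neighbour graph on a LIDAR tile with the PyGSP library of [1], low-pass filters the elevation signal, and uses the high-frequency residual as a crude breakline indicator (the file name, neighbourhood size and filter scale are illustrative assumptions):

      import numpy as np
      from pygsp import graphs, filters   # the PyGSP library of reference [1]

      # points: Nx3 LIDAR coordinates; use x, y for the graph and z as signal.
      points = np.load("lidar_tile.npy")          # placeholder file name
      G = graphs.NNGraph(points[:, :2], k=8)      # k-nearest-neighbour graph
      G.estimate_lmax()

      elevation = points[:, 2]
      # A heat (low-pass) filter smooths the elevation; the high-pass residual
      # highlights sharp discontinuities such as building breaklines.
      low_pass = filters.Heat(G, 10)
      smooth = low_pass.filter(elevation)
      breakline_score = np.abs(elevation - smooth)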

      References:

      [1] Michaël Defferrard, Lionel Martin, Rodrigo Pena, & Nathanaël Perraudin. (2017, October 6). PyGSP: Graph Signal Processing in Python (Version v0.5.0). Zenodo. http://doi.org/10.5281/zenodo.1003158.

      [2] ISPRS, “ISPRS test project on 3D semantic labeling contest,” 2017, http://www2.isprs.org/commissions/comm3/wg4/3d-semantic-abeling.html.

      [3] Blomley, R., and M. Weinmann. “Using Multi-Scale Features for the 3D Semantic Labeling of Airborne Laser Scanning Data.” ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences 4 (2017).

      Requirements: Python, basics of graph signal processing: graph signal filtering (covered in EE-558 Network Tour of Data Science), familiarity with basic machine learning tools.

      Contact: eda.bayram@epfl.ch

Learning

    • Learning a similarity metric for video reconstruction
      Semester Project

      Obtaining an unsupervised representation of sequential visual data can be crucial for any autonomous intelligent agent. One of the simplest schemes to learn an unsupervised representation is to encode the data under some constraints in such a way that the error between the reconstructed and input data is minimized.

      The use of latent random variables within neural networks has been argued to efficiently model the variability observed in data, as in Variational Autoencoders (VAEs) [1] and their counterparts for sequential data, e.g., the variational Recurrent Neural Networks (RNNs) of [2] (an RNN is a type of neural network that includes an inner loop so that it performs the same task for every element of a sequence). The aim of the first part of the project is to understand variational RNNs and use them for video reconstruction on the simple moving MNIST dataset. Later, instead of element-wise errors, the student is expected to use another neural network to learn a similarity metric as the basis for the reconstruction objective, as done in [3] for VAEs. A minimal sketch of the standard VAE objective, the common starting point for both steps, is given below.
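
      For reference, a hedged PyTorch sketch of the standard VAE objective [1]; the second part of the project replaces the element-wise reconstruction term with a learned similarity metric, as in [3]:

      import torch
      import torch.nn.functional as F

      def vae_loss(x, x_recon, mu, logvar):
          """Standard VAE objective: element-wise reconstruction error plus
          the KL divergence of the approximate posterior from the prior."""
          recon = F.mse_loss(x_recon, x, reduction="sum")
          kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
          return recon + kl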

      References:

      [1] Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.

      [2] Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A. C., & Bengio, Y. (2015). A recurrent latent variable model for sequential data. In Advances in neural information processing systems (pp. 2980-2988)

      [3] Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015). Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300.

      Requirements: Good knowledge of Python, sufficient familiarity with machine learning and probability. Having experience with one of deep learning libraries (preferably Tensorflow) is a plus.

      Contact: beril.besbinar@epfl.ch

    • Spectral vs. spatial approaches of deep learning for data on non-Euclidean domains
      Semester Project

      Deep learning methods have been very successful for signals defined on regular grids, such as images and audio. Recently, some works have extended deep learning models to data defined on irregular domains, such as graphs and manifolds. The approaches presented so far in this field can be classified into (i) spatial and (ii) spectral approaches. The aim of this semester project is to compare these methods and to find their limitations and advantages.

      The student will (i) read and understand in depth existing methods of both categories, (ii) run experiments to compare different approaches, and (optionally) propose ideas for dealing with the limitations of current methods. A toy illustration of the two families of approaches is sketched below.
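
      As a toy illustration of the dichotomy, the sketch below implements a one-hop spatial aggregation and a spectral filter via the graph Fourier transform [2] on a dense adjacency matrix (the full eigendecomposition is exactly what spatial methods avoid, which is one of the trade-offs to study):

      import numpy as np

      def spatial_filter(W, s):
          """Spatial approach: aggregate each node's one-hop neighbourhood."""
          d = W.sum(axis=1)
          d[d == 0] = 1.0
          return (W / d[:, None]) @ s          # simple neighbourhood average

      def spectral_filter(W, s, response):
          """Spectral approach: go to the graph Fourier basis, scale each
          frequency by response(eigenvalue), and transform back [2]."""
          L = np.diag(W.sum(axis=1)) - W       # combinatorial Laplacian
          lam, U = np.linalg.eigh(L)
          return U @ (response(lam) * (U.T @ s))

      # e.g. a low-pass response: spectral_filter(W, s, lambda lam: 1/(1+5*lam))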

      References:

      [1] M.M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, P. Vandergheynst, “Geometric deep learning: going beyond Euclidean data”, arXiv:1611.08097

      [2] D. I Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83-98, May 2013.

      Requirements: Python, background in machine learning and signal processing.

      Contact: effrosyni.simou@epfl.ch

Hands-on

    • User-friendly drawing of large graphs
      Semester Project

      Very large graphs can be difficult to visualise and, more importantly, difficult to make sense of. As there is an ever-increasing amount of data surrounding us, recovering useful information becomes more and more challenging, and interpreting large data becomes the main task of data scientists. For a large graph, a coarser representation can make its behaviour and trends much clearer. Furthermore, a coarse representation can be coarsened further until we reach a version that is simple enough to interpret easily. However, once we recover the basic behaviour, we might want to go back to the finer version for more detailed information.

      The goal of this project is to create a tool to plot coarse graph representations. Representations with different levels of detail (original graph, coarser version, further coarsened version, …) will be provided, and the tool should be able to zoom into the graph to recover a finer version and zoom out to draw a coarser version. The tool should be user-friendly and ensure consistent embedding of graphs with different amounts of detail. Integrating the tool in a website would be a plus. A minimal sketch of drawing a graph next to a coarsened version of it is given below.
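
      As a hedged starting point for the tool, the sketch below draws a graph next to a coarsened version obtained by merging communities into single nodes (the community detection step is an illustrative choice; in the project, the coarse representations will be provided):

      import networkx as nx
      import matplotlib.pyplot as plt
      from networkx.algorithms import community

      G = nx.karate_club_graph()   # stand-in for the large graph to visualise

      # Group nodes into communities and merge each community into one node.
      parts = community.greedy_modularity_communities(G)
      partition = [frozenset(p) for p in parts]
      coarse = nx.quotient_graph(G, partition, relabel=True)

      fig, (ax1, ax2) = plt.subplots(1, 2)
      nx.draw(G, ax=ax1, node_size=30)       # fine version
      nx.draw(coarse, ax=ax2, node_size=200) # coarse version
      plt.show()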

      Requirements: Python, basics of linear algebra and graph theory, knowledge of web programming is a plus

      Contact: hermina.petricmaretic@epfl.ch