Friday, December 2, 2022, Jussieu, Paris, room 405 (corridor 24-25)

This day is jointly organized by the OSI (Optimisation des Systèmes Intégrés) working group of the GDR RO and the thematic axis on methods and tools for the design, simulation, evaluation, and verification of systems of the GDR SOC².

Program:

  • 9:30-10:00: welcome coffee
  • 10:00-11:00: Karol Desnos, “Influence of Dataflow Graph Moldable Parameters on Optimization Criteria”
  • 11:00-12:00: Laurent Lemarchand, “Design Space Exploration for TSP systems on multicore platforms under schedulability, security and safety constraints”
  • 12:00-14:00: lunch break
  • 14:00-14:30: Remi Garcia, “Integer Multiplications on FPGA at Minimal Hardware Cost using Mathematical Modeling”
  • 14:30-15:00: Angeliki Kritikakou, “Energy-Quality-Time Optimized Mapping for Imprecise Computation Tasks on Multicores”
  • 15:00-15:30: break and discussions
  • 15:30-16:00: Hamza Ouarnighi, “Hardware-aware Deep Learning for Edge Devices”
  • 16:00-17:00: summary and wrap-up of the day

For logistical reasons, registration is free but mandatory. Registration is to be done via the “Register” tab at:

The day will be streamed online. To obtain the connection link, send a message to kevin.martin@univ-ubs.fr

The meeting report is available here:

Scientific presentations:

  • Karol Desnos, “Influence of Dataflow Graph Moldable Parameters on Optimization Criteria”

The integration of static parameters into Synchronous Dataflow (SDF) models enables the customization of an application's functional and non-functional behaviour. However, these parameter values are generally set by the developer for a manual Design Space Exploration (DSE). Instead of a single value, moldable parameters accept a set of alternative values, representing all possible configurations of the application. The DSE is responsible for selecting the best parameter values to optimize a set of criteria such as latency, energy, or memory footprint. However, the DSE process explodes in complexity with the number of parameters and their possible values.

In this paper, we study an automated DSE algorithm exploring multiple configurations of a dataflow application. Our experiments show that: 1) only limited sets of configurations lead to Pareto-optimal solutions in a multi-criteria optimization scenario; 2) the impact of individual parameters on the optimization criteria can be determined accurately from a limited subset of design points. The approach was evaluated on three image processing applications with hundreds to thousands of configurations.
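
To make the multi-criteria exploration concrete, here is a minimal, self-contained Python sketch of a brute-force DSE over moldable parameters with Pareto filtering. The parameter names and the evaluate() cost model are hypothetical placeholders for illustration only, not the application models or the tool used in the talk.

    from itertools import product

    # Moldable parameters: each one accepts a set of alternative values
    # (hypothetical names, for illustration only).
    moldable_params = {
        "tile_size": [8, 16, 32],
        "nb_slices": [1, 2, 4, 8],
        "precision": [8, 16],
    }

    def evaluate(config):
        """Placeholder cost model returning (latency, energy, memory) for one configuration."""
        latency = config["tile_size"] * 100 // config["nb_slices"]
        energy = config["nb_slices"] * config["precision"] * 3
        memory = config["tile_size"] * config["precision"] * 16
        return (latency, energy, memory)

    def dominates(a, b):
        """True if cost vector a Pareto-dominates cost vector b (all criteria minimized)."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    # Exhaustive exploration: feasible for hundreds to thousands of configurations.
    points = []
    for values in product(*moldable_params.values()):
        config = dict(zip(moldable_params.keys(), values))
        points.append((config, evaluate(config)))

    # Keep only the configurations whose cost vector is not dominated by any other.
    pareto = [(c, v) for c, v in points if not any(dominates(w, v) for _, w in points)]
    print(f"{len(pareto)} Pareto-optimal configurations out of {len(points)}")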

  • Remi Garcia, “Integer Multiplications on FPGA at Minimal Hardware Cost using Mathematical Modeling”

Multiplication is a basic operator used in many applications. When implemented in embedded systems, e.g. on FPGAs, these applications require highly optimized hardware, and improving the multiplication implementation is an important part of reducing the final hardware cost. In this talk, we will present recent advances in the hardware design of multipliers. First, we will restrict ourselves to the multiplication of a variable by multiple a priori known constants; this problem is called the Multiple Constant Multiplication (MCM) problem. We show how Integer Linear Programming (ILP) makes it possible to significantly reduce the hardware cost when MCM is implemented using additions and bit-shifts. Second, we will tackle generic multiplier design on FPGAs. In this case, the common practice is to tile a large multiplier with DSP blocks and smaller LUT-based multipliers. Recent results rely on a fixed set of small tile shapes. We extend this work and propose a general ILP-based model that performs the tiling without relying on any fixed shape, by computing tile costs on the fly. We provide a complete coefficients-to-VHDL tool for the MCM problem and, using the FloPoCo code generator, we produce VHDL code for our multiplier tiling solutions.
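
As an illustration of the shift-and-add building blocks that the ILP formulation optimizes, the following Python sketch realizes constant multiplications with additions, subtractions and bit-shifts using canonical signed-digit (CSD) recoding. CSD is only a classical baseline shown here for intuition; it is not the ILP model presented in the talk.

    def csd(c):
        """Canonical signed-digit recoding of a positive integer c:
        returns (sign, shift) terms such that c == sum(sign * 2**shift)."""
        terms, shift = [], 0
        while c:
            if c & 1:
                digit = 2 - (c % 4)  # +1 if c % 4 == 1, -1 if c % 4 == 3
                terms.append((digit, shift))
                c -= digit
            c >>= 1
            shift += 1
        return terms

    def mult_by_constant(x, c):
        """Multiply x by the constant c using only bit-shifts and additions/subtractions."""
        return sum(sign * (x << shift) for sign, shift in csd(c))

    for c in (7, 23, 105):
        assert mult_by_constant(5, c) == 5 * c
        print(f"{c}: {csd(c)} -> {len(csd(c)) - 1} adders/subtractors")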

  • Laurent Lemarchand, “Design Space Exploration for TSP systems on multicore platforms under schedulability, security and safety constraints”

Avionic systems are integrating more and more functions to cope with the increasing number of features on modern aircraft. These systems are subject to many requirements that have to be considered during their design. Time and Space Partitioning (TSP), which consists of isolating applications within partitions, is a well-known means to assign avionic applications to computing units according to security, schedulability, and safety constraints. Multicore execution platforms are becoming popular in avionic systems. In this paper, we propose to investigate the partitioning of avionic applications over such execution platforms while considering schedulability, security, and safety constraints. We propose a design space exploration approach adapting a multi-objective meta-heuristic, PAES. Our algorithm provides trade-offs between schedulability and security while considering safety and multicore platforms with different numbers of cores. It runs in two phases: the first phase, at application level, efficiently explores a reduced design space, and the second, at task level, refines the results obtained in the first phase. We illustrate how this meta-heuristic can investigate key parameters such as data size, number of partitions, number of cores, and inter-partition communication overhead.
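
For readers unfamiliar with PAES, the sketch below shows the core of a (1+1) multi-objective search with a Pareto archive on a toy application-to-core assignment problem. The two cost functions are crude stand-ins for the schedulability and security metrics of the talk, and the grid-based archive maintenance of the full PAES algorithm is omitted.

    import random

    N_APPS, N_CORES = 10, 4
    random.seed(0)

    def objectives(assignment):
        """Toy bi-objective cost (to minimize): a schedulability proxy and a security proxy."""
        load = [assignment.count(c) for c in range(N_CORES)]
        sched = max(load)  # proxy: load of the most loaded core
        sec = sum(1 for a, b in zip(assignment, assignment[1:]) if a == b)  # proxy: co-location
        return (sched, sec)

    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def mutate(assignment):
        child = list(assignment)
        child[random.randrange(N_APPS)] = random.randrange(N_CORES)
        return child

    # (1+1) search: one current solution plus an archive of non-dominated solutions.
    current = [random.randrange(N_CORES) for _ in range(N_APPS)]
    archive = [(current, objectives(current))]

    for _ in range(2000):
        child = mutate(current)
        f = objectives(child)
        if not any(dominates(g, f) for _, g in archive):
            # Child is non-dominated: add it and prune dominated archive members.
            archive = [(s, g) for s, g in archive if not dominates(f, g)] + [(child, f)]
            current = child

    print("Approximate Pareto front:", sorted({g for _, g in archive}))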

  • Angeliki Kritikakou, “Energy-Quality-Time Optimized Mapping for Imprecise Computation Tasks on Multicores”

In several real-time application domains, less accurate results, computed before the deadline, are preferable to accurate, but late, results. This is due to the fact that a real-time application has to provide a result before its deadline. When not enough time is available, approximate results are acceptable, as long as the minimum acceptable Quality-of-Service (QoS) is satisfied and the results are provided in time. For instance, in audio and video streaming, frames with a lower quality are better than missing frames. In such domains, a task can be logically decomposed into a mandatory subtask and an optional subtask. This decomposition is typically modelled by the Imprecise Computation (IC) task model. The mandatory subtask must be completed before the deadline in order to generate the minimum acceptable quality, i.e., the baseline QoS. The optional subtask refines the obtained result in order to increase the baseline QoS. At the same time, the system energy consumption has become an important concern in multicore architectures. In this context, the longer an optional subtask is executed, the higher the QoS that is achieved. However, more energy is consumed and more time is required for the optional subtask execution. In order to maximize the quality of the application results, and at the same time meet the constraints on energy consumption and real-time execution, proper task deployment approaches are required. Typical state-of-the-art solving techniques either demand high complexity or can only achieve feasible (suboptimal) solutions. We will present an effective decomposition-based approach to achieve an optimal solution while reducing computational complexity. The main idea of the proposed method is to decompose the original problem into two smaller, easier-to-solve problems: a master problem for IC-task allocation and a slave problem for IC-task scheduling. We also present heuristics to obtain near-optimal results with a negligible solution time and less sensitivity to the problem parameters.
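
The master/slave decomposition can be pictured with a toy instance: a master step enumerates allocations of IC tasks to cores, and a slave step spends the remaining time budget of each core on the most QoS-efficient optional parts. The task data, QoS rates and budget values below are invented for illustration, and the greedy slave is a simplification of the exact method discussed in the talk.

    from dataclasses import dataclass
    from itertools import product

    @dataclass
    class ICTask:
        name: str
        mandatory: int   # mandatory execution time (time units)
        optional: int    # maximum optional execution time
        qos_rate: float  # QoS gained per optional time unit

    def slave_schedule(tasks, budget):
        """Slave step: on one core, spend the slack left by the mandatory parts on the
        most QoS-efficient optional parts first (greedy simplification)."""
        slack = budget - sum(t.mandatory for t in tasks)
        if slack < 0:
            return None  # the mandatory parts alone miss the deadline: infeasible
        qos, plan = 0.0, {}
        for t in sorted(tasks, key=lambda t: -t.qos_rate):
            run = min(t.optional, slack)
            plan[t.name], slack = run, slack - run
            qos += run * t.qos_rate
        return qos, plan

    # Master step: enumerate allocations of the tasks to two cores (tiny instance).
    tasks = [ICTask("T1", 4, 6, 1.0), ICTask("T2", 3, 5, 2.0), ICTask("T3", 5, 4, 0.5)]
    best = None
    for alloc in product(range(2), repeat=len(tasks)):
        per_core = [[t for t, c in zip(tasks, alloc) if c == k] for k in range(2)]
        results = [slave_schedule(core_tasks, budget=10) for core_tasks in per_core]
        if any(r is None for r in results):
            continue  # allocation violates the real-time constraint on some core
        total_qos = sum(q for q, _ in results)
        if best is None or total_qos > best[0]:
            best = (total_qos, alloc)
    print("Best total QoS and allocation:", best)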

  • Hamza Ouarnighi, “Hardware-aware Deep Learning for Edge Devices”

Neural Architecture Search (NAS) methods have been growing in popularity. These techniques have been fundamental to automate and speed up the time-consuming and error-prone process of synthesizing novel Deep Learning (DL) architectures. NAS has been extensively studied in the past few years. Arguably, its most significant impact has been in image classification and object detection tasks, where state-of-the-art results have been obtained. Despite the significant success achieved to date, applying NAS to real-world problems still poses significant challenges and is not widely practical. In general, the synthesized Convolutional Neural Network (CNN) architectures are too complex to be deployed on resource-limited platforms, such as IoT, mobile, and embedded systems. One solution growing in popularity is to use multi-objective optimization algorithms in the NAS search strategy, taking into account execution latency, energy consumption, memory footprint, etc. This kind of NAS, called hardware-aware NAS (HW-NAS), makes the search for the most efficient architecture more complicated and opens several questions. In this survey, we provide a detailed review of existing HW-NAS research and categorize it according to four key dimensions: the search space, the search strategy, the acceleration technique, and the hardware cost estimation strategies. We further discuss the challenges and limitations of existing approaches and potential future directions. This is the first survey paper focusing on hardware-aware NAS. We hope it serves as a valuable reference for the various techniques and algorithms discussed and paves the way for future research towards hardware-aware NAS.
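
As a rough illustration of the hardware-aware ingredient in HW-NAS, the sketch below scores randomly sampled architectures with a placeholder accuracy predictor and discards those exceeding a latency budget. The search space, both estimators and the budget are invented for illustration and do not correspond to any specific method from the survey.

    import random
    random.seed(1)

    def sample_architecture():
        """Toy search space: network depth, width multiplier and kernel size."""
        return {"depth": random.choice([8, 12, 16]),
                "width": random.choice([0.5, 0.75, 1.0]),
                "kernel": random.choice([3, 5])}

    def predicted_accuracy(arch):
        """Placeholder accuracy predictor (would be a learned surrogate in practice)."""
        return 0.6 + 0.02 * arch["depth"] * arch["width"] + random.uniform(0.0, 0.02)

    def estimated_latency_ms(arch):
        """Placeholder hardware cost model (would be a lookup table or on-device measurement)."""
        return arch["depth"] * arch["width"] * arch["kernel"] * 1.7

    LATENCY_BUDGET_MS = 60.0
    candidates = [sample_architecture() for _ in range(200)]
    feasible = [(predicted_accuracy(a), a) for a in candidates
                if estimated_latency_ms(a) <= LATENCY_BUDGET_MS]
    best_acc, best_arch = max(feasible, key=lambda t: t[0])
    print(f"best feasible architecture: {best_arch} (predicted accuracy {best_acc:.3f})")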

Organizers:

  • Lilia Zaourar, CEA, for the GdR RO
  • André Rossi, Université Paris-Dauphine, for the GdR RO
  • Alix Munier, LIP6, for the GdR RO
  • Mickaël Dardaillon, INSA Rennes/IETR, for the GdR SOC2
  • Kevin Martin, Univ. Bretagne-Sud/Lab-STICC, for the GdR SOC2

Are you using an original method to solve an NP problem related to integrated systems? Come and tell us about it!
Feel free to contact Lilia Zaourar <lilia.zaourar@cea.fr> or Kevin Martin <kevin.martin@univ-ubs.fr>.