## Research

My main research areas are Convex Optimization and Conditional Gradients.

#### Approximate Vanishing Ideal Computations at Scale

The approximate vanishing ideal of a set of points X={x1,…,xm}⊆[0,1]n is the set of polynomials that approximately evaluate to 0 over all points x∈X and admits an efficient representation by a finite set of polynomials called generators. Algorithms that construct this set of generators are extensively studied but ultimately find little practical application because their computational complexities are thought to be superlinear in the number of samples m. In this paper, we focus on scaling up the Oracle Approximate Vanishing Ideal algorithm (OAVI), one of the most powerful of these methods. We prove that the computational complexity of OAVI is not superlinear but linear in the number of samples m and polynomial in the number of features n, making OAVI an attractive preprocessing technique for large-scale machine learning. To further accelerate OAVI's training time, we propose two changes: First, as the name suggests, OAVI makes repeated oracle calls to convex solvers throughout its execution. By replacing the Pairwise Conditional Gradients algorithm, one of the standard solvers used in OAVI, with the faster Blended Pairwise Conditional Gradients algorithm, we illustrate how OAVI directly benefits from advancements in the study of convex solvers. Second, we propose Inverse Hessian Boosting (IHB): IHB exploits the fact that OAVI repeatedly solves quadratic convex optimization problems that differ only by very little and whose solutions can be written in closed form using inverse Hessian information. By efficiently updating the inverse of the Hessian matrix, the convex optimization problems can be solved almost instantly, accelerating OAVI's training time by up to multiple orders of magnitude. We complement our theoretical analysis with extensive numerical experiments on data sets whose sample numbers are in the millions.

#### Acceleration of Frank-Wolfe algorithms with open loop step-sizes

Frank-Wolfe algorithms (FW) are popular first-order methods to solve convex constrained optimization problems that rely on a linear minimization oracle instead of potentially expensive projection-like oracles. Many works have identified accelerated convergence rates under various structural assumptions on the optimization problem and for specific FW variants when using line search or short-step, requiring feedback from the objective function. Little is known about accelerated convergence regimes when utilizing open loop step-size rules, a.k.a. FW with pre-determined step-sizes, which are algorithmically extremely simple and stable. Not only is FW with open loop step-size rules not always subject to the same convergence rate lower bounds as FW with line search or short-step, but in some specific cases, such as kernel herding in infinite-dimensions, it is observed that FW with open loop step-size rules leads to faster convergence as opposed to FW with line search or short-step. We propose a partial answer to this open problem in kernel herding, characterize a general setting for which FW with open loop step-size rules converges non-asymptotically faster than with line search or short-step, and derive several accelerated convergence results for FW (and two of its variants) with open loop step-size rules. Finally, our numerical experiments highlight potential gaps in our current understanding of the FW method in general.

#### Conditional Gradients for the Approximately Vanishing Ideal

The vanishing ideal of a set of points X⊆ℝn is the set of polynomials that evaluate to 0 over all points x∈X and admits an efficient representation by a finite set of polynomials called generators. To accommodate the noise in the data set, we introduce the Conditional Gradients Approximately Vanishing Ideal algorithm (CGAVI) for the construction of the set of generators of the approximately vanishing ideal. The constructed set of generators captures polynomial structures in data and gives rise to a feature map that can, for example, be used in combination with a linear classifier for supervised learning. In CGAVI, we construct the set of generators by solving specific instances of (constrained) convex optimization problems with the Pairwise Frank-Wolfe algorithm (PFW). Among other things, the constructed generators inherit the LASSO generalization bound and not only vanish on the training but also on out-sample data. Moreover, CGAVI admits a compact representation of the approximately vanishing ideal by constructing few generators with sparse coefficient vectors.

#### SoS Certification for Symmetric Quadratic Functions and Its Connection to Constrained Boolean Hypercube Optimization

We study the rank of the Sum of Squares (SoS) hierarchy over the Boolean hypercube for Symmetric Quadratic Functions (SQFs) in n variables with roots placed in points k-1 and k. Functions of this type have played a central role in deepening the understanding of the performance of the SoS method for various unconstrained Boolean hypercube optimization problems, including the Max Cut problem. Recently, Lee, Prakash, de Wolf, and Yuen proved a lower bound on the SoS rank for SQFs of Ω(√{k(n-k)}) and conjectured the lower bound of Ω(n) by similarity to a polynomial representation of the n-bit OR function. Leveraging recent developments on Chebyshev polynomials, we refute the Lee-Prakash-de Wolf-Yuen conjecture and prove that the SoS rank for SQFs is at most O(√{nk}log(n)). We connect this result to two constrained Boolean hypercube optimization problems. First, we provide a degree O(√n) SoS certificate that matches the known SoS rank lower bound for an instance of Min Knapsack, a problem that was intensively studied in the literature. Second, we study an instance of the Set Cover problem for which Bienstock and Zuckerberg conjectured an SoS rank lower bound of n/4. We refute the Bienstock-Zuckerberg conjecture and provide a degree O(√nlog(n)) SoS certificate for this problem.

#### Efficient Online-Bandit Strategies for Minimax Learning Problems

Several learning problems involve solving min-max problems, e.g., empirical distributional robust learning or learning with non-standard aggregated losses. More specifically, these problems are convex-linear problems where the minimization is carried out over the model parameters w∈W and the maximization over the empirical distribution p∈K of the training set indexes, where K is the simplex or a subset of it. To design efficient methods, we let an online learning algorithm play against a (combinatorial) bandit algorithm. We argue that the efficiency of such approaches critically depends on the structure of K and propose two properties of K that facilitate designing efficient algorithms. We focus on a specific family of sets Sn,k encompassing various learning applications and provide high-probability convergence guarantees to the minimax values.