To efficiently capture formaldehyde (HCHO) present in air at low concentrations that nonetheless exceed the standard, 31,399 hydrophobic metal-organic frameworks (MOFs) were first selected from 137,953 hypothetical MOFs, and their formaldehyde adsorption performance, namely adsorption capacity (N_HCHO) and selectivity (S_HCHO/(N2+O2)), was calculated by molecular simulation and machine learning (ML). To combine S_HCHO/(N2+O2) and N_HCHO, a new performance metric, the tradeoff between selectivity and capacity (TSC), was proposed to identify the top-performing MOFs more reasonably. The MOFs were divided into three datasets, i.e., all of the MOFs (AM), MOFs in the top 5% of S_HCHO/(N2+O2) (PS), and MOFs in the top 5% of N_HCHO (PN), to examine the characteristics of different materials capturing formaldehyde from air (N2 and O2). Four ML algorithms, the back propagation neural network (BPNN), support vector machine (SVM), extreme learning machine (ELM), and random forest (RF), were then applied to quantitatively assess how well the performance indexes could be predicted in the different datasets. The RF algorithm, which gave the most accurate predictions, revealed that the TSC correlates strongly with the MOF descriptors in the PS dataset. Because 14.10% of the promising MOFs fall in the PN dataset, the design paths toward excellent adsorbents were quantitatively determined for six MOF descriptors, especially the Henry's coefficient (K_HCHO) and the heat of adsorption of formaldehyde (Q_st^0). Their probabilities of yielding excellent MOFs reach 100% and 77.42%, respectively, and both the relative importance and the trends of the univariate analysis consistently confirm the importance of K_HCHO and Q_st^0. Finally, the 20 best MOFs were identified for the single-step separation of formaldehyde at low concentration.
The microscopic insights and structure-performance relationship predictions from this computational and ML study are useful toward the development of new MOFs for the capture of formaldehyde from air.
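The three-dataset split described above (AM, PS, PN) can be illustrated with a minimal sketch. It assumes that "top 5%" means the highest-ranked 5% of MOFs by each metric; the function name `split_datasets`, the parameter `top_fraction`, and the synthetic values are hypothetical and are not taken from the study.

```python
import random


def split_datasets(selectivity, capacity, top_fraction=0.05):
    """Return index lists for the AM, PS and PN datasets.

    Assumption: PS and PN are the top `top_fraction` of MOFs ranked by
    selectivity and by capacity, respectively; AM is every MOF.
    """
    if len(selectivity) != len(capacity):
        raise ValueError("selectivity and capacity must have equal length")
    n = len(selectivity)
    k = max(1, round(top_fraction * n))  # number of MOFs in each top subset

    def top_k(values):
        # Rank MOF indices from highest to lowest value, keep the first k.
        ranked = sorted(range(n), key=lambda i: values[i], reverse=True)
        return sorted(ranked[:k])

    return {
        "AM": list(range(n)),
        "PS": top_k(selectivity),
        "PN": top_k(capacity),
    }


# Synthetic placeholder data, not simulation results.
random.seed(0)
sel = [random.lognormvariate(0.0, 1.0) for _ in range(1000)]
cap = [random.lognormvariate(0.0, 1.0) for _ in range(1000)]

subsets = split_datasets(sel, cap)
# MOFs that rank in the top 5% for both selectivity and capacity.
both = sorted(set(subsets["PS"]) & set(subsets["PN"]))
```

Returning index lists rather than copies of the data lets the same descriptor table be sliced for each dataset before any ML model is fitted.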