BMC Public Health
An application in identifying high-risk populations in alternative tobacco product use utilizing logistic regression and CART: a heuristic comparison
Research Article
Qing Yu [1], Yang Lei [1], Matthew S Mayo [1], Jasjit S Ahluwahlia [2], Nikki Nollen [3]
[1] Department of Biostatistics, The University of Kansas Medical Center, Kansas City, KS, USA
[2] Department of Medicine and Center for Health Equity, The University of Minnesota Medical School, Minneapolis, MN, USA
[3] Department of Preventive Medicine and Public Health, The University of Kansas Medical Center, Kansas City, KS, USA
Keywords: Survey sampling; Stratified samples; Logistic regression; CART; Partial interaction
DOI: 10.1186/s12889-015-1582-z
Received: 2014-08-11; accepted: 2015-02-24; published: 2015
Source: Springer
【 Abstract 】
Background: Other forms of tobacco use are increasing in prevalence, yet most tobacco control efforts are aimed at cigarettes. It is therefore important to identify individuals who use both cigarettes and alternative tobacco products (ATPs). Most previous studies have used regression models. We fitted a traditional logistic regression model and a classification and regression tree (CART) model to illustrate and discuss the added advantages of CART for identifying high-risk subgroups of ATP users among cigarette smokers.

Methods: The data were collected from an online cross-sectional survey administered by Survey Sampling International between July 5, 2012 and August 15, 2012. Eligible participants self-identified as current smokers and as African American, White, or Latino (of any race), were English-speaking, and were at least 25 years old. The study sample of 2,376 participants was divided into independent training and validation samples for hold-out validation. Logistic regression and CART models were used to examine the important predictors of dual cigarette + ATP use.

Results: The logistic regression model identified nine important factors: gender, age, race, nicotine dependence, buying versus borrowing cigarettes, whether the price of cigarettes influences the brand purchased, whether participants set limits on cigarettes per day, alcohol use score, and discrimination frequency. The c-index of the logistic regression model was 0.74, indicating good discriminatory capability. The model also performed well in the validation sample, with good discrimination (c-index = 0.73) and excellent calibration (R-square = 0.96 in the calibration regression). The parsimonious CART model identified gender, age, alcohol use score, race, and discrimination frequency as the most important factors, and it also revealed interesting partial interactions. Its c-index was 0.70 for the training sample and 0.69 for the validation sample, and its misclassification rate was 0.342 and 0.346, respectively. The CART model was easier to interpret and identified target populations of clinical significance.

Conclusion: This study suggests that the non-parametric CART model is parsimonious, potentially easier to interpret, and provides additional information for identifying subgroups of cigarette smokers at high risk of ATP use.
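Both models are compared on a c-index and, for CART, a misclassification rate. For a binary outcome such as cigarette + ATP use, the c-index is the proportion of all (user, non-user) pairs in which the user receives the higher predicted probability, which equals the area under the ROC curve. The Python sketch below computes both quantities from predicted probabilities. The function names, the 0.5 classification cutoff, and the toy data are assumptions for illustration, not details taken from the study.

```python
"""Concordance (c-index) and misclassification rate for a binary outcome.

Hypothetical illustration: the function names, the 0.5 cutoff, and the toy
data are assumptions, not the study's code or data.
"""
from itertools import product


def c_index(y_true, p_hat):
    """Fraction of (event, non-event) pairs in which the event has the higher
    predicted probability; tied predictions count as half-concordant."""
    events = [p for y, p in zip(y_true, p_hat) if y == 1]
    non_events = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe, pn in product(events, non_events):
        if pe > pn:
            score += 1.0
        elif pe == pn:
            score += 0.5
    return score / (len(events) * len(non_events))


def misclassification_rate(y_true, p_hat, threshold=0.5):
    """Share of subjects whose predicted class (p >= threshold) differs from y."""
    wrong = sum(int(p >= threshold) != y for y, p in zip(y_true, p_hat))
    return wrong / len(y_true)


if __name__ == "__main__":
    y = [1, 0, 1, 0, 1, 0]                     # toy outcomes (1 = dual user)
    p = [0.80, 0.30, 0.40, 0.60, 0.70, 0.20]   # toy predicted probabilities
    print(c_index(y, p))                 # 8 of 9 pairs are concordant
    print(misclassification_rate(y, p))  # 2 of 6 subjects misclassified
```

Because a CART model assigns the same predicted probability to everyone in a terminal node, tied pairs are common, so the half credit given to ties matters more for CART than for logistic regression.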
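The abstract reports calibration as an R-square from a calibration regression but does not describe how that regression was built. One common approach, assumed here, sorts validation subjects by predicted risk, splits them into groups, and regresses each group's observed event rate on its mean predicted probability; an R-square near 1, with a slope near 1 and an intercept near 0, indicates good calibration. The function name `calibration_regression` and the default of ten groups are hypothetical choices.

```python
"""Grouped calibration regression (an assumed procedure, not the study's)."""
from statistics import fmean


def calibration_regression(y_true, p_hat, n_groups=10):
    """Return (intercept, slope, r_squared) from an ordinary least-squares fit of
    the observed event rate on the mean predicted probability across risk groups."""
    pairs = sorted(zip(p_hat, y_true))  # order subjects by predicted risk
    n = len(pairs)
    xs, ys = [], []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        xs.append(fmean(p for p, _ in chunk))  # mean predicted probability
        ys.append(fmean(y for _, y in chunk))  # observed proportion of events
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return intercept, slope, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy, perfectly calibrated data: 10 subjects at each risk level k/10,
    # of whom exactly k have the event.
    p = [k / 10 for k in range(1, 6) for _ in range(10)]
    y = [1 if i < k else 0 for k in range(1, 6) for i in range(10)]
    print(calibration_regression(y, p, n_groups=5))
```

Grouping smooths the 0/1 outcomes into rates that can be compared with probabilities; with very few groups the R-square becomes unstable, so the group count is a tuning choice rather than a fixed rule.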
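CART builds its tree by repeatedly choosing the binary split that most reduces node impurity. The Python sketch below shows one such step for a single numeric predictor (for example, age) using Gini impurity, a common default for classification trees. The abstract does not state which impurity measure or software the authors used, so this illustrates the technique rather than their implementation; the function names and toy data are hypothetical.

```python
"""One CART splitting step: choose the threshold on a numeric predictor that
most reduces Gini impurity for a binary (0/1) outcome. Hypothetical sketch."""


def gini(labels):
    """Gini impurity 1 - p^2 - (1 - p)^2 = 2p(1 - p) for 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(x, y):
    """Return (threshold, impurity_decrease) for the best rule x <= threshold.

    Returns (None, 0.0) if no split reduces impurity.
    """
    n = len(y)
    parent = gini(y)
    best_t, best_gain = None, 0.0
    # Every observed value except the largest is a candidate cut point;
    # cutting at the largest would leave the right node empty.
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        # Size-weighted impurity of the two child nodes.
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best_gain:
            best_t, best_gain = t, parent - child
    return best_t, best_gain


if __name__ == "__main__":
    age = [25, 30, 35, 40, 45, 50]   # toy data, not from the study
    atp_use = [1, 1, 1, 0, 0, 0]
    print(best_split(age, atp_use))  # (35, 0.5)
```

A full CART fit repeats this search over all predictors within each child node and then prunes the tree, typically by cost-complexity pruning. Because each branch is searched separately, a variable can be split on in one branch but not in the other, which is how a tree exposes partial interactions.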
【 License 】
CC BY
© Lei et al.; licensee BioMed Central. 2015