Predicting CHD within 10 years from basic medical examination data.


The Framingham Heart Study

Eli Kitagawa eli@Gracekyoto.com

Coronary Heart Disease (CHD). Original analysis from the Framingham Heart Study: https://biolincc.nhlbi.nih.gov/media/teachingstudies/framdoc.pdf

Load packages and data

There are missing values in education, cigsPerDay, BPMeds, totChol, BMI, heartRate and glucose. This is probably why some of these variables are stored as float64 rather than int64, since pandas cannot hold NaN in an int64 column. This makes the analysis less efficient.


RangeIndex: 4240 entries, 0 to 4239
Data columns (total 16 columns):
male               4240 non-null int64
age                4240 non-null int64
education          4135 non-null float64
currentSmoker      4240 non-null int64
cigsPerDay         4211 non-null float64
BPMeds             4187 non-null float64
prevalentStroke    4240 non-null int64
prevalentHyp       4240 non-null int64
diabetes           4240 non-null int64
totChol            4190 non-null float64
sysBP              4240 non-null float64
diaBP              4240 non-null float64
BMI                4221 non-null float64
heartRate          4239 non-null float64
glucose            3852 non-null float64
TenYearCHD         4240 non-null int64
dtypes: float64(9), int64(7)
memory usage: 530.1 KB
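A minimal sketch of the dtype issue described above, using a toy Series rather than the Framingham file. In the notebook the data would be loaded with `pd.read_csv` (the file name `framingham.csv` is an assumption) and inspected with `df.info()`. pandas' nullable `Int64` dtype is shown as one option for keeping integer columns that contain missing values; the original analysis does not necessarily use it.

```python
import numpy as np
import pandas as pd

# Real data (file name assumed): df = pd.read_csv("framingham.csv"); df.info()

# Toy column of integer codes with one missing value (not the real data)
s = pd.Series([4, 2, 1, np.nan])
print(s.dtype)  # float64: a NaN cannot be stored in an int64 column

# pandas' nullable integer dtype keeps whole numbers and marks the gap as <NA>
s_int = s.astype("Int64")
print(s_int.dtype)
print(s_int.isna().sum())
```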

Check Missing data

About 9.15% of glucose values are missing. Glucose also has a positive correlation of about 0.15 with the target (TenYearCHD), which is higher than average for the variables in this dataset.
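Both figures can be computed with plain pandas. A minimal sketch on a toy frame (values invented for illustration, column names borrowed from the dataset). `missing_and_corr` is a hypothetical helper, and Pearson correlation via `DataFrame.corr` is an assumption about how the 0.15 was obtained. The 9.15% follows from the `info()` output: 4240 - 3852 = 388 missing glucose values.

```python
import numpy as np
import pandas as pd

def missing_and_corr(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical helper: % missing per column and Pearson corr with target."""
    missing_pct = df.isna().mean() * 100
    # DataFrame.corr drops NaNs pairwise: each pair uses rows where both are present
    corr_with_target = df.corr()[target]
    return pd.DataFrame({"missing_pct": missing_pct,
                         "corr_with_target": corr_with_target})

# Toy data, not the Framingham file
toy = pd.DataFrame({
    "glucose": [77, np.nan, 70, 103, np.nan],
    "age": [39, 46, 48, 61, 46],
    "TenYearCHD": [0, 0, 0, 1, 0],
})
summary = missing_and_corr(toy, "TenYearCHD")
print(summary)

# Glucose missing share from the info() counts above
glucose_missing_pct = (4240 - 3852) / 4240 * 100
print(round(glucose_missing_pct, 2))  # 9.15
```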

Check and modify variables for analysis

Unique Values in Education [ 4.  2.  1.  3. nan]

Education is a categorical variable. Missing values are changed to 0 for now.
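A minimal sketch of the education step, on a toy column that mirrors the unique values printed above. The fill value 0 is the placeholder the text chooses; converting to `category` afterwards is an optional extra, not something the original states.

```python
import numpy as np
import pandas as pd

# Toy column mirroring the printed unique values (not the real data)
toy = pd.DataFrame({"education": [4.0, 2.0, 1.0, 3.0, np.nan]})

# Missing education becomes 0 ("unknown") for now; with no NaN left it can be int
toy["education"] = toy["education"].fillna(0).astype(int)

# Optional: tell pandas the codes are labels, not quantities
toy["education"] = toy["education"].astype("category")
print(toy["education"].cat.categories.tolist())
```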

Unique Values in cigsPerDay 
 [ 0. 20. 30. 23. 15.  9. 10.  5. 35. 43.  1. 40.  3.  2. nan 12.  4. 18.
 25. 60. 14. 45.  8. 50. 13. 11.  7.  6. 38. 29. 17. 16. 19. 70.] 
Unique Values in BPMeds [ 0.  1. nan] 
Unique Values in totChol 
 [195. 250. 245. 225. 285. 228. 205. 313. 260. 254. 247. 294. 332. 226.
 221. 232. 291. 190. 185. 234. 215. 270. 272. 295. 209. 175. 214. 257.
 178. 233. 180. 243. 237.  nan 311. 208. 252. 261. 179. 194. 267. 216.
 240. 266. 255. 220. 235. 212. 223. 300. 302. 248. 200. 189. 258. 202.
 213. 183. 274. 170. 210. 197. 326. 188. 256. 244. 193. 239. 296. 269.
 275. 268. 265. 173. 273. 290. 278. 264. 282. 241. 288. 222. 303. 246.
 150. 187. 286. 154. 279. 293. 259. 219. 230. 320. 312. 165. 159. 174.
 242. 301. 167. 308. 325. 229. 236. 224. 253. 464. 171. 186. 227. 249.
 176. 163. 191. 263. 196. 310. 164. 135. 238. 207. 342. 287. 182. 352.
 284. 217. 203. 262. 129. 155. 323. 206. 283. 319. 304. 340. 328. 280.
 368. 218. 276. 339. 231. 198. 177. 201. 277. 184. 199. 168. 292. 305.
 306. 152. 161. 181. 251. 271. 370. 439. 145. 330. 157. 398. 162. 314.
 166. 160. 281. 289. 355. 307. 156. 329. 143. 211. 298. 334. 192. 204.
 318. 309. 353. 360. 335. 158. 372. 346. 169. 140. 324. 600. 315. 392.
 322. 149. 137. 172. 317. 358. 153. 345. 391. 410. 297. 356. 338. 107.
 148. 366. 333. 327. 344. 126. 365. 362. 316. 144. 351. 390. 321. 405.
 359. 350. 336. 380. 299. 124. 371. 113. 354. 382. 364. 341. 133. 367.
 432. 337. 696. 363. 331. 361. 453. 347. 373. 385. 119.] 
Unique Values in sysBP 
 [106.  121.  127.5 150.  130.  180.  138.  100.  141.5 162.  133.  131.
 142.  124.  114.  140.  112.  122.  139.  108.  123.5 148.  132.  137.5
 102.  110.  182.  115.  134.  147.  124.5 153.5 160.  153.  111.  116.5
 206.   96.  179.5 119.  116.  156.5 145.  143.5 158.  157.  126.5 136.
 154.  190.  107.  112.5 164.5 138.5 155.  151.  152.  179.  113.  200.
 132.5 126.  123.  141.  135.  187.  127.  160.5 105.  109.  128.  118.
 109.5 117.5 149.  180.5 136.5 212.  125.  191.  121.5 173.  144.  129.5
 117.  144.5 170.  137.   94.  119.5 143.  166.  139.5 177.5 129.  159.
 130.5 107.5 189.  168.  197.5 146.  174.  122.5  98.  131.5 195.  101.
 158.5  97.  151.5  97.5 120.  204.  157.5 140.5 171.  215.   95.  156.
 165.  178.  146.5 113.5 188.  197.   90.  152.5  95.5 209.  162.5 295.
 108.5 103.  145.5 134.5 115.5 118.5 174.5 163.  185.  220.  164.  120.5
  98.5 161.  168.5 176.  163.5 128.5 167.  205.5 167.5 172.5 183.  186.
 147.5 175.  142.5 192.   96.5 159.5 177.  102.5 133.5 244.  104.  213.
 199.  184.  198.  114.5 125.5 111.5 105.5 161.5 171.5 201.  148.5 169.
 154.5  93.5 172.  243.  187.5  99.  181.  100.5 104.5 135.5 185.5 103.5
 149.5 182.5 186.5 217.  196.  193.  110.5 155.5  92.  166.5 202.  150.5
 232.   85.5 184.5 235.  205.  169.5 210.  181.5 188.5 176.5  92.5 202.5
 191.5 208.   83.5 106.5 170.5  93.  175.5 207.5 199.5 101.5 248.   99.5
  85.  230.  214.  192.5 194.  207. ] 
Unique Values in diaBP
 [ 70.   81.   80.   95.   84.  110.   71.   89.  107.   76.   88.   94.
  64.   90.   78.   84.5  70.5  77.5  82.   68.   72.5  91.  121.   85.5
  85.   82.5  74.   92.5 102.   98.  101.   73.   92.   83.5  63.  114.
  69.   93.   66.   75.   79.   87.   99.   60.   67.5 106.   86.5 104.
  86.   61.5  71.5  76.5  77.   88.5 105.   96.   97.  100.   81.5 106.5
  80.5 124.5  61.   83.   67.   74.5  66.5  65.   72.   99.5 122.5  57.
  57.5 111.   78.5 104.5  89.5 112.   55.  123.  120.   75.5 118.   97.5
  59.  133.   69.5  95.5  96.5 135.   64.5  68.5  98.5  62.  117.   59.5
 103.  108.5  73.5  87.5 108.   93.5  90.5 114.5  62.5  94.5 140.  124.
  79.5 109.   91.5 115.  102.5  65.5 105.5 103.5  63.5 107.5 142.5 109.5
  58.  117.5 116.5 100.5 116.  119.   54.  132.   50.  101.5 136.   51.
 128.  125.  112.5 130.  110.5 113.   53.  129.   52.   48.   56.   60.5
 115.5 127.5] 
Unique Values in BMI
 [26.97 28.73 25.34 ... 26.7  43.67 20.91] 
Unique Values in heartRate
 [ 80.  95.  75.  65.  85.  77.  60.  79.  76.  93.  72.  98.  64.  70.
  71.  62.  73.  90.  96.  68.  63.  88.  78.  83. 100.  67.  84.  57.
  50.  74.  86.  55.  92.  66.  87. 110.  81.  56.  89.  82.  48. 105.
  61.  54.  69.  52.  94. 140. 130.  58. 108. 104.  91.  53.  nan 106.
  59.  51. 102. 107. 112. 125. 103.  44.  47.  45.  97. 122. 120.  99.
 115. 143. 101.  46.] 
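Dumps like these stay correctly labelled when the label comes from the same loop variable as the values. A minimal sketch; `unique_values` is a hypothetical helper and the toy frame is invented.

```python
import numpy as np
import pandas as pd

def unique_values(df: pd.DataFrame, columns: list) -> dict:
    """Hypothetical helper: print and return the unique values of each column."""
    result = {}
    for col in columns:
        values = df[col].unique()  # NaN kept as one entry, in order of appearance
        print(f"Unique Values in {col}\n", values)
        result[col] = values
    return result

# Toy data, not the Framingham file
toy = pd.DataFrame({
    "diaBP": [70.0, 81.0, 70.0],
    "BMI": [26.97, 28.73, np.nan],
    "heartRate": [80.0, 95.0, np.nan],
})
uniques = unique_values(toy, ["diaBP", "BMI", "heartRate"])
```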

glucose

Unique Values in glucose 
 [ 77.  76.  70. 103.  85.  99.  78.  79.  88.  61.  64.  84.  nan  72.
  89.  65. 113.  75.  83.  66.  74.  63.  87. 225.  90.  80. 100. 215.
  98.  62.  95.  94.  55.  82.  93.  73.  45. 202.  68.  97. 104.  96.
 126. 120. 105.  71.  56.  60. 117. 102.  58.  92. 109.  86. 107.  54.
  67.  69.  57.  91. 132. 150.  59.  81. 115. 140. 112. 118. 143. 114.
 160. 110. 123. 108. 145. 122. 137. 106. 127. 205. 130. 101.  47.  53.
 216. 163. 144. 116. 121. 172. 124. 111.  40. 186. 223. 325.  44. 156.
 268.  50. 274. 292. 255. 136. 206. 131. 148. 297.  43. 173.  48. 386.
 155. 147. 170.  52. 320. 254. 394. 270. 244. 183. 142. 119. 135. 167.
 207. 129. 177. 250. 294. 166. 125. 332. 368. 348. 248. 370. 193. 191.
 256. 235. 210. 260.] 

Because more than 9% of the glucose values are missing and glucose's correlation with the target is higher than average, we predict the missing values with a model.

19.95151996187076
18.30354972872754
18.271335361896362
17.94537406634815
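One way to predict missing glucose values is regression imputation: fit a model on rows where glucose is known, then use it to fill the rows where it is missing. A minimal sketch with ordinary least squares in NumPy, assuming the predictors have no gaps in the rows being filled (rows with missing predictors are left as NaN). The helper name, the choice of `age` and `BMI` as predictors, and the toy numbers are assumptions, not the original model.

```python
import numpy as np
import pandas as pd

def impute_by_regression(df, target_col, feature_cols):
    """Hypothetical helper: fill NaNs in target_col via least squares on features."""
    out = df.copy()
    features_ok = out[feature_cols].notna().all(axis=1)
    known = out[target_col].notna() & features_ok

    # Design matrix with an intercept column, built from fully observed rows
    X = np.column_stack([np.ones(known.sum()),
                         out.loc[known, feature_cols].to_numpy(float)])
    y = out.loc[known, target_col].to_numpy(float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Predict only where the target is missing but the features are present
    todo = out[target_col].isna() & features_ok
    X_new = np.column_stack([np.ones(todo.sum()),
                             out.loc[todo, feature_cols].to_numpy(float)])
    out.loc[todo, target_col] = X_new @ coef
    return out

# Toy data built from glucose = 30 + 0.5*age + BMI (not the real data)
toy = pd.DataFrame({
    "age": [40, 50, 60, 70, 45],
    "BMI": [25.0, 30.0, 22.0, 28.0, 26.0],
    "glucose": [75.0, 85.0, 82.0, 93.0, np.nan],
})
filled = impute_by_regression(toy, "glucose", ["age", "BMI"])
print(filled.loc[4, "glucose"])
```

Linear least squares keeps the sketch dependency-free; any regressor (for example one from scikit-learn) could take its place, and comparing candidates on held-out rows with known glucose is a reasonable way to choose between them.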

References

BioLINCC

Wikipedia, Framingham Heart Study: https://en.wikipedia.org/wiki/Framingham_Heart_Study

MIT 15.071x (2018), The Framingham Heart Study