Exercise 2.8
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
pd.options.display.float_format = '{:,.2f}'.format # Display floats with 2 decimal places.
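The display option set above only changes how floats are printed, not the stored values. A minimal sketch on two made-up numbers (the Series `s` is purely illustrative):

```python
import pandas as pd

# Same display option as above: thousands separator and two decimal places.
pd.options.display.float_format = '{:,.2f}'.format

# Illustrative values only; they are not read from College.csv.
s = pd.Series([3001.638353, 14.089704])
print(s)  # printed with two decimals

# The underlying data keeps its full precision.
print(s.iloc[0] == 3001.638353)
```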
(a) Read csv
college = pd.read_csv("../data/College.csv") # Portable import, works on Windows as well.
college
Unnamed: 0 | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.10 | 12 | 7041 | 60 |
1 | Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.20 | 16 | 10527 | 56 |
2 | Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.90 | 30 | 8735 | 54 |
3 | Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.70 | 37 | 19016 | 59 |
4 | Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.90 | 2 | 10922 | 15 |
5 | Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.40 | 11 | 9727 | 55 |
6 | Albertus Magnus College | Yes | 353 | 340 | 103 | 17 | 45 | 416 | 230 | 13290 | 5720 | 500 | 1500 | 90 | 93 | 11.50 | 26 | 8861 | 63 |
7 | Albion College | Yes | 1899 | 1720 | 489 | 37 | 68 | 1594 | 32 | 13868 | 4826 | 450 | 850 | 89 | 100 | 13.70 | 37 | 11487 | 73 |
8 | Albright College | Yes | 1038 | 839 | 227 | 30 | 63 | 973 | 306 | 15595 | 4400 | 300 | 500 | 79 | 84 | 11.30 | 23 | 11644 | 80 |
9 | Alderson-Broaddus College | Yes | 582 | 498 | 172 | 21 | 44 | 799 | 78 | 10468 | 3380 | 660 | 1800 | 40 | 41 | 11.50 | 15 | 8991 | 52 |
10 | Alfred University | Yes | 1732 | 1425 | 472 | 37 | 75 | 1830 | 110 | 16548 | 5406 | 500 | 600 | 82 | 88 | 11.30 | 31 | 10932 | 73 |
11 | Allegheny College | Yes | 2652 | 1900 | 484 | 44 | 77 | 1707 | 44 | 17080 | 4440 | 400 | 600 | 73 | 91 | 9.90 | 41 | 11711 | 76 |
12 | Allentown Coll. of St. Francis de Sales | Yes | 1179 | 780 | 290 | 38 | 64 | 1130 | 638 | 9690 | 4785 | 600 | 1000 | 60 | 84 | 13.30 | 21 | 7940 | 74 |
13 | Alma College | Yes | 1267 | 1080 | 385 | 44 | 73 | 1306 | 28 | 12572 | 4552 | 400 | 400 | 79 | 87 | 15.30 | 32 | 9305 | 68 |
14 | Alverno College | Yes | 494 | 313 | 157 | 23 | 46 | 1317 | 1235 | 8352 | 3640 | 650 | 2449 | 36 | 69 | 11.10 | 26 | 8127 | 55 |
15 | American International College | Yes | 1420 | 1093 | 220 | 9 | 22 | 1018 | 287 | 8700 | 4780 | 450 | 1400 | 78 | 84 | 14.70 | 19 | 7355 | 69 |
16 | Amherst College | Yes | 4302 | 992 | 418 | 83 | 96 | 1593 | 5 | 19760 | 5300 | 660 | 1598 | 93 | 98 | 8.40 | 63 | 21424 | 100 |
17 | Anderson University | Yes | 1216 | 908 | 423 | 19 | 40 | 1819 | 281 | 10100 | 3520 | 550 | 1100 | 48 | 61 | 12.10 | 14 | 7994 | 59 |
18 | Andrews University | Yes | 1130 | 704 | 322 | 14 | 23 | 1586 | 326 | 9996 | 3090 | 900 | 1320 | 62 | 66 | 11.50 | 18 | 10908 | 46 |
19 | Angelo State University | No | 3540 | 2001 | 1016 | 24 | 54 | 4190 | 1512 | 5130 | 3592 | 500 | 2000 | 60 | 62 | 23.10 | 5 | 4010 | 34 |
20 | Antioch University | Yes | 713 | 661 | 252 | 25 | 44 | 712 | 23 | 15476 | 3336 | 400 | 1100 | 69 | 82 | 11.30 | 35 | 42926 | 48 |
21 | Appalachian State University | No | 7313 | 4664 | 1910 | 20 | 63 | 9940 | 1035 | 6806 | 2540 | 96 | 2000 | 83 | 96 | 18.30 | 14 | 5854 | 70 |
22 | Aquinas College | Yes | 619 | 516 | 219 | 20 | 51 | 1251 | 767 | 11208 | 4124 | 350 | 1615 | 55 | 65 | 12.70 | 25 | 6584 | 65 |
23 | Arizona State University Main campus | No | 12809 | 10308 | 3761 | 24 | 49 | 22593 | 7585 | 7434 | 4850 | 700 | 2100 | 88 | 93 | 18.90 | 5 | 4602 | 48 |
24 | Arkansas College (Lyon College) | Yes | 708 | 334 | 166 | 46 | 74 | 530 | 182 | 8644 | 3922 | 500 | 800 | 79 | 88 | 12.60 | 24 | 14579 | 54 |
25 | Arkansas Tech University | No | 1734 | 1729 | 951 | 12 | 52 | 3602 | 939 | 3460 | 2650 | 450 | 1000 | 57 | 60 | 19.60 | 5 | 4739 | 48 |
26 | Assumption College | Yes | 2135 | 1700 | 491 | 23 | 59 | 1708 | 689 | 12000 | 5920 | 500 | 500 | 93 | 93 | 13.80 | 30 | 7100 | 88 |
27 | Auburn University-Main Campus | No | 7548 | 6791 | 3070 | 25 | 57 | 16262 | 1716 | 6300 | 3933 | 600 | 1908 | 85 | 91 | 16.70 | 18 | 6642 | 69 |
28 | Augsburg College | Yes | 662 | 513 | 257 | 12 | 30 | 2074 | 726 | 11902 | 4372 | 540 | 950 | 65 | 65 | 12.80 | 31 | 7836 | 58 |
29 | Augustana College IL | Yes | 1879 | 1658 | 497 | 36 | 69 | 1950 | 38 | 13353 | 4173 | 540 | 821 | 78 | 83 | 12.70 | 40 | 9220 | 71 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
747 | Westfield State College | No | 3100 | 2150 | 825 | 3 | 20 | 3234 | 941 | 5542 | 3788 | 500 | 1300 | 75 | 79 | 15.70 | 20 | 4222 | 65 |
748 | Westminster College MO | Yes | 662 | 553 | 184 | 20 | 43 | 665 | 37 | 10720 | 4050 | 600 | 1650 | 66 | 70 | 12.50 | 20 | 7925 | 62 |
749 | Westminster College | Yes | 996 | 866 | 377 | 29 | 58 | 1411 | 72 | 12065 | 3615 | 430 | 685 | 62 | 78 | 12.50 | 41 | 8596 | 80 |
750 | Westminster College of Salt Lake City | Yes | 917 | 720 | 213 | 21 | 60 | 979 | 743 | 8820 | 4050 | 600 | 2025 | 68 | 83 | 10.50 | 34 | 7170 | 50 |
751 | Westmont College | No | 950 | 713 | 351 | 42 | 72 | 1276 | 9 | 14320 | 5304 | 490 | 1410 | 77 | 77 | 14.90 | 17 | 8837 | 87 |
752 | Wheaton College IL | Yes | 1432 | 920 | 548 | 56 | 84 | 2200 | 56 | 11480 | 4200 | 530 | 1400 | 81 | 83 | 12.70 | 40 | 11916 | 85 |
753 | Westminster College PA | Yes | 1738 | 1373 | 417 | 21 | 55 | 1335 | 30 | 18460 | 5970 | 700 | 850 | 92 | 96 | 13.20 | 41 | 22704 | 71 |
754 | Wheeling Jesuit College | Yes | 903 | 755 | 213 | 15 | 49 | 971 | 305 | 10500 | 4545 | 600 | 600 | 66 | 71 | 14.10 | 27 | 7494 | 72 |
755 | Whitman College | Yes | 1861 | 998 | 359 | 45 | 77 | 1220 | 46 | 16670 | 4900 | 750 | 800 | 80 | 83 | 10.50 | 51 | 13198 | 72 |
756 | Whittier College | Yes | 1681 | 1069 | 344 | 35 | 63 | 1235 | 30 | 16249 | 5699 | 500 | 1998 | 84 | 92 | 13.60 | 29 | 11778 | 52 |
757 | Whitworth College | Yes | 1121 | 926 | 372 | 43 | 70 | 1270 | 160 | 12660 | 4500 | 678 | 2424 | 80 | 80 | 16.90 | 20 | 8328 | 80 |
758 | Widener University | Yes | 2139 | 1492 | 502 | 24 | 64 | 2186 | 2171 | 12350 | 5370 | 500 | 1350 | 88 | 86 | 12.60 | 19 | 9603 | 63 |
759 | Wilkes University | Yes | 1631 | 1431 | 434 | 15 | 36 | 1803 | 603 | 11150 | 5130 | 550 | 1260 | 78 | 92 | 13.30 | 24 | 8543 | 67 |
760 | Willamette University | Yes | 1658 | 1327 | 395 | 49 | 80 | 1595 | 159 | 14800 | 4620 | 400 | 790 | 91 | 94 | 13.30 | 37 | 10779 | 68 |
761 | William Jewell College | Yes | 663 | 547 | 315 | 32 | 67 | 1279 | 75 | 10060 | 2970 | 500 | 2600 | 74 | 80 | 11.20 | 19 | 7885 | 59 |
762 | William Woods University | Yes | 469 | 435 | 227 | 17 | 39 | 851 | 120 | 10535 | 4365 | 550 | 3700 | 39 | 66 | 12.90 | 16 | 7438 | 52 |
763 | Williams College | Yes | 4186 | 1245 | 526 | 81 | 96 | 1988 | 29 | 19629 | 5790 | 500 | 1200 | 94 | 99 | 9.00 | 64 | 22014 | 99 |
764 | Wilson College | Yes | 167 | 130 | 46 | 16 | 50 | 199 | 676 | 11428 | 5084 | 450 | 475 | 67 | 76 | 8.30 | 43 | 10291 | 67 |
765 | Wingate College | Yes | 1239 | 1017 | 383 | 10 | 34 | 1207 | 157 | 7820 | 3400 | 550 | 1550 | 69 | 81 | 13.90 | 8 | 7264 | 91 |
766 | Winona State University | No | 3325 | 2047 | 1301 | 20 | 45 | 5800 | 872 | 4200 | 2700 | 300 | 1200 | 53 | 60 | 20.20 | 18 | 5318 | 58 |
767 | Winthrop University | No | 2320 | 1805 | 769 | 24 | 61 | 3395 | 670 | 6400 | 3392 | 580 | 2150 | 71 | 80 | 12.80 | 26 | 6729 | 59 |
768 | Wisconsin Lutheran College | Yes | 152 | 128 | 75 | 17 | 41 | 282 | 22 | 9100 | 3700 | 500 | 1400 | 48 | 48 | 8.50 | 26 | 8960 | 50 |
769 | Wittenberg University | Yes | 1979 | 1739 | 575 | 42 | 68 | 1980 | 144 | 15948 | 4404 | 400 | 800 | 82 | 95 | 12.80 | 29 | 10414 | 78 |
770 | Wofford College | Yes | 1501 | 935 | 273 | 51 | 83 | 1059 | 34 | 12680 | 4150 | 605 | 1440 | 91 | 92 | 15.30 | 42 | 7875 | 75 |
771 | Worcester Polytechnic Institute | Yes | 2768 | 2314 | 682 | 49 | 86 | 2802 | 86 | 15884 | 5370 | 530 | 730 | 92 | 94 | 15.20 | 34 | 10774 | 82 |
772 | Worcester State College | No | 2197 | 1515 | 543 | 4 | 26 | 3089 | 2029 | 6797 | 3900 | 500 | 1200 | 60 | 60 | 21.00 | 14 | 4469 | 40 |
773 | Xavier University | Yes | 1959 | 1805 | 695 | 24 | 47 | 2849 | 1107 | 11520 | 4960 | 600 | 1250 | 73 | 75 | 13.30 | 31 | 9189 | 83 |
774 | Xavier University of Louisiana | Yes | 2097 | 1915 | 695 | 34 | 61 | 2793 | 166 | 6900 | 4200 | 617 | 781 | 67 | 75 | 14.40 | 20 | 8323 | 49 |
775 | Yale University | Yes | 10705 | 2453 | 1317 | 95 | 99 | 5217 | 83 | 19840 | 6510 | 630 | 2115 | 96 | 96 | 5.80 | 49 | 40386 | 99 |
776 | York College of Pennsylvania | Yes | 2989 | 1855 | 691 | 28 | 63 | 2988 | 1726 | 4990 | 3560 | 500 | 1250 | 75 | 75 | 18.10 | 28 | 4509 | 99 |
777 rows × 19 columns
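The forward-slash path used above is accepted on Windows as well. As a hedged alternative, the path can be assembled with `pathlib`; the `load_college` helper below is a hypothetical name, and the directory layout is assumed to match the `../data/College.csv` path used above.

```python
from pathlib import Path
import pandas as pd

# Assumed layout: the CSV sits in a sibling "data" directory, as in the call above.
DATA_PATH = Path("..") / "data" / "College.csv"

def load_college(path=DATA_PATH):
    """Hypothetical helper: read the College data set from `path`."""
    return pd.read_csv(path)
```

Calling `load_college()` with no argument reads the default location; passing another path makes the helper reusable when the notebook is run from a different directory.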
(b) University names as index
The fix() function in R (similar to edit()) allows on-the-fly edits to a data frame by invoking an editor. In pandas, we instead set the column of university names as the index.
# [1]
college = college.set_index("Unnamed: 0") # The default option, drop=True, removes the column after it becomes the index
college.index.name = 'Names'
college.head()
# The empty row below the column names (e.g. Private, Apps, etc.) is there because the index has a name, which creates an additional header row.
Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Names | ||||||||||||||||||
Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.10 | 12 | 7041 | 60 |
Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.20 | 16 | 10527 | 56 |
Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.90 | 30 | 8735 | 54 |
Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.70 | 37 | 19016 | 59 |
Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.90 | 2 | 10922 | 15 |
[1] https://campus.datacamp.com/courses/manipulating-dataframes-with-pandas/advanced-indexing?ex=1
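To see what `drop` does in `set_index`, here is a small sketch on a toy frame. The frame `toy` and the names `indexed` and `kept` are illustrative; the values are copied from the first two rows shown above.

```python
import pandas as pd

# Toy frame mimicking the layout of College.csv.
toy = pd.DataFrame({
    "Unnamed: 0": ["Abilene Christian University", "Adelphi University"],
    "Private": ["Yes", "Yes"],
    "Apps": [1660, 2186],
})

indexed = toy.set_index("Unnamed: 0")            # drop=True (default): the column is removed
kept = toy.set_index("Unnamed: 0", drop=False)   # drop=False: the column stays as well
indexed.index.name = "Names"

print(indexed.columns.tolist())
print(kept.columns.tolist())
```

With the names as the index, a row can be looked up by label, for example `indexed.loc["Adelphi University"]`.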
# Alternative solution: we could have done all of this in one fewer line by setting the index while reading the file:
college = pd.read_csv('../data/College.csv', index_col='Unnamed: 0')
college.index.name = 'Names'
college.head()
Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Names | ||||||||||||||||||
Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.10 | 12 | 7041 | 60 |
Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.20 | 16 | 10527 | 56 |
Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.90 | 30 | 8735 | 54 |
Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.70 | 37 | 19016 | 59 |
Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.90 | 2 | 10922 | 15 |
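The index column can also be selected by position rather than by its auto-generated name. The sketch below assumes the file's first header cell is empty (which is why pandas labels it `Unnamed: 0`) and uses an in-memory two-row stand-in instead of the real file.

```python
import io
import pandas as pd

# Two-row stand-in for College.csv; the first header cell is assumed to be empty.
csv_text = (
    ",Private,Apps\n"
    "Abilene Christian University,Yes,1660\n"
    "Adelphi University,Yes,2186\n"
)

# index_col=0 picks the first column by position.
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
by_position.index.name = "Names"
print(by_position)
```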
(c)
i. Summary
college.describe(include='all')
# [2, 3, 4] Without include='all', the 'Private' column is not shown because it is categorical (non-numeric); describe() summarizes only numeric columns by default.
Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 777 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 |
unique | 2 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
top | Yes | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
freq | 565 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
mean | NaN | 3,001.64 | 2,018.80 | 779.97 | 27.56 | 55.80 | 3,699.91 | 855.30 | 10,440.67 | 4,357.53 | 549.38 | 1,340.64 | 72.66 | 79.70 | 14.09 | 22.74 | 9,660.17 | 65.46 |
std | NaN | 3,870.20 | 2,451.11 | 929.18 | 17.64 | 19.80 | 4,850.42 | 1,522.43 | 4,023.02 | 1,096.70 | 165.11 | 677.07 | 16.33 | 14.72 | 3.96 | 12.39 | 5,221.77 | 17.18 |
min | NaN | 81.00 | 72.00 | 35.00 | 1.00 | 9.00 | 139.00 | 1.00 | 2,340.00 | 1,780.00 | 96.00 | 250.00 | 8.00 | 24.00 | 2.50 | 0.00 | 3,186.00 | 10.00 |
25% | NaN | 776.00 | 604.00 | 242.00 | 15.00 | 41.00 | 992.00 | 95.00 | 7,320.00 | 3,597.00 | 470.00 | 850.00 | 62.00 | 71.00 | 11.50 | 13.00 | 6,751.00 | 53.00 |
50% | NaN | 1,558.00 | 1,110.00 | 434.00 | 23.00 | 54.00 | 1,707.00 | 353.00 | 9,990.00 | 4,200.00 | 500.00 | 1,200.00 | 75.00 | 82.00 | 13.60 | 21.00 | 8,377.00 | 65.00 |
75% | NaN | 3,624.00 | 2,424.00 | 902.00 | 35.00 | 69.00 | 4,005.00 | 967.00 | 12,925.00 | 5,050.00 | 600.00 | 1,700.00 | 85.00 | 92.00 | 16.50 | 31.00 | 10,830.00 | 78.00 |
max | NaN | 48,094.00 | 26,330.00 | 6,392.00 | 96.00 | 100.00 | 31,643.00 | 21,836.00 | 21,700.00 | 8,124.00 | 2,340.00 | 6,800.00 | 103.00 | 100.00 | 39.80 | 64.00 | 56,233.00 | 118.00 |
# Alternative solution: call describe twice, once on the numeric columns and once on the object columns.
college.describe(include=['number'])
# or college.describe(include=[np.number])
Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 | 777.00 |
mean | 3,001.64 | 2,018.80 | 779.97 | 27.56 | 55.80 | 3,699.91 | 855.30 | 10,440.67 | 4,357.53 | 549.38 | 1,340.64 | 72.66 | 79.70 | 14.09 | 22.74 | 9,660.17 | 65.46 |
std | 3,870.20 | 2,451.11 | 929.18 | 17.64 | 19.80 | 4,850.42 | 1,522.43 | 4,023.02 | 1,096.70 | 165.11 | 677.07 | 16.33 | 14.72 | 3.96 | 12.39 | 5,221.77 | 17.18 |
min | 81.00 | 72.00 | 35.00 | 1.00 | 9.00 | 139.00 | 1.00 | 2,340.00 | 1,780.00 | 96.00 | 250.00 | 8.00 | 24.00 | 2.50 | 0.00 | 3,186.00 | 10.00 |
25% | 776.00 | 604.00 | 242.00 | 15.00 | 41.00 | 992.00 | 95.00 | 7,320.00 | 3,597.00 | 470.00 | 850.00 | 62.00 | 71.00 | 11.50 | 13.00 | 6,751.00 | 53.00 |
50% | 1,558.00 | 1,110.00 | 434.00 | 23.00 | 54.00 | 1,707.00 | 353.00 | 9,990.00 | 4,200.00 | 500.00 | 1,200.00 | 75.00 | 82.00 | 13.60 | 21.00 | 8,377.00 | 65.00 |
75% | 3,624.00 | 2,424.00 | 902.00 | 35.00 | 69.00 | 4,005.00 | 967.00 | 12,925.00 | 5,050.00 | 600.00 | 1,700.00 | 85.00 | 92.00 | 16.50 | 31.00 | 10,830.00 | 78.00 |
max | 48,094.00 | 26,330.00 | 6,392.00 | 96.00 | 100.00 | 31,643.00 | 21,836.00 | 21,700.00 | 8,124.00 | 2,340.00 | 6,800.00 | 103.00 | 100.00 | 39.80 | 64.00 | 56,233.00 | 118.00 |
college.describe(include=['object'])
# or college.describe(include=['O'])
Private | |
---|---|
count | 777 |
unique | 2 |
top | Yes |
freq | 565 |
- [2] http://stackoverflow.com/questions/24524104/pandas-describe-is-not-returning-summary-of-all-columns
- [3] http://stackoverflow.com/questions/24524104/pandas-describe-is-not-returning-summary-of-all-columns
- [4] http://dataanalysispython.readthedocs.io/en/latest/pandas.html#summarizing-data-describe
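The difference between the default call and `include='all'` can be checked on a toy frame. This is only a sketch; `toy` and its three rows are illustrative.

```python
import pandas as pd

# Toy frame with one categorical (string) column and one numeric column.
toy = pd.DataFrame({
    "Private": ["Yes", "Yes", "No"],
    "Apps": [1660, 2186, 3540],
})

default_summary = toy.describe()             # numeric columns only
full_summary = toy.describe(include="all")   # numeric and non-numeric columns

print(default_summary.columns.tolist())
print(full_summary.columns.tolist())

# For a single categorical column, value_counts gives the full frequency table.
print(toy["Private"].value_counts())
```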
ii. Pair plot
Unlike R, seaborn does not draw categorical-versus-numerical panels in a pair plot, so the grid below uses only numeric columns and encodes Private as the color (hue).
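# 'vars' expects column names; columns 1 to 10 are the first ten numeric columns (column 0 is 'Private').
g = sns.PairGrid(college, vars=list(college.columns[1:11]), hue='Private')
g.map_upper(plt.scatter, s=3)
g.map_diag(plt.hist)
g.map_lower(plt.scatter, s=3)
g.fig.set_size_inches(12, 12)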
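Since the scatter panels only make sense for numeric columns, the list of variables can also be chosen by dtype instead of by position. A minimal sketch on a toy frame (`toy` and `numeric_cols` are illustrative names; the values come from the first two rows shown earlier):

```python
import pandas as pd

# Toy frame with one categorical column and three numeric ones.
toy = pd.DataFrame({
    "Private": ["Yes", "Yes"],
    "Apps": [1660, 2186],
    "Accept": [1232, 1924],
    "S.F.Ratio": [18.1, 12.2],
})

# Keep up to the first ten numeric column names; 'Private' is skipped automatically.
numeric_cols = toy.select_dtypes(include="number").columns[:10].tolist()
print(numeric_cols)
```

The resulting list could then be passed as the `vars` argument of the grid.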
iii. Box plots
sns.boxplot(x='Private', y='Outstate', data=college);
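A box plot draws the quartiles of each group, so the same information can be read numerically with a grouped `describe`. The sketch below uses four rows copied from the table in part (a); `toy` and `summary` are illustrative names.

```python
import pandas as pd

# Four rows taken from the table above (two private, two public schools).
toy = pd.DataFrame(
    {
        "Private": ["Yes", "Yes", "No", "No"],
        "Outstate": [7440, 12280, 5130, 6806],
    },
    index=[
        "Abilene Christian University",
        "Adelphi University",
        "Angelo State University",
        "Appalachian State University",
    ],
)

# One row per group, with count, mean, std, min, quartiles and max.
summary = toy.groupby("Private")["Outstate"].describe()
print(summary[["25%", "50%", "75%"]])
```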
iv. Elite variable
college.loc[college['Top10perc']>50, 'Elite'] = 'Yes'
college['Elite'] = college['Elite'].fillna('No')
sns.boxplot(x='Elite', y='Outstate', data=college);
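The same Elite column can be built in a single step with `np.where`, which avoids the intermediate missing values. A sketch on a toy frame: the first three Top10perc values come from the table above, and the last row is a hypothetical school added to show that exactly 50 does not count as elite.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"Top10perc": [23, 60, 83, 50]},
    index=[
        "Abilene Christian University",
        "Agnes Scott College",
        "Amherst College",
        "Boundary College (hypothetical)",
    ],
)

# 'Yes' where more than 50% of students come from the top 10% of their class, 'No' otherwise.
toy["Elite"] = np.where(toy["Top10perc"] > 50, "Yes", "No")
print(toy["Elite"].value_counts())
```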
v. Histograms
In Python, one way to produce histograms with differing numbers of bins for quantitative variables is to first convert these variables to bins. Binning transforms a continuous range of values into a discrete set of intervals. For the purposes of this exercise, we will only consider equal-width bins. Note that the code below overwrites the original numeric columns with their binned versions.
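To make equal-width binning concrete, here is a small sketch with `pd.cut` on made-up values. The series `values` is illustrative; `retbins=True` additionally returns the bin edges.

```python
import pandas as pd

# Illustrative values spanning 0 to 100.
values = pd.Series([0, 25, 50, 75, 100])

# Four equal-width bins; intervals are closed on the right by default.
binned, edges = pd.cut(
    values, 4, labels=["Very low", "Low", "High", "Very high"], retbins=True
)

print(binned.tolist())
print(edges)  # pandas places the lowest edge slightly below the minimum so it is included
```

A value that falls exactly on an inner edge, such as 25 here, lands in the lower bin because the intervals are right-closed.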
# Bins creation
college['PhD'] = pd.cut(college['PhD'], 3, labels=['Low', 'Medium', 'High'])
college['Grad.Rate'] = pd.cut(college['Grad.Rate'], 5, labels=['Very low', 'Low', 'Medium', 'High', 'Very high'])
college['Books'] = pd.cut(college['Books'], 2, labels=['Low', 'High'])
college['Enroll'] = pd.cut(college['Enroll'], 4, labels=['Very low', 'Low', 'High', 'Very high'])
# Plot histograms
fig = plt.figure()
plt.subplot(221)
college['PhD'].value_counts().plot(kind='bar', title = 'PhD');
plt.subplot(222)
college['Grad.Rate'].value_counts().plot(kind='bar', title = 'Grad.Rate');
plt.subplot(223)
college['Books'].value_counts().plot(kind='bar', title = 'Books');
plt.subplot(224)
college['Enroll'].value_counts().plot(kind='bar', title = 'Enroll');
fig.subplots_adjust(hspace=1) # To add space between subplots
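As an alternative to explicit binning, matplotlib's `hist` accepts a `bins` argument directly, which leaves the original numeric columns untouched. A sketch, assuming only the ten Grad.Rate values from the first rows of the table in part (a); the names `grad_rate`, `results` and `axes` are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Grad.Rate of the first ten schools listed above.
grad_rate = pd.Series([60, 56, 54, 59, 15, 55, 63, 73, 80, 52], name="Grad.Rate")

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
results = {}
for ax, n_bins in zip(axes, [2, 5, 10]):
    # hist returns the counts per bin, the bin edges and the drawn patches.
    counts, edges, _ = ax.hist(grad_rate, bins=n_bins)
    ax.set_title(f"Grad.Rate, {n_bins} bins")
    results[n_bins] = counts

fig.tight_layout()
```

On the full data set, the same call would be applied to the numeric column before it is overwritten by the binned version.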
vi. Continue exploring the data
"This exercise is trivial and is left to the reader." :)
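One possible direction, offered only as a sketch: derive an acceptance rate from the Apps and Accept columns and see which schools are most selective. The three rows below are copied from the table in part (a); the column name `AcceptRate` is an assumption and not part of the data set.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Apps": [1660, 4302, 10705], "Accept": [1232, 992, 2453]},
    index=["Abilene Christian University", "Amherst College", "Yale University"],
)

# Fraction of applications that were accepted (hypothetical derived column).
toy["AcceptRate"] = toy["Accept"] / toy["Apps"]

# Most selective schools first.
print(toy.sort_values("AcceptRate"))
```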