Live Odds

The last day of the regular season. I’ll be updating these as the day goes on, and hopefully those updates will show up here.

Fantasy Playoff Probability 2018 wk13

This is the deciding week.
Note: The table is now sortable.

Team (record) E(Wins) E(Rank) Playoffs(%) Change(%)
Pack (11-1) 11.63 1.00 100.00 0.00
Jets (8-4) 8.95 2.62 99.97 3.09
Vikings (8-4) 8.54 2.81 99.50 4.74
IncdtlPun (8-4) 8.37 4.67 98.06 15.73
NotGonnaLie (7-5) 7.78 5.07 90.97 24.91
Economists (7-5) 7.46 5.51 71.69 18.62
Giants (7-5) 7.50 6.46 38.43 -34.54
Wonders (6-6) 6.50 7.85 1.38 -32.55
Wentz (3-9) 3.49 10.02 0.00 0.00
Broncos (4-8) 4.51 9.00 0.00 0.00
Fourth (2-10) 2.05 11.19 0.00 0.00
Blue (1-11) 1.22 11.79 0.00 0.00

There are now four of us fighting for three slots. IncidentalPunishment is almost surely in, and I’m not going to lie, it would be hard for NotGonnaLie to miss out. It’s really just down to me vs my nemesis: the Giants.

Middle Crew

 

And here is the updated expected wins table:

Div Rank Team Avg Pnts Opp_Pnts W E(W) E(W_1) Opp_Δ Time_Δ Tot_Δ
S 1 Pack 145.38 118.87 11 9.66 9.03 0.64 1.34 1.97
S 3 Jets 129.51 125.51 8 6.66 6.88 -0.22 1.34 1.12
S 4 IncdtlPun 120.62 119.63 8 6.17 5.38 0.79 1.83 2.62
S 9 Broncos 117.79 127.80 4 4.32 4.89 -0.57 -0.32 -0.89
S 10 Wentz 114.55 125.38 3 4.15 4.34 -0.18 -1.15 -1.34
S 11 Fourth 106.13 131.52 2 2.17 2.98 -0.81 -0.17 -0.98
L 2 Vikings 150.93 125.79 8 9.40 9.58 -0.18 -1.40 -1.58
L 5 NotGonnaLie 131.70 127.32 7 6.71 7.22 -0.51 0.29 -0.22
L 6 Economists 129.94 120.31 7 7.59 6.95 0.64 -0.59 0.05
L 7 Giants 116.83 117.30 7 5.92 4.73 1.19 1.08 2.27
L 8 Wonders 118.17 122.88 6 5.18 4.96 0.22 0.82 1.04
L 12 Blue 108.78 128.02 1 2.89 3.39 -0.49 -1.89 -2.39

Expected Wins

I am third in points scored, but 6th overall and in real danger of missing the playoffs. Digs SdLn TD UnBlvbl has had a historically productive season, but somehow has four losses and may not get a bye. Two things affect how total points turn into wins: who you play, and when your points arrive relative to your opponents'. For example, Digs would rather have used their nearly 200-point week either last week against NotGonnaLie or the week before against the Pack. It was a bit of overkill against a team that barely broke 100.

Using Pythagorean expectation, I calculate each team's expected wins (E(W)), which captures how many wins we would expect them to have based on the points they scored and the points their opponents scored. The difference between actual wins and E(W) is the bonus (or penalty) from the timing of when a team's points are scored relative to when its opponents' points are scored (Time_Δ).

I also do the same calculation using the league-average points scored in place of the actual opponents' points (E(W_1)). The difference between E(W) and E(W_1) captures the bonus or penalty from the particular set of opponents the team faced (Opp_Δ). The combination of these two factors captures, in a sense, the team's luck (Tot_Δ).
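
In equation form (a sketch of how the columns fit together; both expectations use the usual Pythagorean ratio, and I'm leaving the exponent k unspecified here):

$$E(W) = G \cdot \frac{\text{Pts}^{k}}{\text{Pts}^{k} + \text{OppPts}^{k}}, \qquad E(W_1) = G \cdot \frac{\text{Pts}^{k}}{\text{Pts}^{k} + \overline{\text{Pts}}^{\,k}}$$

$$\text{Time}_\Delta = W - E(W), \qquad \text{Opp}_\Delta = E(W) - E(W_1), \qquad \text{Tot}_\Delta = \text{Opp}_\Delta + \text{Time}_\Delta = W - E(W_1)$$

where G is games played, Pts and OppPts are the team's average points for and against (the Avg Pnts and Opp_Pnts columns), and $\overline{\text{Pts}}$ is the league-wide average score.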

Observations after the table. Note: The table is sortable.

Div Rank Team Avg Pnts Opp_Pnts W E(W) E(W_1) Opp_Δ Time_Δ Tot_Δ
S 1 Pack Attack 142.91 122.32 10 8.29 8.10 0.19 1.71 1.90
S 3 Dryden Jets 129.26 126.54 7 5.92 6.33 -0.41 1.08 0.67
S 4 IncidentalPunishment 119.37 119.43 7 5.49 4.77 0.72 1.51 2.23
S 9 The Wrecking Broncs 117.43 127.21 4 3.96 4.45 -0.49 0.04 -0.45
S 10 North by North Wentz 114.59 124.76 3 3.87 3.99 -0.13 -0.87 -0.99
S 11 fourth and long 108.42 127.79 2 2.58 3.04 -0.47 -0.58 -1.04
L 2 Digs SdLn TD UnBlvbl 152.49 128.39 7 8.53 8.99 -0.46 -1.53 -1.99
L 5 Theological Giants 118.62 115.79 7 5.98 4.65 1.33 1.02 2.35
L 6 Hamilton Economists 130.53 121.46 6 6.89 6.52 0.37 -0.89 -0.52
L 7 Not Gonna Lie 129.08 125.09 6 6.12 6.31 -0.19 -0.12 -0.31
L 8 Wood Street Wonders 115.10 119.46 6 4.77 4.08 0.69 1.23 1.92
L 12 Go Blue 108.89 128.44 1 2.57 3.11 -0.55 -1.57 -2.11

  • As unlucky as Digs has been, apparently GoBlue is unluckier! Looking at their schedule, it's true: they had a bunch of close losses. Still, these numbers are historically high.
  • The Theological Giants must please God, because they have 2.35 more wins than we would expect if they had distributed their points in an average manner against average opponents. Over 1.3 wins come from the league's weakest schedule, and another full point comes from timing.
  • IncidentalPunishment and the Wonders are up there as well. Three of the five teams fighting for playoff spots have a major luck boost.
  • The Pack sits at an insane 10-1, which is just crazy if you remember Yahoo! had predicted them going 0-13!! Oops! But winning that much takes at least some luck, and for them it seems to be timing based. They've had some nice, close wins (Broncos, Giants), and have generally upped their game when needed.

Fantasy Playoff Probability 2018 wk12

Only two weeks left! My playoff chances took a big hit as my team, once again, fell to the Giants (down to 53%). I don't know what kind of voodoo that team has over me, but my worst two weeks were the two weeks I played them! So despite scoring 130 more points than them, I'm a game back and the Giants now have a 73% chance of making the playoffs. Of the middle crew, the Wonders are the most likely to miss out (34% chance of making it), but they still definitely have a chance.

Note: The table is now sortable.

Team (record) E(Wins) E(Rank) Playoffs(%) Change(%) Bye (%)
Pack (10-1) 10.97 1.00 100.0 0.02 100.0
Jets (7-4) 8.32 3.32 96.88 -2.30 0.00
Vikings (7-4) 8.19 3.20 94.75 -3.75 71.42
IncdtlPun (7-4) 7.98 4.77 82.33 4.42 0.00
Giants (7-4) 7.68 5.16 72.97 26.54 16.87
NotGnaLie (6-5) 7.23 5.68 66.06 13.82 5.18
Economists (6-5) 6.97 6.07 53.07 -34.09 5.47
Wonders (6-5) 7.05 6.85 33.93 -4.51 1.07
Wentz (3-8) 3.74 10.00 0.00 -0.04 0.00
Broncos (4-7) 4.99 9.04 0.00 -0.09 0.00
Fourth (2-9) 3.00 11.18 0.00 0.00 0.00
Blue (1-10) 1.88 11.73 0.00 0.00 0.00

Thanks to that loss, I can no longer be considered top tier, which makes the middle crowded (five of us fighting for three playoff slots). The order is becoming clearer, but overall there is not much action at the top or the bottom.

Top Tier 2018 w11

Middle Crew

Bottom Rung

 

 

Fantasy Playoff Probability 2018 wk11

Only having three weeks left cuts down on the number of possible outcomes; however, I did enhance my predictions by adding in uncertainty for the tie-breakers. Previously, whichever team was forecasted to have the higher score was ranked higher, which meant the probabilities skewed towards teams with more points. This time, I tried to calculate the probability that, for example, the Giants close their 160-point gap over the next couple of weeks.
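
For concreteness, here is one way to put a probability on closing a points gap (an illustrative sketch, not necessarily the exact calculation I ran): treat each team's weekly score as roughly normal around its season average. If team B trails team A by G total points with n weeks left, then

$$P(\text{gap closed}) \approx 1 - \Phi\!\left(\frac{G - n\,(\mu_B - \mu_A)}{\sqrt{n\,(\sigma_A^{2} + \sigma_B^{2})}}\right)$$

where $\mu$ and $\sigma$ are each team's weekly scoring mean and standard deviation and $\Phi$ is the standard normal CDF. Even when that probability is small, it shifts some weight toward the trailing team instead of handing every tie-break to the current points leader.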

I also realized I was calculating byes incorrectly. The byes go to the division winners, not the top two teams overall. This change decreased the Jets' chances (they'd have to beat out the Pack to get a bye) and increased the chances for me and IncidentalPunishment.

Team (record) E(Wins) E(Rank) Playoffs(%) Bye(%)
Pack (9-1) 10.45 1.25 99.99 87.04
Vikings (7-3) 8.68 2.84 98.50 73.82
Jets (7-3) 8.84 2.87 99.18 12.95
Economists (6-4) 7.70 4.53 87.17 17.32
IncdtlPun (6-4) 7.73 5.21 77.91 8.85
NotGonnaLie (5-5) 6.74 6.29 52.24 0.00
Giants (6-4) 6.95 6.38 46.43 0.02
Wonders (5-5) 6.82 6.76 38.45 0.00
Broncos (3-7) 4.43 9.67 0.09 0.00
Fourth (2-8) 3.56 10.29 0.00 0.00
Wentz (3-7) 3.99 10.30 0.04 0.00
Blue (1-9) 2.11 11.62 0.00 0.00

Even though I won, my playoff chances only increased by 2%. Some of this was because I was expected to beat Blue, but some was due to the change in methodology (because I'm ahead in points, the naive tie-breaking gives me a 92% chance of making the playoffs).

Top Tier 2018 w11

Middle Crew

Bottom Rung

 

Fantasy Playoff Probability 2018 wk10

(Updated w/scenarios Nov 9) Last year, I put together playoff probabilities for my fantasy football league (link).
This year, I'm starting the forecast even earlier! With 4 weeks left, there are almost 17 million possible outcomes just in terms of W/L (2^6 = 64 outcomes per week; 64^4 = 16,777,216 over four weeks), and I looked at them all (along with the approximate likelihood that each occurs). The table below shows each team's expected wins and rank, the probability that they make the playoffs, and the probability of earning that important first-round bye.

Team (record) E(Wins) E(Rank) Playoffs(%) Bye(%)
Pack (8-1) 9.89 1.62 100.00 89.45
Vikings (7-2) 9.25 1.91 99.47 80.03
Jets (6-3) 8.39 3.60 96.13 18.98
Economists (5-4) 7.50 4.63 85.53 6.04
Wonders (5-4) 7.49 5.52 70.00 1.45
Giants (6-3) 7.34 6.02 54.67 2.83
IncdtlPun (5-4) 7.31 6.33 48.16 1.20
NotGonnaLie (4-5) 6.60 6.49 45.69 0.03
Wentz (3-6) 4.11 10.21 0.00 0.00
Fourth (2-7) 3.78 9.88 0.00 0.00
Broncos (2-7) 3.78 10.35 0.01 0.00
Blue (1-8) 2.56 11.39 0.00 0.00

The Pack, the Vikings and the Jets are all basically in the playoffs. I have a very good shot (85%), but still have room to mess it up.

There are three clear tiers: The top four (the teams just mentioned, plus me) are almost all going to make the playoffs. The middle four (Wonders, Giants, IncidentalPunishment and NotGonnaLie) are battling for two slots (and potentially taking mine). The bottom four (Wentz, Fourth, Broncos and GoBlue) are essentially out of playoff contention already. Even though the Giants are 6-3, due to schedules and expected points my algorithm is fairly bearish on their playoff chances. Their low tie-breaker (points scored) is probably a major factor.

The three graphs below show the distribution of final rank by each team (grouped by tier). You can see the clear groupings, and the fact that the Pack and Vikings are almost surely going to go 1-2. Erik has a very small chance of coming in first (2%) and Erik and I have a chance of stealing the bye through grabbing 2nd (16% and 6% respectively). I have the ability to run scenarios and see how probabilities change given a certain outcome, so let me know if there is something you’d like to see. Some scenarios are shown after the distributions.

Top Tier 2017 11 16

Middle Crew

Bottom Rung

 

Scenario: IncidentalPunishment vs Giants

  • If IncidentalPunishment wins, their probability of making the playoffs rises to 68.49%
    (from 49.46%) but if they lose it falls to 21.02%.
  • If the Giants win, their probability of making the playoffs rises to 81.53% (from 56.47%) but if they lose it falls to 38.59%. They also would have a decent (relatively speaking) probability of getting a first round bye with the win: 6.37%, which decreases everyone else’s.
  • Interestingly, this game does not have much significance outside of these two. Both NotGonnaLie and my team (Economists) have a slightly better chance (~1.25%) of making the playoffs if the Giants win. The Vikings', Jets' and Economists' chances of a first-round bye increase with a Giants loss (1%, 0.2%, and 0.2% respectively).

Scenario: Economists (me) vs Fourth

  • With a win, I would improve my playoff chances to 91%. If Fourth won, their chances would move to just above 0% (0.10%).
  • With a loss, my playoff chances drop to 64%. My distribution (not shown) would resemble the middle crew's.
  • My loss is mainly the middle crew's gain. Each of them increases their playoff probability by around 4%, but IncidentalPunishment increases theirs by 6%. I'm not sure why they get an extra-big boost.

 

P.S. Short version of the methodology: I used a version of Pythagorean expected wins to compute win probabilities (exponent of 6) – this seems to be similar to what Yahoo uses in their projections. This time I looked at the optimal lineup for future weeks to get predicted points. Then I computed the probabilities for each of the 16m+ possible win/loss outcomes (2^6 = 64 outcomes per week for four weeks). I also had to make assumptions about points scored for the tie-breakers (I gave the winner the greater of the two expectations).
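
In symbols (a sketch of the bookkeeping; the notation is mine): each game's win probability comes from the Pythagorean ratio of the two teams' projected points, and each team's playoff probability is the likelihood-weighted share of the enumerated outcomes in which it grabs a slot, treating games as independent:

$$P(A \text{ beats } B) = \frac{\hat{p}_A^{\,6}}{\hat{p}_A^{\,6} + \hat{p}_B^{\,6}}, \qquad (2^6)^4 = 2^{24} = 16{,}777{,}216 \text{ outcomes}, \qquad P(\text{playoffs}_i) = \sum_{s} P(s)\,\mathbf{1}\{\text{team } i \text{ is in under outcome } s\}$$

where $\hat{p}_A$ and $\hat{p}_B$ are the optimal-lineup point projections and $P(s)$ is the product of the individual game probabilities under outcome $s$.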

Teaming Data –> RootNPI

The DocGraph / CareSet team does great work and I have personally benefited from the availability of their original CMS teaming data, even using it in a chapter from my dissertation.

They recently updated their methodology and created a new group of datasets they call “root NPI”. Along with this update, they will no longer be updating the original-format teaming data. While I understand the need for this change, the fact that they have neither updated the original data nor retroactively created the new RootNPI data (beyond 2014) is a problem for me, as I use the time variation in these datasets and would like to be able to add years.

To get around this limitation, I created a method that, to a fairly close approximation, creates the new data sets from the old, and therefore allows me to perform analyses on data from 2009 to 2015. The idea is to take the 180-day files and make them symmetrical. My commented SAS code is here, but the main commands are:


/* Duplicate the teaming data, switching NPI_Number1 and NPI_Number2 */
DATA phy_ref_2014_180_x2;
SET ref_med.phy_ref_2014_180(rename=(NPI_Number2=NPI_A NPI_Number1=NPI_B))
   ref_med.phy_ref_2014_180(rename=(NPI_Number1=NPI_A NPI_Number2=NPI_B));
run;

/* For each (NPI_A, NPI_B) pair, keep the larger of the two directed patient counts */
proc sql;
CREATE TABLE npiroot_2014_180
as
SELECT NPI_A, NPI_B
   , MAX(Bene_Count) as patient_count
FROM phy_ref_2014_180_x2
GROUP BY NPI_A, NPI_B
; quit;

Complete data exist for 2014 in both data sets, which allows me to check how well my transformation method works (a sketch of the comparison join is after the list). Here are some statistics from the comparison:

  • For the pairs that match, there is a very high correlation between the two (0.97938, see scatter below)
  • While 37.5% of the pairs do not match (50m of 185m), these pairs only account for ~10% of the total number of shared patient connections (~800m of ~7.5B)
  • It looks like most of the missing connections happen right near the 11-patient cutoff.
    • In fact, for 30% of the missing pairs, the pair that is present has a patient count of 11
    • It is 19% for 12, 13% for 13, and 9% for 14
    • 90% have fewer than 22, 95% fewer than 33, and more than 99% fewer than 100
  • There seems to be a decently large, non-random set of providers that are in the new data, but not in the old. They all seem to be medical device related. Here are the top 10: Arriva Medical, Degc Enterprises, Lincare, All American Medical Supplies, United States Medical Supply, Med-Care Diabetic & Medical Supplies, Ocean Home Health Supply, Binson’s Hospital Supplies, Passaic Healthcare Services, DJO
  • There is not a similar pattern for providers that are in the old data but not in the new. For consistency across years, I will probably exclude the above set of providers from my analyses.
  • Here is a comparison of the two datasets in terms of the number of patients, i.e. the strength of the connection (RootNPI vs constructed):
    • Median: 21 and 21
    • Mean: 41.6 vs 43.4
    • 95 percentile: 128 vs 134
    • 99 percentile: 347 vs 378
    • Standard Deviation: 104.5 vs 116.6
  • It may seem odd that the constructed data set has a larger average than the new RootNPI data, since the new one uses the full year to define a connection while the old data set used a 180-day window. I think what accounts for the discrepancy is that the old data set included connections that happen in the 6 months after Dec 2014, which RootNPI omits.
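
For reference, the comparison itself is just a join on the NPI pair. Here is a sketch in the same proc sql style as above (the RootNPI dataset name and its column names below are placeholders for however you load DocGraph's actual file):


/* Hypothetical comparison: join the constructed pairs to the RootNPI release.
   ref_med.docgraph_rootnpi_2014 and its column names are placeholders. */
proc sql;
CREATE TABLE compare_2014 as
SELECT c.NPI_A, c.NPI_B
   , c.patient_count as constructed_count
   , r.patient_count as rootnpi_count
FROM npiroot_2014_180 as c
LEFT JOIN ref_med.docgraph_rootnpi_2014 as r
   on c.NPI_A = r.NPI_A and c.NPI_B = r.NPI_B
;

/* Share of constructed pairs that also appear in RootNPI */
SELECT count(*) as n_pairs
   , sum(case when rootnpi_count is not null then 1 else 0 end) as n_matched
FROM compare_2014
; quit;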

Finally, to ensure that there are not any odd systematic variations between the two measures, I created a scatter plot comparing the patient count I calculated from the original teaming data with the new RootNPI patient count. I truncated the plot at 1,000, both because there are just too many observations below 1,000 and because I am mainly interested in what happens to the relationship as both counts get large.

To me, this looks really reassuring. The two measures seem to be similar, with some noise, and this noise appears to get smaller as the number of patients gets larger.

Mendeley – Fixing Author and Journal Metadata

I use Mendeley to store, organize, and manage my library of academic papers. Its tagging and search features are excellent, and Mendeley helps keep the process of finding previously read literature manageable. At times I hear or see an author's name and wonder which papers of theirs I have previously read. In Mendeley, you can select and view all of the papers by a single author quite easily. However, there is an issue with variations in names. When you filter by author, Mendeley has no way of knowing that “Mark Pauly”, “Mark V Pauly”, and “MV Pauly” are all the same person. Fixing this through the interface would be a painful, manual task involving selecting and editing each individual paper.

Fortunately, Mendeley stores the local library information in a relatively easy-to-access SQLite database, and I know SQL. What I did, and will show here, was to find probable duplicate names and merge them together using SQL.

Step 0: BACK UP YOUR DATABASE!

The main databases are stored in the local AppData folder (“C:\Users\[Your_Windows_UserName]\AppData\Local\”) under “Mendeley Ltd\Mendeley Desktop”. What you are looking for is a file similar to “dl679@cornell.edu@www.mendeley.com.sqlite”. BACK THIS FILE UP BEFORE PROCEEDING.

Step 1: Get and install a program to read and write SQLite

There are a variety of tools out there, but I used, and can recommend, SQLiteStudio (https://sqlitestudio.pl/index.rvt).

Step 2: Connect to the database.

Open up SQLiteStudio and add the database you located in Step 0, then connect to it. Once you have it open, you should be able to see the tables that Mendeley uses. The main one is “Documents”. This table has one entry for each of the articles in your library. The authors are stored in “DocumentContributors”.

Step 3: Find suspected duplicates

The following code generates a good list of authors that are probably duplicate entries.


SELECT DocumentContributors.lastName, DocumentContributors.firstNames, count(Distinct documents.id) as Num_Papers, max(Num_Papers_LN) as Num_Papers_LN
FROM DocumentContributors
INNER JOIN documents
on DocumentContributors.documentId = documents.id
INNER JOIN (
   SELECT lastName, count(Distinct documents.id) as Num_Papers_LN
   FROM DocumentContributors
   INNER JOIN documents
   on DocumentContributors.documentId = documents.id
   GROUP BY lastName
   ) as ln
on ln.lastName = DocumentContributors.lastName
GROUP BY DocumentContributors.firstNames, DocumentContributors.lastName
HAVING Num_Papers_LN > 8
ORDER BY Num_Papers_LN desc, DocumentContributors.lastName, Num_Papers desc;

I limited my list to last names with more than 8 papers, as I did not want to spend too much time cleaning up authors with only a few papers.

Step 4: Identify and update duplicates

Once I have my list, I check whether there is more than one author with a given last name. The idea is to create a search string that uniquely identifies an author, then standardize the first name. For example, I searched for:


SELECT *
FROM DocumentContributors
WHERE lastName = 'Town'
and firstNames like "%";

In my database, all of the authors with the last name “Town” were in fact “Robert J Town”, so I could safely run the following update statement to standardize the first name. I chose to omit periods and use the first initial of the middle name, but your standardization procedure may differ:


UPDATE DocumentContributors
SET firstNAmes = "Robert J"
WHERE lastName = 'Town'
and firstNames like "%";

If there are multiple authors with the same last name, you can include a first initial before the % to filter out based on that. Always check to see what is going to be impacted by your query before running an update statement.
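
For example, if the library had both a “Mark Pauly” and a (hypothetical) “Catherine Pauly”, adding the first initial before the % keeps both the check and the update targeted:


-- Hypothetical example: restrict to the "M..." Paulys before standardizing them
SELECT *
FROM DocumentContributors
WHERE lastName = 'Pauly'
and firstNames like 'M%';

UPDATE DocumentContributors
SET firstNames = 'Mark V'
WHERE lastName = 'Pauly'
and firstNames like 'M%';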

Step 5: Make Mendeley update the search index

Finally, to get Mendeley to update the search index (including the index of authors): with Mendeley closed, delete the files in “…\AppData\Local\Mendeley Ltd\Mendeley Desktop\www.mendeley.com\dl679@cornell.edu-3ab2” (your final subfolder will differ). BACK UP THESE FILES FIRST. Deleting these files (and having Mendeley rebuild them) could also speed up search if it has become slow.

One caveat: this approach will not update the information in your library on Mendeley.com and will not sync to other computers. I think that could be accomplished by updating the “eventLog” and “eventAttributes” tables, but I didn't take the time to write up a sufficiently automated process; something could probably be done fairly easily using Python.

Step 6: See which authors you read the most


SELECT DocumentContributors.lastName, DocumentContributors.firstNames, count(Distinct documents.id) as Num_Papers
FROM DocumentContributors
INNER JOIN documents
on DocumentContributors.documentId = documents.id
GROUP BY DocumentContributors.firstNames, DocumentContributors.lastName
HAVING Num_Papers > 5
ORDER BY Num_Papers desc;

A side benefit of this project is that I can see which authors are particularly important to my research by running the above code. Not surprisingly, my advisor’s advisor, Marty Gaynor, tops the list (along with the very prolific Larry Casalino). Amitabh Chandra, David Dranove, Bruce Landon and Robert Town are next up (can you tell I research health economics!?).

I also used a similar process to clean up the journal names (a sketch is below), and then checked which journals I read most often. Health Affairs topped the list, followed by NBER Working Papers. There is then a big gap before the next group: Journal of Health Economics, Health Services Research, Journal of Economic Perspectives and American Economic Review.
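
The journal pass looked essentially like the author pass, assuming the journal title lives in the Documents table's publication column (check the column names in your own database before running anything):


-- Spot likely duplicate journal names (assumes Documents.publication holds the journal title)
SELECT publication, count(*) as Num_Papers
FROM Documents
GROUP BY publication
ORDER BY Num_Papers desc;

-- Then standardize the variants, e.g. "Health Aff" or "Health Affairs (Millwood)" -> "Health Affairs"
UPDATE Documents
SET publication = 'Health Affairs'
WHERE publication like 'Health Aff%';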

Olympics 2018: Medals Recap

Some thoughts on the results of the 2018 Olympics:

Most pundits agree that this was a disappointing Olympics for Team USA. The haul of 23 medals is 5 fewer than in 2014 and 14 down from 2010. The USOC had set a target of 37, with the expectation of at least 25 and the hope of up to 59. The decline in total medals is more severe when you consider that many of the medals came in US-friendly events that previously did not exist (11 from the snowy pursuits of snowboarding and freestyle skiing). Team USA walked away with the same number of gold medals it has received in every Games since 2006 – nine – with 2002 being only one higher.


However, despite the shortfall in total medals, Team USA did have some notable victories. First, the women's hockey team. As someone who attended two schools where hockey is the major sport (Colgate and Cornell… go Colgate!), I really enjoy hockey. I watched the utter heartbreak of the US women's 2014 loss, made extra difficult by the fact that in the Olympics there is no “get ’em next year”. It's get them in 4 years… This year the gold medal game lived up to the hype, including the final outcome. Second, the improbable victory in curling, beating out both Canada and Sweden (another recent event predicted by the Simpsons [1]).

Here are some assorted notables:

  • Norway, Germany and Canada had great Olympics.
  • For Germany, it is a return to form after a bit of a slide from 2002-2014.
  • For Canada, it demonstrates that 2010 was not just a fluke and their rise to prominence is likely here to stay (unexpected losses in both men's and women's hockey, and in curling, aside).
  • A historical note: It is crazy to think that in 1988 the sum total of golds for Canada, Norway and the USA was 2 (4% of the total). Lately it has been around 33% (including this Olympics).
  • Russia/OAR dropped down to 2 golds. Some of that was undeniably the ban, but their haul of 13 in Sochi was a bit of an anomaly; they took home 3 in Vancouver in 2010.
  • South Korea did not seem to receive much of a hosting bump. Their gold medal total was between their 2014 and 2010 counts, and their total medal count has been steadily increasing since 2002.

Here is graph of the total medal count over time:

And here is the gold medal count over time:

1. The other prediction being the Trump presidency, which they predict will be followed by Lisa Simpson. Interestingly, Ted Cruz just argued that Lisa is a Democrat.