An Example of Poor Domain Knowledge and Analytics
While at a recent conference I picked up a professional-looking survey summary. It turned out to highlight the importance of domain knowledge when compiling research, especially in the later data cleaning step. The summary appeared high quality, printed on heavy glossy paper and running close to 40 pages. Unfortunately, a few errors made the analysis suspect.
Domain knowledge is often mentioned in relation to knowing the right questions to ask. But it is just as relevant to cleaning, interpreting and grouping the data. This survey had 50,000 data points, so it was no small effort.
First Problem
The first and most notable error was on a page listing the languages most in demand. It ranks Python, Java and JavaScript, in that order, at the top. Which languages are most “used” or “dominant” is already up for heavy debate, since there are so many methodologies for tracking them. But the bottom part of this page really showed a lack of domain knowledge in compiling the research. The next sentence was, in effect, “These are followed by C#, Go, Golang, Scala . . . ” For those of you not aware, Go is often referred to as Golang; they are the same language. Would combining those two data points have pushed Go above C#? Probably not, but without the raw numbers, we don’t know. Some might point out that Java and Scala both run on the JVM and can be mixed within one project. I would still track them separately, though some might argue otherwise.
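The Go/Golang duplication is the kind of thing a small alias table catches during cleaning. Here is a minimal sketch in Python, assuming the raw answers arrive as free-text strings. The alias table, the function names and the sample responses are all made up for illustration; they are not the survey's data or method.

```python
from collections import Counter

# Hypothetical alias table: lower-cased variant spellings mapped to one
# canonical name. Building this table is where domain knowledge comes in.
ALIASES = {
    "go": "Go",
    "golang": "Go",
    "c#": "C#",
    "csharp": "C#",
    "js": "JavaScript",
    "javascript": "JavaScript",
}

def canonical(name: str) -> str:
    """Return the canonical language name, or the trimmed input if unknown."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)

def tally(responses):
    """Count responses after collapsing aliases into canonical names."""
    return Counter(canonical(r) for r in responses)

# Made-up responses, only to show the mechanics.
sample = ["Go", "Golang", "golang ", "C#", "Scala", "C#", "Java"]
counts = tally(sample)
```

With this sample, "Go", "Golang" and "golang " all land in a single Go bucket, while Java and Scala stay separate because nothing in the table merges them. Which names belong in the table is a judgment call, and that is exactly the point.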
Second Problem
The next one might be more a matter of semantics, but it caught my eye. Under education and technical skills, the survey lists Scripting as #4 and Python as #5. Personally, I see Python as a form of scripting. Yes, “Scripting” is rather broad, but I thought it was odd to break Python out. Would combining those two metrics have changed the ranking of the skills listed above them? Or does the reader get more value from having the types of scripting broken out? Recruiters certainly seem to latch onto specifics like Python or PHP more than general “Scripting” ability.
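One way to answer the "would combining them change the ranking?" question is to compute the ranking both ways and compare. The sketch below assumes a simple parent mapping in which Python rolls up into Scripting. The counts and the placeholder names "Skill A" through "Skill C" are invented, since the summary did not publish raw numbers.

```python
# Made-up counts purely for illustration; these are NOT the survey's figures.
skills = {
    "Skill A": 50,
    "Skill B": 45,
    "Skill C": 40,
    "Scripting": 30,
    "Python": 25,
}

# Assumption: Python is treated as a child of the broader Scripting category.
PARENT = {"Python": "Scripting"}

def roll_up(counts, parent):
    """Add each child's count to its parent category."""
    merged = {}
    for skill, n in counts.items():
        top = parent.get(skill, skill)
        merged[top] = merged.get(top, 0) + n
    return merged

def ranked(counts):
    """Return skill names ordered from highest to lowest count."""
    return sorted(counts, key=counts.get, reverse=True)

separate = ranked(skills)                   # Python reported on its own
combined = ranked(roll_up(skills, PARENT))  # Python folded into Scripting
```

With these invented numbers, Scripting jumps from fourth place to first once Python is folded in. Real data might show no change at all, but publishing both views, or at least checking them before print, would let the reader judge.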
I suggest that domain knowledge matters for more than knowing the right questions to ask. It should also be applied when the data is cleansed and aggregated, and again when the final charts are reviewed prior to publishing. This expensive piece of marketing supports my point.