Social Sciences Datasets


Media Member Contact Information
$299.99
Looking to reach members of the media directly? This dataset contains the names, positions, media outlets, phone numbers, email addresses, and mailing addresses for over 2,000 of America's top journalists. The list includes reporters, journalists, editors, publishers, and producers in print, online, academia, and TV. Find contact information for media members at the New York Times, Wall Street Journal, Bloomberg, CNBC, CNN, Politico, and Huffington Post, among many others. A great list for public relations, marketing, advocacy, and job seeking. The list was developed in November 2017.

Category: Social Sciences

Keywords: media,journalists,journalism,reporters,contact

Rows: 2001

Sales: 0

Questions: 0

Game of Thrones - Explore deaths and battles from this fantasy world
$0.00
Overview
Game of Thrones is a hit fantasy TV show based on the equally famous book series "A Song of Ice and Fire" by George R. R. Martin. The show is well known for its vastly complicated political landscape, large number of characters, and frequent character deaths.

Data Sources
This dataset combines three sources of data, all of which are based on information from the book series.

First, battles.csv contains Chris Albon's "The War of the Five Kings" dataset, which can be found here: https://github.com/chrisalbon/war_of_the_five_kings_dataset . It's a great collection of all of the battles in the series.

Second, character-deaths.csv comes from Erin Pierce and Ben Kahle. This dataset was created as part of their Bayesian survival analysis, which can be found here: http://allendowney.blogspot.com/2015/03/bayesian-survival-analysis-for-game-of.html

Finally, character-predictions.csv is a more comprehensive character dataset. It comes from the team at A Song of Ice and Data, who scraped it from http://awoiaf.westeros.org/ . It also includes their predictions of which characters will die; the methodology can be found here: https://got.show/machine-learning-algorithm-predicts-death-game-of-thrones

What insights about the complicated political landscape of this fantasy world can you find in this data? Of course, it goes without saying that this dataset contains spoilers ;)

Contributed by Myles O'Neill from Kaggle.

Category: Social Sciences

Keywords: war,literature,social

Rows: 38

Sales: 0

Questions: 0

US Protests in 2017
$0.00
One of the most ambitious, interesting, and just plain cool data collection projects I have seen. The Crowd Counting Consortium publishes a monthly dataset of every known crowd, protest, or public gathering in the US. The location, date, crowd estimate, organizer, and purpose of each protest are provided. This ambitious project was undertaken by Jeremy Pressman and Erica Chenoweth. Monthly updates are available at the Crowd Counting Consortium.

Category: Social Sciences

Keywords: protest,crowds,society,politics,grassroots

Rows: 11270

Sales: 0

Questions: 0

Extinct Languages - Number of endangered languages in the world, and their likelihood of extinction
$0.00
Context
A recent Guardian blog post asks: "How many endangered languages are there in the World and what are the chances they will die out completely?" The United Nations Educational, Scientific and Cultural Organization (UNESCO) regularly publishes a list of endangered languages, using a classification system that describes each language's danger (or completion) of extinction.

Content
The full detailed dataset includes names of languages, number of speakers, the names of countries where the language is still spoken, and the degree of endangerment. The UNESCO endangerment classification is as follows:
Vulnerable: most children speak the language, but it may be restricted to certain domains (e.g., home)
Definitely endangered: children no longer learn the language as a 'mother tongue' in the home
Severely endangered: language is spoken by grandparents and older generations; while the parent generation may understand it, they do not speak it to children or among themselves
Critically endangered: the youngest speakers are grandparents and older, and they speak the language partially and infrequently
Extinct: there are no speakers left

Acknowledgements
Data was originally organized and published by The Guardian, and can be accessed via its Datablog.

Inspiration
How can you best visualize this data? Which rare languages are more isolated (Sicilian, for example) versus more spread out? Can you come up with a hypothesis for why that is the case? Can you compare the number of speakers of rare languages with more relatable figures? For example, are there more Romani speakers in the world than there are residents in a small city in the United States?

Category: Social Sciences

Keywords: languages,linguistics,small

Rows: 2573

Sales: 0

Questions: 0

World Happiness Report - Happiness scored according to economic production, social support, etc.
$0.00
Context
The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness Report 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields (economics, psychology, survey analysis, national statistics, health, public policy and more) describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

Content
The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0, and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative. The columns following the happiness score estimate the extent to which each of six factors (economic production, social support, life expectancy, freedom, absence of corruption, and generosity) contributes to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.

Inspiration
What countries or regions rank the highest in overall happiness and in each of the six factors contributing to happiness? How did country ranks or scores change between the 2015 and 2016 reports, as well as between the 2016 and 2017 reports? Did any country experience a significant increase or decrease in happiness?

What is Dystopia?
Dystopia is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive width. The lowest scores observed for the six key variables, therefore, characterize Dystopia. Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom and least social support, it is referred to as “Dystopia,” in contrast to Utopia.

What are the residuals?
The residuals, or unexplained components, differ for each country, reflecting the extent to which the six variables either over- or under-explain average 2014-2016 life evaluations. These residuals have an average value of approximately zero over the whole set of countries. Figure 2.2 shows the average residual for each country when the equation in Table 2.1 is applied to average 2014-2016 data for the six variables in that country. We combine these residuals with the estimate for life evaluations in Dystopia so that the combined bar will always have positive values. As can be seen in Figure 2.2, although some life evaluation residuals are quite large, occasionally exceeding one point on the scale from 0 to 10, they are always much smaller than the calculated value in Dystopia, where the average life is rated at 1.85 on the 0 to 10 scale.

What do the columns succeeding the Happiness Score (like Family, Generosity, etc.) describe?
The columns GDP per Capita, Family, Life Expectancy, Freedom, Generosity, and Trust (Government Corruption) describe the extent to which each factor contributes to the happiness evaluation in each country. The Dystopia Residual metric is the Dystopia happiness score (1.85) plus the residual, or unexplained value, for each country, as stated in the previous answer. If you add all these columns up, you get the happiness score, so it might be unreliable to use them as inputs to a model that predicts Happiness Scores.
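
The additive relationship described above can be sketched in a few lines of Python. This is only an illustration: the dictionary keys and every numeric value except the 1.85 Dystopia score are made up for the example and are not taken from the dataset.

```python
import math

# Average life evaluation in Dystopia, as stated in the description.
DYSTOPIA_SCORE = 1.85


def happiness_score(factors, dystopia_residual):
    """Sum the six factor columns and the Dystopia Residual column."""
    return sum(factors.values()) + dystopia_residual


def unexplained_residual(dystopia_residual):
    """Recover the country-specific residual from the combined column."""
    return dystopia_residual - DYSTOPIA_SCORE


# Hypothetical row: these values are invented for illustration only.
example_factors = {
    "gdp_per_capita": 1.40,
    "family": 1.30,
    "life_expectancy": 0.80,
    "freedom": 0.60,
    "generosity": 0.35,
    "trust_government_corruption": 0.30,
}
example_dystopia_residual = 2.25

score = happiness_score(example_factors, example_dystopia_residual)
residual = unexplained_residual(example_dystopia_residual)
print(f"score={score:.2f}, residual={residual:.2f}")
```

Because the score is, by construction, the sum of these columns, a model that receives all of them as inputs would be reproducing an identity rather than learning anything about happiness.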

Category: Social Sciences

Keywords: economics,social,science,emotion

Rows: 23

Sales: 0

Questions: 0

Credit Card Fraud Detection
$0.00
The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features which have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable, and it takes value 1 in case of fraud and 0 otherwise.

Given the class imbalance ratio, we recommend measuring the accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.

The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. Credit to Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015.
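
To see why plain accuracy says little on data this unbalanced, the short Python sketch below uses only the two counts quoted in the description. The "always genuine" classifier is a hypothetical baseline introduced for illustration; it is not part of the dataset or of the authors' work.

```python
# Counts quoted in the description.
TOTAL_TRANSACTIONS = 284_807
FRAUDS = 492

# Share of transactions that are fraudulent (the positive class).
fraud_rate = FRAUDS / TOTAL_TRANSACTIONS

# Hypothetical baseline: label every transaction as genuine (Class = 0).
# It is right on every genuine transaction and wrong on every fraud.
baseline_accuracy = (TOTAL_TRANSACTIONS - FRAUDS) / TOTAL_TRANSACTIONS
baseline_recall = 0 / FRAUDS  # no fraud is ever flagged

print(f"fraud rate:        {fraud_rate:.4%}")
print(f"baseline accuracy: {baseline_accuracy:.4%}")
print(f"baseline recall:   {baseline_recall:.0%}")
```

The baseline scores well above 99% accuracy while catching no fraud at all, which is the motivation for a precision-recall based metric.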

Category: Social Sciences

Keywords: Fraud detection,Card Fraud,Fraud prevention,Credit card fraud

Rows: 284658

Sales: 0

Questions: 0

Celebrity Deaths - All Wikipedia-listed celebrity deaths from 2006
$0.00
Context
I created this dataset to investigate the claim that 2016 had an unnaturally large number of celebrity deaths.

Content
Points listed by Name, Age, Cause of death, and Reason for fame.

Acknowledgements
Lifted from: https://en.wikipedia.org/wiki/Deaths_in_2016 for all years

Credit: HugoDarwood at Kaggle

Category: Social Sciences

Keywords: celebrity,entertainment

Rows: 21309

Sales: 0

Questions: 0

Credit Card Fraud Detection - Anonymized credit card transactions labeled as fraudulent or genuine
$0.00
The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features which have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable, and it takes value 1 in case of fraud and 0 otherwise.

Given the class imbalance ratio, we recommend measuring the accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.

The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on http://mlg.ulb.ac.be/BruFence and http://mlg.ulb.ac.be/ARTML

Please cite: Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015.

Contributed by Andrea from Kaggle.
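
As a rough illustration of the recommended metric, the Python sketch below computes average precision, one common step-wise summary of the area under the precision-recall curve. It is a minimal sketch, not the authors' evaluation code: the function name is hypothetical, the labels and scores are toy values invented for the example, and it assumes all scores are distinct (tied scores would need to be grouped).

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision at each true positive,
    walking down the transactions ranked by descending score.

    Assumes labels are 0/1 (1 = fraud) and that scores contain no ties.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision among the top `rank` transactions.
            precision_sum += true_positives / rank
    return precision_sum / total_positives


# Toy data, invented for illustration: two frauds among eight transactions.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 0]
toy_scores = [0.10, 0.40, 0.90, 0.20, 0.30, 0.05, 0.60, 0.15]

toy_ap = average_precision(toy_labels, toy_scores)
print(f"average precision: {toy_ap:.2f}")
```

Unlike accuracy, this value drops when genuine transactions are ranked above frauds, so it stays informative even when frauds are a tiny fraction of the data.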

Category: Social Sciences

Keywords: finance,crime,medium

Rows: 284809

Sales: 0

Questions: 0

World Bank Youth Unemployment Rates - Youth Unemployment rates by country from 2010-2014
$0.00
This dataset contains youth unemployment rates (% of total labor force ages 15-24, modeled ILO estimate). The latest available data covers 2010 to 2014.

Category: Social Sciences

Keywords: finance,employment

Rows: 70

Sales: 0

Questions: 0

Human Resources Analytics
$0.00
This dataset is simulated and from Kaggle. Why are our best and most experienced employees leaving prematurely? Have fun with this database and try to predict which valuable employees will leave next. 

Category: Social Sciences

Keywords: HR,Analytics,Employment

Rows: 14850

Sales: 0

Questions: 0

Culture and networks
$299.00
Participants were 122 individuals from a fire department with 184 employees in total. Participants were at all levels of the organization. The data includes both cultural measures from O'Reilly, Chapman, and Caldwell (1991) and a network analysis in which participants were asked to identify up to five individuals they prefer to turn to for work-related advice. Participants were given the full list of values from the O'Reilly et al. (1991) paper and asked to identify the two most important values, then the four most important values, and so on, following the sorting methodology in the paper, except that the survey was administered online. For the network analysis, the data includes both a betweenness measure and a centrality measure (ndegree). The data was collected from 1/15/2014 to 3/15/2014.
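
For readers unfamiliar with the two network measures, here is a minimal Python sketch on a toy advice network. It rests on several assumptions not confirmed by the listing: that ties are directed from the person asking to the person asked, that "ndegree" means degree normalized by the number of other people, and that betweenness is the standard shortest-path count (computed here with Brandes' algorithm). The names and edges are invented.

```python
from collections import deque


def normalized_in_degree(nodes, edges):
    """Share of other people who name each person as an advice source."""
    counts = {v: 0 for v in nodes}
    for _, advisor in edges:
        counts[advisor] += 1
    return {v: c / (len(nodes) - 1) for v, c in counts.items()}


def betweenness(nodes, edges):
    """Unnormalized betweenness on a directed, unweighted graph (Brandes)."""
    adjacency = {v: [] for v in nodes}
    for asker, advisor in edges:
        adjacency[asker].append(advisor)

    score = {v: 0.0 for v in nodes}
    for source in nodes:
        # Breadth-first search counting shortest paths from `source`.
        distance = {v: -1 for v in nodes}
        path_count = {v: 0 for v in nodes}
        predecessors = {v: [] for v in nodes}
        distance[source], path_count[source] = 0, 1
        visit_order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)

        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = {v: 0.0 for v in nodes}
        for w in reversed(visit_order):
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    return score


# Toy network, invented for illustration: (asker, advisor) pairs.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B"), ("D", "B"), ("B", "C")]

toy_degree = normalized_in_degree(toy_nodes, toy_edges)
toy_betweenness = betweenness(toy_nodes, toy_edges)
print(toy_degree)
print(toy_betweenness)
```

In this toy network B is both the most frequently named advisor and the only person lying on a shortest path between two others, so B has the highest value on both measures.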

Category: Social Sciences

Keywords: culture,network

Rows: 123

Sales: 0

Questions: 0

Top Tier Management Journal Publications
$0.00
Complete list of all articles published in top tier management journals: Administrative Science Quarterly, Academy of Management Review, Academy of Management Journal, Journal of Management, and Organization Science.

Category: Social Sciences

Keywords: Management,journals,scholarship,articles

Rows: 2221

Sales: 0

Questions: 0