Data is the world's most valuable resource.
Now you can trade datasets on our
Upload any dataset onto our marketplace. Almost every spreadsheet, table, or list is valuable. Take a few minutes to describe your dataset, price it, and post it!
Your dataset is then listed on the Data & Sons market. Buyers can easily find your dataset and buy it in one click.
Once your dataset sells, we transfer your money to you instantly. Best of all, you can sell your dataset multiple times to any interested buyer.
Mailing address of every public company in the U.S....
Comprehensive list of every whiskey produced in the world. Great data for populating whiskey-related databases or conducting market research in the distilling industry....
Contact information for over 20,000 restaurants across the US. All restaurants from the NAICS code 72251: Restaurants and Other Eating Places. This includes all sit-down, fast casual, fast food, and ethnic restaurants. List includes name, address, phone number, website, contact email address, ...
Complete list of public (including charter) and private schools in the State of Texas. Private school list includes daycare, after school programs, and preschools. Complete contact information including address, point of contact (principal, owner, manager, etc.), email, website, and phone. Data was d...
At Data & Sons, we believe all people are entitled to their privacy, and we respect individual rights to protect personal information. We pledge not to allow the sale of personal information on our data marketplace. We safeguard personal privacy by reviewing all data before it is listed for sale on Data & Sons.
December 2017 Newsletter

Happy New Year! Business aside, Greg and I want to thank everyone who has supported us through 2017 and on into 2018. We especially want to thank our very patient wives, Bailey and Chaz, as they make us and Data & Sons much, much better in so many ways.

We are big believers that Data & Sons will enable millions to join the knowledge economy by selling data on our marketplace. The amount and diversity of information in a society are two of the biggest predictors of that society’s new knowledge development. We genuinely believe that providing a more equitable way to acquire and transfer data, and so making all of this new data available, will speed innovation and social progress. Thanks to everyone who is making this a reality.

2017 was a major year for Data & Sons and we are proud to announce we finished up strong. In December, our site traffic was up over 1000% from November, and we continue to see strong user growth with more datasets being added to our marketplace. Our initial sales and site traffic have resulted in our first two investors coming on board. Both investors are seasoned tech entrepreneurs and leaders, and we are very excited about the rapid growth we can continue to foster with their capital and expertise!

In November, we anticipated completing a partnership agreement with a data brokerage to provide a greater variety of data. I am happy to announce we now have two such partnerships. The partnerships mean Data & Sons buyers will now have access to over 20 million business and customer contacts! We think this will be a win for all involved in the Data & Sons marketplace. Look for the new data to be added to the marketplace in early January.

We also announced our data request and affiliate partner management platform. We are pleased to report the data request feature is in full development and the affiliate management platform is now in beta.
The data request feature provides buyers the ability to request datasets at a specified price. This effectively creates demand in our marketplace, reducing some of the uncertainty about what data will sell and for what price. Our affiliate partner management system enables people who refer data sellers to Data & Sons to receive a commission whenever their affiliated seller sells any data. We think this will drive a lot of new content to the site and allow us to scale rapidly.

What’s next in 2018

January will see our first marketing campaign directed at driving buyers to our marketplace. We will focus on Lead Generation datasets since this has been the most active category on Data & Sons. Lead Generation datasets provide buyers the contact information of prospective customers they can use for direct (email, phone, mail) and social media (Facebook Audiences) marketing. We think the start of a new year is the perfect time to help businesses find new customers.

We will also continue to bring on investors and will be presenting on January 25th at the Wave Tampa Bay. Our presentation will provide a more detailed understanding of Data & Sons and our revolutionary business model. Please contact me if you would like to attend. We anticipate continued investor funding will allow us to continue to grow Data & Sons.

We will be adding several new team members in 2018. Our new team members will be focused on (1) growing specific data categories by finding data sellers and buyers underserved by how data is developed, acquired, and transferred for these types of data; (2) developing outstanding marketing content that educates buyers and sellers on how to make money on Data & Sons; and (3) development engineering. Adding to our development team enables us to stay adaptive and continue to roll out great new features.

We anticipate adding a bidding function to the marketplace. Buyers can make a bid on any dataset and the Seller can then take or counter the bid.
This will make our market more price efficient. We will also be adding a tutorial section. Buyers and Sellers can learn how they can make money on Data & Sons’ revolutionary marketplace by developing or acquiring data for sale.

We hope 2017 was as exciting a year for you as it was for us. Here’s to all of us being empowered, successful, and safe in 2018!
What is a Data Scientist?

I’ve been trying to answer this question for well over a year. As an academic turned entrepreneur, I was intrigued by the title data scientist. We were building Data & Sons at the time, and identifying core customers was a key part of the design process. It seemed that people who both developed and utilized data would be natural sellers and buyers on our marketplace. It sounded like data scientists might just be this type of person.

Answering that question took a surprisingly long time. After spending over a year reading, researching, and talking with people across Fortune 500 companies, startups, data science-centric social media, and data science training programs, I think I have a working solution. A prototype data scientist definition, if you will. Given the number of posts on DataTau, Medium, and Reddit asking this same question, I think taking the time to put together a solid working definition adds value for lots of people in the data science field (profession, community, industry?), especially for people interested in joining the profession.

So first, what’s data science? As an organizational scientist, I learned and applied the traditional scientific method: review/observation, theory/hypothesis development, data collection, hypothesis testing using statistical analysis, and hoping to find something publishable. The idea is that the data you collected (your sample) was generalizable to the overall population. So if you found results that supported your hypothesis in a sample of 600 people, you would argue this would be the case in the greater population when you published the study.

Then along came big data. You would no longer need a sample because you could plausibly have the entire population. Instead of 600 people, you now had 4 million if you were Facebook. No need to mess with theory and hypothesis development; you simply ran statistical analysis on the population and the results told you everything you needed to know.
This is what led Wired Editor Chris Anderson to observe that Theory is Dead in 2008. It is in this context that Jim Gray coined the term data science. Data science was accumulating enough data that you could skip theory and hypothesis development and rely on the statistical relationships you found in the data. Data science is essentially a science hack.

So does that make data scientists science hackers? I’m going to say no for one primary reason: you need more skills for data science than you do for traditional science. So sure, it’s a hack of the scientific method, but it takes more dedicated learning, experience, and effort to be able to hack that process. Not a very good hack if it requires more effort. It is possessing these skills that I think makes someone a data scientist. Therefore:

Data scientists are professionals competent in statistical analysis, computer programming, and applied problem solving in their domain of interest.

The Venn diagram below illustrates how possessing different combinations of these skills makes people good at different data-centric jobs. Because there are so many people running around calling themselves data scientists today, I think the diagram also does a good job of illustrating who is not a data scientist. Let’s review each.

Statistical Competence. I put this at the top of the Venn diagram because understanding statistics is at the core of data science (or really any other data-centric role). The whole point is to skip theorizing and rely on statistical relationships. If you cannot find these relationships in your data, you cannot play in data science. This also means you will need to be proficient in R, SPSS, SAS, or Stata, and likely some of the method/model-specific software packages.

Applied Problem Solving. I think there are lots of people out there who have statistical competence and/or computer programming skills with “data scientist” in their current job title. I would however argue that they are not data scientists. Why?
Remember the first part of the scientific process is review/observation, which is studying and trying to develop a basic understanding of some subject or phenomenon before you start asking your own research questions. What do people already know about this subject? What don’t we know yet? While big data may take away the need for developing new theory and hypotheses, you still need to know what it is you are studying. If you don’t, you’re going to spend a lot of time and resources to get obvious answers to stupid questions. There’s no faster way to get marginalized in an organization than making more money than most people in the room and presenting them with a detailed research project that tells them exactly what they already knew five years ago.

A data scientist has to know what questions to ask. This requires that you develop a thorough understanding of whatever you are examining with data science (e.g. business, public policy, educational outcomes, etc.). The practitioners (business people, policy wonks, educators, etc.) know their subject area, but they often do not understand the tools data scientists bring to the table and thus have no idea what to tell you to do. In the 2017 Kaggle State of Data Science Survey, the fifth most cited barrier at work (30.2% of respondents) was “Lack of a Clear Question to Answer.” If you don’t know what questions to ask, you cannot have scientist in your job title. Inquiry, whether done through thoughtful theory development or studying massive amounts of data, is at the heart of ALL science. All inquiry starts with asking the right questions.

Computer Programming. Large amounts of well organized, accurate, and authentic data are the world’s most valuable resource. This means you are unlikely to just come across such data anytime soon, so you’ll need to develop it yourself. You will also need to do this on a repeated basis (i.e. not a one-time data collection). This may be a few times a year, once a day, or continuously in real time.
To collect and analyze data on a repeated basis, you’ll need to build a system that (1) acquires and updates data; (2) organizes that data from different sources into a coherent structure; (3) can pass that data into some sort of statistical analysis; (4) presents results in a clear manner (often as a visualization); and (5) does all of this on an automated basis. It’s this last part (the automation) that separates people proficient in statistical analysis who can accomplish tasks 1-4 from data scientists. Most academic researchers (PhD types like me) are highly proficient in tasks 1-4, but are completely clueless when asked to repeat that process on an ongoing basis. Automating that process requires being able to tell a computer to do it, and that requires proficiency in Python, SQL, C++, and/or some other programming language. While strong in statistical analysis and applied problem solving, I would not identify as a data scientist until I had improved upon my current Python and SQL skills...unless of course you had a lot of money to throw at me.

The reality is the job market for data scientists is very, very hot right now. I realize there are and will continue to be more and more people calling themselves data scientists who do not possess all three of the skills identified. I do think the three skills provide a good educational progression for becoming a data scientist: start with stats, move to programming, and then gain a solid understanding of the area where you are going to apply your craft. Likely, you will be marketable with a solid statistics background (Data Incubator and Insight Data Science both exist to train you up on the programming side while getting you hired), you will be highly desired as someone with both statistics and programming skills, and once you have several years of experience in a particular industry, you will be extremely sought after and courted as a full-fledged data scientist.
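To make the five-part system described earlier concrete (acquire, organize, analyze, present, automate), here is a minimal Python sketch. It is only an illustration under stated assumptions: the two CSV strings, the store/sales/visits columns, and every function name are hypothetical stand-ins, and a real system would pull live data and run on a scheduler rather than from a single script.

```python
import csv
import io
import statistics

# Hypothetical stand-ins for two data sources; a real system would pull
# these from files, APIs, or databases on every run.
SALES_CSV = "store,sales\nA,100\nB,150\nC,200\nD,250\n"
VISITS_CSV = "store,visits\nA,10\nB,14\nC,21\nD,24\n"


def acquire(raw_csv):
    """Step 1: acquire data (here, parse an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def organize(sales_rows, visit_rows):
    """Step 2: merge rows from both sources into one structure keyed by store."""
    visits = {row["store"]: float(row["visits"]) for row in visit_rows}
    return [
        {
            "store": row["store"],
            "sales": float(row["sales"]),
            "visits": visits[row["store"]],
        }
        for row in sales_rows
        if row["store"] in visits
    ]


def analyze(rows):
    """Step 3: compute a simple statistic (Pearson correlation)."""
    xs = [r["visits"] for r in rows]
    ys = [r["sales"] for r in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def present(n_rows, correlation):
    """Step 4: present the result as a short text report."""
    return f"{n_rows} stores, visits-sales correlation = {correlation:.2f}"


def run_pipeline():
    """Step 5: one call runs steps 1-4, so a scheduler can repeat it unattended."""
    rows = organize(acquire(SALES_CSV), acquire(VISITS_CSV))
    return present(len(rows), analyze(rows))


if __name__ == "__main__":
    print(run_pipeline())
```

The point of wrapping everything in `run_pipeline` is the automation step: once steps 1-4 run from a single call with no manual intervention, repeating them daily or hourly is a matter of scheduling that call.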
List of Lists

Data science, big data analytics, and machine learning are all fueled by datasets. Access to datasets is the first part of kicking off any project, and the quality and amount of data determine how successful these projects will be. So where do you get datasets? Almost any person in the data science profession will tell you they had to acquire, collect, scrape, or aggregate data from other sources.

In order to make finding datasets easier, several websites have been created to aggregate datasets in publicly available lists. Datasets on these sites are largely available for free, with sites that offer datasets for purchase just starting to pop up. As a co-founder of Data & Sons, a new dataset marketplace, I found myself spending a lot of time identifying as many of these as I could. I thought it would be useful to share this list of lists with others seeking to find the datasets they need.

Limitations. First, I stuck with broad listings and did not include ones that focused on a particular type of data (e.g. government, locations, mailing addresses, etc.). Second, I did not include lists that only allow access to certain groups of people (e.g. academics) or entities (e.g. businesses only).

In alphabetical order, here’s my list of dataset lists.

Amazon Web Services. Decent-sized list of well organized datasets you can search by several categories. A nice collection of government, scientific, and business data. However, it seems most of the data was posted back in 2015 with a limited amount of more recent datasets.

Analytics Bodhi provides data science/analytics training and resources. They’ve included a list of publicly available datasets in their resources with brief descriptions. They link to both datasets and other dataset lists: an early version of the project I am undertaking.

Awesome Public Datasets. Authored by Xiaming Chen and posted on GitHub (caesar0301), Awesome Public Datasets is perhaps one of my favorites.
Well organized into categories, the list identifies many publicly available scientific datasets. Simple, clean, elegant, a wonderful list for anyone in the sciences.

BigML. The very smart folks at BigML have put together a list of data sources they have come across in their work. Some really interesting finds on here, well organized by types of datasets/data providers.

Data Circle allows people to buy, sell, and share datasets easily. The site is in beta with a limited number of datasets. I like the clean, happy interface, and am a big believer in their business model ;-)

Datahub. Basic listing page with around 200 datasets and a description of each. I found there were more unique datasets (i.e. data not available on other lists) on here than in other dataset locations.

Data is Plural is a publicly available Google Docs spreadsheet created by Jeremy Singer-Vine and curated by a team of diligent humans. I believe this is one of the most extensive, diligent, and well maintained dataset lists available. As new sources of data are identified, they get added to the spreadsheet, with entries from 2015 to the present.

Data & Sons enables people to buy, sell, and share datasets. Data seems to be available at two extremes: free and very expensive. With over 90% of data inaccessible, we thought creating a market would create a more equitable exchange of data, leading to increased data accessibility.

Datausa.io does perhaps the best job of data summation and presentation I've seen. There is a lot of information about the USA on here and they do a superb job of providing data-driven averages for a wide range of things in the US. They also provide many US-related datasets for download and a slick shopping cart interface. Nod to Daniel Shorstein for bringing Datausa.io to my attention.

Data.world provides individuals and businesses the ability to collaborate on data science projects with a suite of analytics tools, storage, and professional networking.
They also provide free datasets and other data types. The only issue I have with data.world is that it requires a user account to view and access their datasets.

Google Public Data. Datasets from governments around the world. Neat, clean, no fuss, but limited to around 130 sources.

Kaggle is without a doubt the center of the data science universe. Acquired by Google in March 2017, Kaggle provides data scientists a place to connect, learn, and earn some extra money through their competitions. Kaggle also provides perhaps the most extensive list of free datasets I have come across. Cleanly organized and always interesting to browse, Kaggle has over 5,000 datasets. Based on user profiles, it seems Kaggle has several employees engaged in creating new datasets. I think those Kaggle-generated datasets are some of the most interesting and well put together sets you can find for free.

OpenDataSoft might just be the largest and most diverse offering of datasets I have seen, with over 9,000 datasets from all over the world. Datasets include the government, business, and social data we see on other lists, but also some really fun ones like workout data from Apple watches. I was part of a conversation recently on whether English was the language of datasets and whether you needed to be an English speaker to first get into data science. I was happy to see that there are many datasets in other languages on OpenDataSoft and that over 4K of the datasets were in French. Thanks to Nicolas Terpolilli for directing my attention to OpenDataSoft.

Socrata has a basic interface searchable by category. Socrata also offers several types of data products including datasets, documents, forms, etc. There is a massive number of data products on Socrata, making it one of the largest sources available. Many of the datasets seem to be from government sources.

ZeMiner primarily offers datasets on web sites, traffic, and usage from Amazon, Google, and Facebook.
Although they are fairly focused in their data offerings, there are a lot of datasets here that can provide a great deal of insight into a broad range of Internet activities.

Please let me know of any dataset sources I may have missed and I will do my best to keep the list of lists updated!