This repository was archived by the owner on May 8, 2019. It is now read-only.

Commit 5de3bea

Collection of dataset URLs
1 parent 0f450dc commit 5de3bea

2 files changed, +145 -0 lines changed

markdown/数据集网址集合.md

+145
@@ -0,0 +1,145 @@
1+
# Collection of Dataset URLs
2+
3+
---
4+
5+
http://archive.ics.uci.edu/ml/index.php
6+
http://aws.amazon.com/publicdatasets/
7+
http://www.kaggle.com/competitions
8+
http://www.kdnuggets.com/datasets/index.html
9+
https://mp.weixin.qq.com/s?__biz=MzI4ODU5NjQ3OQ==&mid=2247483972&idx=1&sn=c7f7bbb3312934468912705d74d7c07f&chksm=ec3d4ad4db4ac3c25b6dbf7ce002195d3086e075b4b8087a252735a71c370e78fce074e832ef&mpshare=1&scene=23&srcid=1212moFaKMWXuk3LOpfsACna#rd
10+
http://mp.weixin.qq.com/s/4eDan-7KNnwgVP0DT96gzQ
11+
http://mp.weixin.qq.com/s/tKc72xnqu4R4wkrVbK_bXA (mostly China-focused; includes tools)
12+
https://www.quandl.com/
13+
14+
http://archive.ics.uci.edu/ml/datasets.html
15+
16+
## 20 GB financial-industry dataset
17+
http://mp.weixin.qq.com/s/_NS0UUDr84yq0rLg7jfr5g
18+
19+
## Image data
20+
http://labelme.csail.mit.edu/Release3.0/index.php?message=1
21+
http://www.image-net.org/index
22+
23+
## Andrew Ng's medical data
24+
http://mp.weixin.qq.com/s/M3s3z3YnEBvUxpDVGFVKHw
25+
26+
## Imagery data
27+
http://www.91weitu.com/
28+
29+
## Meteorology
30+
http://172.16.14.141:9100/
31+
32+
## Web crawler tools
33+
https://www.oschina.net/p/beanbun
34+
https://mp.weixin.qq.com/s/5rtoVnhYcVZpuRszr88diQ
35+
https://gitee.com/xiyouMc/pornhubbot
36+
https://gitee.com/l-weiwei/spiderman
37+
https://gitee.com/flashsword20/webmagic
38+
39+
## Classical Chinese poetry
40+
https://github.com/chinese-poetry/chinese-poetry
41+
42+
## Datasets
43+
Neural networks used for supervised learning are notoriously data-hungry, which is why open datasets are such an important contribution to the research community. The following are a few datasets that stood out this year:
44+
45+
- YouTube Bounding Boxes
46+
- Google QuickDraw Data
47+
- DeepMind Open Source Datasets
48+
- Google Speech Commands Dataset
49+
- Atomic Visual Actions
50+
- Several updates to the Open Images data set
51+
- NSynth dataset of annotated musical notes
52+
- Quora Question Pairs
53+
54+
55+
## Public Data Sets on Amazon Web Services (AWS)
56+
http://aws.amazon.com/datasets
57+
Amazon has offered developers tens of terabytes of data for development since 2008.
58+
59+
## Yahoo! Webscope
60+
http://webscope.sandbox.yahoo.com/index.php
61+
62+
## Konect is a collection of network datasets
63+
http://konect.uni-koblenz.de/
64+
65+
## Stanford Large Network Dataset Collection
66+
http://snap.stanford.edu/data/index.html
67+
68+
## Security-related datasets
69+
http://www.secrepo.com/
70+
71+
72+
## Several Internet-related datasets:
73+
1. Dataset for "Statistics and Social Network of YouTube Videos"
74+
http://netsg.cs.sfu.ca/youtubedata/
75+
76+
2. 1998 World Cup Web Site Access Logs
77+
http://ita.ee.lbl.gov/html/contrib/WorldCup.html
78+
This dataset was collected during the 1998 World Cup. Over the 92 days from 1998/04/26 to 1998/07/26, the site received 1,352,804,107 requests.
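
To put the World Cup figures in perspective, here is a small sketch that derives average request rates from the two numbers quoted in the description. It assumes the 92-day span counts both the start and end dates; the variable names are illustrative and not part of the dataset.

```python
from datetime import date

# Figures quoted in the dataset description.
START = date(1998, 4, 26)
END = date(1998, 7, 26)
TOTAL_REQUESTS = 1_352_804_107

# Assumption: both endpoints are counted, which yields the quoted 92 days.
days = (END - START).days + 1

requests_per_day = TOTAL_REQUESTS / days
requests_per_second = requests_per_day / 86_400  # 86,400 seconds in a day

print(f"{days} days")
print(f"{requests_per_day:,.0f} requests/day on average")
print(f"{requests_per_second:.1f} requests/second on average")
```

Under that assumption the average works out to roughly 14.7 million requests per day, or about 170 per second. Real traffic was presumably far burstier around match times, so treat this only as a baseline.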
79+
80+
3. Page view statistics for Wikimedia projects
81+
http://dammit.lt/wikistats/
82+
83+
4. AOL Search Query Logs - RP
84+
http://www.researchpipeline.com/mediawiki/index.php?title=AOL_Search_Query_Logs
85+
86+
5. livedoor gourmet
87+
http://blog.livedoor.jp/techblog/archives/65836960.html
88+
89+
## Massive image datasets:
90+
1. ImageNet
91+
http://www.image-net.org/
92+
Contains 14 million images.
93+
94+
2. Tiny Images Dataset
95+
http://horatio.cs.nyu.edu/mit/tiny/data/index.html
96+
Contains 80 million 32x32 images.
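
As a rough guide to what "80 million 32x32 images" means for storage, the sketch below estimates the raw size. The 3 bytes per pixel (uncompressed 8-bit RGB) is an assumption that the listing does not state, so the result is only an order-of-magnitude estimate.

```python
# Figures from the description: 80 million images, each 32x32 pixels.
NUM_IMAGES = 80_000_000
WIDTH = HEIGHT = 32

# Assumption (not stated in the source): uncompressed 8-bit RGB, 3 bytes per pixel.
BYTES_PER_PIXEL = 3

bytes_per_image = WIDTH * HEIGHT * BYTES_PER_PIXEL
total_bytes = NUM_IMAGES * bytes_per_image
total_gib = total_bytes / 1024**3

print(f"{bytes_per_image} bytes per image")
print(f"{total_gib:.1f} GiB in total")
```

With these assumptions each image takes 3,072 bytes and the whole collection comes to a little under 230 GiB, so plan disk space accordingly before downloading.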
97+
98+
3. MirFlickr1M
99+
http://press.liacs.nl/mirflickr/
100+
A collection of 1 million images from Flickr.
101+
102+
4. CoPhIR
103+
http://cophir.isti.cnr.it/whatis.html
104+
106 million images from Flickr.
105+
106+
5. SBU captioned photo dataset
107+
http://dsl1.cewit.stonybrook.edu/~vicente/sbucaptions/
108+
A collection of 1 million images from Flickr.
109+
110+
6. Large-Scale Image Annotation using Visual Synset (ICCV 2011)
111+
http://cpl.cc.gatech.edu/projects/VisualSynset/
112+
Contains 200 million images.
113+
114+
7. NUS-WIDE
115+
http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
116+
A collection of 270,000 images from Flickr.
117+
118+
8. SUN dataset
119+
http://people.csail.mit.edu/jxiao/SUN/
120+
Contains 130,000 images.
121+
122+
9. MSRA-MM
123+
http://research.microsoft.com/en-us/projects/msrammdata/
124+
Contains 1 million images and 23,000 videos.
125+
126+
10. TRECVID
127+
http://trecvid.nist.gov/
128+
129+
## Stack Overflow Dump Files
130+
7.3G stackoverflow.com-Posts.7z
131+
573.1K stackoverflow.com-Tags.7z
132+
153.0M stackoverflow.com-Users.7z
133+
2.2G stackoverflow.com-Comments.7z
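
To gauge the total download, the sizes listed for the dump files can be summed with a short sketch like the one below. It assumes the K/M/G suffixes are binary multiples (1K = 1,024 bytes), which the listing does not specify; `parse_size` is a hypothetical helper written for this example.

```python
# Sizes copied from the dump file listing.
DUMP_FILES = {
    "stackoverflow.com-Posts.7z": "7.3G",
    "stackoverflow.com-Tags.7z": "573.1K",
    "stackoverflow.com-Users.7z": "153.0M",
    "stackoverflow.com-Comments.7z": "2.2G",
}

# Assumption: suffixes are binary multiples; the listing does not say.
UNIT_BYTES = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> float:
    """Hypothetical helper: convert a string such as '7.3G' to bytes."""
    return float(text[:-1]) * UNIT_BYTES[text[-1]]


total_bytes = sum(parse_size(size) for size in DUMP_FILES.values())
total_gib = total_bytes / 1024**3

print(f"{total_gib:.2f} GiB compressed in total")
```

The four archives together come to a little under 10 GiB compressed. The extracted data will be larger, by an amount the listing does not give.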
134+
135+
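So far, no company or organization in China appears to have opened up its own datasets. Hopefully some companies will also open their datasets to researchers, which would help advance large-scale data processing in China!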
136+
137+
## 2014/07/07: Yahoo releases a very large Flickr dataset with 100 million images and videos
138+
http://yahoolabs.tumblr.com/post/89783581601/one-hundred-million-creative-commons-flickr-images-for
139+
140+
## 100+ interesting datasets
141+
http://www.csdn.net/article/2014-06-06/2820111-100-Interesting-Data-Sets-for-Statistics
142+
143+
144+
145+

数据集网址集合.pdf

52.6 KB
Binary file not shown.
