forked from sparklemotion/mechanize
-
Notifications
You must be signed in to change notification settings - Fork 0
/
NOTES.txt
318 lines (243 loc) · 12.7 KB
/
NOTES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
= Mechanize Release Notes
== 0.6.4 (Gwendolyn)
Custom request headers can now be added to Mechanize by subclassing mechanize
and defining the Mechanize#set_headers method. For example:
class A < WWW::Mechanize
def set_headers(u, r, c)
super(uri, request, cur_page)
request.add_field('Cookie', 'name=Aaron')
request
end
end
The Mechanize#redirect_ok method has been added to that you can keep mechanize
from following redirects.
== 0.6.3 (Big Man)
Mechanize 0.6.3 (Big Man) has a few bug fixes and some new features added to
the Form class. The Form class is now more hash like. I've added an
Form#add_field! method that will add a field to your form. Form#[]= will now
add a field if the key doesn't exist. For example, your form doesn't have
an input field named 'foo', the following 2 lines of code are equivalent, and
will create a field named 'foo':
form['foo'] = 'bar'
or
form.add_field!('foo', 'bar')
To make forms more hashlike, has_value?, and has_key? methods.
== 0.6.2 (Bridget)
Mechanize 0.6.2 (Bridget) is a fairly small bug fix release. You can now
access the parsed page when a ResponseCodeError is thrown. For example, this
loads a page that doesn't exist, but gives you access to the parsed 404 page:
begin
WWW::Mechanize.new().get('http://google.com/asdfasdfadsf.html')
rescue WWW::Mechanize::ResponseCodeError => ex
puts ex.page
end
Accessing forms is now more DSL like. When manipulating a form, for example,
you can use the following syntax:
page.form('formname') { |form|
form.first_name = "Aaron"
}.submit
Documentation has also been updated thanks to Paul Smith.
== 0.6.1 (Chuck)
Mechanize version 0.6.1 (Chuck) is done, and is ready for you to use. This
post "my trip to europe" release includes many bug fixes and a handful of
new features.
New features include, a submit method on forms, a click method on links, and an
REXML pluggable parser. Now you can submit a form just by calling a method on
the form, rather than passing the form to the submit method on the mech object.
The click method on links lets you click the link by calling a method on the
link rather than passing the link to the click method on the mech object.
Lastly, the REXML pluggable parser lets you use your pre-0.6.0 code with
0.6.1. See the CHANGELOG for more details.
== 0.6.0 (Rufus)
WWW::Mechanize 0.6.0 aka Rufus is ready! This hpricot flavored pie has
finished cooling on the window sill and is ready for you to eat. But if you
don't want to eat it, you can just download it and use it. I would
understand that.
The best new feature in this release in my opinion is the hpricot flavoring
packed inside. Mechanize now uses hpricot as its html parser. This means
mechanize gets a huge speed boost, and you can use the power of hpricot for
scraping data. Page objects returned from mechanize will allow you to use
hpricot search methods:
agent.get('http://rubyforge.org').search("//strong")
or
agent.get('http://rubyforge.org')/"strong"
The click method on mechanize has been updated so that you can click on links
you find using hpricot methods:
agent.click (page/"a").first
Or click on frames:
agent.click (page/"frame").first
The cookie parser has been overhauled to be more RFC 2109 compliant and to
use WEBrick cookies. Dependencies on ruby-web and mime-types have been
removed in favor of using hpricot and WEBrick respectively.
attr_finder and REXML helper methods have been removed.
== 0.5.4 (Sylvester)
WWW::Mechanize 0.5.4 aka Sylvester is fresh out the the frying pan and in to
the fire! It is also ready for you to download and use.
New features include WWW::Mechanize#transact (thanks to Johan Kiviniemi) which
lets you maintain your history state between transactions. Forms can now be
accessed as a hash. For example, to set the value of an input field, you can
do the following:
form['name'] = "Aaron"
Doing this assumes that you are setting the first field. If there are multiple
fields with the same name, you must use a different method to set the value.
Form file uploads will now read the file specified by FileUpload#file_name.
The mime type will also be automatically determined for you! Take a look
at the EXAMPLES file for a new flickr upload script.
Lastly, gzip encoding is now supported! WWW::Mechanize now supports pages
being sent gzip encoded. This means less network bandwidth. Yay!
== 0.5.3 (Twan)
Here it is. Mechanize 0.5.3 also named the "Twan" release. There are a few
new features, a few fixed bugs, and some other stuff too!
First, new features. WWW::Mechanize#click has been updated to operate on the
first link if an array is passed in. How is this helpful? It allows
syntax like this:
agent.click page.links.first
to be like this:
agent.click page.links
This trick was actually implemented in WWW::Mechanize::List. If you send a
method to WWW::Mechanize::List, and it doesn't know how to respond, it will
try calling that method on the first element of the list. But it only does
that for methods with no arguments.
Radio buttons, check boxes, and select lists can now be ticked, unticked, and
clicked. Now to select the second radio button from a list, you can do this:
form.radiobuttons.name('color')[1].click
Mechanize will handle unchecking all of the other radio buttons with the same
name.
Pretty printing has been added so that inspecting mechanize objects is very
pretty. Go ahead and try it out!
pp page
Or even
pp page.forms.first
Now, bugfixes. A bug was fixed when spaces are passed in as part of the URL
to WWW::Mechanize#get. Thanks to Eric Kolve, a bug was fixed with methods
that conflict with rails. Thanks to Yinon Bentor for sending in a patch to
improve Log4r support and a slight speed increase.
== 0.5.2
This release comes with a few cool new features. First, support for select
lists which are "multi" has been added. This means that you can select
multiple values for a select list that has the "multiple" attribute. See
WWW::Mechanize::MultiSelectList for more information.
New methods for select lists have been added. You can use the select_all
method to select all options and select_none to select none to select no
options. Options can now be "selected" which selects an option, "unselected",
which unselects an option, and "clicked" which toggles the current status of
the option. What this means is that instead of having to select the first
option like this:
select_list.value = select_list.options.first.value
You can select the first option by just saying this:
select_list.options.first.select
Of course you can still set the select list to an arbitrary value by just
setting the value of the select list.
A new method has been added to Form so that multiple fields can be set at the
same time. To set 'foo', and 'name' at the same time on the form, you can do
the following:
form.set_fields( :foo => 'bar', :name => 'Aaron' )
Or to set the second fields named 'name' you can do the following:
form.set_fields( :name => ['Aaron', 1] )
Finally, attr_finder has been deprecated, and all syntax like this:
@agent.links(:text => 'foo')
needs to be changed to:
@agent.links.text('foo')
With this release you will just get a warning, and the code will be removed in
0.6.0.
== 0.5.1
This release is a small bugfix release. The main bug fixed in this release is
a problem with file uploading. I have also made some performance improvements
to cookie parsing.
== 0.5.0
Good News first:
This release has many new great features! Mechanize has been updated to
handle any content type a web server returns using a system called "Pluggable
Parsers". Mechanize has always been able to handle any content type
(sort of), but the pluggable parser system lets us cleanly handle any
content type by instantiating a class for the content type returned from the
server. For example, a web server returns type 'text/html', mechanize asks
the pluggable parser for a class to instantiate for 'text/html'. Mechanize
then instantiates that class and returns it. Users can define their own
parsers, and register them with the Pluggable Parser so that mechanize will
instantiate your class when the content type you specify is returned. This
allows you to easily preprocess your HTML, or even use other HTML parsers.
Content types that the pluggable parser doesn't know how to handle will
return WWW::Mechanize::File which has basic functionality like a 'save_as'
method. For more information, see the RDoc for
WWW::Mechanize::PluggableParser also see the EXAMPLES file.
A 'save_as' method has been added so that any page downloaded can be easily
saved to a file.
The cookie jar for mechanize can now be saved to disk and loaded back up at
another time. If your script needs to save cookie state between executions,
you can now use the 'save_as' and 'load' methods on WWW::Mechanize::CookieJar.
Form fields can now be treated as accessors. This means that if you have a
form with the fields 'username' and 'password', you could manipulate them like
this:
form.username = 'test'
form.password = 'testing'
puts "username: #{form.username}"
puts "password: #{form.password}"
Form fields can still be accessed in the usual way in case there are multiple
input fields with the same name.
Bad news second:
In this release, the name space has been altered to be more consistent. Many
classes used to be under WWW directly, they are now all under WWW::Mechanize.
For example, in 0.4.7 Page was WWW::Page, in this release it is now
WWW::Mechanize::Page. This may break your code, but if you aren't using
class names directly, everything should be fine.
Body filters have been removed in favor of Pluggable Parsers.
== 0.4.7
This release of mechanize comes with a few bug fixes including fixing a
bug when there is no action specified in a form.
In this version, a default user agent string is now set for mechanize. Also
a convenience method WWW::Mechanize#get_file has been added for fetching
non text/html files.
== 0.4.6
The 0.4.6 release comes with proxy support which can be enabled by calling
the set_proxy method on your WWW::Mechanize object. Once you have set your
proxy settings, all mechanize requests will go through the proxy.
A new "visited?" method has been added to WWW::Mechanize so that you can see
if any particular URL is in your history.
Image alt text support has been added to links. If a link contains an image
with no text, the alt text of the image will be used. For example:
<a href="foo.html><img src="foo.gif" alt="Foo Image"></a>
This link will contain the text "Foo Image", and can be found like this:
link = page.links.text('Foo Image')
Lists of things have been updated so that you can set a value without
specifying the position in the array. It will just assume that you want to
set the value on the first element. For example, the following two statements
are equivalent:
form.fields.name('q').first.value = 'xyz' # Old syntax
form.fields.name('q').value = 'xyz' # New syntax
This new syntax comes with a note of caution; make sure you know you want to
set only the first value. There could be multiple fields with the name 'q'.
== 0.4.5
This release comes with a new filtering system. You can now manipulate the
response body before mechanize parses it. This can be useful if you know that
the HTML you need to parse is broken, or if you want to speed up the parsing.
This filter can be done on a global basis, or on a per page basis. Check out
the new examples in the EXAMPLES file for usage.
This release is also starting to phase out the misspelled method
WWW::Mechanize#basic_authetication. If you are using that method, please
switch to WWW::Mechanize#basic_auth.
The 0.4.5 release has many bug fixes, most noteably better cookie parsing and
better form support.
== 0.4.4
This release of mechanize comes with a new "Option" object that can be
accessed from select fields on forms. That means that you can figure out
what option to set based on the text in the select field. For example:
selectlist = form.fields.name('selectlist').first
selectlist.value = selectlist.options.find { |o| o.text == 'foo'}.value
== 0.4.3
The new syntax for finding things like forms, fields, frames, etcetera looks
like this:
page.links.with.text 'Some Text'
The preceding statement will find all links in a page with the text
'Some Text'. This can be applied to form fields as well:
form.fields.with.name 'email'
These can be chained as well like this:
form.fields.with.name('email').and.with.value('[email protected]')
'with' and 'and' can be omitted, and the old way is still supported. The
following statements all do the same thing:
form.fields.find_all { |f| f.name == 'email' }
form.fields.with.name('email')
form.fields.name('email')
form.fields(:name => 'email')
Regular expressions are also supported:
form.fields.with.name(/email/)