Next step is to scrape the text data on those Service Area pages to enable us to emit it in an API endpoint, so that it can be rendered inline alongside the OCRB and KPM data. The data includes:
intro text just below the Service Area heading (e.g. under the Parks, Recreation & Culture Service Area, the intro text is "The Parks, Recreation & Culture service area includes services for Portland Parks & Recreation, the only bureau in this service area. The bureau also administers the Golf program and Portland International Raceway.")
Significant Issues and Major Projects (SIMP) - this section falls between the OCRB table and the KPM table, and includes one or more bullets of text
Acceptance Criteria
Assumption: the SIMP data must be emitted from the API in a format that ensures that bullets will render in the same order as in the Budget in Brief document.
The data probably varies from year to year, so API users must be able to request the text for a specified fiscal year (e.g. a user requests all SIMP text from FY2016-17).
The data must be captured so that neither importing it into a database nor reading it from Django code noticeably changes how readable the text is for the end user, compared with reading the Budget in Brief PDF (e.g. if all bullets for a single Service Area are stored as a single record, the bullet characters must be encoded so that they automatically show up as bullets in the user's browser).
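One possible shape for this, sketched in plain Python rather than as the actual Django model: store each bullet as its own record with an explicit position, so that order and fiscal year are both queryable and the browser can render a normal list without encoding bullet characters. The names `SimpBullet`, `position`, and `simp_for_year` are hypothetical, and the sample text is placeholder content, not text from the Budget in Brief.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpBullet:
    """One SIMP bullet (hypothetical record layout, not the real schema)."""
    fiscal_year: str   # e.g. "2016-17"
    service_area: str  # e.g. "Parks, Recreation & Culture"
    position: int      # 1-based order of the bullet in the PDF section
    text: str


def simp_for_year(bullets, fiscal_year):
    """Return {service_area: [bullet text, ...]} for one fiscal year.

    Bullets within each service area are listed in PDF order.
    """
    selected = sorted(
        (b for b in bullets if b.fiscal_year == fiscal_year),
        key=lambda b: (b.service_area, b.position),
    )
    grouped = {}
    for b in selected:
        grouped.setdefault(b.service_area, []).append(b.text)
    return grouped


# Placeholder records, deliberately stored out of order.
SAMPLE = [
    SimpBullet("2016-17", "Parks, Recreation & Culture", 2, "Placeholder bullet B"),
    SimpBullet("2016-17", "Parks, Recreation & Culture", 1, "Placeholder bullet A"),
    SimpBullet("2015-16", "Parks, Recreation & Culture", 1, "Placeholder bullet from an earlier year"),
]

if __name__ == "__main__":
    # What an endpoint might emit for a FY2016-17 request.
    print(json.dumps(simp_for_year(SAMPLE, "2016-17"), indent=2))
```

Emitting each Service Area's bullets as a JSON array keeps the ordering explicit and leaves the choice of bullet glyph to the front end.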
Any tool that works will do. Tabula was used to scrape the tabular data; it is not yet known whether it would work for text data, or whether a simple cut-and-paste would work well enough.
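If cut-and-paste turns out to be good enough, a small helper could turn the pasted block into ordered bullets. This is only a sketch: it assumes the PDF's bullets paste as the "•" character (U+2022) and that wrapped lines belong to the bullet above them, neither of which has been checked against the actual documents. `split_bullets` is a hypothetical name.

```python
import re

BULLET = "\u2022"  # assumed bullet glyph; the real PDFs may paste differently


def split_bullets(pasted: str) -> list[str]:
    """Split pasted SIMP text into bullets, preserving their order.

    Anything before the first bullet glyph (e.g. a heading) is dropped,
    and line breaks inside a bullet are collapsed to single spaces.
    """
    bullets = []
    for part in pasted.split(BULLET)[1:]:
        cleaned = re.sub(r"\s+", " ", part).strip()
        if cleaned:
            bullets.append(cleaned)
    return bullets


if __name__ == "__main__":
    sample = "Heading\n\u2022 First item\nwraps here\n\u2022 Second item\n"
    for position, text in enumerate(split_bullets(sample), start=1):
        print(position, text)
```

Words hyphenated across line breaks in the PDF are not rejoined here and would still need a manual check.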
Question for City Budget Office contacts: must the SIMP bullets be displayed every time in the same order as they are presented in the Budget in Brief PDF documents?
MikeTheCanuck changed the title from "Capture text data from Budget In Brief - Services Areas" to "Endpoint for text data from Budget In Brief - Services Areas" on Feb 21, 2017
We have scraped the tabular data for Service Areas from the past two years' Budget in Brief documents; you can see it in the Data folder in this repo (https://github.com/hackoregon/team-budget/tree/master/Data).