
Smarter SP parameters #536

Draft · wants to merge 1 commit into master
Conversation

@breznak (Member) commented Jul 3, 2019

In this PR my aim is to improve the constructor parameters of the SpatialPooler.

Goals

Make the SP (params) more:

  • foolproof
  • amenable to easier, more effective parameter optimization
  • explicit about the ways the params affect each other
  • driven by high-level params useful to a generic user (rather than neuroscientific details)
    • robustness, computation speed, adaptation speed, locality vs. global view, ...

Implementation

  • a secondary, "smart" constructor for SP
    • optionally also provide set/get methods
  • try to remove existing SP params that are inferior in the general case, maybe:
    • wrapAround (always true),
    • numActiveColumnsPerInhArea (use localAreaDensity instead),
    • connectedThreshold
  • make the others "automated"
    • or at least give them stricter checks

In this PR I'd like to discuss the feasibility and usefulness of the proposed changes, and then implement them one by one, with tests, in separate PRs. A rough sketch of the proposed constructor follows.
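A minimal sketch of what the secondary constructor could look like; the high-level parameter names are hypothetical (taken from the goals above), not existing htm.core API:

    #include <vector>

    class SpatialPooler {
    public:
      // existing low-level constructor (signature abbreviated)
      SpatialPooler(const std::vector<unsigned> &inputDims,
                    const std::vector<unsigned> &columnDims /*, ...many params... */);

      // proposed secondary "smart" constructor: high-level knobs only; the
      // low-level params (stimulusThreshold, synPermActiveInc, ...) would be
      // derived from these internally
      SpatialPooler(const std::vector<unsigned> &inputDims,
                    const std::vector<unsigned> &columnDims,
                    float robustness,   // noise tolerance, [0.0, 1.0]
                    float learningRate, // adaptation speed
                    float preferLocal); // locality vs. global view, [0.0, 1.0]
    };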

@breznak added the labels enhancement (New feature or request), question (Further information is requested), and SP on Jul 3, 2019
@breznak self-assigned this on Jul 3, 2019
@breznak (Member, Author) left a comment:

Please help me discuss and brainstorm the proposed changes, the relations between variables, candidates for removal, ...

@@ -99,7 +107,7 @@ class SpatialPooler : public Serializable
columns use 2000, or [2000]. For a three dimensional
topology of 32x64x16 use [32, 64, 16].

-     @param potentialRadius This parameter determines the extent of the
+     @param potentialRadius This parameter determines the extent of the //TODO change this to potentialRadiusPct 0.0..1.0
@breznak (Member, Author):

Replace with a similar-meaning but relative [0.0, 1.0] percentage of the input dimensions.

The current "receptive field radius is 16 [input bits]" is not transferable, while "receptive field is 10% [of the input field]" works well at any size.

Overall, move everywhere from absolute units to relative percentages, e.g. as sketched below.
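A possible shape of the conversion, assuming a hypothetical potentialRadiusPct parameter in [0.0, 1.0]; this is an illustration, not the final formula:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Derive the absolute radius from the relative percentage; here the
    // percentage is taken of the largest input dimension (an assumption).
    std::size_t potentialRadiusFromPct(float potentialRadiusPct,
                                       const std::vector<std::size_t> &inputDims) {
      const std::size_t maxDim =
          *std::max_element(inputDims.begin(), inputDims.end());
      return static_cast<std::size_t>(potentialRadiusPct * maxDim);
    }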


-     @param numActiveColumnsPerInhArea An alternate way to control the sparsity of
+     @param numActiveColumnsPerInhArea An alternate way to control the sparsity of //TODO remove this method of operation?!
@breznak (Member, Author):

I propose to remove this param completely and switch to using localAreaDensity only.

All optimized models (MNIST, hotgym) use localAreaDensity.

> When using this method, as columns learn and grow their effective receptive fields, the inhibitionRadius will grow, and hence the net density of the active columns will decrease. This is in contrast to the ...

I especially dislike this part; the density of the SP should remain constant. Removing the param would also get rid of a mutually exclusive parameter pair, making param optimization easier.

Are there any use cases where this mode of operation would be favorable? (The relation between the two controls is sketched below.)
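A toy illustration of why the two controls are redundant; the function names are made up for this sketch:

    // In density mode the active fraction is fixed, so the winner count simply
    // tracks the (possibly growing) inhibition area...
    double winnersInArea(double localAreaDensity, double inhibitionArea) {
      return localAreaDensity * inhibitionArea;
    }

    // ...whereas in count mode the winner count is fixed, so the effective
    // density shrinks as the inhibition area grows with learning.
    double densityFromCount(double numActiveColumnsPerInhArea, double inhibitionArea) {
      return numActiveColumnsPerInhArea / inhibitionArea;
    }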

active synapse is incremented in each round.

-     @param synPermConnected The default connected threshold. Any synapse
+     @param synPermConnected The default connected threshold. Any synapse //TODO remove, hard-coded in Connections, raise to 0.5 from 0.2?
@breznak (Member, Author):

Definitely make this hard-coded.

I propose changing it to 0.5 (or the midpoint of minPermanence and maxPermanence; see the sketch below). Is there a reason why this is everywhere set unevenly, closer to the minimum? (0.2 and 0.1 are common defaults; it performs well with 0.5 in MNIST.)
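The proposal in code form, assuming the usual [0, 1] permanence range (constant names are illustrative):

    constexpr float minPermanence = 0.0f;
    constexpr float maxPermanence = 1.0f;
    // hard-coded connected threshold at the midpoint of the permanence range
    constexpr float connectedThreshold = (minPermanence + maxPermanence) / 2.0f; // = 0.5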


-     @param boostStrength A number greater than or equal to 0, used to
+     @param boostStrength A number greater than or equal to 0, used to //TODO no biological background(?), remove
@breznak (Member, Author):

  • verify the biological background for boosting, and remove it altogether if there is none (boosting does help somewhat in MNIST; see if this can be mitigated with a new param config?)

  • if not removed, make it fixed (2.0), or automated on robustness (boost = 2.0 * <inverse ratio of robustness>); one possible mapping is sketched below
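One possible reading of that mapping; the exact formula is up for discussion, this only illustrates the intended direction (more robustness -> less boosting):

    // boost = 2.0 * <inverse ratio of robustness>, read here as 2.0 * (1 - robustness)
    float boostFromRobustness(float robustness /* in [0.0, 1.0] */) {
      return 2.0f * (1.0f - robustness);
    }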

+     likely to oscillate. //TODO do not allow too small
+     //TODO make this dutyCyclePeriodPct 0..1.0, which uses
+     //TODO new `samplesPerEpoch`, if known. For MNIST (image dataset) this would be #image samples,
+     //for a stream with a weekly period this would be #samples per week.
@breznak (Member, Author):

  • check for too-small values
  • make it a relative % of a new "epochSize" (samplesPerEpoch, period); see the sketch below
  • epochSize is an estimate of the periodicity of the data, e.g.:
    • weekly recurring timeseries: number of samples per week
    • MNIST dataset: #samples in the training set
    • unknown (timeseries): 0 = infinity
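A sketch of the derivation under these assumptions (dutyCyclePeriodPct, samplesPerEpoch, and the floor value are all proposals, not existing API):

    #include <algorithm>
    #include <cstdint>

    std::uint32_t dutyCyclePeriodFromPct(float dutyCyclePeriodPct,
                                         std::uint32_t samplesPerEpoch,
                                         std::uint32_t minPeriod = 100) {
      if (samplesPerEpoch == 0) // unknown periodicity ("0 = infinity")
        return 1000;            // fall back to the current default dutyCyclePeriod
      const auto period =
          static_cast<std::uint32_t>(dutyCyclePeriodPct * samplesPerEpoch);
      return std::max(period, minPeriod); // "do not allow too small"
    }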

@@ -202,6 +214,7 @@ class SpatialPooler : public Serializable
@param wrapAround boolean value that determines whether or not inputs
at the beginning and end of an input dimension are considered
neighbors for the purpose of mapping inputs to columns.
+     //TODO does it hurt to set this to always true? We could rm NonWrappingNeighbourhood
@breznak (Member, Author):

> whether or not inputs at the beginning and end of an input dimension are considered neighbors for the purpose of mapping inputs to columns

Biologically, if we assume a hierarchy, a Region we model with the SP is a portion (a "rectangle") of a 2D sheet. Its input field is another 2D sheet (or a retina, ...), so inputs on one side are not close to those on the other. So we should leave this OFF?


-     @param stimulusThreshold This is a number specifying the minimum
+     @param stimulusThreshold This is a number specifying the minimum //TODO replace with `robustness` 0..1.0, which will affect this & synPermInc/Dec
@breznak (Member, Author):

stimulusThreshold represents the "robustness" (to noise) of the segment well.

  • bump the default so it is not too small (to 2, 3, 4, ...?)
  • it must not be too high, or no segment will be able to satisfy it and no learning will occur -> auto-check that the number of potential synapses on a segment is x times (2 times?) bigger than the threshold; see the sketch below
  • in "smart" mode, replace it with "robustness" [0.0..1.0]

-     @param potentialPct The percent of the inputs, within a column's
             potential radius, that a column can be connected to. If set to
             1, the column will be connected to every input within its
+     @param potentialPct The percent of the inputs, within a column's //TODO make this "automated" depending on #potentialRadius & numColumns.
@breznak (Member, Author):

  • rename to columnInputOverlapPct

  • remove it and make it a function Fn(#columns, input area, potential radius, local area pct, *prefer-local-vs-global), where each argument pushes Fn in the following direction (sketched below):
    • #columns + -> Fn -
    • area + -> Fn -
    • pot radius + -> Fn +
    • local area pct + -> Fn +
    • prefer local + -> Fn +
  • the Fn represents "prefer local, details" (over global, holistic)

  • new smart param "prefer local" 0..1.0
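A purely illustrative shape of such an Fn; it only encodes the monotonicity table above, not a validated formula (all inputs assumed positive):

    #include <algorithm>

    float potentialPctFn(float numColumns,   // + -> Fn -
                         float inputArea,    // + -> Fn -
                         float potRadius,    // + -> Fn +
                         float localAreaPct, // + -> Fn +
                         float preferLocal)  // + -> Fn +, the new smart param in [0, 1]
    {
      const float raw = (potRadius * localAreaPct * (1.0f + preferLocal)) /
                        (numColumns * inputArea);
      return std::clamp(raw, 0.0f, 1.0f); // potentialPct must remain a percentage
    }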

@breznak (Member, Author):

This works well; see #533 for a successful demonstration.

inactive synapse is decremented in each learning step.

-     @param synPermActiveInc The amount by which the permanence of an
+     @param synPermActiveInc The amount by which the permanence of an //TODO ditto
@breznak (Member, Author):

TODO: how are ActiveInc and InactiveDec related? Something like "prefer forgetting vs. learning new"? That would be about the relative ratio of the two.

number of synapses that must be active in order for a column to
turn ON. The purpose of this is to prevent noisy input from
activating columns.

-     @param synPermInactiveDec The amount by which the permanence of an
+     @param synPermInactiveDec The amount by which the permanence of an //TODO make fixed and only depend on robustness?
@breznak (Member, Author):

Make it fixed? And have it depend only on the robustness modifier (robustness + -> both changes -).

A Collaborator commented:

synPermActiveInc and synPermInactiveDec can be reformulated as learningRate and coincidenceThreshold, where:

  • coincidenceThreshold = inc / dec
  • learningRate = 1 / inc, which is the maximum number of cycles it takes for a synapse's permanence to saturate (see the sketch below)
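Inverting that reformulation back to the raw increments, as a sketch (the struct and function are illustrative only):

    struct PermanenceDeltas { float inc; float dec; };

    PermanenceDeltas fromReformulated(float learningRate, float coincidenceThreshold) {
      const float inc = 1.0f / learningRate;        // e.g. learningRate = 20 cycles -> inc = 0.05
      const float dec = inc / coincidenceThreshold; // e.g. coincidenceThreshold = 5 -> dec = 0.01
      return {inc, dec};
    }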

@breznak (Member, Author):

That might be better; learningRate for sure.
What is the meaning of coincidenceThreshold? I take it to indicate whether it is easier to learn new patterns (and "fill" the memory faster) or to forget patterns that are not repeated (so stable vs. "one-shot" patterns?), with a balance when the two are equal (is that a golden middle?).

So for long timeseries I'd choose more forgetting, and for short, new, relatively rare events, more learning?
The SP does not really unlearn (it could, but the capacity is just huge)?

@breznak mentioned this pull request Jul 3, 2019

@ctrl-z-9000-times (Collaborator) commented:

I think a better approach for this PR would be to make a parameter structure.

@breznak (Member, Author) commented Jul 3, 2019

> to make a parameter structure.

That's a good idea!

So a struct like

{ robustness : 0.6,
  learningRate: 0.3,
  ...
}

and SP.applySmartParams(struct)?
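In C++ that could look roughly like this (SmartParams and applySmartParams are proposals from this thread, not existing htm.core API):

    struct SmartParams {
      float robustness   = 0.6f;
      float learningRate = 0.3f;
      // ... further high-level knobs ...
    };

    // usage (hypothetical):
    //   SpatialPooler sp(inputDims, columnDims);
    //   sp.applySmartParams(SmartParams{0.6f, 0.3f});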
