[MINOR] Images
jaceklaskowski committed Apr 7, 2017
1 parent 6ed7f1a commit 6560ca1
Showing 46 changed files with 24 additions and 24 deletions.
Binary file added graffles/spark-streaming-jobscheduler.graffle
Binary file added graffles/spark-streaming-jobset-states.graffle
Binary file added graffles/spark-streaming-receivertracker.graffle
Binary file added graffles/spark-streaming-streamingcontext.graffle
Binary file added graffles/spark-streaming-updateStateByKey.graffle
Binary file added images/spark-streaming-JobGenerator-start.png
Binary file added images/spark-streaming-JobScheduler-start.png
Binary file added images/spark-streaming-StateDStream-compute.png
Binary file added images/spark-streaming-StreamingContext-start.png
Binary file added images/spark-streaming-StreamingContext-stop.png
Binary file added images/spark-streaming-batch-processing-time.png
Binary file added images/spark-streaming-jobscheduler.png
Binary file added images/spark-streaming-jobset-states.png
Binary file added images/spark-streaming-kafka-webui-jobs.png
Binary file added images/spark-streaming-receivertracker.png
Binary file added images/spark-streaming-streamingcontext.png
Binary file added images/spark-streaming-updateStateByKey.png
Binary file added images/spark-streaming-webui-streaming-tab.png
2 changes: 1 addition & 1 deletion spark-streaming-InputInfoTracker.adoc
@@ -99,4 +99,4 @@ INFO InputInfoTracker: remove old batch metadata: [timesToCleanup]
NOTE: `Description` is used in `BatchPage` (Details of batch) in web UI for Streaming under `Input Metadata`.

.Details of batch in web UI for Kafka 0.10 direct stream with Metadata
-image::../images/spark-streaming-kafka-0-10-webui-details-batch.png[align="center"]
+image::images/spark-streaming-kafka-0-10-webui-details-batch.png[align="center"]
4 changes: 2 additions & 2 deletions spark-streaming-jobgenerator.adoc
@@ -29,7 +29,7 @@ start(): Unit
NOTE: `start` is called when link:spark-streaming-jobscheduler.adoc#starting[JobScheduler starts].

.JobGenerator Start (First Time) procedure (tip: follow the numbers)
-image::../images/spark-streaming-JobGenerator-start.png[align="center"]
+image::images/spark-streaming-JobGenerator-start.png[align="center"]

It first checks whether the internal event loop has already been created, which is how `JobGenerator` knows it has already been started. If so, it does nothing and exits.
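
A minimal sketch of that guard, with a plain `Thread` standing in for Spark's private `EventLoop` (names here are illustrative, not Spark's API):

[source, scala]
----
// Hypothetical sketch of an idempotent start guard like the one described above
class Generator {
  private var eventLoop: Thread = _

  def start(): Unit = synchronized {
    if (eventLoop != null) return // already started: do nothing and exit

    eventLoop = new Thread(() => () /* process JobGenerator events */)
    eventLoop.start()
    // then either start for the first time or restart from a checkpoint
  }
}
----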

@@ -317,7 +317,7 @@ clearCheckpointData(time: Time)
In short, `clearCheckpointData` requests the link:spark-streaming-dstreamgraph.adoc#clearCheckpointData[DStreamGraph], link:spark-streaming-receivertracker.adoc#cleanupOldBlocksAndBatches[ReceiverTracker], and link:spark-streaming-InputInfoTracker.adoc#cleanup[InputInfoTracker] to do their cleaning and marks the current batch `time` as <<lastProcessedBatch, fully processed>>.

.JobGenerator and ClearCheckpointData event
-image::../images/spark-streaming-JobGenerator-ClearCheckpointData-event.png[align="center"]
+image::images/spark-streaming-JobGenerator-ClearCheckpointData-event.png[align="center"]

When executed, `clearCheckpointData` first requests link:spark-streaming-dstreamgraph.adoc#clearCheckpointData[DStreamGraph to clear checkpoint data for the given batch time].
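
The sequence can be sketched as follows, assuming illustrative stub types (Spark's actual classes are private):

[source, scala]
----
// Hedged sketch of the clearCheckpointData flow; Cleanable is a stand-in type
trait Cleanable { def cleanup(batchTime: Long): Unit }

def clearCheckpointData(
    batchTime: Long,
    graph: Cleanable,             // DStreamGraph: clears checkpoint data
    receiverTracker: Cleanable,   // cleans up old blocks and batches
    inputInfoTracker: Cleanable,  // removes old batch metadata
    markFullyProcessed: Long => Unit): Unit = {
  graph.cleanup(batchTime)
  receiverTracker.cleanup(batchTime)
  inputInfoTracker.cleanup(batchTime)
  markFullyProcessed(batchTime)   // batch time becomes lastProcessedBatch
}
----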

8 changes: 4 additions & 4 deletions spark-streaming-jobscheduler.adoc
@@ -3,7 +3,7 @@
*Streaming scheduler* (`JobScheduler`) schedules streaming jobs to be run as Spark jobs. It is created as part of link:spark-streaming-streamingcontext.adoc#creating-instance[creating a StreamingContext] and starts with it.

.JobScheduler and Dependencies
-image::../images/streaming-jobscheduler.png[align="center"]
+image::images/spark-streaming-jobscheduler.png[align="center"]

It tracks jobs submitted for execution (as <<JobSet, JobSets>> via <<submitJobSet, submitJobSet>> method) in <<internal-registries, jobSets>> internal map.
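
A simplified sketch of that bookkeeping — the `Job`, `JobSet`, and `Scheduler` types below are illustrative stand-ins, not Spark's private classes:

[source, scala]
----
import java.util.concurrent.{ConcurrentHashMap, Executors}

// Hedged sketch: JobSets keyed by batch time, jobs run on a fixed-size pool
// (Spark sizes its pool with spark.streaming.concurrentJobs, default 1)
final case class Job(run: () => Unit)
final case class JobSet(time: Long, jobs: Seq[Job])

class Scheduler(concurrentJobs: Int = 1) {
  private val jobSets = new ConcurrentHashMap[Long, JobSet]()
  private val jobExecutor = Executors.newFixedThreadPool(concurrentJobs)

  def submitJobSet(jobSet: JobSet): Unit =
    if (jobSet.jobs.isEmpty) println(s"No jobs added for time ${jobSet.time}")
    else {
      jobSets.put(jobSet.time, jobSet)                       // track the JobSet
      jobSet.jobs.foreach(job => jobExecutor.execute(() => job.run()))
    }
}
----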

@@ -40,7 +40,7 @@ DEBUG JobScheduler: Starting JobScheduler
It then goes over all the dependent services and starts them one by one as depicted in the figure.

.JobScheduler Start procedure
-image::../images/spark-streaming-JobScheduler-start.png[align="center"]
+image::images/spark-streaming-JobScheduler-start.png[align="center"]

It first starts <<eventLoop, JobSchedulerEvent Handler>>.

@@ -233,7 +233,7 @@ It reports an error if the job's result is a failure.
A `JobSet` represents a collection of link:spark-streaming.adoc#Job[streaming jobs] that were created at (batch) `time` for the link:spark-streaming-dstreamgraph.adoc#generateJobs[output streams] that ultimately produced a streaming job (output streams may opt out of producing one).

.JobSet Created and Submitted to JobScheduler
-image::../images/spark-streaming-jobset-generatejobs-event.png[align="center"]
+image::images/spark-streaming-jobset-generatejobs-event.png[align="center"]

`JobSet` tracks what streaming jobs are in incomplete state (in `incompleteJobs` internal registry).
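
A hedged sketch of that bookkeeping and the resulting state transitions (Spark's `JobSet` is private; `JobSetState` below is illustrative):

[source, scala]
----
// Hedged sketch: Created -> Started (first job starts) -> Completed (none left)
class JobSetState(val time: Long, jobIds: Set[String]) {
  private var incompleteJobs = jobIds
  val submissionTime: Long = System.currentTimeMillis()  // Created
  var processingStartTime: Long = -1L                    // set => Started
  var processingEndTime: Long = -1L                      // set => Completed

  def handleJobStart(jobId: String): Unit =
    if (processingStartTime < 0) processingStartTime = System.currentTimeMillis()

  def handleJobCompletion(jobId: String): Unit = {
    incompleteJobs -= jobId
    if (incompleteJobs.isEmpty) processingEndTime = System.currentTimeMillis()
  }

  def hasCompleted: Boolean = incompleteJobs.isEmpty
}
----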

@@ -256,7 +256,7 @@ A `JobSet` changes state over time. It can be in the following states:
* *Completed* after `JobSet.handleJobCompletion` and no more jobs are incomplete (in `incompleteJobs` internal registry). `processingEndTime` is set.

.JobSet States
-image::../images/spark-streaming-jobset-states.png[align="center"]
+image::images/spark-streaming-jobset-states.png[align="center"]

Given the states a `JobSet` has *delays*:

6 changes: 3 additions & 3 deletions spark-streaming-kafka.adoc
@@ -65,17 +65,17 @@ If `zookeeper.connect` or `group.id` parameters are not set, they are added with
In this mode, you will only see jobs submitted (in the *Jobs* tab in link:spark-webui.adoc[web UI]) when a message comes in.

.Complete Jobs in web UI for batch time 22:17:15
-image::../images/spark-streaming-kafka-webui-jobs.png[align="center"]
+image::images/spark-streaming-kafka-webui-jobs.png[align="center"]

It corresponds to *Input size* larger than `0` in the *Streaming* tab in the web UI.

.Completed Batch in web UI for batch time 22:17:15
-image::../images/spark-streaming-kafka-webui-streaming.png[align="center"]
+image::images/spark-streaming-kafka-webui-streaming.png[align="center"]

Click the link in Completed Jobs for a batch to see its details.

.Details of batch in web UI for batch time 22:17:15
-image::../images/spark-streaming-kafka-webui-details-batch.png[align="center"]
+image::images/spark-streaming-kafka-webui-details-batch.png[align="center"]
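
For reference, a minimal direct stream could be set up as follows (a hedged example against the Kafka 0.10 integration; it assumes an `ssc: StreamingContext` in scope, and the broker address, group id, and topic name are placeholders):

[source, scala]
----
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",            // placeholder broker
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "spark-streaming-notes")     // placeholder group id

val messages = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Set("topic1"), kafkaParams))

messages.map(_.value).print()  // jobs show up only when messages arrive
----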

=== [[spark-streaming-kafka-0-10]] `spark-streaming-kafka-0-10` Library Dependency

2 changes: 1 addition & 1 deletion spark-streaming-operators-stateful.adoc
@@ -114,7 +114,7 @@ NOTE: Please consult https://issues.apache.org/jira/browse/SPARK-2629[SPARK-2629
The state update function `updateFn` scans every key and generates a new state for every key, given the collection of values for the key in a batch and the key's current state (if it exists).

.updateStateByKey in motion
-image::../images/spark-streaming-updateStateByKey.png[align="center"]
+image::images/spark-streaming-updateStateByKey.png[align="center"]
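
As a hedged example, a running count per key — assuming `pairs` is a `DStream[(String, Int)]`, `ssc` is the StreamingContext, and the checkpoint path is a placeholder:

[source, scala]
----
// updateStateByKey requires a checkpoint directory for the state RDDs
ssc.checkpoint("_checkpoint")  // placeholder path

// new state = sum of the batch's values plus the previous state (if any);
// returning None would remove the key from the state
val updateFn = (values: Seq[Int], state: Option[Int]) =>
  Some(values.sum + state.getOrElse(0))

val runningCounts = pairs.updateStateByKey[Int](updateFn)
runningCounts.print()
----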

Internally, `updateStateByKey` executes link:spark-sparkcontext.adoc#closure-cleaning[SparkContext.clean] on the input function `updateFn`.

2 changes: 1 addition & 1 deletion spark-streaming-receivertracker.adoc
@@ -5,7 +5,7 @@
`ReceiverTracker` manages execution of all link:spark-streaming-receivers.adoc[Receivers].

.ReceiverTracker and Dependencies
-image::../images/streaming-receivertracker.png[align="center"]
+image::images/spark-streaming-receivertracker.png[align="center"]

It uses link:spark-rpc.adoc[RPC environment] for communication with link:spark-streaming-receiversupervisors.adoc[ReceiverSupervisors].

2 changes: 1 addition & 1 deletion spark-streaming-statedstreams.adoc
@@ -21,7 +21,7 @@ If however `parent` has not generated a RDD for the current batch but the state
NOTE: Even when there is no input data for an already-running input stream, the state RDD is (re)computed (per partition).

.Computing stateful RDDs (StateDStream.compute)
-image::../images/spark-streaming-StateDStream-compute.png[align="center"]
+image::images/spark-streaming-StateDStream-compute.png[align="center"]

If the state RDD has been found, which means that this is not the first input data batch, the `parent` stream is requested to link:spark-streaming-dstreams.adoc#getOrCompute[getOrCompute] the RDD for the current batch.

6 changes: 3 additions & 3 deletions spark-streaming-streamingcontext.adoc
@@ -47,7 +47,7 @@ WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mo
====

.StreamingContext and Dependencies
-image::../images/streaming-streamingcontext.png[align="center"]
+image::images/spark-streaming-streamingcontext.png[align="center"]

A link:spark-streaming-dstreamgraph.adoc[DStreamGraph] is created.

@@ -200,7 +200,7 @@ java.lang.IllegalStateException: Only one StreamingContext may be started in thi
If no other StreamingContext exists, it performs <<validate, setup validation>> and link:spark-streaming-jobscheduler.adoc#start[starts `JobScheduler`] (in a separate dedicated daemon thread called *streaming-start*).

.When started, StreamingContext starts JobScheduler
-image::../images/spark-streaming-StreamingContext-start.png[align="center"]
+image::images/spark-streaming-StreamingContext-start.png[align="center"]

It enters <<states, ACTIVE>> state.
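
A minimal example of this lifecycle (the app name, master, and batch interval below are arbitrary choices):

[source, scala]
----
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-notes")
val ssc = new StreamingContext(conf, Seconds(5))

// define input streams and output operations here, before start
ssc.start()            // starts JobScheduler on the streaming-start thread
ssc.awaitTermination() // block until stopped or an error occurs
----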

@@ -248,7 +248,7 @@ If a user requested to stop the underlying SparkContext (when `stopSparkContext`
It is only in the <<states, ACTIVE>> state that `stop` does more than print out WARN messages to the logs.
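
For example (a hedged call using the two `stop` flags):

[source, scala]
----
// also stop the underlying SparkContext, and let data already received
// be processed before shutting down
ssc.stop(stopSparkContext = true, stopGracefully = true)
----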

.StreamingContext Stop Procedure
-image::../images/spark-streaming-StreamingContext-stop.png[align="center"]
+image::images/spark-streaming-StreamingContext-stop.png[align="center"]

It does the following (in order):

16 changes: 8 additions & 8 deletions spark-streaming-webui.adoc
@@ -3,7 +3,7 @@
When you link:spark-streaming-streamingcontext.adoc#start[start a Spark Streaming application], you can use the link:../spark-webui.adoc[web UI] to monitor streaming statistics in the *Streaming* tab (aka _page_).

.Streaming Tab in web UI
-image::../images/spark-streaming-webui-streaming-tab.png[align="center"]
+image::images/spark-streaming-webui-streaming-tab.png[align="center"]

NOTE: The number of completed batches to retain to compute statistics upon is controlled by link:spark-streaming-settings.adoc[spark.streaming.ui.retainedBatches] (and defaults to `1000`).
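
To change it, set the property on the `SparkConf` before the StreamingContext is created (a hedged example; `100` is arbitrary):

[source, scala]
----
val conf = new org.apache.spark.SparkConf()
  .set("spark.streaming.ui.retainedBatches", "100") // keep stats for 100 batches
----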

@@ -16,7 +16,7 @@ NOTE: The Streaming page uses link:spark-streaming-streaminglisteners.adoc#Strea
*Basic Information* section is the top-level section in the Streaming page that offers basic information about the streaming application.

.Basic Information section in Streaming Page (with Receivers)
-image::../images/spark-streaming-webui-streaming-statistics.png[align="center"]
+image::images/spark-streaming-webui-streaming-statistics.png[align="center"]

The section shows the link:spark-streaming-dstreamgraph.adoc#batchDuration[batch duration] (in _Running batches of [batch duration]_), and the time the application has been running for since the link:spark-streaming-streamingcontext.adoc#creating-instance[StreamingContext was created] (_not_ since this streaming application was started!).

@@ -35,7 +35,7 @@ The average event rate for all registered streams is displayed (as _Avg: [avg] e
*Scheduling Delay* is the time spent from link:spark-streaming-jobscheduler.adoc#submitJobSet[when the collection of streaming jobs for a batch was submitted] to link:spark-streaming-jobscheduler.adoc#JobStarted[when the first streaming job (out of possibly many streaming jobs in the collection) was started].

.Scheduling Delay in Streaming Page
-image::../images/spark-streaming-webui-streaming-page-scheduling-delay.png[align="center"]
+image::images/spark-streaming-webui-streaming-page-scheduling-delay.png[align="center"]

It should be as low as possible, meaning that the streaming jobs in a batch are scheduled almost instantly.

Expand All @@ -55,14 +55,14 @@ messages.foreachRDD { rdd =>
----

.Scheduling Delay Increased in Streaming Page
-image::../images/spark-streaming-webui-scheduling-delay-increase.png[align="center"]
+image::images/spark-streaming-webui-scheduling-delay-increase.png[align="center"]

==== [[processing-time]] Processing Time

*Processing Time* is the time spent to complete all the streaming jobs of a batch.

.Batch Processing Time and Batch Intervals
-image::../images/spark-streaming-batch-processing-time.png[align="center"]
+image::images/spark-streaming-batch-processing-time.png[align="center"]

==== [[total-delay]] Total Delay

@@ -79,12 +79,12 @@
NOTE: The number of retained batches is controlled by link:spark-streaming-settings.adoc[spark.streaming.ui.retainedBatches].

.Completed Batches (limited to 5 elements only)
-image::../images/spark-streaming-webui-completed-batches.png[align="center"]
+image::images/spark-streaming-webui-completed-batches.png[align="center"]

=== Example - Kafka Direct Stream in web UI

.Two Batches with Incoming Data inside for Kafka Direct Stream in web UI (Streaming tab)
-image::../images/spark-streaming-webui-streaming-tab-kafka-directstream-two-batches.png[align="center"]
+image::images/spark-streaming-webui-streaming-tab-kafka-directstream-two-batches.png[align="center"]

.Two Jobs for Kafka Direct Stream in web UI (Jobs tab)
-image::../images/spark-streaming-webui-kafka-directinputstream-two-jobs.png[align="center"]
+image::images/spark-streaming-webui-kafka-directinputstream-two-jobs.png[align="center"]
