feat(logging)!: add client & worker option to pass a custom logger #1171

Draft · wants to merge 1 commit into base: main
Conversation

@nikoksr commented Jan 9, 2025

Description

We're actively using and building on Hatchet and enjoy it a lot. Not being able to plug our logger into the system is an issue for us though. This PR aims to provide a solution that seamlessly fits into the existing Hatchet SDK.

Adds a WithLogger() option to the client and worker. The option replaces the WithLogLevel() option, as it's possible to supply a logger configured with the desired log level and pass it to WithLogger().

Note: Since the removal of WithLogLevel is technically a breaking change, I'd be fine with rolling this part back despite the functional redundancy.

Note: The PR is intentionally incomplete; I wanted to validate the contribution before completing it.

Relates to #1170

Examples

Setting a custom logger:

logger := zerolog.New(os.Stdout)

// Client
hatchet, err := client.New(
    client.WithLogger(&logger),
    // other options ...
)

// Worker
worker, err := worker.NewWorker(
    worker.WithLogger(&logger),
    // other options ...
)
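
Since the log level can be set on the logger itself, WithLogLevel becomes redundant; for example:

logger := zerolog.New(os.Stdout).Level(zerolog.WarnLevel)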

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation change (pure documentation change)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking changes to code which doesn't change any behaviour)
  • CI (any automation pipeline changes)
  • Chore (changes which are not directly related to any business logic)
  • Test changes (add, refactor, improve or change a test)
  • This change requires a documentation update

What's Changed

  • Add client.WithLogger option
  • Remove client.WithLogLevel option
  • Add worker.WithLogger option
  • Remove worker.WithLogLevel option

Extra

Thank you for building Hatchet; I really enjoy working with it and I'd love to contribute more in the future. Cheers to you guys :)



@abelanger5 (Contributor) left a comment

Hey @nikoksr, thank you for this PR!

I just wanted to confirm: is the intention for this PR to provide a custom logger for Hatchet internal messages, or to provide a custom logger to automatically send application logs to Hatchet (which are displayed in the Hatchet UI)?

If it's the former, I believe we can work on this in conjunction with #1170, as these are targeting separate concerns. Otherwise we'll need to find a way to consolidate the PRs.

Inline review comment on the worker setup in the test:

w, err := worker.NewWorker(worker.WithClient(c), worker.WithMaxRuns(200))

Would definitely pass in a zerolog logger configured to log at a warn level -- otherwise this test becomes very noisy.
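
For instance, a minimal sketch using the WithLogger option proposed in this PR (the client c and the surrounding test setup are assumed):

logger := zerolog.New(os.Stdout).Level(zerolog.WarnLevel)

w, err := worker.NewWorker(
    worker.WithClient(c),
    worker.WithMaxRuns(200),
    worker.WithLogger(&logger),
)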

@nikoksr (Author) commented Jan 14, 2025

@abelanger5 the purpose of this PR is to satisfy our requirement of passing a pre-configured logger to our Hatchet workers and clients, so that the logs they produce are contextualized, identifiable, and consistent with our log-structure requirements.

So for example if I set up a logger like this:

logger := zerolog.New(os.Stdout).With().
    Timestamp().
    Str("service", "bananas").
    // Other important fields
    Logger()

I'd like to be able to set up my Hatchet worker like this:

worker, err := worker.NewWorker(
    worker.WithName(name),
    worker.WithClient(client),
    worker.WithMaxRuns(1),
    worker.WithLogger(&logger), // Pass my custom logger
)

Now logs created by the worker would be structured and presented according to our logger config, rather than according to the general-purpose logger created by Hatchet's opinionated logger.NewDefaultLogger() function.


Besides, I think - and I'm really unsure about this because it's not a problem I ever had to solve - that you could even add a hook to that logger before passing it to the worker, which would enable behavior similar to that of #1170 while keeping the API of the HatchetContext clean.

I'm not sure if this works, but doing

logger := context.NewCombinedLogrusLogger(logrusLogger)

// and

logger := context.NewCombinedZerologLogger(zerologLogger)

is just not a scalable solution IMHO, and it will swamp the API over time as more demands need to be met. @grutt seems to share that sentiment.

Taking inspiration from #1170, something like this might work:

type zerologHook struct {
	zeroLogger zerolog.Logger
}

func (h *zerologHook) Levels() []logrus.Level {
	return logrus.AllLevels
}

func (h *zerologHook) Fire(entry *logrus.Entry) error {
	// Map the Logrus level to a Zerolog event
	var event *zerolog.Event
	switch entry.Level {
	case logrus.DebugLevel:
		event = h.zeroLogger.Debug()
	case logrus.InfoLevel:
		event = h.zeroLogger.Info()
	case logrus.WarnLevel:
		event = h.zeroLogger.Warn()
	case logrus.ErrorLevel, logrus.FatalLevel, logrus.PanicLevel:
		event = h.zeroLogger.Error()
	default:
		// Fall back to info so event is never nil
		event = h.zeroLogger.Info()
	}

	// Add the structured fields
	for key, value := range entry.Data {
		event = event.Interface(key, value)
	}

	event.Msg(entry.Message)
	return nil
}

func main() {
	// Your zerolog instance
	logger := zerolog.New(os.Stdout).With().
		Timestamp().
		Str("service", "bananas").
		Logger()

	// Create a Logrus logger that forwards to Zerolog
	logrusLogger := logrus.New()
	logrusLogger.AddHook(&zerologHook{zeroLogger: logger})

	logger.Info().Msg("zerolog: starting hatchet worker")
	logrusLogger.Info("logrus: starting hatchet worker") // Writes to Zerolog and can be distributed throughout your app code

	// Now pass the zerolog instance with logrus forwarding to hatchet
	w, err := worker.NewWorker(
		worker.WithLogger(&logger),
	)
	_, _ = w, err
}

Again, I'm rather sure that something like this would be possible; however, my "solution" should be treated like pseudo-code, as it's merely meant to demonstrate my thought process.

If something like this were possible, I think a solid implementation combined with a couple documentation examples for the most popular loggers would do the trick while keeping the exposed interfaces clean.


Brainstorm of alternative ideas:

I think the core question to answer is whether Hatchet - as a library - should support a "bring your own logger" feature. And I don't mean taking control over how Hatchet's logger behaves (the purpose of this PR), but in the sense of: do we want to support other types of loggers that are not zerolog?

The simple answer, of course, being: no. In that case, we could:

  • Make the HatchetContext.Log() function more flexible. Make it accept a log level, a message and fields, for example (see the sketch after this list).
  • Add a HatchetContext.Logger() function which gives access to the underlying zerolog instance, allowing for native logging inside workflow jobs and similar.
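
A rough sketch of what those two options could look like; both signatures are hypothetical and not part of the current SDK (ctx is the HatchetContext, recipientID a placeholder):

// Hypothetical, more flexible Log variant accepting a level and structured fields
ctx.Log(zerolog.InfoLevel, "sending message", map[string]interface{}{"recipient_id": recipientID})

// Hypothetical accessor for the underlying zerolog instance
ctx.Logger().Info().Str("recipient_id", recipientID).Msg("sending message")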

If the answer is yes, here's a couple ideas that would come to my mind:

  • Move away from zerolog as the fixed logger and take an interface approach similar to what libraries like pgx do (a sketch follows this list). Combined with a WithLogger() option, this would add permanent support for any logger the user might want to use.
  • Move to log/slog which, as described in this comment, is mostly just a logging front-end. Meaning, while Hatchet uses slog internally, users can provide a different backend for the logger (e.g. logrus, zerolog, zap), again through a WithLogger-type option. This approach is similar to - and these days potentially more idiomatic than - the previously described interface approach. There's even a collection of libraries that implement these different backends for slog: samber/slog-zerolog, samber/slog-logrus, samber/slog-zap.
  • Document with examples for a zerologHook type approach as described in the section above.
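
A minimal sketch of the interface approach, assuming a hypothetical Hatchet-defined Logger interface and a user-supplied zerolog adapter (none of these names exist in the SDK today):

// Logger is the minimal surface Hatchet would log against (hypothetical).
type Logger interface {
	Debug(msg string, fields map[string]interface{})
	Info(msg string, fields map[string]interface{})
	Warn(msg string, fields map[string]interface{})
	Error(msg string, fields map[string]interface{})
}

// zerologAdapter satisfies Logger using a zerolog.Logger.
type zerologAdapter struct {
	l zerolog.Logger
}

func (a zerologAdapter) Debug(msg string, fields map[string]interface{}) {
	a.l.Debug().Fields(fields).Msg(msg)
}

func (a zerologAdapter) Info(msg string, fields map[string]interface{}) {
	a.l.Info().Fields(fields).Msg(msg)
}

func (a zerologAdapter) Warn(msg string, fields map[string]interface{}) {
	a.l.Warn().Fields(fields).Msg(msg)
}

func (a zerologAdapter) Error(msg string, fields map[string]interface{}) {
	a.l.Error().Fields(fields).Msg(msg)
}

// Usage with a hypothetical option: worker.WithLogger(zerologAdapter{l: logger})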

I hope these suggestions are helpful. Hatchet has significant potential, and I really enjoy working with it. I'm interested in maintaining and improving the quality of the Hatchet experience :)

@abelanger5
Copy link
Contributor

I think the core question to answer is whether Hatchet - as a library - should support a "bring your own logger" feature. And I don't mean taking control over how Hatchet's logger behaves (the purpose of this PR), but in the sense of: do we want to support other types of loggers that are not zerolog?

This is a definite yes -- and I actually would prefer the hook approach outlined above, as we could provide standard hooks for common libraries, and for people using zerolog the integration would be seamless.


Somewhat related to the above, the most difficult technical challenge to overcome is how we can most easily associate step runs with log lines that are displayed in the Hatchet UI. The reason this is difficult is that Go doesn't support the equivalent of contextvars in Python, which is the easiest way to do auto-instrumentation. There are packages like gls, but that's not a path we'd like to go down.

So we're stuck doing some kind of clunky initialization at the top of every step run -- and while this is the main issue that @grutt flagged in the other PR, there's simply no way to avoid it without doing something like GLS.

That said, I do prefer the interface of hatchetCtx.Logger() which returns a *zerolog.Logger, which is scoped to that step run and sends any logs produced by that logger automatically to Hatchet. I prefer this because it prevents the user from maintaining a reference to a global logger that needs to be passed in as an argument to context.NewCombinedZerologLogger, which is currently the drawback of the implementation in #1170.

So if @reillyse agrees, I think we can combine efforts here and modify the API with the following:

  1. Accept a WithLogger(l *zerolog.Logger) argument to worker.New.
  2. Add a context.Logger() *zerolog.Logger method to the Hatchet context.

Users are most likely not using WithLogLevel, so I'm comfortable removing that method.

If users are coming from logrus or another logger, they will need to:

  1. Register a hook with the *zerolog.Logger that they pass into worker.New
  2. Register a transformer for *zerolog.Logger -> logrus when they call context.Logger().

Again, both the hook and the transformer are helper methods that we can provide for common libraries (though I'm not sure we want a bunch of logging backends as part of the in-tree build, so perhaps providing these as separate packages would make sense here).
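
A rough sketch of what the hook in step 1 could look like - a zerolog hook that forwards entries to an existing logrus logger (note that zerolog hooks only expose the level and message, not the accumulated fields):

type logrusForwarder struct {
	target *logrus.Logger
}

func (f logrusForwarder) Run(e *zerolog.Event, level zerolog.Level, message string) {
	switch level {
	case zerolog.DebugLevel:
		f.target.Debug(message)
	case zerolog.WarnLevel:
		f.target.Warn(message)
	case zerolog.ErrorLevel, zerolog.FatalLevel, zerolog.PanicLevel:
		f.target.Error(message)
	default:
		f.target.Info(message)
	}
}

// Usage with the proposed option:
// logger := zerolog.New(os.Stdout).Hook(logrusForwarder{target: logrusLogger})
// w, err := worker.NewWorker(worker.WithLogger(&logger))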

@reillyse (Contributor)

I think this makes sense as an approach; the only issue I see is clunkiness for people who are not using zerolog as their logger.
So to summarize:

We use zerolog internally for all logging. People can pass their own zerolog logger with a WithLogger() arg to worker.New.

Any and all internal Hatchet client logs will be logged to this logger.

If users want to write logs in a Hatchet worker, they get a zerolog logger with

logger := context.Logger()

and they can then log as normal; these logs will be sent to Hatchet as well as logged locally.

People are free to use their default logger, but it will not log to Hatchet.

We only interface with zerolog; if people want to use a different logger, e.g. logrus, it is on them to provide us with a zerolog-compatible logger implementation that will forward the logs to their logger.
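
Put together, the proposed approach would look roughly like this (WithLogger and ctx.Logger() are the additions being discussed, not existing API; Output is a placeholder type):

logger := zerolog.New(os.Stdout).With().Timestamp().Logger()

w, err := worker.NewWorker(
    worker.WithClient(c),
    worker.WithLogger(&logger), // all internal Hatchet client/worker logs go to this logger
)

// Inside a step function:
worker.Fn(func(ctx worker.HatchetContext) (*Output, error) {
    l := ctx.Logger() // *zerolog.Logger scoped to this step run
    l.Info().Msg("doing work") // logged locally and forwarded to the Hatchet UI
    // ...
    return &Output{}, nil
})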

@nikoksr (Author) commented Jan 14, 2025

Somewhat related to the above, the most difficult technical challenge to overcome is how we can most easily associate step runs with log lines that are displayed in the Hatchet UI. The reason this is difficult is that Go doesn't support the equivalent of contextvars in Python, which is the easiest way to do auto-instrumentation. There are packages like gls, but that's not a path we'd like to go down.

So we're stuck doing some kind of clunky initialization at the top of every step run -- and while this is the main issue that @grutt flagged in the other PR, there's simply no way to avoid it without doing something like GLS.

I'm eager to understand the problem you're describing better. I'm not a Python dev, nor do I know Hatchet's internals well enough, but reading the first couple of sentences of the contextvars documentation does make it sound a lot like Python's equivalent of Go's standard library context package. We use context all the time to pass request-scoped values across API boundaries and between processes. It's rather common to attach something like a step_id to a context, for example.

// Create a context with association fields (real code would use a typed key instead of a plain string)
ctx = context.WithValue(ctx, "job_id", jobID)

Again, most likely I don't understand Hatchet's internals nor the issue you guys are facing well enough yet to really make a suggestion here.


I think going with either the log/slog or the interface solution will be more flexible and easier to maintain in the grand scheme of things; however, I do understand that you guys want to keep your pace up and don't want to revamp your logging solution now. I think validating the hooks approach in combination with a WithLogger option and offering decent documentation for it, then eventually switching to an interface-based solution in the future, is an okay approach considering that you guys want to maintain momentum.


Regarding hatchetCtx.Logger(): I'm not taking my idea back; however, just to put it out there, zerolog also supports context-binding natively (see its documentation). This allows for building a contextualized logger, e.g.:

// Just before kicking off a workflow...
// Assuming l is the internal logger.
workflowLogger := l.With().Str("workflow_id", workflowID).Logger()

// Then either pass this newly created logger to all following relevant functions, or bind the logger to the request context.
ctx = workflowLogger.WithContext(ctx)

// Now continue using the ctx

// In a function downstream, e.g. a workflow job handler, you can now extract the (still) contextualized logger from the context, e.g.:

logger := zerolog.Ctx(ctx)

// logger will include the workflow-id in all log statements.

The logger's context can be expanded further. You could, for example, when a workflow gets triggered, begin by creating a logger that binds a workflow ID:

workflowLogger := l.With().Str("workflow_id", workflowID).Logger()
ctx = workflowLogger.WithContext(ctx)

Then per job that gets executed, you could expand the context by doing:

jobLogger := zerolog.Ctx(ctx).With().Str("job_id", jobID).Logger()
ctx = jobLogger.WithContext(ctx)

And per step that gets executed:

stepLogger := zerolog.Ctx(ctx).With().Str("step_id", stepID).Logger()
ctx = stepLogger.WithContext(ctx)

If you'd then extract the logger from the context in, say, a step handler and use it:

worker.Fn(func(ctx worker.HatchetContext) (*event.SendMessageOutput, error) {
    // Logger already has workflow_id and job_id from earlier context
    logger := zerolog.Ctx(ctx)
    
    input := &event.SendMessageInput{}
    if err := ctx.WorkflowInput(input); err != nil {
        logger.Error().Err(err).Msg("failed to get workflow input")
        return nil, err
    }
    // ...
})

The resulting log message would look like:

2025-01-14T20:45:31.746173419Z INF somewhere/in/hatchet.go:63 > failed to get workflow input workflow_id="wf_123" job_id="wfj_456" step_id="wfjs_789" error="invalid input format"

Of course this assumes that the context was properly distributed downstream before. Just for completeness, you could also bind those values directly to the context instead of to the logger.

Again, I wasn't sure if you were coming from a Python background and wanted to share my experience with similar patterns in Go. If you're already familiar with all this, please feel free to ignore - I just wanted to make sure I'm helping to bridge any potential knowledge gaps between the Python and Go ecosystems.

@nmetaintro

Hey, I implemented a workaround that forces all output (stdout & stderr) into my structured logger (zap) while preserving formatting. It’s not the cleanest solution, but it works.

package logger

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"sync"
	"time"

	"github.com/rs/zerolog"
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

var (
	Logger *zap.Logger
	once   sync.Once
)

type zapWriter struct{}

func (w zapWriter) Write(p []byte) (n int, err error) {
	var logEntry map[string]interface{}

	err = json.Unmarshal(p, &logEntry)
	if err != nil {
		Logger.Info(string(p))
		return len(p), nil
	}

	levelStr, _ := logEntry["level"].(string)
	service, _ := logEntry["service"].(string)
	message, _ := logEntry["message"].(string)
	timestamp, _ := logEntry["time"].(string)

	parsedTime, err := time.Parse(time.RFC3339Nano, timestamp)
	if err != nil {
		parsedTime = time.Now()
	}

	var zapLevel zapcore.Level
	switch levelStr {
	case "trace", "debug":
		zapLevel = zap.DebugLevel
	case "info":
		zapLevel = zap.InfoLevel
	case "warn":
		zapLevel = zap.WarnLevel
	case "error":
		zapLevel = zap.ErrorLevel
	case "fatal":
		zapLevel = zap.FatalLevel
	case "panic":
		zapLevel = zap.PanicLevel
	default:
		zapLevel = zap.InfoLevel
	}

	logMsg := fmt.Sprintf("%s\t%s\t[%s]\t%s",
		parsedTime.Format("2006/01/02 15:04:05"),
		zapLevel.CapitalString(),
		service,
		message,
	)

	Logger.Check(zapLevel, logMsg).Write()
	return len(p), nil
}

func InitLogger() {
	once.Do(func() {
		// config is the application's own configuration package (its import is omitted in this snippet)
		cfg, err := config.LoadConfig()
		if err != nil {
			log.Fatalf("Config validation failed: %v", err)
		}

		logLevel, err := zapcore.ParseLevel(cfg.LOG_LEVEL)
		if err != nil {
			log.Fatalf("Invalid log level: %v", err)
		}

		customTimeEncoder := func(t time.Time, enc zapcore.PrimitiveArrayEncoder) {
			enc.AppendString(t.Format("2006/01/02 15:04:05"))
		}

		encoderConfig := zapcore.EncoderConfig{
			TimeKey:      "timestamp",
			LevelKey:     "level",
			MessageKey:   "message",
			CallerKey:    "caller",
			EncodeTime:   customTimeEncoder,
			EncodeLevel:  zapcore.CapitalColorLevelEncoder,
			EncodeCaller: zapcore.ShortCallerEncoder,
		}

		core := zapcore.NewCore(
			zapcore.NewConsoleEncoder(encoderConfig),
			zapcore.Lock(os.Stderr),
			logLevel,
		)

		Logger = zap.New(core, zap.AddCaller(), zap.AddStacktrace(zapcore.ErrorLevel))

		zerolog.SetGlobalLevel(zerolog.DebugLevel)
		logger := zerolog.New(zapWriter{}).With().Timestamp().Logger()
		zerolog.DefaultContextLogger = &logger

		redirectStdoutToZap()
		redirectStderrToZap()
	})
}

func GetLogger() *zap.Logger {
	if Logger == nil {
		InitLogger()
	}

	return Logger
}

func parseAndLogJSON(logLine string) {
	var logEntry map[string]interface{}

	err := json.Unmarshal([]byte(logLine), &logEntry)
	if err != nil {
		Logger.Error("Failed to parse log entry", zap.String("log", logLine), zap.Error(err))
		return
	}

	message, ok := logEntry["message"].(string)
	if !ok {
		Logger.Error("Log entry missing 'message' field or invalid type", zap.Any("log", logEntry))
		return
	}
	level, _ := logEntry["level"].(string)

	switch level {
	case "error":
		Logger.Error(message)
	case "warn":
		Logger.Warn(message)
	case "info":
		Logger.Info(message)
	case "debug":
		Logger.Debug(message)
	default:
		Logger.Info(message)
	}
}

func redirectOutputToZap(reader *os.File) {
	scanner := bufio.NewScanner(reader)
	for scanner.Scan() {
		parseAndLogJSON(scanner.Text())
	}
}

func redirectStdoutToZap() {
	r, w, _ := os.Pipe()
	os.Stdout = w
	go redirectOutputToZap(r)
}

func redirectStderrToZap() {
	r, w, _ := os.Pipe()
	os.Stderr = w
	go redirectOutputToZap(r)
}
