# trashlog (TL)

A datalake logger into ClickHouse.

trashlog (TL) is a log-collecting service for large microservice systems. It provides a gRPC interface for communicating with log emitters. I chose gRPC because it is pretty nice, carries no text-format overhead on the wire, and most backend languages have a protoc plugin to generate clients from the provided .proto file. For the statistics database, TL uses ClickHouse, one of the best OLAP DBs; the other candidate was Apache Cassandra, but ClickHouse is way more flexible. If ClickHouse is down, TL stores unprocessed records in a local bbolt storage. ClickHouse has some constraints on inserting (it handles many small discrete inserts poorly and may fall over), so I implemented batch inserting every second.
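For illustration, here is a minimal sketch of that one-second batch loop. The `Record` and `Batcher` types and the pluggable `flush` function are hypothetical stand-ins for the real ClickHouse insert and bbolt fallback living in `sink/` and `dal/`:

```go
package sink // hypothetical package, for illustration only

import (
	"context"
	"sync"
	"time"
)

// Record is a placeholder for a single log record.
type Record struct {
	TS    int64
	Level string
	Msg   string
}

// Batcher buffers records and flushes them to ClickHouse once per second.
type Batcher struct {
	mu    sync.Mutex
	buf   []Record
	flush func([]Record) error // one batch INSERT into ClickHouse
}

// Add buffers a record; it never touches ClickHouse directly.
func (b *Batcher) Add(r Record) {
	b.mu.Lock()
	b.buf = append(b.buf, r)
	b.mu.Unlock()
}

// Run swaps the buffer out and flushes it on every tick.
func (b *Batcher) Run(ctx context.Context) {
	t := time.NewTicker(time.Second)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			b.mu.Lock()
			batch := b.buf
			b.buf = nil
			b.mu.Unlock()
			if len(batch) == 0 {
				continue
			}
			if err := b.flush(batch); err != nil {
				// ClickHouse is down: park the batch in local bbolt
				// storage and retry later (not shown here).
			}
		}
	}
}
```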

## Potential troubles

- With a large amount of incoming data, a batch may lag: a single batch iteration can take more than one second. I have a zero-allocation worker pool implementation and could use it here, but it is a static pool that cannot grow, so it may just recreate the same bottleneck, only n times wider, where n is the pool size. On the other hand, I could spawn a goroutine for each batch to avoid lagging, but that consumes more RAM (see the sketch below). The worst part is that the longer a batch lags, the larger the next batch becomes.
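A hedged sketch of that goroutine-per-batch variant, building on the hypothetical `Batcher` from the previous sketch: the ticker goroutine only swaps the buffer, so a slow insert never delays the next tick, at the price of an unbounded number of in-flight goroutines while ClickHouse is slow.

```go
// RunAsync detaches each flush into its own goroutine.
func (b *Batcher) RunAsync(ctx context.Context) {
	t := time.NewTicker(time.Second)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			b.mu.Lock()
			batch := b.buf
			b.buf = nil
			b.mu.Unlock()
			if len(batch) == 0 {
				continue
			}
			go func(batch []Record) { // more RAM, but no lag on the tick
				if err := b.flush(batch); err != nil {
					// fall back to bbolt (not shown)
				}
			}(batch)
		}
	}
}
```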

## Requirements

- docker.io
- ClickHouse server version 22.1.3.7 (official build)

You can either install them manually or use the `make install` command.

## HOW TO USE

### gRPC client generation

Just get the protoc compiler for your language, go to the root of the project, and run:

```
protoc -I ./proto ./proto/trashlog.proto --<YOUR FAVORITE LANGUAGE>_out=plugins=grpc:<PATH TO STORE GENERATED CODE>
```
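For Go, for example, an invocation could look like this (the `./gen` output directory is just an illustration):

```
protoc -I ./proto ./proto/trashlog.proto --go_out=plugins=grpc:./gen
```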

DISCLAIMER: the ClickHouse table has a strict field structure, so I don't recommend trying to store fields that are absent from it.

## ADDING FIELDS

To add fields, use the GetFields method to obtain the current fields with their types, and the Modify method to add new ones. Thank God for proto files: they are expressive enough that I don't need to describe the structure of every request, only the right way to use them.

First of all, you should know that there is no convenient way to remove or modify a table field in ClickHouse. If you try to do it automatically, you risk breaking the table, hitting a deadlock, or locking the table up for hours. To avoid this, there is no request for removing or modifying fields; if you are sure you need to do it, do it yourself in clickhouse-client.

So if you want to add a field, first call GetFields to make sure a field with that name and type doesn't already exist, then fire a Modify request (see the sketch after the type list). The acceptable field types are:

			"String",
			"Int64",
			"Float64",
			"UInt8",

### Storing log records

I think it is a bad idea to send a unary request for every log record when we can use streams, isn't it? So just create one Valve stream and send data into it every time you need to store a log event.
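A hedged sketch, again with placeholder names from a hypothetical generated client:

```go
stream, err := client.Valve(ctx) // one long-lived stream per process
if err != nil {
	panic(err)
}
for rec := range records { // however your app produces log records
	if err := stream.Send(rec); err != nil {
		break // reconnect and resend logic goes here
	}
}
```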

## WRAPPERS

TL provides some zapcore.Core wrappers to make using trashlog more convenient and simple. There are two logging targets: a Telegram channel via the Bot API, and the trashlog server itself. Using github.com/themakers/hlog as the log formatter is strongly recommended.

### ZAPTG

To use it, just create a new zaptg core and wrap the old logger with it:

- ctx: the current app context
- zap.InfoLevel: the minimum log level to store
- `<bot token>`: the token provided by BotFather
- version.Release: the app version; use the git tag
- version.Commit: the current commit
- 0: the build time
- `<channel id>`: the target Telegram channel

```go
tel, err := zaptg.NewCore(
	ctx,
	zap.InfoLevel,
	"<bot token>",
	version.Release,
	version.Commit,
	0,
	<channel id>,
)
if err != nil {
	panic(err)
}

logger = logger.WithOptions(zap.WrapCore(func(core zapcore.Core) zapcore.Core {
	return zapcore.NewTee(core, tel)
}))
```

### ZAPTRASHLOG

Wrapper that sends data to the trashlog server. To use it, just create a new zaptrashlog core and wrap the old logger with it:

- ctx: the current app context
- zapcore.DebugLevel: the minimum log level to store
- `<trashlog uri>`: the address of the trashlog server
- v.Release: the app version; use the git tag
- v.Commit: the current commit
- time.Now().Unix(): the run (start) time

```go
tl, err := zaptrashlog.NewCore(
	ctx,
	zapcore.DebugLevel,
	"<trashlog uri>",
	v.Release,
	v.Commit,
	time.Now().Unix(),
)
if err != nil {
	panic(err)
}

logger = logger.WithOptions(zap.WrapCore(func(core zapcore.Core) zapcore.Core {
	return zapcore.NewTee(core, tl)
}))
```
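After wiring either core, the logger is used as usual and every record at or above the configured level is duplicated to trashlog, for example:

```go
logger.Info("service started", zap.String("commit", v.Commit))
```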