
Issue with Placeholder Replacement in Runs: Text Gets Split, Leading to Formatting Issues #530

Open
hq-zhonger opened this issue Jan 14, 2025 · 2 comments

@hq-zhonger

Description:
Hello Unioffice Team,

I have purchased a Unioffice license and I'm developing an automated report generation system based on placeholders ({{variable}}). However, I'm encountering an issue where Unioffice splits text into multiple runs, causing placeholders to be split into separate pieces and making it difficult to correctly replace them.

If I try to concatenate all runs into a single string, perform the replacements, and then write them back, it leads to formatting loss or broken text.

Reproduction Steps:
1. Create a Word template (template.docx) with placeholders such as {{application}}.
2. Open the document with Unioffice and read the paragraph runs.
3. Observe that {{application}} may be split across several runs (e.g. `{{app`, `lic`, `ation}}`).
4. Attempt to replace the placeholder by merging the runs, modifying the text, and writing it back.
5. The formatting is lost, and sometimes the text becomes corrupted.
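To reproduce step 3 and see exactly which runs a placeholder spans, the run texts can be searched as one string while recording where each run starts. The following is a minimal sketch. It assumes the texts have already been collected into a `[]string` (for example, one `run.Text()` per element). `findPlaceholders` and `runSpan` are hypothetical names, not unioffice API.

```go
package main

import (
	"fmt"
	"strings"
)

// runSpan describes where one {{placeholder}} lives in a paragraph's runs.
type runSpan struct {
	Name     string
	StartRun int // run containing the opening "{{"
	StartOff int // byte offset of "{{" within that run
	EndRun   int // run containing the final "}"
	EndOff   int // byte offset just past "}}" within that run
}

// findPlaceholders searches the concatenated run texts for {{name}} and maps
// every match back to run indexes and offsets. The input is not modified.
func findPlaceholders(runs []string) []runSpan {
	var b strings.Builder
	starts := make([]int, len(runs)) // starts[i] = offset of runs[i] in the joined text
	for i, t := range runs {
		starts[i] = b.Len()
		b.WriteString(t)
	}
	text := b.String()

	// locate turns an offset in the joined text into (run index, offset in run).
	locate := func(pos int) (int, int) {
		for i, t := range runs {
			if pos >= starts[i] && pos < starts[i]+len(t) {
				return i, pos - starts[i]
			}
		}
		return -1, -1
	}

	var spans []runSpan
	from := 0
	for {
		open := strings.Index(text[from:], "{{")
		if open < 0 {
			break
		}
		open += from
		rel := strings.Index(text[open+2:], "}}")
		if rel < 0 {
			break // unterminated placeholder
		}
		end := open + 2 + rel + 2 // just past "}}"
		sr, so := locate(open)
		er, eo := locate(end - 1) // the last "}" is always inside some run
		spans = append(spans, runSpan{
			Name:     strings.TrimSpace(text[open+2 : end-2]),
			StartRun: sr, StartOff: so,
			EndRun: er, EndOff: eo + 1,
		})
		from = end
	}
	return spans
}

func main() {
	// Run texts as they might come back from a template like the one in step 1.
	runs := []string{"Report for {{app", "lic", "ation}} (v", "{{version}})"}
	for _, s := range findPlaceholders(runs) {
		fmt.Printf("%s: run %d (offset %d) to run %d (offset %d)\n",
			s.Name, s.StartRun, s.StartOff, s.EndRun, s.EndOff)
	}
}
```

The offsets are byte offsets, which matches how Go indexes strings.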

Current Code (Go)

func (mbafc *MBAFCTemplate) FillTemplate() {
	for _, para := range mbafc.template.Paragraphs() {
		runs := para.Runs()
		if len(runs) == 0 {
			continue
		}

		// 1️⃣ Concatenate runs one by one until a complete `{{variable}}` is found
		var buffer string
		var runIndex []int // indexes of the runs that belong to the same `{{variable}}`
		runMap := make(map[int]string)

		for i, run := range runs {
			text := run.Text()
			buffer += text
			runIndex = append(runIndex, i)
			runMap[i] = text

			// Check whether the buffer now contains a complete `{{variable}}`
			matches, _ := ExtractPlaceholders(buffer)
			if len(matches) > 0 {
				for _, placeholder := range matches {
					replacement := mbafc.getReplacement(placeholder)
					buffer = strings.ReplaceAll(buffer, "{{"+placeholder+"}}", replacement)
				}

				// 2️⃣ Write the text back following the original run structure
				remainingText := buffer
				for _, idx := range runIndex {
					runs[idx].ClearContent()

					// ✅ Make sure we don't slice out of range
					if len(remainingText) > 0 {
						writeLen := min(len(runMap[idx]), len(remainingText))
						runs[idx].AddText(remainingText[:writeLen])
						remainingText = remainingText[writeLen:]
					}
				}

				// 3️⃣ Clear the buffer and move on to the next placeholder
				buffer = ""
				runIndex = []int{}
			}
		}
	}
}

Problem:
- Unioffice splits text into multiple runs, which makes placeholder replacement difficult.
- Merging runs to perform replacements leads to formatting loss and potential text corruption.
Expected Behavior (Similar to Python’s docxtpl)
In Python, docxtpl allows me to replace placeholders without breaking formatting:

from docxtpl import DocxTemplate

doc = DocxTemplate("template.docx")
context = {'application': 'My Application', 'version': 'V1.0'}
doc.render(context)
doc.save("output.docx")

Question:
How can I achieve similar behavior in Unioffice?
Is there a way to replace text inside runs without losing formatting, or prevent Unioffice from splitting text into multiple runs in the first place?
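On the second question: the splitting usually comes from the .docx file itself rather than from unioffice. Word commonly stores visually continuous text in several `<w:r>` elements (for example around spell-check marks or separate editing sessions), and a library that reads the XML sees those runs as stored. When the pieces of a placeholder share identical formatting, one option is to merge adjacent runs with equal properties before searching. The sketch below uses a hypothetical `run` type whose `Props` field is assumed to hold a comparable serialization of the run properties. How to obtain that from unioffice is not shown and is an assumption.

```go
package main

import "fmt"

// run is a hypothetical stand-in for a Word run: Props is assumed to be a
// comparable serialization of its formatting (w:rPr), Text is its text.
type run struct {
	Props string
	Text  string
}

// mergeRuns appends each run's text to the previous run when both have
// identical properties, so text that was split only cosmetically is rejoined.
func mergeRuns(runs []run) []run {
	var out []run
	for _, r := range runs {
		if n := len(out); n > 0 && out[n-1].Props == r.Props {
			out[n-1].Text += r.Text
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	runs := []run{
		{Props: "bold", Text: "{{app"},
		{Props: "bold", Text: "lic"},
		{Props: "bold", Text: "ation}}"},
		{Props: "", Text: " report"},
	}
	for _, r := range mergeRuns(runs) {
		fmt.Printf("%q %q\n", r.Props, r.Text)
	}
}
```

Runs whose formatting differs (say, a bold `{{app` next to a plain `lication}}`) stay separate. Merging therefore only helps with splits that carry no formatting difference.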

Screenshots and Output Files: [3 screenshots]

Thank you! 🚀

@hq-zhonger (author)

Update:
I have successfully resolved most of the issues, and the placeholder replacement now works correctly without losing formatting or causing duplicate content. However, a small number of paragraphs still experience encoding issues (garbled text) after replacement.

Possible causes:

- Encoding issue: Some Run.Text() values may not be in UTF-8, or unioffice might handle character encoding inconsistently.
- Run splitting issue: Some paragraphs might have text spread across multiple Run elements, causing problems with text concatenation or splitting.
Any insights or suggestions on how to handle these remaining encoding issues would be greatly appreciated!

func (mbafc *MBAFCTemplate) FillTemplate() {
	for _, para := range mbafc.template.Paragraphs() {
		runs := para.Runs()
		if len(runs) == 0 {
			continue
		}

		var buffer strings.Builder
		var runIndex []int
		inPlaceholder := false

		for i, run := range runs {
			text := run.Text()
			buffer.WriteString(text)
			runIndex = append(runIndex, i)

			// Detect the opening `{{`
			if strings.Contains(buffer.String(), "{{") {
				if !inPlaceholder {
					inPlaceholder = true
					runIndex = []int{i} // remember the run where `{{` starts
				}
			}

			// Detect the closing `}}`
			if inPlaceholder && strings.Contains(buffer.String(), "}}") {
				fullText := buffer.String()

				// Extract `{{variable}}`
				start := strings.Index(fullText, "{{")
				end := strings.Index(fullText, "}}") + 2
				placeholder := fullText[start+2 : end-2] // the `variable` name

				// Look up the replacement value
				replacement := mbafc.getReplacement(placeholder)
				fmt.Printf("Replacing placeholder: %s -> %s\n", placeholder, replacement)

				// Replace `{{variable}}`
				newText := strings.ReplaceAll(fullText, "{{"+placeholder+"}}", replacement)
				fmt.Printf("newText: %s\n", newText)

				// Clear the original runs and write the text back into them
				remainingText := newText
				for _, idx := range runIndex {
					runs[idx].ClearContent()
					if len(remainingText) > 0 {
						// At most 20 bytes per Run (len counts bytes, not characters),
						// intended to avoid accidental truncation
						writeLen := min(len(remainingText), 20)
						runs[idx].AddText(remainingText[:writeLen])
						remainingText = remainingText[writeLen:] // the rest goes to the next Run
					}
				}

				// If `remainingText` still has content, append it to the last Run
				if len(remainingText) > 0 && len(runIndex) > 0 {
					lastRun := runs[runIndex[len(runIndex)-1]]
					lastRun.AddText(remainingText)
				}

				// Check that the text was written correctly
				fmt.Printf("Content after write: %#v\n", runs[runIndex[0]].Text())

				// Reset state
				buffer.Reset()
				runIndex = nil
				inPlaceholder = false
			}
		}
	}
}

// ✅ Added a min helper (Go 1.21 and later also provide a built-in min)
func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

@Detecon-China

Hi All,

I previously read an article noting that when unioffice is used for {{var}} template placeholder replacement, the placeholder string often gets split into multiple segments, so it no longer matches. This has led some developers to reluctantly switch to other Office libraries, and replacing placeholders in the raw XML is also cumbersome.

After many attempts, I'd like to share one of my own solutions to this issue. It replaces placeholders while preserving the original content and formatting.

FillTemplate scans each paragraph's runs, accumulating their text until a complete placeholder is found and recording the indexes of the runs involved in runIndex. The updateRunsV2 method then replaces the content inside each of those runs' EG_RunInnerContent, so both formatting and content are retained.

```go
// FillTemplate renders text-type placeholders
func (p *DocProcessor) FillTemplate(doc *document.Document, replacements map[string]string) error {
	for _, para := range doc.Paragraphs() {
		runs := para.Runs()
		if len(runs) == 0 {
			continue
		}

		var buffer strings.Builder
		var runIndex []int
		inPlaceholder := false

		for i, run := range runs {
			text := run.Text()
			buffer.WriteString(text)
			runIndex = append(runIndex, i)

			// Start tracking at the run where "{{" first appears
			if strings.Contains(buffer.String(), "{{") && !inPlaceholder {
				inPlaceholder = true
				runIndex = []int{i}
			}

			if inPlaceholder && strings.Contains(buffer.String(), "}}") {
				fullText := buffer.String()
				pla := p.extractPlaceholder(fullText)
				// Print the placeholder to verify that it was extracted correctly
				fmt.Println("Extracted placeholder:", pla)

				// Text placeholder: replace it inside the runs that hold it
				if replacement, ok := replacements[pla]; ok {
					p.updateRunsV2(runs, runIndex, pla, replacement)
					buffer.Reset()
					runIndex = nil
					inPlaceholder = false
					continue
				}

				// Other placeholders are rendered by dedicated handlers
				switch pla {
				case "vulnerabilityOverview":
					p.handleVulnerabilityOverview(para)
				case "vulnerabilitiesContent":
					p.vulnerabilitiesContent(para)
				case "managementSummaryBar":
					p.managementSummaryBar(para)
				case "portScanResults":
					p.portScanResults(para)
				case "vulnerabilityDetails":
					p.handleVulnerabilityDetails(para)
				}

				buffer.Reset()
				runIndex = nil
				inPlaceholder = false
			}
		}
	}

	if err := doc.Validate(); err != nil {
		return fmt.Errorf("validate document failed: %w", err)
	}

	return nil
}

// updateRunsV2 updates the text content of the runs that hold a placeholder.
// It edits T.Content of each run in place, so the run properties are kept.
func (p *DocProcessor) updateRunsV2(runs []document.Run, runIndex []int, placeholder string, replacement string) {
	for _, idx := range runIndex {
		for _, inner := range runs[idx].X().EG_RunInnerContent {
			if inner == nil || inner.RunInnerContentChoice.T == nil {
				continue
			}
			t := inner.RunInnerContentChoice.T
			// Remove the braces, then replace the placeholder name
			t.Content = strings.ReplaceAll(t.Content, "{{", "")
			t.Content = strings.ReplaceAll(t.Content, "}}", "")
			t.Content = strings.ReplaceAll(t.Content, placeholder, replacement)
		}
	}
}
```

go version: v1.24.0
unioffice version: v2.0.0

Hope this helps you all!
