KYLIN-5371 fix segment prune #2049

Open: wants to merge 1 commit into base: kylin4
@@ -18,12 +18,17 @@
package org.apache.kylin.common.util;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang3.time.FastDateFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DateFormat {

@@ -40,6 +45,8 @@ public class DateFormat {
public static final String YYYYMMDDHH = "yyyyMMddHH";
public static final String ISO_8601_24H_FULL_FORMAT = "yyyy-MM-dd'T'HH:mm:ss.SSSZZ";

+ private static final Logger logger = LoggerFactory.getLogger(DateFormat.class);
+
public static final String[] SUPPORTED_DATETIME_PATTERN = {
DEFAULT_DATE_PATTERN,
DEFAULT_DATETIME_PATTERN_WITHOUT_MILLISECONDS,
@@ -123,17 +130,18 @@ private static String formatToStrWithTimeZone(TimeZone timeZone, long mills, Str
public static long stringToMillis(String str) {
// try to be smart and guess the date format
if (isAllDigits(str)) {
- if (str.length() == 8 && isInputFormatDate(str, COMPACT_DATE_PATTERN))
+ if (str.length() == 8 && isInputFormatDate(str, COMPACT_DATE_PATTERN)) {
//TODO: might be prolematic if an actual ts happends to be 8 digits, e.g. 1970-01-01 10:00:01.123
return stringToDate(str, COMPACT_DATE_PATTERN).getTime();
- else if (str.length() == 10 && isInputFormatDate(str, YYYYMMDDHH))
+ } else if (str.length() == 10 && isInputFormatDate(str, YYYYMMDDHH)) {
return stringToDate(str, YYYYMMDDHH).getTime();
- else if (str.length() == 12 && isInputFormatDate(str, YYYYMMDDHHMM))
+ } else if (str.length() == 12 && isInputFormatDate(str, YYYYMMDDHHMM)) {
return stringToDate(str, YYYYMMDDHHMM).getTime();
- else if (str.length() == 14 && isInputFormatDate(str, YYYYMMDDHHMMSS))
+ } else if (str.length() == 14 && isInputFormatDate(str, YYYYMMDDHHMMSS)) {
return stringToDate(str, YYYYMMDDHHMMSS).getTime();
- else
+ } else {
return Long.parseLong(str);
+ }
} else if (str.length() == 10) {
return stringToDate(str, DEFAULT_DATE_PATTERN).getTime();
} else if (str.length() == 13) {
@@ -194,4 +202,18 @@ public static boolean isDatePattern(String ptn) {
return COMPACT_DATE_PATTERN.equals(ptn) || YYYYMMDDHH.equals(ptn) || YYYYMMDDHHMM.equals(ptn)
|| YYYYMMDDHHMMSS.equals(ptn);
}
+
+ public static Long getFormatTimeStamp(long time, String pattern) {
+ try {
+ if (StringUtils.isNotBlank(pattern)) {
+ SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.getDefault(Locale.Category.FORMAT));
+ sdf.setTimeZone(TimeZone.getTimeZone("GMT"));
Contributor:

The time zone is hard-coded here; please refer to org.apache.kylin.common.KylinConfigBase#getTimeZone.

Author:

[screenshot]

+ String timeFormat = sdf.format(new Date(time));
+ time = sdf.parse(timeFormat).getTime();
+ }
+ } catch (Exception e) {
+ logger.warn("format time error", e);
+ }
+ return time;
+ }
}
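The review point about the hard-coded "GMT" zone can be illustrated with a minimal sketch. It assumes the zone id is supplied by the caller as a plain string; the class `TruncateSketch` and the method `truncate` are hypothetical names for illustration and are not part of this PR or of Kylin. How the zone id would be obtained from `KylinConfigBase#getTimeZone` is not shown, since its return type is not visible in this diff.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class TruncateSketch {

    // Hypothetical helper: truncates epoch millis to the precision of `pattern`
    // by formatting and re-parsing in the given time zone (same round trip as
    // getFormatTimeStamp, but with the zone passed in instead of hard-coded).
    static long truncate(long millis, String pattern, String zoneId) {
        if (pattern == null || pattern.trim().isEmpty()) {
            return millis; // nothing to truncate to
        }
        try {
            SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ROOT);
            sdf.setTimeZone(TimeZone.getTimeZone(zoneId));
            return sdf.parse(sdf.format(new Date(millis))).getTime();
        } catch (ParseException | IllegalArgumentException e) {
            return millis; // fall back to the original value, as the PR does
        }
    }

    public static void main(String[] args) {
        long t = 1671411600000L; // 2022-12-19 01:00:00 GMT
        // Day boundary in GMT: 2022-12-19 00:00:00 GMT
        System.out.println(truncate(t, "yyyy-MM-dd", "GMT"));           // 1671408000000
        // Day boundary in Asia/Shanghai: 2022-12-19 00:00:00 +08:00
        System.out.println(truncate(t, "yyyy-MM-dd", "Asia/Shanghai")); // 1671379200000
    }
}
```

The two results differ by eight hours, which is why the zone used for truncation probably needs to match the zone used when the filter value is parsed.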
@@ -366,7 +366,9 @@ class FilePruner(cubeInstance: CubeInstance,
val pruned = segDirs.filter {
e => {
val tsRange = cubeInstance.getSegment(e.segmentName, SegmentStatusEnum.READY).getTSRange
- SegFilters(tsRange.startValue, tsRange.endValue, pattern)
+ // tsRange: 20221219000000_20221219010000、20221219010000_20221219020000, pattern: yyyy-MM-dd
+ val start = DateFormat.getFormatTimeStamp(tsRange.startValue, pattern)
+ SegFilters(start, tsRange.endValue, pattern)
Contributor:

What about tsRange.endValue? Should it be handled the same way as start?

Author:

tsRange.endValue does not need to be handled. DateFormat.getFormatTimeStamp(tsRange.startValue, pattern) truncates startValue to a day-level timestamp (dropping the hours). This is needed because foldFilter in SegFilters formats the partition column value from the where clause (call it ts) at day granularity, and the subsequent comparison is "ts >= start && ts < end". If end carries a time-of-day component, its timestamp is necessarily greater than the day-level timestamp, so it can be left as is.

[3 screenshots]

Contributor:

For clarity, please add a code comment where tsRange.endValue is used, to help others understand.

.foldFilter(reducedFilter) match {
case AlwaysTrue => true
case AlwaysFalse => false
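The author's reasoning about start and end can be checked with a simplified model. The sketch below is not the real `SegFilters` logic: it assumes an equality filter on a day-level partition value, reduces it to the quoted comparison `ts >= start && ts < end`, and truncates to the day in GMT with plain arithmetic. The class and method names are hypothetical.

```java
class SegmentPruneSketch {

    static final long DAY_MILLIS = 86_400_000L;

    // Simplified stand-in for the comparison quoted in the thread.
    static boolean keep(long ts, long start, long end) {
        return ts >= start && ts < end;
    }

    // Truncates epoch millis to the start of the day in GMT (assumption: GMT).
    static long floorToDayGmt(long millis) {
        return Math.floorDiv(millis, DAY_MILLIS) * DAY_MILLIS;
    }

    public static void main(String[] args) {
        long ts = 1671408000000L;       // filter value 2022-12-19, formatted by day
        long segStart = 1671411600000L; // segment start 2022-12-19 01:00:00 GMT
        long segEnd = 1671415200000L;   // segment end   2022-12-19 02:00:00 GMT

        // Without truncation, the hourly segment is pruned although it holds data for that day.
        System.out.println(keep(ts, segStart, segEnd));                // false
        // With start truncated to the day, the segment is kept; end is left untouched.
        System.out.println(keep(ts, floorToDayGmt(segStart), segEnd)); // true
    }
}
```

In this model, truncating end as well would turn the second check back into false, which is consistent with the author's point that only start needs adjusting.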