After adding a column to a Hudi table via Spark SQL, querying existing data with Hive SQL returns misaligned columns #12771

Open
liucongjy opened this issue Feb 5, 2025 · 2 comments
Labels
hive, schema-evolution

Comments

@liucongjy

Describe the problem you faced

A clear and concise description of the problem.

To Reproduce

Steps to reproduce the behavior:

  1. Create the table (DDL):
    CREATE TABLE zdyj_ehr.ehr_etbj_csyxzm (
    wdsyh STRING ,
    hzjcsyh STRING ,
    jgdm STRING ,
    jgmc STRING ,
    csyxzmbh STRING ,
    xsexm STRING ,
    jlsj TIMESTAMP)
    using hudi location '/user/hive/warehouse/ws.db/ehr_etbj_csyxzm' TBLPROPERTIES(
    type = 'cow',
    primaryKey = 'wdsyh',
    'hoodie.datasource.write.recordkey.field' = 'wdsyh',
    preCombineField = 'gxsj',
    'hoodie.table.partition.fields' = 'jlsj',
    'hoodie.table.keygenerator.class' = 'org.apache.hudi.keygen.TimestampBasedKeyGenerator',
    'hoodie.datasource.write.keygenerator.class' = 'org.apache.hudi.keygen.TimestampBasedKeyGenerator',
    'hoodie.keygen.timebased.timestamp.type' = 'DATE_STRING',
    'hoodie.keygen.timebased.timezone' = 'GMT+8:00',
    'hoodie.keygen.timebased.input.dateformat' = 'yyyy-MM-dd hh:mm:ss',
    'hoodie.keygen.timebased.output.dateformat' = 'yyyyMM',
    'hoodie.schema.on.read.enable' = 'true',
    'hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled' = 'true')
    partitioned by(jlsj);

  2. Insert data into ehr_etbj_csyxzm using Spark SQL:
    insert into ehr_etbj_csyxzm values('t001','h001','j01','zhongyiyuan','ttt','tom',current_timestamp());

  3. Query the data from the Hive shell:
    select * from ehr_etbj_csyxzm
    The jlsj column has the value '2018-08-14 00:00:00'.

  4. Add a new column to the table using Spark SQL:
    alter table ehr_etbj_csyxzm add columns(ext1 string comment '扩展字段1');

  5. Query the data from the Hive shell again:
    select * from ehr_etbj_csyxzm
    The newly added column ext1 has the value '2018-08-14 00:00:00', and the existing column jlsj is empty.
    The correct result would be an empty ext1 and a jlsj value of '2018-08-14 00:00:00'.

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

  • Hudi version : 0.15

  • Spark version : 3.3.0

  • Hive version : 2.3.9

  • Hadoop version : 3.0

  • Storage (HDFS/S3/GCS..) :hdfs

  • Running on Docker? (yes/no) :no

Additional context

Add any other context about the problem here.

Stacktrace

Add the stacktrace of the error.

@danny0405 added the hive and schema-evolution labels on Feb 5, 2025
@cshuo
Contributor

cshuo commented Feb 5, 2025

@liucongjy Hi, what do queries other than select * return, for example select wdsyh, ext1, jlsj from zdyj_ehr.ehr_etbj_csyxzm?
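
A minimal sketch of that check, to be run from the Hive shell after the alter table in step 4. The first statement is the query suggested above. The DESCRIBE FORMATTED statement is an added assumption about what may be useful to compare: it only prints the column order and partition columns that the Hive metastore currently records for the table. Neither statement is a confirmed diagnosis of the misalignment.

```sql
-- Project the columns by name instead of relying on SELECT * ordering.
-- If jlsj and ext1 come back with the expected values here, the problem
-- may be limited to how SELECT * maps columns to positions.
SELECT wdsyh, ext1, jlsj
FROM zdyj_ehr.ehr_etbj_csyxzm;

-- Show the schema Hive has registered after the Spark-side ALTER TABLE,
-- including where ext1 sits relative to the partition column jlsj.
DESCRIBE FORMATTED zdyj_ehr.ehr_etbj_csyxzm;
```

Running the same named-column query in Spark SQL and comparing the two results would show whether the stored data is intact and only the Hive read path is affected.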

@rangareddy

Hi @liucongjy,

After altering the table with Spark, you may be unable to query the table from Hive and may instead hit the following exception:

hive> SELECT * FROM ehr_etbj_csyxzm;
OK
Failed with exception java.io.IOException:org.apache.hudi.exception.HoodieException: Field ext1 not found in log schema. Query cannot proceed! Derived Schema Fields: [_hoodie_commit_time, _hoodie_partition_path, jgmc, _hoodie_record_key, jgdm, csyxzmbh, jlsj, wdsyh, _hoodie_commit_seqno, hzjcsyh, _hoodie_file_name, xsexm]
Time taken: 23.787 seconds

This issue is fixed in Hudi 1.0.1; the upstream JIRA is HUDI-8880.

@github-project-automation github-project-automation bot moved this to ⏳ Awaiting Triage in Hudi Issue Support Feb 7, 2025
@ad1happy2go ad1happy2go moved this from ⏳ Awaiting Triage to 🏁 Triaged in Hudi Issue Support Feb 7, 2025