Apache CarbonData 1.6.1 Release
Created by Raghunandan S, last modified by Liang Chen on Mar 19, 2022
The Apache CarbonData community is pleased to announce the release of version 1.6.1 in The Apache Software Foundation (ASF).
CarbonData is a high-performance data solution that supports various data analytic scenarios, including BI analysis, ad-hoc SQL queries, fast filter lookup on detail records, and streaming analytics. CarbonData has been deployed in many enterprise production environments. In one of the largest deployments, it supports queries on a single table with 3 PB of data (more than 5 trillion records) with response times of less than 3 seconds.
We encourage you to use the release and to send feedback through the CarbonData user mailing lists.
This release note provides information on the new features, improvements, and bug fixes of this release.
What’s New in CarbonData Version 1.6.1?
CarbonData 1.6.1 aims to move closer to unified analytics and to improve stability. In this version, around 40 JIRA tickets covering improvements and bug fixes have been resolved. The following is a summary.
Index Server performance improvements for Full Scan and TPCH Queries
Carbon currently prunes and caches all block/blocklet datamap index information in the driver. If the cache grows very large (70-80% of the driver memory), excessive GC in the driver can slow down queries, and the driver may even run out of memory. Moving the indexes out to a separate JDBCServer reduced the overhead on the primary JDBCServer, but introduced a delay in fetching the bulk list of pruned blocks from the Index Server. This has been improved in this release, and performance is now the same as running without the Index Server.
Behaviour Change
None
Please find the detailed JIRA list:
Sub-task
[CARBONDATA-3454] - Optimize the performance of select count(*) for index server
[CARBONDATA-3462] - Add usage and deployment document for index server
Bug
[CARBONDATA-3452] - select query failure when substring on dictionary column with join
[CARBONDATA-3474] - Fix validate mvQuery having filter expression and correct error message
[CARBONDATA-3476] - Read time and scan time stats shown wrong in executor log for filter query
[CARBONDATA-3477] - Throw out exception when use sql: 'update table select\n...'
[CARBONDATA-3478] - Fix ArrayIndexOutOfBoundsException issue on compaction after alter rename operation
[CARBONDATA-3480] - Remove Modified MDT and make relation refresh only when schema file is modified.
[CARBONDATA-3481] - Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning
[CARBONDATA-3482] - Null pointer exception when concurrent select queries are executed from different beeline terminals.
[CARBONDATA-3483] - Can not run horizontal compaction when execute update sql
[CARBONDATA-3485] - data loading is failed from S3 to hdfs table having ~2K carbonfiles
[CARBONDATA-3486] - Serialization/ deserialization issue with Datatype
[CARBONDATA-3487] - wrong Input metrics (size/record) displayed in spark UI during insert into
[CARBONDATA-3490] - Concurrent data load failure with carbondata FileNotFound exception
[CARBONDATA-3493] - Carbon query fails when enable.query.statistics is true in specific scenario.
[CARBONDATA-3494] - Nullpointer exception in case of drop table
[CARBONDATA-3495] - Insert into Complex data type of Binary fails with Carbon & SparkFileFormat
[CARBONDATA-3499] - Fix insert failure with customFileProvider
[CARBONDATA-3502] - Select query fails with UDF having Match expression inside IN expression
[CARBONDATA-3505] - Fixed drop database cascade issue when 2 database point to same location.
[CARBONDATA-3506] - Alter table add, drop, rename and datatype change fails with hive compatile property
[CARBONDATA-3507] - Create Table As Select Fails in Spark-2.3
[CARBONDATA-3508] - Select query fails when the cg datamap is dropped concurrently while running the select query on filter column on which datamap is created
[CARBONDATA-3513] - can not run major compaction when using hive partition table
[CARBONDATA-3520] - CTAS should fail if select query contains duplicate columns
[CARBONDATA-3526] - Cache issue and select query failure with multiple updates
[CARBONDATA-3527] - Throw 'String length cannot exceed 32000 characters' exception when load data with 'GLOBAL_SORT' from csv which include big complex type data
Improvement
[CARBONDATA-3488] - Check the file size after move local file to carbon path
[CARBONDATA-3489] - Optimizing the performance of sorting
[CARBONDATA-3491] - Return updated/deleted rows count when execute update/delete sql
[CARBONDATA-3501] - Support to execute update sql on table with long_string field (Not update long_string field)
[CARBONDATA-3511] - Query time improvement by reducing the number of NameNode calls while having carbonindex files in the store
[CARBONDATA-3515] - Limit local dictionary size to 10% of allowed blocklet size
[CARBONDATA-3523] - Should store file size into index file
[CARBONDATA-3524] - support compaction by GLOBAL_SORT
[CARBONDATA-3528] - refactor java checkstyle rules
[CARBONDATA-3540] - Delete all external segments when dropping table
[CARBONDATA-3544] - CLI should support a option to show statistics for all columns