Pentaho Premium Platform Pentaho Data Integration - subscription license (1 month) - 16 cores

$12,578.99
Mfg # PREM-PDI-S1M-16.P | CDW # 5407660 | UNSPSC 43232806

Software Details

  • Subscription license (1 month)
  • 16 cores

Know your gear

With Pentaho from Hitachi Vantara, managing the enormous volume, variety and velocity of data entering organizations is simplified. Pentaho Data Integration (PDI) delivers analytics-ready data to end users faster, with visual tools that reduce time and complexity. Without writing SQL or coding in Java or Python, organizations immediately gain real value from their data, from sources such as files, relational databases and Hadoop, whether in the cloud or on-premises.
Lease Pricing Available
Availability: In Stock


Reviews

32 Total

Reviews by Ratings

  • 5 stars: 21%
  • 4 stars: 75%
  • 3 stars: 0%
  • 2 stars: 3%
  • 1 star: 0%

1-8 of 32 reviews

Written by a user while visiting PeerSpot
Has drag-and-drop functionality and good integration while being easy to use

What is our primary use case?
I use Pentaho Data Integration for data integration and ETL processes. I developed with Pentaho at CoproSema. I work on machine learning projects using Pentaho, such as forecasting for clients who have not paid their credit.

What is most valuable?
I find the drag-and-drop feature in Pentaho Data Integration very useful for integration. I can use JavaScript and Java in some steps for ETL development. It's easy to use and friendly, especially for larger data sets. I use Pentaho for ETL while relying on other tools like Power BI for data visualization and Microsoft Fabric for other tasks.

What needs improvement?
While Pentaho Data Integration is very friendly, it is not very useful when there isn't a lot of data to handle. Communicating with the vendor is challenging, which is a drawback of the free-tool setup.

What do I think about the stability of the solution?
It's pretty stable; however, it struggles when dealing with smaller amounts of data.

What do I think about the scalability of the solution?
Pentaho Data Integration handles larger datasets better. It's not very useful for smaller datasets.

How are customer service and support?
Communication with the vendor is challenging, which makes customer service less satisfactory, despite it being a free tool.

How would you rate customer service and support?
Neutral

Which solution did I use previously and why did I switch?
I use Pentaho for data integration; however, for machine learning and business intelligence, I rely on other tools such as Power BI and Microsoft Fabric.

How was the initial setup?
The initial setup of Pentaho is easy and straightforward.

What about the implementation team?
Deploying Pentaho usually requires around two people, possibly with roles such as server administrator or technical lead.

Which other solutions did I evaluate?
I use Power BI for business intelligence, Microsoft Fabric for other tasks, and AWS Glue for data processing in other projects. I do not have experience with Azure Data Box.

What other advice do I have?
On a scale of one to ten, I would rate Pentaho Data Integration around an eight.

Disclaimer: I am a real user, and this review is based on my own experience and opinions.

Written by a user while visiting PeerSpot
Transforms data efficiently with rich features, but there are challenges with large datasets

What is our primary use case?
Currently, I am using Pentaho Data Integration for transforming data and then loading it into different platforms. Sometimes, I use it in conjunction with AWS, particularly S3 and Redshift, to execute the COPY command for data processing.

What is most valuable?
Pentaho Data Integration is easy to use, especially when transforming data. I can find the necessary steps for any required transformation, and it is very efficient for pivoting, such as transforming rows into columns. It is also free of cost and rich in available transformations, allowing extensive data manipulation.

What needs improvement?
I experience difficulties when handling millions of rows, as moving data from one source to another becomes challenging. The processing speed slows down significantly, especially when using a table output for Redshift. Inbuilt Python code integration would be beneficial.

For how long have I used the solution?
I have been using Pentaho Data Integration since 2018.

What do I think about the stability of the solution?
I would rate the stability of Pentaho Data Integration as eight out of ten.

What do I think about the scalability of the solution?
I would rate the scalability around 8.5 out of ten, based on our experience.

How are customer service and support?
I have contacted customer support once or twice but did not receive a response. Therefore, I have not had much interaction with the support team, and their assistance does not seem frequent.

How would you rate customer service and support?
Neutral

Which solution did I use previously and why did I switch?
Pentaho Data Integration's main competitor is Talend. Many companies are moving towards cloud-based ETL solutions.

How was the initial setup?
The initial setup is simple. It involves downloading the tool, installing the necessary libraries, like the JDBC driver for your databases, and then creating a connection to start working.

What's my experience with pricing, setup cost, and licensing?
Pentaho Data Integration is low-priced, especially since it is free of cost.

What other advice do I have?
I rate Pentaho Data Integration seven out of ten. I definitely recommend it for small to medium organizations, especially if you are looking for a cost-effective product.

Disclaimer: I am a real user, and this review is based on my own experience and opinions.
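The rows-to-columns pivot this reviewer highlights can be sketched in plain Python. In PDI itself this is done with a visual step (no code needed); the sample records and field names below are invented for illustration, and only grouping by one key column is shown.

```python
# Hedged sketch of a rows-to-columns pivot, mimicking what a PDI
# transformation step does visually. The sample data is invented.
from collections import defaultdict

rows = [
    {"product": "A", "month": "Jan", "sales": 10},
    {"product": "A", "month": "Feb", "sales": 12},
    {"product": "B", "month": "Jan", "sales": 7},
]

# Group by the key column, turning each (month, sales) pair into a column.
pivoted = defaultdict(dict)
for r in rows:
    pivoted[r["product"]][r["month"]] = r["sales"]

print(dict(pivoted))
# prints {'A': {'Jan': 10, 'Feb': 12}, 'B': {'Jan': 7}}
```

The same shape of result is what the reviewer gets from PDI's visual pivot, without writing any code at all.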

Written by a user while visiting PeerSpot
Efficient data integration with cost savings, but less efficient for large data volumes

What is our primary use case?
I have a team with experience in integration. We are service providers and partners. Generally, clients buy the product directly from the company.

How has it helped my organization?
It is easy to use, install, and start working with. This is one of its advantages compared to other competing products. The relationship between price and functionality is excellent, resulting in time and money savings of between twenty-five and thirty percent.

What is most valuable?
One of the advantages is that it is easy to use, install, and start working with. For certain volumes of data, the solution is very efficient.

What needs improvement?
Pentaho may be less efficient for large volumes of data compared to other solutions like Talend or Informatica. Larger data jobs take more time to execute. Pentaho is more appropriate for jobs with smaller volumes of data.

For how long have I used the solution?
I have used the solution for more than ten years.

What do I think about the stability of the solution?
The solution is stable. Generally, one person can manage and maintain it.

What do I think about the scalability of the solution?
Sometimes, for large volumes of data, a different solution might be more appropriate. Pentaho is suited for smaller volumes of data, while Talend is better for larger volumes.

How are customer service and support?
Based on my experience, the solution has been reliable.

How would you rate customer service and support?
Positive

Which solution did I use previously and why did I switch?
We did a comparison between Talend and Pentaho last year.

How was the initial setup?
The initial setup is straightforward. It is easy to install and start working with.

What about the implementation team?
A team with experience in integration manages the implementation.

What was our ROI?
The relationship between price and functionality is excellent. It results in time and money savings of between twenty-five and thirty percent.

What's my experience with pricing, setup cost, and licensing?
Pentaho is cheaper than other solutions. The relationship between price and functionality means it provides good value for money.

Which other solutions did I evaluate?
We evaluated Talend and Pentaho.

What other advice do I have?
I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?
On-premises

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other

Disclaimer: My company has a business relationship with this vendor other than being a customer: MSP

Written by a user while visiting PeerSpot
Loads data into the required tables and is easy to plug and play

What is our primary use case?
The use cases involve loading data into the required tables based on the transformations. We do a couple of transformations and, based on the business requirement, load the data into the required tables.

What is most valuable?
It's a very lightweight tool. It can be plugged in and played with easily and can read data from multiple sources. It's a very good tool for small to large companies. People can learn very easily how to do the transformations for loading and migrating data. It's a fantastic tool in the open-source community. Compared to commercial ETL tools, this is a free tool you can download that does many of the things the commercial tools do. It's available in Community and Enterprise editions. It's very easy to use.

What needs improvement?
It is difficult to process huge amounts of data. We need to test it end-to-end and determine how much data it can process. With the Enterprise Edition, we can process the data.

For how long have I used the solution?
I have been using Pentaho Data Integration and Analytics for 11-12 years.

What do I think about the stability of the solution?
We process a small amount of data, but it's pretty good.

What do I think about the scalability of the solution?
It's scalable across any machine.

How are customer service and support?
Support is satisfactory. A few of my colleagues work with Hitachi to provide solutions whenever a ticket or Jira issue is raised.

How would you rate customer service and support?
Positive

How was the initial setup?
Installation is very simple, in both the Community and Enterprise editions. You can install it very easily; one person is enough for the installation.

What's my experience with pricing, setup cost, and licensing?
The product is quite cheap.

What other advice do I have?
It can quickly implement slowly changing dimensions and efficiently read flat files, loading them into tables quickly. Additionally, the "number of copies to start" setting enables parallel partitioning. In the Enterprise Edition, you can restart your jobs from where they left off, a valuable feature for ensuring continuity. Metadata injection is also very straightforward, which is an advantage. It is lightweight and can work on various systems. Any technical person can do everything end to end. Overall, I rate the solution a ten out of ten.

Disclaimer: I am a real user, and this review is based on my own experience and opinions.
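The "slowly changing dimensions" this reviewer mentions are handled visually in PDI, but the underlying idea can be sketched in plain Python. This hedged Type 2 example, with invented records and field names, preserves history by retiring the old row and appending a new current one:

```python
# Hedged sketch of a Type 2 slowly changing dimension update in plain
# Python. In PDI this is a visual dimension-update step; the records and
# field names here are invented for illustration.
dimension = [
    {"key": 1, "customer": "ACME", "city": "Boston", "current": True},
]

def scd2_update(dim, customer, new_city):
    """Retire the current row for `customer` and append the new version."""
    for row in dim:
        if row["customer"] == customer and row["current"]:
            if row["city"] == new_city:
                return  # attribute unchanged, nothing to do
            row["current"] = False  # retire the old version
    dim.append({
        "key": max(r["key"] for r in dim) + 1,
        "customer": customer,
        "city": new_city,
        "current": True,
    })

scd2_update(dimension, "ACME", "Chicago")
# dimension now holds both versions: the retired Boston row and a
# current Chicago row, preserving the change history.
```

The point of Type 2 handling is exactly this: old attribute values stay queryable instead of being overwritten in place.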

Written by a user while visiting PeerSpot
Offers features for data integration and migration

What is our primary use case?
I have used the solution to gather data from multiple sources, including APIs, databases like Oracle, and web servers. There are a number of data providers who can supply datasets to export in JSON format from clouds or APIs.

What is most valuable?
The solution offers features for data integration and migration. Pentaho Data Integration and Analytics allows the integration of multiple data sources into one. The product is user-friendly and intuitive for almost any business.

What needs improvement?
The solution should provide additional control over the data warehouse and reduce its size, as our organization's clients have expressed concerns about it. The vendor could focus on reducing capacity requirements and compensate by enhancing product efficiency.

For how long have I used the solution?
I have been using Pentaho Data Integration and Analytics for a year.

How are customer service and support?
I have never encountered any issues with Pentaho Data Integration and Analytics.

What's my experience with pricing, setup cost, and licensing?
I believe the pricing of the solution is more affordable than the competitors'.

Which other solutions did I evaluate?
I have worked with IBM DataStage alongside Pentaho Data Integration and Analytics. I found the IBM DataStage interface outdated in comparison to the Pentaho tool. IBM DataStage requires the user to drag and drop the services as well as the pipelines, similar to the process in SSIS platforms. Pentaho Data Integration and Analytics is also easier to comprehend from first use than IBM DataStage.

What other advice do I have?
The solution's ETL capabilities make data integration tasks easier and are used to export data from a source to a destination. At my company, I am using IBM DataStage and the overall IBM tech stack for compatibility among the integrations, pipelines, and user levels. I would absolutely recommend Pentaho Data Integration and Analytics to others. I would rate the solution a seven out of ten.

Disclaimer: I am a real user, and this review is based on my own experience and opinions.

Written by a user while visiting PeerSpot
Enterprise Edition pricing and reduced Community Edition functionality are making us look elsewhere

What is our primary use case?
We use it for two major purposes. Most of the time it is for ETL of data, and based on the loaded and converted data, we generate reports. A small part of that, the pivot tables and the like, is also on the web interface, which is the more interactive part. But about 80 percent of our developers' work is on the background processes for running, transforming, and changing data.

How has it helped my organization?
Before, a lot of manual work had to be done that isn't done anymore. We have also given additional reports to the end-users and, based upon them, they have to take some action. Based on user feedback, some of the data cleaning tasks that were done manually have been automated. It has also given us a fast response to new data introduced into the organization. Using the solution, we were able to reduce our ETL deployment time by between 10 and 20 percent. And when it comes to personnel costs, we have gained 10 percent.

What is most valuable?
The graphical user interface is quite okay. That's the most important feature. In addition, the different types of stores and data formats that can be accessed and transferred are an important component. We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while not less experienced, are less technical when it comes to logic. It's more about the business logic and less about the programming logic, and that's really important. Another important feature is that you can deploy it in any environment, whether on-premises or cloud, because you can reuse your steps. When it comes to adding to your data processing capacity dynamically, that's key, because when you have new workflows you have to test them, and when you have to do that in a different environment, like your production environment, it's really important.

What needs improvement?
I would like to see better support from one version to the next, all the more so if there are third-party elements that you are using. That's one of the differences between the Community Edition and the Enterprise Edition. In addition to better integration with third-party tools, what we have seen is that some of the tools just break from one version to the next and aren't supported anymore in the Community Edition. What is behind that is not really clear to us, but the result is that we can't migrate, or we have to migrate to other parts. That's the most inconvenient part of the tool. We need to test whether all our third-party plugins are still available in a new version. That's one of the reasons we decided to move from the tool to the completely open-source version for the ETL part; it's a result of the migration hassle we have had every time. The support for the Enterprise Edition is okay, but what they have done in the last three or four years is move more and more things to that edition. The result is that they are breaking the Community Edition; that's our impression. The Enterprise Edition is okay, and there is a clear path for it. You will not use a lot of external plugins with it because, with every new version, a lot of the most popular plugins are transferred to the Enterprise Edition. But the Community Edition is almost not supported anymore. You shouldn't start with the Community Edition because, quite early on, you will have to move to the Enterprise Edition. Before, you could use the Community Edition for a longer time.

For how long have I used the solution?
I have been working with Hitachi Lumada Data Integration for seven or eight years.

What do I think about the stability of the solution?
The stability is okay. The transition period to Hitachi was two years of hell, but now it's better.

What do I think about the scalability of the solution?
At the scale we are using it, the solution is sufficient. The scalability is good, but we don't have that big of a data set. We have a couple of billion data records involved in the integration. We have it in one location across different departments, with an outside disaster recovery location. It's on a cluster of VMs running Linux, and the backend data store is PostgreSQL. Maybe our design wasn't quite optimal for reloading the billions of records every night, but that's probably not due to the product but to the migration; the migration should have been done a bit differently.

How are customer service and support?
I had contact with their commercial side and with the technical side for the setup and demos, but not after we implemented it. That is because the documentation and the external consultant gave us a lot of information.

Which solution did I use previously and why did I switch?
We came from the Microsoft environment to Hitachi, but that was 10 years back. We switched due to the licensing costs and because there wasn't really good support for the PostgreSQL database. Now, I think the Microsoft environment isn't that bad, and there is also better support for open-source databases.

How was the initial setup?
I was involved in the initial migration from Microsoft to Hitachi. It was rather straightforward, not too complex. Granted, it was a new toolset, but that is the same with every new toolset. The learning curve wasn't too steep. The maintenance effort is not significant. From time to time we get an error that just pops up without our having any idea where it comes from, and then, the next day, it's gone. We get that error about three times a year; nobody is looking into the details of it. The migrations from one version to the next were all rather simple. During that process, users don't have the system available for a day, but they can live with that. The migration was done over a weekend, and by the following Monday everything was up and running again.

What about the implementation team?
We had some external help from someone who knows the product and already had experience implementing the tool.

What was our ROI?
In terms of ROI, over the years it was a good step to make the move to Hitachi. Now, it would be a different story.

What's my experience with pricing, setup cost, and licensing?
We are using the Community Edition. We have been trying to use and sell the Enterprise version, but that hasn't been possible due to the budget required for it.

Which other solutions did I evaluate?
When we made the choice, it was between Microsoft, Hitachi, and Cognos. The deciding factor in going with Hitachi was its better support for open-source databases and data stores. Also, the functionality of the Community version was what most of our customers needed.

What other advice do I have?
Our experience with the query performance of Lumada on large data sets is that Lumada is not what determines performance. Most of the time, the performance comes from the database or the data store underneath Lumada. Depending on how big your data set is, you have to change or optimize your data store, and then you can work with large data sets. The fine-tuning of the database is done outside of Lumada, which is okay, because a tool can't provide insight into every type of data store or dataset. If you are looking into optimization, you have to use your data store's optimization tools. Hitachi isn't designed for that, and we were not expecting it to be. I'm not really that impressed with Hitachi's ability to quickly and effectively solve issues we have brought up, but it's not that bad either. It's halfway: not that good and not that bad. Overall, our Hitachi solution was quite good, but over the last couple of years we have been trying to move away from the product for a number of reasons. One of them is the price; it's really expensive. The other is that more and more of what used to be Community Edition functionality is moving to the Enterprise Edition. The latter is okay and its functions are okay, but then we are back to the price. Some of our customers don't have the deeper pockets that Hitachi is aiming for. Before, it was more likely that I would recommend Hitachi Vantara to a colleague. But now, if you are starting fresh, you should consider other solutions. If you have the money for the Enterprise Edition, my likelihood of recommending it, on a scale of one to ten, would be a seven. Otherwise, it would be a one out of ten. If you are going with Hitachi, go for the Enterprise version or stay away entirely. It's also really important to think in great detail about your loading process at the start and make sure it is designed correctly. That's not directly related to the tool itself; it's more about how you use the tool and how the loads are transferred.

Which deployment model are you using for this solution?
On-premises

Disclaimer: I am a real user, and this review is based on my own experience and opinions.

Written by a user while visiting PeerSpot
An affordable solution that makes it simple to do some fairly complicated things, but the consistency of its transformation steps could be improved

What is our primary use case?
Our primary use case is to populate a data warehouse and data marts, but we also use it for all kinds of data integration scenarios and file movement. It is almost like middleware between different enterprise solutions. We take files from our legacy app system, do some work on them, and then call SAP BAPIs, for example. It is deployed on-premises. It gives you the flexibility to deploy it in any environment, whether on-premises or in the cloud, but this flexibility is not that important to us. We could deploy it in the cloud by spinning up a new server in AWS or Azure, but as a manufacturing facility, that is not important to us; our preference is primarily to deploy things on-premises. We usually stay one version behind the latest one. We're a manufacturing facility, so we're very sensitive to any bugs or issues. We don't do automatic upgrades; they're a fairly manual process.

How has it helped my organization?
We've had it for a long time, so we've realized a lot of the improvements that anybody would realize from almost any data integration product. The speed of developing solutions has been the biggest improvement. It has reduced development time and improved the speed of getting solutions deployed. The reduction in ETL development time varies by the size and complexity of the project; we probably spend days or weeks less than if we were using a different tool. It is tremendously flexible in terms of adding custom code using a variety of different languages, but we have had relatively few scenarios where we needed that. We do very little custom coding; because of the tool we're using, it is not critical. We have developed thousands of transformations and jobs in the tool.

What is most valuable?
It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found it a bit frustrating compared to Data Integration. So, its ease of use is right up there. Its performance is a pretty close second; it is a highly performant system, and its query performance on large data sets is very good.

What needs improvement?
Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software started as open source, and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm: a table input step has a different way of thinking than a data merge step.

For how long have I used the solution?
We have been using this solution for more than 10 years.

What do I think about the stability of the solution?
Its stability is very good.

What do I think about the scalability of the solution?
Its scalability is very good. We've been running it for a long time, and we've got dozens, if not hundreds, of jobs running a day. We probably have 200 or 300 people using it across all areas of the business: production control, finance, materials management, manufacturing, procurement, and of course IT. It is very widely and extensively used, and we're increasing its usage all the time.

How are customer service and support?
They are very good at quickly and effectively solving the issues we have brought up. Their support is well structured and very responsive. Because we're very experienced with it, when we come to them with a problem, it is usually something very obscure and not necessarily easy to solve. We've had cases where, when we were troubleshooting issues, they applied a remarkable amount of time and effort. Support seems to have very good access to development and product management at tier two. I would give their technical support an eight out of ten.

How would you rate customer service and support?
Positive

Which solution did I use previously and why did I switch?
We didn't have another data integration product before Pentaho.

How was the initial setup?
I installed it. It was straightforward. It took about a day and a half to get the production environment up and running, probably because I was e-learning as I went. With a services engagement, I bet you would have everything up in a day.

What about the implementation team?
We used Pentaho services for two days. Our experience was very good. We worked with Andy Grohe. I don't know if he is still there or not, but he was excellent.

What was our ROI?
We have absolutely seen an ROI, but I don't have the metrics. There are analytic cases that we just weren't able to do before. Given the relatively low cost compared to some of the other solutions out there, it has been a no-brainer.

What's my experience with pricing, setup cost, and licensing?
We did a two- or three-year deal the last time. Compared to other solutions, at least in our experience, it has been very affordable. The licensing is by component, so make sure you only license the components that you really intend to use. I am not sure if we have relicensed since the Hitachi acquisition, but previously, multi-year renewals resulted in a good discount; I'm not sure if this is still the case. We've had the full suite for a lot of years, and there was just the initial cost. I am not aware of any additional costs.

What other advice do I have?
If you haven't used it before, it is worth engaging services with Pentaho for the initial implementation. They'll point out a number of small foibles, related to case sensitivity for example, and save you a lot of runs through the documentation to identify different configuration points that might be relevant to you. I would highly recommend the Data Integration product, particularly for anyone with a Java background. Most of our BI developers at this point do not have a Java background, which isn't really that important. But if you're a Java shop looking for extensibility, the whole solution is built in Java, which makes certain aspects of it a little more intuitive at first. On the data integration side, it is really a good tool. A lot of investment dollars go into big data and new tech, and often those are not very compelling for us; we're in an environment with medium data, not big data. It provides a single end-to-end data management experience from ingestion to insights, but at this point, that's not critical to us. We mostly do the data integration work in Pentaho and then do the visualization in another tool. The single data management experience hasn't enabled us to discontinue the use of other data management and analysis tools, because we didn't really have them. We take an existing job or transformation and use that as a template. It is certainly easy enough to copy one object to another. I am not aware of a specific templating capability, but we are not really missing anything there; it is very easy for us to clone a job or transformation just by doing a Save As, and we do that extensively. Vantara's roadmap is a little fuzzy to me. There has been quite a bit of turnover in the customer-facing roles over the last five years. We understand that there is a roadmap to move to a pure web-based solution, but it hasn't been well communicated to us. In terms of our decision to purchase Hitachi's products, services, or solutions, our satisfaction level is average on balance. I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?
On-premises

Disclaimer: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.

Written by a user while visiting PeerSpot
Good abstraction and useful drag-and-drop functionality but can't handle very large data amounts

What is our primary use case? I still use this tool on a daily basis. Comparing it to my experience with other EPL tools, the system that I created for the solution was quite simple. It is just as simple as extracting the data from MySQL, exporting it on the CSV, and then putting it on the S3 for the sales button. It is as simple as extracting the data from the MySQL Center and exporting it to the ASB. We still use this solution due to the fact that there are a lot of old systems that still use it. The new solution that we use is mostly Airflow. We are still in the transition phase. To be clear, Airflow is a data orchestration tool that mainly uses Python. Everything from the ETL, all the way to the scheduling and the monitoring of any issues. It's in one system and entirely on Airflow. How has it helped my organization? In my current company, it does not have a major impact. It's not really that useful. We use it for simple ETLs only. In terms of setting the ETL tools that we out on the panel, it's quite useful. However, this kind of functionality that we currently put on the solution can be easily switched by other tools that exist on the market. It's time to change it entirely to Airflow. We'll likely change in the next six months. What is most valuable? This solution offers tools that are drag and drop. The script is quite minimal. Even if you do not come from IT or your background is not in software engineering, it is possible. It is quite intuitive. You can drag and drop S3 functions. The abstraction is quite good. Also, if you're familiar with the product itself, they have transformational abstractions and job abstractions. Here, we can create a smaller transformation in the Kettle transformation, and then the bigger ones on the Kettle job. For someone who has familiarity with Python or someone who has no scripting background at all, the product is useful. For larger data, we are using Spark. 
The solution enables us to create pipelines with minimal manual or custom coding effort. Even if you have no advanced experience in scripting, it is possible to create ETL pipelines. I have a recent graduate on my team who came from a management major and had no experience with SQL. I trained him for three months, and within that time he became quite fluent, with no prior experience using ETL tools. Whether or not it's important to handle the creation of pipelines with minimal coding depends on the team. If I change the solution to Airflow, then I will need more time to teach them to become fluent in the ETL tool. By using these kinds of abstractions in the product, I can compress it to just three months; with Airflow, it would take longer than six months to get new users to the same point. We use the solution's ability to develop and deploy data pipeline templates and reuse them. The old system was created by someone prior to me in my organization, and we still use it; it was developed by him a long time ago. We also use the solution for some ad hoc reporting. The ability to develop and deploy data pipeline templates once and reuse them is really important to us. When there are requests to create pipelines, I create them and then deploy them on our server. They then have to be robust enough that the scheduled runs do not fail. We like the automation. I cannot imagine how data teams would work if everything was done on an ad hoc basis; everything should be automated. Using my organization as an example, I can say with confidence that 95% of our data distributions are automated and only 5% are ad hoc. For the ad hoc work, we query the data manually, process it in spreadsheets, and then distribute it to the organization. It's important for the solution to be robust and to enable automation. So far, we can deploy the solution easily on the cloud, which for us is AWS. I haven't really tried it on another server.
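The "develop a template once, reuse it" pattern described above typically relies on PDI's command-line runners: pan.sh executes a Kettle transformation (.ktr) and kitchen.sh executes a Kettle job (.kjb), with `-param:NAME=value` supplying runtime parameters so one template can be re-run with different inputs. Below is a small helper that assembles those invocations; the PDI install path, file paths, and parameter names are assumptions for illustration, not values from the review.

```python
# Build PDI command lines for pan.sh (transformations) and kitchen.sh (jobs).
# The install path under /opt and the .ktr/.kjb paths are hypothetical.

def pan_command(ktr_path, params=None, pdi_home="/opt/data-integration"):
    """Command line to run one Kettle transformation (.ktr) via pan.sh."""
    cmd = [f"{pdi_home}/pan.sh", f"-file={ktr_path}", "-level=Basic"]
    for name, value in (params or {}).items():
        cmd.append(f"-param:{name}={value}")  # runtime parameters enable reuse
    return cmd

def kitchen_command(kjb_path, pdi_home="/opt/data-integration"):
    """Command line to run a Kettle job (.kjb) via kitchen.sh."""
    return [f"{pdi_home}/kitchen.sh", f"-file={kjb_path}", "-level=Basic"]

# Example: the same transformation template, parameterized by run date.
nightly = pan_command("/etl/extract_sales.ktr", {"RUN_DATE": "2024-01-01"})
```

Passing such a list to subprocess.run, or wrapping it in the shell script a scheduler invokes, is one common way these templates get automated.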
We deploy it on our EC2 server running Ubuntu; however, we develop it on our local computers. Most of the team uses Windows, and some people use MacBooks. I have personally used it on both, as I have had to develop on both Windows and a MacBook. I can say that Windows is easier to navigate; on the MacBook, if you don't have any familiarity with the terminal, it will be quite difficult. The solution did reduce our ETL development time compared to scripting; however, this will really depend on your experience. What needs improvement? Five years ago, I was confident that I would use this product more than Airflow, as it was easier for me, with the abstraction being quite intuitive. Five years ago, I would have chosen the product over other tools using pure scripting, as it cut most of my time developing ETL pipelines. This isn't the case anymore. When I first joined my organization, I was still using Windows, and it is quite straightforward to develop the ETL system on it. However, when I changed my laptop to a MacBook, it was quite a hassle: to open the application, we had to open the terminal first, go to the solution's directory, and then run the executable file. Therefore, if you develop on a MacBook, it'll be quite a hassle; when you develop on Windows, it's not really different from other ETL tools on the market, like SQL Server Integration Services, Informatica, et cetera. For how long have I used the solution? I have been using this tool since I moved to my current company, which was about one year ago. What do I think about the stability of the solution? The performance is good. I have not tested the bleeding edge of the product; we only do simple jobs. In terms of data, we extract it from MySQL and export it to CSV. There were only millions of data points, not billions. So far, it has met our expectations. It's quite good for a smaller number of data points.
What do I think about the scalability of the solution? I'm not sure that the product could keep up with data growth. It can be useful for millions of data points; however, for billions of data points, I cannot really trust my system to the solution. There are better solutions on the market, for example, Apache Spark. The same applies to the other drag-and-drop ETL tools; SQL Server Integration Services or Informatica would be other options too. How are customer service and support? We don't really use technical support. The current version that we are using is no longer supported by their representatives, and we haven't yet updated to the newer version. Which solution did I use previously and why did I switch? We're moving to Airflow. The reason for the switch was mostly the problems we ran into when debugging. If you're familiar with SQL Server Integration Services, the ETL tool from Microsoft, its debugging function is quite intuitive: you can spot exactly which transformation has failed or which transformation has an error. However, in this solution, from what my colleagues told me, it is hard to do that; when there is an error, we cannot directly spot where the error is coming from. Airflow is quite customizable, and it's not as rigid as this product. We can deploy everything from simple ETL pipelines all the way to machine learning systems on Airflow. Airflow mainly uses Python, which our team is quite familiar with. This solution is still handled by only two people out of the 27 people on our team; not enough people know it. How was the initial setup? There is no separation between deployment and other teams; each member of our team acts like an individual contributor. We handle the implementation process all the way from face-to-face business meetings, setting timelines, developing the tools, and defining the requirements, to the production deployment. The initial setup is straightforward.
Currently, version control in our organization is quite loose; we are not using any version control system. The way we deploy is as simple as putting the Kettle transformation file onto our EC2 server and overwriting the old file. What's my experience with pricing, setup cost, and licensing? I'm not really sure what the price of the product is; I don't handle that aspect of the solution. What other advice do I have? We run it on our Ubuntu EC2 server; however, we develop it on our local machines. We deploy it onto the EC2 server, bundle it in our shell scripts, and then the shell scripts are run by Jenkins. I'd rate the solution a seven out of ten. Disclaimer: I am a real user, and this review is based on my own experience and opinions.
